Mohmine

I recently hit the permission error "Insufficient Lake Formation permission(s): Required ASSOCIATE on TagKey", which highlighted how crucial the ASSOCIATE permission is for data governance in AWS Lake Formation. This permission is a fundamental building block for anyone managing data in a data lake.

What is the ASSOCIATE Permission?

The ASSOCIATE permission is the key that lets you link LF-tags to your data resources in the AWS Glue Data Catalog. Think of LF-tags as labels (e.g., data-classification: confidential or project: finance) that define your data's characteristics and are used to enforce security policies.

This permission grants you the right to:

Tag Glue databases, tables, and columns with these labels.
Implement data governance policies by correctly classifying data.
Automate the tagging process.

Without this permission, your automated workflows cannot properly classify new data, which breaks the entire governance chain.
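
As a rough illustration of what the permission unlocks, here is a minimal sketch that tags a Glue table through Lake Formation's AddLFTagsToResource API with boto3. The helper names (build_tag_request, tag_table) and the database and table names in the usage note are hypothetical. The call only succeeds if the caller holds ASSOCIATE on the tag, and it may also need permissions on the table itself.

```python
from typing import Any, Dict, List


def build_tag_request(database: str, table: str, tag_key: str,
                      tag_values: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for the AddLFTagsToResource call (hypothetical helper)."""
    return {
        # The Glue table that should receive the tag
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        # The LF-tag key and the value(s) to attach to that table
        "LFTags": [{"TagKey": tag_key, "TagValues": tag_values}],
    }


def tag_table(client: Any, database: str, table: str, tag_key: str,
              tag_values: List[str]) -> List[Dict[str, Any]]:
    """Attach an LF-tag to a table and return the failures reported by the API."""
    response = client.add_lf_tags_to_resource(
        **build_tag_request(database, table, tag_key, tag_values)
    )
    # An empty list means every tag was attached
    return response.get("Failures", [])
```

With a real client this would be called as tag_table(boto3.client("lakeformation"), "sales_db", "orders", "data-classification", ["confidential"]). Passing the client in as an argument keeps the helper easy to exercise with a stub.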

Granting ASSOCIATE: Console vs. Code

You can grant this permission in two main ways, depending on your environment's needs for automation and reproducibility.

1. The AWS Console (Manual)

This method is ideal for quick fixes or for managing permissions in a non-automated environment. It's a straightforward process:

Go to the Lake Formation console.
Under Permissions, select LF-Tags and permissions.
Choose the tag you need to manage.
Click Grant, select the principal (user or role), and check the ASSOCIATE permission box.
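
For a one-off fix outside the console, the same grant can also be issued from a script. The sketch below uses boto3's grant_permissions call; the helper names are hypothetical, and the principal ARN, catalog ID, and tag values are placeholders you would replace with your own.

```python
from typing import Any, Dict, List


def build_associate_grant(principal_arn: str, catalog_id: str, tag_key: str,
                          tag_values: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for a GrantPermissions call on an LF-tag (hypothetical helper)."""
    return {
        # The IAM user or role receiving the permission
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # The resource being granted on is the LF-tag itself, not a table
        "Resource": {
            "LFTag": {
                "CatalogId": catalog_id,
                "TagKey": tag_key,
                "TagValues": tag_values,
            }
        },
        "Permissions": ["ASSOCIATE"],
    }


def grant_associate(client: Any, principal_arn: str, catalog_id: str,
                    tag_key: str, tag_values: List[str]) -> None:
    """Grant ASSOCIATE on an LF-tag to a principal."""
    client.grant_permissions(
        **build_associate_grant(principal_arn, catalog_id, tag_key, tag_values)
    )
```

Here client would be boto3.client("lakeformation"), run with credentials that are allowed to grant permissions on the tag, such as a data lake administrator.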

2. Infrastructure as Code (Automated)

This is the recommended approach for any production environment. By defining permissions in code with frameworks like AWS CDK or CloudFormation, you ensure your security posture is consistent, repeatable, and version-controlled. This is especially useful when granting the permission to an IAM role used by a service (like a Lambda function) that automatically tags new resources.

Here's an example using AWS CDK in Python to grant the ASSOCIATE permission to a data pipeline's IAM role:

import aws_cdk as cdk
from aws_cdk import (
    aws_iam as iam,
    aws_lakeformation as lakeformation,
)
from constructs import Construct

class DataGovernanceStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Create an IAM role that a data pipeline (e.g., a Lambda function) would assume
        data_pipeline_role = iam.Role(
            self, "DataPipelineRole",
            assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
        )

        # Grant the 'ASSOCIATE' permission to the data pipeline role for a specific LF-tag
        lakeformation.CfnPrincipalPermissions(
            self,
            id="LFTagAssociatePermission",
            principal=lakeformation.CfnPrincipalPermissions.DataLakePrincipalProperty(
                data_lake_principal_identifier=data_pipeline_role.role_arn,
            ),
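            permissions=["ASSOCIATE"],
            # Required property; an empty list means the role cannot re-grant ASSOCIATE to others
            permissions_with_grant_option=[],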
            resource=lakeformation.CfnPrincipalPermissions.ResourceProperty(
                # The resource on which permissions are being granted is the LF-tag itself
                lf_tag=lakeformation.CfnPrincipalPermissions.LFTagKeyResourceProperty(
                    catalog_id=self.account,
                    tag_key="data-classification",
                    tag_values=["confidential", "public"]
                )
            ),
        )

By defining permissions in code, you make data governance policies an automated part of your infrastructure rather than just a set of rules. For example, a pipeline role that holds ASSOCIATE can tag Glue databases and tables from the moment they are created, a crucial step for maintaining a secure and compliant environment.
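
To confirm that a grant actually reached the principal, or to diagnose the "Required ASSOCIATE on TagKey" error, one option is to query Lake Formation's ListPermissions API. The sketch below is a minimal check under a few assumptions: has_associate is a hypothetical helper, it inspects only the first page of results, and the tag values passed in must match those used in the grant.

```python
from typing import Any, List


def has_associate(client: Any, principal_arn: str, tag_key: str,
                  tag_values: List[str]) -> bool:
    """Return True if the principal holds ASSOCIATE on the given LF-tag."""
    response = client.list_permissions(
        Principal={"DataLakePrincipalIdentifier": principal_arn},
        ResourceType="LF_TAG",
        Resource={"LFTag": {"TagKey": tag_key, "TagValues": tag_values}},
    )
    # Only the first page is inspected; follow NextToken for larger result sets
    entries = response.get("PrincipalResourcePermissions", [])
    return any("ASSOCIATE" in entry.get("Permissions", []) for entry in entries)
```

As before, client is expected to be boto3.client("lakeformation"). A False result for the pipeline role points to a missing or mismatched grant rather than a problem in the tagging code.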

TIL #5: A Look into ASSOCIATE permission for your AWS Data Lake
