GCP BigQuery | Anjana Data Documentation

Integration model

Metadata extraction

For metadata extraction, a BigQuery connection is used to access the definition of structures.

The plugin extracts the following attributes. Each one must be defined in the attribute_definition table with exactly the same name (in the name field) so that it appears in the template:

catalog with the catalog value in the database
schema with the schema value in the database
physicalName and name with the same value, the table name
path with the concatenation of catalog, schema and table values
infrastructure with the selected value
technology with the selected value
zone with the selected value
tags with the view-level labels defined on the table
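
As an illustration only, the table-level attributes above could be assembled as in the following sketch. The function name, the sample values and the "." separator used for path are assumptions made for this example; the plugin's actual separator and label format are not stated here.

```python
def build_table_attributes(catalog, schema, table, infrastructure,
                           technology, zone, labels, sep="."):
    """Assemble the table-level attributes into a single dict.

    `sep` is an assumed separator for the path concatenation.
    """
    return {
        "catalog": catalog,
        "schema": schema,
        # physicalName and name carry the same value: the table name
        "physicalName": table,
        "name": table,
        # path is the concatenation of catalog, schema and table
        "path": sep.join([catalog, schema, table]),
        "infrastructure": infrastructure,
        "technology": technology,
        "zone": zone,
        "tags": list(labels),
    }


# Hypothetical sample values
attrs = build_table_attributes(
    "my-project", "sales", "orders",
    infrastructure="GCP", technology="BigQuery", zone="EU",
    labels=["finance"],
)
print(attrs["path"])
```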

It also sends the following attributes related to the fields of the requested resource:

name and physicalName with the field value
defaultValue with the default value defined for the field
fieldDataType with the data type defined for the field
length with the field size
incrementalField indicating whether it is an incremental field
position with the position of the field
precision with the precision value of the field
nullable indicating whether the field is nullable
pk indicating whether the field is a primary key
description with the description defined for the field
tags with the column-level labels defined on the table's fields

The attributes to be created in Anjana must have the following types:

Attribute name	Attribute type
catalog	INPUT_TEXT
schema	INPUT_TEXT
physicalName	INPUT_TEXT
path	INPUT_TEXT
infrastructure	SELECT
technology	SELECT
zone	SELECT
tags	ARRAY_ALPHANUMERICAL
name	INPUT_TEXT
defaultValue	INPUT_TEXT
fieldDataType	INPUT_TEXT
length	INPUT_NUMBER
incrementalField	INPUT_CHECKBOX
position	INPUT_NUMBER
precision	INPUT_NUMBER
nullable	INPUT_CHECKBOX
pk	INPUT_CHECKBOX
description	ENRICHED_TEXT_AREA_INTERNATIONAL
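
The table above can be turned into a simple pre-flight check. The sketch below assumes that the attribute definitions configured in Anjana have already been read into a plain name-to-type dict (how they are read is outside this example), and the function name is hypothetical.

```python
# Expected attribute types, copied from the table above
REQUIRED_ATTRIBUTE_TYPES = {
    "catalog": "INPUT_TEXT",
    "schema": "INPUT_TEXT",
    "physicalName": "INPUT_TEXT",
    "path": "INPUT_TEXT",
    "infrastructure": "SELECT",
    "technology": "SELECT",
    "zone": "SELECT",
    "tags": "ARRAY_ALPHANUMERICAL",
    "name": "INPUT_TEXT",
    "defaultValue": "INPUT_TEXT",
    "fieldDataType": "INPUT_TEXT",
    "length": "INPUT_NUMBER",
    "incrementalField": "INPUT_CHECKBOX",
    "position": "INPUT_NUMBER",
    "precision": "INPUT_NUMBER",
    "nullable": "INPUT_CHECKBOX",
    "pk": "INPUT_CHECKBOX",
    "description": "ENRICHED_TEXT_AREA_INTERNATIONAL",
}


def find_definition_problems(defined):
    """Compare configured definitions (name -> type) with the expected ones."""
    problems = []
    for name, expected in REQUIRED_ATTRIBUTE_TYPES.items():
        actual = defined.get(name)
        if actual is None:
            problems.append(f"{name}: missing")
        elif actual != expected:
            problems.append(f"{name}: expected {expected}, found {actual}")
    return problems


# Hypothetical configuration with one wrong type and one missing attribute
configured = dict(REQUIRED_ATTRIBUTE_TYPES, length="INPUT_TEXT")
del configured["pk"]
print(find_definition_problems(configured))
```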

Data sampling

Using a BigQuery connection with the configured credential, the plugin runs a query with a record limit over the fields inventoried in Anjana Data. In the result, the values of sensitive fields are replaced by the configured string (asterisks by default).

Fields that are modified in Anjana after the object is created (i.e., that are defined in the metadata but have not been incorporated into the physical structure) will appear as unavailable in the sampling.
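
The sampling behaviour described above can be sketched in a few lines. This is not the plugin's code: the function names, the five-asterisk mask, the "unavailable" marker and the choice to query only the physically present fields are assumptions made for illustration.

```python
def build_sample_query(table_path, inventoried, physical_fields, limit=10):
    """Build a limited SELECT over the inventoried fields that exist physically."""
    queryable = [f for f in inventoried if f in physical_fields]
    columns = ", ".join(f"`{f}`" for f in queryable)
    return f"SELECT {columns} FROM `{table_path}` LIMIT {int(limit)}"


def mask_rows(rows, inventoried, sensitive, mask="*****",
              unavailable="unavailable"):
    """Replace sensitive values and flag fields absent from the result."""
    masked = []
    for row in rows:
        out = {}
        for field in inventoried:
            if field not in row:
                # Defined in the metadata but not in the physical structure
                out[field] = unavailable
            elif field in sensitive:
                out[field] = mask
            else:
                out[field] = row[field]
        masked.append(out)
    return masked


# Hypothetical example: "phone" is inventoried but not yet in the table
inventoried = ["id", "email", "phone"]
query = build_sample_query("my-project.sales.orders", inventoried,
                           physical_fields={"id", "email"}, limit=5)
sample = mask_rows([{"id": 1, "email": "a@example.com"}],
                   inventoried, sensitive={"email"})
print(query)
print(sample)
```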

Active governance

Access management requires the "Tot plugin GCP IAM" plugin to generate the custom roles (functions) representing the DSAs.

This plugin associates those custom roles with users and table-level access conditions, following Google's recommendation: https://cloud.google.com/bigquery/docs/table-access-controls#api
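
To make the idea of a table-level binding concrete, the sketch below adds a member to a custom role inside an IAM policy represented as a plain dict with the standard bindings/role/members shape. The role ID, the user and the function name are hypothetical, and fetching or writing the policy on the real table (which is where the getIamPolicy and setIamPolicy calls come in) is deliberately left out.

```python
import copy


def grant_role(policy, role, member):
    """Return a copy of `policy` in which `member` is bound to `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Role already bound on the table: add the member only once
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical custom role representing a DSA, and a hypothetical user
ROLE = "projects/my-project/roles/anjana_dsa_example"
table_policy = {"bindings": []}
granted = grant_role(table_policy, ROLE, "user:alice@example.com")
print(granted)
```

The function returns a new policy instead of mutating its input, so the original can still be compared against the result before anything is written back.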

Object editing

The plugin manages the activation and deactivation of non-native entities included in DSAs. When a non-native entity is activated, the corresponding permissions are granted on the tables; when it is deactivated, those permissions are removed.
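
A minimal sketch of this grant-on-activate, revoke-on-deactivate behaviour follows. It assumes the table policy is a plain dict of bindings without conditions, and it rebuilds only the bindings list (other policy fields such as the etag are ignored here); the names are hypothetical.

```python
def set_entity_access(policy, role, members, active):
    """Grant `role` to `members` when active, remove it when inactive."""
    by_role = {b["role"]: set(b["members"]) for b in policy.get("bindings", [])}
    current = by_role.setdefault(role, set())
    if active:
        current.update(members)
    else:
        current.difference_update(members)
    # Bindings left without members are dropped from the result
    return {
        "bindings": [
            {"role": r, "members": sorted(m)} for r, m in by_role.items() if m
        ]
    }


# Hypothetical role and users
DSA_ROLE = "projects/my-project/roles/anjana_dsa_example"
users = ["user:bob@example.com", "user:alice@example.com"]
activated = set_entity_access({"bindings": []}, DSA_ROLE, users, active=True)
deactivated = set_entity_access(activated, DSA_ROLE, users, active=False)
print(activated)
print(deactivated)
```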

Required credentials

The required credentials must be configured in the YAML file, in the "credentialsContent" section of each configured instance.

Service account creation

For GCP it is necessary to create a service account in IAM for each plugin individually and, after that, assign the necessary permissions for the execution of the specific tasks of each plugin.

To scope permissions appropriately, create custom roles that contain only the required permissions and then associate those roles with the service accounts.

Metadata extraction

The permissions used are the following:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.list

Data sampling

The permissions used are the following:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.jobs.create

Active governance

Access management requires the "Tot plugin GCP IAM" plugin to generate the custom roles (functions) representing the DSAs. The permissions this plugin needs to carry out active governance are the following:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.getIamPolicy
bigquery.tables.setIamPolicy

In summary, the permissions used for the custom role are the following:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.jobs.create
bigquery.tables.getIamPolicy
bigquery.tables.setIamPolicy
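
The summary list is the union of the three per-task lists above, which can be checked with a short sketch. The role title and description are placeholders; the dict uses the field names of an IAM custom role definition (title, description, includedPermissions, stage), but nothing is sent to GCP here.

```python
METADATA_EXTRACTION = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
}
DATA_SAMPLING = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.jobs.create",
}
ACTIVE_GOVERNANCE = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.getIamPolicy",
    "bigquery.tables.setIamPolicy",
}

# Placeholder title and description for a single combined custom role
custom_role = {
    "title": "Anjana BigQuery plugin (example)",
    "description": "Permissions for extraction, sampling and governance",
    "includedPermissions": sorted(
        METADATA_EXTRACTION | DATA_SAMPLING | ACTIVE_GOVERNANCE
    ),
    "stage": "GA",
}
print(custom_role["includedPermissions"])
```

If each task runs under its own service account, the same three sets can instead be used to build one smaller role per account.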

To grant these permissions to the BigQuery service account, assign the custom role that contains them to that service account.

Note that Anjana Data only grants access to the BigQuery assets governed in Anjana Data Platform. To run queries on them, named users must already hold the following permissions, which Anjana Data does not manage:

bigquery.jobs.create
bigquery.datasets.get
bigquery.jobs.list
bigquery.models.list
bigquery.tables.list
resourcemanager.projects.get
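
Because these permissions are managed outside Anjana Data, it can help to check them before users are added to a DSA. The sketch below assumes the set of permissions a user already holds has been obtained by some other means; the function name and sample data are hypothetical.

```python
# Prerequisite permissions, copied from the list above
QUERY_PREREQUISITES = {
    "bigquery.jobs.create",
    "bigquery.datasets.get",
    "bigquery.jobs.list",
    "bigquery.models.list",
    "bigquery.tables.list",
    "resourcemanager.projects.get",
}


def missing_prerequisites(held_permissions):
    """Return the prerequisite permissions the user does not hold, sorted."""
    return sorted(QUERY_PREREQUISITES - set(held_permissions))


# Hypothetical user who holds only two of the six permissions
missing = missing_prerequisites(
    ["bigquery.jobs.create", "bigquery.datasets.get"]
)
print(missing)
```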

Object editing

The permissions this plugin needs to carry out the activation or deactivation of a non-native entity are the following:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.getIamPolicy
bigquery.tables.setIamPolicy

To be able to run queries on the resources, regular users must previously have the following permissions:

bigquery.jobs.create
bigquery.datasets.get
bigquery.jobs.list
bigquery.models.list
bigquery.tables.list
resourcemanager.projects.get

BigQuery limitations

BigQuery allows a maximum of 1500 user bindings on a table. As a result, the DSAs that contain a particular table can have at most 1500 users in total, counting both owners and adherents.
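
A rough way to see how close a table is to this limit is to count the users across all DSAs that contain it. The sketch below assumes that a user who appears in several DSAs counts once, which is an interpretation made for this example rather than something stated above; the function name and data are hypothetical.

```python
MAX_USERS_PER_TABLE = 1500  # limit stated above


def remaining_capacity(dsa_members):
    """dsa_members maps a DSA name to its users (owners and adherents)."""
    distinct_users = set()
    for users in dsa_members.values():
        distinct_users.update(users)
    return MAX_USERS_PER_TABLE - len(distinct_users)


# Hypothetical DSAs that both contain the same table
capacity = remaining_capacity({
    "dsa_a": ["user:u1@example.com", "user:u2@example.com"],
    "dsa_b": ["user:u2@example.com", "user:u3@example.com"],
})
print(capacity)
```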