GCP BigQuery

Integration model

Metadata extraction


For metadata extraction, a BigQuery connection is used to access the definition of structures.

The plugin extracts the following attributes. For them to appear in the template, each one must exist with the same name in the name field of the attribute_definition table:

  • catalog with the catalog value in the database

  • schema with the schema value in the database

  • physicalName and name with the same value, the table name

  • path with the concatenation of catalog, schema and table values

  • infrastructure with the selected value

  • technology with the selected value

  • zone with the selected value

  • tags with the labels assigned at table or view level


It also sends the following attributes related to the fields of the requested resource:

  • name and physicalName with the field value

  • defaultValue with the default value defined for the field

  • fieldDataType with the data type defined for the field

  • length with the field size

  • incrementalField indicating whether it is an incremental field

  • position with the position of the field

  • precision with the precision value of the field

  • nullable indicating whether the field is nullable

  • pk indicating whether the field is part of the primary key

  • description with the description defined for the field

  • tags with the column-level labels of the table


The attributes to be created in Anjana must have the following types:

Attribute name      Attribute type
------------------  --------------------------------
catalog             INPUT_TEXT
schema              INPUT_TEXT
physicalName        INPUT_TEXT
path                INPUT_TEXT
infrastructure      SELECT
technology          SELECT
zone                SELECT
tags                ARRAY_ALPHANUMERICAL
name                INPUT_TEXT
defaultValue        INPUT_TEXT
fieldDataType       INPUT_TEXT
length              INPUT_NUMBER
incrementalField    INPUT_CHECKBOX
position            INPUT_NUMBER
precision           INPUT_NUMBER
nullable            INPUT_CHECKBOX
pk                  INPUT_CHECKBOX
description         ENRICHED_TEXT_AREA_INTERNATIONAL
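
As an illustration of the table-level attributes and types described above, the following minimal sketch assembles them as a plain Python dictionary and checks that each one has a type defined. The function name, the dot used as the path separator and the sample values are assumptions made for this example; they are not taken from the plugin's implementation.

```python
# Types required in Anjana for the table-level attributes (from the table above).
TABLE_ATTRIBUTE_TYPES = {
    "catalog": "INPUT_TEXT",
    "schema": "INPUT_TEXT",
    "physicalName": "INPUT_TEXT",
    "name": "INPUT_TEXT",
    "path": "INPUT_TEXT",
    "infrastructure": "SELECT",
    "technology": "SELECT",
    "zone": "SELECT",
    "tags": "ARRAY_ALPHANUMERICAL",
}


def table_attributes(catalog, schema, table, infrastructure, technology, zone, labels):
    """Build the table-level attributes (hypothetical helper, not the plugin's code)."""
    return {
        "catalog": catalog,
        "schema": schema,
        # physicalName and name carry the same value: the table name.
        "physicalName": table,
        "name": table,
        # Assumption: catalog, schema and table are joined with a dot.
        "path": ".".join([catalog, schema, table]),
        "infrastructure": infrastructure,
        "technology": technology,
        "zone": zone,
        "tags": list(labels),
    }


# Sample values are invented for the example.
attrs = table_attributes(
    "my-project", "sales", "orders", "GCP", "BigQuery", "EU", ["pii", "finance"]
)

# Attributes that would not appear in the template because no definition exists.
undefined = sorted(set(attrs) - set(TABLE_ATTRIBUTE_TYPES))

print(attrs["path"])
print(undefined)
```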


Data sampling

Using a BigQuery connection with the configured credential, the plugin runs a query with a record limit over the fields inventoried in Anjana Data. In the result, the values of sensitive fields are replaced with the configured string (asterisks by default).

Fields modified after the object was created in Anjana (that is, fields defined in the metadata but not yet incorporated into the physical structure) appear as unavailable in the sampling.
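
The sampling behaviour described above can be sketched roughly as follows. This is only an offline illustration: the function names, the backtick quoting, the default limit and the five-asterisk mask are assumptions, and no real BigQuery call is made.

```python
def split_fields(inventoried, physical):
    """Separate inventoried fields into those present in the table and those missing."""
    available = [f for f in inventoried if f in physical]
    unavailable = [f for f in inventoried if f not in physical]
    return available, unavailable


def build_sampling_query(table_path, fields, limit=10):
    """Return a SELECT over the given fields with a record limit."""
    columns = ", ".join(f"`{name}`" for name in fields)
    return f"SELECT {columns} FROM `{table_path}` LIMIT {int(limit)}"


def mask_sensitive(rows, sensitive_fields, mask="*****"):
    """Replace the values of sensitive fields with the configured string."""
    return [
        {key: (mask if key in sensitive_fields else value) for key, value in row.items()}
        for row in rows
    ]


# "phone" is defined in the metadata but missing from the physical table.
available, unavailable = split_fields(["id", "email", "phone"], {"id", "email"})
query = build_sampling_query("my-project.sales.orders", available, limit=5)

# Rows as they might come back from the query (invented data).
masked = mask_sensitive([{"id": 1, "email": "a@example.com"}], {"email"})

print(query)
print(unavailable)
print(masked)
```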

Active governance

Access management requires the "Tot plugin GCP IAM" plugin to generate the custom roles (functions) representing the DSAs.

This plugin associates those custom roles with users and with table-level access conditions, following the vendor's recommendation: https://cloud.google.com/bigquery/docs/table-access-controls#api
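
To make the association between a custom role and a user more concrete, here is a minimal sketch that adds a member to a table-level IAM policy represented as a plain dictionary. The policy shape (a list of bindings, each with a role and its members) follows the standard IAM policy format; the role name, the user and the helper function are hypothetical. With the google-cloud-bigquery client the policy would typically be read and written through `Client.get_iam_policy` and `Client.set_iam_policy`, but the sketch deliberately stays offline.

```python
import copy


def grant_role(policy, role, member):
    """Return a copy of the policy in which `member` holds `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical custom role representing a DSA.
DSA_ROLE = "projects/my-project/roles/dsa_sales_read"

policy = {"version": 1, "bindings": []}
updated = grant_role(policy, DSA_ROLE, "user:alice@example.com")
updated = grant_role(updated, DSA_ROLE, "user:bob@example.com")

print(updated["bindings"])
```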

Object editing

The plugin manages the activation and deactivation of non-native entities included in DSAs: when a non-native entity is activated, the corresponding permissions are granted on the tables; when it is deactivated, they are removed.
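
One possible way to picture this behaviour is a single function that grants or removes the members of a role on a table policy depending on the entity's state. The sketch below works on a plain dictionary in the standard IAM policy shape; the function name, role and user are assumptions for illustration only and do not describe the plugin's internals.

```python
def set_entity_active(policy, role, members, active):
    """Grant (active=True) or remove (active=False) `members` for `role`."""
    # Copy the bindings so the input policy is left untouched.
    bindings = [dict(b, members=list(b["members"])) for b in policy.get("bindings", [])]
    target = next((b for b in bindings if b["role"] == role), None)
    if target is None:
        target = {"role": role, "members": []}
        bindings.append(target)
    for member in members:
        if active and member not in target["members"]:
            target["members"].append(member)
        elif not active and member in target["members"]:
            target["members"].remove(member)
    # Drop bindings that ended up without members.
    return dict(policy, bindings=[b for b in bindings if b["members"]])


# Hypothetical custom role and user.
ROLE = "projects/my-project/roles/dsa_sales_read"
USERS = ["user:alice@example.com"]

active_policy = {"bindings": [{"role": ROLE, "members": list(USERS)}]}
deactivated = set_entity_active(active_policy, ROLE, USERS, active=False)
reactivated = set_entity_active(deactivated, ROLE, USERS, active=True)

print(deactivated)
print(reactivated)
```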

Required credentials

The required credentials must be configured in the YAML file, in the "credentialsContent" section of each configured instance.

Service account creation

In GCP, create a separate IAM service account for each plugin and then assign it the permissions needed for that plugin's specific tasks.


att_1_for_171999410.png


To scope the permissions precisely, create custom roles that group the required permissions and then associate those roles with the service accounts.


att_4_for_171999410.png

Metadata extraction

The permissions used are the following:

  • bigquery.datasets.get

  • bigquery.tables.get

  • bigquery.tables.list


Data sampling

The permissions used are the following:

  • bigquery.datasets.get

  • bigquery.tables.get

  • bigquery.tables.getData

  • bigquery.tables.list

  • bigquery.jobs.create


Active governance

Access management requires the "Tot plugin GCP IAM" plugin to generate the custom roles (functions) representing the DSAs. The permissions this plugin needs to carry out active governance are the following:

  • bigquery.datasets.get

  • bigquery.tables.get

  • bigquery.tables.getIamPolicy

  • bigquery.tables.setIamPolicy


In summary, the permissions used for the custom role are the following:

  • bigquery.datasets.get

  • bigquery.tables.get

  • bigquery.tables.getData

  • bigquery.tables.list

  • bigquery.jobs.create

  • bigquery.tables.getIamPolicy

  • bigquery.tables.setIamPolicy
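
The summary list is simply the union of the per-task permission lists on this page. A short sketch of that arithmetic, using only the permission names given in this document (the variable names are made up for the example):

```python
METADATA_EXTRACTION = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
}
DATA_SAMPLING = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.jobs.create",
}
ACTIVE_GOVERNANCE = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.getIamPolicy",
    "bigquery.tables.setIamPolicy",
}
# Object editing uses the same permissions as active governance.
OBJECT_EDITING = set(ACTIVE_GOVERNANCE)

custom_role_permissions = sorted(
    METADATA_EXTRACTION | DATA_SAMPLING | ACTIVE_GOVERNANCE | OBJECT_EDITING
)

for permission in custom_role_permissions:
    print(permission)
```

A service account dedicated to a single task only needs that task's subset; the full union applies when one custom role is meant to cover every capability.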


att_2_for_171999410.png


To grant these permissions to the BigQuery service account, assign the role that contains them to that account:

att_5_for_171999410.png

Note that Anjana Data only grants access to the BigQuery assets governed in Anjana Data Platform. To run queries on those assets, named users must already hold the following permissions, which Anjana Data does not manage:

  • bigquery.jobs.create

  • bigquery.datasets.get

  • bigquery.jobs.list

  • bigquery.models.list

  • bigquery.tables.list

  • resourcemanager.projects.get

Object editing

The permissions this plugin needs to carry out the activation or deactivation of a non-native entity are the following:

  • bigquery.datasets.get

  • bigquery.tables.get

  • bigquery.tables.getIamPolicy

  • bigquery.tables.setIamPolicy


To be able to run queries on the resources, regular users must previously have the following permissions:

  • bigquery.jobs.create

  • bigquery.datasets.get

  • bigquery.jobs.list

  • bigquery.models.list

  • bigquery.tables.list

  • resourcemanager.projects.get

BigQuery limitations

A table accepts at most 1500 user bindings. As a result, the DSAs that contain a given table can have at most 1500 users in total, counting both owners and adherents.
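
A rough way to check how close a table is to this limit is to count the users across all the DSAs that contain it. The sketch below uses an invented in-memory representation of DSAs and assumes that a user who appears in several DSAs counts only once; neither the data structure nor that counting rule comes from Anjana itself.

```python
MAX_USERS_PER_TABLE = 1500  # limit stated above


def users_with_access(dsas, table):
    """Collect owners and adherents of every DSA that contains `table`."""
    users = set()
    for dsa in dsas:
        if table in dsa["tables"]:
            users |= set(dsa["owners"]) | set(dsa["adherents"])
    return users


def remaining_capacity(dsas, table):
    """Number of additional users the table could still accept."""
    return MAX_USERS_PER_TABLE - len(users_with_access(dsas, table))


# Invented sample data.
dsas = [
    {"tables": ["orders"], "owners": ["alice"], "adherents": ["bob", "carol"]},
    {"tables": ["orders", "customers"], "owners": ["bob"], "adherents": ["dave"]},
    {"tables": ["customers"], "owners": ["erin"], "adherents": []},
]

print(len(users_with_access(dsas, "orders")))
print(remaining_capacity(dsas, "orders"))
```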