Integration model
Metadata extraction
For metadata extraction, a BigQuery connection is used to access the definition of structures.
The plugin extracts the following attributes. For them to appear in the template, attributes with the same names must be defined in the name field of the attribute_definition table:
- catalog: the catalog value in the database
- schema: the schema value in the database
- physicalName and name: both carry the same value, the table name
- path: the concatenation of the catalog, schema and table values
- infrastructure: the selected value
- technology: the selected value
- zone: the selected value
- tags: the view-level labels that the tables have
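As an illustration of the attribute list above, the following minimal sketch assembles the table-level attributes into a plain dictionary. The function name, the sample values and the "." path separator are assumptions made for this example; the text only states that path is a concatenation of catalog, schema and table.

```python
def build_table_attributes(catalog, schema, table, infrastructure,
                           technology, zone, labels, path_separator="."):
    """Assemble the table-level attributes as a plain dict (illustrative only)."""
    return {
        "catalog": catalog,
        "schema": schema,
        # physicalName and name carry the same value: the table name.
        "physicalName": table,
        "name": table,
        # path concatenates catalog, schema and table.
        # The separator is an assumption of this sketch.
        "path": path_separator.join([catalog, schema, table]),
        # infrastructure, technology and zone are the values selected by the user.
        "infrastructure": infrastructure,
        "technology": technology,
        "zone": zone,
        # tags are passed in here as a list of label strings (assumed shape).
        "tags": list(labels),
    }


# Hypothetical sample values.
attrs = build_table_attributes(
    "my-project", "sales", "orders",
    infrastructure="GCP", technology="BigQuery", zone="europe",
    labels=["finance"],
)
```

The keys match the attribute names that need to exist in attribute_definition, which is why they are kept exactly as listed.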
It also sends the following attributes related to the fields of the requested resource:
- name and physicalName: the field name
- defaultValue: the default value defined for the field
- fieldDataType: the data type defined for the field
- length: the field size
- incrementalField: whether the field is an incremental field
- position: the position of the field
- precision: the precision value of the field
- nullable: whether the field is nullable
- pk: whether the field is a primary key
- description: the description of the field
- tags: the column-level labels that the tables have
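The field-level attributes above could be derived from a column definition roughly as follows. This is a sketch under several assumptions: the input is a dict shaped like a schema field in the BigQuery REST API (keys name, type, mode, description, maxLength, precision, defaultValueExpression), the "tags" key is hypothetical, positions start at 1, and incrementalField is hard-coded as a placeholder because the text does not say how the plugin derives it.

```python
def build_field_attributes(field, position, primary_key_columns=()):
    """Map one column definition to the field-level attributes (illustrative only)."""
    return {
        "name": field["name"],
        "physicalName": field["name"],
        "defaultValue": field.get("defaultValueExpression"),
        "fieldDataType": field["type"],
        # The REST API serializes these numbers as strings, hence int().
        "length": int(field["maxLength"]) if "maxLength" in field else None,
        "precision": int(field["precision"]) if "precision" in field else None,
        # Placeholder: how the plugin computes this flag is not stated.
        "incrementalField": False,
        "position": position,
        # A column without an explicit mode is treated as NULLABLE.
        "nullable": field.get("mode", "NULLABLE") == "NULLABLE",
        "pk": field["name"] in primary_key_columns,
        "description": field.get("description", ""),
        # Hypothetical key holding column-level labels.
        "tags": list(field.get("tags", [])),
    }


# Hypothetical column definition.
order_id = build_field_attributes(
    {"name": "order_id", "type": "STRING", "mode": "REQUIRED",
     "maxLength": "36", "description": "Order identifier"},
    position=1,
    primary_key_columns=("order_id",),
)
```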
The attributes to be created in Anjana must have the following types:
| Attribute name | Attribute type |
| --- | --- |
| catalog | INPUT_TEXT |
| schema | INPUT_TEXT |
| physicalName | INPUT_TEXT |
| path | INPUT_TEXT |
| infrastructure | SELECT |
| technology | SELECT |
| zone | SELECT |
| tags | ARRAY_ALPHANUMERICAL |
| name | INPUT_TEXT |
| defaultValue | INPUT_TEXT |
| fieldDataType | INPUT_TEXT |
| length | INPUT_NUMBER |
| incrementalField | INPUT_CHECKBOX |
| position | INPUT_NUMBER |
| precision | INPUT_NUMBER |
| nullable | INPUT_CHECKBOX |
| pk | INPUT_CHECKBOX |
| description | ENRICHED_TEXT_AREA_INTERNATIONAL |
Data sampling
Using a BigQuery connection with the configured credential, the plugin runs a query with a record limit on the fields inventoried in Anjana Data. In the result, the values of sensitive fields are replaced by the configured string (asterisks by default).
Fields that are modified after creating the object in Anjana (i.e., that are defined in the metadata but have not been incorporated into the physical structure) will appear as unavailable in the sampling.
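The sampling behaviour described above can be sketched as a small query builder. This is not the plugin's implementation: the function name, its parameters and the choice to apply the mask inside the SQL statement are assumptions made for illustration, and field names are assumed to be simple identifiers.

```python
def build_sampling_query(table_path, inventoried_fields, physical_fields,
                         sensitive_fields=(), mask="*****", limit=10):
    """Build a record-limited sampling query and list the unavailable fields."""
    # Escape the mask so it is a valid single-quoted SQL string literal.
    escaped_mask = mask.replace("\\", "\\\\").replace("'", "\\'")
    select_items = []
    unavailable = []
    for name in inventoried_fields:
        if name not in physical_fields:
            # Defined in the metadata but absent from the physical table.
            unavailable.append(name)
        elif name in sensitive_fields:
            # Sensitive values are replaced by the configured string.
            select_items.append(f"'{escaped_mask}' AS `{name}`")
        else:
            select_items.append(f"`{name}`")
    if not select_items:
        return None, unavailable
    query = (f"SELECT {', '.join(select_items)} "
             f"FROM `{table_path}` LIMIT {int(limit)}")
    return query, unavailable


# Hypothetical table and field names.
query, unavailable = build_sampling_query(
    "my-project.sales.orders",
    inventoried_fields=["id", "email", "legacy_code"],
    physical_fields={"id", "email"},
    sensitive_fields={"email"},
    limit=5,
)
```

Here legacy_code is inventoried but missing from the physical structure, so it is returned in the unavailable list instead of being queried.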
Active governance
Access management requires the "Tot plugin GCP IAM" plugin to generate the custom roles (functions) representing the DSAs.
This plugin associates those custom roles with users and table-level access conditions, following the vendor's recommendation: https://cloud.google.com/bigquery/docs/table-access-controls#api
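A table-level IAM policy is a list of bindings, each pairing a role with its members. The sketch below shows how a user could be added to the binding of a custom role, working on a plain dict with that shape. The function name, the role ID and the user are hypothetical, and fetching or writing the policy on the table is left out.

```python
def grant_table_access(policy, role, member):
    """Return a copy of an IAM policy dict with `member` bound to `role`."""
    # Copy the bindings so the caller's policy is left untouched.
    bindings = [dict(b, members=list(b.get("members", [])))
                for b in policy.get("bindings", [])]
    for binding in bindings:
        # Reuse the unconditional binding for this role if one exists.
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


# Hypothetical custom role representing a DSA, and a hypothetical user.
role = "projects/my-project/roles/dsa_sales_reader"
policy = {"etag": "abc", "bindings": []}
updated = grant_table_access(policy, role, "user:ana@example.com")
```

The function is idempotent: granting the same member twice leaves the policy unchanged.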
Object editing
The plugin manages the activation and deactivation of non-native entities included in DSAs. When a non-native entity is activated, the corresponding permissions are granted on the tables; when it is deactivated, those permissions are removed.
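As a rough sketch of this behaviour, activation and deactivation can be modelled as adding or removing members on a role in the table's IAM policy. The function name, role ID and members are hypothetical, and the sketch ignores conditional bindings for simplicity.

```python
def set_entity_access(policy, role, members, active):
    """Grant (active=True) or remove (active=False) `members` on `role`."""
    others = [b for b in policy.get("bindings", []) if b["role"] != role]
    current = set()
    for binding in policy.get("bindings", []):
        if binding["role"] == role:
            current.update(binding.get("members", []))
    wanted = current | set(members) if active else current - set(members)
    bindings = list(others)
    # A role left without members is dropped from the policy entirely.
    if wanted:
        bindings.append({"role": role, "members": sorted(wanted)})
    return {**policy, "bindings": bindings}


# Hypothetical role and members.
entity_role = "projects/my-project/roles/dsa_sales_reader"
active_policy = set_entity_access(
    {"bindings": []}, entity_role, ["user:ana@example.com"], active=True)
inactive_policy = set_entity_access(
    active_policy, entity_role, ["user:ana@example.com"], active=False)
```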
Required credentials
The required credentials must be configured in the YAML file, in the "credentialsContent" section of each configured instance.
Service account creation
In GCP, it is necessary to create a separate IAM service account for each plugin and then assign the permissions that plugin needs to perform its specific tasks.
To scope permissions appropriately, create custom roles that group the required permissions, and then associate those roles with the service accounts.
Metadata extraction
The permissions used are the following:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.list
Data sampling
The permissions used are the following:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- bigquery.jobs.create
Active governance
Access management requires the "Tot plugin GCP IAM" plugin to generate the custom roles (functions) representing the DSAs. The permissions this plugin needs to carry out active governance are the following:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.getIamPolicy
- bigquery.tables.setIamPolicy
In summary, the permissions used for the custom role are the following:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- bigquery.jobs.create
- bigquery.tables.getIamPolicy
- bigquery.tables.setIamPolicy
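The summary list is simply the union of the per-feature permission lists given earlier in this section. A short sketch makes that explicit; the variable names are chosen for this example only.

```python
# Permissions per feature, as listed in the sections above.
METADATA_EXTRACTION = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
}
DATA_SAMPLING = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.jobs.create",
}
ACTIVE_GOVERNANCE = {
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.getIamPolicy",
    "bigquery.tables.setIamPolicy",
}

# A single custom role covering all three features needs the union.
CUSTOM_ROLE_PERMISSIONS = sorted(
    METADATA_EXTRACTION | DATA_SAMPLING | ACTIVE_GOVERNANCE
)

for permission in CUSTOM_ROLE_PERMISSIONS:
    print(permission)
```

Keeping the sets separate also makes it easy to build a narrower role when only one feature is enabled on an instance.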
To give the BigQuery service account these permissions, assign it the custom role that contains them.
Note that Anjana Data only grants access to the BigQuery assets governed in Anjana Data Platform. To run queries on those assets, named users must already hold the following permissions, which Anjana Data does not manage:
- bigquery.jobs.create
- bigquery.datasets.get
- bigquery.jobs.list
- bigquery.models.list
- bigquery.tables.list
- resourcemanager.projects.get
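Because these prerequisites are managed outside Anjana Data, it can help to check them before onboarding a user. The following sketch compares a set of granted permissions against the list above; the function name is hypothetical, and how the granted permissions are obtained is left to the reader.

```python
# Prerequisite permissions listed above, not managed by Anjana Data.
QUERY_PREREQUISITES = frozenset({
    "bigquery.jobs.create",
    "bigquery.datasets.get",
    "bigquery.jobs.list",
    "bigquery.models.list",
    "bigquery.tables.list",
    "resourcemanager.projects.get",
})


def missing_prerequisites(granted_permissions):
    """Return the prerequisite permissions a user still lacks, sorted by name."""
    return sorted(QUERY_PREREQUISITES - set(granted_permissions))


# Hypothetical user who holds only two of the six permissions.
missing = missing_prerequisites({"bigquery.jobs.create", "bigquery.datasets.get"})
```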
Object editing
The permissions this plugin needs to carry out the activation or deactivation of a non-native entity are the following:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.getIamPolicy
- bigquery.tables.setIamPolicy
To be able to run queries on the resources, regular users must previously have the following permissions:
- bigquery.jobs.create
- bigquery.datasets.get
- bigquery.jobs.list
- bigquery.models.list
- bigquery.tables.list
- resourcemanager.projects.get
BigQuery limitations
BigQuery allows a maximum of 1500 user bindings on a table. As a result, the DSAs that contain a given table can have at most 1500 users in total, counting both owners and adherents.
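To see whether a table is approaching this limit, the distinct users of every DSA that contains it can be counted. The sketch below assumes a hypothetical in-memory representation of DSAs (dicts with tables, owners and adherents keys); it does not reflect how Anjana stores them.

```python
MAX_USERS_PER_TABLE = 1500  # limit stated above


def users_per_table(dsas):
    """Collect the distinct users (owners and adherents) per table."""
    users = {}
    for dsa in dsas:
        members = set(dsa["owners"]) | set(dsa["adherents"])
        for table in dsa["tables"]:
            users.setdefault(table, set()).update(members)
    return users


def tables_over_limit(dsas, limit=MAX_USERS_PER_TABLE):
    """Return the tables whose distinct user count exceeds the limit."""
    return sorted(table for table, members in users_per_table(dsas).items()
                  if len(members) > limit)


# Hypothetical DSAs sharing one table.
dsas = [
    {"tables": ["sales.orders"], "owners": ["ana"], "adherents": ["bob", "eva"]},
    {"tables": ["sales.orders", "sales.items"], "owners": ["ana"], "adherents": ["leo"]},
]
counts = {table: len(members) for table, members in users_per_table(dsas).items()}
```

A user who appears in several DSAs for the same table is counted once, which is the assumption this sketch makes about how the limit applies.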