Basic Concepts | Anjana Data Documentation

This section describes the basic concepts that users work with in Anjana Data. A Glossary with the main definitions is also available at the end of this manual.

Data Domain, Organizational Unit or Business Unit

It is the mechanism by which data custody is established within the Business Glossary or Data Catalog, allowing the reality of the organization to be represented in terms of functional data domains or semantics.

It is important not to confuse an organization's hierarchy or org chart with its data domains. The hierarchical structure of an organization changes constantly, whereas data domains should remain fairly static.

Data assets are classified into the different Organizational Units (or data domains) that have been identified in the organization. In this way, users who hold a role in an organizational unit are responsible for the data assets that belong to their unit/domain.

https://lh7-rt.googleusercontent.com/docsz/AD_4nXceD-fUODiNMdyqqhJ7Ur7Sj80RSE8vNk5xAneC46tSgTmApaz5mP3LB2Hrp3HaXDlf3vPv7t57UzmsZCsa4xQqpIX_Mm3sxC5wQZ2kWrUuXQLet07TE834sV8YfAOqQm_4KNMh1TvEu7ZQaAf_8Dksq_MX?key=eE4OxRa9KEXEmq0Gh5OpzA

Anjana Metamodel

The set of entities and relationships that can be governed from the Anjana Data Platform is known as the Metamodel. This metamodel is fully configurable, so the organization must define a governance strategy that specifies, among other things, which entities it wants to govern and which types of relationships it wants to establish between the different entity types to offer an end-to-end view of the data.

Entity

An entity is the representation of any data element within the organization's semantic map. Depending on their nature, entities are classified as Business Glossary entities (more closely related to the business) or Data Catalog entities (more closely related to IT).

In Anjana Data it is possible to create as many entity types as desired.

Examples of entities include business terms, metrics, dimensions, reports, business processes and data quality rules.

https://lh7-rt.googleusercontent.com/docsz/AD_4nXdqxqFb14eTz91okLEDRew6_AWqjg0rJhzDyRCBUwe6LRkA32E2a2e18dMqb5hvYI9XXMM6Db6z6lSvhGDQ1yks6kMsbTddHjFh0w6yxLZioy9minUlUf3cI5DBIxuZLMLD6895UjYvRi3xSbAd1za_YS-N?key=eE4OxRa9KEXEmq0Gh5OpzA

Additionally, it is also possible to define entities that represent physical assets such as datasets, processes or DB schemas.

https://lh7-rt.googleusercontent.com/docsz/AD_4nXehx8D220ZpbarW68iSdWC2s8seGUCzrGLIv-Xyl0bF9l9GSrATWrQqbOcL8Ik47K0Lby-CNK2W6Wfc24WJLr5v410bIL9DpigpSXAkpZPBUiPHpt99ORHjBljHY6fjs5rlNDwbkbRVTSmAl3hXlGDU-5w?key=eE4OxRa9KEXEmq0Gh5OpzA

Some of these entities are native to Anjana, and the application applies specific logic to them, as described later in this section:

Dataset and dataset_fields
DSA
Process and process instance
Solution

Relationship

It is the mechanism provided by Anjana Data to establish a link between entities that are somehow connected, whether by association, involvement, membership, etc. For example, relationships between business terms that allow calculating a metric, or the data quality rules that apply to a specific report.

Relationships make it possible to establish the end-to-end data lifecycle, that is, to govern technical lineage (how data flows through the different IT systems and what transformations it undergoes), how data is used for analytical purposes, the semantics associated with the data, where quality checkpoints are applied to verify data conformity, and even which users are consuming certain data.

Relationships allow linking entities within the Business Glossary to each other, but also relating Business Glossary entities to Data Catalog entities to locate, for example, where a particular business term is stored.

Just as with entities, in Anjana it is possible to configure as many relationship types as needed, which will coexist with the set of native (or internal) relationships of the application. These native relationships associate native entities with each other and are not created through the object creation wizard but through functionalities specific to these entities. These relationships are:

STRUCTURE: Relationship between a dataset and its dataset_fields
DSA_CONTENT: Relationship between a DSA and the entities it contains
INSTANCE_PROCESS: Relationship between a process and its instances
INSTANCE_DATASET_IN: Relationship between an instance and its input datasets
INSTANCE_DATASET_OUT: Relationship between an instance and its output datasets
SOLUTION_RELATED_INSTANCE: Relationship between a solution and its related instances
SOLUTION_OWNED_INSTANCE: Relationship between a solution and its own instances
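
As an illustration only, the native relationships listed above can be written down as a lookup table from relationship name to the pair of entity kinds it connects. The relationship names come from the list; the lowercase entity labels and the helper function are assumptions made for this sketch and are not part of the Anjana API.

```python
# Native relationship name -> (source entity kind, target entity kind).
# The entity kind labels are illustrative, not Anjana identifiers.
NATIVE_RELATIONSHIPS = {
    "STRUCTURE": ("dataset", "dataset_field"),
    "DSA_CONTENT": ("dsa", "entity"),
    "INSTANCE_PROCESS": ("process", "process_instance"),
    "INSTANCE_DATASET_IN": ("process_instance", "dataset"),
    "INSTANCE_DATASET_OUT": ("process_instance", "dataset"),
    "SOLUTION_RELATED_INSTANCE": ("solution", "process_instance"),
    "SOLUTION_OWNED_INSTANCE": ("solution", "process_instance"),
}


def relationships_from(source_kind):
    """Return the native relationship names that start at the given entity kind."""
    return sorted(
        name
        for name, (source, _target) in NATIVE_RELATIONSHIPS.items()
        if source == source_kind
    )


print(relationships_from("solution"))
```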

Data Catalog Metamodel Objects

Despite the flexibility of the Anjana Data metamodel, which makes it possible to create as many entity types as desired, the Data Catalog metamodel is based on the following main objects: datasets (and their dataset_fields), DSAs, processes (and their instances) and solutions, along with their relationships, metadata and lineage.

Dataset

A dataset is any physical asset that contains or represents data, whether structured or unstructured, persisted or non-persisted. It can be a file, a table or a document of any type and format.

It is the main object to govern within the Data Catalog. What differentiates the Catalog from any Data Dictionary is that it can be enriched with technical and functional metadata.

Dataset_field

If the dataset is structured, it is composed of a set of dataset_fields. Each dataset_field is a field or column of the dataset and carries its own metadata.

DSA (Data Sharing Agreements)

A DSA is a logical asset that can include one or more entities to facilitate the grouping of physical data assets at a level closer to information consumption.

It is the mechanism that facilitates access to governed data: providers and consumers share data through the signing of a contract in which the user commits to complying with the conditions of data use.

It is enriched with business metadata.

Process

A process corresponds to a group of functions defined to extract data from sources, move, transport, transform and exploit it, and/or generate new data (ETLs, transformation scripts, quality control executions, report generation…).

Process Instance

A process can have one or more instances, one for each execution scenario that results from its possible configurations, so that the same software modules can be executed in different situations or on different platforms.

An instance is a specific execution of a process with a parameterization and a set of input and output datasets. It allows the definition of framework pieces without needing to register a process for each one of them. Once defined, each execution is auditable.
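
A minimal sketch of how an instance, with its parameterization and its input and output datasets, yields dataset-to-dataset lineage. The class, field names and sample dataset names are hypothetical and only illustrate the concept; they do not reflect how Anjana stores instances.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessInstance:
    """One execution scenario of a process (all names are illustrative)."""
    process: str
    name: str
    parameters: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def lineage_edges(instances):
    """Derive (input dataset, instance name, output dataset) triples."""
    return [
        (source, instance.name, target)
        for instance in instances
        for source in instance.inputs
        for target in instance.outputs
    ]


# Two instances of the same generic process, differing only in parameters.
instances = [
    ProcessInstance("load_orders", "load_orders_eu", {"region": "EU"},
                    inputs=["raw.orders_eu"], outputs=["dw.orders"]),
    ProcessInstance("load_orders", "load_orders_us", {"region": "US"},
                    inputs=["raw.orders_us"], outputs=["dw.orders"]),
]
print(lineage_edges(instances))
```

Registering one generic process with several instances, rather than one process per scenario, is the point the text makes about framework pieces.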

Solution

The solution represents the “data contract” that authorizes the movement of data through processes. In this way, it is guaranteed that authorizations exist for the movement of data between systems or applications. Therefore, it is a logical asset that encompasses several process instances along with their related datasets to have an end-to-end view of the executions.

A solution has a person responsible for its administration and maintenance who guarantees its execution.

The solution metadata can be enriched with business metadata.

The difference between solutions and DSAs is that solutions enable consumption by IT processes and applications while DSAs govern consumption by consumer users.

Object Lifecycle

The native objects of Anjana Data have their own lifecycle, so that governed objects go through different states that allow everything from traditional passive governance to active governance with impact management.

The lifecycle through which objects pass in Anjana is presented below.

Lifecycle of Anjana Native Entities

Anjana's native entities have a different lifecycle from non-native entities. This section presents the states that a native entity can go through:

https://lh7-rt.googleusercontent.com/docsz/AD_4nXfpzmnFHKzdQbwBRQfRcEbiD57UWNOUAu7hajHdH0Ca-8UkwlfFiaBx5YmvGsQ2O6oesoU_9oPQWReiLCl_wxlqSLAn6ZpiDEbb2nAbo34YH0SmT-8FPivu6PRccMm2SA3iXlXJbw?key=eE4OxRa9KEXEmq0Gh5OpzA

Imported: Entities created using Automatic Metadata Extraction remain in the Imported state.
Draft: An entity reaches the Draft state in any of the following cases:
- An Imported entity is modified to fill in its data.
- A native entity is created via Excel Upload or manual creation (whether through the API or the portal itself).
- An Approved entity is modified, in which case a new version is created in the Draft state.
- A Rejected entity is modified.
Pending: Once the entity is submitted for validation, it moves to the Pending state.
Approved: If all validators approve an entity, it moves to the Approved state. If it is edited again, an entity with those changes is generated in the Draft state.
- An entity in the Deactivated state can return to Approved after its activation.
Rejected: If any validator rejects the validation of an entity, it moves to the Rejected state. If it is edited again to correct the reasons for rejection, it returns to the Draft state.
Deprecated: If significant modifications are made to a native entity (enough for versioning to occur), once the new version of the entity is approved, the previous version automatically moves to the Deprecated state.
- The rules that determine which changes generate a new version are fully configurable, as will be seen later.
- Additionally, it is possible to deprecate an entity to indicate that it will expire after a period of time.
Expired: Once the expiration date set on a native entity is reached, it automatically moves to the Expired state.
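
The states above can be read as a small state machine. The following is a sketch under stated assumptions: the event names (modify, submit, approve, reject, deprecate, expire) are invented for illustration, and expiration is assumed to be reachable from Approved and Deprecated, which the text does not spell out. Editing an Approved entity is not modeled as a transition because it creates a new version in Draft and leaves the approved version in place.

```python
from enum import Enum


class State(Enum):
    IMPORTED = "Imported"
    DRAFT = "Draft"
    PENDING = "Pending"
    APPROVED = "Approved"
    REJECTED = "Rejected"
    DEPRECATED = "Deprecated"
    EXPIRED = "Expired"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IMPORTED, "modify"): State.DRAFT,
    (State.DRAFT, "submit"): State.PENDING,
    (State.PENDING, "approve"): State.APPROVED,   # all validators approved
    (State.PENDING, "reject"): State.REJECTED,    # any validator rejected
    (State.REJECTED, "modify"): State.DRAFT,
    (State.APPROVED, "deprecate"): State.DEPRECATED,
    (State.APPROVED, "expire"): State.EXPIRED,    # assumption, see lead-in
    (State.DEPRECATED, "expire"): State.EXPIRED,  # assumption, see lead-in
}


def next_state(current, event):
    """Return the state reached from `current` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed from {current.value}") from None


state = State.IMPORTED
for event in ["modify", "submit", "approve", "deprecate", "expire"]:
    state = next_state(state, event)
print(state.value)
```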

Versioning

The versioning of assets has two fundamental objectives:

To have an impact control mechanism from the point of view of interoperability, quality, security, data protection, etc.
To have a version history with the most significant changes.

The organization establishes which changes in the metadata templates of Anjana's native entities should generate a new version of the entity.

It is possible to configure any of the following cases:

That a new version of an object is generated when any change occurs in significant attributes of an object's template.
- That a DSA is versioned when the entities it provides access to change.
- That an instance is versioned when the input or output datasets of the instance change.
- That a solution is versioned when its related instances change.
That versioning occurs when an attribute acquires a specific value.
That a new version of the dataset is generated when a specific attribute of a dataset field changes or a dataset field is added or removed.
That versioning occurs when a deprecated or expired entity for which no more recent approved version exists is edited.
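
To make the first and fourth cases concrete, here is a minimal sketch of a versioning check: a change to any attribute marked as significant, or an attribute acquiring a configured trigger value, requires a new version. The function, its parameters and the attribute names are assumptions for illustration; the real rules are defined in Anjana's configuration, not in code like this.

```python
def needs_new_version(old, new, significant, trigger_values=None):
    """Decide whether an edit should generate a new version.

    old, new        -- attribute dictionaries before and after the edit
    significant     -- names of attributes whose change forces versioning
    trigger_values  -- {attribute: value}; versioning occurs when the
                       attribute newly acquires that value
    """
    trigger_values = trigger_values or {}
    for name in significant:
        if old.get(name) != new.get(name):
            return True
    for name, value in trigger_values.items():
        if new.get(name) == value and old.get(name) != value:
            return True
    return False


before = {"description": "Orders", "format": "csv", "confidentiality": "internal"}
after = {"description": "All orders", "format": "csv", "confidentiality": "internal"}

# Only the description changed, and it is not significant here.
print(needs_new_version(before, after, significant={"format"}))
```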

Deprecation

The deprecation of Anjana's native entities triggers a series of changes in the objects and the sending of notifications to users related to them.

DATASET:
- Its dataset_fields will be deprecated
- The DSAs in which it is included will NOT be deprecated
- The instances it is associated with will NOT be deprecated
- A notification will be sent to the dataset owners
- A notification will be sent to DSA subscribers and their owners
- A notification will be sent to the owners of related instances
DSA:
- A notification will be sent to DSA owners
- A notification will be sent to the owners of the entities the DSA contains
- A notification will be sent to DSA subscribers
PROCESS:
- Instances associated with the process will be deprecated
- A notification will be sent to process owners
PROCESS INSTANCE:
- The process related to an instance will NOT be deprecated
- Its own solution will NOT be deprecated
- Related solutions will NOT be deprecated
- A notification will be sent to the process owners
- A notification will be sent to the owners of related solutions
- A notification will be sent to the owners of own solutions (who are the instance owners)
- A notification will be sent to the owners of the datasets the instance writes to (DATASET_OUTPUT)
- A notification will be sent to the subscribed users of the datasets the instance writes to (DATASET_OUTPUT)
SOLUTION:
- Own instances will NOT be deprecated
- Related instances will NOT be deprecated
- A notification will be sent to solution owners
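
The deprecation rules above can be summarized as data: for each native entity type, which objects are deprecated along with it and who is notified. This is only a restatement of the list in a form that is easy to query; the dictionary keys and the helper function are hypothetical.

```python
DEPRECATION_EFFECTS = {
    "DATASET": {
        "cascades_to": ["dataset_fields"],
        "notifies": [
            "dataset owners",
            "DSA subscribers and owners",
            "owners of related instances",
        ],
    },
    "DSA": {
        "cascades_to": [],
        "notifies": ["DSA owners", "owners of contained entities", "DSA subscribers"],
    },
    "PROCESS": {
        "cascades_to": ["process instances"],
        "notifies": ["process owners"],
    },
    "PROCESS INSTANCE": {
        "cascades_to": [],
        "notifies": [
            "process owners",
            "owners of related solutions",
            "owners of own solutions",
            "owners of output datasets",
            "subscribers of output datasets",
        ],
    },
    "SOLUTION": {
        "cascades_to": [],
        "notifies": ["solution owners"],
    },
}


def describe_deprecation(kind):
    """Return (objects deprecated in cascade, notified groups) for an entity type."""
    effects = DEPRECATION_EFFECTS[kind.upper()]
    return effects["cascades_to"], effects["notifies"]


print(describe_deprecation("process"))
```

Only datasets and processes cascade the deprecation to their children; every other effect is a notification.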

Expiration

The expiration of Anjana's native entities triggers a series of changes in the objects and the sending of notifications to users related to them.

DATASET:
- If it is governed, the access permissions granted to it through DSAs will be removed
- Its dataset_fields will be expired
- The DSAs in which it is included will NOT be expired
- The instances it is associated with will NOT be expired
- A notification will be sent to the dataset owners
- A notification will be sent to subscribers of the DSA containing the dataset and their owners
- A notification will be sent to the owners of related instances
- A notification will be sent to the owners of entities related to the dataset and of DSAs containing the dataset
DSA:
- If it contains governed entities, the group and all its permissions will be removed
- A notification will be sent to DSA owners
- A notification will be sent to the owners of the entities the DSA contains
- A notification will be sent to DSA subscribers
- A notification will be sent to the owners of entities related to the DSA
PROCESS:
- Instances associated with the process will be expired
- A notification will be sent to process owners
- A notification will be sent to the owners of entities related to the process
PROCESS INSTANCE:
- The process related to an instance will NOT be expired
- Its own solution will NOT be expired
- The related solution will NOT be expired
- A notification will be sent to the process owners
- A notification will be sent to the owners of related solutions
- A notification will be sent to the owners of own solutions (who are the instance owners)
- A notification will be sent to the owners of the datasets the instance writes to (DATASET_OUTPUT)
- A notification will be sent to the subscribed users of the datasets the instance writes to (DATASET_OUTPUT)
- A notification will be sent to the owners of entities related to the instance
SOLUTION:
- Own instances will NOT be expired
- Related instances will NOT be expired
- A notification will be sent to solution owners
- A notification will be sent to the owners of entities related to the solution

Lifecycle of Non-Native Anjana Entities

Non-native Anjana entities go through the following states in their lifecycle:

https://lh7-rt.googleusercontent.com/docsz/AD_4nXch2kAut53j2MkaEKkWSbx0-oJtUiZZVcj3jJusK3rdFQRiaEGsz9dzll_Cbs8ibPD58Hq27ourxnn-f8r-mjPhjWsSvb2ulZ61YeTq90kyVIFnO9Q1cyhmO8r3X4ZcWBMP4xazVQ?key=eE4OxRa9KEXEmq0Gh5OpzA

Imported: Entities created using Automatic Metadata Extraction remain in the Imported state.
Draft: An entity reaches the Draft state in any of the following cases:
- An Imported entity is modified to fill in its data.
- A non-native entity is created via Excel Upload or manual creation (whether through the API or the portal itself).
- An Approved entity is modified, in which case a new version is created in the Draft state.
- A Rejected entity is modified.
Pending: Once the entity is submitted for validation, it moves to the Pending state.
Approved: If all validators approve an entity, it moves to the Approved state. If it is edited again, an entity with those changes is generated in the Draft state.
- An entity in the Deactivated state can return to Approved after its activation.
Rejected: If any validator rejects the validation of an entity, it moves to the Rejected state. If it is edited again to correct the reasons for rejection, it returns to the Draft state.
Deactivated: If the non-native entity no longer makes sense, it can be deactivated, moving from the Approved state to the Deactivated state.

Relationship Lifecycle

Relationships go through the following states in their lifecycle:

Imported: Relationships created using Automatic Metadata Extraction remain in the Imported state.
Draft: A relationship reaches the Draft state in any of the following cases:
- An Imported relationship is modified to fill in its data.
- A relationship is created via Excel Upload or manual creation (whether through the API or the portal itself).
- An Approved relationship is modified, in which case a new version is created in the Draft state.
- A Rejected relationship is modified.
Pending: Once the relationship is submitted for validation, it moves to the Pending state.
Approved: If all validators approve a relationship, it moves to the Approved state. If it is edited again, a relationship with those changes is created in the Draft state.
- A Deactivated relationship can return to Approved after its activation.
Rejected: If any validator rejects the validation of a relationship, it moves to the Rejected state. If it is edited again to correct the reasons for rejection, it returns to the Draft state.
Deactivated: If the relationship no longer makes sense, it can be deactivated, moving from the Approved state to the Deactivated state.

Workflows

What are they?

Workflows represent the sequence of steps that must be followed for an action started in Anjana by a user to be validated by other users, establishing a collaborative working environment based on roles.

Workflows allow the procedures of the organization's governance model to be materialized and ensure that all roles involved in the processes exercise their responsibility, are informed and/or consulted.

These are some of their characteristics:

Workflows are configurable and each one of them has its own state diagram
When a user submits a request in Anjana, the workflow corresponding to the user's role and the request made is automatically generated if there are no configured rules preventing it
In each workflow, a user of each assigned role must validate the request for it to move to approved. If a user rejects the validation, the workflow will be automatically cancelled
Each state of the workflow involves the review and validation of the request by one or more users, depending on their roles and the type of request. This review takes the form of the validator's approval or rejection of the workflow
Validations will always be accompanied by a comment from the user indicating the reason for their response
It is possible that workflows are launched in Anjana with more steps than will ultimately be validated. The participants in the approval flow can vary depending on:
- The role of the user who launches the validation flow
- The user who launches the validation
- The type of action under validation
- The type of object under validation
- The subtype of object under validation
- Attributes of the template of the object under validation
- The validator's decision
- The organizational unit of the object
- The version of the object
- Identifiers of the objects
- Name of the objects
All actions carried out in workflows are audited, which allows the different validators, dates, comments and responses to be identified.

Thanks to the use of workflows in Anjana, Governance procedures can be automated in an agile and simple way, guaranteeing full tracking and traceability of each request.

Workflows to configure

The workflows used in Anjana are configurable by object type (entity type or relationship type), action (creation, modification, subscription…) and by the role that submits the request.

In order to implement the complete logic of Anjana, it is necessary to define the following workflows:

Creation workflows for any defined ENTITY and RELATIONSHIP
Manual deprecation workflows for DATASET, DSA, PROCESS, PROCESS INSTANCE and SOLUTION (Anjana native entities)
DSA subscription workflows
Modification workflows (with or without versioning) for any defined ENTITY and RELATIONSHIP
Organizational unit change workflows for any defined ENTITY
Activation and deactivation workflows for any defined ENTITY and RELATIONSHIP

Workflow types

The validation steps that make up the workflow are identified at the moment it begins to execute and mark the order of acceptance of an element by different roles.

A sequential workflow is a workflow in which there are no branches and the order of validations is always the same. This is why, at the moment this type of workflow starts, it is known in advance which roles will participate in the validation. Below is an example of a linear sequence of the hierarchical acceptance operation in the Anjana Data tool.

If the workflow includes conditions that are evaluated against the object to be validated, it is not sequential. When a workflow of this type starts, all the roles involved are known, but which of them will actually need to validate is only discovered as the workflow progresses.

When multiple users must validate a workflow step because their role is tied to a specific business unit rather than being cross-cutting to the organization, the validations in that step are executed in parallel, and the step is considered complete once all of those users have validated.
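
The validation rules described in this section (ordered steps, parallel validation within a step, a mandatory comment, cancellation on any rejection, and an audit trail) can be sketched as follows. The class names, status strings and user names are assumptions made for illustration; this is not how Anjana implements or configures workflows.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    role: str
    validators: set                      # users who must all approve this step
    approvals: set = field(default_factory=set)

    def is_complete(self):
        return self.approvals == self.validators


@dataclass
class Workflow:
    steps: list
    status: str = "PENDING"
    current: int = 0
    audit: list = field(default_factory=list)

    def respond(self, user, approve, comment):
        """Record one validator's answer and return the workflow status."""
        if self.status != "PENDING":
            raise ValueError("workflow is already closed")
        step = self.steps[self.current]
        if user not in step.validators:
            raise ValueError(f"{user} is not a validator of the current step")
        if not comment:
            raise ValueError("a comment is required with every validation")
        self.audit.append((step.role, user, approve, comment))
        if not approve:
            self.status = "CANCELLED"    # a single rejection cancels the workflow
            return self.status
        step.approvals.add(user)
        if step.is_complete():           # parallel step done once everyone approved
            self.current += 1
            if self.current == len(self.steps):
                self.status = "APPROVED"
        return self.status


workflow = Workflow([
    Step("data_owner", {"ana", "luis"}),   # two owners validate in parallel
    Step("data_steward", {"eva"}),
])
workflow.respond("ana", True, "Definition is correct")
workflow.respond("luis", True, "Agreed")
print(workflow.respond("eva", True, "Metadata complete"))
```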

You can find more information about workflow configuration in the Functional Configuration Guide and in the Workflow Configuration Guide.

Contracts

A contract is a written agreement between data producers and consumers that facilitates the sharing, regulated use and consumption of data.

Contracts specify the license terms and incorporate the conditions of data use and any additional requirements established by both parties (e.g. quality conditions, availability, etc.).

The DSAs where contracts are included contain an expiration date for lifecycle management along with versions and impacts.

Before subscribing to a DSA, the consumer user is required to download the contract, read it and sign it.

Transfer of rights, responsibilities and data use

The establishment of legal contracts for DSAs and Solutions allows data owners to transfer both rights and responsibilities over information under legal coverage, while at the same time facilitating the exchange, use and consumption of data.

Use of contracts

A contract applies to both DSAs and Solutions and represents a legal agreement for data use.

Each contract has two groups of counterparties (providers and consumers), a responsible party and a legal figure that validates the contract, if so defined in the data governance model.

The contract includes the terms of use to be applied and a validity date of the contract for the counterparties' signature. This information remains immutable during the life of the contract under the signed conditions and, if a modification is desired, it will require the acceptance of each and every counterparty.
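
The immutability rule can be illustrated with a frozen record that can only be replaced by an amended copy when every counterparty accepts the change. This is a sketch of the rule as described, with hypothetical field names and counterparty labels; it does not model the responsible party or the legal validator.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Contract:
    """Signed conditions; frozen so they cannot be changed in place."""
    terms_of_use: str
    valid_until: date
    providers: frozenset
    consumers: frozenset

    def counterparties(self):
        return self.providers | self.consumers


def amend(contract, accepted_by, **changes):
    """Return an amended copy only if all counterparties accepted the change."""
    missing = contract.counterparties() - set(accepted_by)
    if missing:
        raise PermissionError(f"missing acceptance from: {sorted(missing)}")
    return replace(contract, **changes)


contract = Contract(
    terms_of_use="internal use only",
    valid_until=date(2030, 1, 1),
    providers=frozenset({"sales"}),
    consumers=frozenset({"bi", "finance"}),
)
amended = amend(contract, {"sales", "bi", "finance"}, terms_of_use="internal and partner use")
print(amended.terms_of_use)
```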

Characteristics of DSA contracts

When creating a DSA, the owners of the contained entities (data structures such as datasets, reports, KPIs, quality rules…) ‘transfer’ the responsibility for them to the DSA manager.

When a user subscribes to a DSA, the user accepts compliance with the contract under the license terms specified in the creation of the DSA. The DSA manager is responsible for ensuring that the contract conditions are met by all parties.

It is possible to group several data structures in the same DSA for access requests through a single subscription request under the same legal agreement.

Furthermore, an entity can be included in different DSAs covered by different legal contracts (for different use cases, different license terms, different validity dates…).

Data Governance

Passive data governance is an approach to data management and oversight characterized by observation, recording and auditing of information without actively interfering in the data flow or in users' daily operations.

Active governance, in contrast, seeks to implement policies, rules and controls that intervene directly and immediately in the use, access and manipulation of data within an organization. In this way, data management tasks are declared in the governance platform and, once the corresponding approvals are obtained, they are executed in data platforms or identity management systems.

Anjana Data offers different native functionalities for structure governance through plugins, which are the connectors that allow communication with data systems.

Passive Governance

Metadata Extraction

This functionality allows metadata to be extracted from structures located in data systems to incorporate them into the Data Catalog or Business Glossary by generating the corresponding objects.

You can find more information about how to extract metadata in the section Creation Wizard > Automatic Metadata of this User Guide.

External Audit

The external audit is the result of monitoring the audit logs returned by data platforms and allows visualizing, for example, ungoverned accesses or changes in data structures that have not been previously declared in the tool.

Anjana does not produce this audit information itself; it only enables its exploitation.

You can find more information in the sections Dataset > Audit, Process Instance > Audit, Features Menu > Audit and User Profile > Audit of this User Guide.

DSA Reuse

Although Anjana can manage the groups of identity managers, as described in the Data Structure Management section, previously created groups may already exist in the environment with which Anjana interacts.

In that case, and to allow Anjana to manage permissions automatically through subscriptions, DSAs can be “reused” by entering the name of the existing group in the physicalName attribute of their template. When such a DSA is approved in Anjana, the corresponding group is not created; instead, the existing group is used when users request a subscription and it is granted to them.

It should be noted that already existing groups used in this way by Anjana enter the natural lifecycle of groups governed by DSAs, which includes their deletion once the DSA using them expires.

Active Governance

Data Structure Management

This functionality allows automation in data structures by retrieving their metadata from Anjana.

For example, when a new dataset located in a SQL Server is created in the application and the validation flow with approvals is completed, the Anjana plugin can automatically create the corresponding structure.

Or, when creating a DSA that includes a dataset and a report marked as governed in their template and it is approved in the validation flow, the Anjana plugin can create the corresponding group in the identity manager. In this way, Anjana creates the group in Azure AD to which permission is granted to the data of the dataset and report in Azure Storage, for example.

It should be noted that, because of Anjana's operational model, modifying the group that corresponds to a DSA requires versioning the DSA, so that a new group is created and permissions are granted to it. That is, if data structures are added to an existing DSA, or if any of the included ones becomes governed without having been so before, a new version of the DSA must be created and, with it, a new group with the new data access permissions. The created group is named with the value of the DSA's “name” attribute concatenated with “_V” and its version.
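
A minimal sketch of the naming rule, combined with the reuse case from the DSA Reuse section: if the template carries a physicalName, that existing group is used and nothing is created; otherwise the group name is the “name” attribute plus “_V” plus the version. The function and the template-as-dictionary shape are assumptions, as is the exact way the version number is rendered.

```python
def group_for_dsa(template, version):
    """Return (group name, must_create) for an approved DSA version.

    template -- dict of DSA template attributes (illustrative shape)
    version  -- version of the DSA; its exact formatting is an assumption
    """
    physical_name = template.get("physicalName")
    if physical_name:
        # Reused DSA: the group already exists in the identity manager.
        return physical_name, False
    return f"{template['name']}_V{version}", True


print(group_for_dsa({"name": "SALES_REPORTING"}, 2))
print(group_for_dsa({"name": "SALES_REPORTING", "physicalName": "grp-sales"}, 2))
```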

Data Sample

This is a very useful functionality for data sharing. If the owner of a governed dataset enables the check in its template to allow data sampling and the triplet contemplates it, Anjana users will be able to view a sample of the dataset's data in which fields marked as PI will appear obfuscated.

You can find more information about this functionality in the section Dataset > Sample Data of this User Guide.
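
As a rough illustration of what an obfuscated sample looks like, the sketch below replaces the values of fields marked as PI with a mask. The masking strategy, the function and the sample rows are all assumptions; the text does not describe how Anjana performs the obfuscation.

```python
def obfuscate_sample(rows, pi_fields, mask="*****"):
    """Return a copy of the sample rows with PI field values masked."""
    return [
        {name: (mask if name in pi_fields else value) for name, value in row.items()}
        for row in rows
    ]


sample = [
    {"customer_id": 1, "email": "ana@example.com", "country": "ES"},
    {"customer_id": 2, "email": "luis@example.com", "country": "PT"},
]
print(obfuscate_sample(sample, pi_fields={"email"}))
```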

Data Access Permission Management

This functionality enables data sharing by managing data access permissions directly at the source, without virtualization, copying or movement of data.

In Anjana, permission management is carried out through the shopping cart from where the user requests access to data structures or DSAs.

When the subscription request is approved, it is possible, therefore, to include the requesting user in the group corresponding to the DSA they have subscribed to so that, in this way, they “inherit” the permissions that the group has. Following the example introduced in the Data Structure Management section for creating a DSA in Azure AD, the user would be added to that same group so they gain access permissions to the data from the Azure Storage datasets.

To carry out active governance of structures, their templates must have the attributes infrastructure, technology, zone, path and is_governed configured.
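
A simple pre-check for this requirement could look like the sketch below. The five attribute names come from the text; representing a template as a dictionary, and treating an empty or absent value as not configured, are assumptions for illustration.

```python
REQUIRED_ATTRIBUTES = ("infrastructure", "technology", "zone", "path", "is_governed")


def missing_governance_attributes(template):
    """List the required attributes that are absent or empty in the template.

    A value of False (for example is_governed=False) counts as configured.
    """
    return [
        name
        for name in REQUIRED_ATTRIBUTES
        if template.get(name) is None or template.get(name) == ""
    ]


template = {
    "infrastructure": "azure",
    "technology": "storage",
    "zone": "",
    "path": "/raw/orders",
    "is_governed": False,
}
print(missing_governance_attributes(template))
```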

You can find more information in the section Features Menu > Shopping Cart of this User Guide.

Other integrations

It is also possible to integrate Anjana with data ingestion services, data platforms, identity management systems and data consumption services through plugins.

https://lh7-rt.googleusercontent.com/docsz/AD_4nXf3w88lbPXS0_wP1YwzA0otSVib2s78Ujri8YguXJHUxssMxI0pLzHrcKZAFS74Z2D73xw4Ue-hpDEq03wLG4ROLOruAdcbd8Tykzl4AG7YY7IOD6Ah1AEV6rg7V72d9U5F1pTIKh8b1kSWUw-531555Adc?key=eE4OxRa9KEXEmq0Gh5OpzA

Roles and Permissions

User roles and their capabilities are configurable in Anjana Data, so that each organization can establish the governance model that fits its needs.

The roles defined must match those defined within the organization's Data Governance.

These roles can be:

Vertical: people who hold the role exercise it only over a specific data domain (organizational unit). In other domains, other people hold the same role with the same responsibilities, but over a different set of data assets.
Cross or transversal: people who hold the role will always have the same permissions over the assets of all the organization's data domains.

Each of the defined roles must have a set of permissions. Permissions can be applied by object type (metrics, dataset, terms, DSAs…), by action type (creation, modification, searches…) and by specific modules (audit, lineage, history…).

If an Anjana user has not been assigned any of the roles configured on the platform, the user can be granted the permissions of a default role, which is configured in advance and assigned to anyone who enters the Data Portal.
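
The difference between vertical and cross roles, and the fallback to a default role, can be sketched as a permission check. The role names, permission pairs and organizational units below are invented for the example; actual roles and permissions are defined in each organization's Anjana configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleAssignment:
    role: str
    # None means a cross (transversal) role valid for every organizational unit.
    organizational_unit: Optional[str] = None


# Permissions per role as (object type, action) pairs. All values are hypothetical.
ROLE_PERMISSIONS = {
    "data_owner": {("dataset", "modification"), ("dataset", "search")},
    "governance_office": {("dsa", "creation"), ("dataset", "search")},
    "default": {("dataset", "search")},
}


def is_allowed(assignments, object_type, action, organizational_unit):
    """Check whether any of the user's roles grants the action in that unit."""
    effective = assignments or [RoleAssignment("default")]
    for assignment in effective:
        in_scope = assignment.organizational_unit in (None, organizational_unit)
        granted = (object_type, action) in ROLE_PERMISSIONS.get(assignment.role, set())
        if in_scope and granted:
            return True
    return False


finance_owner = [RoleAssignment("data_owner", "Finance")]   # vertical role
print(is_allowed(finance_owner, "dataset", "modification", "Finance"))
print(is_allowed(finance_owner, "dataset", "modification", "Sales"))
```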
