This section describes the basic concepts that users work with in Anjana Data. A Glossary with the main definitions is also available at the end of this manual.
Data Domain, Organizational Unit or Business Unit
It is the mechanism by which data custody is established within the Business Glossary or Data Catalog, allowing the reality of the organization to be represented in terms of functional data domains or semantics.
It is important not to confuse the organizational hierarchy or org chart of an organization with data domains. The hierarchical structure of an organization changes constantly, whereas data domains should remain fairly static.
Data assets are classified into the different Organizational Units (or data domains) that have been identified in the organization. In this way, users who hold a role in an organizational unit are responsible for the data assets that belong to their unit/domain.
Anjana Metamodel
The set of entities and relationships that can be governed from the Anjana Data Platform is what is known as the Metamodel. This metamodel is fully configurable, meaning the organization must undertake the exercise of defining a governance strategy in which, among other things, it defines which entities it wants to govern and what types of relationships it wants to establish between the different entity types to offer an end-to-end view of the data.
Entity
It is the representation of any data element within the organization's semantic map. Depending on the nature of the entities, they are classified as Business Glossary entities (in the case that they are more related to Business) or Data Catalog entities (more related to IT).
In Anjana Data it is possible to create as many entity types as desired.
Some examples of entities can be business terms, metrics, dimensions, reports, business processes or data quality rules.
Additionally, it is also possible to define entities that represent physical assets such as datasets, processes or DB schemas.
Some of these entities are native to Anjana, and the application applies specific logic to them, as described later in this manual:
- Dataset and dataset_fields
- DSA
- Process and process instance
- Solution
Relationship
It is the mechanism provided by Anjana Data to establish a link between entities that are somehow connected, whether by association, involvement, membership, etc. For example, relationships between business terms that allow calculating a metric, or the data quality rules that apply to a specific report.
Through relationships it is possible to establish the end-to-end data lifecycle, that is, it is possible to govern technical lineage (how data flows through the different IT systems and what transformations it undergoes), how data is used for analytical purposes, the semantics associated with the data, where quality checkpoints are applied to verify data conformity, or even which users are consuming certain data.
Relationships allow linking entities within the Business Glossary to each other, but also relating Business Glossary entities to Data Catalog entities to locate, for example, where a particular business term is stored.
Just as with entities, in Anjana it is possible to configure as many relationship types as needed, which will coexist with the set of native (or internal) relationships of the application. These native relationships associate native entities with each other and are not created through the object creation wizard but through functionalities specific to these entities. These relationships are:
- STRUCTURE: Relationship between a dataset and its dataset_fields
- DSA_CONTENT: Relationship between a DSA and the entities it contains
- INSTANCE_PROCESS: Relationship between a process and its instances
- INSTANCE_DATASET_IN: Relationship between an instance and its input datasets
- INSTANCE_DATASET_OUT: Relationship between an instance and its output datasets
- SOLUTION_RELATED_INSTANCE: Relationship between a solution and its related instances
- SOLUTION_OWNED_INSTANCE: Relationship between a solution and its own instances
Data Catalog Metamodel Objects
Although the Anjana Data metamodel is flexible and allows as many entity types as desired to be created, the Data Catalog metamodel is based on the following main objects: datasets (and their dataset_fields), DSAs, processes (and their instances) and solutions, along with their relationships, metadata and lineage.
Dataset
It is any physical asset that contains or represents data, whether structured or unstructured, persisted or non-persisted. It can be a file, a table or a document of any type and format.
It is the main object to govern within the Data Catalog. What differentiates the Catalog from any Data Dictionary is that it can be enriched with technical and functional metadata.
Dataset_field
In the case that the dataset is structured, it is composed of a set of dataset_fields. Each dataset_field is a field or column of the dataset that contains its own metadata.
DSA (Data Sharing Agreements)
It is a logical asset that can include one or more entities to facilitate the grouping of physical data assets at a level closer to information consumption.
It is the mechanism by which access to governed data is facilitated and allows data to be shared between providers and consumers through the signing of a contract by which the user commits to complying with the conditions of data use.
It is enriched with business metadata.
Process
It corresponds to groups of functions defined to extract data from sources, move them, transport them, transform them, exploit them and/or generate new data (ETLs, transformation scripts, quality control executions, report generation…).
Process Instance
A process can have one or more instances, one for each execution scenario that its possible configurations allow, so that the same software modules can be run in different situations or on different platforms.
An instance is a specific execution of a process with a parameterization and a set of input and output datasets. It allows the definition of framework pieces without needing to register a process for each one of them. Once defined, each execution is auditable.
Solution
The solution represents the “data contract” that authorizes the movement of data through processes. In this way, it is guaranteed that authorizations exist for the movement of data between systems or applications. Therefore, it is a logical asset that encompasses several process instances along with their related datasets to have an end-to-end view of the executions.
A solution has a person responsible for its administration and maintenance who guarantees its execution.
The solution metadata can be enriched with business metadata.
The difference between solutions and DSAs is that solutions enable consumption by IT processes and applications while DSAs govern consumption by consumer users.
Object Lifecycle
The native objects of Anjana Data have their own lifecycle, so that governed objects go through different states that allow everything from traditional passive governance to active governance with impact management.
The lifecycle through which objects pass in Anjana is presented below.
Lifecycle of Anjana Native Entities
Anjana's native entities have a different lifecycle from non-native entities. This section presents the states that a native entity can go through:
- Imported: Entities created using Automatic Metadata Extraction remain in the Imported state.
- Draft: If an Imported entity is modified to fill in its data, it moves to the Draft state.
  If it is created via Excel Upload or Manual creation (whether using the API or the portal itself), a native entity is created in the Draft state.
  If an Approved entity is modified, a new version is created with the Draft state.
  If a Rejected entity is modified, it moves to the Draft state.
- Pending: Once the entity is submitted for validation, it moves to the Pending state.
- Approved: If all validators approve an entity, it moves to the Approved state. If it is edited again, an entity with those changes is generated in the Draft state.
  An entity with the Deactivated state can return to Approved after its activation.
- Rejected: If any validator rejects the validation of an entity, it moves to the Rejected state. If it is edited again to correct the reasons for rejection, it returns to the Draft state.
- Deprecated: If significant modifications are made to a native entity (enough for versioning to occur), once the new version of the entity is approved, the previous version automatically moves to the Deprecated state.
  - The rules that determine which changes generate a new version are fully configurable, as will be seen later.
  - Additionally, it is possible to deprecate an entity with the aim of indicating that, after a period of time, it will expire.
- Expired: Once the expiration date set on a native entity is reached, it automatically moves to the Expired state.
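The states above can be read as a small state machine. The following Python sketch is only an illustration of that reading, not Anjana's implementation: the event names (`edit`, `submit`, `approve`, `reject`, `deprecate`, `expire`) are hypothetical labels chosen for this example, and the states from which an entity may expire are an assumption, since the text does not specify them.

```python
from enum import Enum


class State(Enum):
    IMPORTED = "Imported"
    DRAFT = "Draft"
    PENDING = "Pending"
    APPROVED = "Approved"
    REJECTED = "Rejected"
    DEPRECATED = "Deprecated"
    EXPIRED = "Expired"


# (current state, event) -> resulting state. Event names are illustrative only.
TRANSITIONS = {
    (State.IMPORTED, "edit"): State.DRAFT,
    (State.REJECTED, "edit"): State.DRAFT,
    # Editing an Approved entity creates a NEW version in Draft;
    # the approved version itself stays Approved.
    (State.APPROVED, "edit"): State.DRAFT,
    (State.DRAFT, "submit"): State.PENDING,
    (State.PENDING, "approve"): State.APPROVED,  # all validators approved
    (State.PENDING, "reject"): State.REJECTED,   # any validator rejected
    (State.APPROVED, "deprecate"): State.DEPRECATED,
    # Assumption: expiration is modelled from Approved and Deprecated.
    (State.APPROVED, "expire"): State.EXPIRED,
    (State.DEPRECATED, "expire"): State.EXPIRED,
}


def next_state(state: State, event: str) -> State:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"'{event}' is not allowed from {state.value}")
    return TRANSITIONS[key]


# Walk a manually created entity through a successful validation.
state = State.DRAFT
for event in ("submit", "approve"):
    state = next_state(state, event)
print(state.value)  # Approved
```

Keeping the transitions in a single table makes it easy to compare the model against the list above and to spot transitions that the text does not allow, such as approving an entity that was never submitted.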
Versioning
The versioning of assets has two fundamental objectives:
- To have an impact control mechanism from the point of view of interoperability, quality, security, data protection, etc.
- To have a version history with the most significant changes.
The organization establishes which changes to the metadata templates of Anjana's native entities generate a new version of the entity.
It is possible to configure any of the following cases:
- That a new version of an object is generated when any change occurs in significant attributes of an object's template.
- That a DSA is versioned when the entities it provides access to change.
- That an instance is versioned when the input or output datasets of the instance change.
- That a solution is versioned when its related instances change.
- That versioning occurs when an attribute acquires a specific value.
- That a new version of the dataset is generated when a specific attribute of a dataset field changes or a dataset field is added or removed.
- That a new version is generated when a deprecated or expired entity is edited and no more recent approved version of it exists.
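As an illustration of two of these cases (a change in a significant attribute, and an attribute acquiring a specific value), the check could be sketched as follows. This is a hypothetical model, not Anjana's configuration format: the function name, its parameters and the attribute names in the example are all assumptions made for this sketch.

```python
def needs_new_version(old, new, significant=(), trigger_values=None):
    """Decide whether an edit should generate a new version.

    old, new        -- metadata before and after the edit (attribute -> value)
    significant     -- attributes whose change always triggers versioning
    trigger_values  -- attribute -> value that triggers versioning when acquired
    """
    trigger_values = trigger_values or {}
    # Case 1: any significant attribute has a different value after the edit.
    changed = any(old.get(name) != new.get(name) for name in significant)
    # Case 2: an attribute takes a configured value it did not have before.
    triggered = any(
        new.get(name) == value and old.get(name) != value
        for name, value in trigger_values.items()
    )
    return changed or triggered


# Hypothetical dataset metadata used only for this example.
old = {"description": "Daily sales", "format": "parquet", "status": "internal"}
new = {**old, "description": "Daily sales by store"}

print(needs_new_version(old, new, significant={"format"}))                       # False
print(needs_new_version(old, {**old, "format": "csv"}, significant={"format"}))  # True
```

In this sketch a change to a non-significant attribute such as `description` is treated as a plain modification, which matches the idea that only the most significant changes enter the version history.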
Deprecation
The deprecation of Anjana's native entities triggers a series of changes in the objects and the sending of notifications to users related to them.
- DATASET:
  - Its dataset_fields will be deprecated
  - The DSAs in which it is included will NOT be deprecated
  - The instances it is associated with will NOT be deprecated
  - A notification will be sent to the dataset owners
  - A notification will be sent to the subscribers and owners of the DSAs that contain the dataset
  - A notification will be sent to the owners of related instances
- DSA:
  - A notification will be sent to DSA owners
  - A notification will be sent to the owners of the entities the DSA contains
  - A notification will be sent to DSA subscribers
- PROCESS:
  - Instances associated with the process will be deprecated
  - A notification will be sent to process owners
- PROCESS INSTANCE:
  - The process related to the instance will NOT be deprecated
  - Its own solution will NOT be deprecated
  - Related solutions will NOT be deprecated
  - A notification will be sent to the process owners
  - A notification will be sent to the owners of related solutions
  - A notification will be sent to the owners of own solutions (who are the instance owners)
  - A notification will be sent to the owners of the datasets the instance writes to (DATASET_OUTPUT)
  - A notification will be sent to the subscribed users of the datasets the instance writes to (DATASET_OUTPUT)
- SOLUTION:
  - Own instances will NOT be deprecated
  - Related instances will NOT be deprecated
  - A notification will be sent to solution owners
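The cascade rules listed above can be summarized in a small lookup: only datasets (to their dataset_fields) and processes (to their instances) propagate deprecation, and the Expiration section describes the same propagation for expiry. The sketch below is an illustrative model; the type labels and the function name are hypothetical and do not correspond to Anjana identifiers.

```python
# Entity type -> entity types that are deprecated together with it.
# Types not listed here (DSA, PROCESS_INSTANCE, SOLUTION) do not cascade.
CASCADES = {
    "DATASET": ["DATASET_FIELD"],
    "PROCESS": ["PROCESS_INSTANCE"],
}


def affected_types(entity_type):
    """Return the entity types deprecated when `entity_type` is deprecated."""
    affected = [entity_type]
    for child in CASCADES.get(entity_type, []):
        affected.extend(affected_types(child))
    return affected


print(affected_types("PROCESS"))  # ['PROCESS', 'PROCESS_INSTANCE']
print(affected_types("DSA"))      # ['DSA']
```

Notifications are deliberately left out of this sketch: they reach a wider set of users than the cascade itself, as the lists above show.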
Expiration
The expiration of Anjana's native entities triggers a series of changes in the objects and the sending of notifications to users related to them.
- DATASET:
  - If it is governed, the access permissions granted to it through DSAs will be removed
  - Its dataset_fields will be expired
  - The DSAs in which it is included will NOT be expired
  - The instances it is associated with will NOT be expired
  - A notification will be sent to the dataset owners
  - A notification will be sent to the subscribers and owners of the DSAs that contain the dataset
  - A notification will be sent to the owners of related instances
  - A notification will be sent to the owners of entities related to the dataset and of DSAs containing the dataset
- DSA:
  - If it contains governed entities, the group and all its permissions will be removed
  - A notification will be sent to DSA owners
  - A notification will be sent to the owners of the entities the DSA contains
  - A notification will be sent to DSA subscribers
  - A notification will be sent to the owners of entities related to the DSA
- PROCESS:
  - Instances associated with the process will be expired
  - A notification will be sent to process owners
  - A notification will be sent to the owners of entities related to the process
- PROCESS INSTANCE:
  - The process related to the instance will NOT be expired
  - Its own solution will NOT be expired
  - Related solutions will NOT be expired
  - A notification will be sent to the process owners
  - A notification will be sent to the owners of related solutions
  - A notification will be sent to the owners of own solutions (who are the instance owners)
  - A notification will be sent to the owners of the datasets the instance writes to (DATASET_OUTPUT)
  - A notification will be sent to the subscribed users of the datasets the instance writes to (DATASET_OUTPUT)
  - A notification will be sent to the owners of entities related to the instance
- SOLUTION:
  - Own instances will NOT be expired
  - Related instances will NOT be expired
  - A notification will be sent to solution owners
  - A notification will be sent to the owners of entities related to the solution
Lifecycle of Non-Native Anjana Entities
Non-native Anjana entities go through the following states in their lifecycle:
- Imported: Entities created using Automatic Metadata Extraction remain in the Imported state.
- Draft: If an Imported entity is modified to fill in its data, it moves to the Draft state.
  If it is created via Excel Upload or Manual creation (whether using the API or the portal itself), a non-native entity is created in the Draft state.
  If an Approved entity is modified, a new version is created with the Draft state.
  If a Rejected entity is modified, it moves to the Draft state.
- Pending: Once the entity is submitted for validation, it moves to the Pending state.
- Approved: If all validators approve an entity, it moves to the Approved state. If it is edited again, an entity with those changes is generated in the Draft state.
  An entity with the Deactivated state can return to Approved after its activation.
- Rejected: If any validator rejects the validation of an entity, it moves to the Rejected state. If it is edited again to correct the reasons for rejection, it returns to the Draft state.
- Deactivated: If the non-native entity no longer makes sense it is possible to deactivate it, moving from the Approved state to the Deactivated state.
Relationship Lifecycle
Relationships go through the following states in their lifecycle:
- Imported: Relationships created using Automatic Metadata Extraction remain in the Imported state.
- Draft: If an Imported relationship is modified to fill in its data, it moves to the Draft state.
  If it is created via Excel Upload or Manual creation (whether using the API or the portal itself), a relationship is created in the Draft state.
  If an Approved relationship is modified, a new version is created with the Draft state.
  If a Rejected relationship is modified, it moves to the Draft state.
- Pending: Once the relationship is submitted for validation, it moves to the Pending state.
- Approved: If all validators approve a relationship, it moves to the Approved state. If it is edited again, a relationship with those changes is created in the Draft state.
  A Deactivated relationship can return to Approved after its activation.
- Rejected: If any validator rejects the validation of a relationship, it moves to the Rejected state. If it is edited again to correct the reasons for rejection, it returns to the Draft state.
- Deactivated: If the relationship no longer makes sense it is possible to deactivate it, moving from the Approved state to the Deactivated state.
Workflows
What are they?
Workflows represent the sequence of steps that must be followed for an action started in Anjana by a user to be validated by other users, establishing a collaborative working environment based on roles.
Workflows allow the procedures of the organization's governance model to be materialized and ensure that all roles involved in the processes exercise their responsibility, are informed and/or consulted.
These are some of their characteristics:
- Workflows are configurable and each one of them has its own state diagram
- When a user submits a request in Anjana, the workflow corresponding to the user's role and the request made is automatically generated if there are no configured rules preventing it
- In each workflow, a user of each assigned role must validate the request for it to move to approved. If a user rejects the validation, the workflow will be automatically cancelled
- Each state of the workflow involves the review and validation of the request by one or more users depending on their roles and the type of request. This review is materialized with the approval or rejection of the validator to the workflow
- Validations will always be accompanied by a comment from the user indicating the reason for their response
- It is possible that workflows are launched in Anjana with more steps than will ultimately be validated. The participants in the approval flow can vary depending on:
  - The role of the user who launches the validation flow
  - The user who launches the validation
  - The type of action under validation
  - The type of object under validation
  - The subtype of object under validation
  - Attributes of the template of the object under validation
  - The validator's decision
  - The organizational unit of the object
  - The version of the object
  - Identifiers of the objects
  - Name of the objects
- All actions carried out in workflows will be audited allowing identification of the different validators, dates, comments and responses.
Thanks to the use of workflows in Anjana, Governance procedures can be automated in an agile and simple way, guaranteeing full tracking and traceability of each request.
Workflows to configure
The workflows used in Anjana are configurable by object type (entity type or relationship type), action (creation, modification, subscription…) and by the role that submits the request.
In order to implement the complete logic of Anjana, it is necessary to define the following workflows:
- Creation workflows for any defined ENTITY and RELATIONSHIP
- Manual deprecation workflows for DATASET, DSA, PROCESS, PROCESS INSTANCE and SOLUTION (Anjana native entities)
- DSA subscription workflows
- Modification workflows (with or without versioning) for any defined ENTITY and RELATIONSHIP
- Organizational unit change workflows for any defined ENTITY
- Activation and deactivation workflows for any defined ENTITY and RELATIONSHIP
Workflow types
The validation steps that make up the workflow are identified at the moment it begins to execute and mark the order of acceptance of an element by different roles.
A sequential workflow is a workflow in which there are no branches and the order of validations is always the same. This is why, at the moment this type of workflow starts, it is known in advance which roles will participate in the validation. Below is an example of a linear sequence of the hierarchical acceptance operation in the Anjana Data tool.
If the workflow includes conditions that are evaluated against the object to be validated, it is not sequential. When a workflow of this type starts, all the roles that may be involved are known, but which of them actually need to validate is only determined as the workflow progresses.
For cases where multiple users must validate a step of a workflow because their role depends on a specific business unit and they are not cross-cutting to the organization, the validations in that step will be executed in parallel so that the step will be considered completed when all users have validated.
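The rule for a parallel step (completed only when every required user has validated, cancelled as soon as one rejects) can be sketched as below. This is an illustrative model under stated assumptions: the function name, the status labels and the user identifiers are hypothetical, and the rejection behaviour follows the workflow characteristics described earlier in this section.

```python
def step_status(required_validators, decisions):
    """Evaluate one workflow step validated in parallel.

    required_validators -- users who must validate this step
    decisions           -- user -> True (approved) or False (rejected);
                           users who have not answered yet are absent
    """
    # A single rejection cancels the workflow.
    if any(decisions.get(user) is False for user in required_validators):
        return "cancelled"
    # The step is complete only when every required user has approved.
    if all(decisions.get(user) is True for user in required_validators):
        return "completed"
    return "pending"


validators = {"owner_sales", "owner_finance"}  # hypothetical users
print(step_status(validators, {"owner_sales": True}))                          # pending
print(step_status(validators, {"owner_sales": True, "owner_finance": True}))   # completed
print(step_status(validators, {"owner_sales": True, "owner_finance": False}))  # cancelled
```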
You can find more information about workflow configuration in the Functional Configuration Guide and in the Workflow Configuration Guide.
Contracts
A contract is a written agreement between data producers and consumers that facilitates the sharing, regulated use and consumption of data.
Contracts specify the license terms and incorporate the conditions of data use and any additional requirements established by both parties (e.g. quality conditions, availability).
The DSAs where contracts are included contain an expiration date for lifecycle management along with versions and impacts.
Before subscribing to a DSA, the consumer user is required to download the contract, read it and sign it.
Transfer of rights, responsibilities and data use
The establishment of legal contracts for DSAs and Solutions allows data owners to transfer both rights and responsibilities over information under legal coverage, while at the same time facilitating the exchange, use and consumption of data.
Use of contracts
A contract applies to both DSAs and Solutions and represents a legal agreement for data use.
Each contract has two groups of counterparties (providers and consumers), a responsible party and a legal figure that validates the contract, if so defined in the data governance model.
The contract includes the terms of use to be applied and a validity date of the contract for the counterparties' signature. This information remains immutable during the life of the contract under the signed conditions and, if a modification is desired, it will require the acceptance of each and every counterparty.
Characteristics of DSA contracts
When creating a DSA, the owners of the contained entities (data structures such as datasets, reports, KPIs, quality rules…) ‘transfer’ the responsibility for them to the DSA manager.
When a user subscribes to a DSA, the user accepts compliance with the contract under the license terms specified in the creation of the DSA. The DSA manager is responsible for ensuring that the contract conditions are met by all parties.
It is possible to group several data structures in the same DSA for access requests through a single subscription request under the same legal agreement.
Furthermore, an entity can be included in different DSAs covered by different legal contracts (for different use cases, different license terms, different validity dates…)
Data Governance
Passive data governance is an approach to data management and oversight characterized by observation, recording and auditing of information without actively interfering in the data flow or in users' daily operations.
Active governance, in contrast, seeks to implement policies, rules and controls that intervene directly and immediately in the use, access and manipulation of data within an organization. In this way, data management tasks are declared in the governance platform and, once the corresponding approvals are obtained, they are executed in data platforms or identity management systems.
Anjana Data offers different native functionalities for structure governance through plugins, which are the connectors that allow communication with data systems.
Passive Governance
Metadata Extraction
This functionality allows metadata to be extracted from structures located in data systems to incorporate them into the Data Catalog or Business Glossary by generating the corresponding objects.
You can find more information about how to extract metadata in the section Creation Wizard > Automatic Metadata of this User Guide.
External Audit
The external audit is the result of monitoring the audit logs returned by data platforms and allows visualizing, for example, ungoverned accesses or changes in data structures that have not been previously declared in the tool.
Anjana does not produce this audit information itself; it only makes it possible to exploit it.
You can find more information in the sections Dataset > Audit, Process Instance > Audit, Features Menu > Audit and User Profile > Audit of this User Guide.
DSA Reuse
Although Anjana can manage groups in identity managers, as described in the Data Structure Management section, groups that were created previously may already exist in the environment with which Anjana interacts.
In that case, and to allow Anjana to manage permissions automatically through subscriptions, DSAs can be “reused” by including the name of the existing group in the physicalName attribute of their template. When such a DSA is approved in Anjana, no new group is created; the existing group is used instead when users request a subscription and it is granted to them.
It should be noted that already existing groups used in this way by Anjana enter the natural lifecycle of groups governed by DSAs, which includes their deletion once the DSA using them expires.
Active Governance
Data Structure Management
This functionality automates actions on data structures using the metadata held in Anjana.
In this way, for example, when a new dataset located in a SQLServer is created in the application and the validation flow with approvals is completed, the Anjana plugin can automatically create the corresponding structure.
Or, when creating a DSA that includes a dataset and a report marked as governed in their template and it is approved in the validation flow, the Anjana plugin can create the corresponding group in the identity manager. In this way, Anjana creates the group in Azure AD to which permission is granted to the data of the dataset and report in Azure Storage, for example.
It should be noted that, due to Anjana's operational model, if you want to modify the group corresponding to the DSA it is necessary to version the DSA so that a new group is created and permissions are granted. That is, if in an existing DSA data structures are added or if any of the included ones changes to governed without having been so before, it is necessary for a new version of the DSA to be created and, with it, the new group with the new data access permissions is created. The created group will be named the same as the value entered for the “name” attribute of the DSA concatenated with “_V” and its version.
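The naming rule for the created group can be expressed in one line. The sketch below only illustrates the concatenation described above; the function name is hypothetical, and the exact format of the version value (an integer in this example) is an assumption.

```python
def dsa_group_name(name, version):
    """Build the group name: the DSA "name" attribute, then "_V", then the version."""
    return f"{name}_V{version}"


# Hypothetical DSA name; each new DSA version yields a new group name.
print(dsa_group_name("Sales_DSA", 1))  # Sales_DSA_V1
print(dsa_group_name("Sales_DSA", 2))  # Sales_DSA_V2
```

Because the version is part of the name, versioning a DSA always produces a distinct group, which is consistent with the requirement to version the DSA whenever its group must change.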
Data Sample
This is a very useful functionality for data sharing. If the owner of a governed dataset enables the option in its template that allows data sampling, and the triplet supports it, Anjana users can view a sample of the dataset's data in which fields marked as PI appear obfuscated.
You can find more information about this functionality in the section Dataset > Sample Data of this User Guide.
Data Access Permission Management
This functionality enables data sharing by managing data access permissions directly at the source, without virtualization, copying or movement of data.
In Anjana, permission management is carried out through the shopping cart from where the user requests access to data structures or DSAs.
When the subscription request is approved, it is possible, therefore, to include the requesting user in the group corresponding to the DSA they have subscribed to so that, in this way, they “inherit” the permissions that the group has. Following the example introduced in the Data Structure Management section for creating a DSA in Azure AD, the user would be added to that same group so they gain access permissions to the data from the Azure Storage datasets.
In order to carry out active governance of structures it is necessary for them to have, in their templates, the attributes infrastructure, technology, zone, path and is_governed configured.
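A simple pre-check for these five attributes could look like the sketch below. It is an illustration only: the template is modelled as a plain dictionary, the function name is hypothetical, and the example values are invented for the sketch rather than taken from an Anjana configuration.

```python
# Attributes that the template must carry for active governance.
REQUIRED_ATTRIBUTES = ("infrastructure", "technology", "zone", "path", "is_governed")


def missing_governance_attributes(template):
    """Return the required attributes that are absent or empty in `template`."""
    return [
        name
        for name in REQUIRED_ATTRIBUTES
        # is_governed = False is a valid value, so only None and "" count as empty.
        if template.get(name) is None or template.get(name) == ""
    ]


# Hypothetical template values, for illustration only.
template = {
    "infrastructure": "azure",
    "technology": "storage",
    "zone": "",
    "is_governed": True,
}
print(missing_governance_attributes(template))  # ['zone', 'path']
```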
You can find more information in the section Features Menu > Shopping Cart of this User Guide.
Other integrations
It is also possible to integrate Anjana with data ingestion services, data platforms, identity management systems and data consumption services through plugins.
Roles and Permissions
User roles and their capabilities are configurable in Anjana Data, so that each organization can establish the governance model that fits its needs.
The roles defined must match those defined within the organization's Data Governance.
These roles can be:
- Vertical: people who hold the role exercise it only over a specific data domain (organizational unit). In other domains, other people hold the same role with the same responsibilities, but over a different set of data assets.
- Cross or transversal: people who hold the role always have the same permissions over the assets of all the organization's data domains.
Each of the defined roles must have a set of permissions. Permissions can be applied by object type (metrics, dataset, terms, DSAs…), by action type (creation, modification, searches…) and by specific modules (audit, lineage, history…).
In the event that an Anjana user does not have a specific role assigned from those configured on the platform, it is possible to grant the user the permissions corresponding to a default role that is configured and will be assigned to anyone who enters the Data Portal.
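The difference between vertical and transversal roles can be illustrated with a small check. This is a hypothetical model, not Anjana's permission engine: the class, the function and the role and unit names are assumptions, and a transversal role is represented here simply as an assignment with no organizational unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleAssignment:
    role: str
    # None models a transversal role that applies to every organizational unit.
    organizational_unit: Optional[str] = None


def holds_role(assignments, role, organizational_unit):
    """Check whether a user holds `role` over assets of `organizational_unit`."""
    return any(
        a.role == role and a.organizational_unit in (None, organizational_unit)
        for a in assignments
    )


# Hypothetical user: vertical owner of Sales, transversal architect.
user = [RoleAssignment("owner", "Sales"), RoleAssignment("architect")]
print(holds_role(user, "owner", "Sales"))        # True
print(holds_role(user, "owner", "Finance"))      # False
print(holds_role(user, "architect", "Finance"))  # True
```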