Installation

Use Cases

Single node environment deployment

To deploy a single-node Anjana environment, follow the steps described in the Anjana Deployment section. Those steps can be summarized as follows, but the full section should still be consulted for detail:

  • Create or have a virtual machine available according to the recommendations.

  • Request the accesses and credentials.

  • Platform the environments according to the documentation.

  • Clone the appropriate inventory to create the inventory defined in the platforming script:

    • localhost: when the ansible kit is located on the same machine where the product will be installed.

    • sample: when the ansible node is a different node from the Anjana node.


  • Make the required adjustments.

  • In the all.yml file of the chosen inventory, edit the desired version for the Anjana deployment according to the instructions.

  • Launch Ansible for the product deployment, with or without sample data, as needed.
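As an illustration, the version selection in all.yml might look like the following sketch. The key names here are assumptions, not the documented schema; consult the All.yml explained section for the real structure of the inventory in use.

```yaml
# Hypothetical fragment of <inventory>/group_vars/all.yml.
# Key names are illustrative only.
anjana:
  version: "X.Y.Z"   # Anjana version to deploy, per the instructions
```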

AWS S3 as a substitute for MinIO

Buckets in AWS S3 will be managed in the same way as they would be with buckets in MinIO, making it possible to customize the region and name of the buckets to connect to.

More details about the connection to AWS S3 can be found in the Cloud Integrations section, Buckets in AWS S3.
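As a sketch, the customization of region and bucket names could take a shape like the following. The property names are assumptions; the real ones are defined in the Cloud Integrations section.

```yaml
# Hypothetical all.yml fragment for AWS S3 instead of MinIO.
# Property names are illustrative only.
s3:
  region: "eu-west-1"         # AWS region of the buckets
  bucket_name: "anjana-data"  # must match the bucket created in AWS
```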

PostgreSQL RDS as a substitute for PostgreSQL

The connection to an RDS will have practically no differences, since it is only necessary to alter the connection host and port.

More details about the connection to RDS can be found in the Cloud Integrations section, PostgreSQL in RDS.
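Since only the host and port change with respect to a self-hosted PostgreSQL, the adjustment might look like the following sketch. The property names are assumptions; the real ones are defined in the Cloud Integrations section.

```yaml
# Hypothetical fragment: pointing the PostgreSQL connection at RDS.
# Property names are illustrative; placeholders stand for the real endpoint.
postgresql:
  host: "<instance>.<id>.<region>.rds.amazonaws.com"
  port: 5432
```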

Balanced environment deployment

For the distributed and balanced deployment mode of Anjana, the following machines are required:

  • 2 VM Front + Back

  • 1 VM Tot + Plugins

  • 1 VM Solr + Zookeeper

  • 4 VM MinIO (Cluster) or AWS S3

  • PostgreSQL RDS

  • 1 VM ansible manager (optional)

IMPORTANT:

  • High availability is not provided by the product, but by the infrastructure. The product only offers load balancing.

  • Tot and the plugins do not need load balancing.

  • PostgreSQL cannot be load balanced in its self-hosted version due to its technical complexity. The use of PostgreSQL RDS is recommended in case load balancing or high availability is needed.

  • Solr and Zookeeper do not require load balancing or high availability for their use case.

Next, the steps described in the Anjana Deployment section will be followed, with the exception of launching Ansible for the Anjana deployment, since some prior steps must be completed first.

Once those steps have been completed, verify that:

  • The necessary credentials and accesses are already available.

  • The script has already been downloaded and the nodes that will make up the Anjana environment have already been platformed (all VMs mentioned above).

  • A customized inventory for the environment is already available.

  • The hosts.yml file has already been adjusted to include the IPs and other data of all nodes in the environment.

  • The all.yml file has already been adjusted to determine the version or versions chosen for the deployment as well as the ansible roles that will participate in the environment.
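As a sketch, the hosts.yml of a balanced inventory groups the nodes listed above. The group names, host names and IPs below are assumptions for illustration, not the real inventory schema:

```yaml
# Hypothetical hosts.yml sketch for the balanced topology described above.
# Group names, host names and addresses are illustrative only.
all:
  children:
    front_back:
      hosts:
        front-back-1: { ansible_host: 10.0.0.11 }
        front-back-2: { ansible_host: 10.0.0.12 }
    tot_plugins:
      hosts:
        tot-1: { ansible_host: 10.0.0.13 }
    solr_zookeeper:
      hosts:
        solr-1: { ansible_host: 10.0.0.14 }
    minio:
      hosts:
        minio-1: { ansible_host: 10.0.0.15 }
        minio-2: { ansible_host: 10.0.0.16 }
        minio-3: { ansible_host: 10.0.0.17 }
        minio-4: { ansible_host: 10.0.0.18 }
```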

Then follow the sections below to adjust the persistences as needed.

It is worth remembering that, according to the Cloud Integrations section, if PostgreSQL and MinIO are to be replaced by RDS and AWS S3 respectively, the PostgreSQL and MinIO roles should be disabled to prevent their import and subsequent installation.
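Disabling those roles might look like the following sketch; the key names are assumptions, and the real ones are described in the Role import and configuration section.

```yaml
# Hypothetical all.yml fragment: disabling the self-hosted persistence
# roles when RDS and AWS S3 are used instead. Key names are illustrative.
import_role:
  postgresql: false
  minio: false
```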

MinIO (Cluster)

For MinIO in cluster mode the miniohosts.yml file in the inventory will need to be edited to adjust the volumes to the four available nodes.

The volumes option for Standalone mode will be commented out and the volumes line for MinIO cluster will be uncommented and adjusted.

att_69_for_162037976.png
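As a sketch, the adjustment could look as follows. The hostnames are placeholders; the `{1...4}` expansion is standard MinIO distributed-volume syntax, but the variable name is an assumption taken from the file's purpose.

```yaml
# Hypothetical miniohosts.yml fragment.
# Standalone mode (commented out):
# volumes: "/data/minio"
# Cluster mode: one volume specification covering the four nodes.
volumes: "https://minio{1...4}.example.internal/data/minio"
```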

The data folder of a MinIO cluster must be a mounted disk; otherwise the following error will appear in the execution logs:

att_79_for_162037976.png

Next, it will be necessary to uncomment in the anjanauihosts.yml file the additional nodes for the MinIO load balancer and edit them to correspond to the existing nodes:

att_40_for_162037976.png
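Once uncommented and edited, the entries might resemble the following sketch; the variable name and addresses are assumptions for illustration.

```yaml
# Hypothetical anjanauihosts.yml fragment: the additional MinIO nodes
# behind the load balancer, pointed at the real cluster hosts.
minio_nodes:
  - 10.0.0.15
  - 10.0.0.16
  - 10.0.0.17
  - 10.0.0.18
```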

To reduce the deployment complexity and cost of a MinIO cluster, it is recommended to use its cloud equivalent, in this case AWS S3, which provides load balancing and high availability.

Now, once everything necessary has been adjusted to deploy the product, the installation proceeds using the ansible kit.

It is recommended to run the Ansible ping command to check connectivity with the environment nodes before starting the installation:

sudo ansible -i /opt/ansible/ansible-inventories/<inventory>/hosts.yml all -m ping

To install Anjana with sample data:

anjana -t anjana-sample

To install Anjana without sample data:

anjana

Multi instance plugin

To generate a second instance of an existing plugin, the following steps will be necessary:

  • Duplicate the configuration template and the service descriptor of the chosen plugin in the inventory in use, once per new instance to be created, adding the appropriate numbering.
    It should look as follows:

    att_64_for_162037976.png
    att_70_for_162037976.png

If both instances will use the same configuration, duplicating the configuration template is not necessary.

  • Add to the inventory's hosts.yml file as many new entries for the selected plugin as instances you want to add, changing the port, service name, configuration profile and index (needed by utilities such as start/stop).

    att_55_for_162037976.png

The configuration profile name cannot be default in any of the plugin instances; otherwise all instances will retrieve that default configuration profile.

  • If the configuration of the second instance is different, it will also be necessary to modify the configuration profile so that it is applied correctly.

    att_41_for_162037976.png
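As a sketch, the duplicated hosts.yml entries might resemble the following. The structure and variable names here are assumptions for illustration; only the requirement that port, service name, configuration profile and index differ between instances comes from the text above.

```yaml
# Hypothetical hosts.yml fragment: two instances of the same plugin.
# Structure and key names are illustrative only.
tot-plugin-<plugin>:
  instances:
    - { port: 8091, service: "plugin-1", profile: "instance1", index: 1 }
    - { port: 8092, service: "plugin-2", profile: "instance2", index: 2 }
```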

Once everything has been adjusted, the configuration update will be launched with the command:

anjana -t update-config

To deploy the new instance, it will be necessary to launch the plugin role with the following command:

anjana -t tot-plugin-<plugin>

The following points will need to be considered for the ansible kit when dealing with multi-instances:

  • It is not possible to disable the role import in the all.yml file for a single instance; when the corresponding plugin tag is executed, the role will be imported once per instance until all have been provisioned.

  • It is not possible to select a single instance for plugin configuration and jar update operations; they will be carried out on all available instances.

  • It is not possible to select a single instance for plugin start and stop operations, or the environment; they will be carried out on all available instances.

Anjana + standalone plugins environment deployment

In this use case, the installation of a single-node Anjana environment is considered, with the exception of Tot+Plugins, which will be hosted on a separate machine.

For this, the steps described in the Anjana Deployment section will be followed as if the deployment were a distributed environment (which it will be on the Tot+Plugins side), taking into account the additional adjustments detailed below:

  • In the import_role section of the all.yml file, enable all the plugins to be deployed, as described in the Role import and configuration section.

  • Set the anjana.plugins_config.standalone property in the all.yml file to true, as described in the installation configuration section.
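The two adjustments above might look like this sketch. The anjana.plugins_config.standalone property is named in the installation configuration section; the import_role keys are assumptions for illustration.

```yaml
# Hypothetical all.yml fragment for standalone plugins.
# import_role keys are illustrative; enable each plugin to be deployed.
import_role:
  tot-plugin-<plugin>: true
anjana:
  plugins_config:
    standalone: true
```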

Once these changes have been made, it will be possible to continue with the installation steps.

The following points will need to be considered when working with plugins in standalone format:

  • It will not be possible to disable horus even if the plugins don't need it, since the rest of the microservices depend on it.

  • The configuration directory is the one indicated in all.yml according to the installation configuration section. Modifying it will affect both the standalone plugins and Anjana microservices.

  • The configuration of standalone plugins is placed in the same configuration directory as the rest of the microservices, but on the machine where each plugin is located.

  • If the configuration directory is located outside the machine(s) where the plugins are located, it will not be reachable by them unless it is an externally mounted storage.

Cloning a PRE environment to PRO

For this use case, a pre-existing PRE environment is taken as the starting point, whose data needs to be migrated to another PRO environment.

It is possible to clone the persistences and configuration using the clone functionality or tag.

Preparation for cloning

The following requirements are taken as the starting point:

  • installation.mode has been set to manager in the all.yml file.

  • Credentials and access to the Anjana repositories are available and correctly configured.

  • All connection strings belonging to the persistences are correctly adjusted. For more information the Connection strings section can be followed.

In case of doubts regarding the all.yml file, the All.yml explained section can be consulted.
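The manager-mode requirement above could be reflected in all.yml as follows; installation.mode is the property named in the requirements, while the YAML nesting shown is an assumption.

```yaml
# all.yml fragment: manager mode, as required before cloning.
# (Nesting of installation.mode is assumed; check All.yml explained.)
installation:
  mode: manager
```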

Data + configuration cloning

All persistence data and the configuration of Anjana microservices and plugins will be backed up using the following command:

anjana -t clone

All resulting files will be placed on the manager machine in the /opt/export-import directory, by default. A compressed file will be generated so that it can be easily transported.

att_20_for_162037976.png

To transfer the data to the destination machine, the following commands are executed:

scp -i /<user>/.ssh/<key>.pem <user>@<source>:/opt/export-import/anjana_clone.tgz /<local_path>
scp -i /<user>/.ssh/<key>.pem /<local_path> <user>@<destination>:/tmp/anjana_clone.tgz
sudo mv /tmp/anjana_clone.tgz /opt/export-import/anjana_clone.tgz

The /opt/export-import directory must exist on the destination machine.

Data restoration

To execute the data restoration, after positioning yourself on the destination machine (PRO), two cases may arise:

CASE 1 - Restoration in a new environment

If the restoration is executed on a new environment or machine it will be necessary to deploy Anjana and all persistences, without data, to subsequently fill them with the cloned data.

To do this, after positioning on the destination machine, it would be necessary to download and adjust the ansible kit to deploy Anjana, using the command without tags:

anjana

CASE 2 - Restoration in an existing environment

If the restoration takes place in an environment with a deployed Anjana, it will be necessary to check the following points:

The destination environment where the restoration will be performed must have the same version of Anjana as the source environment of the data. If there are discrepancies, it will be necessary to adjust the versions in the all.yml file and then update with the command:

anjana -t update

The restoration process does not overwrite or delete data, so it is necessary to perform a prior deletion of data and configuration, launching the following command:

anjana -t delete,delete-config

Once the tasks and requirements for the applicable case have been completed, the .tgz compressed file generated during cloning must be placed in the path defined in the variables file mentioned above (by default /opt/export-import/anjana_clone.tgz) before proceeding with the data restoration.

The following points will need to be considered:

  • The S3 buckets must be created beforehand and match the names specified in all.yml in case they have been modified.

  • The cloning and data restoration process does not transfer changes made to the web server. If modifications or customizations have been made to the apache2 configurations, they will need to be manually transferred or copied.

For the general restoration, the following command will be launched:

anjana -t deploy

For the individual restoration of persistences and/or configuration:

anjana -t deploy-s3
anjana -t deploy-bbdd
anjana -t deploy-config

After data restoration it is recommended to restart the environment with:

anjana -t restart


Loading an accelerators sample after deployment

Once the solution has been deployed, Anjana Data Platform allows loading a sample of accelerator data offered by Anjana Data Platform to facilitate the adoption of data governance and AI. These samples include predefined metadata configurations, templates, taxonomies and asset examples that serve as a starting point for implementation projects.

The samples are not part of the standard product nor are they covered by support. They are offered as free accelerators provided by Anjana Data S.L. to serve as a starting point in implementations, proof of concept (POCs), training and demonstration environments.

A common case is loading the sample associated with the HEALTH DCAT-AP metadata standard, available as a functional accelerator.

Sample loading steps

  1. Access the variables directory of the Ansible inventory for the environment:

    cd /opt/ansible/ansible-inventories/<inventory>/group_vars
    
  2. Open the global configuration file:

    vim all.yml
    
  3. Configure the name of the sample to load.
    In this example, for the HEALTH DCAT-AP accelerator, enter:

    sample: "pub-health-dcatap"
    

    Save the changes and close the editor.

The list of available samples can be consulted in the Anjana Data artifact repository, using the access key provided during the implementation.

  4. Execute the task to clean the previous content of the environment persistences:

    anjana -t delete
    
  5. Launch the insertion of the configured sample:

    anjana -t insert
    
  6. Unlock the schemas to avoid migration locks when starting the microservices.
    This step prevents errors of the type “Waiting for changelog lock”:

    anjana -t unlock-schemas
    
  7. Wait for the microservices to complete their startup and, once they are operational, access the platform to validate that the sample has been loaded correctly.