Advanced Functionality | Anjana Data Documentation

Full kit functionality

This annex covers the configurations and utilities that are not covered in the main tab of the document.

All.yaml explained

The kit includes an all.yaml file that holds the variables and editable options used for the technical management of the environment. Some of the supported variables are not included in the default file.

An all-example.yaml file with all possible configuration variables has been delivered alongside the kit as a reference.

The available functions are detailed below:

Versioning

From kit 25.a1, the latest available artifact is selected automatically through product-based versioning.

version.anjana: allows product-based version selection. For example, selecting 25.2 automatically downloads the most recent available 25.2 packages when performing an update.
version.rc: allows downloading release candidate (prerelease) versions before their official publication.

Release candidates may not be stable; use them responsibly.

version.core / version.plugins: these two keys allow defining versions individually for core microservices and plugins.

Individually specified versions take precedence over the product version indicated. A microservice or plugin with a fixed version will not be automatically updated beyond the fixed version.

version.utilities.ansible: defines the kit version that will be downloaded when executing the ansible tag. Necessary for the kit update process.
version.utilities.sampledata: groups the two keys that define the version and type of the sample data kit. This kit is downloaded both when deploying with sample data and during the data reset and insertion operations.
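
As a rough illustration, the versioning keys above could be laid out in all.yaml as follows. This is only a sketch: the nesting is inferred from the dotted key names, every value is a placeholder, and the minerva entry under version.core is a hypothetical key used to show a pinned microservice. The all-example.yaml file delivered with the kit remains the reference for exact names.

```yaml
# Sketch only: structure inferred from the dotted keys, values are assumptions.
version:
  anjana: "25.2"       # product line; the latest 25.2 packages are selected on update
  rc: false            # assumed boolean; true would allow release candidates
  core:
    minerva: "25.2.1"  # hypothetical pin: not updated beyond this version
```

Quoting the product version keeps YAML from reading 25.2 as a floating-point number.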

Connection strings

Default configuration has been included to facilitate its use and reduce the number of modifications needed for the environment to work.

From 25.a2, every time the connection strings are modified in all.yaml, it will be necessary to re-platform to generate the updated environment variables file using the command anjana -t platform.

nexus.url: domain of the Anjana server that provides artifacts during installation and other operations. A default value is configured, so this property only needs to be added to override it.
nexus.user and nexus.pass: nexus connection credentials. They must be requested from cs@anjanadata.com.
nexus.external: repositories provided by Anjana for artifact downloads, defined by default.

persistences.s3 allows editing the credentials and customizations needed for the MinIO deployment or connection to the chosen S3 technology.
persistences.s3.type: allows switching between MinIO and AWS S3 as the bucket storage service.
persistences.s3.access_key and persistences.s3.secret_key: access credentials for the technology. Default values provided, can be altered.
persistences.s3.host: connection URL to the S3 technology. MinIO ONLY.
persistences.s3.port: port specified for the connection to the S3 technology. MinIO ONLY.
persistences.s3.region: cloud region where to place the buckets. AWS S3 ONLY.
persistences.s3.endpoint: allows limiting traffic between Anjana and the buckets to the VPC without internet egress using a private gateway-type endpoint. AWS S3 ONLY.
persistences.s3.buckets: Anjana operations buckets. Default values, properties not included in the provided file.

Recommended to leave the default values in buckets.

persistences.s3.dump_bucket: through its segregation property (true/false), allows enabling or disabling the segregation of the dump bucket used for data dumps. This allows Anjana buckets to be located in MinIO while anjanabackups is in AWS S3, or vice versa.
The rest of the properties will be filled in as mentioned above.
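
The segregation case described above (Anjana buckets in MinIO, anjanabackups in AWS S3) might look roughly like the fragment below. It is a sketch under assumptions: the type values minio and aws, the host name, the region, and the credentials are placeholders, and dump_bucket is assumed to accept the same properties as the parent s3 section.

```yaml
# Sketch only: value names and hosts are assumptions, not kit defaults.
persistences:
  s3:
    type: minio                 # assumed value name for MinIO
    host: minio.example.local   # placeholder host (MinIO only)
    port: 9000                  # MinIO's usual default port; placeholder here
    dump_bucket:
      segregation: true         # configure anjanabackups separately
      type: aws                 # assumed value name for AWS S3
      region: eu-west-1         # placeholder region (AWS S3 only)
      access_key: "<backup-access-key>"
      secret_key: "<backup-secret-key>"
```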

persistences.bbdd allows editing the credentials and customizations needed for the PostgreSQL deployment or connection to RDS.
persistences.bbdd.host: DB connection URL.
persistences.bbdd.port: port specified for the DB connection.
persistences.bbdd.database: database name for the connection.
persistences.bbdd.user and persistences.bbdd.pass: DB access credentials. Default values provided, can be altered.
persistences.bbdd.default_schema: Anjana DB schemas. Default values, properties not included in the provided file.

Recommended to leave the default values in schema.

persistences.index allows editing the connection parameters and some settings for the indexing engine. Currently this section is not included in the provided all.yml file; all values are set by default.
persistences.index.host: connection URL to the indexer.
persistences.index.port: port specified for the connection to the indexer.
persistences.index.user and persistences.index.pass: indexer access credentials. Default values provided, can be altered.
persistences.index.startup: allows altering the behavior of the indexing engine during the Minerva startup.

Altering the default value may cause undesired results and data loss.

persistences.index.collections: default Anjana collections.

Recommended to leave the default values in collections.

Installation configuration

This section allows configuring and customizing the Anjana deployment. By default, the recommended parameters are already set.

anjana.domain: allows setting the access domain for the Anjana instance.

This parameter is required for Anjana to work correctly in a pre-production or production environment. If a public certificate is used, the wildcard certificate associated with that domain must be available in /opt/common/anjana-certs/ for the instance to work.

anjana.folder: root folder for deploying all of Anjana. Set by default to /opt.
anjana.configURL: path to the configuration microservice, already set by default.

Not recommended to define this property unless necessary.

anjana.configPath: directory where the configuration will be stored, already set by default.
anjana.plugins_config.standalone: when true, allows plugins to work in standalone mode, without needing horus to function. The configuration file will be deployed on the same node as the plugin.
anjana.plugins_config.totDomain: required when standalone mode is true. Defines the domain of the machine where tot is located, so that the plugins can communicate with it.
anjana.plugins_config.vault.type: type of vault used to host the configuration properties of the plugins. The value none disables the vault.

For vault configuration to work, it is required that the config for plugins be standalone.

anjana.plugins_config.vault.host: allows configuring the host or access URL to the vault. Azure or GCP ONLY.
anjana.plugins_config.vault.client_id: allows configuring the client_id for access to the vault. Azure or GCP ONLY.
anjana.plugins_config.vault.secret_id: allows configuring the secret_id for access to the vault. Azure or GCP ONLY. For GCP it will be the path to the required .json credentials file.
anjana.plugins_config.vault.tenant_id: id of the project or tenant to connect to. Azure or GCP ONLY.
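
A possible shape for the standalone plugin and vault settings above is sketched below. The nesting follows the dotted keys; the value azure for vault.type, the domain, and the identifiers are assumptions for illustration only.

```yaml
# Sketch only: all values are placeholders.
anjana:
  plugins_config:
    standalone: true              # required for the vault settings to apply
    totDomain: tot.example.com    # placeholder domain of the machine hosting tot
    vault:
      type: azure                 # assumed spelling; none disables the vault
      host: https://example-vault.vault.azure.net   # placeholder vault URL
      client_id: "<client-id>"
      secret_id: "<secret-id>"    # for GCP: path to the .json credentials file
      tenant_id: "<tenant-id>"
```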

anjana.license: allows specifying the Anjana license that enables the use of the product. It must be requested from cs@anjanadata.com.
anjana.security.unattended_upgrades: allows enabling automatic security updates provided by the operating system.

Recommended to leave this option enabled.

anjana.security.certificates: allows choosing the type of certificate to use within Anjana between a long-duration self-signed one provided by the installer or a public one provisioned prior to the Anjana environment installation.

Recommended to leave the default value.

To avoid compatibility issues of external technologies with the Anjana self-signed certificate, it is recommended to set up a load balancer or proxy in front of the Anjana core instance or instances.

installation.mode: allows choosing the deployment mode for the Anjana instance. Explained below:
- installation.mode: manager allows deploying Anjana using a controller node. The manager node will download all artifacts and then redirect them to the corresponding nodes. This mode allows only the manager node to have access to the repositories.
- installation.mode: direct allows downloads and deployments to be performed directly on the destination nodes (front, back, persistences, etc…). In this mode all nodes in the environment will require connectivity to the repositories.
- installation.mode: local allows Anjana deployment to occur without requiring connectivity to the repository.

The manager mode is designed for distributed or balanced environments. Leave this default value if no additional customization is needed.

The direct mode is optimal for single-node environments.

For the local mode, the following should be taken into account:

The deployment will be performed from the ansible node (manager).
Basic connectivity to the operating system packages is required.
It will be necessary to have previously downloaded all the artifacts required for the Anjana deployment using the download tag, and for them to be located in the Anjana temporary directory /tmp/anjana of the manager node.
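
For a deployment without repository connectivity, the relevant part of all.yaml might look like the sketch below. It assumes the artifacts were already fetched with the download tag and that /tmp/anjana is the temporary directory in use.

```yaml
# Sketch only: assumes artifacts are already present on the manager node.
installation:
  mode: local            # deploy without contacting the repository
  tmpdir: /tmp/anjana    # directory where the downloaded artifacts are expected
```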

installation.tmpdir: allows defining the temporary artifact directory for Anjana.

All the content of this folder will be deleted during system restart or at the end of each kit execution.

installation.eurekaPreferIpAddress: allows altering the preference for application registration in SpringBoot.

Recommended to leave the default value.

installation.failFast: allows applications to fail if the configuration microservice cannot be reached.

Recommended to leave the default value. In case of a balanced environment it will be necessary to set it to false.

installation.reportPath: allows defining the directory and execution log file for the ansible kit.
installation.javaPATH: allows defining the default directory where the JAVA installation is located.

It will be necessary to edit the default value if JAVA has been installed in a custom location.

installation.nexus: allows defining the Anjana installation mode.

Recommended to leave the default value.

installation.env: allows defining the environment type to adjust RAM profiles according to the deployed instance.

Recommended to leave the default value.

installation.debug: when set to true, adds additional traces to the kit execution logs and gives access to the debug ports.

When this value is true, all debug ports are exposed after the deployment of the service descriptors. Similarly, all passwords and connection strings will be exposed and recorded in the kit execution logs. Only recommended for maintenance or debug tasks.

installation.owner: allows configuring the details of the user who will be granted ownership of the Anjana installation.

Recommended to leave the default value.

Inter-node connection user

All properties defined under ansibleuser allow configuring an access user to all nodes for proper communication between instances of a distributed or balanced environment.

The tag ansible-user is used for its deployment.

To deploy this user it is first necessary to specify in hosts.yaml a user with sufficient administrator permissions for editing files and directories as root.

Role import and configuration

This section allows editing the roles and configuration to import during the Anjana deployment or kit executions.

import_role.persistences: allows configuring which persistences will be imported during the ansible execution.

It will be necessary to adjust this section according to the persistences being used. Usage example: for an environment using AWS S3, MinIO should be configured to false.

import_role.core: allows configuring which core roles and their configuration will be imported.

Recommended to leave the default value.

import_role.plugins: allows configuring which plugin roles and their configuration will be imported. Adjusted according to the need.

import_role.extras: allows defining which third-party software to import during deployment and execution. Adjusted according to the need.

Log export and monitoring

log.since: allows defining which log range will be exported during the execution of the export-log tag and its subsequent upload to the anjanalogs bucket with the export-log-s3 tag.
monitoring.otlp.enabled: allows enabling Java instrumentation for microservices and plugins.
monitoring.otlp.endpoint: collector URL for sending metrics, logs and traces.
monitoring.otlp.port: collector port.

The monitoring section enables instrumentation of Java microservices and plugins, as well as compatibility with the Open Telemetry Collector.

The Open Telemetry Collector is not included in the installation.

If monitoring is enabled or disabled once the product is installed, it will be necessary to re-platform the environment using the command anjana -t platform.
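
A minimal sketch of the monitoring section is shown below. The collector host is a placeholder, and whether the endpoint value carries a scheme is an assumption. Port 4317 is the conventional OTLP/gRPC port of the Open Telemetry Collector (4318 is the conventional OTLP/HTTP port).

```yaml
# Sketch only: the collector address is a placeholder.
monitoring:
  otlp:
    enabled: true
    endpoint: http://otel-collector.example.local   # assumed format
    port: 4317                                      # conventional OTLP/gRPC port
```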

Governance metrics platform

extras.grafana.user and extras.grafana.pass: set the access credentials for the Grafana interface, served on port 3000. By default, this utility is not enabled for import; it can be enabled and subsequently deployed with the grafana tag.

Instance platforming

Ansible anjana command

To simplify handling of the kit and all its functionality, a wrapper command is configured by default that reduces the full ansible command to a simple anjana followed by any additional parameters.

To restore or configure the command manually, it will be necessary to run these commands:

# Create the anjana wrapper script and make it executable (replace <inventory> with the inventory in use)
sudo touch /usr/local/bin/anjana
echo '#!/bin/sh
sudo ansible-playbook -i /opt/ansible/ansible-inventories/<inventory>/hosts.yml /opt/ansible/anjana.yml "$@"' | sudo tee /usr/local/bin/anjana > /dev/null
sudo chmod 755 /usr/local/bin/anjana

Note that the paths to the ansible kit and to the inventory will vary depending on the environment, so the command must be adjusted accordingly.

Additional platforming

During the deployment of an Anjana environment, the instance on which ansible is running is platformed, to prepare all requirements and install all necessary dependencies.

The section Environment Platforming of the Anjana Deployment covers how to platform the additional instances for a distributed environment.

To recondition or platform new instances added after the Anjana deployment, due to a possible migration of the frontend or persistence server, it would be necessary to adjust the hosts.yaml file of the inventory in use to reflect the latest infrastructure changes.

After that, it would be possible to execute the platforming tag as seen above:

anjana -t platform

Aliases management in /etc/hosts

From 23.1, the Anjana aliases in the /etc/hosts file are managed by the ansible kit in a completely autonomous way as can be seen in the screenshot below:

The anjanadata.local domain is only an example, being replaced by the domain specified in the certificate provided for the Anjana deployment.

If an instance has been added, as occurred in the previous section, or if the infrastructure for the Anjana environment has been modified after the product deployment, it will be necessary to indicate the new machine IPs in the hosts.yaml file of the inventory being used.

Once the file has been updated to reflect the latest changes, the aliases of all machines can be updated automatically via the tag:

anjana -t aliases

An outdated state of these aliases or their manual modification can lead to malfunction of the product and the kit.

If manual entries need to be added, they must be done outside the block managed by ansible so they are not replaced.

Custom connection user

It is possible to create a user as mentioned in the Inter-node connection user section, which will allow communication between nodes (instances) of an Anjana environment, specifically the manager node with the rest of the environment.

To create this user with kit assistance, it is necessary to modify the ansibleuser section of all.yml and indicate the name the user will adopt, as well as its group and key.

Once filled in, it will be necessary to verify that in the hosts.yaml of the inventory there is an administrator user with sufficient permissions for editing files and folders of other users. An example of this is root, but another user is recommended.

Once confirmed, the deployment of the new user proceeds via the tag:

anjana -t ansible-user

Once the operation is completed, the key for the generated user will be available in the directory /home/<ansible-user>/.ssh/<key>.pem.

It will be necessary to replace the user in hosts.yaml with the newly generated one.

Cloud integrations

The kit offers compatibility and features with certain cloud providers, which are detailed below.

Buckets in AWS S3

To use buckets hosted in AWS S3 with the kit, it is necessary to adjust the all.yaml file, to point to buckets available in an AWS region. Some properties will not be available by default and will need to be added:
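
A sketch of what the adjusted S3 section might look like is given below. The type value aws, the region, and the credentials are placeholders chosen for illustration; the optional persistences.s3.endpoint property is left out.

```yaml
# Sketch only: value names and region are assumptions.
persistences:
  s3:
    type: aws                  # assumed value name for AWS S3
    region: eu-west-1          # placeholder region for the buckets
    access_key: "<aws-access-key-id>"
    secret_key: "<aws-secret-access-key>"
```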

It will also be necessary to disable MinIO, to prevent it from being deployed unnecessarily:
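
Disabling the MinIO role might look like the fragment below. The key name minio under import_role.persistences is an assumption; check all-example.yaml for the exact role names.

```yaml
# Sketch only: the role key name is assumed.
import_role:
  persistences:
    minio: false   # do not deploy MinIO when AWS S3 is used
```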

Once both have been adjusted, Anjana can be deployed normally.

PostgreSQL in RDS

Unlike AWS S3, the kit directly supports RDS given the nature of the connection string, requiring only the modification of the host, and the connection port if necessary, along with the credentials, as shown:
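
As an illustration, a connection to an RDS instance could be described as below. The host, database name, and credentials are placeholders; 5432 is the standard PostgreSQL port and only needs changing if the instance listens elsewhere.

```yaml
# Sketch only: host, database and credentials are placeholders.
persistences:
  bbdd:
    host: anjana-db.example.eu-west-1.rds.amazonaws.com   # placeholder RDS endpoint
    port: 5432                                            # standard PostgreSQL port
    database: anjana                                      # placeholder database name
    user: "<db-user>"
    pass: "<db-password>"
```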

As with AWS S3, it will be necessary to disable the PostgreSQL role to prevent its unintended deployment:

Once both have been adjusted, Anjana can be deployed normally.

Data management

Artifact download

Using the download tag it is possible to download all the artifacts necessary for the deployment execution in an environment without connectivity to the Anjana repository.

Artifact download only works with installation.mode: manager. The artifacts will be placed in /tmp/anjana on the manager node (the node running ansible).

Data operations

The kit provides a series of utilities for safely managing environment data.

The following should be taken into account:

All data operations except delete stop and then start the environment to avoid corruption.
All operations except export/import perform an automatic prior backup of existing data.
No insert or import data operation replaces existing data. It will be necessary to delete all data beforehand for the operation to work.
Delete data operations stop the environment but do not start it again afterwards.

All data operations are performed on the /opt/export-import directory.

The available operations are as follows:

Data export/import, using the tags already mentioned in the Data migration section.
Data deletion: allows deleting all the data from the environment persistences, both individually and collectively (not the configuration) through the following tags:

anjana -t delete
anjana -t delete-s3
anjana -t delete-bbdd
anjana -t delete-solr

Data insertion: allows insertion both collectively and individually, of the data kit selected in the all.yaml file in this section:

The following tags are available:

anjana -t insert
anjana -t insert-s3
anjana -t insert-bbdd

Data reset/restoration: allows deletion and subsequent insertion of the selected data kit in the same way as the previous two points. The available tags are as follows:

anjana -t reset
anjana -t reset-s3
anjana -t reset-bbdd

Dump management and periodicity

The kit allows managing backups in an external bucket and their subsequent restoration, as well as establishing the frequency of backups.

For this purpose, the following functionalities have been defined:

Segregation in the all.yaml file of the anjanabackups bucket, the current bucket designated for backups of Anjana persistences and configuration.

The segregation has the same configurations as the regular S3 section, making it possible to assign different credentials for this purpose, as well as locate the anjanabackups bucket in a different technology.
Usage example: Anjana buckets located on a local MinIO server, anjanabackups bucket located in AWS S3.

Recommended to enable segregation to ensure backups are in a different location from a possible point of failure.

Data + configuration backups: using the anjanabackups bucket as destination. This process backs up all persistence data as well as the microservices configuration; the result is compressed and uploaded to the anjanabackups bucket with a timestamp.
It is possible to enable and define a retention policy for this operation in the persistencesutilityhosts.yaml file of the inventory in use:

To launch the backup in this way, the tag is used:

anjana -t backup-dump

Data restore: using the anjanabackups bucket as data source.
The backup always covers all persistences plus the configuration, but the persistencesutilityhosts.yml file of the inventory in use allows defining which persistences and configuration are restored, as can be seen below:

This process analyzes the backups available in the anjanabackups bucket and lists the most recent ones so that one can be selected for restoration. If no backup exists, a message is shown and the process stops.

To perform the restoration, the tag is launched:

anjana -t restore-dump

Backup cron: it is possible to deploy a cron for backup-dump tasks, defining its frequency through configurable properties in the persistencesutilityhosts.yaml file of the inventory in use:

To deploy the cron or update it, the tag is launched:

anjana -t dump-cron

If it is disabled, the cron will be deleted on the next tag execution.

Security management in Anjana

Certificate management and renewal

From 25.a2 the required certificate is automatically generated by the installation, self-signed and long-lasting.

With each version update of the environment that requires re-platforming, the self-signed certificate is regenerated and therefore updated.

If a public certificate is to be used, it needs to be placed in the /opt/common/anjana-certs/ directory of all instances that make up the Anjana environment.

The certificate WILL NOT be provided by Anjana in the IaaS/PaaS modality, as it requires the creation of a wildcard certificate that covers the domain and subdomains for accessing the environment and will be managed by the infrastructure maintainer.

For a new deployment with a public certificate, the certificate must already be present in the mentioned directory before the deployment starts, for the product and kit to function correctly. This is explained in the Anjana Deployment section.

In case of renewal, for the new certificate to be properly distributed to all microservices and persistences, it will be necessary to place it again in the mentioned directory and launch the following tag:

anjana -t certificates-deploy

To ensure the correct functioning of the application after the certificate update, it will be necessary to restart with:

anjana -t restart

Security updates

Allows enabling or disabling automatic security package and backports updates on all nodes in the environment. The adjustment can be made in all.yaml in this section:
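
The setting in question is anjana.security.unattended_upgrades. A sketch of the fragment, assuming it takes a boolean value:

```yaml
# Sketch only: boolean value assumed.
anjana:
  security:
    unattended_upgrades: true   # keep automatic security updates enabled
```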

By default they are enabled but can be modified. They apply to all systems supported by the kit. If any modification is made, it will be necessary to launch the following tag to apply the changes:

anjana -t platform

Security in Apache2

It is possible to manage the security configuration of Apache2 and the Anjana frontends; for this, the following settings are available in the anjanauihosts.yaml file of the inventory in use:

anjana_ui.security.IP_redirect: allows redirecting all frontend access via IP to the configured domain.
anjana_ui.security.apache_whitelist: allows enabling through the enabled property and defining through the whitelist property a list composed of User and IP, which will be used as an exception to all security measures mentioned above.
anjana_ui.security.persistences_whitelist: allows enabling through the enabled property and defining through the whitelist property, a list composed of User and IP as the previous point, which will be used as an exception in the persistence proxies (/minio, /solr).
anjana_ui.security.horus_whitelist: allows enabling through the enabled property and defining through the whitelist property, a list composed of User and IP as the previous point, which will be used as an exception for access to the SpringBoot administration panel.

All security measures except the apache2 whitelist for Anjana frontends (not persistences) are enabled by default to improve the base security of the environment.
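
The security settings above might be arranged as in the sketch below. The enabled and whitelist properties come from the descriptions; the item field names user and ip, the boolean form of IP_redirect, and all values are assumptions. The IP shown belongs to a range reserved for documentation.

```yaml
# Sketch only: item field names and values are hypothetical.
anjana_ui:
  security:
    IP_redirect: true            # assumed boolean: redirect IP access to the domain
    apache_whitelist:
      enabled: false             # not enabled by default
      whitelist: []
    persistences_whitelist:
      enabled: true
      whitelist:
        - user: admin            # hypothetical field name and value
          ip: 203.0.113.10       # hypothetical field name; documentation-range IP
```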

Utilities

Connection checkers

The kit includes two connection checkers, which are detailed below:

Artifact repository connection: it is possible to verify the connection against the Anjana artifact server via the following tag:

anjana -t check-repository

Persistence connection: the following tag provides a connection check against the chosen persistences, whether Cloud or hosted in the environment itself:

anjana -t check-connection

Recommended to use both tags for troubleshooting or testing the connection after credential changes, among other scenarios.

Swap

A tag has been incorporated into the kit that adds a 4GB swap file to extend the available memory. It is currently not possible to modify this size via the kit.

To deploy the swap file, it will be necessary to launch the following tag:

anjana -t swap

In order to guarantee environment availability, a swap has been incorporated by default to protect the platform from unexpected memory spikes caused by factors external to the Anjana environment itself.

Anjana Log Rotation

Log rotation can also be set up to prevent logs from filling the disk. The rotation keeps a maximum of 2 log files, each with a maximum size of 2GB.

This rotation is applied by default from kit 25.a1, and it is not possible to disable it through the kit itself.

To configure the rotation on machines that do not have it, it will be necessary to launch the tag:

anjana -t log-rotate

MinIO load balancer

A load balancer has been provided in the anjanauihosts.yml file of the inventory in use, for environments with multiple MinIO nodes. It is enabled by default; to use it, the following is required:

anjana_ui.port.minioProxyPass: to edit the port if necessary.

It will need to be taken into account for the configuration of microservices that use the MinIO service.

anjana_ui.balancer.minio_nodes: to indicate the MinIO nodes to be load balanced. By default the first one is already included, which does not need to be modified, as it is given by the configuration in all.yml. The rest should be adjusted as needed.
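
A sketch of the balancer settings for an environment with additional MinIO nodes follows. The port value and the format of the node entries are assumptions; the first node, which the kit derives from all.yml, is not shown.

```yaml
# Sketch only: port and node entry format are hypothetical.
anjana_ui:
  port:
    minioProxyPass: 9000            # placeholder; edit only if necessary
  balancer:
    minio_nodes:
      - minio-2.example.local       # hypothetical additional node
      - minio-3.example.local       # hypothetical additional node
```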

Extras

Governance Metrics

Additionally, the Grafana software has been included in the kit, which allows deploying panels for the visualization of metrics related to data governance in Anjana.

The software comes with a sample dashboard.

To deploy Grafana, it will be necessary to enable it in all.yml, import_role section:
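
Enabling the role and setting the credentials might look like the sketch below. The key name grafana under import_role.extras is an assumption, and the credentials are placeholders.

```yaml
# Sketch only: the role key name is assumed, credentials are placeholders.
import_role:
  extras:
    grafana: true                  # import the Grafana role
extras:
  grafana:
    user: admin                    # placeholder user
    pass: "<grafana-password>"     # placeholder password
```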

And then, launch the tag:

anjana -t grafana

If you want to modify the default dashboard or add more, it will be necessary to access the inventory templates and create the new dashboards in the path /opt/ansible/ansible-inventories/<inventory>/templates/grafana/dashboards/.

The path may not match and will depend on the location of the inventory.

To update the dashboards once placed in the correct path, the tag is launched:

anjana -t update-dashboards