
Azure Storage

Introduction

This plugin extracts metadata, obtains data samples, and grants and revokes access (in conjunction with the Entra ID plugin) on assets in Azure Storage accounts.

Integration Model

Metadata extraction

Using the libraries provided by Azure, it authenticates to the storage account containing the container to be governed.

Once the connection is established, the container in question is traversed to generate a tree representing all its contents.
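The authenticate-and-traverse step can be sketched with the Azure SDK for Python. This is a minimal illustration rather than the plugin's actual code: `account_url` and `list_container_tree` are hypothetical helper names, and the tenant, client, and secret values come from the Entra ID app registration described under Required credentials.

```python
def account_url(storage_account: str) -> str:
    # Hypothetical helper: blob endpoint of a storage account.
    return f"https://{storage_account}.blob.core.windows.net"

def list_container_tree(storage_account, container, tenant_id, client_id, client_secret):
    """Authenticate with the Entra ID app registration and yield the full
    path of every blob in the container. Requires the azure-identity and
    azure-storage-blob packages (imported lazily so the sketch stands alone)."""
    from azure.identity import ClientSecretCredential
    from azure.storage.blob import BlobServiceClient

    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    service = BlobServiceClient(account_url(storage_account), credential=credential)
    container_client = service.get_container_client(container)
    # list_blobs() returns a flat listing; the content tree is rebuilt
    # from the '/'-separated blob names.
    for blob in container_client.list_blobs():
        yield blob.name
```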

To extract the metadata of a given object, the same connection and the tools provided by the Azure library are used to read the metadata, which is then sent to Anjana to create the object.

The plugin extracts the following attributes as metadata.

If you want the object in Anjana to have these attributes, the corresponding field must be created in the attribute_definition table with the same name as in the document and be present in the object template.

All attributes in the extracted object are optional, except for schema, physicalName, path, infrastructure, technology, and zone when the extracted object is to be governed by Anjana.

  • schema with the value of the Azure container.

  • physicalName and name with the same value, the name of the file within Azure.

  • path with the path and name of the resource if it is a file.

  • infrastructure with the selected value

  • technology with the selected value

  • zone with the selected value

  • creationTime with the file creation date

  • lastModified with the date of the last modification of the file

  • eTag with the eTag of the file

  • fileSize with the size of the file in bytes

  • contentType with the type of the file

  • contentMd5 with the md5 of the file content

  • contentEncoding with the encoding of the file

  • contentDisposition with the disposition of the file

  • contentLanguage with the language of the file

  • cacheControl with the cache control value of the file

  • leaseStatus with the lock status of the file, which can be:

    • LOCKED: The file is locked by a service operation.

    • UNLOCKED: The file has no lock.

  • leaseState with the lock state of the file, which can be:

    • AVAILABLE: No active lock exists.

    • BREAKING: Transition towards BROKEN.

    • BROKEN: Not being used by another service but will be released when the lock expires. Renewing the lock is not allowed.

    • EXPIRED: The previous lock has expired and can be locked again by another service or renewed (in the case where it has not been broken before).

    • LEASED: Locked by another service.

  • leaseDuration with the type of lock the file has, which can be:

    • FIXED: The current lock has a maximum time

    • INDEFINITE: The current lock has no maximum time; it must be removed manually via a release or break request.

  • copyId with the identifier of the last copy operation on the file (if no copy has ever been performed or the file has been modified, this property has no value)

  • copyStatus with the status of the last copy operation on the file (if no copy has ever been performed or the file has been modified, this property has no value), which can be:

    • PENDING: An operation is in progress

    • SUCCESS: The last operation completed successfully

    • ABORTED: The last operation was aborted

    • FAILED: The last operation failed.

  • copySource with the path of the source of the last copy operation on the file (if no copy has ever been performed or the file has been modified, this property has no value)

  • copyProgress with the number of bytes copied and total bytes from the source of the last copy operation on the file (if no copy has ever been performed or the file has been modified, this property has no value)

  • copyCompletionTime with the date of the last time a copy operation was performed on the file (if no copy has ever been performed or the file has been modified, this property has no value)

  • copyStatusDescription with the description of the last copy operation on the file if it was aborted or failed (if no copy has ever been performed or the file has been modified, this property has no value)

  • isServerEncrypted indicating whether the file is encrypted

  • isIncrementalCopy indicating whether the file is an incremental copy

  • accessTier with the access tier of the file, which can be:

    • ARCHIVE: The file cannot be read or modified.

    • COOL: The file is expected not to be read or modified frequently.

    • HOT: The file is expected to be read or modified frequently.

    • P4, P6, P10, P15, P20, P30, P40, P50, P60, P70 or P80: Premium tiers, where the storage medium has fixed, provisioned values for capacity, throughput, and number of accesses; some tiers allow temporary bursting when the provisioned limits are exceeded.

  • archiveStatus with the rehydration status of the file (only applies if the file is in the ARCHIVE access tier and needs to be made accessible again), which can be:

    • REHYDRATE_PENDING_TO_HOT: In the process of going from ARCHIVE to HOT

    • REHYDRATE_PENDING_TO_COOL: In the process of going from ARCHIVE to COOL

  • encryptionKeySha256 with the SHA-256 hash of the key used to encrypt the file

  • accessTierChangeTime with the date when the access tier was last modified

  • isDirectory indicating whether the file is a directory (can only be true for partitioned files such as Avro or Parquet)

In addition to these values present in every Azure Storage file, extra properties can be added; all included properties will be collected and extracted.

It also sends attributes related to the fields of the requested resource, depending on the resource's content and type. For more information, see https://wiki.anjanadata.com/es/integraciones/25.2/extraccion-de-metadata-de-ficheros.

Data sampling

Using the libraries provided by Azure, it authenticates to the storage account containing the container to be governed.

Once the connection is established, the object to be sampled is located, its content is read (up to the maximum configured number of results) using Apache libraries according to the file type, and the results are returned.
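The "read up to the configured maximum" behaviour can be illustrated with a CSV reader; this is a stdlib sketch with a hypothetical `sample_csv` helper, whereas the plugin itself selects a type-specific reader (e.g. Apache Avro or Parquet libraries for partitioned formats).

```python
import csv
import io

def sample_csv(raw: bytes, max_rows: int = 100, encoding: str = "utf-8"):
    """Return at most max_rows parsed rows from CSV content already
    downloaded from the blob, mirroring the configured result limit."""
    reader = csv.reader(io.StringIO(raw.decode(encoding)))
    rows = []
    for index, row in enumerate(reader):
        if index >= max_rows:
            break
        rows.append(row)
    return rows
```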

Active governance

Using the libraries provided by Azure, it authenticates to the storage account containing the container to be governed.

The object whose permissions are to be managed is located and, using ACL manipulation tools, the necessary permissions are added for the group on the objects when it is included in a DSA: Read and Execute at every level from the root down to the file representing the dataset and, in the case of a partitioned file, on all files present at that time. The group's ACL entries are likewise removed when the group's access expires or when the object itself expires.

The plugin only grants access to files or directories within the blobs of the storage account it governs. Access to the storage account for users must be done manually. It is recommended to give users the Reader role on the storage account (this will allow them to see the names of the blobs but not access their content).
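The grant logic described above can be sketched as follows. `ancestor_chain`, `group_acl_entry`, and `group_acl_removal` are hypothetical helpers; the strings they build follow the POSIX-style `group:<object-id>:rwx` form accepted by the Data Lake Storage ACL APIs (for example `update_access_control_recursive` / `remove_access_control_recursive` in azure-storage-file-datalake).

```python
def ancestor_chain(blob_path: str):
    """Every path from the first level under the container root down to
    the blob itself, in order; the plugin grants the group Read and
    Execute at each of these levels."""
    parts = blob_path.strip("/").split("/")
    return ["/".join(parts[:i + 1]) for i in range(len(parts))]

def group_acl_entry(group_object_id: str) -> str:
    """Access control entry granting Read and Execute to a group."""
    return f"group:{group_object_id}:r-x"

def group_acl_removal(group_object_id: str) -> str:
    """Entry form used when removing the group's ACL (no permission triplet)."""
    return f"group:{group_object_id}"
```

For a partitioned dataset, the same entry would additionally be applied to every partition file present at grant time, as the section above describes.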


Object editing

Using the libraries provided by Azure, it authenticates to the storage account containing the container to be governed; in this case, to manage the activation or deactivation of non-native entities.

When a non-native entity is activated, the necessary permissions are granted to the group on the objects: Read and Execute at every level from the root down to the file representing the dataset and, in the case of a partitioned file, on all files present at that time. When the entity is deactivated, or when the object itself expires, the group's ACL entries are removed.

The plugin only grants access to files or directories within the blobs of the storage account it governs. Access to the storage account for users must be done manually. It is recommended to give users the Reader role on the storage account (this will allow them to see the names of the blobs but not access their content).

Required credentials

It is necessary to register an application in Entra ID and generate the clientID and secret so that the plugin can authenticate and acquire the permissions required by each functionality.


The plugin is capable of handling only one storage account; therefore, it is necessary to create one instance for each storage account to be managed.



Metadata extraction and data sampling

Read permissions are required on the different storage type categories as well as the general configuration of the storage account itself.

  • Reader: Allows reading all resources but not making any changes.

  • Storage Blob Data Reader: Read access to an Azure Storage Blob Container.

  • Storage File Data SMB Share Reader: Read access to Azure File Share using SMB.

  • Storage Table Data Reader: Read access for Azure Storage tables and entities.
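These role assignments can be granted to the registered application with, for example, the Azure CLI; the subscription, resource group, storage account, and application IDs below are placeholders.

```shell
# Grant the plugin's app registration read roles on the storage account.
# <APP_ID>, <SUBSCRIPTION_ID>, <RESOURCE_GROUP>, <STORAGE_ACCOUNT> are placeholders.
SCOPE="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>"

az role assignment create --assignee "<APP_ID>" --role "Reader" --scope "$SCOPE"
az role assignment create --assignee "<APP_ID>" --role "Storage Blob Data Reader" --scope "$SCOPE"
```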


Active access governance and structures

  • User Access Administrator: Management capability over user access to Azure resources.

  • Storage Blob Data Owner: Full access to Azure Storage Blob Containers, the data they contain, and access to them. Necessary to modify ACLs on governed files.

Object editing

  • User Access Administrator: Management capability over user access to Azure resources.

  • Storage Blob Data Owner: Full access to Azure Storage Blob Containers, the data they contain, and access to them. Necessary to modify ACLs on governed files.



Limitations

The maximum number of effective ACLs on a file or directory is 28.

In practical terms, this means that a blob can be governed by at most 28 DSAs, provided no other system applies ACLs to that blob.

When permissions are granted on a partitioned file, any partition files that change or are added afterwards will not carry the permissions held by the other parts of the same file.

Given these strict technology limits, it is recommended to deprecate and expire a DSA as soon as all its governed objects have expired, in order to clean up unused ACLs.