
AWS S3

Introduction

This plugin is used to extract metadata, sample data, and grant and revoke access (in conjunction with the AWS IAM plugin) on assets in AWS S3.

Integration model

Metadata extraction

To list the available structures, the accessible buckets and all of their objects are retrieved in order to return a content map. The behavior differs slightly if a bucket is configured in the YML configuration file: in that case, the listing is limited to the contents of that bucket.


The value returned by the structure listing is an object with the structure name (a concatenation of the infrastructure, the technology, and the zone) and a list of the objects found within it. Here the structure corresponds to the bucket and each object corresponds to an object in Amazon S3.
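As a sketch of this listing step (function and parameter names are illustrative, not the plugin's actual code), the content map can be built from a boto3 S3 client like this:

```python
def build_content_map(s3_client, bucket_filter=None):
    """Return {bucket_name: [object_keys]} for the accessible buckets.

    s3_client is expected to behave like a boto3 S3 client,
    e.g. boto3.client("s3"). If bucket_filter is set (as when a bucket
    is fixed in the YML configuration), only that bucket is listed.
    """
    if bucket_filter:
        buckets = [bucket_filter]
    else:
        buckets = [b["Name"] for b in s3_client.list_buckets()["Buckets"]]

    content_map = {}
    for bucket in buckets:
        keys = []
        # list_objects_v2 is paginated; walk every page of the bucket.
        paginator = s3_client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket):
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        content_map[bucket] = keys
    return content_map
```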

For the extraction of an object's metadata, the same tools are used; the object's content is read to extract its metadata and the result is returned.

After executing the metadata extraction, an object is obtained with a name (elementName), a list of key-value attributes (such as infrastructure, technology and zone) and also a list of fields (fields) associated with it, also with their list of key-value attributes.


For the metadata extraction to succeed, the attribute names must match the values of the name field in the attribute_definition table in order to appear on screen:

  • physicalName, with the same value as the object's name

  • path, with the concatenation of the object's path components in Amazon S3

  • infrastructure with the selected value

  • technology with the selected value

  • zone with the selected value
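Assembled with those attribute names, a metadata-extraction result might look like the following sketch (all values and the exact container shapes are illustrative, not real plugin output):

```python
# Illustrative shape of a metadata-extraction result (values are made up).
extraction_result = {
    "elementName": "sales_2024.csv",
    "attributes": {
        "physicalName": "sales_2024.csv",
        "path": "landing/sales/2024/sales_2024.csv",
        "infrastructure": "AWS",
        "technology": "S3",
        "zone": "landing",
    },
    # Each field carries its own key-value attributes.
    "fields": [
        {"name": "customer_id", "attributes": {"type": "string"}},
        {"name": "amount", "attributes": {"type": "decimal"}},
    ],
}
```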


The attributes to be created in Anjana must have the following types:

  Attribute name    Attribute type
  --------------    --------------
  physicalName      INPUT_TEXT
  path              INPUT_TEXT
  infrastructure    SELECT
  technology        SELECT
  zone              SELECT
  name              INPUT_TEXT



It will also send attributes related to the fields of the requested resource, always depending on the content and type of the resource. For more information, see https://wiki.anjanadata.com/es/integraciones/25.2/extraccion-de-metadata-de-ficheros.

Data sampling

For data sampling, the object to be sampled is located, the content of its files is read using Apache libraries according to their type, and up to the configured maximum number of results is returned.


The value returned in the data sampling is an object that contains headers (headers) and values (values).

The headers (headers) include the attribute names of the objects that should be returned after data sampling.

The values (values) include the list of values for each header for each object to be returned. This will allow the plugin to return, in addition to attributes such as the file name or file content, the rest of the data and metadata.
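As a minimal sketch of this headers/values shape (the plugin actually reads objects with Apache libraries per file type; this covers only the CSV case, using the standard library, with illustrative names):

```python
import csv
import io

def sample_csv(body_text, max_rows=10):
    """Turn raw CSV content into the {headers, values} sampling shape.

    headers: the attribute names taken from the first row.
    values:  up to max_rows rows of values, one list per row.
    """
    rows = list(csv.reader(io.StringIO(body_text)))
    if not rows:
        return {"headers": [], "values": []}
    return {"headers": rows[0], "values": rows[1 : 1 + max_rows]}
```

For example, `sample_csv("a,b\n1,2\n3,4\n")` yields headers `["a", "b"]` and values `[["1", "2"], ["3", "4"]]`.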

Active governance of structures

In the S3 protocol, paths are emulated, which means it is not possible to pre-provision these elements.

Active governance of access

Access management for this technology is performed directly in AWS IAM, which is why this type of action is delegated to the plugin for that technology; the presence of the AWS IAM plugin is therefore essential for this functionality to be available.

Object editing

Object editing for this technology is performed directly in AWS IAM, which is why this type of action is delegated to the plugin for that technology; the presence of the AWS IAM plugin is therefore essential for this functionality to be available.


Required credentials

Metadata extraction

If you wish to perform metadata extraction actions, a connection to Amazon S3 is required. To establish this connection, an accessKey and a secretKey for the account provided to Anjana to manage data governance are required. Optionally, a proxy can be configured if a direct connection to Amazon S3 is not desired.

From the Amazon S3 side, the region where the data resides is needed and, optionally, if only a single bucket is to be governed, the desired bucket (if no bucket is specified, all buckets for which the account has permissions are governed).
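Put together, the connection settings might be expressed in the YML configuration roughly as follows (the key names here are assumptions for illustration, not the plugin's documented schema):

```yaml
# Illustrative connection configuration -- key names are assumptions.
s3:
  accessKey: <access-key>
  secretKey: <secret-key>
  region: eu-west-1
  bucket: my-governed-bucket        # optional; omit to govern all buckets
  proxy: http://proxy.example:3128  # optional; omit for a direct connection
```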

For its part, Amazon S3 defines a set of actions [1] that can be specified in a policy and that help the user with access to the technology obtain the desired information.


For this plugin, the following actions are of interest:

  • s3:ListAllMyBuckets to list all buckets of the authenticated user.

  • s3:ListBucket to list the contents of a bucket.

  • s3:GetBucketLocation to return the region where the bucket resides.

  • s3:GetObject to return objects from Amazon S3. To be able to read the object, you must also have read permissions on it.
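The actions above can be granted with a minimal IAM policy along these lines (a sketch: the bucket name is a placeholder, and the Resource ARNs should be scoped to the buckets you actually govern):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```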



If the Amazon S3 bucket uses encryption with KMS-managed keys (SSE-KMS), additional permissions must be granted so that the data key associated with each object can be decrypted and its content read.

  • kms:Decrypt → to be able to decrypt the data keys and read content.

  • kms:DescribeKey → to validate the key.
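These KMS permissions can be added as an extra statement in the same policy (a sketch; the key ARN is a placeholder and should point to the key that encrypts the bucket):

```json
{
  "Effect": "Allow",
  "Action": ["kms:Decrypt", "kms:DescribeKey"],
  "Resource": "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-KEY-ID"
}
```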


Data sampling

To trigger actions related to data sampling, the same configuration and credentials described above for metadata extraction are required.


Active governance of structures

In the S3 protocol, paths are emulated, which means it is not possible to pre-provision these elements.


Active governance of access

Access management for this technology is performed directly in AWS IAM, which is why this type of action is delegated to the plugin for that technology, making the presence of the latter essential in order to have the functionality available.


Object editing

Object editing for this technology is performed directly in AWS IAM, which is why this type of action is delegated to the plugin for that technology, making the presence of the latter essential in order to have the functionality available.

File name limitations

AWS imposes certain restrictions on object key names for everything to work correctly. The safe characters to use are the following:

  • Alphanumeric characters: 0-9, a-z, A-Z

  • Special characters: exclamation point (!), hyphen (-), underscore (_), period (.), asterisk (*), single quote ('), open parenthesis ( and close parenthesis )


There are also characters that may require special handling, although it is recommended not to use them to avoid issues:

  • Ampersand (&), dollar ($), at (@), equals (=), semicolon (;), forward slash (/), colon (:), plus (+), space, comma (,), question mark (?)

  • ASCII characters in the ranges 00-1F hexadecimal (0-31 decimal) and 7F (127 decimal)

And finally, there are characters to avoid because they are not supported by AWS (see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html):

  • Backslash (\), left curly brace ({), right curly brace (}), caret (^), percent (%), grave accent (`), left square bracket ([), right square bracket (]), quotation marks ("), greater than (>), less than (<), tilde (~), pound (#), vertical bar (|)

  • Non-printable ASCII characters (128-255 decimal)

For more information, refer to the AWS documentation.
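As a quick way to check a candidate object key against the safe character set, one could use a sketch like this (the character class follows the AWS guidance above; the forward slash is also allowed since it acts as the emulated path separator):

```python
import re

# Safe characters per AWS object-key guidance:
# alphanumerics plus ! - _ . * ' ( ), and / as the path separator.
SAFE_KEY = re.compile(r"^[0-9A-Za-z!\-_.*'()/]+$")

def is_safe_key(key: str) -> bool:
    """Return True if the key uses only the safe character set."""
    return bool(SAFE_KEY.fullmatch(key))
```

For example, `is_safe_key("reports/2024/q1_sales.csv")` is true, while keys containing spaces or characters such as `#` are rejected.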

[1] Allowed actions in Amazon S3: https://docs.aws.amazon.com/AmazonS3/latest/API/API_Operations_Amazon_Simple_Storage_Service.html