Ingesting

Ingesting files from external storage

A particularly powerful feature of Ngenea is the ability to ingest existing data into a PixStor file system by "reverse stubbing". This process creates a migrated file stub on the file system which points to any file on a defined external storage target. The file is then immediately accessible via the PixStor file system as if it had been natively created and then migrated via Ngenea.

In this way, it is possible to rapidly and efficiently migrate any existing data into a PixStor file system, without requiring a wholesale copy or move of data. Only metadata records need to be created prior to beginning use of the data via the PixStor file system. Once this initial metadata creation is complete, data will automatically migrate to the file system on access, and can also be brought across as a background process.

Note

If the storage endpoint is not a PixStor file system, ngrecall may change the access time of files on that endpoint while creating reverse stubs for them.

The process for ingesting existing data holdings varies with requirements, but typically consists of the following steps:

  1. define the Ngenea configuration for the external storage

  2. generate a list of file/object paths to be ingested

  3. create any required directories

  4. create reverse stubs inside the directories with the command ngrecall --stub
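Steps 2 and 3 can be prepared with a short script before any stubs are created. The following is a minimal sketch and not part of Ngenea: the helper name list_ingest_paths is hypothetical, and it assumes the external storage is a mounted file system that can be walked directly (as with the NFS example below).

```python
import os


def list_ingest_paths(source_root):
    """Return (directories, files) found under source_root.

    Both lists are sorted and hold paths relative to source_root.
    """
    directories = []
    files = []
    for current, dirnames, filenames in os.walk(source_root):
        # Path of the directory being visited, relative to the root ("." for the root).
        rel_dir = os.path.relpath(current, source_root)
        for name in dirnames:
            directories.append(os.path.normpath(os.path.join(rel_dir, name)))
        for name in filenames:
            files.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(directories), sorted(files)
```

Calling list_ingest_paths("/mnt/legacy") would give the directories that need to exist on the PixStor side and the files for which reverse stubs are wanted. For an object store, the list would instead have to come from that store's own listing tools.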

Example - ingesting existing NFS storage

In this example, an existing storage system is mounted via NFS at /mnt/legacy on the Ngenea node(s).

All data will be ingested into the legacy/ folder on the PixStor file system at /mmfs1/.

The goal is to eventually move all data from the legacy system into the /mmfs1 file system.

Simultaneously, the /mmfs1 file system will be enabled with Ngenea to migrate data to an S3 storage target, as per a standard Ngenea deployment.

Master configuration files

Here, the external storage is coupled to the /mmfs1/legacy path. Since a different target is being used for subsequent migrations, this coupling is defined in a dedicated configuration file for the ingest (/opt/arcapix/etc/ngenea-ingest.conf). The default configuration file (/opt/arcapix/etc/ngenea.conf) defines how to recall data from the legacy target, and sets the default migration (and recall) target(s) for subsequently migrated data.

/opt/arcapix/etc/ngenea-ingest.conf

This specifies the location of the configuration file for the legacy storage (legacy_nfs.conf), and assigns it as the target to be used for files under the /mmfs1/legacy path, with the relative path set underneath /mmfs1/legacy. For example, a file with path /mmfs1/legacy/folder1/file1 would be mapped to a file on the target storage at /folder1/file1, relative to its root - i.e. /mnt/legacy/folder1/file1 in this scenario.

[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/mmfs1/legacy/(.+)
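The path mapping described above can be illustrated with a few lines of Python. This is only a sketch of the documented behaviour, not Ngenea's own matching code: the function name remote_path_for is hypothetical, it assumes that $1 denotes the first capture group of LocalFileRegex and that the whole path must match, and it takes the base path /mnt/legacy from the legacy_nfs.conf file shown later in this example.

```python
import re

# Values copied from the configuration files in this example.
LOCAL_FILE_REGEX = r"/mmfs1/legacy/(.+)"  # LocalFileRegex in ngenea-ingest.conf
REMOTE_BASE_PATH = "/mnt/legacy"          # RetrieveObjectBasePath in legacy_nfs.conf


def remote_path_for(local_path):
    """Return the legacy storage path for a PixStor path, or None if it does not match."""
    match = re.fullmatch(LOCAL_FILE_REGEX, local_path)
    if match is None:
        return None
    # Assumption: $1 in the configuration refers to the first capture group.
    return REMOTE_BASE_PATH + "/" + match.group(1)


print(remote_path_for("/mmfs1/legacy/folder1/file1"))  # /mnt/legacy/folder1/file1
print(remote_path_for("/mmfs1/projects/file2"))        # None
```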

/opt/arcapix/etc/ngenea.conf

This specifies the configuration file to use for files whose data blocks are stored in the legacy_nfs target. It also defines an AWS object storage target, which will be the default for subsequent migrations across the whole /mmfs1 file system.

The use of READONLY as the LocalFileRegex effectively disables any use of the legacy NFS storage for standard data migrations.

[Storage aws_bucket1]
StorageType=AmazonS3
ConfigFile=/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf
RemoteLocationXAttrRegex=aws_bucket1:(.+)
LocalFileRegex=/mmfs1/(.+)
[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/READONLY(.+)
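To see why the READONLY pattern keeps the legacy storage out of normal migrations, the two LocalFileRegex values can be tested against sample paths. The sketch below is an illustration only: the function name matching_storages is hypothetical, the configuration is abbreviated to the relevant keys, and it assumes that Python's re syntax and whole-path matching behave like Ngenea's matching for these simple patterns.

```python
import configparser
import re

# Abbreviated copy of /opt/arcapix/etc/ngenea.conf from this example.
NGENEA_CONF = """
[Storage aws_bucket1]
StorageType=AmazonS3
LocalFileRegex=/mmfs1/(.+)

[Storage legacy_nfs]
StorageType=FS
LocalFileRegex=/READONLY(.+)
"""


def matching_storages(conf_text, local_path):
    """Return the names of storage sections whose LocalFileRegex matches local_path."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    names = []
    for section in parser.sections():
        if not section.startswith("Storage "):
            continue
        pattern = parser.get(section, "LocalFileRegex", fallback=None)
        if pattern and re.fullmatch(pattern, local_path):
            names.append(section[len("Storage "):])
    return names


print(matching_storages(NGENEA_CONF, "/mmfs1/legacy/folder1/file1"))  # ['aws_bucket1']
```

No path under /mmfs1 begins with /READONLY, so only aws_bucket1 is ever a candidate for new migrations, while the legacy_nfs section remains available for recalling ingested files.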

Storage Target configuration files

/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf

This specifies that the legacy storage is mounted at /mnt/legacy. It also enables DeleteOnRecall, as the goal is to move all data off the legacy storage over time.

If the legacy storage were to remain in use as an ongoing migration target, DeleteOnRecall would typically be set to False.

[General]
RemoteLocationXAttr=legacy_nfs:$1
RetrieveObjectBasePath=/mnt/legacy
RetrieveObjectName=$1
StoreObjectBasePath=/mnt/legacy
StoreObjectName=$1
EnsureMountPoint=/mnt/legacy
DeleteOnRecall=True
ObjectXAttrManipulationMode=auto

/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf

Note that we use DeleteOnRecall=False here, as this will be the general-purpose migration target and we wish to make use of premigration functionality.

[General]
AccessKeyId=ACCESSKEYID
SecretAccessKey=SECRETACCESSKEY
Bucket=my_ngenea_bucket
Region=eu-west-2
Scheme=HTTPS
SSLVerify=True
RemoteLocationXAttr=aws_bucket1:$1
RetrieveObjectName=$1
StoreObjectName=$1
DeleteOnRecall=False

Ingest command

The following ngrecall command will:

  1. scan the legacy storage;

  2. create any required directories;

  3. create reverse stubs for all files contained;

  4. print the names of created stub files.

The command is safe to re-run: it skips any files that already exist.

Note

Storage that existed before Ngenea was in use does not contain files with the UUID suffixes created by ngmigrate. In this case, ingestion can be sped up substantially by also passing the option --all-obj-instances to ngrecall. That option disables the scan for files that differ in their UUID suffixes, which is otherwise performed so that only the most recently migrated files are ingested.

The command can be executed on a sub-folder basis to perform a selective ingest. For example, to process only the folder /mnt/legacy/folder1:

ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs:folder1

Alternatively, to ingest the whole file system:

ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs
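For a staged ingest, the per-folder form of the command can be generated for a list of folders. This is a minimal sketch that only builds the command lines and does not run them. The helper name ingest_commands and the folder name folder2 are hypothetical, and the sketch assumes that the -E legacy_nfs:<folder> form shown above for folder1 applies equally to other folders.

```python
import shlex

CONFIG_FILE = "/opt/arcapix/etc/ngenea-ingest.conf"
STORAGE = "legacy_nfs"


def ingest_commands(folders):
    """Return one ngrecall command line per folder (paths relative to the storage root)."""
    commands = []
    for folder in folders:
        args = [
            "ngrecall", "-v", "--stub", "--all-obj-instances",
            "--config-file=" + CONFIG_FILE,
            "-E", STORAGE + ":" + folder,
        ]
        # shlex.join quotes any argument that contains shell-special characters.
        commands.append(shlex.join(args))
    return commands


for line in ingest_commands(["folder1", "folder2"]):
    print(line)
```

The printed lines can be reviewed and then executed one at a time, which keeps each run small. Because re-running the command skips files that already exist, an interrupted folder can simply be repeated.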