Ingesting
Ingesting files from external storage
A particularly powerful feature of Ngenea is the ability to ingest existing data into a PixStor file system by "reverse stubbing". This process creates a migrated file stub on the file system which points to any file on a defined external storage target. The file is then immediately accessible via the PixStor file system as if it had been natively created and then migrated via Ngenea.
In this way, it is possible to rapidly and efficiently migrate any existing data into a PixStor file system, without requiring a wholesale copy or move of data. Only metadata records need to be created prior to beginning use of the data via the PixStor file system. Once this initial metadata creation is complete, data will automatically migrate to the file system on access, and can also be brought across as a background process.
Note
If a storage endpoint is not a PixStor filesystem, ngrecall may change the access time of files it is accessing in the storage endpoint while creating reverse stubs for them.
The process for ingesting existing data holdings varies with requirements, but typically consists of the following steps:
define the Ngenea configuration for the external storage
generate a list of file/object paths to be ingested
create any required directories
create reverse stubs inside the directories with the command ngrecall --stub
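As an illustration of the second step, the following is a minimal sketch of generating a list of file paths from external storage that is mounted as a file system. The function name list_relative_paths and the mount point /mnt/legacy are assumptions made for the example; object storage targets would need a different listing method.

```python
import os
from typing import Iterator


def list_relative_paths(root: str) -> Iterator[str]:
    """Yield every file under root as a path relative to root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-directories in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root)


if __name__ == "__main__":
    # /mnt/legacy is a hypothetical mount point for the external storage
    for rel in list_relative_paths("/mnt/legacy"):
        print(rel)
```

The resulting relative paths can then be reviewed or filtered before any directories or reverse stubs are created.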
Example - ingesting existing NFS storage
In this example, an existing storage system is mounted via NFS at /mnt/legacy on the Ngenea node(s). All data will be ingested into the legacy/ folder on the PixStor file system at /mmfs1/. The goal is to eventually move all data from the legacy system into the /mmfs1 file system. Simultaneously, the /mmfs1 file system will be enabled with Ngenea to migrate data to an S3 storage target, as per a standard Ngenea deployment.
Master configuration files
Here, the external storage is coupled to the /mmfs1/legacy path. Since a different target is being used for subsequent migrations, this coupling is defined in a dedicated configuration file for the ingest (/opt/arcapix/etc/ngenea-ingest.conf).
The default configuration file (/opt/arcapix/etc/ngenea.conf) defines how to recall data from the legacy target, and also sets the default migration (and recall) target(s) for subsequently migrated data.
/opt/arcapix/etc/ngenea-ingest.conf
This specifies the location of the configuration file for the legacy storage (legacy_nfs.conf), and assigns it as the target to be used for files under the /mmfs1/legacy path, with the relative path set underneath /mmfs1/legacy. For example, a file with path /mmfs1/legacy/folder1/file1 would be mapped to a file on the target storage at /folder1/file1, relative to its root - i.e. /mnt/legacy/folder1/file1 in this scenario.
[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/mmfs1/legacy/(.+)
/opt/arcapix/etc/ngenea.conf
This specifies the configuration file to be used for files whose data blocks are stored in the legacy_nfs target, and also defines an AWS object storage target which will be the default for subsequent migrations across the whole /mmfs1 file system.
The use of READONLY in the LocalFileRegex for legacy_nfs effectively disables any use of the legacy NFS storage for standard data migrations.
[Storage aws_bucket1]
StorageType=AmazonS3
ConfigFile=/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf
RemoteLocationXAttrRegex=aws_bucket1:(.+)
LocalFileRegex=/mmfs1/(.+)
[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/READONLY(.+)
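The effect of the two LocalFileRegex values above can be illustrated with a small sketch. It assumes that each pattern is applied from the start of the absolute local path; Ngenea's actual matching rules may differ, and the function name matching_storages and the path /mmfs1/projects/new_file are hypothetical.

```python
import re
from typing import List

# LocalFileRegex values copied from the ngenea.conf example above
LOCAL_FILE_REGEX = {
    "aws_bucket1": r"/mmfs1/(.+)",
    "legacy_nfs": r"/READONLY(.+)",
}


def matching_storages(local_path: str) -> List[str]:
    """Return the storage names whose LocalFileRegex matches local_path.

    Assumption: the pattern is applied from the start of the absolute path.
    """
    return [
        name
        for name, pattern in LOCAL_FILE_REGEX.items()
        if re.match(pattern, local_path)
    ]


print(matching_storages("/mmfs1/legacy/folder1/file1"))
print(matching_storages("/mmfs1/projects/new_file"))
```

Under that assumption, no path beneath /mmfs1 matches the legacy_nfs pattern, so only aws_bucket1 is a candidate for new migrations.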
Storage Target configuration files
/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
This specifies that the legacy storage is mounted at /mnt/legacy. It also enables DeleteOnRecall, as the goal is to move all data off the legacy storage over time.
If the legacy storage were to remain in use as a migration target, DeleteOnRecall would typically be set to False.
[General]
RemoteLocationXAttr=legacy_nfs:$1
RetrieveObjectBasePath=/mnt/legacy
RetrieveObjectName=$1
StoreObjectBasePath=/mnt/legacy
StoreObjectName=$1
EnsureMountPoint=/mnt/legacy
DeleteOnRecall=True
ObjectXAttrManipulationMode=auto
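To make the path mapping concrete, the sketch below follows one file through the settings shown in ngenea-ingest.conf and legacy_nfs.conf. It only mimics the $1 substitution as described in this example; the function name resolve is hypothetical, and how Ngenea performs the substitution internally is an assumption.

```python
import posixpath
import re

# Values copied from ngenea-ingest.conf and legacy_nfs.conf in this example
LOCAL_FILE_REGEX = r"/mmfs1/legacy/(.+)"
REMOTE_LOCATION_XATTR = "legacy_nfs:$1"
RETRIEVE_OBJECT_BASE_PATH = "/mnt/legacy"
RETRIEVE_OBJECT_NAME = "$1"


def resolve(local_path):
    """Return (remote location value, path on the legacy mount) for local_path."""
    match = re.fullmatch(LOCAL_FILE_REGEX, local_path)
    if match is None:
        raise ValueError(f"{local_path} is not under /mmfs1/legacy")
    captured = match.group(1)
    # Substitute the first capture group for the $1 placeholder
    xattr = REMOTE_LOCATION_XATTR.replace("$1", captured)
    name = RETRIEVE_OBJECT_NAME.replace("$1", captured)
    return xattr, posixpath.join(RETRIEVE_OBJECT_BASE_PATH, name)


xattr, remote = resolve("/mmfs1/legacy/folder1/file1")
print(xattr)   # legacy_nfs:folder1/file1
print(remote)  # /mnt/legacy/folder1/file1
```

The second value agrees with the mapping described for ngenea-ingest.conf, where /mmfs1/legacy/folder1/file1 corresponds to /mnt/legacy/folder1/file1.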
/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf
Note that we use DeleteOnRecall=False here, as this will be the general purpose migration target, and we wish to make use of premigration functionality.
[General]
AccessKeyId=ACCESSKEYID
SecretAccessKey=SECRETACCESSKEY
Bucket=my_ngenea_bucket
Region=eu-west-2
Scheme=HTTPS
SSLVerify=True
RemoteLocationXAttr=aws_bucket1:$1
RetrieveObjectName=$1
StoreObjectName=$1
DeleteOnRecall=False
Ingest command
The following ngrecall command will:
scan the legacy storage;
create any required directories;
create reverse stubs for all files contained;
print the names of created stub files.
The command is safe to re-run multiple times: it will skip past any files which already exist.
Note
Storage that existed prior to Ngenea operations does not contain files with the UUID suffixes created by ngmigrate.
In this case, ingestion speed can be substantially increased by additionally passing the option --all-obj-instances to ngrecall.
That option disables the scan for files that differ only in their UUID suffixes, which is otherwise used to ingest only the most recently migrated files.
The command can be executed on a sub-folder basis to perform a selective ingest.
For example, to process only the folder /mnt/legacy/folder1:
ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs:folder1
Alternatively, to ingest the whole file system:
ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs
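When a selective ingest covers several sub-folders, the per-folder commands can be generated rather than typed by hand. The following is a minimal sketch that only prints the commands, using the same options as shown above; the function name ingest_command and the folder name folder2 are hypothetical.

```python
import shlex

CONFIG_FILE = "/opt/arcapix/etc/ngenea-ingest.conf"


def ingest_command(folder=None):
    """Build the ngrecall argument list for one folder, or the whole target."""
    endpoint = "legacy_nfs" if folder is None else f"legacy_nfs:{folder}"
    return [
        "ngrecall", "-v", "--stub", "--all-obj-instances",
        f"--config-file={CONFIG_FILE}", "-E", endpoint,
    ]


# folder1 is from the example above; folder2 is a hypothetical second folder
for folder in ["folder1", "folder2"]:
    print(shlex.join(ingest_command(folder)))
```

Printing the commands first allows them to be checked before execution. To run them from Python instead, each argument list could be passed to subprocess.run with check=True on a node where ngrecall is installed.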