General Configuration

Ngenea HSM requires the following components to correctly function:

  • a modified GPFS file system policy
  • a master Ngenea HSM configuration file with at least one storage target
  • a configuration file for each storage target
  • active accounts with the appropriate cloud storage providers

The following sections describe how to apply the above prerequisites.

GPFS/Spectrum Scale Configuration

Transparent Recall file system policy

In order to enable transparent recall on a file system, insert the transparent recall rules provided by ArcaStream / Pixit Media at the beginning of the current file system policy.

Note that if no current policy is in place, you will also need to include a default placement policy rule at the end of the policy, ensuring that new files are written to the correct GPFS storage pool. This is not required if you only have a system GPFS storage pool. For example:

/* By default place all data on the sas1 pool */
RULE 'default' SET POOL 'sas1'

Ensure that any future alterations to the GPFS file system policy do not replace, modify or precede any of the Ngenea HSM policy rules.

Cluster Configuration

In order to provide the maximum number of concurrent transparent recall and migration threads, set the following GPFS cluster configuration parameters on any node which is expected to issue transparent recall requests.

policySystemEvalLimit 64
dmapiWorkerThreads 64

Also ensure that workerThreads is set to a value greater than 64, for example:

workerThreads 128

Master Configuration File

The master configuration file defines the available storage targets, and directs migration and recall requests to the appropriate target.

The default location for Ngenea HSM configuration files is within /opt/arcapix/etc/, and the default location for the master configuration file is /opt/arcapix/etc/ngenea.conf.

Storage configuration file names that are not given as a full path are resolved relative to the location of the master configuration file.

Below is an example configuration file with two storage targets defined.

[Storage bpearl1]
StorageType=BlackPearl
ConfigFile=blackpearl1.conf
LocalFileRegex=/mmfs1/(archive/.+)
RemoteLocationXAttrRegex=blackpearl:(.+)

[Storage awss3]
StorageType=AmazonS3
ConfigFile=awss3.conf
LocalFileRegex=/mmfs1/(active/.+)
RemoteLocationXAttrRegex=awss3:(.+)
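
As a reading aid for the layout above, the following Python sketch loads the same example with the standard library's configparser and lists the storage targets it defines. It is not how Ngenea HSM itself parses the file; the function name list_storage_targets is hypothetical, and the sketch assumes the master configuration is plain INI text that configparser accepts.

```python
import configparser

EXAMPLE = """\
[Storage bpearl1]
StorageType=BlackPearl
ConfigFile=blackpearl1.conf
LocalFileRegex=/mmfs1/(archive/.+)
RemoteLocationXAttrRegex=blackpearl:(.+)

[Storage awss3]
StorageType=AmazonS3
ConfigFile=awss3.conf
LocalFileRegex=/mmfs1/(active/.+)
RemoteLocationXAttrRegex=awss3:(.+)
"""


def list_storage_targets(text):
    """Return {target name: {keyword: value}} for every [Storage <name>] section."""
    # Interpolation is disabled so that a '%' inside a regex is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keyword case (configparser lowercases by default)
    parser.read_string(text)
    targets = {}
    for section in parser.sections():
        kind, _, name = section.partition(" ")
        if kind == "Storage" and name:
            targets[name] = dict(parser[section])
    return targets


if __name__ == "__main__":
    for name, settings in list_storage_targets(EXAMPLE).items():
        print(name, settings["StorageType"], settings["ConfigFile"])
```

Each section header carries the storage target name after the word Storage, which is why the sketch splits the header on the first space.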

Section Keywords

Keyword
[General]
[Storage <storage target name>]
[FileMatch <file match set>]

Section [General]

This optional section specifies general parameters for all storage endpoints.

Section [General] Keywords

Keyword                        Default  Required
MinMigrateEndpointCount        0        No
MaxJobPartFileCount            512      No
MaxTransparentRecallLockCount  19       No

MinMigrateEndpointCount

Specifies the minimum number of successful migrations required in order to report success for a migration to multiple storage endpoints. See Multi-Target Support for an overview of multi-target configuration and operation.

The default value is 0.

Syntax:

MinMigrateEndpointCount=<Minimum number of successful migrations>

Example:

MinMigrateEndpointCount=3

MaxJobPartFileCount

Specifies the maximum number of local files that ngmigrate or ngrecall processes in one batch when migrating files, recalling files, or deleting remote objects. If there are more local files to process, ngmigrate or ngrecall automatically creates subsequent batches.

For file recall and remote object deletion operations, ngrecall automatically adjusts the value of this parameter so that it does not exceed the value of the "MaxTransparentRecallLockCount" parameter.

The default value is 512.

Syntax:

MaxJobPartFileCount=<Maximum number of files per batch>

Example:

MaxJobPartFileCount=512

MaxTransparentRecallLockCount

Specifies the maximum number of concurrent locks put on distinct local files accessed via DMAPI in file recall and remote object deletion operations.

If ngrecall exceeds that maximum number when trying to lock a local file, ngrecall terminates with an error.

To minimize the number of such errors, ngrecall automatically lowers the batch size specified by the "MaxJobPartFileCount" parameter for recall and remote object deletion operations, so that it does not exceed the value of the "MaxTransparentRecallLockCount" parameter.

The default value is 19.

Syntax:

MaxTransparentRecallLockCount=<Maximum number of concurrent locks>

Example:

MaxTransparentRecallLockCount=10
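
The interaction between the two batch-related parameters can be sketched in Python as follows. This is a minimal illustration of the adjustment described above, not the ngrecall implementation; the function names and the uses_dmapi_locks flag are hypothetical, and the defaults are the documented default values.

```python
def effective_batch_size(max_job_part_file_count=512,
                         max_transparent_recall_lock_count=19,
                         uses_dmapi_locks=True):
    """Batch size after the adjustment described for MaxJobPartFileCount.

    uses_dmapi_locks is True for recall and remote object deletion, where
    the batch size is capped by the lock limit; False for migration.
    """
    if uses_dmapi_locks:
        return min(max_job_part_file_count, max_transparent_recall_lock_count)
    return max_job_part_file_count


def split_into_batches(files, batch_size):
    """Split a list of files into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


if __name__ == "__main__":
    size = effective_batch_size()  # defaults: min(512, 19)
    batches = split_into_batches([f"file{i}" for i in range(50)], size)
    print(size, [len(batch) for batch in batches])
```

With the default values, a recall of 50 files would be processed in batches of 19, 19 and 12 files under these assumptions.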

Section [Storage <storage target name>]

The Storage section defines how a storage target will be used for migrating and recalling data. Each storage target definition includes a name, type, scope of operation and a reference to the Storage Target configuration file.

Syntax:

[Storage <storage target name>]

Example:

[Storage ngenea-target1]

Section [Storage <storage target name>] Keywords

Keyword                   Default  Required
ConfigFile                Null     No
LocalFileRegex            ".+"     Yes
RemoteLocationXAttrRegex  None     Yes
StorageType               None     Yes
StorageKey                "lc"     No

ConfigFile

Defines the location of the associated configuration file for the Storage entry, either specified as the full path and filename, or path and filename relative to the location of this master configuration file.

Syntax:

ConfigFile=<filename>

Example:

ConfigFile=/opt/arcapix/etc/ngenea-target1.conf

LocalFileRegex

Defines the pathname match, as a regular expression, used to determine which file system paths trigger migrations to this storage target. The capture group in the expression also determines the file path used in the remote storage.

Syntax:

LocalFileRegex=/<mountpoint>/([path filter].+)

The following example will cause this storage target to be used as a migration target for all files on file system /mmfs1/, and store the files in remote storage with their full path minus the /mmfs1/ component:

Example: all filesystem objects can be migrated to this target

LocalFileRegex=/mmfs1/(.+)

The following example will cause this storage target to be used as a migration target for all files underneath /mmfs1/data/fileset1/, and store the files in remote storage with the path data/fileset1/path/to/file:

Example: targeted migration

LocalFileRegex=/mmfs1/(data/fileset1/.+)
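
The effect of the capture group can be tried out with a short Python sketch. Python's re module is used here only as a stand-in for whatever regex engine Ngenea HSM uses, and the function name remote_path_for is hypothetical; re.match is chosen because it only matches at the beginning of the text.

```python
import re


def remote_path_for(local_path, local_file_regex):
    """Return the path captured by group 1, or None if the regex does not match."""
    match = re.match(local_file_regex, local_path)  # matches at the start only
    return match.group(1) if match else None


if __name__ == "__main__":
    targeted = "/mmfs1/(data/fileset1/.+)"
    print(remote_path_for("/mmfs1/data/fileset1/path/to/file", targeted))
    print(remote_path_for("/mmfs1/scratch/tmp.dat", targeted))
    print(remote_path_for("/mmfs1/scratch/tmp.dat", "/mmfs1/(.+)"))
```

In this sketch a path outside /mmfs1/data/fileset1/ yields None for the targeted expression, meaning the storage target would not be selected for that file.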

RemoteLocationXAttrRegex

Defines the regular expression matched against the APXrmtlc extended attribute. This extended attribute is added to a file upon migration. Any file whose APXrmtlc extended attribute matches this regex will be downloaded from this target when issuing ngrecall.

Syntax:

RemoteLocationXAttrRegex=<identifier>:(.+)

Example:

RemoteLocationXAttrRegex=ngenea-target1:(.+)

Note that the regular expressions specified by LocalFileRegex and RemoteLocationXAttrRegex are anchored at the beginning of the text: they behave as if they were prefixed by the caret anchor ^ (match beginning of text).
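
The following Python sketch illustrates how such an anchored match could select the storage target for a recall. It is an illustration only: the attribute values shown, the function name target_for_xattr, and the choice of the first matching target in file order are assumptions, and Python's re.match stands in for the implicit ^ anchor.

```python
import re

# Target name -> RemoteLocationXAttrRegex, as in the example master configuration.
TARGETS = {
    "bpearl1": r"blackpearl:(.+)",
    "awss3": r"awss3:(.+)",
}


def target_for_xattr(xattr_value, targets=TARGETS):
    """Return (target name, captured remote location) or None if nothing matches."""
    for name, pattern in targets.items():
        match = re.match(pattern, xattr_value)  # anchored at the beginning
        if match:
            return name, match.group(1)
    return None


if __name__ == "__main__":
    print(target_for_xattr("awss3:active/clip.mov"))
    # The identifier is not at the start of the value, so no target matches.
    print(target_for_xattr("old-awss3:active/clip.mov"))
```

Because of the anchoring, an identifier that merely appears somewhere inside the attribute value does not select the target.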

StorageType

Defines the supported mode of storage operation. Available options are:

  • BlackPearl - SpectraLogic Black Pearl Spectra S3 target
  • AmazonS3 - Amazon S3, or compatible, protocol target
  • Azure - Microsoft Azure blob storage target
  • Google - Google Cloud storage target
  • FS - POSIX file system target

Syntax:

StorageType=<AmazonS3|BlackPearl|Azure|Google|FS>

Example:

StorageType=AmazonS3

StorageKey

Defines a custom identifier attributable to the storage endpoint.

See the description of this keyword in Multi-Target Support.

Syntax:

StorageKey=<char><char>

Example:

StorageKey=zz

Section [FileMatch <file match set>]

These sections specify stub sizes for various names and paths of local files.

FileMatch Section Keywords

Keyword        Default                 Required
BasenameMask   Null                    No
PathnameRegex  ".+"                    No
StubSize       -1 (i.e. premigrate)    No

The program ngmigrate applies those stub sizes to local files when migrating them. The program ngrecall applies those stub sizes to local files when reverse stubbing their remote objects.

The header of every file match section must contain the name of a file match set.

Syntax:

[FileMatch <file match set>]

Example: A file match section describing text files might have the header

[FileMatch text]

The names of all file match sets described in a master configuration file must be different.

The programs ngmigrate and ngrecall process file match sections in the order they are present in a master configuration file. If a section matches a local file, ngmigrate or ngrecall applies the section to the file and does not process subsequent file match sections for that file.

The last file match section in a master configuration file can specify the default stub size to be applied to a local file when no previous section matched the file. For example, such a default file match section might look like this:

Example: A default file match section, specifying a StubSize of 0 bytes

[FileMatch default]
; Use zero stub size for all other files.
StubSize=0

Every file match section can contain the parameters "BasenameMask", "PathnameRegex", and "StubSize". The parameters "BasenameMask" and "PathnameRegex" specify conditions joined by logical AND.

BasenameMask

Specifies a glob pattern for the base name part of a matched file name.

To match any one of several glob patterns, separate the patterns with '|' (the pipe character); the patterns are joined by logical OR.

Syntax:

BasenameMask=<glob pattern>

If this parameter is absent, all file names will be matched.

Example:

BasenameMask=*.wav|*.mp3|*.ogg

PathnameRegex

Specifies a regular expression for matching against a resolved absolute file name.

Syntax:

PathnameRegex=<regular expression>

If this parameter is absent, all paths will be matched.

Example:

PathnameRegex=.*/images/.*
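
The matching rules described so far (sections processed in file order, BasenameMask and PathnameRegex joined by logical AND, and '|' separating alternative glob patterns) can be sketched in Python as follows. The file match sets in SECTIONS are hypothetical, the function names are illustrative, and Python's fnmatch and re modules are stand-ins for the matching engines Ngenea HSM actually uses; in particular, case-sensitive glob matching and matching the regex from the start of the path are assumptions.

```python
import fnmatch
import posixpath
import re

# (file match set, BasenameMask or None, PathnameRegex or None), in file order.
SECTIONS = [
    ("audio", "*.wav|*.mp3|*.ogg", None),
    ("images", None, r".*/images/.*"),
    ("default", None, None),
]


def section_matches(path, basename_mask, pathname_regex):
    """Apply both conditions with logical AND; an absent parameter matches everything."""
    if basename_mask is not None:
        name = posixpath.basename(path)
        patterns = basename_mask.split("|")  # alternatives joined by logical OR
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            return False
    if pathname_regex is not None and re.match(pathname_regex, path) is None:
        return False
    return True


def first_matching_set(path, sections=SECTIONS):
    """Return the name of the first section that matches, or None."""
    for set_name, basename_mask, pathname_regex in sections:
        if section_matches(path, basename_mask, pathname_regex):
            return set_name  # later sections are not processed for this file
    return None


if __name__ == "__main__":
    for path in ("/mmfs1/images/song.mp3", "/mmfs1/images/cat.png", "/mmfs1/docs/readme.txt"):
        print(path, first_matching_set(path))
```

Note that /mmfs1/images/song.mp3 is claimed by the audio set in this sketch, because that section comes first and later sections are not consulted once a section matches.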

StubSize

Specifies the stub size in bytes.

If the stub size exceeds the size of a file, or if the stub size is -1, ngmigrate or ngrecall will premigrate the file.

Syntax:

StubSize=<stub size in bytes (or -1)>

The default value is -1; i.e. the file will be premigrated.

Example:

StubSize=1024
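
The rule for when a file is premigrated rather than stubbed can be written as a small Python sketch. The function name stub_action and its return values are hypothetical; only the comparison itself is taken from the description above.

```python
PREMIGRATE = -1  # documented special value and default for StubSize


def stub_action(stub_size, file_size):
    """Return ("premigrate", None) or ("stub", stub size in bytes)."""
    if stub_size == PREMIGRATE or stub_size > file_size:
        return ("premigrate", None)
    return ("stub", stub_size)


if __name__ == "__main__":
    print(stub_action(1024, 10_000_000))        # large file, 1024-byte stub
    print(stub_action(1024, 200))               # stub size exceeds the file size
    print(stub_action(PREMIGRATE, 10_000_000))  # default value
```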

Cloud Storage Requirements

The cloud storage providers have specific requirements for configuring access accounts and providing authentication. These requirements are described below for each storage provider.

Amazon S3 and S3 Compatible Storage

An administrator should create a user account with the S3 storage provider (for Amazon S3, for example through the AWS Management Console). A bucket should also be created. The user's authentication details and bucket name must be added to the Amazon S3 configuration file.

The user account should be configured to allow the following permissions for the resource identified by the bucket name:

  • s3:DeleteObject
  • s3:ListBucket
  • s3:GetObject
  • s3:PutObject

The bucket should be configured to allow multipart upload; see Multipart Upload API and Permissions.
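
For Amazon S3 itself, the permissions listed above could be expressed as an IAM policy document along the lines of the following Python sketch. This is an assumption-laden illustration rather than a policy supplied with Ngenea HSM: the bucket name ngenea-example-bucket and the function name bucket_policy_document are hypothetical, S3 compatible providers may use a different policy mechanism, and any additional permissions needed for multipart upload are not shown.

```python
import json


def bucket_policy_document(bucket_name):
    """Build an IAM policy granting the four listed permissions on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:DeleteObject", "s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy_document("ngenea-example-bucket"), indent=2))
```

The two statements are kept separate because the list permission is granted on the bucket resource, while the object permissions are granted on the objects under it.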

Microsoft Azure Storage

A storage account and container should be created through the Azure Portal. The account name, account key, and container name must be added to the Azure configuration file.

Google Cloud Storage

An administrator should create a service account, and enable the Compute Engine API. See the Google Dashboard.

The service account keys should be downloaded to the local system and referenced from the Google configuration file.