General Configuration

ArcaPix Ngenea HSM requires the following components to correctly function:

  • a modified PixStor file system policy

  • a master ArcaPix Ngenea HSM configuration file with at least one storage target

  • a configuration file for each storage target

  • active accounts with the appropriate cloud storage providers

The following sections describe how to apply the above prerequisites.

PixStor Scale Configuration

Transparent Recall file system policy

In order to enable transparent recall on a file system, insert the transparent recall rules provided by ArcaStream / Pixit Media at the beginning of the current file system policy.

Note that if no current policy is in place, you will also need to include a default placement policy rule at the end of the policy, ensuring that new files are written to the correct PixStor storage pool. This is not required if you only have a system PixStor storage pool. For example:

/* By default place all data on the sas1 pool */
RULE 'default' SET POOL 'sas1'

Ensure that any future alterations to the GPFS file system policy do not replace, modify or precede any of the ArcaPix Ngenea HSM policy rules.

Cluster Configuration

In order to provide the maximum number of concurrent transparent recall and migration threads, set the following GPFS cluster configuration parameters on any node which is expected to issue transparent recall requests.

policySystemEvalLimit 64
dmapiWorkerThreads 64

Also, ensure that workerThreads is set to greater than 64:

workerThreads 128

Master Configuration File

The master configuration file defines the available storage targets, and directs migration and recall requests to the appropriate target.

The default location for ArcaPix Ngenea HSM configuration files is within /opt/arcapix/etc/, and the default location for the master configuration file is /opt/arcapix/etc/ngenea.conf.

Relative paths to storage configuration files are resolved relative to the location of the master configuration file.

Configuration files should not be world-readable. This applies to the master configuration file and any files referenced by the master configuration file.
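The permission requirement above can be checked with a short script. The following is a minimal sketch, not part of Ngenea: the helper names is_world_readable and world_readable_files are hypothetical, and the check only looks at the POSIX "other" read bit (it does not consider ACLs).

```python
import os
import stat


def is_world_readable(path):
    """Return True if the POSIX 'other' read permission bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def world_readable_files(paths):
    """Return the subset of paths that are world-readable."""
    return [p for p in paths if is_world_readable(p)]
```

For example, world_readable_files(["/opt/arcapix/etc/ngenea.conf"]) returns an empty list when the master configuration file is not world-readable.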

Below is an example configuration file with two storage targets defined. It contains the optional [General] section and optional [FileMatch ...] sections.

[General]
MinMigrateEndpointCount=1

[Storage bpearl1]
StorageType=BlackPearl
ConfigFile=blackpearl1.conf
LocalFileRegex=/mmfs1/(archive/.+)
RemoteLocationXAttrRegex=blackpearl:(.+)

[Storage awss3]
StorageType=AmazonS3
ConfigFile=awss3.conf
LocalFileRegex=/mmfs1/(active/.+)
RemoteLocationXAttrRegex=awss3:(.+)

[FileMatch text]
BasenameMask=*.txt
StubSize=-1

[FileMatch video]
PathnameRegex=.*/avi/
StubSize=16777216

[FileMatch default]
StubSize=0
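To inspect such a file outside of Ngenea, the storage targets can be listed with a few lines of Python. This is a sketch under the assumption that the file is compatible with Python's configparser INI dialect, which the example above suggests but this document does not confirm; the function name storage_targets is hypothetical.

```python
import configparser

# A shortened copy of the example master configuration shown above.
EXAMPLE = """\
[General]
MinMigrateEndpointCount=1

[Storage bpearl1]
StorageType=BlackPearl
ConfigFile=blackpearl1.conf
LocalFileRegex=/mmfs1/(archive/.+)
RemoteLocationXAttrRegex=blackpearl:(.+)

[Storage awss3]
StorageType=AmazonS3
ConfigFile=awss3.conf
LocalFileRegex=/mmfs1/(active/.+)
RemoteLocationXAttrRegex=awss3:(.+)
"""


def storage_targets(text):
    """Return {target name: {keyword: value}} for every [Storage <name>] section."""
    # Only '=' is treated as a delimiter so that ':' inside values is preserved.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep keyword case exactly as written
    parser.read_string(text)
    targets = {}
    for section in parser.sections():
        kind, _, name = section.partition(" ")
        if kind == "Storage":
            targets[name] = dict(parser[section])
    return targets
```

Calling storage_targets(EXAMPLE) yields one entry per storage target, keyed by the name that follows the word Storage in the section header.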

Section Keywords

Keyword

[General]

[Storage <storage target name>]

[FileMatch <file match set>]

Section [General]

This optional section specifies general parameters for all storage endpoints.

Section [General] Keywords

Keyword                          Default    Required
MinMigrateEndpointCount          0          No
MaxJobPartFileCount              512        No
MaxTransparentRecallLockCount    19         No
LockLevel                        partial    No
WatchdogTimeoutSeconds           None       No

MinMigrateEndpointCount

Specifies the minimum number of successful migrations required in order to report success for a migration to multiple storage endpoints. See Multi-Target Support for an overview of multi-target configuration and operation.

The default value is 0.

Syntax:

MinMigrateEndpointCount=<Minimum number of successful migrations>

Example:

MinMigrateEndpointCount=3

MaxJobPartFileCount

Specifies the maximum number of local files to migrate, recall, or delete remote objects for in one batch. If there are more local files to process, ngmigrate or ngrecall automatically creates subsequent batches.

For file recall and remote object deletion operations, ngrecall automatically adjusts the value of this parameter so that it does not exceed the value of the "MaxTransparentRecallLockCount" parameter.

The default value is 512.

Syntax:

MaxJobPartFileCount=<Maximum number of files per batch>

Example:

MaxJobPartFileCount=512

MaxTransparentRecallLockCount

Specifies the maximum number of concurrent locks put on distinct local files accessed via DMAPI in file recall and remote object deletion operations.

If ngrecall exceeds that maximum number when trying to lock a local file, ngrecall terminates with an error.

To minimize the number of such errors, ngrecall automatically limits the maximum number of local files per batch, specified by the "MaxJobPartFileCount" parameter, so that it does not exceed the value of the "MaxTransparentRecallLockCount" parameter.

The default value is 19.

Syntax:

MaxTransparentRecallLockCount=<Maximum number of concurrent locks>

Example:

MaxTransparentRecallLockCount=10
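The interaction between MaxJobPartFileCount and MaxTransparentRecallLockCount for recall and remote object deletion can be illustrated as follows. This is a sketch of the documented adjustment only, not the ngrecall implementation; the function names are hypothetical, and the default arguments are the documented defaults of 512 and 19.

```python
def effective_batch_size(max_job_part_file_count=512,
                         max_transparent_recall_lock_count=19):
    """Batch size for recall/deletion: capped by the lock count limit."""
    return min(max_job_part_file_count, max_transparent_recall_lock_count)


def batches(files, size):
    """Yield consecutive batches of at most `size` files."""
    for start in range(0, len(files), size):
        yield files[start:start + size]
```

With the default values, a recall of 40 files would be split into batches of 19, 19 and 2 files.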

LockLevel

Specifies a DMAPI locking level for file I/O operations. The option --lock-level=partial|implicit of ngmigrate and ngrecall overrides the value of this parameter.

Partial (default)

In this mode, ngmigrate obtains a SHARED access right and uses it for file read operations while uploading a file. This access right protects the file from modification by other programs while it is being migrated. The protection requires the Transparent Recall file system policy to be installed in the file system placement policy. The ngmigrate tool can also obtain EXCLUSIVE access rights for short periods to perform write operations on a file, for example to update file attributes or punch a hole in the file.

The ngrecall tool obtains EXCLUSIVE access rights for the duration of file write operations while downloading a file. In this mode, ngrecall can also obtain SHARED or EXCLUSIVE access rights to perform other operations on the file.

Using partial mode, file system snapshot operations will block until all ngenea operations are completed.

Implicit

In this mode, ngmigrate and ngrecall do not obtain SHARED and EXCLUSIVE access rights explicitly. DMAPI file I/O functions implicitly lock a file at the beginning of an I/O operation and unlock the file at the end of the I/O operation. The minimum possible locking is achieved when using this locking mode. It is recommended to utilise implicit locking mode in situations whereby ngenea operations interfere with file system snapshot operations.

Syntax:

LockLevel=partial|implicit

Example:

LockLevel=implicit
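The precedence described above (command-line option first, then the configuration value, then the default) can be sketched as below. The function name resolve_lock_level is hypothetical, and treating the values as case-sensitive is an assumption.

```python
VALID_LOCK_LEVELS = ("partial", "implicit")


def resolve_lock_level(cli_value=None, config_value=None):
    """Return the lock level: --lock-level wins, then LockLevel, then 'partial'."""
    for candidate in (cli_value, config_value, "partial"):
        if candidate is not None:
            if candidate not in VALID_LOCK_LEVELS:
                raise ValueError("unsupported lock level: %r" % (candidate,))
            return candidate
```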

WatchdogTimeoutSeconds

Defines the number of seconds after which an ngmigrate or ngrecall process terminates with exit status 1 if none of the monitored operations makes progress within the timeout period. Operations are monitored per processed data block of a file. Progress on an operation, or completion of a data block, resets the watchdog timer to 0.

The monitored operations are:

  • high-level operations of communicating with a storage endpoint

  • file I/O when uploading or downloading data to/from a storage endpoint

  • file I/O when calculating file hashes

  • syncing file I/O buffers to the physical medium

Setting this parameter is not required for successful operation. Consider setting it if ngmigrate or ngrecall operations consistently fail to complete, or are observed to remain in exponential backoff indefinitely, because the links in use are oversubscribed or show significant jitter or packet loss.

Values less than 300 are not recommended.

Syntax:

WatchdogTimeoutSeconds=<Timeout period in seconds>

Example:

WatchdogTimeoutSeconds=600
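The reset-on-progress behaviour of the watchdog can be illustrated with a small timer class. This is a conceptual sketch only, not the Ngenea implementation; the class name Watchdog and its methods are hypothetical, and the clock is injectable so the behaviour can be exercised without waiting.

```python
import time


class Watchdog:
    """Timer that expires when no progress is reported within the timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_progress = clock()

    def progress(self):
        """Record progress; this resets the elapsed time to 0."""
        self._last_progress = self._clock()

    def expired(self):
        """Return True once the timeout has elapsed without progress."""
        return self._clock() - self._last_progress >= self.timeout_seconds
```

A monitored operation would call progress() each time it advances or completes a data block; a process following the behaviour described above would exit with status 1 once expired() returns True.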

Section [Storage <storage target name>]

The Storage section defines how a storage target will be used for migrating and recalling data. Each storage target definition includes a name, type, scope of operation and a reference to the Storage Target configuration file.

Syntax:

[Storage <storage target name>]

Example:

[Storage target1]

Section [Storage <storage target name>] Keywords

Keyword                     Default       Required
ConfigFile                  Null          No
LocalFileRegex              ".+"          Yes
LocalSymlinkTargets         "endpoint"    No
RemoteLocationXAttrRegex    None          Yes
StorageType                 None          Yes
StorageKey                  "lc"          No

ConfigFile

Defines the location of the associated configuration file for the Storage entry, either specified as the full path and filename, or path and filename relative to the location of this master configuration file.

Syntax:

ConfigFile=<filename>

Example:

ConfigFile=/opt/arcapix/etc/target1.conf

LocalFileRegex

Defines the pathname match, in regex, used to determine which filesystem paths will trigger migrations to this storage target. It also controls the generation of the file path used in the remote storage.

Syntax:

LocalFileRegex=/<mountpoint>/([path filter].+)

The following example will cause this storage target to be used as a migration target for all files on file system /mmfs1/, and store the files in remote storage with their full path minus the /mmfs1/ component:

Example: all filesystem objects can be migrated to this target

LocalFileRegex=/mmfs1/(.+)

The following example will cause this storage target to be used as a migration target for all files underneath /mmfs1/data/fileset1/, and store the files in remote storage with the path data/fileset1/path/to/file:

Example: targeted migration

LocalFileRegex=/mmfs1/(data/fileset1/.+)
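The way the capture group determines the remote path can be tried out in Python. This is a sketch that assumes the expressions behave like Python regular expressions anchored at the beginning of the text; the regular expression engine used by Ngenea is not stated here, and the function name remote_path is hypothetical.

```python
import re


def remote_path(local_file_regex, local_path):
    """Return the captured remote path, or None if the target does not apply."""
    # re.match anchors at the beginning of the text, like a leading '^'.
    match = re.match(local_file_regex, local_path)
    return match.group(1) if match else None
```

For the targeted migration example, remote_path("/mmfs1/(data/fileset1/.+)", "/mmfs1/data/fileset1/path/to/file") returns "data/fileset1/path/to/file", while a path outside /mmfs1/data/fileset1/ returns None.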

LocalSymlinkTargets

Defines the behaviour of symbolic link restoration by ngrecall on reverse stubbing/premigration.

The ngrecall tool processes each symbolic link encountered by:

  • Converting it to an absolute path, resolving multiple consecutive slashes and any . and .. components.

  • Validating the computed path against the defined behaviour.

  • If the computed path is restricted by the defined behaviour, ngrecall does not restore the symbolic link.

The parameter has the following supported values:

  • endpoint --- restricts ngrecall to restoring symbolic links referencing directories and files contained within the namespace of the endpoint target

  • any --- restricts ngrecall to restoring symbolic links referencing directories and files contained within the namespace of all endpoint targets. Symbolic link computed paths are validated against LocalFileRegex parameters for all storage endpoints. A symbolic link is restored if the computed path matches a regular expression for at least one endpoint

  • regex:REGULAR_EXPRESSION --- restricts ngrecall to restoring symbolic links where computed paths match the defined regular expression

The default value is "endpoint".

Syntax:

LocalSymlinkTargets=endpoint|any|regex:<regular_expression>

Example:

LocalSymlinkTargets=regex:/mmfs1(/.*)?
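The processing steps above can be sketched in Python as follows. This is an illustration under several assumptions that this document does not confirm: relative link targets are resolved against the directory containing the link, the endpoint and any settings are checked against LocalFileRegex values anchored at the beginning of the text, and the regex: form must match the whole computed path. All function names are hypothetical.

```python
import posixpath
import re


def computed_target(link_path, link_target):
    """Absolute, normalised path that the symbolic link refers to."""
    base = posixpath.dirname(link_path)
    # normpath collapses repeated slashes and resolves '.' and '..'.
    return posixpath.normpath(posixpath.join(base, link_target))


def symlink_allowed(computed, setting, own_regex, all_regexes):
    """Decide whether a link with this computed path would be restored."""
    if setting == "endpoint":
        return re.match(own_regex, computed) is not None
    if setting == "any":
        return any(re.match(r, computed) for r in all_regexes)
    if setting.startswith("regex:"):
        # Assumption: the expression must match the entire computed path.
        return re.fullmatch(setting[len("regex:"):], computed) is not None
    raise ValueError("unsupported LocalSymlinkTargets value: %r" % (setting,))
```

Here own_regex stands for the LocalFileRegex of the storage target being processed and all_regexes for the LocalFileRegex values of all configured storage targets.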

RemoteLocationXAttrRegex

Defines the APXrmtlc extended attribute regular expression match. This extended attribute is added to a file upon migration. Any file with the APXrmtlc extended attribute matching this regex will be downloaded from this target when issuing ngrecall.

Syntax:

RemoteLocationXAttrRegex=<identifier>:(.+)

Example:

RemoteLocationXAttrRegex=target1:(.+)

Note that the regular expressions specified by LocalFileRegex and RemoteLocationXAttrRegex match to the beginning of the text. The regular expressions behave as if they are prefixed by the caret anchor ^ (match beginning of text).
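Selecting the storage target for a recall can be illustrated with the sketch below. It assumes Python regular expression semantics, and the attribute values used in the usage example are invented for illustration; the exact content of the APXrmtlc attribute is not described here. The function name recall_target is hypothetical.

```python
import re


def recall_target(xattr_value, target_regexes):
    """Return (target name, captured text) for the first matching target.

    target_regexes maps a storage target name to its RemoteLocationXAttrRegex.
    Returns None when no target matches.
    """
    for name, pattern in target_regexes.items():
        # re.match anchors at the beginning of the text, like a leading '^'.
        match = re.match(pattern, xattr_value)
        if match:
            return name, match.group(1)
    return None
```

With the two targets from the example master configuration, an attribute value beginning with awss3: would select the awss3 target, and a value matching neither expression would select no target.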

StorageType

Defines the supported mode of storage operation. Available options are:

  • BlackPearl - SpectraLogic Black Pearl Spectra S3 target

  • AmazonS3 - Amazon S3, or compatible, protocol target

  • Azure - Microsoft Azure blob storage target

  • Google - Google Cloud storage target

  • FS - POSIX file system target

Syntax:

StorageType=<AmazonS3|BlackPearl|Azure|Google|FS>

Example:

StorageType=AmazonS3

StorageKey

Defines a custom identifier attributable to the storage endpoint.

See the description of this keyword in Multi-Target Support.

Syntax:

StorageKey=<char><char>

Example:

StorageKey=zz

Section [FileMatch <file match set>]

These sections specify stub sizes for various names and paths of local files.

FileMatch Section Keywords

Keyword          Default                 Required
BasenameMask     Null                    No
PathnameRegex    ".+"                    No
StubSize         -1 (i.e. premigrate)    No

The ngmigrate program applies those stub sizes to local files when migrating them. The ngrecall program applies those stub sizes to local files when reverse stubbing their remote objects.

The header of every file match section must contain the name of a file match set.

Syntax:

[FileMatch <file match set>]

Example: A file match section describing text files might have the header

[FileMatch text]

The names of all file match sets described in a master configuration file must be different.

The programs ngmigrate and ngrecall process file match sections in the order they are present in a master configuration file. If a section matches a local file, ngmigrate or ngrecall applies the section to the file and does not process subsequent file match sections for that file.

The last file match section in a master configuration file can specify the default stub size to apply to a local file when no previous section matched it. For example, such a default file match section might look like this:

Example: A default file match section, specifying a StubSize of 0 bytes

[FileMatch default]
; Use zero stub size for all other files.
StubSize=0

Every file match section can contain the parameters "BasenameMask", "PathnameRegex", and "StubSize". The parameters "BasenameMask" and "PathnameRegex" specify conditions joined by logical AND.

BasenameMask

Specifies a glob pattern for the base name part of a matched file name.

Multiple glob patterns can be given, separated by '|' (the pipe character); the patterns are joined by logical OR.

Syntax:

BasenameMask=<glob pattern>

If this parameter is absent, all file names will be matched.

Example:

BasenameMask=*.wav|*.mp3|*.ogg

PathnameRegex

Specifies a regular expression for matching against a resolved absolute file name.

Syntax:

PathnameRegex=<regular expression>

If this parameter is absent, all paths will be matched.

Example:

PathnameRegex=.*/images/.*

StubSize

Specifies the stub size in bytes.

If the stub size exceeds the size of a file, or if the stub size is -1, ngmigrate or ngrecall will premigrate the file.

Syntax:

StubSize=<stub size in bytes (or -1)>

The default value is -1; i.e. the file will be premigrated.

Example:

StubSize=1024
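The first-match evaluation of file match sections can be sketched as follows, using the three sections from the example master configuration. This is an illustration, not the Ngenea implementation: it assumes Python fnmatch glob semantics for BasenameMask and Python regular expressions anchored at the beginning of the text for PathnameRegex, neither of which is confirmed by this document. All names are hypothetical.

```python
import fnmatch
import posixpath
import re

# The [FileMatch ...] sections of the example configuration, in file order.
FILE_MATCH_SECTIONS = [
    ("text", {"BasenameMask": "*.txt", "StubSize": -1}),
    ("video", {"PathnameRegex": ".*/avi/", "StubSize": 16777216}),
    ("default", {"StubSize": 0}),
]


def section_matches(params, path):
    """True if every condition present in the section matches (logical AND)."""
    mask = params.get("BasenameMask")
    if mask is not None:
        base = posixpath.basename(path)
        # Glob patterns separated by '|' are alternatives (logical OR).
        if not any(fnmatch.fnmatchcase(base, g) for g in mask.split("|")):
            return False
    regex = params.get("PathnameRegex")
    if regex is not None and re.match(regex, path) is None:
        return False
    return True


def stub_size_for(path, sections=FILE_MATCH_SECTIONS):
    """Return (section name, stub size) of the first matching section."""
    for name, params in sections:
        if section_matches(params, path):
            return name, params.get("StubSize", -1)
    return None  # no section matched; behaviour in this case is not modelled


def is_premigrated(stub_size, file_size):
    """A file is premigrated if the stub size is -1 or exceeds the file size."""
    return stub_size == -1 or stub_size > file_size
```

Because sections are processed in order, a file such as /mmfs1/avi/notes.txt is handled by the text section rather than the video section, even though both match.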

Cloud Storage Requirements

Cloud (Object) storage providers have specific requirements for configuring access accounts and providing authentication. These requirements are described below for each storage provider.

It is advised that security best practice, as detailed and updated from time to time by the cloud storage provider, be implemented.

Amazon S3 and S3 Compatible Storage

A bucket must be created which allows multipart upload - refer to Multipart Upload API and Permissions or the documentation for the S3 compatible storage provider.

The authentication details for the user account with privileges to access the bucket, and the bucket name, must be added to the Amazon S3 configuration file.

Secure the bucket by applying the principle of least privilege. Grant the account for Ngenea operations the minimal required operations as detailed in Amazon S3 Security.

Microsoft Azure Storage

A container must be created.

The authentication details (account name, account key) for the user account with privileges to access the container, and the container name, must be added to the Azure configuration file.

Secure the container by applying the principle of least privilege. Grant the account for Ngenea operations the minimal required operations as detailed in Azure Blob Storage Security.

Google Cloud Storage

A bucket must be created.

An administrator should create a service account and enable the Compute Engine API. Refer to the Google Dashboard.

The service account access keys should be downloaded to the local system as JSON and referenced in the Google configuration file. Ensure the JSON access account keys are readable only by those users who require access to the file, typically root.

Secure the bucket by applying the principle of least privilege. Grant the service account for Ngenea operations the minimal required operations as detailed in Google Cloud Storage Security.