Migrating

Policy Based Migration

A PixStor policy is typically used to migrate files. A policy will select candidates for migration based on various criteria, and then call ngmigrate in the execution phase to migrate files to external storage.

Example Migration Policy

This simple example policy frees up space by migrating files larger than 1 MB that have not been accessed in the last 180 days, once the file system is more than 85% full.

define(
    exclude_list,
    (
        PATH_NAME LIKE '%/.ctdb/%'
        OR NAME LIKE 'user.quota%'
        OR NAME LIKE 'fileset.quota%'
        OR NAME LIKE 'group.quota%'
    )
)

define(is_migrated, (MISC_ATTRIBUTES LIKE '%V%'))

/* All files use the ngenea.conf configuration file:*/
RULE EXTERNAL POOL 'NGENEA_DEFAULT'
    EXEC '/var/mmfs/etc/mmpolicyExec-ngenea-hsm'
    OPTS '-v1 --log-target=syslog --config-file=/opt/arcapix/etc/ngenea.conf'
    ESCAPE '%'

RULE 'ngenea_migrate' MIGRATE TO POOL 'NGENEA_DEFAULT'
/* If Filesystem is running out of space (more than 85% full)
    reduce usage to 70% */
THRESHOLD(85,70,70)

/* Choose files least recently accessed */
WEIGHT(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))

/* but only migrate files > 1MB in size */
WHERE KB_ALLOCATED > 1024
/* Don't migrate anything which has been accessed in the last 180 days */
AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 180)

AND NOT (is_migrated)
AND NOT (exclude_list)

A more comprehensive example can be found here.

ngmigrate

Synopsis

ngmigrate [-p|--sync-metadata] NAME1 ... NAMEn

ngmigrate [-p|--sync-metadata]
          [--filelist-format=NUL|quoted] -f FILELIST

ngmigrate --default-stub-size=LENGTH NAME1 ... NAMEn

ngmigrate --default-stub-size=LENGTH
          [--filelist-format=NUL|quoted] -f FILELIST

ngmigrate --force-stub-size=LENGTH NAME1 ... NAMEn

ngmigrate --force-stub-size=LENGTH
          [--filelist-format=NUL|quoted] -f FILELIST

where NAMEi is a file, directory, or symbolic link name.

Common options for all use cases:

[-vLEVEL] [--config-file=FILE] [-r] [--no-recursion-remote]
[ --overwrite-remote | --fail-on-mismatch[=all] ]
[--ignore-rmtlc] [--remote-path=PATH] [--update-atime]
[--update-mtime] [--skip-metadata-update] [--no-stamp-live]
[--lock-level=partial|implicit]
[ --log-target=syslog | --log-format=json ]
( -E RESTRICTION_ALIASES[:RESTRICTION_PATHS] |
  --endpoint-exclude=EXCLUSION_ALIASES[:EXCLUSION_PATHS] )*
( -P ALIASES:PARAMETER=VALUE )*

Description

Migrates files from the local GPFS file system to a Storage Target.

Options

--config-file=FILE
                path to a master configuration file.
                Default: /opt/arcapix/etc/ngenea.conf
--default-stub-size=LENGTH
                default approximate length of a beginning file segment that
                should be retained during file migration. This setting can be
                overridden in a configuration file.
                Default: 0 (free up the entire file content).
-E, --endpoint=ALIASES[:PATHS]
                restrict the set of storage endpoints for interaction to
                endpoints with aliases specified by extended glob
                pattern ALIASES.
                Optionally, restrict remote object pathnames at those endpoints
                to pathnames matching extended glob pattern PATHS.
                By default, restrict remote object pathnames to the root path.
                Compatible with the option: --no-recursion-remote
--endpoint-exclude=ALIASES[:PATHS]
                exclude remote object pathnames specified by extended glob
                pattern PATHS at storage endpoints with aliases specified by
                extended glob pattern ALIASES from processing.
                By default, exclude remote object pathnames at the root path.
                Compatible with the option: --no-recursion-remote
-f FILELIST     process files and directories from a filelist file.
--fail-on-mismatch[=content|all]
                fail migrating a file, directory, or symbolic link if a remote
                object (with a matching UUID) or folder exists but:
                "content" - has a different hash (default);
                    "all" - has a different hash or different metadata.
                Conflicts with the option: --overwrite-remote
--filelist-format=LF|NUL|quoted
                format of a filelist file:
                "LF"     - filenames delimited by newlines; a filename cannot
                           contain newline characters;
                "NUL"    - filenames delimited by the NUL (0) byte;
                "quoted" - filenames possibly enclosed in single or double
                           quotes and delimited by newlines.
                Default: "LF".
                Compatible with the option: -f FILELIST
--force-stub-size=LENGTH
                retain a segment of every migrated file starting from its
                beginning and having a specified approximate length in bytes.
                Conflicts with the option: --sync-metadata
--help          display this help and exit.
--ignore-rmtlc  always use local file names to deduce remote object names.
                Default: read remote location xattrs to determine the names of
                         remote objects for migrated, premigrated, or metadata
                         synced files.
--lock-level=partial|implicit
                DMAPI locking level:
                "partial"  - explicitly request a DMAPI shared access right
                             for the duration of migrating a file and
                             explicitly request exclusive DMAPI access rights
                             when updating file xattrs or stubbing a file;
                "implicit" - instruct DMAPI to self-manage access rights per
                             file block when migrating and also self-manage
                             DMAPI access rights when updating file xattrs or
                             stubbing a file.
                Default: "partial"; can be specified in the configuration file
                         or overridden by command line.
--log-format=json
                log messages in JSON format.
                Conflicts with the option: --log-target=syslog
--log-target=syslog
                redirect all logging to the syslog.
                Conflicts with the option: --log-format=json
--no-recursion-remote
                disable recursive interpretation of restriction and exclusion
                extended glob patterns for remote object pathnames.
                The recursive interpretation means matching sub-directories at
                all nesting levels, whereas non-recursive interpretation means
                matching a single directory.
                Compatible with the options: -E, --endpoint; --endpoint-exclude
--no-stamp-live disable stamping the xattrs of a live file and changing its
                status when premigrating a snapshot file.
--overwrite-remote
                overwrite remote objects if they already exist--do not create
                remote object instances with various UUID suffixes.
                Overwrite the metadata of already existing remote folders for
                local directories migrated explicitly.
                Conflicts with the options: --sync-metadata; --fail-on-mismatch
-P, --param-endpoint=ALIASES:PARAMETER=VALUE
                add a parameter with name PARAMETER and value VALUE to
                parameters read from configuration files for storage endpoints
                with aliases specified by extended glob pattern ALIASES.
                If PARAMETER already exists in a configuration file, it takes
                a new VALUE.
-p, --premigrate
                retain the content of every migrated file and do not set the
                OFFLINE flag for the file.
                Conflicts with the option: --sync-metadata
-r, --recursion-local
                if program arguments specify directory names, process files in
                those directories and their sub-directories recursively.
                Default: process specified directories but not their content.
--remote-path=PATH
                migrate a single file, directory, or symbolic link to remote
                object or folder PATH or migrate multiple files, directories,
                or symbolic links to remote folder PATH (ending with `/').
                Default: migrate files, directories, or symbolic links to
                         remote locations deduced from the local paths of the
                         files, directories, or symbolic links.
--skip-metadata-update
                disable updating the metadata of a remote object if its UUID
                and hash are equal to the UUID and hash of a local file.
                Disable updating the metadata of folders and symbolic links.
                Conflicts with the option: --sync-metadata
--sync-metadata update remote object or folder metadata based on the status of
                local files, directories, or symbolic links.
                Conflicts with the options: -p, --premigrate;
                                            --overwrite-remote;
                                            --skip-metadata-update;
                                            --force-stub-size
--update-atime  update the access time and status change time of local files to
                "now" after successful migration.
--update-mtime  update the modification time and status change time of local
                files to "now" after successful migration.
-v, --verbose[=LEVEL]
                verbosity level:
                0 = error and warning messages (also used when this option
                    is absent);
                1 = print the names of successfully migrated files (default);
                2 = debug messages, excluding those related to file locking;
                3 = enable core dump and debug messages related to file locking;
                    print PID and current time with microsecond precision.
-V, --version   display version information and exit.

Exit Status

On successful completion, ngmigrate returns exit status 0. If ngmigrate was called to migrate files, successful completion means that all files were migrated successfully, and there were no warning messages printed.

On unsuccessful completion, ngmigrate returns exit status 1, which means that none of the files were migrated successfully.

On partially successful completion, ngmigrate returns exit status 2. This means either that some files were migrated successfully and some were not, or that all files were migrated successfully but ngmigrate printed one or more warning messages.
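
A wrapper script can branch on these three statuses. The following is a minimal sketch only: the function names describe_ngmigrate_status and migrate_and_report are hypothetical helpers, not part of Ngenea, and the message texts are illustrative.

```shell
# Map an ngmigrate exit status to a short description.
# describe_ngmigrate_status is a hypothetical helper, not part of Ngenea.
describe_ngmigrate_status() {
    case "$1" in
        0) echo "success: all files migrated, no warnings" ;;
        1) echo "failure: no files migrated" ;;
        2) echo "partial: some files failed or warnings were printed" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Run ngmigrate with the given arguments and report the outcome on stderr.
# The original exit status is returned so that callers can still act on it.
migrate_and_report() {
    ngmigrate "$@"
    status=$?
    describe_ngmigrate_status "$status" >&2
    return "$status"
}
```

For example, migrate_and_report /mmfs1/data/file1 runs ngmigrate on the file and prints one of the descriptions above.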

Examples

To migrate a file to the associated Storage Target:

ngmigrate /mmfs1/data/file1

To migrate a file to a storage target, using a custom configuration file (which may redefine the storage target) to replace the default configuration file:

ngmigrate --config-file=/path/to/custom.conf /mmfs1/data/file1

To migrate multiple files to their associated Storage Targets:

ngmigrate /mmfs1/data/file1 /mmfs1/data/file2

To migrate all files in directory /mmfs1/data/ starting with name file to the associated Storage Target:

ngmigrate /mmfs1/data/file*

To migrate all files starting with name file and name newfile in directory /mmfs1/data/ to the associated Storage Target:

ngmigrate /mmfs1/data/file* /mmfs1/data/newfile*

To migrate all files, except hidden ("dot") files, within a directory to the associated Storage Target:

ngmigrate /mmfs1/data/*

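To migrate all files, including hidden ("dot") files, within a directory to the associated Storage Target (note that the pattern .??* does not match hidden files whose names have only one character after the dot, such as .a):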

ngmigrate /mmfs1/data/{.??,}*

To migrate all files, except hidden ("dot") files, within two different directories to their associated Storage Targets:

ngmigrate /mmfs1/data/dir1/* /mmfs1/data/dir2/*

Handling a Too Long List of Arguments

If there are too many files in a directory, invoking ngmigrate to process files in the directory using a glob pattern may fail with the "Argument list too long" error. For example, if the directory /mmfs1/data contains too many files, the following command fails:

$ ngmigrate /mmfs1/data/*
bash: /opt/arcapix/bin/ngmigrate: Argument list too long

In this situation, a user can invoke ngmigrate to process all files in the directory recursively by passing the option -r and a directory name instead of passing a glob pattern, for example:

ngmigrate -r /mmfs1/data

In this case, ngmigrate will process all files in the directory /mmfs1/data and all its descending subdirectories. If ngmigrate encounters files or directories with duplicate dev/ino pairs, it processes only the instances of those files or directories that it finds first.

To process files matching a glob pattern in a directory that is too large for shell expansion, a user can use the standard find and xargs commands, for example:

find /mmfs1/data -name '*.bin' -print0 | xargs -0 -n100 ngmigrate

This command scans the directory /mmfs1/data and all its descending subdirectories, finds files with names matching the glob pattern *.bin, and executes ngmigrate, passing those file names as its arguments. For every ngmigrate invocation, xargs passes no more than 100 arguments, so the "Argument list too long" error does not occur.

Alternatively, a user can pass a long list of files via a filelist, for example:

find /mmfs1/data -name '*.bin' -print0 | ngmigrate --filelist-format=NUL -f-

This command makes ngmigrate read a NUL-separated list of files to migrate piped from the find command.
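
When the same list is needed more than once (for example, to inspect it before migrating), it can be written to a file first and then passed with -f. The following is a minimal sketch; the helper names build_filelist, count_filelist, and migrate_filelist are hypothetical, and the -type f test is an addition that restricts the list to regular files.

```shell
# Write a NUL-delimited list of regular files under directory $1 whose
# names match glob pattern $2 into file $3.
build_filelist() {
    find "$1" -type f -name "$2" -print0 > "$3"
}

# Count the entries in a NUL-delimited filelist by counting its NUL bytes.
count_filelist() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Pass a saved NUL-delimited filelist to ngmigrate (defined, not called here).
migrate_filelist() {
    ngmigrate --filelist-format=NUL -f "$1"
}
```

A typical sequence would be build_filelist /mmfs1/data '*.bin' /tmp/bin.list, followed by count_filelist /tmp/bin.list to check the size of the list, and then migrate_filelist /tmp/bin.list.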

Matching Storage Endpoints

The options --endpoint and --endpoint-exclude respectively restrict processing to, or exclude from processing, the storage endpoints whose aliases match a pattern.

For example, to migrate files to those Storage Targets that match an extended glob pattern:

ngmigrate --endpoint='awss3-*|fs-*' /mmfs1/data/dir1/*

Only storage endpoints which match the pattern will be taken into consideration.

To migrate files to those Storage Targets that do not match an extended glob pattern:

ngmigrate --endpoint-exclude='awss3-*|fs-*' /mmfs1/data/dir1/*

Only storage endpoints which don't match the pattern will be taken into consideration.

Premigrating from Snapshots

When directories and symbolic links located in snapshots are passed to ngmigrate, it uploads them to remote folders and objects with names deduced from the names of the corresponding "live" (i.e. located outside of any snapshot) directories and symbolic links.

When files located in snapshots are passed to ngmigrate and the files do not have remote location extended attributes for a storage endpoint, ngmigrate uploads the files to remote objects at that storage endpoint with names deduced from the names of the corresponding "live" files.

Example:

ngmigrate -p /mmfs1/path/to/fileset1/.snapshots/snap9/subdir1/subdir2/file2.txt

The above command uploads the specified file to the remote object "path/to/fileset1/subdir1/subdir2/file2.txt" (possibly with a UUID extension). In the above example ".snapshots" is a snapshot directory within the fileset "fileset1", and "snap9" is a snapshot name. Ngenea removes the snapshot directory and name path components upon uploading.
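
The naming rule in this example can be sketched as a small path transformation. This is an illustration of the documented rule only, not Ngenea code: the function name snapshot_to_remote_name is hypothetical, the snapshot directory is assumed to be named ".snapshots", the remote name is assumed to be the path relative to the mount point (as in the example above), and the mount point is assumed to contain no characters that are special in a sed pattern.

```shell
# Derive the remote object name for a snapshot path.
#   $1 = path of a file inside a snapshot
#   $2 = mount point of the file system (for example /mmfs1)
# Removes the ".snapshots/<snapshot name>/" components first, then the
# mount point prefix, leaving the path of the corresponding "live" file
# relative to the mount point.
snapshot_to_remote_name() {
    printf '%s\n' "$1" |
        sed -e 's|/\.snapshots/[^/]*/|/|' -e "s|^$2/||"
}
```

Calling snapshot_to_remote_name with the snapshot path from the example and /mmfs1 as the mount point yields the object name quoted above, without any UUID extension.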

When files located in snapshots have remote location extended attributes (APXrmtXX) for a storage endpoint, ngmigrate uploads the files to remote objects at that storage endpoint with names deduced from those attributes. Pass the option --ignore-rmtlc to ignore the remote location extended attributes in this case and upload the files to remote objects with names deduced from the names of the corresponding "live" files.

Determining the UUID of a Premigrated Snapshot File

The ngmigrate tool determines a UUID for premigrating a snapshot file by the following rules:

  1. If a "live" file exists, is "normal" (online) or "premigrated", has the same fsid/ino/igen triple as the premigrated snapshot file, and also has an Ngenea APXguuid extended attribute, ngmigrate uses a UUID from the extended attribute.

  2. Otherwise, if the premigrated snapshot file has an Ngenea APXguuid extended attribute, ngmigrate uses a UUID from the extended attribute.

  3. Otherwise, ngmigrate randomly generates a UUID.

If a determined UUID is equal to the UUID metadata of a corresponding remote object, ngmigrate will replace the remote object with the premigrated file.

If a determined UUID is different from the UUID metadata of a corresponding remote object, ngmigrate will upload the premigrated file with a UUID extension to avoid overwriting the remote object.
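
The selection rules and the two outcomes described in this section can be summarised as a small decision sketch. It models only the logic stated here: the function names choose_uuid and upload_action are hypothetical, the caller is assumed to have already checked the eligibility conditions of rule 1, and the use of uuidgen for rule 3 is an assumption about the local system rather than a statement about how ngmigrate generates UUIDs.

```shell
# Pick the UUID for a premigrated snapshot file.
#   $1 = APXguuid of an eligible "live" file (empty if rule 1 does not apply)
#   $2 = APXguuid of the snapshot file (empty if absent)
choose_uuid() {
    if [ -n "$1" ]; then
        echo "$1"    # rule 1: reuse the UUID of the "live" file
    elif [ -n "$2" ]; then
        echo "$2"    # rule 2: reuse the UUID of the snapshot file
    else
        uuidgen      # rule 3: random UUID (assumes uuidgen is installed)
    fi
}

# Describe what happens to an existing remote object.
#   $1 = determined UUID
#   $2 = UUID metadata of the corresponding remote object
upload_action() {
    if [ "$1" = "$2" ]; then
        echo "replace remote object"
    else
        echo "upload with UUID extension"
    fi
}
```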

If local and remote paths must match exactly after premigrating files from a snapshot, pass the --overwrite-remote option. On encountering a collision, ngmigrate then overwrites the object in the storage endpoint.

Before using this option, users should make sure they understand their data workflow. Enabling any versioning support in the storage endpoint is advisable in case data is overwritten inadvertently.

Stamping a "Live" File

If a "live" file exists, is "normal" (online) or "premigrated", and has the same fsid/ino/igen triple as a premigrated snapshot file, then on successful premigration of the snapshot file, ngmigrate sets Ngenea extended attributes for the "live" file and changes its status from "normal" to "premigrated".

Setting Ngenea extended attributes for the "live" file and changing its status from "normal" to "premigrated" can be disabled by passing the option --no-stamp-live to ngmigrate.

Limitations

Migration failures may be observed when using the ngmigrate tool due to violation of one or more of the following limitations:

Maximum DMAPI Xattr Value Length

Ngenea stores information about remote objects corresponding to a local migrated (i.e. stub or premigrated) file in its DMAPI extended attributes. DMAPI extended attributes must be accessed by DMAPI-specific functions, and the standard commands getfattr and setfattr for manipulating extended attributes cannot access them.

PixStor limits the value of a DMAPI extended attribute of a file to 1022 bytes. Therefore, the remote location string of a migrated file, which is stored in a DMAPI extended attribute, cannot exceed 1022 bytes. This in turn restricts the length of the file name part that is stored in that attribute as part of the remote location.

On an attempt to migrate a file with a remote location string longer than 1022 bytes, ngmigrate will report an error and will not migrate the file to that particular remote location.

When migrating files to any storage target other than the filesystem storage target, ngmigrate percent-encodes special characters in the names of remote objects corresponding to local files (unless this mode is disabled by the configuration parameter EscapeNames=false, where that parameter is available). Therefore, if the name of a local file contains special characters, the 1022-byte limit may be exceeded even for a short file name.
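
The effect of percent-encoding on name length can be estimated before migration. The sketch below rests on assumptions: the exact set of characters that ngmigrate encodes is not specified here, so the code treats every byte outside A-Z, a-z, 0-9, ".", "_", "~", "/" and "-" as encoded into three bytes (%XX). It also measures the name only, whereas the stored remote location string may contain more than the name. The function names encoded_length and fits_dmapi_limit are hypothetical.

```shell
# Estimate the percent-encoded length, in bytes, of a name.
# Assumption: every byte outside A-Z a-z 0-9 . _ ~ / - becomes %XX,
# which adds 2 bytes to the length for each such byte.
encoded_length() {
    total=$(printf '%s' "$1" | wc -c)
    special=$(printf '%s' "$1" | LC_ALL=C tr -d 'A-Za-z0-9._~/-' | wc -c)
    echo $((total + 2 * special))
}

# Succeed if the estimated encoded length is within the 1022-byte limit.
fits_dmapi_limit() {
    [ "$(encoded_length "$1")" -le 1022 ]
}
```

Under these assumptions, a name consisting of 400 space characters is estimated at 1200 bytes after encoding and would not fit, although its unencoded length is well below the limit.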

Where files with very long names share a few common path prefixes, additional storage endpoints can be configured for those path prefixes to shorten the name parts stored in the DMAPI extended attributes. This approach also shortens the names of remote objects corresponding to local files, which helps to avoid exceeding the maximum object name length of a particular storage target.

Maximum Object Name Length

Storage targets impose limits on the length of remote object and folder names. A long local file name or directory path when converted to a remote object name may exceed the allowed character limit of the storage target. Therefore, ngmigrate may fail to create the remote object or folder.

Additionally, ngmigrate may store the metadata of remote objects and folders in separate shadow metadata objects with names formed by prepending . and appending .xattr to an object or folder name. Where the length of a remote object or folder name is close to the maximum allowed, the shadow metadata object name, which includes the .xattr suffix, may exceed that maximum, and the migration of the local file or directory then fails.
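
A name can be checked against a target's limit together with its shadow metadata name, which is 7 bytes longer. The following is a sketch under stated assumptions: the "." is assumed to be prepended to the last path component rather than to the whole path, the limit is target-specific and must be supplied by the caller, and the function names shadow_name and fits_name_limit are hypothetical.

```shell
# Form a shadow metadata object name by prepending "." and appending
# ".xattr" to the last path component of $1 (assumption: the "." goes on
# the final component, not on the whole path).
shadow_name() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    if [ "$dir" = "." ]; then
        echo ".$base.xattr"
    else
        echo "$dir/.$base.xattr"
    fi
}

# Succeed if both the object name $1 and its shadow metadata name fit
# within a limit of $2 bytes.
fits_name_limit() {
    name_len=$(printf '%s' "$1" | wc -c)
    shadow_len=$(printf '%s' "$(shadow_name "$1")" | wc -c)
    [ "$name_len" -le "$2" ] && [ "$shadow_len" -le "$2" ]
}
```

For instance, the 12-byte name dir/file.txt has the 19-byte shadow name dir/.file.txt.xattr under these assumptions, so it passes a limit of 19 bytes but not a limit of 18 bytes.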

Modifying Migrated Files

Modifying files containing DMAPI extended attributes set by Ngenea may remove the extended attributes from those files. Removing DMAPI extended attributes from a file deletes information about remote objects corresponding to the file. Without this information, operations that depend on it, such as recalling the file, cannot be performed correctly.

Third-party tools that modify a file by creating a new file with updated content and renaming the new file using the same name as the original file may remove DMAPI extended attributes from the original file. If possible, such third-party tools should be configured to modify a file by writing to it directly, without creating a new file with updated content.

For example, Vim (Vi IMproved, a programmer's text editor) may require setting the option bkc to yes by issuing the command :set bkc=yes. The Vim documentation describes that option as

'backupcopy'      'bkc'     make backup as a copy, don't rename the file

The command set bkc=yes can be specified in the file .vimrc to set default behavior.

ACL Support

Ngenea supports saving and restoring the ACLs of files and directories. This behaviour is particularly applicable to 'follow-the-sun' type workflows where security of data is enforced across global sites.

To enable saving the ACLs of local files and directories in a storage endpoint via ngmigrate, specify the parameter ACLSave=true in the configuration file for the storage endpoint.

To disable restoration of remote ACLs to local files and directories on reverse stubbing/premigration, pass the option --no-restore-acl to ngrecall.

As with other metadata of local directories on reverse stubbing/premigration, ngrecall does not restore (i.e. overwrite) the ACLs of pre-existing local directories.

ACL Behaviour

Ngenea ACL behaviour differs depending upon the ACL support of the target endpoint type.

Object Store

  • Local file ACLs are saved to the metadata of remote objects

  • Local directory ACLs are saved as shadow metadata remote objects

GPFS Filesystem

  • Local ACLs are copied to the ACLs of files and directories in the storage endpoint

A GPFS filesystem supports either POSIX or NFS4 ACLs.

The -k option of mmchfs configures the ACL semantics of a GPFS filesystem.

Ngenea requires the ACL type configuration of the destination GPFS filesystem to be identical to that of the source GPFS filesystem in order to successfully restore the ACLs of local files and directories on reverse stubbing/premigration.

Where the ACL type of the destination filesystem differs from that stored in the endpoint, --no-restore-acl can be passed to ngrecall to skip restoration of incompatible ACL types and successfully recall the files and/or directories.

POSIX-compliant Filesystem

  • Local ACLs are saved to the metadata of remote objects and folders in the storage endpoint