Utils

Provides various useful utility functions built on top of the API

Description

arcapix.fs.gpfs.utils.expand_hosts(hosts)

Takes a list of hosts, which can be nodes and/or nodeclasses, and expands nodeclasses into a list of individual nodes (without repetition)

Parameters:hosts (list) – list of names of nodes and/or nodeclasses
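
As a rough illustration of the expansion described above, the sketch below works on plain Python data only. The `nodeclasses` dictionary and the name `expand_hosts_sketch` are hypothetical. How the real function looks up nodeclass members, and whether it preserves input order, is not stated here.

```python
def expand_hosts_sketch(hosts, nodeclasses):
    """Expand nodeclass names into member nodes, dropping repeats.

    `nodeclasses` is a hypothetical dict {nodeclass name: [node names]}
    standing in for whatever lookup the real function performs.
    """
    expanded = []
    seen = set()
    for host in hosts:
        # A nodeclass contributes its members; a plain node contributes itself
        for node in nodeclasses.get(host, [host]):
            if node not in seen:
                seen.add(node)
                expanded.append(node)
    return expanded


classes = {"nsdNodes": ["node1", "node2"]}
print(expand_hosts_sketch(["node2", "nsdNodes", "node3"], classes))
# ['node2', 'node1', 'node3']
```

Keeping a separate `seen` set makes the duplicate check cheap while the list keeps first-seen order.
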
arcapix.fs.gpfs.utils.change_migration_threshold(fs, pool, high, low=None, pre=None, update=False, index=0)

Set the migration threshold(s) for a given pool

Parameters:
  • fs (str) – name of the filesystem the pool belongs to
  • pool (str) – name of the pool whose threshold should be changed
  • high (int) – the upper limit at which migration should be triggered
  • low (int) – the lower limit at which migration should stop
  • pre (int) – pre-migration threshold
  • update (bool) – if True, check for any existing placement rules for the specified pool and update the migrate threshold of those. If False, or no relevant placement rules exist, a new one will be created.
  • index (int) – position in the placement policy at which the new migration rule should be inserted. Default = 0, the top of the policy.
arcapix.fs.gpfs.utils.get_filesystem_by_path(path)

Get the filesystem that a path belongs to

Parameters:path (str) – path to a file in a GPFS filesystem
Returns:matching filesystem, or None if one can’t be found
Return type:Filesystem
arcapix.fs.gpfs.utils.get_filesystem_from_target(target)

Get the filesystem corresponding to a target.

Like get_filesystem_by_path, but also supports a filesystem name and /dev/<fsname>.

Parameters:target (str) – a filesystem name or path
Return type:Filesystem object
Raises:ValueError if a matching filesystem can’t be found
arcapix.fs.gpfs.utils.get_default_placement_pool(filesystem)

Returns the default placement pool for the filesystem, expanding any macros used

Parameters:filesystem (str) – Filesystem to find the default placement rule for
Returns:pool name
arcapix.fs.gpfs.utils.get_fileset_placement_pool(filesystem, fileset=None)

Polls the filesystem PlacementPolicy to try to figure out what pool files in ‘fileset’ are assigned to.

Fileset placement can be specified either as “FOR FILESET …” or “WHERE FILESET_NAME …”. If no specific placement rule exists, the default placement rule is used.

Any pool macros are resolved.

Parameters:
  • filesystem (str) – filesystem the fileset belongs to
  • fileset (str) – fileset to find placement pool for

If fileset is None, return default placement pool

Returns:pool name
class arcapix.fs.gpfs.utils.snapshot_rotation(fs, fmt)

Context manager to perform snapshot rotation.

>>> with snapshot_rotation('mmfs1', 'apsync-%Y%d%m%H%M%S') as sr:
...     apsync('mmfs1', sr.oldsnap.name, sr.newsnap.name)

Finds the most recent existing snapshot matching a given format. On context entry, creates a new snapshot according to the same format.

Snapshot objects for these snapshots can be accessed from the rotation object as attributes oldsnap and newsnap respectively.

On context exit, if no errors were raised, the older snapshot is deleted. Else, if there were errors, the newer snapshot is deleted.
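
The entry and exit rules above can be mimicked on a plain list of names. This is only a sketch: `rotation_sketch` is a hypothetical name, and the list stands in for the filesystem's snapshots, so nothing is really created or deleted.

```python
from contextlib import contextmanager


@contextmanager
def rotation_sketch(snapshots, new_name):
    """Apply the rotation rules to a list of snapshot names (oldest first)."""
    old_name = snapshots[-1] if snapshots else None
    snapshots.append(new_name)          # context entry: create the new snapshot
    try:
        yield old_name, new_name
    except Exception:
        snapshots.remove(new_name)      # errors: discard the newer snapshot
        raise
    else:
        if old_name is not None:
            snapshots.remove(old_name)  # no errors: discard the older snapshot


snaps = ["apsync-1"]
with rotation_sketch(snaps, "apsync-2"):
    pass
print(snaps)  # ['apsync-2']

try:
    with rotation_sketch(snaps, "apsync-3"):
        raise RuntimeError("sync failed")
except RuntimeError:
    pass
print(snaps)  # ['apsync-2']
```

Either way exactly one snapshot survives, so a failed run leaves the previous good snapshot in place for the next attempt.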

The name format should be a valid ‘strptime’ format string. To use an alternate naming scheme, create a subclass which overrides the generate_name and match_name methods.

Parameters:
  • fs – filesystem name or object
  • fmt – a strptime compatible format string
generate_name()

Generate snapshot name based on fmt.

match_name(name)

Check if a name matches fmt.
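
To show how a single format string can serve both hooks, here is a minimal sketch using only the standard library. `NameFormatSketch` and the `now` argument are hypothetical; the real methods may be implemented differently.

```python
from datetime import datetime


class NameFormatSketch:
    """Hypothetical stand-in showing only the two naming hooks."""

    def __init__(self, fmt):
        self.fmt = fmt

    def generate_name(self, now=None):
        # Render a timestamp with the format (`now` is an illustrative extra)
        return (now or datetime.now()).strftime(self.fmt)

    def match_name(self, name):
        # A name matches if the same format can parse it back
        try:
            datetime.strptime(name, self.fmt)
        except ValueError:
            return False
        return True


naming = NameFormatSketch("apsync-%Y%d%m%H%M%S")
name = naming.generate_name(datetime(2024, 3, 5, 14, 25, 30))
print(name)                           # apsync-20240503142530
print(naming.match_name(name))        # True
print(naming.match_name("manual-1"))  # False
```

A subclass with a non-timestamp naming scheme would replace both methods together, so that every generated name is also recognised by the matcher.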

arcapix.fs.gpfs.utils.parse_policy_options(option_string)

Used to parse the options passed to --policy-options.

Note: unsupported flags raise a warning, rather than an error

arcapix.fs.gpfs.utils.generate_criteria_from_list(filters, or_operation=False, invert=False)

Creates a list of _Criterion objects that contains all of the provided conditions.

>>> [
...     ('like', 'name', 'file', {'caseInsensitive': True}),
...     ('gt', 'access', 30, {})
... ]

Results in:

>>> [
...     Criteria.like('name', 'file', caseInsensitive=True),
...     Criteria.gt('access', 30)
... ]

Policy output:

>>> WHERE(LOWER(name) LIKE 'file' AND DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30)
Parameters:
  • filters (list[tuple]) – List of tuples that will be used to generate criteria
  • or_operation (bool) – selects the logical operator used for the containing operation (the example above shows the default, which combines criteria with AND)
  • invert (bool) – Inverts certain criteria to exclude instead of isolate
Returns:

List containing _Criterion
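
The tuple-to-call mapping shown above can be sketched with `getattr` dispatch. `CriteriaStub` is a hypothetical stand-in that only records calls as strings, so the example runs without the library; the real function's internals and its handling of or_operation and invert are not shown.

```python
class CriteriaStub:
    """Hypothetical stand-in for Criteria; records calls as readable strings."""

    @staticmethod
    def like(field, value, **options):
        return "like(%r, %r, %r)" % (field, value, options)

    @staticmethod
    def gt(field, value, **options):
        return "gt(%r, %r, %r)" % (field, value, options)


def criteria_from_tuples(filters, factory=CriteriaStub):
    # Each tuple is (method name, field, value, keyword options)
    return [getattr(factory, op)(field, value, **options)
            for op, field, value, options in filters]


for criterion in criteria_from_tuples([
    ("like", "name", "file", {"caseInsensitive": True}),
    ("gt", "access", 30, {}),
]):
    print(criterion)
```

Looking the method up by name keeps the filter list as plain data, which is convenient when filters come from a config file or command line.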

SnapDiff

SnapDiff – Find the differences between two snapshots

arcapix.fs.gpfs.utils.snapdiff.merge_diff(old, new)

Finds the difference between two lists of files. Takes iterables of tuples of the form ((inode, gen), snapid, path, size, type). Assumes the iterables are ordered by inode number.

Based on the merge-phase of a merge sort

O = [1, 2]; N = [2, 3]

O  N
----
1      - 1 deleted from N
2  2   - 2 present in both, check snapid and path for modification
   3   - 3 created in N

Acts as a generator, returning tuples of the form (diff_type, files, size, type)
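
The merge-phase walk illustrated above can be sketched as follows. This is a simplified version, not the library's code: items are reduced to (key, path) pairs, inputs are lists rather than arbitrary iterables, and the diff_type labels are illustrative.

```python
def merge_diff_sketch(old, new):
    """Yield (diff_type, old_item, new_item) for two key-sorted lists."""
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i][0] < new[j][0]:
            # Key only in old: the file was deleted
            yield ("deleted", old[i], None)
            i += 1
        elif old[i][0] > new[j][0]:
            # Key only in new: the file was created
            yield ("created", None, new[j])
            j += 1
        else:
            # Key in both: report it only if the path differs
            if old[i][1] != new[j][1]:
                yield ("changed", old[i], new[j])
            i += 1
            j += 1
    # Whatever is left on either side has no counterpart
    for item in old[i:]:
        yield ("deleted", item, None)
    for item in new[j:]:
        yield ("created", None, item)


old = [(1, "/a"), (2, "/b")]
new = [(2, "/b"), (3, "/c")]
print(list(merge_diff_sketch(old, new)))
# [('deleted', (1, '/a'), None), ('created', None, (3, '/c'))]
```

Because both inputs are already sorted, each item is visited once, so the comparison is linear in the combined number of files.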

arcapix.fs.gpfs.utils.snapdiff.parse_line(string)

Parse line from work file.

Returns:tuple that can be passed to merge_diff
arcapix.fs.gpfs.utils.snapdiff.check_snapshot_order(filesystem, snap1, snap2, fsetName=None)

Check that two snapshots are in the right order

Parameters:
  • filesystem – Filesystem name or object that the snapshots belong to
  • snap1 (str) – name of the snapshot that should be older
  • snap2 (str) – name of the snapshot that should be newer
  • fsetName (str) – name of the fileset the snapshots belong to (if relevant)
Returns:

True if snap1 is older than snap2, else False

arcapix.fs.gpfs.utils.snapdiff.check_snapshot_compatibility(fs, snap1, snap2, fset=None)

Checks that both snapshots belong to the same fileset (or are both global snapshots)

Returns:True if the snapshots can be snapdiffed
Return type:bool
arcapix.fs.gpfs.utils.snapdiff.check_cache_file(path, fs, snapshot, fset=None)

Check whether the cache file is newer than its snapshot.

If not, that may indicate that the cache file corresponds to an older snapshot which happens to have the same name.

Using a mismatched cache file can result in errors or missing files.
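
The freshness check described above amounts to a timestamp comparison. The sketch below assumes the snapshot's creation time is already available as a Unix timestamp; `cache_is_newer` is a hypothetical name, and how the real function obtains the snapshot time is not shown.

```python
import os
import tempfile
import time


def cache_is_newer(cache_path, snapshot_created):
    """Return True if the cache file was modified after the snapshot was created."""
    return os.path.getmtime(cache_path) > snapshot_created


# Create an empty file to play the role of the cache file
with tempfile.NamedTemporaryFile(delete=False) as handle:
    cache_path = handle.name

now = time.time()
print(cache_is_newer(cache_path, now - 3600))  # True: the snapshot predates the cache
print(cache_is_newer(cache_path, now + 3600))  # False: the snapshot is newer than the cache
os.remove(cache_path)
```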

arcapix.fs.gpfs.utils.snapdiff.print_diff(fdiff)

Prints diff tuples, from snapdiff, with colours

>>> for f in snapDiff(...):
...     print_diff(f)
...
+ /path/to/new/file
Parameters:fdiff (FileDiff) – object to print
arcapix.fs.gpfs.utils.snapdiff.get_list_path(fs, snapshot, fsetName=None, exclude=None, storageDir=None)

Get the path for list file of files in a given snapshot.

Path is based on the snapshot scan arguments used to generate the lists:

<storageDir>/<fsname>-<fsetname|root>-<snapname>-<hash(exclude)>.list

<storageDir> defaults to <fsmount>/.policytmp/snapdiff
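
A minimal sketch of assembling a name in this layout is shown below. The digest is an assumption: the document does not say which hash is applied to the exclude patterns, so SHA-256 over the sorted patterns is used purely for illustration, and `list_path_sketch` is a hypothetical name.

```python
import hashlib
import os


def list_path_sketch(storage_dir, fs_name, snap_name, fset_name=None, exclude=None):
    # Assumed hashing scheme; the library's own hash may differ
    patterns = ",".join(sorted(exclude or []))
    digest = hashlib.sha256(patterns.encode("utf-8")).hexdigest()[:8]
    # A missing fileset name falls back to 'root', as in the layout above
    filename = "%s-%s-%s-%s.list" % (fs_name, fset_name or "root", snap_name, digest)
    return os.path.join(storage_dir, filename)


path = list_path_sketch("/mmfs1/.policytmp/snapdiff", "mmfs1", "snap1", exclude=["*.tmp"])
print(os.path.basename(path).startswith("mmfs1-root-snap1-"))  # True
print(path.endswith(".list"))                                  # True
```

Including a digest of the exclude patterns in the name means scans run with different excludes do not reuse each other's list files.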

arcapix.fs.gpfs.utils.snapdiff.snapDiff(fsName, old, new=None, fsetName=None, force=False, exclude=None, storageDir=None, filters=None, invertFilters=False, **kwargs)

Finds the differences between the files in two snapshots.

Parameters:
  • fsName (str) – Name of the filesystem to scan snapshots of
  • old (str) – Name of the older snapshot to scan
  • new (str) – Name of the newer snapshot to scan. If None, all files in ‘old’ will be returned
  • fsetName (str) – Name of a fileset, for fileset snap diff
  • force (bool) – don’t perform sanity checks. This may lead to unexpected behaviour.
  • exclude (list) – list of exclude patterns
  • storageDir (str) – directory to store list files in. Default = <fsmount>/snapdiff/
  • filters (list) – list of _Criterion objects used to exclude additional files
  • invertFilters (bool) – if True, invert the filters so that only matching files are returned instead of being excluded
  • **kwargs – additional options to pass through to policy.run. This can include nodes, threadLevel, etc.
Returns:

generator of FileDiff

Return type:

generator

Lazy Clib

The PixStor CLib package provides Python bindings for the GPFS C API. CLib is used throughout the PixStor Python API to improve performance, but only if it is installed, and in some cases only when the user has sufficient permissions.

However, importing CLib loads libgpfs.so into memory, and it’s not possible to unload it without exiting the python process.

This is a problem if you want to use the Python API to shut down PixStor, as PixStor won’t shut down while libgpfs.so is open.

Lazy Clib ensures that clib is only imported at the point when it’s first needed.

It also allows for fully disabling the use of CLib within the Python API. When CLib is disabled, the Python API will fall back on slower, mm-command based functions.

>>> from arcapix.fs.gpfs.cluster import setDisableClib, setEnableClib
>>> from arcapix.fs.gpfs.utils import get_filesystem_by_path
>>>
>>> # disable LazyCLib
... setDisableClib()
>>>
>>> # function which uses clib if available
... get_filesystem_by_path('/mmfs1/path/to/file')
>>>
>>> # re-enable LazyClib
... setEnableClib()

Note

setDisableClib only affects LazyClib. CLib can still be imported directly via from arcapix.fs.gpfs import clib.

Even if CLib has been imported directly, setDisableClib will stop the Python API using CLib (via LazyClib).

Once lazy CLib is loaded it can’t be disabled.

Lazy CLib can be used in your own code to take advantage of this functionality:

>>> import os
>>> from arcapix.fs.gpfs.utils import LazyClib
>>>
>>> def stat(path):
...     if LazyClib() is not None:
...         return LazyClib().file.stat(path)
...     else:
...         return os.stat(path)
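
The general pattern behind this section, deferring an import until first use and allowing it to be switched off, can be sketched with the standard library alone. `set_disabled` and `lazy_module` are hypothetical names; this is not how LazyClib is implemented, and it does not model the rule that a loaded lazy CLib can no longer be disabled.

```python
import importlib

_disabled = False
_cache = {}


def set_disabled(value):
    """Hypothetical switch, loosely mirroring setDisableClib/setEnableClib."""
    global _disabled
    _disabled = value


def lazy_module(name):
    """Import `name` on first use; return None if disabled or unavailable."""
    if _disabled:
        return None
    if name not in _cache:
        try:
            _cache[name] = importlib.import_module(name)
        except ImportError:
            # Remember the failure so the import is not retried on every call
            _cache[name] = None
    return _cache[name]


print(lazy_module("json").dumps([1, 2]))  # [1, 2]
print(lazy_module("no_such_module_xyz"))  # None
set_disabled(True)
print(lazy_module("json"))                # None
set_disabled(False)
```

Callers check for None and fall back to a slower path, which is the same shape as the stat example above.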

Examples

Get the filesystem a path belongs to

>>> from arcapix.fs.gpfs.utils import get_filesystem_by_path
>>>
>>> fs = get_filesystem_by_path('/mmfs1/data')
>>>
>>> print(fs.name)
mmfs1

Find differences between two snapshots

>>> from arcapix.fs.gpfs.utils.snapdiff import snapDiff, print_diff
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap'):
...     print_diff(diff)
...
+ /path/to/new/file
- /path/to/deleted/file

Filter differences between two snapshots

>>> from arcapix.fs.gpfs.utils.snapdiff import snapDiff, print_diff
>>> from arcapix.fs.gpfs.criteria import Criteria
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap'):
...     print_diff(diff)
...
+ /path/to/new/file
+ /path/to/old/file2
- /path/to/deleted/archive.txt
- /path/to/deleted/file
>>>
>>> filters = [Criteria.Or(Criteria.like("name", "file"), Criteria.like("name", "file2"))]
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap', filters=filters):
...     print_diff(diff)
...
- /path/to/deleted/archive.txt
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap', filters=filters, invertFilters=True):
...     print_diff(diff)
...
+ /path/to/new/file
+ /path/to/old/file2
- /path/to/deleted/file