Utils
Provides various useful utility functions built on top of the API.
Description
-
arcapix.fs.gpfs.utils.expand_hosts(hosts)
Takes a list of hosts, which can be nodes and/or nodeclasses, and expands nodeclasses into a list of individual nodes (without repetition).
Parameters: hosts (list) – list of names of nodes and/or nodeclasses
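The expansion described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: `expand_hosts_sketch` and the `nodeclasses` dict are hypothetical stand-ins, since the real function looks up nodeclass membership from the cluster, and whether it preserves input order is an assumption.

```python
def expand_hosts_sketch(hosts, nodeclasses):
    """Expand nodeclass names into member nodes, without repetition.

    `nodeclasses` is a hypothetical dict mapping a nodeclass name to its
    member node names; it stands in for a cluster lookup.
    """
    expanded = []
    seen = set()
    for host in hosts:
        # A nodeclass contributes its members; a plain node contributes itself.
        for node in nodeclasses.get(host, [host]):
            if node not in seen:
                seen.add(node)
                expanded.append(node)
    return expanded


classes = {"nsdNodes": ["node1", "node2"]}
result = expand_hosts_sketch(["node2", "nsdNodes", "node3"], classes)
print(result)
```

Here `node2` appears both directly and as a member of `nsdNodes`, but is only listed once in the result.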
-
arcapix.fs.gpfs.utils.change_migration_threshold(fs, pool, high, low=None, pre=None, update=False, index=0)
Set the migration threshold(s) for a given pool.
Parameters:
- fs (str) – name of the filesystem the pool belongs to
- pool (str) – name of the pool whose threshold should be changed
- high (int) – the upper limit at which migration should be triggered
- low (int) – the lower limit at which migration should stop
- pre (int) – pre-migration threshold
- update (bool) – if True, check for any existing placement rules for the specified pool and update the migration threshold of those. If False, or no relevant placement rules exist, a new one will be created.
- index (int) – position in the placement policy at which the new migration rule should be inserted. Default = 0, the top of the policy.
-
arcapix.fs.gpfs.utils.get_filesystem_by_path(path)
Get the filesystem that a path belongs to.
Parameters: path (str) – path to a file in a GPFS filesystem
Returns: matching filesystem, or None if one can’t be found
Return type: Filesystem
-
arcapix.fs.gpfs.utils.get_filesystem_from_target(target)
Get the filesystem corresponding to a target.
Like get_filesystem_by_path, but also supports a filesystem name and /dev/<fsname>.
Parameters: target (str) – a filesystem name or path
Return type: Filesystem object
Raises: ValueError if a matching filesystem can’t be found
-
arcapix.fs.gpfs.utils.get_default_placement_pool(filesystem)
Returns the default placement pool for the filesystem, expanding any macros used.
Parameters: filesystem (str) – filesystem to find the default placement rule for
Returns: pool name
-
arcapix.fs.gpfs.utils.get_fileset_placement_pool(filesystem, fileset=None)
Polls the filesystem PlacementPolicy to try to figure out what pool files in ‘fileset’ are assigned to.
Fileset placement can be specified either as “FOR FILESET …” or “WHERE FILESET_NAME …”. If no specific placement rule exists, the default placement rule is used.
Any pool macros are resolved.
Parameters: if fileset is None, the default placement pool is returned
Returns: pool name
-
class arcapix.fs.gpfs.utils.snapshot_rotation(fs, fmt)
Context manager to perform snapshot rotation.
>>> with snapshot_rotation('mmfs1', 'apsync-%Y%d%m%H%M%S') as sr:
...     apsync('mmfs1', sr.oldsnap.name, sr.newsnap.name)
Finds the most recent existing snapshot matching a given format. On context entry, creates a new snapshot according to the same format.
Snapshot objects for these snapshots can be accessed from the rotation object as attributes oldsnap and newsnap respectively.
On context exit, if no errors were raised, the older snapshot is deleted. Otherwise, the newer snapshot is deleted.
The name format should be a valid ‘strptime’ format string. To use an alternate naming scheme, create a subclass which overrides the generate_name and match_name methods.
Parameters:
- fs – filesystem name or object
- fmt – a strptime compatible format string
-
generate_name()
Generate snapshot name based on fmt.
-
match_name(name)
Check if a name matches fmt.
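The rotation behaviour and the two naming hooks can be illustrated with a small, self-contained sketch. It operates on a plain list of snapshot names rather than real snapshots, and `rotation_sketch`, its `snapshots` list and its `now` argument are hypothetical. It also picks the "most recent" snapshot by parsing the timestamp in the name, which is an assumption about how the real class orders snapshots.

```python
from contextlib import contextmanager
from datetime import datetime


def generate_name(fmt, now=None):
    """Render a snapshot name from a strftime/strptime-style format."""
    return (now or datetime.now()).strftime(fmt)


def match_name(fmt, name):
    """Return True if `name` parses under `fmt`."""
    try:
        datetime.strptime(name, fmt)
    except ValueError:
        return False
    return True


@contextmanager
def rotation_sketch(snapshots, fmt, now=None):
    """Mimic snapshot rotation on a list of names (illustrative only)."""
    matching = [s for s in snapshots if match_name(fmt, s)]
    # Assumption: "most recent" means the latest timestamp encoded in the name.
    old = max(matching, key=lambda s: datetime.strptime(s, fmt)) if matching else None
    new = generate_name(fmt, now)
    snapshots.append(new)            # context entry: create the new snapshot
    try:
        yield old, new
    except Exception:
        snapshots.remove(new)        # errors raised: drop the newer snapshot
        raise
    else:
        if old is not None:
            snapshots.remove(old)    # clean exit: drop the older snapshot


FMT = "apsync-%Y%d%m%H%M%S"
snaps = ["apsync-20240103120000", "manual-snap"]
with rotation_sketch(snaps, FMT, datetime(2024, 3, 5, 14, 30, 0)) as (old, new):
    pass  # a sync between `old` and `new` would run here
print(snaps)
```

A subclass overriding generate_name and match_name would replace the two helper functions here with its own naming scheme.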
-
arcapix.fs.gpfs.utils.parse_policy_options(option_string)
Used to parse the options passed to --policy-options.
Note: unsupported flags raise a warning, rather than an error.
-
arcapix.fs.gpfs.utils.generate_criteria_from_list(filters, or_operation=False, invert=False)
Creates a list of _Criterion objects that contains all of the provided conditions.
>>> [
>>>     ('like', 'name', 'file', {'caseInsensitive': True}),
>>>     ('gt', 'access', 30, {})
>>> ]
Results in:
>>> [
>>>     Criteria.like('name', 'file', caseInsensitive=True),
>>>     Criteria.gt('access', 30)
>>> ]
Policy output:
>>> WHERE(LOWER(name) LIKE 'file' AND DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30)
Returns: list containing _Criterion objects
SnapDiff
SnapDiff – Find the differences between two snapshots
-
arcapix.fs.gpfs.utils.snapdiff.merge_diff(old, new)
Finds the difference between two lists of files. Takes iterables of tuples of the form ((inode, gen), snapid, path, size, type). Assumes iterables are ordered by inode number.
Based on the merge phase of a merge sort:
O = [1, 2]; N = [2, 3]
O N
----
1   - 1 deleted from N
2 2 - 2 present in both, check snapid and path for modification
  3 - 3 created in N
Acts as a generator, returning tuples of the form (diff_type, files, size, type).
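The merge-phase comparison can be sketched as a small generator. This is a simplified illustration, not the library's code: entries are reduced to hypothetical `(inode, path)` tuples instead of the full `((inode, gen), snapid, path, size, type)` form, and the diff type names `"deleted"`, `"created"` and `"modified"` are placeholders.

```python
def merge_diff_sketch(old, new):
    """Yield (diff_type, inode) pairs from two iterables sorted by inode."""
    old_it, new_it = iter(old), iter(new)
    o = next(old_it, None)
    n = next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and o[0] < n[0]):
            # Present only in the old list.
            yield ("deleted", o[0])
            o = next(old_it, None)
        elif o is None or n[0] < o[0]:
            # Present only in the new list.
            yield ("created", n[0])
            n = next(new_it, None)
        else:
            # Same inode in both: compare the remaining fields.
            if o[1:] != n[1:]:
                yield ("modified", o[0])
            o = next(old_it, None)
            n = next(new_it, None)


old_files = [(1, "/a"), (2, "/b")]
new_files = [(2, "/b"), (3, "/c")]
diffs = list(merge_diff_sketch(old_files, new_files))
print(diffs)
```

Because both inputs are consumed in a single pass, neither list has to be held in memory in full, which is why the inputs must already be ordered by inode number.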
-
arcapix.fs.gpfs.utils.snapdiff.parse_line(string)
Parse a line from a work file.
Returns: tuple that can be passed to merge_diff
-
arcapix.fs.gpfs.utils.snapdiff.check_snapshot_order(filesystem, snap1, snap2, fsetName=None)
Check that two snapshots are in the right order.
Returns: True if snap1 is older than snap2, else False
-
arcapix.fs.gpfs.utils.snapdiff.check_snapshot_compatibility(fs, snap1, snap2, fset=None)
Checks that both snapshots belong to the same fileset (or are both global snapshots).
Returns: True if the snapshots can be snapdiffed
Return type: bool
-
arcapix.fs.gpfs.utils.snapdiff.check_cache_file(path, fs, snapshot, fset=None)
Check whether the cache file is newer than its snapshot.
If not, that may indicate that the cache file corresponds to an older snapshot which happens to have the same name.
Using a mismatched cache file can result in errors or missing files.
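The freshness check amounts to comparing two timestamps. The sketch below assumes the snapshot's creation time is already available as a Unix timestamp. `cache_is_newer` is a hypothetical helper, and how the real function obtains the creation time is not covered here.

```python
import os
import tempfile
import time


def cache_is_newer(cache_path, snapshot_created):
    """Return True if the cache file was modified after the snapshot was created.

    `snapshot_created` is a Unix timestamp (assumed to be known already).
    """
    return os.path.getmtime(cache_path) > snapshot_created


# A temporary file stands in for a cache (list) file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    cache_path = tmp.name

now = time.time()
fresh = cache_is_newer(cache_path, now - 3600)  # snapshot created an hour ago
stale = cache_is_newer(cache_path, now + 3600)  # snapshot created after the cache
os.remove(cache_path)
```

In the second case the cache file predates the snapshot, which is the situation the check is meant to flag.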
-
arcapix.fs.gpfs.utils.snapdiff.print_diff(fdiff)
Prints diff tuples, from snapdiff, with colours.
>>> for f in snapDiff(...):
...     print_diff(f)
...
+ /path/to/new/file
Parameters: fdiff (FileDiff) – object to print
-
arcapix.fs.gpfs.utils.snapdiff.get_list_path(fs, snapshot, fsetName=None, exclude=None, storageDir=None)
Get the path for the list file of files in a given snapshot.
The path is based on the snapshot scan arguments used to generate the lists:
<storageDir>/<fsname>-<fsetname|root>-<snapname>-<hash(exclude)>.list
<storageDir> defaults to <fsmount>/.policytmp/snapdiff
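The naming scheme above can be sketched as follows. The hash function is not specified in this documentation, so the truncated SHA-256 of the sorted exclude patterns used here is purely an assumption, as are the helper name `list_path_sketch` and its explicit `fsmount` argument.

```python
import hashlib
import posixpath


def list_path_sketch(fsmount, fsname, snapname, fsetname=None,
                     exclude=None, storage_dir=None):
    """Build <storageDir>/<fsname>-<fsetname|root>-<snapname>-<hash>.list."""
    if storage_dir is None:
        storage_dir = posixpath.join(fsmount, ".policytmp", "snapdiff")
    # Hypothetical hash: the real hash(exclude) may be computed differently.
    key = ",".join(sorted(exclude or []))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]
    filename = "{}-{}-{}-{}.list".format(fsname, fsetname or "root", snapname, digest)
    return posixpath.join(storage_dir, filename)


path_a = list_path_sketch("/mmfs1", "mmfs1", "snap1", exclude=["*.tmp", "*.bak"])
path_b = list_path_sketch("/mmfs1", "mmfs1", "snap1", exclude=["*.bak", "*.tmp"])
path_c = list_path_sketch("/mmfs1", "mmfs1", "snap1", fsetname="projects")
```

Including the exclude patterns in the file name means that scans of the same snapshot with different exclusions do not reuse each other's list files.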
-
arcapix.fs.gpfs.utils.snapdiff.snapDiff(fsName, old, new=None, fsetName=None, force=False, exclude=None, storageDir=None, filters=None, invertFilters=False, **kwargs)
Finds the differences between the files in two snapshots.
Parameters:
- fsName (str) – name of the filesystem to scan snapshots of
- old (str) – name of the older snapshot to scan
- new (str) – name of the newer snapshot to scan. If None, all files in ‘old’ will be returned.
- fsetName (str) – name of a fileset, for fileset snap diff
- force (bool) – don’t perform sanity checks. This may lead to unexpected behaviour.
- exclude (list) – list of exclude patterns
- storageDir (str) – directory to store list files in. Default = <fsmount>/snapdiff/
- filters (list) – list of _Criterion to be passed for excluding additional files
- invertFilters (bool) – if True, invert the filters so that only files matching them are returned, instead of being excluded
- **kwargs – additional options to pass through to policy.run. This can include nodes, threadLevel, etc.
Returns: generator of FileDiff
Return type: generator
Lazy Clib
The PixStor CLib package provides Python bindings for the GPFS C API. CLib is used throughout the PixStor Python API to improve performance, but only if it is installed, and in some cases only when the user has sufficient permissions.
However, importing CLib loads libgpfs.so into memory, and it’s not possible to unload it without exiting the Python process. This is a problem if you want to use the Python API to shut down PixStor, as PixStor won’t shut down while libgpfs.so is open.
Lazy Clib ensures that CLib is only imported at the point when it’s first needed.
It also allows for fully disabling the use of CLib within the Python API. When CLib is disabled, the Python API will fall back on slower, mm-command based functions.
>>> from arcapix.fs.gpfs.cluster import setDisableClib, setEnableClib
>>> from arcapix.fs.gpfs.utils import get_filesystem_by_path
>>>
>>> # disable LazyCLib
... setDisableClib()
>>>
>>> # function which uses clib if available
... get_filesystem_by_path('/mmfs1/path/to/file')
>>>
>>> # re-enable LazyClib
... setEnableClib()
Note
setDisableClib only affects LazyClib. CLib can still be imported directly via from arcapix.fs.gpfs import clib.
Even if CLib has been imported directly, setDisableClib will stop the Python API using CLib (via LazyClib).
Once lazy CLib is loaded it can’t be disabled.
Lazy CLib can be used in your own code to take advantage of this functionality:
>>> import os
>>> from arcapix.fs.gpfs.utils import LazyClib
>>>
>>> def stat(path):
...     if LazyClib() is not None:
...         return LazyClib().file.stat(path)
...     else:
...         return os.stat(path)
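The general deferred-import pattern behind this can be sketched with the standard library. This is not how LazyClib is implemented. `LazyModule` is a hypothetical class, and the standard `json` module stands in for CLib so the example runs anywhere.

```python
import importlib


class LazyModule:
    """Defer importing a module until it is first requested."""

    def __init__(self, name):
        self._name = name
        self._module = None
        self._enabled = True

    def disable(self):
        """Disable use of the module; has no effect once it is loaded."""
        if self._module is not None:
            return False
        self._enabled = False
        return True

    def enable(self):
        """Allow the module to be imported on the next call."""
        self._enabled = True

    def __call__(self):
        """Return the module, importing it on first use, or None if unavailable."""
        if not self._enabled:
            return None
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError:
                return None
        return self._module


lazy_json = LazyModule("json")        # nothing is imported at this point
encoded = lazy_json().dumps([1, 2])   # first call triggers the import
missing = LazyModule("no_such_module_for_demo")()
```

Callers check the return value for None and fall back to a slower path, mirroring the `stat` example above.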
Examples
Get the filesystem a path belongs to
>>> from arcapix.fs.gpfs.utils import get_filesystem_by_path
>>>
>>> fs = get_filesystem_by_path('/mmfs1/data')
>>>
>>> print(fs.name)
mmfs1
Find differences between two snapshots
>>> from arcapix.fs.gpfs.utils.snapdiff import snapDiff, print_diff
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap'):
...     print_diff(diff)
...
+ /path/to/new/file
- /path/to/deleted/file
Filter differences between two snapshots
>>> from arcapix.fs.gpfs.utils.snapdiff import snapDiff, print_diff
>>> from arcapix.fs.gpfs.criteria import Criteria
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap'):
...     print_diff(diff)
...
+ /path/to/new/file
+ /path/to/old/file2
- /path/to/deleted/archive.txt
- /path/to/deleted/file
>>>
>>> filters = [Criteria.Or(Criteria.like("name", "file"), Criteria.like("name", "file2"))]
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap', filters=filters):
...     print_diff(diff)
...
- /path/to/deleted/archive.txt
>>>
>>> for diff in snapDiff('mmfs1', 'oldsnap', 'newsnap', filters=filters, invertFilters=True):
...     print_diff(diff)
...
+ /path/to/new/file
+ /path/to/old/file2
- /path/to/deleted/file