Utilities

The utils module provides convenience methods built on the GPFS C API.

utils provides more complex functionality than CLib’s other classes, which are thin wrappers over the functions available in the GPFS C API.

Note

Most methods require root permission

Description

Miscellaneous Functions

arcapix.fs.gpfs.clib.utils.getfilesetid(pathname, name)

Get the fileset id for a named fileset.

Parameters
  • pathname (str) – path of any file in the filesystem to which the fileset belongs

  • name (str) – name of the fileset to get id for

Filesystem Snapshot Identifier Convenience Functions

arcapix.fs.gpfs.clib.utils.get_fsname_by_path(path)

Get the name of the filesystem a path belongs to.

e.g.

>>> get_fsname_by_path('/mmfs1/data')
'mmfs1'
Parameters

path (str) – path within a GPFS filesystem

Return type

str

arcapix.fs.gpfs.clib.utils.get_snapname_by_path(path)

Get the name of the snapshot a path belongs to.

e.g.

>>> get_snapname_by_path('/mmfs1/.snapshots/snap1/data')
'snap1'
Parameters

path (str) – path within a GPFS filesystem

Return type

str
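
The relationship between a snapshot path and its snapshot name can be illustrated with plain string handling. This is a minimal sketch, not the library's implementation: snapname_from_path is a hypothetical helper, it assumes snapshots are exposed under a directory named .snapshots (as in the example above), and returning None for a path outside any snapshot is a choice made for the sketch only.

```python
import posixpath

SNAPDIR = ".snapshots"  # assumption: snapshot directory name used in the examples


def snapname_from_path(path):
    """Return the path component that follows the snapshot directory, or None."""
    parts = posixpath.normpath(path).split("/")
    if SNAPDIR in parts:
        idx = parts.index(SNAPDIR)
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


print(snapname_from_path("/mmfs1/.snapshots/snap1/data"))  # snap1
```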

arcapix.fs.gpfs.clib.utils.get_path_in_snapshot(path, snap, fileset=None)

Get the equivalent of a path within a given snapshot.

e.g.

>>> get_path_in_snapshot('/mmfs1/data', 'snapshot1')
'/mmfs1/.snapshots/snapshot1/data'
Parameters
  • path (str) – path to find in the snapshot

  • snap (str) – snapshot to find path in. Note: snap must be a snapshot of the filesystem or fileset that path belongs to.

  • fileset (str) – when snap is a fileset snapshot this is the name of the fileset the snapshot belongs to

Return type

str
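
The mapping shown in the example above can be sketched in pure Python. The helper path_in_snapshot and its root argument are hypothetical and are not part of arcapix; the sketch assumes the snapshot lives under a .snapshots directory directly below root, where root is the filesystem mount point (or, presumably, the fileset junction for a fileset snapshot). The real function works out that location itself.

```python
import posixpath


def path_in_snapshot(path, snap, root):
    """Map `path` to its equivalent under <root>/.snapshots/<snap>.

    `root` is an assumption of this sketch: the directory that holds
    the `.snapshots` directory for the snapshot in question.
    """
    rel = posixpath.relpath(path, root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%s is not under %s" % (path, root))
    # normpath removes the trailing "." produced when path == root
    return posixpath.normpath(posixpath.join(root, ".snapshots", snap, rel))


print(path_in_snapshot("/mmfs1/data", "snapshot1", "/mmfs1"))
# /mmfs1/.snapshots/snapshot1/data
```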

Directory Scan Convenience Functions

class arcapix.fs.gpfs.clib.utils.scandir(path, snapName=None)

scandir is a directory iterator.

Similar to the one in the Python 3.5 stdlib, implemented using GPFS C lib.

scandir() is a generator version of os.listdir() that returns an iterator over files in a directory, and also exposes extra information (such as type and stat information).

When snapName is specified, the returned paths will be children of the specified snapshot’s directory - e.g.

>>> for i in scandir('/mmfs1/data', 'snap1'):
...     print(i.path)
/mmfs1/.snapshots/snap1/data
Parameters
  • path (str) – a path within a GPFS filesystem

  • snapName (str) – name of a snapshot, can be used to iterate over the version of a directory in the named snapshot

Returns

iterator of GpfsDirEntry objects for given path

class arcapix.fs.gpfs.clib.utils.GpfsDirEntry

Object representing a directory entry, as returned by scandir.

inode(self) gpfs_ino64_t

Returns the inode number of the entry.

is_dir(self) bool

Returns True if the entry is a directory.

is_file(self) bool

Returns True if the entry is a regular file.

is_symlink(self) bool

Returns True if the entry is a symlink.

name

Returns the name of the entry.

path

Returns the full path of the entry.

stat(self)

Returns stat_result for the entry.

Result comes from arcapix.fs.gpfs.clib.file.stat().
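
Because the entry interface described above mirrors the stdlib one, code written against name, is_dir() and is_file() can be exercised without a GPFS filesystem. The sketch below uses os.scandir (Python 3.6+) purely as a stand-in. The helper classify is hypothetical, and passing it the result of the clib scandir is assumed, not verified, to work the same way.

```python
import os
import tempfile


def classify(entries):
    """Group entry names by type using only name, is_dir() and is_file()."""
    result = {"dirs": [], "files": [], "other": []}
    for entry in entries:
        if entry.is_dir():
            result["dirs"].append(entry.name)
        elif entry.is_file():
            result["files"].append(entry.name)
        else:
            result["other"].append(entry.name)
    return {kind: sorted(names) for kind, names in result.items()}


# Stand-in demonstration with the stdlib iterator on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "a.txt"), "w").close()
    with os.scandir(tmp) as entries:
        print(classify(entries))
```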

arcapix.fs.gpfs.clib.utils.listdir(path)

List the contents of a directory.

Like Python os.listdir(), implemented using GPFS C Lib. As with Python, this method follows symlinks.

The list is in arbitrary order. It does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.

Parameters

path (str) – path to a directory in a GPFS filesystem

arcapix.fs.gpfs.clib.utils.walk(top, bool topdown=True, bool followlinks=False)

Walk a filesystem directory tree.

Like Python os.walk(), implemented using GPFS C Lib.

Note: unlike os.walk(), clib walk doesn’t ‘see’ the .snapshots directory.

Parameters
  • top (str) – root path to walk from

  • topdown (bool) – specifies whether to walk top-down or bottom-up

  • followlinks (bool) – specifies whether to follow symlinks

Returns

root path, list of directories in root, list of files in root

Return type

iterator
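
The topdown parameter matters when results for a directory depend on its children. As a rough illustration, the sketch below counts files at or below each directory by walking bottom-up. The helper cumulative_file_counts is hypothetical; it defaults to os.walk as a stand-in, and it assumes the clib walk can be passed in its place and accepts topdown as a keyword argument.

```python
import os
import tempfile


def cumulative_file_counts(top, walk=os.walk):
    """Return {directory: number of files at or below it}, walking bottom-up."""
    counts = {}
    for root, dirs, files in walk(top, topdown=False):
        # Bottom-up order means every walked subdirectory is already counted;
        # directories that were not walked (e.g. symlinks) contribute 0.
        counts[root] = len(files) + sum(
            counts.get(os.path.join(root, d), 0) for d in dirs
        )
    return counts


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", "sub/b.txt", "sub/c.txt"):
        open(os.path.join(top, rel), "w").close()
    counts = cumulative_file_counts(top)
    print(counts[top], counts[os.path.join(top, "sub")])  # 3 2
```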

arcapix.fs.gpfs.clib.utils.parallel_walk(root, mapfn, reducefn=<built-in function iadd>, workers=None)

Perform a parallel walk of a GPFS directory tree.

Parameters
  • mapfn – function to call for each directory entry. Receives a GpfsDirEntry object.

  • reducefn – function to combine results from mapfn. Default = addition.

  • workers – number of worker processes to spawn. Default = CPU count/2, up to a maximum of 8.

Note

Requires root permission
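
The roles of mapfn and reducefn can be pictured with a serial equivalent. The sketch below is an assumption about the semantics, not the library's implementation: serial_walk and iter_entries are hypothetical names, os.scandir stands in for the GPFS directory scan, and details such as result ordering or the handling of an empty tree are not stated in this document.

```python
import functools
import operator
import os
import tempfile


def iter_entries(root):
    """Yield every entry below root, recursing into real subdirectories."""
    with os.scandir(root) as it:
        for entry in it:
            yield entry
            if entry.is_dir(follow_symlinks=False):
                for child in iter_entries(entry.path):
                    yield child


def serial_walk(root, mapfn, reducefn=operator.iadd):
    """Apply mapfn to each entry and fold the results together with reducefn.

    Note: functools.reduce raises TypeError if the tree has no entries.
    """
    return functools.reduce(reducefn, (mapfn(e) for e in iter_entries(root)))


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", "sub/b.txt", "sub/c.txt"):
        open(os.path.join(top, rel), "w").close()
    print(serial_walk(top, lambda e: 1 if e.is_file() else 0))  # 3
```

With the default reducefn, a mapfn that returns a one-element list produces a combined list, since in-place addition on lists extends them.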

Inode Scan Convenience Functions

class arcapix.fs.gpfs.clib.utils.inode_iterator(fsName, snapName=None, prevSnap=None, fromInode=0, toInode=0)

inode_iterator is an iterator object, which allows users to perform inode scans

>>> for i in inode_iterator(...):
...     # do something
>>> iscan = inode_iterator(...)
>>> i = iscan.next()
>>> j = next(iscan)

It acts as a convenience for the various inodescan methods.

Parameters
  • fsName (str) – Name of the Filesystem to be scanned

  • snapName (str) – Name of a snapshot within the named filesystem to scan

  • prevSnap – Name of a previous snapshot, older than snapName. If provided, only files that have changed since this snapshot will be returned. Pass None to return all inodes from fsName/snapName.

  • fromInode (int) – The minimum inode number to scan from

  • toInode (int) – The maximum inode number to scan to. If not specified or 0, all inodes will be returned.

The fromInode and toInode parameters can be used to perform multi-threaded scans.

Returns

iattr namedtuples

close(self)

Close the inode scan.
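
One way to use fromInode and toInode for a multi-worker scan is to split the inode range into contiguous chunks and give each worker its own inode_iterator. The sketch below only computes the chunks. split_inode_range is a hypothetical helper, and treating each chunk as half-open (start included, stop excluded) is an assumption; this document does not say whether toInode is inclusive.

```python
def split_inode_range(first, last, parts):
    """Split [first, last) into at most `parts` contiguous (start, stop) pairs."""
    if parts < 1 or last <= first:
        raise ValueError("need parts >= 1 and last > first")
    step = -(-(last - first) // parts)  # ceiling division
    return [(lo, min(lo + step, last)) for lo in range(first, last, step)]


print(split_inode_range(0, 10, 3))  # [(0, 4), (4, 8), (8, 10)]

# Each pair would then drive one scan, for example (not run here):
#   inode_iterator('mmfs1', fromInode=start, toInode=stop)
```

The upper bound of the whole range has to come from elsewhere, since a toInode of 0 means "no limit" rather than an empty range.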

Reset Times

class arcapix.fs.gpfs.clib.utils.SetTimesError(message)

Exception raised by reset_times() when precheck is True and times cannot be changed.

arcapix.fs.gpfs.clib.utils.reset_times(path, follow=True, precheck=True)

Reset the timestamps on a file to their original values on context exit.

>>> with reset_times('/mmfs1/file'):
...     # do stuff with file
Parameters
  • path (str) – path of the file whose times should be reset

  • follow (bool) – whether to follow symlinks

  • precheck (bool) –

    Pre-check if we will be able to reset times.

    If this option is True and times can’t be changed, a SetTimesError will be thrown before any code is run inside the context. This ensures that the existing times are preserved.

    If this option is False, then resetting times may fail silently.

(Re)setting times may fail, for example, if you aren’t the file owner or root.
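
The save-and-restore pattern described above can be sketched with the standard library. This is an illustration under stated assumptions, not how reset_times is implemented: restore_times and TimesNotResettable are hypothetical names, os.utime (Python 3) stands in for the GPFS call, and the precheck shown here (re-applying the current times up front) is only one plausible way to detect a permission problem early.

```python
import contextlib
import os
import tempfile


class TimesNotResettable(Exception):
    """Hypothetical stand-in for SetTimesError."""


@contextlib.contextmanager
def restore_times(path, precheck=True):
    """Record atime/mtime on entry and write them back on exit."""
    st = os.stat(path)
    saved = (st.st_atime_ns, st.st_mtime_ns)
    if precheck:
        try:
            # Re-applying the current values fails early if we are not
            # allowed to set times on this file at all.
            os.utime(path, ns=saved)
        except OSError as exc:
            raise TimesNotResettable(str(exc)) from exc
    try:
        yield
    finally:
        try:
            os.utime(path, ns=saved)
        except OSError:
            if precheck:
                raise  # the precheck passed but the reset still failed
            # precheck=False: fail silently, as described above


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    with open(path, "w") as f:
        f.write("hello world")
    os.utime(path, (1000, 2000))  # known atime/mtime for the demonstration
    with restore_times(path):
        with open(path, "a") as f:
            f.write("!")  # would normally bump mtime
    st = os.stat(path)
    print(int(st.st_atime), int(st.st_mtime))  # 1000 2000
```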

ACL Convenience Functions

arcapix.fs.gpfs.clib.utils.acl.get_ace_name(ace)

Get the user or group name associated with an ACE.

ACE is an entry returned by arcapix.fs.gpfs.clib.acl.get_nfs4_acl()

Note

User and group name lookup is performed with pwd.getpwuid and grp.getgrgid.

These may not work for identifying users and groups in an AD environment.

Returns

tuple of (type, name) where type is one of (special, group, user)

Raises

KeyError if the ACE id can’t be translated to a name
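
Since the lookup relies on pwd.getpwuid and grp.getgrgid, callers may want a fallback for ids that cannot be resolved (for example, in an AD environment). The sketch below shows that lookup and fallback with the standard library only. lookup_name and name_or_id are hypothetical helpers that take a numeric id directly; they do not accept an ACE object and are not part of arcapix.

```python
import grp
import pwd


def lookup_name(kind, ident):
    """Resolve a numeric id to a name, raising KeyError when it is unknown."""
    if kind == "user":
        return pwd.getpwuid(ident).pw_name
    if kind == "group":
        return grp.getgrgid(ident).gr_name
    raise ValueError("kind must be 'user' or 'group'")


def name_or_id(kind, ident):
    """Like lookup_name, but fall back to the numeric id as a string."""
    try:
        return lookup_name(kind, ident)
    except KeyError:
        return str(ident)


print(name_or_id("user", 0))
```

The same try/except KeyError pattern would apply around get_ace_name itself when listing ACLs whose ids may not resolve locally.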

arcapix.fs.gpfs.clib.utils.acl.append_nfs4_aces(pathname, aces)

Append one or more entries to the NFSv4 ACL for a path.

This is slightly more efficient than using arcapix.fs.gpfs.clib.acl.get_nfsv4_acl() and arcapix.fs.gpfs.clib.acl.put_nfsv4_acl() since both steps are performed at the C-level

Parameters
  • pathname (str) – path of file or directory to append ACL entries to.

  • aces – an ace_v4 NFSv4 entry or list of entries to append

Examples

Walk the filesystem

>>> import os
>>> from arcapix.fs.gpfs.clib.utils import walk
>>>
>>> for root, dirs, files in walk("/mmfs1"):
...     for name in files:
...         print(os.path.join(root, name))
...     for name in dirs:
...         print(os.path.join(root, name))
/mmfs1/test
/mmfs1/data
/mmfs1/.policytmp

Get the filesystem a given path belongs to

>>> from arcapix.fs.gpfs import Filesystem
>>> from arcapix.fs.gpfs.clib.utils import get_fsname_by_path
>>>
>>> fs = Filesystem(get_fsname_by_path('/mmfs1/data'))
>>>
>>> print(fs.name)
mmfs1

Walk the filesystem for a given snapshot

>>> import os
>>> from arcapix.fs.gpfs.clib.utils import scandir
>>>
>>> def walk(root, snap):
...     for i in scandir(root, snap):
...         yield i.path
...         # recurse into the directory
...         if i.is_dir():
...             for d in walk(i.path, snap):
...                 yield d
...
>>> for i in walk('/mmfs1', 'snap1'):
...     print(i)
...
/mmfs1/.snapshots/snap1/data
/mmfs1/.snapshots/snap1/.policytmp

Calculate the total size of temporary files on the filesystem

>>> import os
>>> from arcapix.fs.gpfs.clib.utils import scandir, inode_iterator
>>>
>>> # iterator of inode numbers for files that end '.tmp'
>>> def find_inodes(root):
...     for i in scandir(root):
...         if i.name.endswith('.tmp'):
...             yield i.inode()
...         # recurse into the directory
...         if i.is_dir():
...             for d in find_inodes(i.path):
...                 yield d
...
>>> # list of inode number of '.tmp' files
>>> inodes = list(find_inodes('/mmfs1'))
>>>
>>> # create iterator - use max and min to limit scope of scan
>>> itr = inode_iterator('mmfs1', fromInode=min(inodes), toInode=max(inodes)+1)
>>>
>>> # add up sizes of inodes in the inode list
>>> print(sum(x.ia_size for x in itr if x.ia_inode in inodes))
100421

Count files in a directory tree in parallel

>>> from arcapix.fs.gpfs.clib.utils import parallel_walk
>>>
>>> # define a map function to count files only
>>> def count_files(dirent):
...     if dirent.is_file():
...         return 1
...     return 0
...
>>> # perform a parallel directory tree walk
>>> count = parallel_walk('/mmfs1/data', count_files, workers=4)
>>>
>>> print(count)
1358926

Read a file without updating its atime

>>> import os
>>> from arcapix.fs.gpfs.clib.utils import reset_times
>>>
>>> print(os.stat('/mmfs1/hello.txt').st_atime)
1558002472
>>>
>>> with reset_times('/mmfs1/hello.txt'):
...     with open('/mmfs1/hello.txt', 'r') as f:
...         print(f.read())
...
hello world
>>>
>>> print(os.stat('/mmfs1/hello.txt').st_atime)
1558002472

Add a new entry to a file ACL

Grant read/write permission for the ‘admin’ group

Hint

This may be combined with arcapix.fs.gpfs.clib.utils.walk() to add the new ACE to a directory tree, recursively.

>>> import grp
>>> from arcapix.fs.gpfs.clib.utils.acl import append_nfs4_aces
>>> from arcapix.fs.gpfs.clib.acl import ace_v4, AM_READ, AM_WRITE, AF_GROUP_ID
>>>
>>> # define the new entry
>>> gid = grp.getgrnam('admin').gr_gid
>>> ace = ace_v4(aceWho=gid, aceFlags=AF_GROUP_ID, aceMask=AM_READ|AM_WRITE)
>>>
>>> # append the new entry to the target file
>>> append_nfs4_aces('/mmfs1/test', ace)
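
The hint above about applying the entry recursively can be sketched as a small tree helper. apply_to_tree is a hypothetical name; it defaults to os.walk so that it can be tried anywhere, and it assumes the clib walk can be passed in as a drop-in replacement that yields the same (root, dirs, files) tuples.

```python
import os
import tempfile


def apply_to_tree(top, action, walk=os.walk):
    """Call action(path) on top and on every directory and file below it."""
    action(top)
    for root, dirs, files in walk(top):
        for name in list(dirs) + list(files):
            action(os.path.join(root, name))


# Stand-in demonstration: record the visited paths instead of changing ACLs.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    open(os.path.join(top, "sub", "f.txt"), "w").close()
    seen = []
    apply_to_tree(top, seen.append)
    print(len(seen))  # 3: top, sub and sub/f.txt

# On GPFS the call might look like this (not run here):
#   from arcapix.fs.gpfs.clib.utils import walk
#   apply_to_tree('/mmfs1/test', lambda p: append_nfs4_aces(p, ace), walk=walk)
```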