Utilities

The utils module provides convenience methods built on the GPFS C API.

utils provides more complex functionality than CLib’s other classes, which are thin wrappers over the functions available in the GPFS C API.

Note

Most methods require root permission.

Description

Filesystem Snapshot Identifier Convenience Functions

arcapix.fs.gpfs.clib.utils.get_fsname_by_path(char *path)

Get the name of the filesystem a path belongs to.

e.g.

>>> get_fsname_by_path('/mmfs1/data')
'mmfs1'
Parameters: path (str) – path within a GPFS filesystem
Return type: str

arcapix.fs.gpfs.clib.utils.get_snapname_by_path(char *path)

Get the name of the snapshot a path belongs to.

e.g.

>>> get_snapname_by_path('/mmfs1/.snapshots/snap1/data')
'snap1'
Parameters: path (str) – path within a GPFS filesystem
Return type: str
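
To make the mapping in the example above concrete, here is a minimal pure-Python sketch that extracts a snapshot name from a path by string handling alone. It is not how the library does it (the real function presumably queries GPFS). The helper name snapname_from_path, the '.snapshots' default and the None result for paths outside a snapshot are all assumptions made for illustration.

```python
import posixpath


def snapname_from_path(path, snapdir='.snapshots'):
    """Hypothetical helper: return the path component that follows the
    snapshot directory, or None if the path is not inside a snapshot.

    Assumes snapshots live under a directory called '.snapshots'.
    """
    parts = posixpath.normpath(path).split('/')
    if snapdir in parts:
        idx = parts.index(snapdir)
        # The snapshot name is the component right after the snapshot dir.
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None
```

Calling snapname_from_path('/mmfs1/.snapshots/snap1/data') gives 'snap1', matching the documented example. What the library itself returns for a path outside any snapshot is not stated here.
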
arcapix.fs.gpfs.clib.utils.get_path_in_snapshot(char *path, char *snap)

Get the equivalent of a path within a given snapshot.

e.g.

>>> get_path_in_snapshot('/mmfs1/data', 'snapshot1')
'/mmfs1/.snapshots/snapshot1/data'
Parameters:
  • path (str) – path to find in the snapshot
  • snap (str) – snapshot to find path in. Note: snap should be a snapshot of the filesystem that path belongs to.
Return type:

str
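
The path translation shown above can be sketched in plain Python. This is only an illustration under stated assumptions: the caller already knows the filesystem mount point, and snapshots are exposed under a '.snapshots' directory at that mount point. The helper name path_in_snapshot and its mount_point and snapdir parameters are hypothetical and not part of the library.

```python
import posixpath


def path_in_snapshot(path, snap, mount_point, snapdir='.snapshots'):
    """Hypothetical helper: rebuild `path` underneath
    <mount_point>/<snapdir>/<snap>.

    Assumes `path` and `mount_point` are absolute POSIX paths.
    """
    rel = posixpath.relpath(path, mount_point)
    # A relative path that climbs upwards means path is outside the mount.
    if rel == '..' or rel.startswith('../'):
        raise ValueError('%r is not under %r' % (path, mount_point))
    base = posixpath.join(mount_point, snapdir, snap)
    # relpath returns '.' when path is the mount point itself.
    return base if rel == '.' else posixpath.join(base, rel)
```

With mount_point='/mmfs1' this reproduces the documented result for '/mmfs1/data' and 'snapshot1'. The library function needs no mount point argument, so it evidently resolves that itself.
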

Directory Scan Convenience Functions

class arcapix.fs.gpfs.clib.utils.scandir(path, snapName=None)

scandir is a directory iterator.

It is similar to scandir in the Python 3.5 standard library, but is implemented using the GPFS C library.

scandir() is a generator version of os.listdir() that returns an iterator over files in a directory, and also exposes extra information (such as type and stat information).

When snapName is specified, the returned paths will be children of the specified snapshot’s directory - e.g.

>>> for i in scandir('/mmfs1/data', 'snap1'):
...     print i.path
/mmfs1/.snapshots/snap1/data
Parameters:
  • path (str) – a path within a GPFS filesystem
  • snapName (str) – name of a snapshot, can be used to iterate over the version of a directory in the named snapshot
Returns:

iterator of GpfsDirEntry objects for given path

next
class arcapix.fs.gpfs.clib.utils.GpfsDirEntry

Object representing a directory entry, as returned by scandir.

inode(self) → int

Returns the inode number of the entry.

is_dir(self) → bool

Returns True if the entry is a directory.

is_file(self) → bool

Returns True if the entry is a regular file.

is_symlink(self) → bool

Returns True if the entry is a symlink.

name

Returns the name of the entry.

path

Returns the full path of the entry.

stat(self)

Returns stat_result for the entry.

Result comes from arcapix.fs.gpfs.clib.file.stat().
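
Because the entry interface mirrors the standard library, the pattern for consuming entries can be tried out without a GPFS filesystem. The sketch below uses os.scandir from the Python standard library (3.5 or later) as a stand-in. Whether the GPFS scandir behaves identically in every respect is an assumption, and the helper name summarise is invented for this example.

```python
import os


def summarise(path):
    """Return a sorted list of (name, kind, inode) for entries in `path`.

    Uses os.scandir as a stand-in for the GPFS scandir described above.
    """
    rows = []
    for entry in os.scandir(path):
        if entry.is_dir():
            kind = 'dir'
        elif entry.is_file():
            kind = 'file'
        else:
            kind = 'other'
        # inode() is a method call, as in the GpfsDirEntry listing.
        rows.append((entry.name, kind, entry.inode()))
    return sorted(rows)
```

Swapping os.scandir for arcapix.fs.gpfs.clib.utils.scandir should need no other change, provided the attributes listed above behave as documented.
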

arcapix.fs.gpfs.clib.utils.listdir(char *path)

List the contents of a directory.

Like Python os.listdir(), implemented using GPFS C Lib. As with Python, this method follows symlinks.

The list is in arbitrary order. It does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.

Parameters: path (str) – path to a directory in a GPFS filesystem

arcapix.fs.gpfs.clib.utils.walk(char *top, bool topdown=True, bool followlinks=False)

Walk a filesystem directory tree.

Like Python os.walk(), implemented using GPFS C Lib.

Note: unlike os.walk, clib walk doesn’t ‘see’ the .snapshots directory

Parameters:
  • top (str) – root path to walk from
  • topdown (bool) – specifies whether to walk top-down or bottom-up
  • followlinks (bool) – specifies whether to follow symlinks
Returns:

root path, list of directories in root, list of files in root

Return type:

iterator
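
The effect of topdown is easiest to see from the order in which directories are visited. The sketch below uses os.walk from the standard library as a stand-in. It assumes the clib walk orders its results the same way, which the description above suggests but does not spell out. The helper name walk_order is hypothetical.

```python
import os


def walk_order(top, topdown=True):
    """Return the directories visited under `top`, in visiting order.

    Uses os.walk as a stand-in for the clib walk described above.
    """
    return [root for root, dirs, files in os.walk(top, topdown=topdown)]
```

For a tree top/a/b, the top-down order is top, top/a, top/a/b. With topdown=False the same directories come back deepest first, which suits tasks such as removing or sizing directories after their contents.
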

Inode Scan Convenience Functions

class arcapix.fs.gpfs.clib.utils.inode_iterator(fsName, snapName=None, prevSnap=None, fromInode=0, toInode=0)

inode_iterator is an iterator object that allows users to perform inode scans.

>>> for i in inode_iterator(...):
...     # do something
>>> iscan = inode_iterator(...)
>>> i = iscan.next()
>>> j = next(iscan)

It acts as a convenience for the various inodescan methods.

Parameters:
  • fsName (str) – Name of the Filesystem to be scanned
  • snapName (str) – Name of a snapshot within the named filesystem to scan
  • prevSnap – Name of a previous snapshot, older than snapName. If provided, only files that have changed since this snapshot will be returned. Pass None to return all inodes from fsName/snapName.
  • fromInode (int) – The minimum inode number to scan from
  • toInode (int) – The maximum inode number to scan to. If not specified or 0, all inodes will be returned.

The fromInode and toInode parameters can be used to perform multi-threaded scans.

Returns: iattr namedtuples

close(self)

Close the inode scan.

next
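
One way to use fromInode and toInode for a multi-threaded scan is to split the inode range into contiguous chunks and scan each chunk in its own worker. The sketch below shows only that splitting and dispatch logic. The helpers split_inode_range and parallel_scan are hypothetical, and the code assumes toInode is an exclusive upper bound, which this page does not state. If the bound is inclusive, boundary inodes would be visited twice.

```python
from concurrent.futures import ThreadPoolExecutor


def split_inode_range(start, stop, parts):
    """Split [start, stop) into at most `parts` contiguous (from, to) chunks."""
    if parts < 1:
        raise ValueError('parts must be at least 1')
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks take one additional inode each.
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:
            chunks.append((lo, hi))
        lo = hi
    return chunks


def parallel_scan(scan_range, start, stop, workers=4):
    """Run scan_range(from_inode, to_inode) for each chunk in a thread pool.

    Results are returned in chunk order.
    """
    chunks = split_inode_range(start, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda chunk: scan_range(*chunk), chunks))
```

Here scan_range would be a caller-supplied function that builds inode_iterator(fsName, fromInode=lo, toInode=hi), consumes it and closes it. Since toInode=0 means "all inodes", the caller needs a real upper bound for stop rather than 0.
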

Examples

Walk the filesystem

>>> import os
>>> from arcapix.fs.gpfs.clib.utils import walk
>>>
>>> for root, dirs, files in walk("/mmfs1"):
...     for name in files:
...         print(os.path.join(root, name))
...     for name in dirs:
...         print(os.path.join(root, name))
/mmfs1/test
/mmfs1/data
/mmfs1/.policytmp

Get the filesystem a given path belongs to

>>> from arcapix.fs.gpfs import Filesystem
>>> from arcapix.fs.gpfs.clib.utils import get_fsname_by_path
>>>
>>> fs = Filesystem(get_fsname_by_path('/mmfs1/data'))
>>>
>>> print fs.name
mmfs1

Walk the filesystem for a given snapshot

>>> import os
>>> from arcapix.fs.gpfs.clib.utils import scandir
>>>
>>> def walk(root, snap):
...     for i in scandir(root, snap):
...         yield i.path
...         # recurse into the directory
...         if i.is_dir():
...             for d in walk(i.path, snap):
...                 yield d
...
>>> for i in walk('/mmfs1', 'snap1'):
...     print i
...
/mmfs1/.snapshots/snap1/data
/mmfs1/.snapshots/snap1/.policytmp

Calculate the total size of temporary files on the filesystem

>>> import os
>>> from arcapix.fs.gpfs.clib.utils import scandir, inode_iterator
>>>
>>> # iterator of inode numbers for files that end '.tmp'
>>> def find_inodes(root):
...     for i in scandir(root):
...         if i.name.endswith('.tmp'):
...             yield i.inode()
...         # recurse into the directory
...         if i.is_dir():
...             for d in find_inodes(i.path):
...                 yield d
...
>>> # list of inode number of '.tmp' files
>>> inodes = list(find_inodes('/mmfs1'))
>>>
>>> # create iterator - use max and min to limit scope of scan
>>> itr = inode_iterator('mmfs1', fromInode=min(inodes), toInode=max(inodes)+1)
>>>
>>> # add up sizes of inodes in the inode list
>>> print sum(x.ia_size for x in itr if x.ia_inode in inodes)
100421
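
The final summation tests membership against a list, which costs time proportional to the list length for every scanned inode. A minimal sketch of the same sum using a set is given below. The IAttr namedtuple is a stand-in carrying only the two fields used in the example above (ia_inode and ia_size), and total_size is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for the iattr namedtuples yielded by inode_iterator;
# only the two fields used in the example are modelled.
IAttr = namedtuple('IAttr', ['ia_inode', 'ia_size'])


def total_size(records, wanted_inodes):
    """Sum ia_size over records whose ia_inode is in wanted_inodes."""
    wanted = set(wanted_inodes)  # constant-time membership tests
    return sum(r.ia_size for r in records if r.ia_inode in wanted)
```

With real data, records would be the inode_iterator instance and wanted_inodes the list built by find_inodes.
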