Rule

A Rule object allows you to manipulate selections of files on a Filesystem according to specific conditions held in the rule’s Criteria object.

There are currently 11 types of rule:

You can also create Macros, Variables, Includes, and Comments to add to a policy.

Note

Rule names are optional, though recommended. If you do not give a rule a name, a random id will be generated. This id can be useful within the API, e.g. to delete a rule: policy.rules.destroy(rule.id)

This id won’t be included in the rule’s PixStor string representation.

Note

By default, items returned by List, ListProcessing and MapReduce rules will be sorted by inode number. The sort option can be used to change the order of the listed items.

However, the order can only be guaranteed when the associated policy is run on a single node. When the policy is run across multiple nodes, it is almost guaranteed that the results will be out of order. In this case, you should apply your own sorting after the fact - but be aware that doing this on a large filesystem is likely to cause resource contention.
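As a rough illustration of sorting after the fact, the sketch below orders a combined result list by inode number in plain Python. The `results` records and their `inode` and `name` keys are hypothetical stand-ins, not the library's actual return type.

```python
from operator import itemgetter

# Hypothetical records gathered from several nodes; the combined order is arbitrary.
results = [
    {"inode": 4102, "name": "c.mpg"},
    {"inode": 17, "name": "a.mpg"},
    {"inode": 950, "name": "b.mpg"},
]

# sorted() builds a new list in memory, so its cost grows with the result count.
by_inode = sorted(results, key=itemgetter("inode"))
inodes = [record["inode"] for record in by_inode]
print(inodes)
```

Because `sorted()` holds every record in memory at once, this approach is only comfortable for result sets of modest size.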

Description

class arcapix.fs.gpfs.rule.SetPoolRule(target=None, **kwargs)

Set Pool rules specify into which pool files will be placed if they match the rule’s criteria.

>>> r = SetPoolRule(target='sata1')
>>> r.change(replicate=2)
>>> r.criteria.new(Criteria.like('NAME', '*.mpg'))

All placement policies should have a ‘default’ set pool rule.

target

Returns the target pool (SET POOL)

Return type:str
criteria

Returns the rule’s Criteria object

Return type:Criteria
change(**kwargs)

Change rule options

Parameters:
  • target – Name of a pool into which files should be placed
  • limit (float) – Used to limit the creation of data in a storage pool
  • replicate (int) – Number of replicas to make of matching files (1-3)
  • fileset (str or list of str) – Fileset or list of filesets for the rule to match
  • action – A SQL expression to be evaluated if all other rule clauses are met
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.MigrateRule(target=None, **kwargs)

Migrate Rules move files which match the rule’s criteria from a source pool to a new target pool

>>> r = MigrateRule(source='sata1', target='sata2')
>>> r.change(limit=50, sort='KB_ALLOCATED')
>>> r.change(when=MigrateRule.RunOnDay(1))  # run on sunday
target

Returns the target pool (TO POOL)

Return type:str
source

Returns the source pool (FROM POOL)

Return type:str
criteria

Returns the rule’s criteria object

Return type:Criteria
static show(*args, **kwargs)

Build a rule ‘SHOW’ expression

Parameters:A collection of strings and file attributes,
Returns:a SQL expression that will display the provided strings and attributes. This expression can be passed to change(show=...)

e.g.

>>> print MigrateRule.show("size=", "KB_ALLOCATED")
"([' size=' || VARCHAR(KB_ALLOCATED)])"

>>> rule.change(show=MigrateRule.show("size=", "KB_ALLOCATED"))
static show_all()

Build a rule ‘SHOW’ expression to list all available file attributes.

Returns:a SQL expression to list all file attributes

Use as:

>>> rule.change(show=MigrateRule.show_all())
static show_attributes(*attrs)

Build a rule ‘SHOW’ expression to list one or more specified file attributes.

Parameters:*attrs – a list of file attributes to list
Returns:a SQL expression to show the specified attributes
>>> print MigrateRule.show_attributes('NAME')
"(' name=' || varchar(NAME))"

>>> rule.change(show=MigrateRule.show_attributes('NAME', 'KB_ALLOCATED'))
static RunOnDay(day)

Build a WHEN expression to restrict rule applicability to a particular day (Sun=1 to Sat=7)

Parameters:day (int) – day on which to run
Returns:a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=MigrateRule.RunOnDay(1))  # Sunday
Raises:ValueError
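The Sun=1 to Sat=7 numbering differs from Python's own weekday conventions, so a small conversion can help when choosing the day programmatically. The sketch below is a hypothetical helper (`policy_day` is not part of the library) built on the standard `datetime` module.

```python
import datetime


def policy_day(d):
    """Map a date to the Sun=1 .. Sat=7 numbering (hypothetical helper)."""
    # isoweekday() returns Mon=1 .. Sun=7, so rotate the numbering by one day.
    return d.isoweekday() % 7 + 1


sunday = policy_day(datetime.date(2024, 1, 7))    # 7 January 2024 was a Sunday
saturday = policy_day(datetime.date(2024, 1, 6))  # 6 January 2024 was a Saturday
print(sunday, saturday)
```

The returned integer is in the range that RunOnDay(day) is documented to accept.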
static RunOnDays(*days, **kwargs)

Build a WHEN expression to restrict rule applicability to a particular set of days (Sun=1 to Sat=7)

Parameters:
  • days (list of int (values between 1 and 7)) – days on which to run
  • exclude – Pass True to exclude the listed days (default=False)
Returns:

a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=MigrateRule.RunOnDays(7,1))  # weekend
Raises:ValueError
change(**kwargs)

Change rule options

Parameters:
  • when – SQL expression for when the rule should be applied. Use MigrateRule.RunOnDay() or MigrateRule.RunOnDays() to easily build an expression.
  • source – pool from which files should be selected
  • target – Name of a pool into which files should be placed
  • sort – Attribute or expression by which files should be sorted
  • threshold – Occupancy percentage thresholds that should trigger a rule (high, low, premigrate)
  • limit (float) – Used to limit the creation of data in a storage pool
  • replicate (int) – Number of replicas to make of matching files (1-3)
  • fileset – Fileset or list of filesets for the rule to match
  • show – SQL expression of strings and attributes for the rule to show. These expressions can be built using MigrateRule.show(), MigrateRule.show_all(), or MigrateRule.show_attributes()
  • size – expression that defines the size of files (default=kb_allocated)
  • action – A SQL expression to be evaluated if all other rule clauses are met
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.DeleteRule(**kwargs)

Delete Rules delete any files which match the rule’s criteria

>>> r = DeleteRule(source='sata2')
>>> r.change(threshold=(85, 60), sort='FILE_HEAT')
>>> r.criteria.new(Criteria.like('NAME', '*.tmp'))
source

Returns the source pool (FROM POOL)

Return type:str
criteria

Returns the rule’s criteria object

Return type:Criteria
static show(*args, **kwargs)

Build a rule ‘SHOW’ expression

Parameters:A collection of strings and file attributes,
Returns:a SQL expression that will display the provided strings and attributes. This expression can be passed to change(show=...)

e.g.

>>> print DeleteRule.show("size=", "KB_ALLOCATED")
"([' size=' || VARCHAR(KB_ALLOCATED)])"

>>> rule.change(show=DeleteRule.show("size=", "KB_ALLOCATED"))
static show_all()

Build a rule ‘SHOW’ expression to list all available file attributes.

Returns:a SQL expression to list all file attributes

Use as:

>>> rule.change(show=DeleteRule.show_all())
static show_attributes(*attrs)

Build a rule ‘SHOW’ expression to list one or more specified file attributes.

Parameters:*attrs – a list of file attributes to list
Returns:a SQL expression to show the specified attributes
>>> print DeleteRule.show_attributes('NAME')
"(' name=' || varchar(NAME))"

>>> rule.change(show=DeleteRule.show_attributes('NAME', 'KB_ALLOCATED'))
static RunOnDay(day)

Build a WHEN expression to restrict rule applicability to a particular day (Sun=1 to Sat=7)

Parameters:day (int) – day on which to run
Returns:a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=DeleteRule.RunOnDay(1))  # Sunday
Raises:ValueError
static RunOnDays(*days, **kwargs)

Build a WHEN expression to restrict rule applicability to a particular set of days (Sun=1 to Sat=7)

Parameters:
  • days (list of int (values between 1 and 7)) – days on which to run
  • exclude – Pass True to exclude the listed days (default=False)
Returns:

a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=DeleteRule.RunOnDays(7,1))  # weekend
Raises:ValueError
change(**kwargs)

Change rule options

Parameters:
  • when – SQL expression for when the rule should be applied. Use DeleteRule.RunOnDay() or DeleteRule.RunOnDays() to easily build an expression.
  • source – pool from which files should be selected
  • sort – Attribute or expression by which files should be sorted
  • threshold – Occupancy percentage thresholds that should trigger a rule (high, low, premigrate)
  • directories_plus – Include non-regular file objects in list (default=False)
  • fileset – Fileset or list of filesets for the rule to match
  • show – SQL expression of strings and attributes for the rule to show. These expressions can be built using DeleteRule.show(), DeleteRule.show_all(), or DeleteRule.show_attributes()
  • size – expression that defines the size of files (default=kb_allocated)
  • action – A SQL expression to be evaluated if all other rule clauses are met
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
class arcapix.fs.gpfs.rule.ExcludeRule(**kwargs)

Exclude Rules exclude any matching files from any Migrate or Delete rules which follow them.

>>> ex = ExcludeRule(fileset='root')
>>> r = DeleteRule(source='sata1')
>>> r.criteria.new(Criteria.lt('FILE_SIZE', 4*1024*1024))
source

Returns the source pool (FROM POOL)

Return type:str
criteria

Returns the rule’s criteria object

Return type:Criteria
static RunOnDay(day)

Build a WHEN expression to restrict rule applicability to a particular day (Sun=1 to Sat=7)

Parameters:day (int) – day on which to run
Returns:a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=ExcludeRule.RunOnDay(1))  # Sunday
Raises:ValueError
static RunOnDays(*days, **kwargs)

Build a WHEN expression to restrict rule applicability to a particular set of days (Sun=1 to Sat=7)

Parameters:
  • days (list of int (values between 1 and 7)) – days on which to run
  • exclude – Pass True to exclude the listed days (default=False)
Returns:

a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=ExcludeRule.RunOnDays(7,1))  # weekend
Raises:ValueError
change(**kwargs)

Change rule options

Parameters:
  • when – SQL expression for when the rule should be applied. Use ExcludeRule.RunOnDay() or ExcludeRule.RunOnDays() to easily build an expression.
  • source – pool from which files should be selected
  • directories_plus – Include non-regular file objects in list (default=False)
  • fileset – Fileset or list of filesets for the rule to match
  • action – A SQL expression to be evaluated if all other rule clauses are met
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
class arcapix.fs.gpfs.rule.ListRule(listname=None, **kwargs)

List rules generate a list of files which match the rule’s criteria.

ListRules need to have a corresponding ExternalListRule with the same listname

>>> ext = ExternalListRule(listname='foo', script='./cleanup.sh')
>>> lst = ListRule(listname='foo')
>>> lst.change(directories_plus=True)
>>> lst.change(show="('filename= ' || varchar(NAME))")
source

Returns the source pool (FROM POOL)

Return type:str
listname

Returns the name of the list rule’s list

Return type:str
criteria

Returns the rule’s criteria object

Return type:Criteria
static show(*args, **kwargs)

Build a rule ‘SHOW’ expression

Parameters:A collection of strings and file attributes,
Returns:a SQL expression that will display the provided strings and attributes. This expression can be passed to change(show=...)

e.g.

>>> print ListRule.show("size=", "KB_ALLOCATED")
"([' size=' || VARCHAR(KB_ALLOCATED)])"

>>> rule.change(show=ListRule.show("size=", "KB_ALLOCATED"))
static show_all()

Build a rule ‘SHOW’ expression to list all available file attributes.

Returns:a SQL expression to list all file attributes

Use as:

>>> rule.change(show=ListRule.show_all())
static show_attributes(*attrs)

Build a rule ‘SHOW’ expression to list one or more specified file attributes.

Parameters:*attrs – a list of file attributes to list
Returns:a SQL expression to show the specified attributes
>>> print ListRule.show_attributes('NAME')
"(' name=' || varchar(NAME))"

>>> rule.change(show=ListRule.show_attributes('NAME', 'KB_ALLOCATED'))
static RunOnDay(day)

Build a WHEN expression to restrict rule applicability to a particular day (Sun=1 to Sat=7)

Parameters:day (int) – day on which to run
Returns:a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=ListRule.RunOnDay(1))  # Sunday
Raises:ValueError
static RunOnDays(*days, **kwargs)

Build a WHEN expression to restrict rule applicability to a particular set of days (Sun=1 to Sat=7)

Parameters:
  • days (list of int (values between 1 and 7)) – days on which to run
  • exclude – Pass True to exclude the listed days (default=False)
Returns:

a SQL expression to be used with change(when=...)

e.g.

>>> rule.change(when=ListRule.RunOnDays(7,1))  # weekend
Raises:ValueError
change(**kwargs)

Change rule options

Parameters:
  • when – SQL expression for when the rule should be applied. Use ListRule.RunOnDay() or ListRule.RunOnDays() to easily build an expression.
  • exclude – Pass True to exclude files which match the list criteria
  • source – pool from which files should be selected
  • directories_plus – Include non-regular file objects in list (default=False)
  • sort – Attribute or expression by which files should be sorted
  • threshold – Occupancy percentage thresholds that should trigger a rule (high, low, premigrate)
  • fileset – Fileset or list of filesets for the rule to match
  • show – SQL expression of strings and attributes for the rule to show. These expressions can be built using ListRule.show(), ListRule.show_all(), or ListRule.show_attributes()
  • size – expression that defines the size of files (default=kb_allocated)
  • action – A SQL expression to be evaluated if all other rule clauses are met
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.RestoreRule(target=None, **kwargs)

Restore rules restore files which match the rule’s criteria to some target pool.

>>> r = RestoreRule(target='system')
>>> r.change(limit=75, replicate=1)
>>> r.criteria.new(Criteria.gt('KB_ALLOCATED', 50*1024))
target

Returns the target pool (TO POOL)

Return type:str
criteria

Returns the rule’s criteria object

Return type:Criteria
change(**kwargs)

Change rule options

Parameters:
  • target – Name of a pool into which files should be placed
  • limit (float) – Used to limit the creation of data in a storage pool
  • replicate – Number of replicas to make of matching files (1-3)
  • fileset – Fileset or list of filesets for the rule to match
  • action – A SQL expression to be evaluated if all other rule clauses are met
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.ExternalListRule(listname=None, script=None, **kwargs)

External List rules run some external program on the file lists generated by a corresponding (regular) List rule

The listname for corresponding List and External List rules must match

>>> import time
>>> ext = ExternalListRule(listname='bar', script='./analyse.py')
>>> ext.change(options=time.time())
>>> lst = ListRule(listname='bar')
>>> lst.change(show=ListRule.show_all())
listname

Returns the name of the list rule’s list

Return type:str
script

Returns the rule’s script

Return type:str
change(**kwargs)

Change rule options

Parameters:
  • script – An external script or program to which the file list should be passed
  • opts – optional parameters to be passed to the rule’s script
  • size – limit on the number of bytes on all files in each list passed to script
  • threshold – type of capacity-managed resources the corresponding ListRule threshold applies to
  • escape – Specifies which special characters should be escaped in matched files and show strings
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.ExternalPoolRule(poolname=None, script=None, **kwargs)

External Pool rules define external storage pools.

This allows external storage to be used with GPFS policies.

>>> p = ExternalPoolRule(poolname='fast', script='./controller.sh')
>>> r = MigrateRule(target='fast')
poolname

Returns the name of the rule’s external pool

Return type:str
script

Returns the rule’s script

Return type:str
change(**kwargs)

Change rule options

Parameters:
  • script – An external script or program to which the file list should be passed
  • opts – optional parameters to be passed to the rule’s script
  • size – limit on the number of bytes on all files in each list passed to script
  • escape – which special characters should be escaped in matched files and show strings
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.GroupPoolRule(poolname=None, **kwargs)

Group Pool rule defines a group pool, made up of multiple other pools.

This allows files to be distributed across several disk pools.

>>> r = GroupPoolRule(poolname='scatter')
>>> r.addPool('sas1', limit=50)
>>> r.addPool('sas2', limit=25)
>>> r.addPool('sata1', limit=10)
>>> m = MigrateRule(target='scatter')
>>> m.criteria.new(Criteria.gt('KB_ALLOCATED', 1024*1024))
poolname

Returns the name of the rule’s group pool

Return type:str
addPool(pool, limit=-1)

Add a new pool to the Group Pool

Parameters:
  • pool – Name of the pool to add
  • limit (int) – The percentage limit to which the pool should be filled (default=99)
removePool(pool)

Remove a pool from the Group Pool

Parameters:pool – name of pool to remove
pools

Returns a list of the pools that make up the Group Pool

Return type:list of str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule (string)

Return type:str
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.ListProcessingRule(listname=None, processor=None, **kwargs)

A List Processing Rule takes a list of files and applies the processor function to create some output. This is like the MapReduceRule, but with all the functions combined into a single processor function.

The list-processing process works as below:

>>> return processor(file for file in file_list)

This makes defining a processing scheme easier and more concise than the equivalent Map-Reduce, but it also means the processor can’t be applied in parallel, so it may run slower than the map-reduce.

The processor function should accept a generator of AttrDict
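The flow described above can be sketched in plain Python without the library. The `AttrDict` class here is a minimal hypothetical stand-in for the objects passed to the processor, and the `name` and `filesize` attributes are illustrative.

```python
class AttrDict(dict):
    """Minimal stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def largest_file(files):
    """Example processor: consume the generator once, return the biggest file's name."""
    return max(files, key=lambda f: f.filesize).name


file_list = [
    AttrDict(name="a.mpg", filesize=10),
    AttrDict(name="b.mpg", filesize=30),
    AttrDict(name="c.mpg", filesize=20),
]

# Mirrors `processor(file for file in file_list)` from the description above.
result = largest_file(f for f in file_list)
print(result)
```

Since the processor receives a generator, it can only iterate over the files once; a processor that needs several passes would have to collect the items into a list first.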

type

Returns the (GPFS) type of the rule

Return type:str
processor

Returns the ListProcessing processor function

Return type:callable
change(**kwargs)
Parameters:
  • processor – A callable to be applied by the rule
  • when – SQL expression for when the rule should be applied
  • exclude – Pass True to exclude files which match the list criteria
  • source – pool from which files should be selected
  • directories_plus – Include non-regular file objects in list (default=False)
  • sort – Attribute or expression by which files should be sorted
  • threshold – Occupancy percentage thresholds that should trigger a rule (high, low, premigrate)
  • fileset – Fileset or list of filesets for the rule to match
  • size – expression that defines the size of files (default=kb_allocated)
  • action – A SQL expression to be evaluated if all other rule clauses are met
  • show – SQL expression of strings and attributes for the rule to show. These expressions can be built using ListProcessingRule.show(), ListProcessingRule.show_all(), ListProcessingRule.show_attributes(), or ListProcessingRule.show_performance().
  • escape – RFC3986 encode paths and SHOW string, excluding the specified characters
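For a feel of what RFC 3986 percent-encoding with excluded characters looks like, the sketch below uses `urllib.parse.quote` from the Python 3 standard library. It is only an analogy for the escape option: the exact set of characters encoded when a policy runs may differ, and the path is made up.

```python
from urllib.parse import quote

# Hypothetical path containing a space and a '#'.
path = "/gpfs/projects/my file#1.txt"

# Characters listed in `safe` are left as they are; other reserved
# characters are replaced by %XX escapes.
encoded = quote(path, safe="/")
encoded_keep_hash = quote(path, safe="/#")

print(encoded)
print(encoded_keep_hash)
```

Any script that consumes an encoded file list would need to reverse the encoding (for example with `urllib.parse.unquote`) before opening the files.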
static show(*args, **kwargs)

Build a rule ‘SHOW’ expression

Parameters:A collection of strings and file attributes,
Returns:a SQL expression that will display the provided strings and attributes. This expression can be passed to change(show=...)

e.g.

>>> print ListProcessingRule.show("size=", "KB_ALLOCATED")
"([' size=' || VARCHAR(KB_ALLOCATED)])"

>>> rule.change(show=ListProcessingRule.show("size=", "KB_ALLOCATED"))
static show_all()

Build a rule ‘SHOW’ expression to list all available file attributes.

Returns:a SQL expression to list all file attributes

Use as:

>>> rule.change(show=ListProcessingRule.show_all())
static show_attributes(*attrs)

Build a rule ‘SHOW’ expression to list one or more specified file attributes.

Parameters:*attrs – a list of file attributes to list
Returns:a SQL expression to show the specified attributes
>>> print ListProcessingRule.show_attributes('NAME')
"(' name=' || varchar(NAME))"

>>> rule.change(show=ListProcessingRule.show_attributes('NAME', 'KB_ALLOCATED'))
show_performance()

Build a rule ‘SHOW’ expression to list only the attributes needed by the rule’s processing function.

Returns:a SQL expression to show the relevant attributes

e.g.

>>> r = ListProcessingRule('test', processor=lambda x : sum(f.filesize for f in x))
>>> r.show_performance()
"(' filesize=' || varchar(FILE_SIZE))"

>>> r.change(show=r.show_performance())

Note

This is an experimental function.

If this function fails to find any file attributes in the processing function, ListProcessingRule.show_all() will be returned.
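How the library finds the attributes a function uses is not described here. One possible heuristic, shown purely as an assumption, is to inspect the names referenced by the function's compiled code. `KNOWN_ATTRIBUTES` and `referenced_attributes` are hypothetical names for this sketch.

```python
import types

# Hypothetical set of attribute names a file object might expose.
KNOWN_ATTRIBUTES = {"filesize", "name", "kballocated"}


def referenced_attributes(fn):
    """Collect known attribute names mentioned in fn, including nested code."""
    found = set()
    stack = [fn.__code__]
    while stack:
        code = stack.pop()
        # co_names holds the attribute and global names used by this code object.
        found.update(n for n in code.co_names if n in KNOWN_ATTRIBUTES)
        # Generator expressions and inner lambdas are stored in co_consts.
        stack.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return found


attrs = referenced_attributes(lambda x: sum(f.filesize for f in x))
nothing = referenced_attributes(lambda x: len(list(x)))
print(attrs, nothing)
```

An empty result, as in the second call, corresponds to the situation in which the note above says the show_all() expression is returned instead. A name-based heuristic like this can also produce false matches, for instance when a global variable shares a name with a file attribute.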

validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
criteria

Returns the rule’s criteria object

Return type:Criteria
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
listname

Returns the name of the list rule’s list

Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule (string)

Return type:str
class arcapix.fs.gpfs.rule.MapReduceRule(listname=None, mapfn=None, reducefn=<built-in function iadd>, **kwargs)

A Map Reduce Rule applies the map function to a list of files (for example, getting the file sizes) and then applies the reduce function to combine the map results to a single value (e.g. adding the file sizes to get a total for all files).

You can also specify an initial value and an output function, which will post-process the result of the map-reduce.

The map-reduce process (effectively) works like below:

>>> result = initial
>>> for file in file_list:
...     result = reducefn(result, mapfn(file))
>>> return output(result)

Using map-reduce allows the loop section of this process to be run in parallel for greater speed and efficiency.

mapfn should accept an AttrDict-type object, and reducefn should accept whatever type mapfn returns.

If ‘initial’ isn’t specified, the returned value from the first call to mapfn is made the de facto initial value (note, not specifying an initial value may lead to bugs or unexpected behaviour).

Specifying ‘output’ allows you to do post-processing on the result of the map-reduce process. For example, to calculate average filesize, you might use

>>> MapReduceRule( 'average', mapfn=lambda x: [x.filesize],
...                 reducefn=lambda x,y: x+y, initial=[],
...                 output=lambda x: sum(x)/len(x) )

If output isn’t specified, the result from the final call to reducefn will be returned as is.
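The process described above can be tried out as a small stand-alone function, without the library or a filesystem. In this sketch `run_map_reduce` and `AttrDict` are hypothetical stand-ins, the file records are made up, and the loop runs serially rather than in parallel.

```python
import operator
from functools import reduce


class AttrDict(dict):
    """Minimal stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


_UNSET = object()  # sentinel so that None remains a usable initial value


def run_map_reduce(file_list, mapfn, reducefn=operator.iadd,
                   initial=_UNSET, output=None):
    """Serial sketch of: result = reducefn(result, mapfn(file)) for each file."""
    mapped = (mapfn(f) for f in file_list)
    if initial is _UNSET:
        # With no initial value, the first mapped result seeds the reduction.
        result = reduce(reducefn, mapped)
    else:
        result = reduce(reducefn, mapped, initial)
    return output(result) if output is not None else result


files = [AttrDict(filesize=10), AttrDict(filesize=20), AttrDict(filesize=30)]

total = run_map_reduce(files, mapfn=lambda f: f.filesize)
average = run_map_reduce(files, mapfn=lambda f: [f.filesize],
                         reducefn=lambda x, y: x + y, initial=[],
                         output=lambda x: sum(x) / len(x))
print(total, average)
```

In this sketch, calling the function with an empty file list and no initial value raises a TypeError from `reduce`, which is one reason supplying an explicit initial value is the safer habit.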

type

Returns the (GPFS) type of the rule

Return type:str
mapfn

Returns the rule’s map function

Return type:callable
reducefn

Returns the rule’s reduce function

Return type:callable
initial

Returns the rule’s initial value

Return type:Same type as expected return value from map
output

Returns the rule’s output function. Used to post-process map-reduce result

Return type:callable
change(**kwargs)
Parameters:
  • mapfn – Map function
  • reducefn – Reduce function
  • initial – Initial value
  • output – output function
  • when – SQL expression for when the rule should be applied
  • exclude – Pass True to exclude files which match the list criteria
  • source – pool from which files should be selected
  • directories_plus – Include non-regular file objects in list (default=False)
  • sort – Attribute or expression by which files should be sorted
  • threshold – Occupancy percentage thresholds that should trigger a rule (high, low, premigrate)
  • fileset – Fileset or list of filesets for the rule to match
  • size – expression that defines the size of files (default=kb_allocated)
  • action – A SQL expression to be evaluated if all other rule clauses are met
  • show – SQL expression of strings and attributes for the rule to show. These expressions can be built using MapReduceRule.show(), MapReduceRule.show_all(), MapReduceRule.show_attributes(), or MapReduceRule.show_performance().
  • escape – RFC3986 encode paths and SHOW string, excluding the specified characters
static show(*args, **kwargs)

Build a rule ‘SHOW’ expression

Parameters:A collection of strings and file attributes,
Returns:a SQL expression that will display the provided strings and attributes. This expression can be passed to change(show=...)

e.g.

>>> print MapReduceRule.show("size=", "KB_ALLOCATED")
"([' size=' || VARCHAR(KB_ALLOCATED)])"

>>> rule.change(show=MapReduceRule.show("size=", "KB_ALLOCATED"))
static show_all()

Build a rule ‘SHOW’ expression to list all available file attributes.

Returns:a SQL expression to list all file attributes

Use as:

>>> rule.change(show=MapReduceRule.show_all())
static show_attributes(*attrs)

Build a rule ‘SHOW’ expression to list one or more specified file attributes.

Parameters:*attrs – a list of file attributes to list
Returns:a SQL expression to show the specified attributes
>>> print MapReduceRule.show_attributes('NAME')
"(' name=' || varchar(NAME))"

>>> rule.change(show=MapReduceRule.show_attributes('NAME', 'KB_ALLOCATED'))
show_performance()

Build a rule ‘SHOW’ expression to list only the attributes needed by the rule’s map function.

Returns:a SQL expression to show the relevant attributes

e.g.

>>> r = MapReduceRule('test', mapfn=lambda f: f.filesize)
>>> r.show_performance()
"(' filesize=' || varchar(FILE_SIZE))"

>>> r.change(show=r.show_performance())

Note

This is an experimental function.

If this function fails to find any file attributes in the map function, MapReduceRule.show_all() will be returned.

criteria

Returns the rule’s criteria object

Return type:Criteria
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
listname

Returns the name of the list rule’s list

Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule (string)

Return type:str
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
Raises:IOError if the policy driver script is not executable
Raises:OSError if a work file backup directory is configured and is not writable
class arcapix.fs.gpfs.rule.EventRule(event=None, action=None, **kwargs)
>>> r = EventRule(event='WRITE')
>>> r.change(action="System('/bin/echo ' || 'KB:' || varchar(FILE_SIZE))")

Note

This is a new type of rule introduced in 4.2.0, currently undocumented.

This rule type and its implementation are subject to change.

event

Returns the event associated with the rule

Return type:str
action

Returns the action to be performed by the rule (some SQL expression)

Return type:str
criteria

Returns the rule’s criteria object

Return type:Criteria
change(**kwargs)

Change rule options

Parameters:
  • event – The event type that triggers the rule
  • action – A SQL expression to be evaluated if all other rule clauses are met
  • directories_plus – Include non-regular file objects in list (default=False)
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.Comment(comment)

Comment objects allow you to add comments to a policy object

>>> c = Comment('hello world')
Parameters:comment – A comment string
value

Returns the comment string

Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule

This gives the rule as it would be written in a policy file

Return type:str
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.Macro(variableName, expression, disabled=False)

Macros allow you to define variables that map to some other value or (m4 macro) expression

>>> m = Macro('LAST_ACCESS', '(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))')
>>> r = DeleteRule()
>>> r.criteria.new(Criteria.gt('LAST_ACCESS', 365))
Parameters:
  • variableName – Name of the macro you’d like to define
  • expression – A SQL expression
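To make the LAST_ACCESS macro above more concrete, the following sketch reproduces the same arithmetic in plain Python. It assumes that DAYS() counts whole calendar days, so the difference ignores the time of day; `days_since` and the sample timestamps are made up for illustration and are not part of the arcapix API.

```python
from datetime import datetime


def days_since(access_time, now):
    """Whole calendar days between access_time and now.

    Mirrors DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME), assuming DAYS()
    counts calendar days and discards the time of day.
    """
    return now.date().toordinal() - access_time.date().toordinal()


# Made-up timestamps for illustration
now = datetime(2024, 3, 1, 9, 0)
last_access = datetime(2023, 2, 1, 23, 30)

age = days_since(last_access, now)

# Equivalent of the criteria Criteria.gt('LAST_ACCESS', 365)
matches = age > 365
print(age, matches)
```

A file last accessed more than a year before the policy runs would satisfy the criteria in the example above.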
name

Returns the name associated with the macro

Return type:str
expression

Returns the SQL expression that defines the macro

Return type:str
validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule (string)

Return type:str
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
class arcapix.fs.gpfs.rule.Include(path, disabled=False)

Include pulls the contents of another file into the policy.

This allows you to keep, for example, a shared library of m4 macros.

path

Path to a file to include in the policy.

rules

Rules loaded from the included path.

Note

These objects are considered read-only. Any changes made to them will not be saved to file.

validate(macros=None)

Check the rule is valid, in particular whether required values are missing

Raises:AttributeError if any required field isn’t set
toGpfsString(excludeComments=False, excludeDisabled=False)

Convert the rule object to a GPFS rule (string)

Return type:str
id

Returns the id of the rule

Returns:name, or an autogenerated name if unnamed
Return type:str
name

Returns the name of the rule

Returns ‘Unnamed <rule_type> rule’ if the rule is unnamed.

Return type:str
type

Returns the (GPFS) type of the rule

Return type:str
arcapix.fs.gpfs.rule.Variable

alias of arcapix.fs.gpfs.common._Variable

Examples

Create a list rule showing file paths, sorted by modification time

>>> from arcapix.fs.gpfs import ListRule
>>>
>>> # Create a list rule
... r = ListRule(listname='mylist')
>>>
>>> # Set the list order
... r.change(sort='CURRENT_TIMESTAMP - MODIFICATION_TIME')
>>>
>>> # Specify what to list
... r.change(show=ListRule.show('PATH_NAME'))
>>>
>>> # Print PixStor string
... print(r.toGpfsString())

RULE LIST 'mylist'
 WEIGHT(CURRENT_TIMESTAMP - MODIFICATION_TIME)
 SHOW([VARCHAR(PATH_NAME)])
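Because ordering can only be guaranteed when a policy runs on a single node, you may need to re-sort the listed items yourself. The sketch below shows one way to do that in plain Python. The `Listed` record, its fields and the sample paths are hypothetical stand-ins for whatever your own code collects, and it assumes the modification time is available alongside each path and that larger weights are listed first.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a listed item; not an arcapix type
Listed = namedtuple("Listed", ["pathname", "modification"])

# Made-up results, deliberately out of order
items = [
    Listed("/mmfs1/b.txt", datetime(2024, 1, 5)),
    Listed("/mmfs1/a.txt", datetime(2023, 6, 1)),
    Listed("/mmfs1/c.txt", datetime(2024, 2, 9)),
]

# The oldest modification time gives the largest value of
# (CURRENT_TIMESTAMP - MODIFICATION_TIME), so sort oldest first.
ordered = sorted(items, key=lambda item: item.modification)
paths = [item.pathname for item in ordered]
print(paths)
```

Sorting in memory like this is only practical for modest result sets; on a large filesystem it is likely to cause resource contention.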

Create a rule that deletes files from a temporary pool on Sundays

>>> from arcapix.fs.gpfs import DeleteRule
>>>
>>> # Create new rule
... myrule = DeleteRule(name='del-sun')
>>>
>>> # Set the pool to delete from
... myrule.change(source='temp_pool')
>>>
>>> # Set day of week to run on (Sun = 1)
... myrule.change(when=DeleteRule.RunOnDay(1))
>>>
>>> # Print PixStor string
... print(myrule.toGpfsString())

RULE 'del-sun' WHEN(DayOfWeek(CURRENT_DATE)=1) DELETE
 FROM POOL 'temp_pool'

Create a rule to find out how much space is being used by ‘.tmp’ files

>>> from arcapix.fs.gpfs import ManagementPolicy, ListProcessingRule, Criteria
>>>
>>> # Create a Management Policy
... p = ManagementPolicy()
>>>
>>> # Create a ListProcessing Rule
... r = p.rules.new(ListProcessingRule, listname='temp_list', processor=lambda lst: sum(x.filesize for x in lst))
>>>
>>> # Add criteria to specify filetype
... r.criteria.new(Criteria.like('name', '*.tmp'))
>>>
>>> # Run policy
... print(p.run('mmfs1'))

{'temp_list': 208457}
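For anything beyond a one-line lambda, it can be easier to write the processor as a named function and check it against stand-in data before running a policy. The sketch below does this for the size-summing processor above. `FakeFile` and the sample sizes are hypothetical; the only assumption about the real items is that they expose a `filesize` attribute, as the lambda above implies.

```python
from collections import namedtuple

# Hypothetical stand-in for the items passed to the processor
FakeFile = namedtuple("FakeFile", ["name", "filesize"])


def total_size(files):
    """Sum the sizes of all listed files (same logic as the lambda above)."""
    return sum(f.filesize for f in files)


# Made-up sample data for a quick local check
sample = [
    FakeFile("a.tmp", 1024),
    FakeFile("b.tmp", 2048),
    FakeFile("c.tmp", 512),
]

total = total_size(sample)
print(total)
```

The named function can then be supplied as `processor=total_size` in place of the lambda.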