Profiling

The APProfile object provides a way to profile code that uses the PixStor Python API. In particular, the profiling stats include details for code run as part of a list processing policy, for example, a management policy containing a MapReduceRule.

The only types of rule which support profiling are ListProcessingRule and MapReduceRule.

Profiling statistics are only generated if profiling is enabled via an APProfile object when a policy is run.

The APProfile object provides profiling info for all code run whilst profiling is enabled, not just list processing rules.

When you look at the stats, you will likely find a large disparity between the run time reported for ManagementPolicy.run and the time for the list processing code, run_processing_rule. The difference is the time the PixStor policy engine spends outside the list processing code, i.e. scanning for files to process, sorting results, etc.
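The same effect can be reproduced with the standard-library profiler. This is a minimal sketch, not PixStor code: the hypothetical functions `run_policy` and `run_processing_rule` stand in for ManagementPolicy.run and the list processing code, and a `time.sleep` call stands in for the policy engine's scanning and sorting. Subtracting the rule's cumulative time from the outer call's cumulative time gives the time spent outside the rule.

```python
import cProfile
import pstats
import time


def run_processing_rule(files):
    # Hypothetical stand-in for the list processing code.
    return sum(files)


def run_policy(n):
    # Hypothetical stand-in for ManagementPolicy.run: most of the time goes
    # on "scanning" (simulated by sleep), only a little on the rule itself.
    time.sleep(0.05)
    files = list(range(n))
    return run_processing_rule(files)


def cumulative_time(stats, func_name):
    # pstats.Stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    return sum(
        ct for (_, _, name), (_, _, _, ct, _) in stats.stats.items()
        if name == func_name
    )


def engine_overhead(n=1000):
    prof = cProfile.Profile()
    prof.enable()
    run_policy(n)
    prof.disable()
    stats = pstats.Stats(prof)
    outer = cumulative_time(stats, "run_policy")
    inner = cumulative_time(stats, "run_processing_rule")
    return outer, inner, outer - inner


outer, inner, overhead = engine_overhead()
print(f"run_policy: {outer:.3f}s, rule: {inner:.3f}s, outside rule: {overhead:.3f}s")
```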

Description

class arcapix.fs.gpfs.profiling.APProfile

Arcapix-compatible alternative to the Python builtin Profile.

Unlike the builtin Profile, this will also capture profiling info for functions run in list processing policies.

Can be used as a context manager, e.g.:

>>> with APProfile() as prof:
...     pass  # do stuff
...
>>> prof.print_stats(sort='cumulative')

In addition, you can access rule-specific stats for a policy that was run within the context:

>>> with APProfile() as prof:
...     policy.run('mmfs1')
...
>>> prof.rules['myrule'].sort_stats('cumulative').print_stats()
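For comparison, the builtin cProfile.Profile also supports the with statement (from Python 3.8 onwards). The sketch below uses it on a hypothetical `do_stuff` workload and captures the report as a string. According to the description above, functions run in list processing policies would not be captured this way, which is the gap APProfile is meant to fill.

```python
import cProfile
import io
import pstats


def do_stuff():
    # Hypothetical workload standing in for "# do stuff".
    return sorted(str(i) for i in range(10000))


with cProfile.Profile() as prof:  # context-manager support needs Python 3.8+
    do_stuff()

# Render the five most expensive entries (by cumulative time) into a string.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```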

Note

If a policy is run multiple times within the context, or if multiple policies are run which define rules with the same list names, only the most recent invocation for a given name will be available via APProfile.rules.
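This behaviour is what you would expect from stats keyed by name: each invocation overwrites the previous entry for that name. The sketch below models it with a plain dict; `record_run` and the string "stats" are hypothetical and do not describe APProfile's internals. It also shows the obvious workaround of taking a snapshot after each run.

```python
import copy


def record_run(rules, results):
    # Hypothetical model: per-rule stats are stored under the rule name, so a
    # later run that reuses a name replaces the earlier entry.
    for name, stats in results.items():
        rules[name] = stats


rules = {}
history = []
for run_no in (1, 2):
    record_run(rules, {"size": f"stats from run {run_no}"})
    # Snapshot after each run if earlier invocations must be kept.
    history.append(copy.deepcopy(rules))

print(rules["size"])       # stats from run 2
print(history[0]["size"])  # stats from run 1
```

With APProfile itself, an analogous approach would be to call dump_stats on prof.rules[name] with a distinct path after each run, assuming the entry is available as soon as that run returns.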

While this class has a similar interface to cProfile, it should not be considered a drop-in replacement.

Warning

This context manager is not thread-safe.

Note

This captures policy stats by patching ManagementPolicy.run. If the code being profiled also patches ManagementPolicy.run, profiling won’t work.
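One way a second patch can defeat the profiler is shown below with plain monkeypatching. `Policy` is a hypothetical stand-in for ManagementPolicy and `install_profiler_patch` only imitates the general idea of wrapping `run`; it is not APProfile's actual code. Once the code being profiled replaces `run` with its own function (here a simple mock that does not call through), the profiler's wrapper is no longer reached.

```python
import functools


class Policy:
    """Hypothetical stand-in for ManagementPolicy."""

    def run(self, fs):
        return f"ran on {fs}"


recorded = []


def install_profiler_patch(cls):
    # Wrap cls.run so every call is recorded before delegating to the
    # original method; a real profiler would collect rule stats here.
    original = cls.run

    @functools.wraps(original)
    def patched(self, fs):
        recorded.append(fs)
        return original(self, fs)

    cls.run = patched


install_profiler_patch(Policy)
Policy().run("mmfs1")  # goes through the wrapper, so it is recorded

# The code being profiled now patches run itself, bypassing the wrapper.
Policy.run = lambda self, fs: f"mocked run on {fs}"
Policy().run("mmfs2")  # not recorded

print(recorded)  # ['mmfs1']
```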

enable()

Enable profiling.

Note: this clears any previously collected profiling info.

disable()

Disable profiling.

dump_stats(path)

Dump stats to the file at path.

print_stats(sort=-1)

Print stats.

The sort parameter is the same as for cProfile.
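The interface above (enable discarding earlier data, disable, dump_stats, and print_stats taking a cProfile-style sort) can be approximated on top of the standard library. `MiniProfile` is a hypothetical illustration of that interface only; it does not capture list processing rules, which is what APProfile adds. As with cProfile, sort=-1 means "standard name" order, while strings such as "cumulative", "tottime" or "calls" select other orderings.

```python
import cProfile


class MiniProfile:
    """Hypothetical sketch mimicking the documented APProfile interface."""

    def __init__(self):
        self._prof = cProfile.Profile()

    def enable(self):
        # Stop any running profiler, then start a fresh one so earlier data
        # is discarded, mirroring the note on enable().
        self._prof.disable()
        self._prof = cProfile.Profile()
        self._prof.enable()

    def disable(self):
        self._prof.disable()

    def __enter__(self):
        self.enable()
        return self

    def __exit__(self, *exc_info):
        self.disable()
        return False  # do not suppress exceptions

    def dump_stats(self, path):
        # Writes the standard pstats (marshal) format.
        self._prof.dump_stats(path)

    def print_stats(self, sort=-1):
        # Delegates to cProfile.Profile.print_stats, which accepts the same
        # sort values as pstats.Stats.sort_stats.
        self._prof.print_stats(sort=sort)


with MiniProfile() as prof:
    sorted(range(100000), reverse=True)

prof.print_stats(sort="cumulative")
```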

Examples

Profile a list processing rule using the context manager

>>> from arcapix.fs.gpfs import ManagementPolicy, MapReduceRule
>>> from arcapix.fs.gpfs.profiling import APProfile
>>>
>>> policy = ManagementPolicy()
>>>
>>> rule = policy.rules.new(MapReduceRule, 'size', lambda f: f.size)
>>>
>>> with APProfile() as prof:
...     print(policy.run('mmfs1')['size'])
...
10234562
>>>
>>> # write stats for ONLY the 'size' rule to file
... prof.rules['size'].dump_stats('policy.stats')
>>>
>>> # alternatively, you can access rule stats from the policy itself
... policy.stats()['size'].dump_stats('policy.stats')
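A file written by dump_stats can be inspected later with the standard pstats module, assuming it uses the standard pstats format (the gprof2dot -f pstats example further down suggests that the CLI output does). So that the sketch runs on its own, it creates a stand-in file with cProfile and a hypothetical `size_rule` function; with real output you would pass the path of 'policy.stats' instead.

```python
import cProfile
import io
import os
import pstats
import tempfile


def summarise(path, limit=10):
    """Return the top `limit` entries of a pstats file, by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


def size_rule(sizes):
    # Hypothetical stand-in for the work done by the 'size' rule.
    return sum(sizes)


# Create a stand-in stats file in the temp directory.
path = os.path.join(tempfile.gettempdir(), "policy.stats")
prof = cProfile.Profile()
prof.runcall(size_rule, list(range(100000)))
prof.dump_stats(path)
print(summarise(path))
```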

Profile code which uses a list processing rule

>>> from __future__ import print_function
>>> from arcapix.fs.gpfs import Cluster, ManagementPolicy, ListProcessingRule
>>> from arcapix.fs.gpfs.profiling import APProfile
>>>
>>> prof = APProfile()
>>> prof.enable()
>>>
>>> cluster = Cluster()
>>>
>>> def total_size(files):
...     return sum(f.size for f in files)
...
>>>
>>> policy = ManagementPolicy()
>>>
>>> rule = policy.rules.new(ListProcessingRule, 'size', total_size)
>>>
>>> for fs in cluster.filesystems:
...     print(fs, policy.run(fs)['size'])
...
mmfs1 10234562
mmfs2 1029343
>>>
>>> prof.disable()
>>>
>>> # print stats for ALL code executed while profiling was enabled
... prof.print_stats(sort='cumulative')

Profile a script using the cli tool

The approfile command-line tool should be installed as part of the PixStor Python API RPM.

# profile a script, outputting stats to 'sizes.prof'
$ approfile -o sizes.prof sizes.py mmfs1

# install gprof2dot (rendering the graph also needs the Graphviz 'dot' tool)
$ pip install gprof2dot

# generate an SVG image of the call graph
$ gprof2dot -f pstats sizes.prof | dot -Tsvg -o stats-graph.svg
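The approfile invocation resembles the standard library's python -m cProfile -o sizes.prof sizes.py mmfs1. As a rough illustration of what such a tool does, and not approfile's implementation, the hypothetical `profile_script` below runs a script as __main__ with its own arguments under cProfile and writes a pstats file that gprof2dot -f pstats can read.

```python
import cProfile
import os
import runpy
import sys


def profile_script(script, args, out_path):
    """Run `script` as __main__ under cProfile and write pstats output."""
    saved_argv, saved_path = sys.argv[:], sys.path[:]
    # Mimic `python script.py args...`: set argv and put the script's
    # directory first on the import path.
    sys.argv = [script, *args]
    sys.path.insert(0, os.path.dirname(os.path.abspath(script)))
    prof = cProfile.Profile()
    try:
        prof.enable()
        try:
            runpy.run_path(script, run_name="__main__")
        finally:
            prof.disable()
    finally:
        sys.argv = saved_argv
        sys.path[:] = saved_path
    prof.dump_stats(out_path)
    return out_path


# Usage (mirroring the CLI example): profile_script("sizes.py", ["mmfs1"], "sizes.prof")
```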

Note

The call graphs for the rule stats are not properly ‘joined up’ with the outer code, so you are likely to see the rules as a separate tree in the generated image.

If you don’t see the rules at all in the image, it’s likely because the time spent executing the rules is much less than the time spent in the outer code.
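In that case the stats file can still be queried directly. pstats.Stats.print_stats accepts regular-expression restrictions, matched against each entry's "file:line(function)" label, so the hypothetical helper `find_functions` below assumes the rule functions can be recognised by name. gprof2dot also prunes nodes and edges below a percentage of total time; its --node-thres and --edge-thres options (see gprof2dot --help) control this.

```python
import io
import pstats


def find_functions(path, pattern):
    """Return the report rows whose label matches the regex `pattern`."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.sort_stats("cumulative").print_stats(pattern)
    return buf.getvalue()
```

For example, find_functions('sizes.prof', 'total_size') would list only the entries for the total_size rule function used in the earlier example.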