Command Line Tools

The main command line tool for PixStor Search is searchctl.

This provides sub-commands for performing various operations, such as ingest, plugin development, and administration.

For more details, see the searchctl man page (man searchctl).

Additional, client-side tools for PixStor Search are available in the arcapix-search-client-utils package.

Ingest Tools

searchctl ingest

Find files on the filesystem and ingest them into the search database.

To ingest a single file, use searchctl add-file instead.

Ingest a directory

$ searchctl ingest /mmfs1/data/sample_data/cats

Re-ingest the whole filesystem with a newly installed plugin

$ searchctl ingest mmfs1 --plugins colours::ColoursPlugin

Regenerate proxies for mov files

$ searchctl ingest mmfs1 --include "*.mov" --plugins @proxy

searchctl expunge

Find files on the filesystem and remove them from the search database.

This will also remove any proxies associated with the removed entries [1]

Note

Since this command scans the filesystem to decide what to expunge from the database, it won’t remove entries for any files which no longer exist on the filesystem.

To remove deleted files from the database, use searchctl admin cleandb instead.

To expunge a single file, use searchctl remove-file or searchctl admin remove-id instead.

Remove DS_Store files from the database

$ searchctl expunge mmfs1 --include .DS_Store

Make sure to add .DS_Store to any ingest excludes to prevent them from being ingested again.

searchctl add-file

Optimised ingest of individual files.

Unlike searchctl ingest, this command doesn’t scan the whole filesystem, so is much faster.

Regenerate proxies for a file

$ searchctl add-file /mmfs1/data/sample_data/cats/cats-01.jpg --plugins @proxy

searchctl remove-file

Optimised removal of individual files.

This will also remove any proxies associated with the removed file [1]

Unlike searchctl expunge, this command doesn’t scan the whole filesystem, so is much faster.

Note

This command requires that the file being removed still exists on the filesystem.

To remove deleted files from the database, you should use searchctl admin cleandb or searchctl admin remove-id instead.

Remove a file from the database

$ searchctl remove-file /mmfs1/data/copyrighted.mp4

apsearch-ingest

In addition to searchctl, there is a standalone apsearch-ingest tool.

Unlike searchctl ingest, which is intended for one-shot ingests, apsearch-ingest is intended for periodic, incremental ingests.

apsearch-ingest is configuration-driven, and handles setting up the ingest environment, such as changing to a designated ingest user and work directory. By comparison, searchctl ingest runs as the user who invoked the command.

apsearch-ingest update

The default configuration for apsearch-ingest is under /opt/arcapix/etc/search/search.yaml. In a PixStor system, this file is managed by PixStor, and should only be changed via the pixstor config command.

A different config can be passed to the command as apsearch-ingest update /path/to/config.yaml

Job Tools

Long running tasks, such as ingest, are treated as ‘jobs’. These ‘jobs’ can be interacted with using the following commands.

searchctl jobs

List long running search jobs.

This includes searchctl ingest and searchctl expunge, as well as apsearch-ingest.

The job listing includes each job’s unique ‘run id’, which can be used to view the job logs or to stop the job early.

List active jobs

By default, all jobs will be displayed, including those that have already finished running. The following will show only jobs which are actively running

$ searchctl jobs --active

RUNID                             TASK    TARGET  STATUS   SINCE          USER
----------------------------------------------------------------------------------
24a08635aae540d2a93d651ab31ae131  ingest  mmfs1   RUNNING  25 minute ago  apsearch

See the exact time when a job was started

In the default table output, the ‘SINCE’ column shows the start time if the job is still running or else the time the job stopped running (completed, failed). This is displayed in a ‘human readable’ format.

To get the exact time that a job was started, refer to the JSON-formatted output

$ searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .started
1591301522

This returns the time as a unix timestamp. This can be reformatted using, e.g.

$ date -d @$(searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .started)
Thu  4 Jun 21:12:02 BST 2020

The time at which the job ended is also provided in the json output as ended
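With both started and ended available, the run time of a finished job can be worked out by subtraction. The following is a minimal sketch rather than part of searchctl: format_elapsed is a hypothetical helper name, and it assumes both values are unix timestamps in seconds, as in the example above.

```shell
# Hypothetical helper: print the time between two unix timestamps as HH:MM:SS.
# Usage: format_elapsed STARTED ENDED
format_elapsed() {
    local secs
    secs=$(( $2 - $1 ))
    printf '%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Example, assuming jq is installed and the job has finished (so ended is set):
#   id=066b1a53a6fb4e96ac8631c57b5c3e12
#   format_elapsed "$(searchctl jobs "$id" --json | jq .started)" \
#                  "$(searchctl jobs "$id" --json | jq .ended)"
```

For a job that is still running, ended is presumably not set, so the helper only makes sense for completed or failed jobs.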

Get the name of the screen session an ingest is running in

It’s typical for an ingest to be run in a screen session, so it can be left running in the background. If you forget the name of the screen session, the following will tell you what it is

$ searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .screen
"1670352.ingest"

searchctl logs

Show log entries for one or more jobs.

The logs are displayed in a pager (less)

View the info level logs for an ingest

Use searchctl jobs to determine the unique ‘run id’ for a specific ingest

$ searchctl logs 066b1a53a6fb4e96ac8631c57b5c3e12 --level info

Note

An ingest must be run with APLOGLEVEL=info (or with a more verbose level) for info level messages to be recorded and viewable.

searchctl stop

Stop one or more search jobs.

Stop an ingest

Use searchctl jobs to determine the unique ‘run id’ for a specific ingest

$ searchctl stop 066b1a53a6fb4e96ac8631c57b5c3e12

Stop all jobs running as root

$ searchctl stop --user root
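To stop every active job regardless of user, the run ids can be taken from the job listing and passed to searchctl stop one at a time. This is only a sketch: extract_run_ids and stop_active_jobs are hypothetical helper names, and the parsing assumes the table layout shown for searchctl jobs --active, with a 32-character lowercase hex run id at the start of each job line.

```shell
# Hypothetical helper: read `searchctl jobs` table output on stdin and print
# one run id per line. Header and separator lines do not match and are dropped.
extract_run_ids() {
    grep -oE '^[0-9a-f]{32}'
}

# Hypothetical helper: stop each active job in turn.
stop_active_jobs() {
    searchctl jobs --active | extract_run_ids | while read -r run_id; do
        # </dev/null keeps the stop command from consuming the id list on stdin
        searchctl stop "$run_id" </dev/null
    done
}
```

Parsing the table keeps the example short. The --json output is likely a sturdier source for scripts, but its structure for multiple jobs is not shown here.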

Admin Tools

Admin commands are meant for administrative users. They are nested under the admin subcommand

$ searchctl admin --help
usage: searchctl admin [-h] COMMAND ...

positional arguments:
COMMAND
    status         check the status of apsearch services
    auto-config    suggest configurations for ingest
    locate-proxy   find the path to a proxy for a given file
    verify-file    verify whether a single file has been ingested
    verify-ingest  verify which files have been ingested
    clean-proxies  clean up orphaned proxy files
    cleandb        remove items from db which don't exist on the filesystem
    remove-id      remove a single file from the index by id

optional arguments:
-h, --help     show this help message and exit

Run 'searchctl admin COMMAND --help' for more information on a specific command.

searchctl admin status

Check status of apsearch related services.

This can be used to identify the source of issues, for example if you are seeing 500 errors whilst browsing the PixStor Search UI.

$ searchctl admin status

SERVICE              STATUS
===========================
apsearch-middleware    OK
nginx                  OK
elasticsearch         DOWN
apcore-auth            OK
condor                 OK
gpfs                   OK
elastic-index         DOWN
end-to-end            DOWN
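For a scripted health check, the services reported as DOWN can be picked out of this table. The sketch below uses a hypothetical helper name, down_services, and assumes the two-column layout shown above; that layout is not a documented stable interface.

```shell
# Hypothetical helper: read `searchctl admin status` output on stdin and
# print the name of every service whose status column is DOWN.
down_services() {
    awk '$2 == "DOWN" { print $1 }'
}

# Example:
#   searchctl admin status | down_services
```

With the output above, this would list elasticsearch, elastic-index and end-to-end.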

searchctl admin auto-config

Suggest performant ingest settings.

Note

Currently this only supports stat-only ingest

Auto-configuration for apsearch-ingest

Copy the generated configs to search.yaml

$ searchctl admin auto-config mmfs1 --plugins @stat-only
nodes:
  - pixstor-mn-001
policy_options:
  dirThreadLevel: 4
  globalWorkDirectory: /mmfs1/.policytmp/
  iscanBuckets: 1
  iscanThreads: 4
  localWorkDirectory: /mmfs1/.policytmp/
  maxFiles: 1500
  threadLevel: 4

Auto-configuration for searchctl ingest

Format the suggested configurations as CLI flags that can be passed to searchctl ingest

$ searchctl admin auto-config mmfs1 --plugins @stat-only --cli
-N pixstor-mn-001 --policy-options="-s /mmfs1/.policytmp/ -g /mmfs1/.policytmp/ -a 4 -A 1 -n 4 -m 4 -B 1500"

$ searchctl ingest mmfs1 --plugins @stat-only -N pixstor-mn-001 \
    --policy-options="-s /mmfs1/.policytmp/ -g /mmfs1/.policytmp/ -a 4 -A 1 -n 4 -m 4 -B 1500"

searchctl admin locate-proxy

Find the path to a proxy for a given file.

This is useful for debugging - to check whether the proxy has been generated, what its permissions are, etc.

Find the thumbnail for an image file

$ searchctl admin locate-proxy /mmfs1/data/sample_data/cats/cats-01.jpg image.thumbnail
/mmfs1/apsearch/proxies/044/482/549/4448254956900308779.png

Check if a preview video was generated for a mov file

$ searchctl admin locate-proxy /mmfs1/data/sample_data/sample.mov video.preview
MissingField: 'video.preview'

$ echo $?
1

The above may indicate that the asynchronous job generating the video preview is still running or has failed. This can be confirmed by checking condor_q

Note

This command only reports on the contents of the search database. The returned path may not exist on the filesystem - e.g. it may have been deleted.
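Because the returned path comes from the database, it can be worth confirming that the proxy is actually on disk. The following sketch wraps locate-proxy in a hypothetical helper, check_proxy. It relies only on the non-zero exit status shown above for a missing field, and assumes that a successful call prints nothing but the path.

```shell
# Hypothetical helper: report whether a proxy is recorded in the database
# and present on the filesystem.
# Returns 0 if present, 1 if locate-proxy fails, 2 if the recorded path is missing.
check_proxy() {
    local file field proxy
    file=$1
    field=$2
    if ! proxy=$(searchctl admin locate-proxy "$file" "$field"); then
        echo "no proxy recorded for $field"
        return 1
    fi
    if [ -e "$proxy" ]; then
        echo "ok: $proxy"
    else
        echo "recorded but missing on disk: $proxy"
        return 2
    fi
}

# Example:
#   check_proxy /mmfs1/data/sample_data/cats/cats-01.jpg image.thumbnail
```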

searchctl admin verify-file

Verify whether a single file has been ingested

$ searchctl admin verify-file /mmfs1/data/example.mov --plugins @proxy

Status                                      INCOMPLETE

Last Ingested                      2021-09-18 10:23:03
Modification Time                  2021-09-17 23:19:31

=== PLUGINS ==========================================

default::DefaultPlugin             2021-09-18 10:23:03
videopreview::VideoPreview               NOT  INGESTED
videopreview::VideoThumbnail      *2021-09-17 13:53:20

searchctl admin verify-ingest

Verify which files were successfully ingested.

Summary report

By default, the command will output a summary of the ingest status of files

$ searchctl admin verify-ingest mmfs1

187 files scanned (2GB)
29 ingested for all plugins (2GB)
90 ingested with some plugins missing (71MB)
68 not ingested (9KB)
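To track ingest progress over time, the summary can be reduced to a single figure. This sketch uses a hypothetical helper, fully_ingested_percent, and assumes the wording of the summary lines shown above, which may differ between versions.

```shell
# Hypothetical helper: read the verify-ingest summary on stdin and print the
# percentage of scanned files that were ingested for all plugins.
fully_ingested_percent() {
    awk '/files scanned/            { total = $1 }
         /ingested for all plugins/ { full = $1 }
         END { if (total > 0) printf "%.1f%%\n", 100 * full / total }'
}

# Example:
#   searchctl admin verify-ingest mmfs1 | fully_ingested_percent
```

For the summary above (29 of 187 files), this prints 15.5%.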

Migrate ingested files with ngmigrate

Verify-ingest can generate lists of all files that are fully ingested. Those lists can then be passed to ngenea to migrate those files to offline storage.

Note

Requires ngenea 1.9 or newer.

Ngenea accepts newline-terminated lists, but only if the listed paths don’t contain newline characters. Generating null-terminated lists is therefore the safest option.

# generate null-terminated lists
$ searchctl admin verify-ingest mmfs1 --write-ingested --list-directory /mmfs1/apsearch --null-terminated

# merge generated lists into a single file
$ /usr/bin/cat /mmfs1/apsearch/ingested/* > /mmfs1/apsearch/tomigrate.list

# migrate to offline storage
$ ngmigrate -f /mmfs1/apsearch/tomigrate.list --filelist-format NUL
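Before handing a merged list to ngmigrate, it may be worth checking how many entries it holds, since wc -l is not meaningful for a null-terminated file. The helper below is a hypothetical sketch that counts NUL terminators using standard tools only.

```shell
# Hypothetical helper: count the entries in a null-terminated file list by
# deleting every byte except NUL and counting what remains.
count_nul_entries() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Example:
#   count_nul_entries /mmfs1/apsearch/tomigrate.list
```

Paths that contain newlines are counted once each, since only the NUL terminators are counted.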

If APBackup is being run on the cluster, some files may have already been pre-migrated.

To ensure that any migrations from the generated lists don’t conflict with APBackup’s operation, add --with-xattr user.APXstier ARCHIVED when generating the lists, so that they include only files which have already been processed by APBackup.

searchctl admin clean-proxies

Clean up orphaned proxy files.

This may be necessary if files were removed from the database with the ‘retain proxies’ setting enabled, or if the underlying elasticsearch database was manually altered or dropped.

$ searchctl admin clean-proxies

Running this command periodically may be necessary for space saving or for compliance, e.g. ensuring material which has a copyright claim against it is removed.

searchctl admin cleandb

Clean up db entries for non-existent files.

Under normal operation, if a file was deleted from the filesystem, an incremental ingest with the ‘prune directory’ plugin will detect that the file was deleted and remove the database entry.

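If a database entry remains for a file that no longer exists, it cannot be removed with searchctl expunge or searchctl remove-file, since both require the file to be present on the filesystem. Use searchctl admin cleandb to remove such entries.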

$ searchctl admin cleandb mmfs1

searchctl admin remove-id

Remove a single file entry from the database by id.

This will also remove any proxies associated with the removed file [1]

Unlike searchctl remove-file, the file doesn’t need to exist on the filesystem to be removed.

$ searchctl admin remove-id 3629588116342303481

Hint

The id for a given path can be found using pxs_file_list

$ pxs_file_list -p /mmfs1/data/sample_data/cats/cats-01.jpg -F _id | cut -d, -f2
3629588116342303481
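The lookup and the removal can be combined into one step. The sketch below uses a hypothetical helper, remove_by_path, built from the two commands shown above. It assumes that the id is a plain decimal number, as in the example, and that the path contains no commas, since the cut step would otherwise pick the wrong field.

```shell
# Hypothetical helper: look up the database id for a path and remove that entry.
remove_by_path() {
    local path id
    path=$1
    id=$(pxs_file_list -p "$path" -F _id | cut -d, -f2)
    case $id in
        # Refuse to continue if the lookup returned nothing or anything non-numeric
        ''|*[!0-9]*) echo "no id found for $path" >&2; return 1 ;;
    esac
    searchctl admin remove-id "$id"
}

# Example:
#   remove_by_path /mmfs1/data/sample_data/cats/cats-01.jpg
```

The id check means nothing is removed when the lookup fails or returns more than one line.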

Plugin Tools

Plugin commands are used for examining plugins. These may be useful to plugin developers. They are nested under the plugins subcommand

$ searchctl plugins --help
usage: searchctl plugins [-h] COMMAND ...

positional arguments:
COMMAND
    list       list installed plugins
    check      check a plugin for potential issues
    benchmark  benchmark a plugin against a test file

optional arguments:
-h, --help  show this help message and exit

Run 'searchctl plugins COMMAND --help' for more information on a specific command.

searchctl plugins list

List or view details about the currently enabled plugins

List enabled plugins

This is the most reliable way to see which plugins are currently enabled. If the output doesn’t match what you expect, you may need to reapply salt state.

$ searchctl plugins list
location::LocationPlugin
videopreview::VideoThumbnail
imagepreview::CoreImageThumbnail
image::ImagePlugin
sha512hash::Sha512HashOfflinePlugin
video::VideoImagePlugin
sha512hash::Sha512HashPlugin
prunedirectory::PruneDirectoryPlugin
desktopconnector::DesktopConnectorFilePlugin
stat::StatPlugin
gpfs::GPFSPolicyPlugin
camera::CameraPlugin
common_attributes::CommonAttributesPlugin
psd_exr_preview::PSDEXRthumbnail
photoshop::PhotoshopMetaDataPlugin
video::VideoPlugin
dpx::DpxPlugin
desktopconnector::DesktopConnectorDirPlugin
videopreview::VideoPreview

Test a set of plugin filters

Prior to running an ingest with plugin filters, it’s a good idea to check which plugins will be selected

$ searchctl plugins list image --exclude @proxy
photoshop::PhotoshopMetaDataPlugin
image::ImagePlugin
allblack::AllBlackImagePlugin
video::VideoImagePlugin

$ searchctl ingest mmfs1 --plugins image --exclude-plugins @proxy

See more details about a plugin

$ searchctl plugins list imagepreview::CoreImageThumbnail --long
name: imagepreview::CoreImageThumbnail
description: General purpose thumbnail and preview generator for image files.
module: imagepreview::
class: CoreImageThumbnail
namespace: image
priority: 0
groups:
- @core
- @all
- @sync
- @proxy
- @offline-unsafe
- @non-lab

searchctl plugins check

Check a plugin for potential issues.

This can also be used for debugging why a plugin generated unexpected or no metadata for a given file.

For an example of ‘check-driven’ plugin development, see Plugin Development Walk-through

Check a plugin for potential issues

$ searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/cats/cats-01.jpg
ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: color_space
 ❗ no metadata extracted
ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: creationtime
 ❗ no metadata extracted
ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: icc_profile
 ❗ no metadata extracted
ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: rendering_intent
 ❗ no metadata extracted

The above indicates that some of the metadata fields defined in the ImagePlugin schema could not be extracted for the given test file. Messages like this might be because the test file doesn’t provide that metadata, or because the plugin has some bug which means those fields aren’t being properly extracted.

To confirm, we can check the plugin against a wider variety of test files.

$ searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/*.jpg

See what metadata the plugin extracts from a file

To view the actual extracted metadata, set the logging level to notify (or more verbose)

$ APLOGLEVEL=notify searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/cats/cats-01.jpg
...
NOTIFY:arcapix.search.metadata.plugins.validation:Extracted metadata:
{
    "bitdepth": 8,
    "orientation": "Horizontal (normal)",
    "megapixels": 0.563,
    "height": 563,
    "width": 1000,
    "aspect_ratio": 1.7761989342806395,
    "resolution": 72.0
}
...

Check proxies that a plugin generates for a file

As with the above, set the logging level to notify, and run with --keep-proxies

$ APLOGLEVEL=notify searchctl plugins check imagepreview::CoreImageThumbnail /mmfs1/data/sample_data/cats/cats-01.jpg --keep-proxies
...
NOTIFY:arcapix.search.metadata.plugins.validation:Generated proxies:
[
    {
        "proxy_path": "/mmfs1/apsearch/proxies/.proxytmp/tmpbw1ScZ.png",
        "mimetype": "image/png",
        "filename": "preview.png",
        "typeidentifier": "preview"
    },
    {
        "proxy_path": "/mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png",
        "mimetype": "image/png",
        "filename": "thumb.png",
        "typeidentifier": "thumbnail"
    }
]
✔️ No issues found!

$ file /mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png
/mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png: PNG image data, 150 x 150, 8-bit/color RGBA, non-interlaced

searchctl plugins benchmark

Benchmark a plugin against one or more test files

This is useful for approximating how much longer an ingest might take if the plugin is enabled, or conversely, how much ingest time will be saved by disabling the plugin.

Benchmark thumbnail generation for a directory of jpegs

$ searchctl plugins benchmark imagepreview::CoreImageThumbnail /mmfs1/data/sample_data/cats/*.jpg
/mmfs1/data/sample_data/cats/cats-1.jpg: 33.567 ms per call  (10 calls)
/mmfs1/data/sample_data/cats/cats-2.jpg: 77.257 ms per call  (10 calls)
/mmfs1/data/sample_data/cats/cats-3.jpg: 32.379 ms per call  (10 calls)
/mmfs1/data/sample_data/cats/cats-4.jpg: 33.276 ms per call  (10 calls)
/mmfs1/data/sample_data/cats/cats-5.jpg: 77.373 ms per call  (10 calls)
Average:  57.220 ms +- 24.333 ms
Range:    32.379 ms +- 77.373 ms
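A rough estimate of the extra ingest time follows from multiplying the reported average by the number of files the plugin would process. The helper below is a hypothetical sketch of that arithmetic. It assumes files are processed one after another, so for an ingest that runs across several threads or nodes it gives an upper bound rather than a prediction.

```shell
# Hypothetical helper: estimate added ingest time in seconds.
# Usage: estimate_extra_seconds AVERAGE_MS FILE_COUNT
estimate_extra_seconds() {
    awk -v avg_ms="$1" -v files="$2" 'BEGIN { printf "%.1f\n", avg_ms * files / 1000 }'
}

# Example: an average of 57.220 ms per call across 187 files
#   estimate_extra_seconds 57.220 187
```

For those figures the estimate is 10.7 seconds.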

Deprecated Tools

Searchctl replaces various previously existing commands. These old commands are considered deprecated, and will be removed in a future release.

The following table shows which searchctl command should be used in place of the deprecated tools

Deprecated command  searchctl replacement
finder add          searchctl ingest
finder stop         searchctl stop
clean_proxies       searchctl admin clean-proxies
cleandb             searchctl admin cleandb
find_proxy          searchctl admin locate-proxy
profile_plugin      searchctl plugins benchmark
validate_plugin     searchctl plugins check

Additionally, finder update is deprecated in favour of apsearch-ingest update

Footnotes

[1] Unless PixStor Search is configured to retain proxies.