##################
Command Line Tools
##################

The main command line tool for PixStor Search is ``searchctl``.

This provides sub-commands for performing various operations, such as ingest, plugin development, and administration.

For more details, see ``man searchctl``.

Additional, client-side tools for PixStor Search are available in the ``arcapix-search-client-utils`` package.

.. contents::
   :local:


Ingest Tools
============

.. _searchctl_ingest:

searchctl ingest
----------------

Find files on the filesystem and ingest them into the search database.

To ingest a single file, use :ref:`searchctl_add_file` instead.

Ingest a directory
^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl ingest /mmfs1/data/sample_data/cats

Re-ingest the whole filesystem with a newly installed plugin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl ingest mmfs1 --plugins colours::ColoursPlugin

Regenerate proxies for mov files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl ingest mmfs1 --include "*.mov" --plugins @proxy

.. _searchctl_expunge:

searchctl expunge
-----------------

Find files on the filesystem and remove them from the search database.

This will also remove any proxies associated with the removed entries [1]_.

.. note::

   Since this command scans the filesystem to decide what to expunge from the database, it won't remove entries for any files which no longer exist on the filesystem.

   To remove deleted files from the database, use :ref:`searchctl_admin_cleandb` instead.

To expunge a single file, use :ref:`searchctl_remove_file` or :ref:`searchctl_admin_remove_id` instead.

Remove DS_Store files from the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl expunge mmfs1 --include .DS_Store

Make sure to add ``.DS_Store`` to any ingest excludes to prevent these files from being ingested again.

.. _searchctl_add_file:

searchctl add-file
------------------

Optimised ingest of individual files.

Unlike :ref:`searchctl_ingest`, this command doesn't scan the whole filesystem, so is much faster.

Regenerate proxies for a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl add-file /mmfs1/data/sample_data/cats/cats-01.jpg --plugins @proxy

.. _searchctl_remove_file:

searchctl remove-file
---------------------

Optimised removal of individual files. This will also remove any proxies associated with the removed file [1]_.

Unlike :ref:`searchctl_expunge`, this command doesn't scan the whole filesystem, so is much faster.

.. note::

   This command requires that the file being removed still exists on the filesystem.

   To remove deleted files from the database, you should use :ref:`searchctl_admin_cleandb` or :ref:`searchctl_admin_remove_id` instead.

Remove a file from the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl remove-file /mmfs1/data/copyrighted.mp4

apsearch-ingest
---------------

In addition to ``searchctl``, there is a standalone ``apsearch-ingest`` tool.

Unlike ``searchctl ingest``, which is intended for one-shot ingests, ``apsearch-ingest`` is intended for periodic, incremental ingests.

``apsearch-ingest`` is configuration-driven, and handles setting up the ingest environment, such as changing to a designated ingest user and work directory. By comparison, ``searchctl ingest`` runs as the user who invoked the command.

.. code-block:: console

   $ apsearch-ingest update

The default configuration for ``apsearch-ingest`` is under ``/opt/arcapix/etc/search/search.yaml``. In a PixStor system, this file is managed by PixStor, and should only be changed via the ``pixstor config`` command.

A different config can be passed to the command as ``apsearch-ingest update /path/to/config.yaml``.


Job Tools
=========

Long-running tasks, such as ingest, are treated as 'jobs'. These jobs can be interacted with using the following commands.

.. _searchctl_jobs:

searchctl jobs
--------------

List long-running search jobs. This includes :ref:`searchctl_ingest` and :ref:`searchctl_expunge`, as well as ``apsearch-ingest``.

The job listing includes each job's unique 'run id', which can be used to view the job logs or to stop the job early.

List active jobs
^^^^^^^^^^^^^^^^

By default, all jobs will be displayed, including those that have already finished running.

The following will show only jobs which are actively running:

.. code-block:: console

   $ searchctl jobs --active
   RUNID                             TASK    TARGET  STATUS   SINCE          USER
   ----------------------------------------------------------------------------------
   24a08635aae540d2a93d651ab31ae131  ingest  mmfs1   RUNNING  25 minute ago  apsearch

See the exact time when a job was started
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the default table output, the 'SINCE' column shows the start time if the job is still running, or else the time the job stopped running (completed, failed). This is displayed in a 'human readable' format.

To get the exact time that a job was started, refer to the JSON formatted output:

.. code-block:: console

   $ searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .started
   1591301522

This returns the time as a Unix timestamp. This can be reformatted using, e.g.

.. code-block:: console

   $ date -d @$(searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .started)
   Thu 4 Jun 21:12:02 BST 2020

The time at which the job ended is also provided in the JSON output as ``ended``.

Get the name of the screen session an ingest is running in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's typical for an ingest to be run in a screen session, so it can be left running in the background.

If you forget the name of the screen session, the following will tell you what it is:

.. code-block:: console

   $ searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .screen
   "1670352.ingest"

searchctl logs
--------------

Show log entries for one or more jobs.
The logs are displayed in a pager (``less``).

View the info level logs for an ingest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use :ref:`searchctl_jobs` to determine the unique 'run id' for a specific ingest.

.. code-block:: console

   $ searchctl logs 066b1a53a6fb4e96ac8631c57b5c3e12 --level info

.. note::

   An ingest must be run with ``APLOGLEVEL=info`` (or with a more verbose level) for info level messages to be recorded and viewable.

.. _searchctl_stop:

searchctl stop
--------------

Stop one or more search jobs.

Stop an ingest
^^^^^^^^^^^^^^

Use :ref:`searchctl_jobs` to determine the unique 'run id' for a specific ingest.

.. code-block:: console

   $ searchctl stop 066b1a53a6fb4e96ac8631c57b5c3e12

Stop all jobs running as root
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl stop --user root


Admin Tools
===========

Admin commands are meant for administrative users. They are nested under the ``admin`` subcommand.

.. code-block:: console

   $ searchctl admin --help
   usage: searchctl admin [-h] COMMAND ...

   positional arguments:
     COMMAND
       status         check the status of apsearch services
       auto-config    suggest configurations for ingest
       locate-proxy   find the path to a proxy for a given file
       verify-file    verify whether a single file has been ingested
       verify-ingest  verify which files have been ingested
       clean-proxies  clean up orphaned proxy files
       cleandb        remove items from db which don't exist on the filesystem
       remove-id      remove a single file from the index by id

   optional arguments:
     -h, --help       show this help message and exit

   Run 'searchctl admin COMMAND --help' for more information on a specific command.

.. _searchctl_admin_status:

searchctl admin status
----------------------

Check status of apsearch related services.

This can be used to identify the source of issues, for example if you are seeing 500 errors whilst browsing the PixStor Search UI.

.. code-block:: console

   $ searchctl admin status
   SERVICE              STATUS
   ===========================
   apsearch-middleware  OK
   nginx                OK
   elasticsearch        DOWN
   apcore-auth          OK
   condor               OK
   gpfs                 OK
   elastic-index        DOWN
   end-to-end           DOWN

searchctl admin auto-config
---------------------------

Suggest performant ingest settings.

.. note::

   Currently this only supports stat-only ingest.

Auto-configuration for apsearch-ingest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Copy the generated configs to ``search.yaml``.

.. code-block:: console

   $ searchctl admin auto-config mmfs1 --plugins @stat-only
   nodes:
   - pixstor-mn-001
   policy_options:
     dirThreadLevel: 4
     globalWorkDirectory: /mmfs1/.policytmp/
     iscanBuckets: 1
     iscanThreads: 4
     localWorkDirectory: /mmfs1/.policytmp/
     maxFiles: 1500
     threadLevel: 4

Auto-configuration for searchctl ingest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Format the suggested configurations as CLI flags that can be passed to :ref:`searchctl_ingest`.

.. code-block:: console

   $ searchctl admin auto-config mmfs1 --plugins @stat-only --cli
   -N pixstor-mn-001 --policy-options="-s /mmfs1/.policytmp/ -g /mmfs1/.policytmp/ -a 4 -A 1 -n 4 -m 4 -B 1500"

   $ searchctl ingest mmfs1 --plugins @stat-only -N pixstor-mn-001 \
       --policy-options="-s /mmfs1/.policytmp/ -g /mmfs1/.policytmp/ -a 4 -A 1 -n 4 -m 4 -B 1500"

.. _searchctl_admin_locate_proxy:

searchctl admin locate-proxy
----------------------------

Find the path to a proxy for a given file.

This is useful for debugging - to check whether the proxy has been generated, what its permissions are, etc.

Find the thumbnail for an image file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl admin locate-proxy /mmfs1/data/sample_data/cats/cats-01.jpg image.thumbnail
   /mmfs1/apsearch/proxies/044/482/549/4448254956900308779.png

Check if a preview video was generated for a mov file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl admin locate-proxy /mmfs1/data/sample_data/sample.mov video.preview
   MissingField: 'video.preview'

   $ echo $?
   1

The above may indicate that the asynchronous job generating the video preview is still running or has failed. This can be confirmed by checking ``condor_q``.

.. note::

   This command only reports on the contents of the search database. The returned path may not exist on the filesystem - e.g. it may have been deleted.

searchctl admin verify-file
---------------------------

Verify whether a single file has been ingested.

.. code-block:: console

   $ searchctl admin verify-file /mmfs1/data/example.mov --plugins @proxy
   Status             INCOMPLETE
   Last Ingested      2021-09-18 10:23:03
   Modification Time  2021-09-17 23:19:31

   === PLUGINS ==========================================
   default::DefaultPlugin         2021-09-18 10:23:03
   videopreview::VideoPreview     NOT INGESTED
   videopreview::VideoThumbnail  *2021-09-17 13:53:20

.. _searchctl_admin_verify_ingest:

searchctl admin verify-ingest
-----------------------------

Verify which files were successfully ingested.

Summary report
^^^^^^^^^^^^^^

By default, the command will output a summary of the ingest status of files.

.. code-block:: console

   $ searchctl admin verify-ingest mmfs1
   187 files scanned (2GB)
   29 ingested for all plugins (2GB)
   90 ingested with some plugins missing (71MB)
   68 not ingested (9KB)

Migrate ingested files with ngmigrate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``verify-ingest`` can generate lists of all files that are fully ingested. Those lists can then be passed to ngenea to migrate those files to offline storage.

.. note::

   Requires ngenea 1.9 or newer.

Ngenea accepts newline-terminated lists, but only if the listed paths don't contain newline characters. Therefore, generating null-terminated lists is safest.

.. code-block:: console

   # generate null-terminated lists
   $ searchctl admin verify-ingest mmfs1 --write-ingested --list-directory /mmfs1/apsearch --null-terminated

   # merge generated lists into a single file
   $ /usr/bin/cat /mmfs1/apsearch/ingested/* > /mmfs1/apsearch/tomigrate.list

   # migrate to offline storage
   $ ngmigrate -f /mmfs1/apsearch/tomigrate.list --filelist-format NUL

If APBackup is being run on the cluster, some files may have already been pre-migrated. To ensure that any migrations from the generated lists don't conflict with APBackup's operation, add ``--with-xattr user.APXstier ARCHIVED`` to generate lists containing only files which have already been processed by APBackup.

.. _searchctl_admin_clean_proxies:

searchctl admin clean-proxies
-----------------------------

Clean up orphaned proxy files.

This may be necessary if files were removed from the database with the 'retain proxies' setting enabled, or if the underlying elasticsearch database was manually altered or dropped.

.. code-block:: console

   $ searchctl admin clean-proxies

Running this command periodically may be necessary for space saving or for compliance, e.g. ensuring material which has a copyright claim against it is removed.

.. _searchctl_admin_cleandb:

searchctl admin cleandb
-----------------------

Clean up db entries for non-existent files.

Under normal operation, if a file was deleted from the filesystem, an incremental ingest with the 'prune directory' plugin will detect that the file was deleted and remove the database entry.

If an entry remains for a file that no longer exists, it cannot be removed with :ref:`searchctl_expunge` or :ref:`searchctl_remove_file`, since both require the file to be present on the filesystem.

.. code-block:: console

   $ searchctl admin cleandb mmfs1

.. _searchctl_admin_remove_id:

searchctl admin remove-id
-------------------------

Remove a single file entry from the database by id. This will also remove any proxies associated with the removed file [1]_.

Unlike :ref:`searchctl_remove_file`, the file doesn't need to exist on the filesystem to be removed.

.. code-block:: console

   $ searchctl admin remove-id 3629588116342303481

.. hint::

   The id for a given path can be found using ``pxs_file_list``

   .. code-block:: console

      $ pxs_file_list -p /mmfs1/data/sample_data/cats/cats-01.jpg -F _id | cut -d, -f2
      3629588116342303481


Plugin Tools
============

Plugin commands are used for examining plugins. These may be useful to plugin developers. They are nested under the ``plugins`` subcommand.

.. code-block:: console

   $ searchctl plugins --help
   usage: searchctl plugins [-h] COMMAND ...

   positional arguments:
     COMMAND
       list       list installed plugins
       check      check a plugin for potential issues
       benchmark  benchmark a plugin against a test file

   optional arguments:
     -h, --help   show this help message and exit

   Run 'searchctl plugins COMMAND --help' for more information on a specific command.

searchctl plugins list
----------------------

List or view details about the currently enabled plugins.

List enabled plugins
^^^^^^^^^^^^^^^^^^^^

This is the most reliable way to see which plugins are currently enabled. If the output doesn't match what you expect, you may need to reapply salt state.

.. code-block:: console

   $ searchctl plugins list
   location::LocationPlugin
   videopreview::VideoThumbnail
   imagepreview::CoreImageThumbnail
   image::ImagePlugin
   sha512hash::Sha512HashOfflinePlugin
   video::VideoImagePlugin
   sha512hash::Sha512HashPlugin
   prunedirectory::PruneDirectoryPlugin
   desktopconnector::DesktopConnectorFilePlugin
   stat::StatPlugin
   gpfs::GPFSPolicyPlugin
   camera::CameraPlugin
   common_attributes::CommonAttributesPlugin
   psd_exr_preview::PSDEXRthumbnail
   photoshop::PhotoshopMetaDataPlugin
   video::VideoPlugin
   dpx::DpxPlugin
   desktopconnector::DesktopConnectorDirPlugin
   videopreview::VideoPreview

Test a set of plugin filters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Prior to running an ingest with plugin filters, it's a good idea to check which plugins will be selected.

.. code-block:: console

   $ searchctl plugins list image --exclude @proxy
   photoshop::PhotoshopMetaDataPlugin
   image::ImagePlugin
   allblack::AllBlackImagePlugin
   video::VideoImagePlugin

   $ searchctl ingest mmfs1 --plugins image --exclude-plugins @proxy

See more details about a plugin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl plugins list imagepreview::CoreImageThumbnail --long
   name: imagepreview::CoreImageThumbnail
   description: General purpose thumbnail and preview generator for image files.
   module: imagepreview::
   class: CoreImageThumbnail
   namespace: image
   priority: 0
   groups:
   - @core
   - @all
   - @sync
   - @proxy
   - @offline-unsafe
   - @non-lab

.. _searchctl_plugins_check:

searchctl plugins check
-----------------------

Check a plugin for potential issues.

This can also be used for debugging why a plugin generated unexpected or no metadata for a given file.

For an example of 'check-driven' plugin development, see :doc:`plugin_development_walkthru`.

Check a plugin for potential issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/cats/cats-01.jpg
   ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: color_space
     ❗ no metadata extracted
   ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: creationtime
     ❗ no metadata extracted
   ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: icc_profile
     ❗ no metadata extracted
   ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: rendering_intent
     ❗ no metadata extracted

The above indicates that some of the metadata fields defined in the ``ImagePlugin`` schema could not be extracted for the given test file.

Messages like this might be because the test file doesn't provide that metadata, or because the plugin has some bug which means those fields aren't being properly extracted.
To confirm, we can check the plugin against a wider variety of test files.

.. code-block:: console

   $ searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/*.jpg

See what metadata the plugin extracts from a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To view the actual extracted metadata, set the logging level to ``notify`` (or more verbose).

.. code-block:: console

   $ APLOGLEVEL=notify searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/cats/cats-01.jpg
   ...
   NOTIFY:arcapix.search.metadata.plugins.validation:Extracted metadata: {
       "bitdepth": 8,
       "orientation": "Horizontal (normal)",
       "megapixels": 0.563,
       "height": 563,
       "width": 1000,
       "aspect_ratio": 1.7761989342806395,
       "resolution": 72.0
   }
   ...

Check proxies that a plugin generates for a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As with the above, set the logging level to ``notify``, and run with ``--keep-proxies``.

.. code-block:: console

   $ APLOGLEVEL=notify searchctl plugins check imagepreview::CoreImageThumbnail /mmfs1/data/sample_data/cats/cats-01.jpg --keep-proxies
   ...
   NOTIFY:arcapix.search.metadata.plugins.validation:Generated proxies: [
       {
           "proxy_path": "/mmfs1/apsearch/proxies/.proxytmp/tmpbw1ScZ.png",
           "mimetype": "image/png",
           "filename": "preview.png",
           "typeidentifier": "preview"
       },
       {
           "proxy_path": "/mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png",
           "mimetype": "image/png",
           "filename": "thumb.png",
           "typeidentifier": "thumbnail"
       }
   ]
   ✔️ No issues found!

   $ file /mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png
   /mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png: PNG image data, 150 x 150, 8-bit/color RGBA, non-interlaced

.. _searchctl_plugins_benchmark:

searchctl plugins benchmark
---------------------------

Benchmark a plugin against one or more test files.

This is useful for approximating how much longer an ingest might take if the plugin is enabled, or conversely, how much ingest time will be saved by disabling the plugin.

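
As a rough illustration of that kind of estimate, the sketch below multiplies an average per-file time by a file count. The helper name ``estimate_extra_seconds`` and the file count are hypothetical and not part of ``searchctl``; the average would come from a benchmark run. It assumes files are processed one at a time, so an ingest running in parallel would likely finish sooner.

.. code-block:: bash

   # Hypothetical helper, not part of searchctl.
   # $1 = number of files the plugin would process (an assumed figure)
   # $2 = average ms per call, as reported by 'searchctl plugins benchmark'
   estimate_extra_seconds() {
       awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", n * ms / 1000 }'
   }

   # e.g. 100000 files at an average of 57.220 ms each
   estimate_extra_seconds 100000 57.220

The result is in seconds and is only an approximation; actual ingest time also depends on file sizes, parallelism, and system load.
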

Benchmark thumbnail generation for a directory of jpegs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ searchctl plugins benchmark imagepreview::CoreImageThumbnail /mmfs1/data/sample_data/cats/*.jpg
   /mmfs1/data/sample_data/cats/cats-1.jpg: 33.567 ms per call (10 calls)
   /mmfs1/data/sample_data/cats/cats-2.jpg: 77.257 ms per call (10 calls)
   /mmfs1/data/sample_data/cats/cats-3.jpg: 32.379 ms per call (10 calls)
   /mmfs1/data/sample_data/cats/cats-4.jpg: 33.276 ms per call (10 calls)
   /mmfs1/data/sample_data/cats/cats-5.jpg: 77.373 ms per call (10 calls)

   Average: 57.220 ms +- 24.333 ms
   Range: 32.379 ms +- 77.373 ms


Deprecated Tools
================

``searchctl`` replaces various previously existing commands. These old commands are considered deprecated, and will be removed in a future release.

The following table shows which ``searchctl`` command should be used in place of each deprecated tool.

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Deprecated command
     - Searchctl replacement
   * - finder add
     - :ref:`searchctl_ingest`
   * - finder stop
     - :ref:`searchctl_stop`
   * - clean_proxies
     - :ref:`searchctl_admin_clean_proxies`
   * - cleandb
     - :ref:`searchctl_admin_cleandb`
   * - find_proxy
     - :ref:`searchctl_admin_locate_proxy`
   * - profile_plugin
     - :ref:`searchctl_plugins_benchmark`
   * - validate_plugin
     - :ref:`searchctl_plugins_check`

Additionally, ``finder update`` is deprecated in favour of ``apsearch-ingest update``.

.. rubric:: Footnotes

.. [1] unless PixStor Search is configured to retain proxies