##################
Command Line Tools
##################

The main command line tool for PixStor Search is ``searchctl``.

This provides sub-commands for performing various operations, such as ingest, plugin development, and administration.

For more details, see ``man searchctl``.

Additional client-side tools for PixStor Search are available in the ``arcapix-search-client-utils`` package.

.. contents::
    :local:

Ingest Tools
============

.. _searchctl_ingest:

searchctl ingest
----------------

Find files on the filesystem and ingest them into the search database.

To ingest a single file, use :ref:`searchctl_add_file` instead.

Ingest a directory
^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl ingest /mmfs1/data/sample_data/cats


Re-ingest the whole filesystem with a newly installed plugin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl ingest mmfs1 --plugins colours::ColoursPlugin


Regenerate proxies for mov files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl ingest mmfs1 --include "*.mov" --plugins @proxy

.. _searchctl_expunge:

searchctl expunge
-----------------

Find files on the filesystem and remove them from the search database.

This will also remove any proxies associated with the removed entries [1]_.

.. note::

    Since this command scans the filesystem to decide what to expunge from the database,
    it won't remove entries for any files which no longer exist on the filesystem.

    To remove deleted files from the database, use :ref:`searchctl_admin_cleandb` instead.

To expunge a single file, use :ref:`searchctl_remove_file` or :ref:`searchctl_admin_remove_id` instead.

Remove DS_Store files from the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl expunge mmfs1 --include .DS_Store

Make sure to add ``.DS_Store`` to any ingest excludes to prevent them from being ingested again.

.. _searchctl_add_file:

searchctl add-file
------------------

Optimised ingest of individual files.

Unlike :ref:`searchctl_ingest`, this command doesn't scan the whole filesystem, so it is much faster.

Regenerate proxies for a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl add-file /mmfs1/data/sample_data/cats/cats-01.jpg --plugins @proxy

.. _searchctl_remove_file:

searchctl remove-file
---------------------

Optimised removal of individual files.

This will also remove any proxies associated with the removed file [1]_.

Unlike :ref:`searchctl_expunge`, this command doesn't scan the whole filesystem, so it is much faster.

.. note::

    This command requires that the file being removed still exists on the filesystem.

    To remove deleted files from the database, you should use
    :ref:`searchctl_admin_cleandb` or :ref:`searchctl_admin_remove_id` instead.

Remove a file from the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl remove-file /mmfs1/data/copyrighted.mp4

apsearch-ingest
---------------

In addition to ``searchctl``, there is a standalone ``apsearch-ingest`` tool.

Unlike ``searchctl ingest``, which is intended for one-shot ingests, ``apsearch-ingest`` is intended for periodic, incremental ingests.

``apsearch-ingest`` is configuration-driven, and handles setting up the ingest environment,
such as changing to a designated ingest user and work directory. By comparison, ``searchctl ingest`` runs as the user who invoked the command.

.. code-block:: console

    $ apsearch-ingest update

The default configuration for ``apsearch-ingest`` is under ``/opt/arcapix/etc/search/search.yaml``.
In a PixStor system, this file is managed by PixStor, and should only be changed via the ``pixstor config`` command.

A different config file can be passed to the command, for example ``apsearch-ingest update /path/to/config.yaml``.

Job Tools
=========

Long-running tasks, such as ingest, are treated as 'jobs'. These jobs can be managed using the following commands.

.. _searchctl_jobs:

searchctl jobs
--------------

List long-running search jobs.

This includes :ref:`searchctl_ingest` and :ref:`searchctl_expunge`, as well as ``apsearch-ingest``.

The job listing includes each job's unique 'run id', which can be used to view the job logs or to stop the job early.

List active jobs
^^^^^^^^^^^^^^^^

By default, all jobs are displayed, including those that have already finished running.
The following shows only jobs which are actively running:

.. code-block:: console

    $ searchctl jobs --active

    RUNID                             TASK    TARGET  STATUS   SINCE          USER
    ----------------------------------------------------------------------------------
    24a08635aae540d2a93d651ab31ae131  ingest  mmfs1   RUNNING  25 minute ago  apsearch
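
The run ids in this listing can be picked out for use with other commands.
The following is a minimal sketch, assuming the table layout shown above (run id in the first column, task in the second).
``active_run_ids`` is a hypothetical helper name, not part of ``searchctl``.

.. code-block:: bash

    # Hypothetical helper: read a job table on stdin and print the run id of
    # every row whose TASK column matches the given task name.
    # Assumes the table layout shown above.
    active_run_ids() {
        awk -v task="$1" '
            # a run id is a 32 character lowercase hex string in column one,
            # which skips the header and separator rows
            length($1) == 32 && $1 ~ /^[0-9a-f]+$/ && $2 == task { print $1 }
        '
    }

For example, ``searchctl jobs --active | active_run_ids ingest`` would print the run ids of the active ingest jobs,
which could then be passed to :ref:`searchctl_stop`.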


See the exact time when a job was started
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the default table output, the 'SINCE' column shows the start time if the job is still running,
or otherwise the time at which the job stopped running (completed or failed). This is displayed in a 'human readable' format.

To get the exact time that a job was started, refer to the JSON-formatted output:

.. code-block:: console

    $ searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .started
    1591301522

This returns the time as a Unix timestamp, which can be reformatted using, for example, ``date``:

.. code-block:: console

    $ date -d @$(searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .started)
    Thu  4 Jun 21:12:02 BST 2020

The time at which the job ended is also provided in the JSON output, as ``ended``.
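
With both timestamps available, the run time of a finished job can be worked out with shell arithmetic.
The following is a minimal sketch: ``job_duration`` is a hypothetical helper, and the end timestamp in the usage line is invented for illustration.

.. code-block:: bash

    # Hypothetical helper: print the elapsed time between two unix timestamps
    # as HH:MM:SS. Arguments: start timestamp, end timestamp.
    job_duration() {
        local secs=$(( $2 - $1 ))
        printf '%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
    }

    # the start time is taken from the example above; the end time is made up
    job_duration 1591301522 1591305182   # prints 01:01:00

In practice, the two arguments would be read from the ``started`` and ``ended`` fields of the JSON output, in the same way as shown above.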

Get the name of the screen session an ingest is running in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's typical for an ingest to be run in a ``screen`` session, so that it can be left running in the background.
If you forget the name of the screen session, the following will tell you what it is:

.. code-block:: console

    $ searchctl jobs 066b1a53a6fb4e96ac8631c57b5c3e12 --json | jq .screen
    "1670352.ingest"

searchctl logs
--------------

Show log entries for one or more jobs.

The logs are displayed in a pager (``less``).

View the info level logs for an ingest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use :ref:`searchctl_jobs` to determine the unique 'run id' for a specific ingest:

.. code-block:: console

    $ searchctl logs 066b1a53a6fb4e96ac8631c57b5c3e12 --level info

.. note::

    An ingest must be run with ``APLOGLEVEL=info`` (or with a more verbose level)
    for info level messages to be recorded and viewable.

.. _searchctl_stop:

searchctl stop
--------------

Stop one or more search jobs.

Stop an ingest
^^^^^^^^^^^^^^

Use :ref:`searchctl_jobs` to determine the unique 'run id' for a specific ingest:

.. code-block:: console

    $ searchctl stop 066b1a53a6fb4e96ac8631c57b5c3e12


Stop all jobs running as root
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl stop --user root


Admin Tools
===========

Admin commands are intended for administrative users. They are nested under the ``admin`` subcommand:

.. code-block:: console

    $ searchctl admin --help
    usage: searchctl admin [-h] COMMAND ...

    positional arguments:
    COMMAND
        status         check the status of apsearch services
        auto-config    suggest configurations for ingest
        locate-proxy   find the path to a proxy for a given file
        verify-file    verify whether a single file has been ingested
        verify-ingest  verify which files have been ingested
        clean-proxies  clean up orphaned proxy files
        cleandb        remove items from db which don't exist on the filesystem
        remove-id      remove a single file from the index by id

    optional arguments:
    -h, --help     show this help message and exit

    Run 'searchctl admin COMMAND --help' for more information on a specific command.

.. _searchctl_admin_status:

searchctl admin status
----------------------

Check status of apsearch related services.

This can be used to identify the source of issues,
for example if you are seeing 500 errors whilst browsing the PixStor Search UI.

.. code-block:: console

    $ searchctl admin status

    SERVICE              STATUS
    ===========================
    apsearch-middleware    OK
    nginx                  OK
    elasticsearch         DOWN
    apcore-auth            OK
    condor                 OK
    gpfs                   OK
    elastic-index         DOWN
    end-to-end            DOWN


searchctl admin auto-config
---------------------------

Suggest performant ingest settings.

.. note::

    Currently this only supports stat-only ingest

Auto-configuration for apsearch-ingest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Copy the generated configuration into ``search.yaml``:

.. code-block:: console

    $ searchctl admin auto-config mmfs1 --plugins @stat-only
    nodes:
      - pixstor-mn-001
    policy_options:
      dirThreadLevel: 4
      globalWorkDirectory: /mmfs1/.policytmp/
      iscanBuckets: 1
      iscanThreads: 4
      localWorkDirectory: /mmfs1/.policytmp/
      maxFiles: 1500
      threadLevel: 4

Auto-configuration for searchctl ingest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Format the suggested configuration as CLI flags that can be passed to :ref:`searchctl_ingest`:

.. code-block:: console

    $ searchctl admin auto-config mmfs1 --plugins @stat-only --cli
    -N pixstor-mn-001 --policy-options="-s /mmfs1/.policytmp/ -g /mmfs1/.policytmp/ -a 4 -A 1 -n 4 -m 4 -B 1500"

    $ searchctl ingest mmfs1 --plugins @stat-only -N pixstor-mn-001 \
        --policy-options="-s /mmfs1/.policytmp/ -g /mmfs1/.policytmp/ -a 4 -A 1 -n 4 -m 4 -B 1500"

.. _searchctl_admin_locate_proxy:

searchctl admin locate-proxy
----------------------------

Find the path to a proxy for a given file.

This is useful for debugging - to check whether the proxy has been generated, what its permissions are, etc.

Find the thumbnail for an image file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl admin locate-proxy /mmfs1/data/sample_data/cats/cats-01.jpg image.thumbnail
    /mmfs1/apsearch/proxies/044/482/549/4448254956900308779.png

Check if a preview video was generated for a mov file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl admin locate-proxy /mmfs1/data/sample_data/sample.mov video.preview
    MissingField: 'video.preview'

    $ echo $?
    1

The above may indicate that the asynchronous job generating the video preview is still running, or that it has failed.
This can be confirmed by checking ``condor_q``.

.. note::

    This command only reports on the contents of the search database.
    The returned path may not exist on the filesystem, e.g. it may have been deleted.
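
Both cases (no proxy recorded, and a proxy recorded but missing from the filesystem) can be told apart in a script.
The following is a minimal sketch, relying only on the non-zero exit status shown above; ``check_proxy`` and its messages are hypothetical.

.. code-block:: bash

    # Hypothetical helper: report whether a proxy is recorded in the database
    # and whether the recorded path is actually present on the filesystem.
    # Arguments: file path, proxy field (e.g. image.thumbnail)
    check_proxy() {
        local proxy
        # locate-proxy exits non-zero when the field is missing, as shown above
        if ! proxy=$(searchctl admin locate-proxy "$1" "$2" 2>/dev/null); then
            echo "not in database: $2"
            return 1
        fi
        # the database may still reference a proxy that has since been deleted
        if [ ! -e "$proxy" ]; then
            echo "recorded but missing on disk: $proxy"
            return 2
        fi
        echo "ok: $proxy"
    }

It would be called with the same arguments as ``locate-proxy``, e.g. ``check_proxy /mmfs1/data/sample_data/sample.mov video.preview``.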

searchctl admin verify-file
---------------------------

Verify whether a single file has been ingested.

.. code-block:: console

    $ searchctl admin verify-file /mmfs1/data/example.mov --plugins @proxy

    Status                                      INCOMPLETE

    Last Ingested                      2021-09-18 10:23:03
    Modification Time                  2021-09-17 23:19:31

    === PLUGINS ==========================================

    default::DefaultPlugin             2021-09-18 10:23:03
    videopreview::VideoPreview               NOT  INGESTED
    videopreview::VideoThumbnail      *2021-09-17 13:53:20

.. _searchctl_admin_verify_ingest:

searchctl admin verify-ingest
-----------------------------

Verify which files were successfully ingested.

Summary report
^^^^^^^^^^^^^^

By default, the command outputs a summary of the ingest status of files:

.. code-block:: console

    $ searchctl admin verify-ingest mmfs1

    187 files scanned (2GB)
    29 ingested for all plugins (2GB)
    90 ingested with some plugins missing (71MB)
    68 not ingested (9KB)
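
The summary can be reduced to a single coverage figure for monitoring.
The following is a minimal sketch, assuming the summary wording shown above; ``ingest_coverage`` is a hypothetical helper.

.. code-block:: bash

    # Hypothetical helper: read a verify-ingest summary on stdin and print the
    # percentage of scanned files that were ingested for all plugins.
    # Assumes the summary wording shown above.
    ingest_coverage() {
        awk '
            /files scanned/            { total = $1 }
            /ingested for all plugins/ { full = $1 }
            END { if (total > 0) printf "%.1f%%\n", 100 * full / total }
        '
    }

For example, ``searchctl admin verify-ingest mmfs1 | ingest_coverage``. For the summary above, 29 of 187 files gives ``15.5%``.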

Migrate ingested files with ngmigrate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``verify-ingest`` can generate lists of all files that are fully ingested.
Those lists can then be passed to ngenea to migrate those files to offline storage.

.. note::

    Requires ngenea 1.9 or newer.

Ngenea accepts newline-terminated lists, but only if the listed paths don't contain newline characters.
Generating null-terminated lists is therefore the safest option.

.. code-block:: console

    # generate null-terminated lists
    $ searchctl admin verify-ingest mmfs1 --write-ingested --list-directory /mmfs1/apsearch --null-terminated

    # merge generated lists into a single file
    $ /usr/bin/cat /mmfs1/apsearch/ingested/* > /mmfs1/apsearch/tomigrate.list

    # migrate to offline storage
    $ ngmigrate -f /mmfs1/apsearch/tomigrate.list --filelist-format NUL
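
Before migrating, it can be worth sanity-checking the merged list.
The following is a minimal sketch, assuming the list is null-terminated as generated above; ``count_nul_entries`` is a hypothetical helper.

.. code-block:: bash

    # Hypothetical helper: count the entries in a null-terminated file list.
    # Counting NUL bytes rather than lines stays correct even when a path
    # contains a newline character.
    count_nul_entries() {
        tr -cd '\0' < "$1" | wc -c | tr -d ' '
    }

For example, ``count_nul_entries /mmfs1/apsearch/tomigrate.list`` prints the number of paths in the merged list.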

If APBackup is being run on the cluster, some files may have already been pre-migrated.

To ensure that any migrations from the generated lists don't conflict with APBackup's operation,
add ``--with-xattr user.APXstier ARCHIVED`` when generating the lists,
so that they contain only files which have already been processed by APBackup.

.. _searchctl_admin_clean_proxies:

searchctl admin clean-proxies
-----------------------------

Clean up orphaned proxy files.

This may be necessary if files were removed from the database with the 'retain proxies' setting enabled,
or if the underlying elasticsearch database was manually altered or dropped.

.. code-block:: console

    $ searchctl admin clean-proxies

Running this command periodically may be necessary for space saving or for compliance,
e.g. ensuring that material which has a copyright claim against it is removed.

.. _searchctl_admin_cleandb:

searchctl admin cleandb
-----------------------

Clean up db entries for non-existent files.

Under normal operation, if a file was deleted from the filesystem,
an incremental ingest with the 'prune directory' plugin
will detect that the file was deleted and remove the database entry.

If such an entry is left behind, it cannot be removed with :ref:`searchctl_expunge` or :ref:`searchctl_remove_file`,
because the file no longer exists. Use ``cleandb`` to remove these entries.

.. code-block:: console

    $ searchctl admin cleandb mmfs1

.. _searchctl_admin_remove_id:

searchctl admin remove-id
-------------------------

Remove a single file entry from the database by id.

This will also remove any proxies associated with the removed file [1]_.

Unlike :ref:`searchctl_remove_file`, the file doesn't need to exist on the filesystem to be removed.

.. code-block:: console

    $ searchctl admin remove-id 3629588116342303481

.. hint::

    The id for a given path can be found using ``pxs_file_list``:

    .. code-block:: console

        $ pxs_file_list -p /mmfs1/data/sample_data/cats/cats-01.jpg -F _id | cut -d, -f2
        3629588116342303481

Plugin Tools
============

Plugin commands are used for examining plugins, and may be useful to plugin developers. They are nested under the ``plugins`` subcommand:

.. code-block:: console

    $ searchctl plugins --help
    usage: searchctl plugins [-h] COMMAND ...

    positional arguments:
    COMMAND
        list       list installed plugins
        check      check a plugin for potential issues
        benchmark  benchmark a plugin against a test file

    optional arguments:
    -h, --help  show this help message and exit

    Run 'searchctl plugins COMMAND --help' for more information on a specific command.

searchctl plugins list
----------------------

List or view details about the currently enabled plugins.

List enabled plugins
^^^^^^^^^^^^^^^^^^^^

This is the most reliable way to see which plugins are currently enabled.
If the output doesn't match what you expect, you may need to reapply salt state.

.. code-block:: console

    $ searchctl plugins list
    location::LocationPlugin
    videopreview::VideoThumbnail
    imagepreview::CoreImageThumbnail
    image::ImagePlugin
    sha512hash::Sha512HashOfflinePlugin
    video::VideoImagePlugin
    sha512hash::Sha512HashPlugin
    prunedirectory::PruneDirectoryPlugin
    desktopconnector::DesktopConnectorFilePlugin
    stat::StatPlugin
    gpfs::GPFSPolicyPlugin
    camera::CameraPlugin
    common_attributes::CommonAttributesPlugin
    psd_exr_preview::PSDEXRthumbnail
    photoshop::PhotoshopMetaDataPlugin
    video::VideoPlugin
    dpx::DpxPlugin
    desktopconnector::DesktopConnectorDirPlugin
    videopreview::VideoPreview


Test a set of plugin filters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Prior to running an ingest with plugin filters, it's a good idea to check which plugins will be selected:

.. code-block:: console

    $ searchctl plugins list image --exclude @proxy
    photoshop::PhotoshopMetaDataPlugin
    image::ImagePlugin
    allblack::AllBlackImagePlugin
    video::VideoImagePlugin

    $ searchctl ingest mmfs1 --plugins image --exclude-plugins @proxy


See more details about a plugin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl plugins list imagepreview::CoreImageThumbnail --long
    name: imagepreview::CoreImageThumbnail
    description: General purpose thumbnail and preview generator for image files.
    module: imagepreview::
    class: CoreImageThumbnail
    namespace: image
    priority: 0
    groups:
    - @core
    - @all
    - @sync
    - @proxy
    - @offline-unsafe
    - @non-lab

.. _searchctl_plugins_check:

searchctl plugins check
-----------------------

Check a plugin for potential issues.

This can also be used to debug why a plugin generated unexpected metadata, or none at all, for a given file.

For an example of 'check-driven' plugin development, see :doc:`plugin_development_walkthru`.

Check a plugin for potential issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/cats/cats-01.jpg
    ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: color_space
     ❗ no metadata extracted
    ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: creationtime
     ❗ no metadata extracted
    ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: icc_profile
     ❗ no metadata extracted
    ImagePlugin :: /mmfs1/data/sample_data/cats/cats-01.jpg :: metadata :: image :: rendering_intent
     ❗ no metadata extracted

The above indicates that some of the metadata fields defined in the ``ImagePlugin`` schema
could not be extracted for the given test file.
Messages like this might be because the test file doesn't provide that metadata,
or because the plugin has some bug which means those fields aren't being properly extracted.

To confirm, we can check the plugin against a wider variety of test files.

.. code-block:: console

    $ searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/*.jpg

See what metadata the plugin extracts from a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To view the actual extracted metadata, set the logging level to ``notify`` (or more verbose):

.. code-block:: console

    $ APLOGLEVEL=notify searchctl plugins check image::ImagePlugin /mmfs1/data/sample_data/cats/cats-01.jpg
    ...
    NOTIFY:arcapix.search.metadata.plugins.validation:Extracted metadata:
    {
        "bitdepth": 8,
        "orientation": "Horizontal (normal)",
        "megapixels": 0.563,
        "height": 563,
        "width": 1000,
        "aspect_ratio": 1.7761989342806395,
        "resolution": 72.0
    }
    ...

Check proxies that a plugin generates for a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As with the above, set the logging level to ``notify``, and run with ``--keep-proxies``:

.. code-block:: console

    $ APLOGLEVEL=notify searchctl plugins check imagepreview::CoreImageThumbnail /mmfs1/data/sample_data/cats/cats-01.jpg --keep-proxies
    ...
    NOTIFY:arcapix.search.metadata.plugins.validation:Generated proxies:
    [
        {
            "proxy_path": "/mmfs1/apsearch/proxies/.proxytmp/tmpbw1ScZ.png",
            "mimetype": "image/png",
            "filename": "preview.png",
            "typeidentifier": "preview"
        },
        {
            "proxy_path": "/mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png",
            "mimetype": "image/png",
            "filename": "thumb.png",
            "typeidentifier": "thumbnail"
        }
    ]
    ✔️ No issues found!

    $ file /mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png
    /mmfs1/apsearch/proxies/.proxytmp/tmpDkalQv.png: PNG image data, 150 x 150, 8-bit/color RGBA, non-interlaced

.. _searchctl_plugins_benchmark:

searchctl plugins benchmark
---------------------------

Benchmark a plugin against one or more test files.

This is useful for approximating how much longer an ingest might take if the plugin is enabled,
or conversely, how much ingest time will be saved by disabling the plugin.

Benchmark thumbnail generation for a directory of jpegs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ searchctl plugins benchmark imagepreview::CoreImageThumbnail /mmfs1/data/sample_data/cats/*.jpg
    /mmfs1/data/sample_data/cats/cats-1.jpg: 33.567 ms per call  (10 calls)
    /mmfs1/data/sample_data/cats/cats-2.jpg: 77.257 ms per call  (10 calls)
    /mmfs1/data/sample_data/cats/cats-3.jpg: 32.379 ms per call  (10 calls)
    /mmfs1/data/sample_data/cats/cats-4.jpg: 33.276 ms per call  (10 calls)
    /mmfs1/data/sample_data/cats/cats-5.jpg: 77.373 ms per call  (10 calls)
    Average:  57.220 ms +- 24.333 ms
    Range:    32.379 ms +- 77.373 ms
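
The average can be scaled up to a rough estimate of the extra ingest time for a given number of files.
The following is a minimal sketch, assuming files are processed one at a time and that the benchmarked files are representative;
``estimate_plugin_seconds`` is a hypothetical helper, and the file count is invented for illustration.

.. code-block:: bash

    # Hypothetical helper: estimate the extra ingest time, in seconds, from an
    # average per-call time in milliseconds and a file count.
    # Ignores parallelism, so treat the result as a rough guide only.
    estimate_plugin_seconds() {
        awk -v ms="$1" -v files="$2" 'BEGIN { printf "%.1f\n", ms * files / 1000 }'
    }

    # average from the benchmark above, applied to a hypothetical 100000 files
    estimate_plugin_seconds 57.220 100000   # prints 5722.0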


Deprecated Tools
================

``searchctl`` replaces various previously existing commands.
These old commands are considered deprecated, and will be removed in a future release.

The following table shows which ``searchctl`` command should be used in place of each deprecated tool:

.. list-table::
    :header-rows: 1
    :widths: 25 75

    * - Deprecated command
      - Searchctl replacement
    * - finder add
      - :ref:`searchctl_ingest`
    * - finder stop
      - :ref:`searchctl_stop`
    * - clean_proxies
      - :ref:`searchctl_admin_clean_proxies`
    * - cleandb
      - :ref:`searchctl_admin_cleandb`
    * - find_proxy
      - :ref:`searchctl_admin_locate_proxy`
    * - profile_plugin
      - :ref:`searchctl_plugins_benchmark`
    * - validate_plugin
      - :ref:`searchctl_plugins_check`


Additionally, ``finder update`` is deprecated in favour of ``apsearch-ingest update``.


.. rubric:: Footnotes

.. [1] unless PixStor Search is configured to retain proxies