The ingest phase scans the filesystem and builds a number of sets of files ("pools") to be ingested in parallel. This process is performed by the broker.
For each pool, the broker examines each file in turn. Core metadata is extracted, and then plugins are offered access to the file to update the metadata and generate proxies.
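As an illustration of the scanning step, the following is a minimal sketch of how a list of files could be split into pools for parallel ingest. It is not the broker's actual implementation: the function names `scan` and `build_pools`, and the round-robin distribution, are assumptions made for this example only.

```python
import os
from typing import Iterable, List


def scan(root: str) -> List[str]:
    """Walk the tree below root and return the full path of every file found."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found


def build_pools(paths: Iterable[str], pool_count: int) -> List[List[str]]:
    """Distribute paths round-robin into pool_count lists (hypothetical strategy)."""
    if pool_count < 1:
        raise ValueError("pool_count must be at least 1")
    pools: List[List[str]] = [[] for _ in range(pool_count)]
    # Sorting first makes the split deterministic for a given set of paths.
    for index, path in enumerate(sorted(paths)):
        pools[index % pool_count].append(path)
    return pools
```

Calling `build_pools(scan("/some/directory"), 4)` would give four lists of roughly equal length, each of which could then be examined file by file as described above.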
Plugins can declare a priority, to ensure they are processed in a particular order. In addition, they can indicate to the broker that they wish to execute code asynchronously; doing so improves performance during the ingest phase.
Finally, the plugin must define which sorts of files it is interested in. This is typically based on MIME type and/or file extension.
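The three declarations described above (priority, asynchronous execution, and file types of interest) can be sketched as follows. This is a self-contained illustration, not the arcapix plugin API: the `PluginSpec` class, its field names, the `plugins_for` function, and the convention that a lower priority number runs first are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PluginSpec:
    """Hypothetical description of what a plugin declares to the broker."""
    name: str
    priority: int = 0            # assumption: lower numbers run first
    is_async: bool = False       # plugin wishes to execute code asynchronously
    mime_prefixes: FrozenSet[str] = frozenset()
    extensions: FrozenSet[str] = frozenset()

    def handles(self, mime_type: str, extension: str) -> bool:
        """Return True if the file matches by extension or by MIME type prefix."""
        if extension.lower() in self.extensions:
            return True
        return any(mime_type.startswith(prefix) for prefix in self.mime_prefixes)


def plugins_for(plugins: Iterable[PluginSpec], mime_type: str,
                extension: str) -> List[PluginSpec]:
    """Select the interested plugins and order them by declared priority."""
    interested = [p for p in plugins if p.handles(mime_type, extension)]
    return sorted(interested, key=lambda p: p.priority)


registered = [
    PluginSpec("video_proxy", priority=20, is_async=True,
               mime_prefixes=frozenset({"video/"})),
    PluginSpec("image_meta", priority=10,
               mime_prefixes=frozenset({"image/"}),
               extensions=frozenset({".jpg"})),
    PluginSpec("image_thumb", priority=30,
               mime_prefixes=frozenset({"image/"})),
]
```

With these declarations, a JPEG file would be offered to `image_meta` and then `image_thumb`, while `video_proxy` would be skipped.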
Metadata plugins examine the contents of a file, and retrieve interesting attributes about it - for example, image widths and heights, word counts in documents, or GPS locations.
The plugin uses a helper object, defined in the arcapix libraries, to submit values for these attributes to the Database.
Each plugin must also define a ‘schema’, which indicates what items it intends to extract, the data types of those values, and in some cases, information about valid values, or range bounds.
A minimal schema is easy to define, but the more precise the schema, the better the performance and query experience of the resulting database.
For example, it would be possible to declare all values as strings, but this would make range-based searches impossible.
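To make the point about typing concrete, here is a small sketch of a schema with data types, a range bound and a set of valid values, together with a check of submitted values against it. The dictionary layout and the `validate` function are invented for this example and do not reflect the real schema format used by the arcapix libraries.

```python
from typing import Any, Dict, List

# Hypothetical schema: each item declares a type and, optionally, constraints.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "width": {"type": int, "min": 1},
    "height": {"type": int, "min": 1},
    "format": {"type": str, "allowed": {"jpeg", "png", "tiff"}},
}


def validate(schema: Dict[str, Dict[str, Any]], record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in record; an empty list means it is valid."""
    errors = []
    for key, value in record.items():
        rule = schema.get(key)
        if rule is None:
            errors.append(f"{key}: not declared in schema")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{key}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{key}: below minimum {rule['min']}")
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{key}: not one of the allowed values")
    return errors


# Strings compare character by character, so a range test on string widths
# gives the wrong answer, whereas the same test on integers is correct.
string_comparison = "900" > "1000"   # True: "9" sorts after "1"
integer_comparison = 900 > 1000      # False
```

The last two lines show why declaring everything as a string defeats range-based searches: a query for widths greater than 1000 would wrongly match a width of 900.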
Proxy generation plugins
Proxy generation plugins are usually called after metadata extraction has completed (though note the impact of the asynchronous capability). They produce a file or files generated from the original. These will typically be a smaller version of the original (e.g. a thumbnail), or some visualisation of the data, meant for consumption via user interfaces. However, proxies need not be ‘browseable’.
In particular, there should be plugins which produce special proxies called <namespace>.thumbnail (e.g. image.thumbnail, video.thumbnail), which can be used to represent files in “grid views”. Whilst such a proxy is not strictly required, it is expected.
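The naming convention can be illustrated with a short sketch. It assumes, purely for the example, that the proxies generated for a file are available as a dictionary mapping proxy name to file path; the functions `thumbnail_proxy_name` and `find_thumbnail` are hypothetical helpers, not part of the arcapix libraries.

```python
from typing import Dict, Optional


def thumbnail_proxy_name(namespace: str) -> str:
    """Build a proxy name of the form <namespace>.thumbnail."""
    return f"{namespace}.thumbnail"


def find_thumbnail(proxies: Dict[str, str]) -> Optional[str]:
    """Return the path of the first thumbnail proxy by name order, or None."""
    for name in sorted(proxies):
        if name.endswith(".thumbnail"):
            return proxies[name]
    return None
```

A grid view could call `find_thumbnail` for each file and fall back to a generic icon when it returns `None`, which is one reason a thumbnail proxy is expected even though it is not mandatory.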
Plugins can also request that some of their processing is offloaded onto an asynchronous queue.
This is particularly helpful for very heavy processes such as video transcoding.
Support for this is built in, and easily accessed from the plugin using the