Querying the Database via the REST API

This section describes how to make requests to PixStor Search using the REST API.

The API is compliant with the Collection+JSON (C+J) specification. Note that it also uses the collection-level properties extension, and a minor extension that allows templated URLs.

This may differ a little from some other REST approaches, but it is more predictable and easier for machines to consume, since the schema is entirely consistent across any service which uses the C+J approach.

It should be viewed as somewhat like a website, with links that can be followed and forms that can be filled in, rather than as a simple fire-and-forget approach, with the strong caveat that it is designed for machine rather than human consumption.

In addition, it is possible to short-circuit the process, but this requires an understanding of the URL format, which consuming the C+J directly does not. The URL format is considered to be less canonical (i.e. more likely to change between versions).

Note

In order to reduce the documentation size: in all circumstances, success is indicated by a 200, 201, or 202 HTTP status code. A 500 HTTP status code can also be returned for a generic server error.

All methods are GET, unless otherwise specified.

Note

In order to reduce the documentation size: in all circumstances other than authorisation requests, the following HTTP headers must be included:

Accept: application/vnd.collection+json

Authorization: <bearer based token string>

Authentication

Before accessing the API, one must first authenticate using the RFC 6749 OAuth2 process for resources. Note that in this context the developer's application is the “Client”, and utilisation is typically via grant_type=password.

Arcapix supplies a basic example Authorisation Server which supports only a grant_type of password, and authenticates against the standard PAM system.

Note that the authentication server is a distinct service from the search server, and is accessed on a different port. It must in all cases be accessed via SSL. As standard, the search server runs on port 5000 and the auth server on port 5001. Depending on the client library/language you are using, you may need to accept self-signed certificates or install the Arcapix CA.

URL

https://authserver:5001/oAuth2/access_token

Method

POST

Request Parameters

Parameter    Description          Required
grant_type   Always “password”    Yes
username     username             Yes
password     password             Yes

Success Status Code

200

Payload

{
  "access_token": "feoF8MpdWqnUAI3FiMc9v6_PDspAfJPzc_-uwueC9I7IDkxz_hvYITVsNWZ5IOH19nfwIADhIpo9q_GDaCLyUGvA-_RUAEaPcurWFSTX5zClBGZ-I3n2WQbnvLVkvweVWGNilBTdNwdNndmNyqYI-lVt4RO1tIylV29mN7GQOMRXZAWKMXunc_0qpNpJy47M8tPZVVReXREnGd96SovGspKQ-AUAH1IcaD3mqlzrxiNg_j9cRP3KSdhSy_cHSuhN4QdX96jJ5TnsPPHXbFnK26k4jbBPb7sOx39LcXXOOuCjV_RioqaZHe_xt7l3tuuetxlNeU5PhgM2vJsWxBHQrJau9bG0pO24tkMEj5ByUBIH4EiXCyCtx9NbfpB_Hyu0KsHv8IFPcMAZlC7Ijcpg9g2zCa7iGIA_o-uYrHDzxg6sQPQVzgPmJuD1RkFVMXsbiwan7vFCFOscoeCKfcxHW8GTB9SFEZ3aErnGsHMgIRIvBbcH3nyIATcnaTVVZOKYP82851NJgHQUaCmZ1zDkjndbcmiAdvYnOh2EUVVlAoL0UiTLS4qh6EgEF4OIj3_blEn0iSzF5269tiDgaMYtf39839_2eN1zr9Td7BEs9srz5OWQm482Djz04LjL2veYhLOdxVaDYoiRYrvyeDblRPaMu4AWZmjlJEqtDSm664AARCAPIX",
  "expires_in": 86400,
  "token_type": "Bearer"
}

Note

Tokens by default expire after 24 hours. The sample auth server does not support token refresh - a new token must be requested. Tokens may not persist across server restarts, depending on the configuration of the server.

Error status code

400 - Bad request (NB. This is not terribly precise - 403 or 412 might be more useful but are not what’s specified in the RFC)
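
As a rough sketch of the token request described above, the following uses only the Python standard library. It assumes the parameters are sent form-encoded in the POST body, as RFC 6749 describes for the password grant. The function names are illustrative and are not part of any Arcapix library.

```python
import json
import time
import urllib.parse
import urllib.request

AUTH_URL = "https://authserver:5001/oAuth2/access_token"  # URL given in this section


def build_token_request(username, password, url=AUTH_URL):
    """Build (but do not send) the POST request for a password-grant token."""
    # Assumption: parameters travel form-encoded in the request body.
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
    }).encode("ascii")
    return urllib.request.Request(url, data=form, method="POST")


def parse_token_response(body, now=None):
    """Return (access_token, expiry_timestamp) from the JSON payload."""
    payload = json.loads(body)
    if payload.get("token_type", "").lower() != "bearer":
        raise ValueError("unexpected token_type: %r" % payload.get("token_type"))
    now = time.time() if now is None else now
    return payload["access_token"], now + payload["expires_in"]


def fetch_token(username, password, context=None):
    """Send the request; `context` is an optional ssl.SSLContext."""
    request = build_token_request(username, password)
    # A 400 response surfaces here as urllib.error.HTTPError.
    with urllib.request.urlopen(request, context=context) as response:
        return parse_token_response(response.read())
```

Since the sample auth server does not support refresh, the expiry timestamp returned here can be used to decide when to call fetch_token again. An ssl.SSLContext that trusts the Arcapix CA can be passed as context where a self-signed certificate is in use.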

Onward Usage

However it is achieved, the end point of a successful authentication is an access token. This must be passed to the search server via an appropriately encoded Authorization header.

NB. Most libraries will take care of the encoding, if you pass the access token as the username, and an empty password e.g.

import requests
requests.get("http://searchserver:5000/files", auth=requests.auth.HTTPBasicAuth(access_token, ''))
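
For clients without such a library, the header can be built by hand. This is a minimal sketch of what HTTP Basic encoding produces when the access token is the username and the password is empty; basic_auth_header is an illustrative name, and it assumes the search server accepts the token in this form, as the requests example implies.

```python
import base64


def basic_auth_header(access_token):
    """Return the Authorization header value for token-as-username Basic auth."""
    # HTTP Basic credentials are "username:password"; the password is empty here.
    credentials = (access_token + ":").encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


headers = {
    "Accept": "application/vnd.collection+json",
    "Authorization": basic_auth_header("abc"),  # "abc" stands in for a real token
}
```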

Billboard URL

The C+J exploration starts by retrieving the server’s root URL.

Provided the correct access token is passed via the standard Authorization header, we will receive a response containing a list of possible queries.

By filling in the parameters requested, one can craft a suitable query without knowing the URL structure.

URL

https://searchserver:5000/files

Response

See Example Responses

Considering a snippet of the response above:

{
  "data": [
    {
      "prompt": "Search string",
      "name": "where",
      "value": ""
    }
  ],
  "href": "/files?where={\"_all\":\"{where}\"}",
  "prompt": "Enter a string to search in all fields across all files",
  "rel": "search"
}

By replacing the {where} entries with values prompted for using the supplied prompts (Search string), a suitable query URL can be constructed - e.g. /files?where={"_all":"promptedvalue"}

A small command line tool might be written as follows:

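import requests
from requests.auth import HTTPBasicAuth
from urllib.parse import urljoin

base_url = "http://searchserver:5000/files"
auth = HTTPBasicAuth(access_token, '')  # access_token obtained as described under Authentication

r = requests.get(base_url, auth=auth)

# Use the first query template advertised by the billboard
query = r.json()['collection']['queries'][0]
href = query['href']

print(query['prompt'] + "\n")

# Prompt for each parameter and substitute it into the templated URL
for param in query['data']:
    href = href.replace("{" + param['name'] + "}", input(param['prompt'] + ":\n"))

# The href is relative to the server root, so resolve it against the base URL
results = requests.get(urljoin(base_url, href), auth=auth)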
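
The same substitution can also be done without prompting a user. The sketch below is one possible helper; fill_query_template is an illustrative name rather than part of the API, and JSON-escaping each value before substitution is an assumption about how the server expects quotes inside a value to be handled.

```python
import json


def fill_query_template(query, values):
    """Fill the {name} placeholders of a C+J query template.

    `query` is one entry of collection['queries']; `values` maps parameter
    names to the strings to substitute. Missing names fall back to the
    template's own default value.
    """
    href = query["href"]
    for param in query["data"]:
        name = param["name"]
        value = str(values.get(name, param.get("value", "")))
        # Assumption: the placeholder sits inside a JSON string, so escape
        # quotes and backslashes the way JSON requires.
        escaped = json.dumps(value)[1:-1]
        href = href.replace("{" + name + "}", escaped)
    return href


# The query snippet shown earlier in this section
search_query = {
    "data": [{"prompt": "Search string", "name": "where", "value": ""}],
    "href": '/files?where={"_all":"{where}"}',
    "prompt": "Enter a string to search in all fields across all files",
    "rel": "search",
}

url = fill_query_template(search_query, {"where": "promptedvalue"})
```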

Rich/Direct query

It is possible to directly query without going via the billboard URL, although this may mean your application needs updating should the URL format change.

URL

https://searchserver:5000/files

Request Parameters

Parameter     Description                    Required   Default
where         Clause to filter results by    Yes*       NA
sort          Key to sort by                 No         relevance
page          Desired page of results        No         first
projection    Fields to return               No         all properties
max_results   Number of results per page     No         25

*The where clause isn’t strictly required, but to reduce the chances of a malformed query overloading the server, no items are returned if you do not provide one.

where

Within the where clause, the format is where={"property1":"value1", "property2":"value2"}, which produces an “AND” search.

Multiple values (OR) can be passed with an array syntax: where={"property1":["value1","value2"]}

Ranges are also possible: where={"property1":{"gte":"value2"}}

Property names are either of the form <namespace>.<property> or the special magic _all, which searches in all properties.
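
Building the clause from a dictionary avoids hand-written quoting mistakes. The following is a small sketch using the standard json module; encode_where is an illustrative name, the property names are taken from examples elsewhere in this section, and the numeric range value is an assumption (the server may expect a string).

```python
import json


def encode_where(conditions):
    """Serialise a dict of conditions to the compact JSON used by `where`."""
    return json.dumps(conditions, separators=(",", ":"))


# AND: every property must match
and_clause = encode_where({"core.mimetype": "image/jpeg", "core.filename": "cats-22.jpg"})

# OR: any of the listed values may match
or_clause = encode_where({"core.mimetype": ["image/jpeg", "image/png"]})

# Range: numeric value assumed for illustration
range_clause = encode_where({"core.size": {"gte": 1048576}})

# Search across all properties
all_clause = encode_where({"_all": "jpg"})
```

The resulting strings still need percent-encoding before being placed in a URL; most HTTP libraries do this when the value is passed as a query parameter.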

sort

The sort parameter specifies a field to sort the data on, with a preceding - used to invert the sort order. Multiple comma-separated fields can be specified, e.g. sort=-core.size,core.modificationtime

Note

By default, items are returned in “relevance” order. Unless the filter is very precise, many matches are likely, and sorting them is unlikely to be useful as well as incurring a performance hit.

projection

It is technically possible to request only a subset of the properties for items to be returned - if one knew for example that a particular metadata field was very large (say 10K or more), it may make sense to not have it returned, to reduce both network utilisation and JSON parsing overhead.

The syntax is projection={"property1":0} to exclude a field.

Alternatively, you can specify projection={"property1":1} to return only that field.

page

The page property indicates where in a paged result set you wish to be. In essence, page*max_results is the index of the first result you want.

NB. Using the C+J ‘HATEOAS’ links means you don’t need to do computations to provide “previous”, “next”, “last” type functionality - the required URLs are given to you.
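
Following such a link might look like the sketch below. It relies on the standard C+J layout, in which the collection carries a links array of objects with rel and href keys. The rel value "next", the sample data, and the find_link name are assumptions for illustration; check an actual response for the rel values the server emits.

```python
def find_link(collection, rel):
    """Return the href of the first collection-level link with this rel, or None."""
    for link in collection.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Hypothetical collection fragment, for illustration only
sample_collection = {
    "links": [
        {"rel": "next", "href": "/files?where={\"_all\":\"jpg\"}&page=2"},
    ]
}

next_href = find_link(sample_collection, "next")
last_href = find_link(sample_collection, "last")  # None: no such link in the sample
```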

max_results

The maximum number of results to return. This has a default of 25 and an absolute maximum of 1000. Smaller pages give faster results.

Payload

(See typical response below)

Error status codes

403 - Forbidden - most likely incorrect access token

Example request

GET http://searchserver:5000/files?where={"_all":"jpg"}&sort=core.pathname&projection={"core.size":0}&page=1&max_results=20 HTTP/1.1
Accept: application/vnd.collection+json
Authorization: <bearer based token string>
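
A request URL like the one above can be assembled programmatically. This is a minimal sketch using only the standard library; build_search_url is an illustrative name, and it assumes the server accepts percent-encoded parameter values, which is what urlencode produces.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, where, sort=None, projection=None,
                     page=None, max_results=None):
    """Build a direct query URL from the parameters described in this section."""
    params = {"where": json.dumps(where, separators=(",", ":"))}
    if sort:
        # e.g. ["-core.size", "core.modificationtime"]
        params["sort"] = ",".join(sort)
    if projection is not None:
        params["projection"] = json.dumps(projection, separators=(",", ":"))
    if page is not None:
        params["page"] = page
    if max_results is not None:
        params["max_results"] = max_results
    # urlencode percent-encodes the JSON so the URL is safe to send as-is
    return base_url + "?" + urlencode(params)


example_url = build_search_url(
    "http://searchserver:5000/files",
    where={"_all": "jpg"},
    sort=["core.pathname"],
    projection={"core.size": 0},
    page=1,
    max_results=20,
)
```

Because the URL format is more likely to change between versions than the C+J templates, keeping this logic in a single function limits what needs updating.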

Typical query response

The response (in C+J format) will contain 4 major sections

(For a full example, see Example Responses)

Items

This is a list of matches, typically the first 25. Each nested item will contain a “data” key, which in turn is a list of triples for the properties name, value, and prompt.

{
  "items": [
    {
      "href": "/files/3735374022151170231",
      "data": [
        {
          "prompt": "File basename (string)",
          "name": "core.filename",
          "value": "cats-22.jpg"
        },
        {
          "prompt": "File mime-type (string)",
          "name": "core.mimetype",
          "value": "image/jpeg"
        }
      ]
    }
  ]
}

Properties provided are

name - name of the field property
value - field value
prompt - human readable description of the field

The href attribute gives a direct link to this item, which will return this item, and only this item, with all properties returned. Thus, detail views can be built when used with projections.
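
Client code usually wants each item as a plain mapping rather than a list of triples. A minimal sketch of that conversion follows; item_to_dict is an illustrative name, and the sample item is the one shown in this section.

```python
def item_to_dict(item):
    """Flatten a C+J item's data triples into {name: value}, dropping prompts."""
    return {entry["name"]: entry["value"] for entry in item.get("data", [])}


sample_item = {
    "href": "/files/3735374022151170231",
    "data": [
        {"prompt": "File basename (string)", "name": "core.filename", "value": "cats-22.jpg"},
        {"prompt": "File mime-type (string)", "name": "core.mimetype", "value": "image/jpeg"},
    ],
}

record = item_to_dict(sample_item)
```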

Collection properties

These provide a list of data items indicating the total number of hits.

{
  "properties": [
    {
      "prompt": "Number of matching documents",
      "name": "hits",
      "value": 73
    }
  ]
}
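
Reading the hit count could be sketched as below. collection_property is an illustrative name, and the function expects the object that holds the properties list; in a full response that is assumed to be the collection object.

```python
def collection_property(collection, name, default=None):
    """Return the value of a collection-level property, or `default` if absent."""
    for prop in collection.get("properties", []):
        if prop.get("name") == name:
            return prop.get("value")
    return default


sample = {
    "properties": [
        {"prompt": "Number of matching documents", "name": "hits", "value": 73}
    ]
}

hits = collection_property(sample, "hits", default=0)
```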