Walkthrough - Creating a Space

Introduction

On this page, we’re going to walk through creating a templated space using the APMgmt REST API.

You can find a more thorough overview of the REST API, as well as examples of using it via cURL, on the REST API page.

For this guide, we’re going to work in Python, using the requests library.

We’re going to assume the REST server is running on localhost, behind an NGINX proxy, which provides SSL termination.

For convenience, let’s store the server URL in a variable:

url = "https://localhost"

Objective

We want to create a Space for a new project we’re working on, called ‘Sleepy Snake’.

Here, we’re assuming that the underlying filesystem is GPFS.

So, in GPFS terms, what we want to do is:

  • create an independent fileset
  • on the ‘mmfs1’ filesystem
  • we want the data to be placed in the ‘sas1’ pool
  • we want the fileset to have a particular project layout
  • we want the fileset to have a size (block quota) of 4GB

In APMgmt terms, this means creating a templated space.

Authentication

Before we can do anything else, we need to get an auth token.

The auth server URL will be configured in APConfig (see Configuration directives), so let’s look up that URL:

from arcapix.config import config

authserver = config['arcapix.auth.server.url']  # typically https://localhost

Now we can request an access token from the auth server:

import requests

payload = {'grant_type': 'password', 'username': 'myuser', 'password': 'mypassword'}
resp = requests.post(authserver + '/oauth2/token', data=payload)

assert resp.status_code == 200

token = resp.json()['access_token']
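For reference, passing a dict as data= makes requests send it form-encoded (application/x-www-form-urlencoded), which is the body format the token request above relies on. The standard library’s urlencode produces the same encoding, so a quick way to see exactly what gets sent (using the same placeholder credentials) is:

```python
from urllib.parse import urlencode

# The same placeholder credentials as in the token request above
payload = {'grant_type': 'password', 'username': 'myuser', 'password': 'mypassword'}

# Form-encode the payload, as requests does for data=<dict>
body = urlencode(payload)
print(body)
# grant_type=password&username=myuser&password=mypassword
```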

Tip

If the request raises SSLError("bad handshake ..."), it’s likely because of self-signed certificates. This can be resolved by adding verify=False to the request.

For more information, see SSL Cert Verification.

To make use of this access token, we have to use HTTP basic auth, with the token as the username and with an empty password.

For convenience, let’s create a requests session and apply the auth to it, so that we don’t have to explicitly pass auth to every future request:

session = requests.Session()
session.auth = (token, '')

# if you get SSLErrors from the self-signed certificates, add the following
# session.verify = False
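Under the hood, session.auth = (token, '') makes requests send a standard HTTP basic auth header: the base64 encoding of the token, a colon, and the (empty) password. The hypothetical helper below only illustrates that encoding with a dummy token 'abc'; you don’t need it when using a requests session.

```python
import base64


def basic_auth_header(token):
    """Authorization header value for basic auth with the token as the
    username and an empty password."""
    credentials = '%s:' % token  # "username:password" with an empty password
    encoded = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
    return 'Basic ' + encoded


print(basic_auth_header('abc'))  # Basic YWJjOg==
```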

From now on, we’ll assume that all our requests are successful, but in practice you should always check status codes.

Auth Roles

Before we try to create a space, we should make sure that we’re actually allowed to create a space.

Our user or group has a collection of authentication roles associated with it, and these roles determine which operations we’re allowed to perform on which endpoints.

We don’t have to check the actual roles, though. We can just do an OPTIONS request against the /spaces endpoint and check the Allow header:

resp = session.options(url + '/spaces/')

print(resp.headers['Allow'])
# HEAD, GET, POST, OPTIONS

Here, we can see that we do have permission to perform POST requests against the /spaces endpoint, so we can indeed create spaces!
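If you want to make this check in a script, a minimal sketch is to split the Allow header into a set of methods and look for POST. The helper names are hypothetical; with a live server you would pass resp.headers['Allow'] from the OPTIONS request.

```python
def allowed_methods(allow_header):
    """Split an HTTP Allow header into a set of upper-case method names."""
    return {m.strip().upper() for m in allow_header.split(',') if m.strip()}


def can_create(allow_header):
    """Creating an object means POSTing to the collection endpoint."""
    return 'POST' in allowed_methods(allow_header)


print(can_create('HEAD, GET, POST, OPTIONS'))  # True
print(can_create('HEAD, GET, OPTIONS'))        # False
```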

Note

Different endpoints may have different permissions - you may have permission to create spaces but not profiles, for example.

In general, it’s a good idea to check OPTIONS before trying to create an object.

Collection+JSON Template

To create a Space, we need to know what fields should be provided with our POST request.

Fortunately, APMgmt uses the Collection+JSON (C+J) format, which provides us with a template for creating new objects.

So let’s check the template from the /spaces endpoint:

resp = session.get(url + '/spaces/')

print(resp.json()['collection']['template'])
{
  "data": [
    {
      "prompt": "space name",
      "name": "name",
      "value": ""
    },
    {
      "prompt": "path of the space relative to its exposers",
      "name": "relativepath",
      "value": ""
    },
    {
      "prompt": "templates applied to the space",
      "name": "templates",
      "value": ""
    },
    {
      "prompt": "exposers providing access to the space",
      "name": "exposers",
      "value": ""
    },
    {
      "prompt": "profile applied to the space",
      "name": "profile",
      "value": ""
    },
    {
      "prompt": "hard limit on the size of the space in blocks",
      "name": "size",
      "value": ""
    }
  ]
}

So we need to provide a name, a relative path, templates, exposers, a profile, and a size.
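Since the template is just data, it can also be used to check our own input before POSTing. A sketch with a hypothetical missing_fields helper that lists the template fields we haven’t supplied values for yet; in real code cj_template would be resp.json()['collection']['template'].

```python
def missing_fields(cj_template, values):
    """Return the C+J template field names that have no entry in `values`."""
    field_names = [field['name'] for field in cj_template['data']]
    return [name for name in field_names if name not in values]


# Field names from the /spaces/ template shown above (prompts omitted)
spaces_template = {'data': [{'name': n, 'value': ''} for n in
                            ('name', 'relativepath', 'templates',
                             'exposers', 'profile', 'size')]}

print(missing_fields(spaces_template, {'name': 'sleepy-snake', 'size': 4294967296}))
# ['relativepath', 'templates', 'exposers', 'profile']
```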

Okay, so going back to the objective (above), let’s call the Space sleepy-snake, with relative path projects/sleepy_snake (that’s relative to the exposers, i.e. the filesystem). We’ll give it a size of 4GB.

Warning

Space names can’t contain whitespace; they can only contain letters, numbers, hyphens and underscores.

If you try to use a name containing ‘invalid’ characters, the POST request will return a 422 (Unprocessable Entity) error:

resp = session.post(url + '/spaces/', data={'name': 'sleepy snake', ...})
print(resp.json())
# {u'collection': {u'error': {u'message': u'Insertion failure: 1 document(s) contain(s) error(s)', u'code': 422, u'title': u'Error'}}}
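To save a round trip, you could also check names client-side before POSTing. A sketch based on the rule above, assuming ‘letters’ means ASCII letters (the server’s exact rule may differ); is_valid_space_name is a hypothetical helper.

```python
import re

# ASCII letters, digits, hyphens and underscores; at least one character
SPACE_NAME = re.compile(r'[A-Za-z0-9_-]+')


def is_valid_space_name(name):
    """Client-side check mirroring the naming rule described above."""
    return SPACE_NAME.fullmatch(name) is not None


print(is_valid_space_name('sleepy-snake'))  # True
print(is_valid_space_name('sleepy snake'))  # False
```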

But what about the Profile and the Exposers?

Finding where to put things

It’s not possible to create a filesystem or a pool via the REST API (yet).

However, when the REST server is started up, it is populated with objects based on what already exists - so we get an exposer for every filesystem, and a data store for every pool. In addition, a special placement policy rule is created for each pool, resulting in corresponding ‘default’ profiles.

So we need to look up the exposer and profile for our filesystem and placement pool of choice.

Pool

We want our data to be placed in pool sas1. As mentioned above, the database should have been pre-populated with a datastore for pool sas1, and with a profile to assign data to that datastore.

The naming scheme for the pre-populated default placement profiles is {filesystem}-{pool}, so in our case, we want to find the profile named mmfs1-sas1.
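If you are scripting this for several filesystems or pools, the naming scheme is easy to encode. A trivial sketch (the helper name is our own):

```python
def default_profile_name(filesystem, pool):
    """Name of a pre-populated default placement profile: {filesystem}-{pool}."""
    return '%s-%s' % (filesystem, pool)


print(default_profile_name('mmfs1', 'sas1'))  # mmfs1-sas1
```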

Tip

It is possible to create your own profile with additional ILM steps (migration rules), and with placement controls, such as matching certain file types or file size ranges.

However, that won’t be covered in this walkthrough.

So how do we find the profile with a particular name?

Collection+JSON Queries

Once again, C+J helps us out, this time by providing models for queries [1]:

resp = session.get(url + '/profiles/')

print(resp.json()['collection']['queries'])
[
  {
    "prompt": "Search by Name",
    "href": "/profiles?where={\"name\":\"{name}\"}",
    "data": [
      {
        "prompt": "profile name",
        "name": "name",
        "value": ""
      }
    ],
    "rel": "search",
    "encoding": "uri-template"
  },
  ...
]

This shows us how to construct a query to search for a profile by name. The href gives the template for the URL, and the data block tells us which parameter we need to replace in that href.

Here, we have to replace the {name} parameter with the actual name we want to search for, giving us:

"/profiles?where={\"name\":\"mmfs1-sas1\"}"

So let’s perform this query in Python. We’re using params for readability, but the result is the same:

params = {'where': '{"name": "mmfs1-sas1"}'}
resp = session.get(url + '/profiles/', params=params)

resp.headers['X-Total-Count']  # '1' (header values are strings)

print(resp.json())
{
  "collection": {
    "items": [
      {
        "href": "/profiles/8f01ab22-cc0b-2056-ff8c-e1d829dc806c",
        "data": [
          {
            "prompt": "current status of the item",
            "name": "status",
            "value": "ACTIVE"
          },
          {
            "prompt": "unique identifier for the item",
            "name": "id",
            "value": "8f01ab22-cc0b-2056-ff8c-e1d829dc806c"
          },
          {
            "prompt": "profile name",
            "name": "name",
            "value": "mmfs1-sas1"
          },
          ...
        ],
        "links": [...]
      }
    ],
    "href": "/profiles?where={\"name\": \"mmfs1-sas1\"}",
    "links": [...],
    "template": {...},
    "queries": [...],
    "version": "1.0",
  }
}

The full C+J response is quite long and unwieldy, so the above has been truncated.
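Rather than typing the expanded query href by hand, the substitution can be done from the queries block itself. A minimal sketch with a hypothetical expand_query helper; it only handles simple {param} placeholders like the one in the Search by Name query (not the full RFC 6570 URI Template syntax) and leaves percent-encoding to the HTTP library.

```python
def expand_query(href_template, values):
    """Fill {param} placeholders in a C+J query href with the given values."""
    href = href_template
    for name, value in values.items():
        href = href.replace('{%s}' % name, value)
    return href


# The "Search by Name" href from the queries block, after JSON decoding
search_by_name = '/profiles?where={"name":"{name}"}'
print(expand_query(search_by_name, {'name': 'mmfs1-sas1'}))
# /profiles?where={"name":"mmfs1-sas1"}
```

In a script, search_by_name would come from the href of the matching entry in resp.json()['collection']['queries'], and the expanded href would be appended to url for session.get.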

Tip

For a quick sanity check, we can look at the X-Total-Count header to see how many results the query returned. Profile names are unique, so there should be exactly one.

Referencing Items

When referencing an item in a POST request, we can provide either the item’s href or its id.

The href is preferred, since it uniquely identifies an item: items of different types can have the same id, whereas the href explicitly includes both the id AND the type of the item.

The href is also easier to extract from the C+J response. In the response above, we can see that our profile item is resp.json()['collection']['items'][0], and at the very top of that item is its href field.

So we can get the profile href from our query like so:

profile = resp.json()['collection']['items'][0]['href']
# "/profiles/8f01ab22-cc0b-2056-ff8c-e1d829dc806c"
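Indexing with [0] silently picks the first match, or fails with an IndexError if there are none. A slightly more defensive sketch, using a hypothetical single_item_href helper, checks that the query matched exactly one item first; with a real response you would pass resp.json().

```python
def single_item_href(collection):
    """Return the href of the only item in a C+J collection response.

    Raises ValueError if the query matched zero items or more than one.
    """
    items = collection['collection']['items']
    if len(items) != 1:
        raise ValueError('expected exactly 1 item, got %d' % len(items))
    return items[0]['href']


# Trimmed-down version of the profile query response shown above
response = {'collection': {'items': [
    {'href': '/profiles/8f01ab22-cc0b-2056-ff8c-e1d829dc806c', 'data': []}
]}}
profile_href = single_item_href(response)
print(profile_href)  # /profiles/8f01ab22-cc0b-2056-ff8c-e1d829dc806c
```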

Filesystem

We want to create our space in the mmfs1 filesystem, so we need to find the corresponding exposer.

Unlike profiles, there are several types of exposer, including native, NFS and SMB. So in addition to a name, we also have to query for the right exposer type.

A GPFS filesystem is represented in APMgmt as a GPFSNativeExposer, which has type='gpfsnative'.

So to find the mmfs1 filesystem, we can query the /exposers endpoint, again using the where URL parameter:

params = {'where': '{"type": "gpfsnative", "name": "mmfs1"}'}
resp = session.get(url + '/exposers/', params=params)

print(resp.json())
{
  "collection": {
    "items": [
      {
        "href": "/exposers/d672fde7-ba69-5d16-0acd-5868d2a8f3b9",
        "data": [
          {
            "prompt": "current status of the item",
            "name": "status",
            "value": "ACTIVE"
          },
          {
            "prompt": "specific type of the exposer",
            "name": "type",
            "value": "gpfsnative"
          },
          {
            "prompt": "path at which the exposer is mounted",
            "name": "mountpoint",
            "value": "/mmfs1"
          },
          {
            "prompt": "unique identifier for the item",
            "name": "id",
            "value": "d672fde7-ba69-5d16-0acd-5868d2a8f3b9"
          },
          {
            "prompt": "exposer name",
            "name": "name",
            "value": "mmfs1"
          },
          ...
        ],
        "links": [...]
      }
    ],
    "href": "/exposers?where={\"type\": \"gpfsnative\", \"name\": \"mmfs1\"}",
    "links": [...],
    "template": {...},
    "queries": [...],
    "version": "1.0"
  }
}

Again, we expect exactly one result, and we can get the exposer href the same way as before:

exposer = resp.json()['collection']['items'][0]['href']
# "/exposers/d672fde7-ba69-5d16-0acd-5868d2a8f3b9"
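Writing the where value by hand means getting the JSON quoting right, which gets fiddly with several fields as in the exposer query above. A small hypothetical helper that builds it with json.dumps instead:

```python
import json


def where_param(**criteria):
    """Build the value of the `where` URL parameter as a JSON object."""
    return json.dumps(criteria)


params = {'where': where_param(type='gpfsnative', name='mmfs1')}
print(params['where'])  # {"type": "gpfsnative", "name": "mmfs1"}
```

The resulting params dict can be passed to session.get exactly as in the queries above.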

Hint

If the exposers query returns no results, the database may not have been populated yet. You can check the progress of the populate job via the /jobs endpoint:

params = {
    'where': '{"task": "ReconcileDBTask"}',  # populate job
    'sort': '-start_time'  # most recent first
}

resp = session.get(url + '/jobs', params=params)

print(resp.json()['collection']['items'][0]['data'])
# [{"name": "status", "value": 255,  ...  # ERRORED

Making a Project Template

The last thing we need before we can create our space is a template: a predefined directory layout that we can apply to our new space, and to any future spaces we might create.

Note

It can be confusing, but try not to mix up Template objects with the C+J creation template discussed above.

Template Model

The template we want to use doesn’t exist yet, so we have to create it.

To do this, we create a ‘model’ of the directory layout we want:

$ tree /mmfs1/project_template/
/mmfs1/project_template/
├── assets                  # <-- this is a dependent fileset
│   ├── models
│   └── rigs
├── flame
├── houdini
├── maya
├── mudbox
├── nuke
├── published
└── rendering

Along with the directory layout, the model can include files and dependent filesets. We can even set up permissions, which the template will capture.
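The plain directories in the model can be created with any tool you like. A minimal Python sketch that builds them under a scratch directory; on a real system the root would be /mmfs1/project_template. Note that assets is created as an ordinary directory here: making it a dependent fileset is a GPFS operation this sketch does not perform, and any permissions you want the template to capture would need to be set separately.

```python
import os
import tempfile

# Directory layout of the project template model shown above
LAYOUT = [
    'assets/models', 'assets/rigs', 'flame', 'houdini', 'maya',
    'mudbox', 'nuke', 'published', 'rendering',
]


def build_model(root):
    """Create every directory in LAYOUT under root (parents included)."""
    for relpath in LAYOUT:
        os.makedirs(os.path.join(root, relpath), exist_ok=True)


root = tempfile.mkdtemp()  # scratch location for the demo
build_model(root)
print(sorted(os.listdir(root)))
# ['assets', 'flame', 'houdini', 'maya', 'mudbox', 'nuke', 'published', 'rendering']
```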

POSTing the Template

Once we have our template model, we create the actual template via the REST interface.

As with spaces, we can look up the C+J ‘template’ for the fields we need to POST:

resp = session.get(url + '/templates/')

print(resp.json()['collection']['template'])
{
  "data": [
    {
      "prompt": "template name",
      "name": "name",
      "value": ""
    },
    {
      "prompt": "specific type of the template",
      "name": "type",
      "value": ""
    },
    {
      "prompt": "path to the template",
      "name": "template_location",
      "value": ""
    }
  ]
}

We need to provide the name, the type (filesystemtemplate in this case), and the location of the template’s model.

To perform a POST request with C+J, we have to fill in the value fields in the C+J template we just looked up:

template_data = {
    "template": {
        "data": [
            {"name": "name", "value": "project_template"},
            {"name": "type", "value": "filesystemtemplate"},
            {"name": "template_location", "value": "/mmfs1/project_template"}
        ]
    }
}

(You don’t need to include the prompt fields, but if you do include them, they will just be ignored)
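Since the body has the same shape for every endpoint, a small helper can build it from a plain dict of field values. This is a sketch with a hypothetical cj_write_body function; it returns a JSON string, which is what needs to go in the body of a request sent with the C+J Content-Type header.

```python
import json


def cj_write_body(values):
    """Serialise field values as a Collection+JSON write template."""
    return json.dumps({
        'template': {
            'data': [{'name': name, 'value': value}
                     for name, value in values.items()]
        }
    })


body = cj_write_body({
    'name': 'project_template',
    'type': 'filesystemtemplate',
    'template_location': '/mmfs1/project_template',
})
print(body)
```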

We then POST this data to the /templates endpoint, serialized as JSON:

import json

resp = session.post(
    url + '/templates/',
    data=json.dumps(template_data),  # the C+J document goes in the body as JSON
    headers={"Content-Type": "application/vnd.collection+json"}
)

print(resp.status_code)  # 202 (Accepted)

Important

As shown above, we need to include the Content-Type: application/vnd.collection+json header so the REST server knows which JSON format it is receiving.

Important

The trailing slash on the end of the url /templates/ is required. Without it, the POST request will fail.
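If you build endpoint URLs in several places, a tiny helper makes it hard to forget the slash. A sketch with a hypothetical collection_url function:

```python
def collection_url(base, collection):
    """Join the server URL and a collection name, keeping the trailing slash."""
    return '%s/%s/' % (base.rstrip('/'), collection.strip('/'))


print(collection_url('https://localhost', 'templates'))  # https://localhost/templates/
print(collection_url('https://localhost/', '/spaces'))   # https://localhost/spaces/
```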

If there were no issues with the request, we should get back status code 202 (Accepted).

Template Builder

A 202 status means the new template has been added to the database, and a task has been submitted to the job engine.

This builder task will copy the template model into the configured template store (see Configuration directives).

The response from the POST request will include a Location header, which we can query to check the status of our template:

print(resp.headers['Location'])
# 'https://localhost/templates/e00e872b-f4c5-2557-795d-4ccf4715b602?projection={"status":1}'

Because of the way C+J is structured, it’s not easy to grab just the status field from the response. Let’s write a little helper function:

def get_status(collection, item):
    for data in collection['collection']['items'][item]['data']:
        if data['name'] == 'status':
            return data['value']
    else:
        raise KeyError("Status field not found")
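The same loop works for any data field, not just status. A generalised sketch (the name get_field is our own); get_status(collection, item) behaves like get_field(collection, item, 'status'), apart from the error message.

```python
def get_field(collection, item, name):
    """Return the value of data field `name` for item number `item`."""
    for data in collection['collection']['items'][item]['data']:
        if data['name'] == name:
            return data['value']
    raise KeyError(name)


# Minimal hand-written C+J response for illustration
sample = {'collection': {'items': [{'data': [
    {'name': 'status', 'value': 'ACTIVE'},
    {'name': 'name', 'value': 'project_template'},
]}]}}
print(get_field(sample, 0, 'name'))  # project_template
```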

Now let’s check the status of our template:

r = session.get(resp.headers['Location'])
print(get_status(r.json(), 0))
# 'ACTIVE'

The status will transition from NEW (data POSTed), to PENDING (task submitted), to INPROGRESS (task running), and finally to ACTIVE.

When the template reaches state ACTIVE, we know that the builder job has completed successfully, and it’s ready to use.

template = r.json()['collection']['items'][0]['href']
# "/templates/e00e872b-f4c5-2557-795d-4ccf4715b602"
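Because the builder runs asynchronously, a script would normally poll until the status settles rather than check once. A sketch of a generic polling helper; fetch_status is any zero-argument callable, for example lambda: get_status(session.get(resp.headers['Location']).json(), 0). Treating ERRORED as a terminal failure state is an assumption based on the builder-task discussion in this walkthrough.

```python
import time


def wait_for_status(fetch_status, target='ACTIVE', failed='ERRORED',
                    attempts=10, delay=1.0):
    """Poll fetch_status() until it returns `target`.

    Raises RuntimeError if the item reaches `failed`, or if it has not
    reached `target` after `attempts` polls.
    """
    status = None
    for _ in range(attempts):
        status = fetch_status()
        if status == target:
            return status
        if status == failed:
            raise RuntimeError('item entered state %s' % failed)
        time.sleep(delay)
    raise RuntimeError('gave up waiting; last status was %r' % status)


# Demo with canned statuses instead of real GET requests
statuses = iter(['NEW', 'PENDING', 'INPROGRESS', 'ACTIVE'])
result = wait_for_status(lambda: next(statuses), delay=0)
print(result)  # ACTIVE
```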

Note

Once the template has been ingested, its template_location field is updated (internally) to point to the location of the template within the template store.

On subsequent GET requests, the template_location field will be returned as null.

Tip

Once the template is built, the original model can be modified or even deleted without affecting the template.

Checking the Builder Task

Say we want to check the status of the builder task itself, rather than watching the template’s status. Or say the template enters the ERRORED state, implying that the builder task failed.

When we look up an item, each entry in the items block of the response includes a links block:

print(r.json()['collection']['items'][0]['links'])
[
  {
    "href": "/jobs?where={\"resource_type\": \"templates\", \"resource_id\": \"e00e872b-f4c5-2557-795d-4ccf4715b602\"}",
    "prompt": "Jobs",
    "name": "jobs",
    "render": "link",
    "rel": "jobs"
  }
]

Here we see a link to a jobs query.
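The link is the same kind of where query we wrote by hand earlier, scoped to our template’s id. If you want to see what it filters on, a quick sketch that decodes it (link_query is a hypothetical helper, and it only applies to links whose where value is JSON, like this one):

```python
import json
from urllib.parse import urlsplit, parse_qs


def link_query(href):
    """Decode the JSON `where` filter embedded in a link href."""
    query = parse_qs(urlsplit(href).query)
    return json.loads(query['where'][0])


jobs_href = ('/jobs?where={"resource_type": "templates", '
             '"resource_id": "e00e872b-f4c5-2557-795d-4ccf4715b602"}')
print(link_query(jobs_href))
```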

Let’s write another helper function:

def get_link_href(collection, item, name):
    for link in collection['collection']['items'][item]['links']:
        if link['name'] == name:
            return link['href']
    else:
        raise KeyError(name)

Visiting the jobs link will return a collection of all jobs associated with our template:

href = get_link_href(r.json(), 0, 'jobs')
resp = session.get(url + href)

In this instance, there should be only one job returned, since we’ve only submitted one task for our template (the builder task).

If the job is no longer active (COMPLETED or ERRORED), we can check the job’s logs to try to diagnose any issues. In the job’s links section, we should see links for stdout and stderr:

print(resp.json()['collection']['items'][0]['links'])
[
  {
    "href": "/templates/e00e872b-f4c5-2557-795d-4ccf4715b602",
    "prompt": "Resource",
    "name": "resource",
    "render": "link",
    "rel": "resource"
  },
  {
    "href": "/jobs/clientdemo-pixstor-01.pixstor%23238.0%231540565545/stderr",
    "prompt": "Stderr",
    "name": "stderr",
    "render": "link",
    "rel": "stderr"
  },
  {
    "href": "/jobs/clientdemo-pixstor-01.pixstor%23238.0%231540565545/stdout",
    "prompt": "Stdout",
    "name": "stdout",
    "render": "link",
    "rel": "stdout"
  }
]

(We also get a link back to our template in the resource link)

Performing a GET request against the stderr link will return the log in plain text format:

href = get_link_href(resp.json(), 0, "stderr")  # resp is the jobs response from above
resp = session.get(url + href)

print(resp.headers['Content-Type'])  # text/plain

print(resp.text)
# 'DEBUG:...

Note

Python logging is written to stderr by default, so it will typically end up in the stderr log.

The stdout log would contain anything written to stdout, such as print statements. Since none of the APMgmt tasks use print statements, the stdout log will usually be empty.

However, this behaviour may vary depending on which job engine is being used.
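The stderr log can be long when DEBUG messages are included. Assuming the tasks use Python’s default logging format (LEVEL:logger:message), which matches the ‘DEBUG:...’ prefix above, a sketch that keeps only warnings and errors; the two-line sample log is made up for illustration and is not real APMgmt output.

```python
def log_lines(text, levels=('WARNING', 'ERROR', 'CRITICAL')):
    """Return log lines whose level prefix is in `levels`.

    Assumes the default Python logging format, "LEVEL:logger:message".
    """
    return [line for line in text.splitlines()
            if line.split(':', 1)[0] in levels]


sample_log = 'DEBUG:builder:copying model\nERROR:builder:copy failed\n'
print(log_lines(sample_log))  # ['ERROR:builder:copy failed']
```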

Creating the Space

Now, finally, we have everything we need to create our Space.

So, as we did for our project template above, we need to fill in the C+J template values:

space_data = {
    "template": {
        "data": [
            {"name": "name", "value": "sleepy-snake"},
            {"name": "relativepath", "value": "projects/sleepy_snake"},
            {"name": "exposers", "value": exposer},
            {"name": "profile", "value": profile},
            {"name": "templates", "value": template},
            {"name": "size", "value": 4*1024*1024*1024}  # 4GB
        ]
    }
}

Tip

Here we only have one exposer and one template.

If instead we wanted to create a space with multiple exposers (or templates), we would send their hrefs as a comma-separated list, e.g.

"value": '/exposers/bb2873a8-c489-4530-a8a5-ece70598f3ea,/exposers/8b234c97-23a1-4f2b-9fce-1200d40e96c1'

Then we POST the data to the /spaces endpoint and wait for our new space to enter the ACTIVE state:

import json
from time import sleep

resp = session.post(
    url + '/spaces/',
    data=json.dumps(space_data),  # the C+J document goes in the body as JSON
    headers={"Content-Type": "application/vnd.collection+json"}
)

checkurl = resp.headers['Location']

# query the status every second for 10 seconds
for _ in range(10):
    sleep(1)
    r = session.get(checkurl)
    status = get_status(r.json(), 0)
    if status == 'ACTIVE':
        break
else:
    raise Exception("Space didn't become active after 10s - got status %r" % status)

And we’re done!

Checking Our Work

Database

First of all, let’s check what our space looks like in the APMgmt database.

The Location header for our space uses a projection to return only the status field, not the other data fields, so we can’t use that.

Let’s try doing a query for our space:

params = {'where': '{"name": "sleepy-snake"}'}

resp = session.get(url + '/spaces/', params=params)
print(resp.json())
{
  "collection": {
    "href": "/spaces/",
    "items": [
      {
        "href": "/spaces/de95cf28-fdf9-5455-b1d9-831cf8a1869b",
        "data": [
          {
            "prompt": "current status of the item",
            "name": "status",
            "value": "ACTIVE"
          },
          {
            "prompt": "space name",
            "name": "name",
            "value": "sleepy-snake"
          },
          {
            "prompt": "path of the space relative to its exposers",
            "name": "relativepath",
            "value": "projects/sleepy_snake"
          },
          {
            "prompt": "unique identifier for the item",
            "name": "id",
            "value": "de95cf28-fdf9-5455-b1d9-831cf8a1869b"
          },
          {
            "prompt": "hard limit on the size of the space in blocks",
            "name": "size",
            "value": 4294967296
          },
          ...
        ],
        "links": [
          {
            "href": "/templates/?where=id==\"e00e872b-f4c5-2557-795d-4ccf4715b602\"",
            "prompt": "templates applied to the space",
            "name": "templates",
            "render": "link",
            "rel": "templates collection"
          },
          {
            "href": "/exposers/?where=id==\"d672fde7-ba69-5d16-0acd-5868d2a8f3b9\"",
            "prompt": "exposers providing access to the space",
            "name": "exposers",
            "render": "link",
            "rel": "exposers collection"
          },
          {
            "href": "/profiles/8f01ab22-cc0b-2056-ff8c-e1d829dc806c",
            "prompt": "profile applied to the space",
            "name": "profile",
            "render": "link",
            "rel": "profile item"
          },
          {
            "href": "/jobs?where={\"resource_type\": \"spaces\", \"resource_id\": \"de95cf28-fdf9-5455-b1d9-831cf8a1869b\"}",
            "prompt": "Jobs",
            "name": "jobs",
            "render": "link",
            "rel": "jobs"
          }
        ]
      }
    ],
    "links": [...],
    "template": {...},
    "queries": [...],
    "version": "1.0",
  }
}

Looks good.

Notice that the related items (exposers, templates, profile) appear in the links section. This allows us to look up those items without having to construct their URLs ourselves.

Note

It is possible to have multiple spaces with the same name, but only if they have different profiles.

In general, you should avoid creating multiple spaces with the same name.

Filesystem

Now, let’s check that a fileset was actually created for our Space:

$ mmlsfileset mmfs1
Filesets in file system 'mmfs1':
Name                         Status    Path
...
sas1-sleepy-snake            Linked    /mmfs1/projects/sleepy_snake
sata1-sleepy-snake-8b234c97  Linked    /mmfs1/projects/sleepy_snake/assets

The first one, sas1-sleepy-snake, is the fileset created for our space.

Notice that the fileset doesn’t have exactly the name we specified for the space: the name of its placement pool has been added to the front. This prefix is used to match the space (fileset) to the right pool for the space’s profile (placement policy rule):

$ mmlspolicy mmfs1 -L
...

RULE 'sas1-placement' SET POOL 'sas1'
 WHERE FILESET_NAME LIKE 'sas1-%'

RULE 'default' SET POOL 'sata1'
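To see why the prefix matters, here is a small sketch that builds the fileset name and tests it against the rule’s LIKE pattern. Both helpers are hypothetical, and like is a simplified stand-in for the policy engine’s LIKE (% matches any run of characters, _ matches one); it ignores ESCAPE clauses and does not try to replicate GPFS’s exact matching semantics.

```python
import re


def fileset_name(pool, space_name):
    """Fileset name for a space: the placement pool prefixed to the space name."""
    return '%s-%s' % (pool, space_name)


def like(value, pattern):
    """Simplified SQL LIKE: % matches any run of characters, _ matches one."""
    regex = ''.join('.*' if c == '%' else '.' if c == '_' else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, value, re.DOTALL) is not None


name = fileset_name('sas1', 'sleepy-snake')
print(name, like(name, 'sas1-%'))                     # sas1-sleepy-snake True
print(like('sata1-sleepy-snake-8b234c97', 'sas1-%'))  # False
```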

The second fileset shown, sata1-sleepy-snake-8b234c97, is the dependent fileset installed by our template. Instead of the space’s placement pool, its name is prefixed with the pool that the original model fileset was assigned to, in this case sata1. The name also includes the name of our space (plus a random suffix).

Let’s check that the full template was installed:

$ tree /mmfs1/projects/sleepy_snake
/mmfs1/projects/sleepy_snake
├── assets
│   ├── models
│   └── rigs
├── flame
├── houdini
├── maya
├── mudbox
├── nuke
├── published
└── rendering

Great! Finally, the size value we specified should have been translated into a block quota:

$ mmlsquota -j sas1-sleepy-snake mmfs1
                         Block Limits                                    |     File Limits
Filesystem type             KB      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
mmfs1      FILESET         109 3865470566 4294967296          0     none |       10       0        0        0     none
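A quick arithmetic check on that output: the limit column matches the size value we POSTed, and in this particular output the quota (soft limit) column works out to exactly 90% of the limit, rounded down. Whether that ratio is fixed or configurable isn’t covered here, so treat it as an observation about this example rather than a rule.

```python
size = 4 * 1024 * 1024 * 1024  # the size value we POSTed for the space
limit = 4294967296             # 'limit' column from the mmlsquota output above
quota = 3865470566             # 'quota' column from the mmlsquota output above

print(limit == size)             # True
print(quota == limit * 9 // 10)  # True
```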

Perfect! Our space is now ready to use.

Exercise: Create a Snapshot

Now that you’ve made yourself a space, why not practice what you’ve learned by creating a snapshot of that space?

Hints

  • The endpoint you’re looking for is /snapshots
  • To create a space snapshot, the only fields you need to POST are name, type, and space
  • The type you want to use is gpfsspacesnapshot
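If you’d like a head start, the body follows the same C+J shape as the other POSTs in this walkthrough. A sketch of the payload only: the snapshot name is made up, and the space href is the one returned for sleepy-snake above. You still need to check the /snapshots C+J template, POST the body with the C+J Content-Type header, and check the new snapshot’s status.

```python
import json

# Hypothetical snapshot name; the space is referenced by its href
snapshot_values = {
    'name': 'sleepy-snake-snap1',
    'type': 'gpfsspacesnapshot',
    'space': '/spaces/de95cf28-fdf9-5455-b1d9-831cf8a1869b',
}

# Same C+J write-template shape as the template and space POSTs
snapshot_body = json.dumps({'template': {'data': [
    {'name': k, 'value': v} for k, v in snapshot_values.items()
]}})
print(snapshot_body)
```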

Footnotes

[1] URI-template queries (encoding "uri-template") are not part of the standard Collection+JSON spec.