Virtual Data Lakes

REST API

A client program can use the API to query or update the data in the virtual data lake's sources. These operations are subject to access control. To make them, the client sends and receives data in prescribed formats.

API Specifics

The request endpoint depends on the system in which the virtual data lake is installed. For the virtual data lake used by the lacibus.net website, it is https://lacibus.net/home/.api. Other systems will have different API URLs, determined by their designers.

The requests use the POST method. 

A request must have a command parameter in its query string, whose value is either query or update.

Other parameters are passed in the request body, which must have multipart/form-data content type. 

The responses also have multipart/form-data content type.

The multipart form data parts of the requests and responses are described under Data Representations below.
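
The request shape described above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference client: the helper names encode_multipart and build_request are hypothetical, the per-part Content-Type headers are an assumption about what the server accepts, and the request is built but not sent.

```python
import uuid
import urllib.request


def encode_multipart(parts):
    """Encode {part_name: str or bytes} as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, content in parts.items():
        if isinstance(content, bytes):
            # Binary values travel as application/octet-stream parts.
            part_type = "application/octet-stream"
            payload = content
        else:
            # Assumption: text parts are labelled as UTF-8 plain text.
            part_type = "text/plain; charset=utf-8"
            payload = content.encode("utf-8")
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"Content-Type: {part_type}\r\n\r\n"
        ).encode("utf-8")
        chunks.append(header + payload + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_request(api_url, command, parts):
    """Build (but do not send) a POST request for a query or an update."""
    if command not in ("query", "update"):
        raise ValueError("command must be 'query' or 'update'")
    body, content_type = encode_multipart(parts)
    return urllib.request.Request(
        f"{api_url}?command={command}",  # command goes in the query string
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# An empty constraint list, used here only to show the request shape.
request = build_request(
    "https://lacibus.net/home/.api", "query", {"constraints": "[]"}
)
```

Passing the resulting object to urllib.request.urlopen would send it; the response would then need to be parsed as multipart/form-data in the same way.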

Queries

A query request returns a list of solutions that satisfy a set of constraints.

A constraint has a subject, verb and object, each of which can be given or unknown. A solution assigns a typed value to each unknown in the set of constraints.

For example, a content management system might organize pages in workspaces. To list the pages in the 'Site Main Pages' workspace it could make a query with constraints:

  • Workspace has name "Site Main Pages"
  • Workspace contains page Page
  • Page has name Page Name

This would return a list of solutions, each of which would assign a typed value to Workspace, Page, and Page Name. (The workspace would be the same in all the solutions; the pages and page names would be different.) The workspaces and the pages would be of type ITEM, and the page names ('Home', 'About', 'Get Involved', etc.) would be of type TEXT.

Updates

An update request performs a set of changes and returns the new items created by them.

The changes that can be made are:

  • Create an item: an item is created in a source, with read and write access levels, and with a handle that can be used to refer to it in other changes.
  • Delete an item: a specified item is deleted.
  • Put a triple: A triple with a specified subject, verb and object is created if the store does not already contain one.
  • Remove triples: All triples with a specified subject, verb and object are deleted or, if no object is specified, all triples with a specified subject and verb are deleted.
  • Set a unique object: if there is no triple with the specified subject, verb and object, then one is created; any other triples with the specified subject and verb are deleted; so that there is exactly one triple with the specified subject and verb, and it has the specified object.

When a request returns, all or none of its changes will have been made. If an error occurs in making one of them, any previously-made changes are reversed.
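
The semantics of the triple changes, and the all-or-none rule, can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not how the virtual data lake is implemented: triples are modelled as a Python set of (subject, verb, object) tuples, all function names are hypothetical, and atomicity is obtained by working on a copy rather than by reversing changes as the text describes.

```python
def put_triple(triples, subject, verb, obj):
    """Create the triple unless an identical one already exists."""
    triples.add((subject, verb, obj))


def remove_triples(triples, subject, verb, obj=None):
    """Delete matching triples; with no object, match subject and verb only."""
    matching = {
        t for t in triples
        if t[0] == subject and t[1] == verb and (obj is None or t[2] == obj)
    }
    triples.difference_update(matching)


def set_unique_object(triples, subject, verb, obj):
    """Leave exactly one triple with this subject and verb: the given one."""
    remove_triples(triples, subject, verb)
    triples.add((subject, verb, obj))


def apply_all(triples, changes):
    """Apply every change or none of them.

    The changes run against a copy; the original set is only replaced
    once all of them have succeeded.
    """
    working = set(triples)
    for operation, args in changes:
        operation(working, *args)  # an exception here abandons the copy
    triples.clear()
    triples.update(working)


# Two names for the same page; the update leaves exactly one.
store = {("page1", "name", "Home"), ("page1", "name", "Start")}
apply_all(store, [
    (set_unique_object, ("page1", "name", "Welcome")),
    (put_triple, ("workspace1", "contains", "page1")),
])
```

After the call, the store holds one name triple for page1 and the new contains triple. If any operation in the list had raised an exception, the store would have been left exactly as it was.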

The new items are returned with the handles given in the changes to create them, so that they can be distinguished from each other. The handles are not stored and cannot be used in subsequent API requests.

Access Control

Each request is processed at an access level that determines what data it can read and write. The access level can be set by a credential in the request. If no credential is supplied then the request is processed at the lowest (public) level. This allows reading of some data but does not allow writing of any data.

A supplied credential must be accompanied by its correct key. Credentials are not kept secret, but their keys should be kept secret by clients. The keys are not stored in the triple store; hash digests of them are stored and used to validate them.

Credentials and their keys are obtained by out-of-band means, not through the API.

It is also possible to use an authorized session that has been established by means other than this API.

Data Representations

The API uses JSON as its data representation format, but this is complicated by the need to pass binary values, which cannot natively be included in JSON.

The requests and responses have multipart/form-data content type. One or more of the parts is a UTF-8 string representation of a JSON object or of a JSON array of JSON objects. Each binary value is represented in the JSON object or array by a string of the form bn where n is an integer. There is then a part named bn with application/octet-stream content type that contains the binary value.

Another part, which need not be supplied, is named auth. If supplied, it contains a JSON object represented as UTF-8-encoded plain text. This has two string attributes: credential and key. The credential is the item identifier of a credential, and the key is the credential's key.
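
The bn convention and the auth part might be prepared along the following lines. This is a sketch only: the function names split_binaries and auth_part are hypothetical, and numbering the binary parts from b0 is an assumption, since the text says only that n is an integer.

```python
import json


def split_binaries(obj, parts=None):
    """Replace each bytes value in a JSON-like structure with a "bN" name.

    Returns (json_ready_structure, {"bN": bytes}), where each "bN" names
    the multipart part that should carry the corresponding binary value.
    """
    if parts is None:
        parts = {}
    if isinstance(obj, (bytes, bytearray)):
        name = f"b{len(parts)}"  # assumption: numbering starts at 0
        parts[name] = bytes(obj)
        return name, parts
    if isinstance(obj, dict):
        return {k: split_binaries(v, parts)[0] for k, v in obj.items()}, parts
    if isinstance(obj, list):
        return [split_binaries(v, parts)[0] for v in obj], parts
    return obj, parts


def auth_part(credential, key):
    """Content of the optional auth part: a JSON object as plain text."""
    return json.dumps({"credential": credential, "key": key})


# A BINARY typed value whose content must travel in its own part.
value = {"type": "BINARY", "value": b"\x89PNG"}
json_ready, binary_parts = split_binaries([value])
```

Here json_ready can be serialized with json.dumps, and each entry of binary_parts becomes an application/octet-stream part of the same name.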

Yet another part, which also need not be supplied, is named meta. It can be used to pass data of any kind, as agreed by the client and server software.

Requests and Responses

For a query, the input JSON is a JSON array, each of whose elements is a JSON array representing a constraint. It is passed in a part named constraints. The output JSON is a JSON array, each of whose elements is a JSON object representing a solution. It is passed in a part named json.

For an update, the input JSON is a JSON array, each of whose elements is a JSON object representing a change. It is passed in a part named changes. The output JSON is a JSON object representing the new items with their handles. It is passed in a part named json.

Constraints

A JSON array that represents a constraint has three elements, representing the constraint's subject, verb and object. Each of these is represented by a JSON object and is either a given, which is a typed value, or an unknown.
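
As an illustration, the 'Site Main Pages' query from the Queries section might be encoded as follows. The helper names given and unknown are hypothetical, and so are the verb names example:hasName and example:containsPage: they stand in for whatever item identifiers or qualified names a real system uses for those verbs.

```python
import json


def given(type_name, value):
    """A typed value: a JSON object with type and value attributes."""
    return {"type": type_name, "value": value}


def unknown(name):
    """An unknown: a JSON object with a single attribute, unknown."""
    return {"unknown": name}


# Hypothetical verbs; a real system defines its own named items.
HAS_NAME = given("ITEM", "example:hasName")
CONTAINS_PAGE = given("ITEM", "example:containsPage")

# Each constraint is a three-element array: subject, verb, object.
constraints = [
    [unknown("Workspace"), HAS_NAME, given("TEXT", "Site Main Pages")],
    [unknown("Workspace"), CONTAINS_PAGE, unknown("Page")],
    [unknown("Page"), HAS_NAME, unknown("Page Name")],
]

# Text to send in the part named constraints.
constraints_part = json.dumps(constraints)
```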

Solutions

A JSON object that represents a solution has an attribute for each unknown. Its name is the name of the unknown, and its value represents the typed value assigned to the unknown in the solution.
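
A client might read the solutions like this. The response text below is invented for illustration, with placeholder item identifiers, and it assumes that each assigned value appears as a typed-value object with type and value attributes.

```python
import json

# Made-up content of the json part of a query response.
response_json = """
[
  {"Workspace": {"type": "ITEM", "value": "<workspace-id>"},
   "Page": {"type": "ITEM", "value": "<page-id-1>"},
   "Page Name": {"type": "TEXT", "value": "Home"}},
  {"Workspace": {"type": "ITEM", "value": "<workspace-id>"},
   "Page": {"type": "ITEM", "value": "<page-id-2>"},
   "Page Name": {"type": "TEXT", "value": "About"}}
]
"""

solutions = json.loads(response_json)

# Each solution maps the name of an unknown to its typed value.
page_names = [solution["Page Name"]["value"] for solution in solutions]
pages_by_name = {
    solution["Page Name"]["value"]: solution["Page"]["value"]
    for solution in solutions
}
```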

Changes

A JSON object that represents a change has a change attribute that specifies the type of change, and other attributes depending on the type. The type can be: create item, delete item, put triple, remove triples, or set unique object.

For a change to create an item, the other attributes are handle, source, read, and write. The handle is a string that can be used to refer to the item in other changes. The source is an integer that is the numeric identifier of the source that is to contain the item. The read and write attributes are strings that specify the read and write access levels that the new item is to have. They need not be present; they default to the access level of the authorized store session.

For a change to delete an item, there is one other attribute: item, giving the item identifier of the item to be deleted.

For a change to put a triple, the other attributes are subject, verb, and object, representing the subject, verb and object of the triple to be put. Each of these may be given or unknown. If it is unknown, it is the handle of a new item to be created.

For a change to remove triples, the other attributes are subject, verb, and object, representing the subject, verb and object of the triples to remove. The subject and verb attributes are strings that specify the subject and verb. The object attribute need not be present. If it is present, it is a typed value.

For a change to set a unique object, the other attributes are subject, verb, and object. The subject and verb attributes are strings that specify the subject and verb that are to have a unique object. The object may be given or unknown. An unknown is the handle of a new item to be created.
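
Putting the change types together, an update might be assembled as below. Everything specific here is an assumption made for illustration: the change type strings are taken literally from the list above, the source number 1 and the verb names are hypothetical, and the angle-bracketed identifiers are placeholders for real item identifiers.

```python
import json


def given(type_name, value):
    """A typed value: a JSON object with type and value attributes."""
    return {"type": type_name, "value": value}


def unknown(handle):
    """An unknown; in a change, its name is the handle of a new item."""
    return {"unknown": handle}


changes = [
    # Create a new item in (hypothetical) source 1, with default access levels.
    {"change": "create item", "handle": "newPage", "source": 1},
    # Link the new item to an existing workspace by its handle.
    {"change": "put triple",
     "subject": given("ITEM", "<workspace-id>"),
     "verb": given("ITEM", "example:containsPage"),
     "object": unknown("newPage")},
    # Make sure the workspace has exactly one name.
    {"change": "set unique object",
     "subject": "<workspace-id>",
     "verb": "example:hasName",
     "object": given("TEXT", "Site Main Pages")},
    # Unlink an old page, then delete it.
    {"change": "remove triples",
     "subject": "<workspace-id>",
     "verb": "example:containsPage",
     "object": given("ITEM", "<old-page-id>")},
    {"change": "delete item", "item": "<old-page-id>"},
]

# Text to send in the part named changes.
changes_part = json.dumps(changes)
```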

New Items

The JSON object representing the new items created by an update has an attribute for each of these items. Its name is the item's handle, and its value is the item's identifier.
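
Reading the new items back is then a matter of looking up each handle. In this sketch the function name new_item_ids is hypothetical and the response text is invented, with a placeholder in place of a real item identifier.

```python
import json


def new_item_ids(response_text, expected_handles):
    """Map each handle used in the update to the identifier of its new item."""
    new_items = json.loads(response_text)
    missing = set(expected_handles) - set(new_items)
    if missing:
        raise KeyError(f"no identifier returned for handles: {sorted(missing)}")
    return {handle: new_items[handle] for handle in expected_handles}


# Made-up content of the json part of an update response.
ids = new_item_ids('{"newPage": "<new-item-id>"}', ["newPage"])
```

Because handles are not stored, a client that needs the new item later should keep the returned identifier rather than the handle.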

Unknowns

A JSON object that represents an unknown has a single attribute, unknown, whose value is the name of the unknown.

Typed Values

A JSON object that represents a typed value has two attributes: type and value.

The type is a string that can be ITEM, BOOLEAN, INTEGRAL, REAL, TEXT, or BINARY.

The value is a textual representation of a quantity of the type. For an ITEM, it is either an item identifier or the qualified name of a named item. For a BOOLEAN, it is true or false, ignoring case. For an INTEGRAL or REAL, it is the string representation of a decimal integer or real (floating point) number. For a TEXT, it is the textual value. For a BINARY, it is the name of a multipart form part whose content is the binary value.
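
A client could convert received typed values to native Python values roughly as follows. The function name decode_typed_value is hypothetical, and returning ITEM values as plain strings is a simplification; the sketch assumes the binary parts of the response have already been collected into a dictionary keyed by part name.

```python
def decode_typed_value(typed_value, binary_parts=None):
    """Convert a typed-value JSON object into a native Python value."""
    type_name = typed_value["type"]
    text = str(typed_value["value"])
    if type_name in ("ITEM", "TEXT"):
        return text
    if type_name == "BOOLEAN":
        lowered = text.lower()  # true or false, ignoring case
        if lowered not in ("true", "false"):
            raise ValueError(f"not a boolean: {text!r}")
        return lowered == "true"
    if type_name == "INTEGRAL":
        return int(text)
    if type_name == "REAL":
        return float(text)
    if type_name == "BINARY":
        # The value names the multipart part holding the bytes.
        return (binary_parts or {})[text]
    raise ValueError(f"unknown type: {type_name!r}")


flag = decode_typed_value({"type": "BOOLEAN", "value": "TRUE"})
count = decode_typed_value({"type": "INTEGRAL", "value": "42"})
image = decode_typed_value(
    {"type": "BINARY", "value": "b0"}, {"b0": b"\x89PNG"}
)
```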

Items and Access Levels

A string that specifies an item is an item identifier or the qualified name of a named item.

A string that specifies an access level is a string that specifies its item.