Starting Guide

The information contained in this page is meant to provide background information in order to help you get started using the API.

Design Notes

The API has been designed with simplicity in mind, first and foremost. It avoids taking JSON documents as input, taking instead:

  • RESTful URLs without a request body as input to POST and PUT resources. In such a case, the data to use for updating is specified as part of the URL.
  • HTTP parameters where a RESTful approach is not well-suited.

The intent was to offer as simple an API as possible, in order to favor swift adoption by system administrators as part of automation tasks.

Security

The API gives access to "write" operations (killing processes, deploying, etc.) that require authentication and authorization. In addition, such calls should be made over SSL, which Corus supports. The following sections explain these security considerations.

Note that it is possible, through configuration, to make the REST resources available over SSL only.

Authentication and Authorization

Authentication and authorization require the creation of application keys. An application key is associated to an application ID, and together they provide the credentials required by an application to be authenticated. In turn, such credentials are associated to a role, which itself corresponds to a set of permissions. The association of an application key to a role thus allows for authorization.

Each Corus node keeps role definitions and application keys in its own local database. The CLI commands allowing for the administration of roles and application keys support the -cluster option, which allows replicating that data across a cluster. In addition, when the repository functionality is used, replication of roles and application keys is performed automatically (from repo servers to repo clients).

Creating Roles

Prior to creating application keys, a logical step is to create one or more roles, to which application keys are meant to be associated. Roles and permissions are created in Corus, through the CLI. Roles can be created, deleted, and listed, through the role command in the CLI.

Furthermore, each role must be associated to a set of permissions. Corus supports the following predefined permissions:

  • READ: this permission does not require authentication. All read resources in the API (those that do not modify the state of Corus - which are typically GET calls) are configured with this permission.
  • WRITE: this permission corresponds to PUT, POST and DELETE resources that modify the state of Corus by adding or removing data (similarly to the CLI's port add, conf del, etc.).
  • EXECUTE: this permission corresponds to resources that affect process lifecycle, and corresponds to the CLI's exec, kill, suspend, resume and restart commands.
  • DEPLOY: this permission strictly corresponds to the API's deployment/undeployment resources (analogous to the CLI's deploy and undeploy commands).
  • ADMIN: this permission is required to administer roles and application keys through the REST API.

To create (or update) a role, use the role add command in the CLI (see further below about managing roles through the API):

role add -n admin -p rwxd -cluster

The above creates a role, which is replicated across the cluster. In a similar manner, roles can be deleted (through role del) and listed (using role ls) - see the CLI's help for the details, by typing man role.

One thing to note is that the command can take up to two options, aside from the -cluster one:

  • -n: specifies the name of the role.
  • -p: specifies the set of permissions to assign to the role. Note that one character is used for each permission (that character is nicknamed the "abbreviation"):
    • r: READ
    • w: WRITE
    • x: EXECUTE
    • d: DEPLOY
    • a: ADMIN
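The abbreviation scheme above can be sketched as a simple lookup table. The helper below is hypothetical (it is not part of Corus); it merely expands a permission string such as the "rwxd" used in the role add example:

```python
# Map of single-character abbreviations to Corus permission names,
# as used by the -p option of the role add command.
PERMISSIONS = {
    "r": "READ",
    "w": "WRITE",
    "x": "EXECUTE",
    "d": "DEPLOY",
    "a": "ADMIN",
}

def parse_permissions(abbrev: str) -> list[str]:
    """Expands a permission string such as 'rwxd' into permission names."""
    try:
        return [PERMISSIONS[c] for c in abbrev]
    except KeyError as err:
        raise ValueError(f"Unknown permission abbreviation: {err}")

# parse_permissions("rwxd") yields READ, WRITE, EXECUTE and DEPLOY,
# matching the 'role add -n admin -p rwxd' example above.
```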

Creating Application Keys

Once at least one role is created, you can proceed to creating application keys. To that end, the appkey command is used:

appkey add -a chef -r admin -k akw71sey927a -cluster

The above creates (or updates) an application key for the "chef" application, assigning to it the "admin" role, and the application key specified by the -k option. That new application key is replicated across the cluster.

More formally, the options shown in the example above correspond to:

  • -a: application ID.
  • -k: application key (optional, defaults to a generated one).
  • -r: role (corresponds to the name of an already defined role).

It is optional to specify an application key as a parameter: if none is passed, one is automatically created (that application key consists of a randomly generated sequence of 32 characters).

You can view the existing application keys with appkey ls and delete application keys with appkey del.

Application keys can be managed through the REST API - see further below.

Using an Application Key

Once an application key is configured, it can be used to make API calls (this should be done over an SSL link, which Corus has support for, as explained further below). There are two ways to do this: by providing the application ID and application key as HTTP headers, or by specifying that information as HTTP request parameters (the former method is preferred).

The HTTP headers to use are the following:

  • X-corus-app-id: specifies the application ID.
  • X-corus-app-key: specifies the application key.

In place of the above headers, the aid and apk request parameters can be used to specify the application ID and key, respectively, as in the following:

https://localhost:33443/rest/clusters/app-01/distributions?aid=chef&apk=akw71sey927a
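The header-based method can be sketched as follows, using only the standard library. The host, cluster name and credentials below are the illustrative values used throughout this guide:

```python
import urllib.request

def authenticated_request(url: str, app_id: str, app_key: str) -> urllib.request.Request:
    """Builds a GET request carrying the application credentials as the
    X-corus-app-id / X-corus-app-key headers described above."""
    req = urllib.request.Request(url)
    req.add_header("X-corus-app-id", app_id)
    req.add_header("X-corus-app-key", app_key)
    return req

req = authenticated_request(
    "https://localhost:33443/rest/clusters/app-01/distributions",
    app_id="chef",
    app_key="akw71sey927a",
)
# The credentials travel as headers rather than in the query string,
# which is the preferred method.
```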

SSL

When using the REST API, it is recommended to connect to the Corus SSL port. By default, SSL is disabled. It can be enabled through simple configuration (in the corus.properties file).

A prerequisite, though, is to have created the required SSL certificate (and the keystore that stores it). This is widely documented online. Once the keystore has been created (using the JDK's keytool command), the Corus configuration can be modified accordingly.

The following configuration properties drive SSL support by Corus:

  • corus.server.ssl.enabled: enables/disables SSL (defaults to false, must of course be set to true for SSL to work).
  • corus.server.ssl.keystore.file: points to the keystore file (normally, it corresponds to ${user.home}/.keystore).
  • corus.server.ssl.key.password: the password used to access the SSL certificate in the keystore.
  • corus.server.ssl.keystore.password: the password used to access the keystore itself.
  • corus.server.ssl.port: the SSL port to use. If zero (0) is specified, the port will default to corus_port + 443. That is, if the server runs on port 33000, the SSL port will be set to 33443; if it runs on 33001, the port will be set to 33444, and so on.
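The port-defaulting rule for corus.server.ssl.port can be expressed as a one-line calculation (hypothetical helper, shown only to illustrate the documented rule):

```python
def effective_ssl_port(server_port: int, configured_ssl_port: int = 0) -> int:
    """Mirrors the documented rule: a configured SSL port of 0 means
    'default to corus_port + 443'."""
    if configured_ssl_port == 0:
        return server_port + 443
    return configured_ssl_port

# A server on 33000 thus gets SSL port 33443; one on 33001 gets 33444.
```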

Here's an example configuration for the above properties:

corus.server.ssl.key.password=corus-ssl
corus.server.ssl.keystore.password=corus-ssl
corus.server.ssl.keystore.file=${user.home}/.keystore
corus.server.ssl.port=0
corus.server.ssl.enabled=true

Once the configuration is done, Corus must be restarted.

Enforcing SSL and Authentication

As of release 4.5.2, it is possible to enforce use of the API over SSL (that is: REST resources will then not be available over plain HTTP). By the same token, it is also possible to require authentication for all REST resources (even GET ones).

To enforce the use of SSL, configure the corus.server.api.ssl.enforced property as follows, in corus.properties:

corus.server.api.ssl.enforced=true

When the above is set to true, REST resources are only available over HTTPS - see the SSL section for more details.

To require systematic authentication (even on GET resources), set the corus.server.api.auth.required property to true in corus.properties:

corus.server.api.auth.required=true

Using the API

Using the API is straightforward, as you will see. The REST resources are accessible at any Corus node, under the following URI:

http://<host>:<port>/rest

Note that the above in itself returns nothing.

General Guidelines

All GET resources described further below return a JSON response, a sample of which is provided in the documentation.

Other resources always return a JSON payload, consisting of the following in non-error situations:

{
  "status": 200
}

When an error occurs, the following JSON response is returned:

{
  "status": <error_code>,
  "stackTrace": "<stack_trace>"
}

The <stack_trace> placeholder corresponds to the full server-side error stack trace. The <error_code> placeholder is meant for the relevant HTTP error code, given the error that occurred.

Note that the stackTrace field is optional: a response may result in an error even if it is not the result of an exception being thrown. The status field is indicative of an error even when the stackTrace field is absent.

Of course, the HTTP status response header is always available. The status message, though, may or may not be present - consider it optional, from Corus' standpoint.
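Given the optional fields above, client-side handling of a response payload can be sketched as follows (hypothetical helper; only the status and stackTrace fields documented above are assumed):

```python
import json

def describe_response(payload: str) -> str:
    """Summarizes a Corus JSON response, treating stackTrace as optional:
    the status field alone indicates success or error."""
    doc = json.loads(payload)
    status = doc["status"]
    if status == 200:
        return "success"
    trace = doc.get("stackTrace")  # may be absent even on errors
    return f"error {status}" + (" (stack trace available)" if trace else "")

# describe_response('{"status": 200}') -> "success"
# describe_response('{"status": 404}') -> "error 404"
```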

Some resources may return feedback (progress messages emitted by Corus). Such feedback, if any, is provided by the feedback field, which holds an array of progress messages:

{
  "status": 200,
  "feedback": [
    "<message_1>",
    "<message_2>",
    "<message_N>"
  ]
}

If the backend did not provide such progress messages, the feedback field will still be present, with an empty array as its value. It may occur that a response results in an error, has no HTTP status message set, and has no stack trace; in such a case, it will still have feedback.

Last but not least, some resources have an asynchronous behavior: upon being invoked, they trigger an asynchronous task on the server side. At that point, a resource may return status OK, but the background task's outcome may eventually be an error. This documentation describes the behavior of the resources from that standpoint: asynchronous behavior is identified by ASYNC, whilst synchronous behavior is labeled SYNC. Note that some resources support both synchronous and asynchronous invocation. When such is the case, an async parameter must be passed to the resource in the query string, with its value set to true.

Additionally, some resources may return a so-called completion token, which clients can use to poll for the completion of certain asynchronous calls and obtain information about the outcome of the operation (success/failure) - such resources are labeled with ASYNC with polling in the doc.

All GET resources can be considered SYNC, unless otherwise specified.

Partitioning

In order to provide more flexibility than the ability to communicate with either a single node or all the nodes in a cluster, the notion of "partitions" has been introduced as part of Corus 4.8.

More precisely: one can define, using the API, so-called "partition sets", which are groups of Corus nodes whose size is defined with a partitionSize parameter passed at creation time. Given a cluster of 20 nodes, a partition size of 5 will yield a partition set containing 4 partitions of 5 nodes each.
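The partitioning arithmetic above amounts to chunking the node list, which can be sketched as follows (hypothetical helper, for illustration only):

```python
def partition(hosts: list, partition_size: int) -> list:
    """Splits a list of hosts into consecutive partitions of the given
    size (the last partition may be smaller if the host count is not a
    multiple of partition_size)."""
    return [hosts[i:i + partition_size]
            for i in range(0, len(hosts), partition_size)]

# With 20 nodes and a partition size of 5, this yields 4 partitions
# of 5 nodes each, as in the example above.
nodes = [f"192.168.0.{i}" for i in range(1, 21)]
partitions = partition(nodes, 5)
```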

The API allows creating multiple partition sets, which are each assigned a UUID. That UUID must then be used when wanting to perform operations on the corresponding partition set. Partition sets are not eternal: they are assigned a timeout of 5 minutes by default - their last access time is reset every time they're used.

Furthermore, the API allows passing in criteria for partition set creation. More precisely: partition sets can be created based on the tags of Corus nodes, through an inclusion/exclusion mechanism.

The API provides three resources for managing partition sets. Their definition is provided in the following sub-sections.

Get Partition Set

GET
- Permission.....: READ
- Behavior.......: SYNC
- Request headers:
  - Accept: application/json

- Resources:
  /partitionsets/{partitionSetId}
        
- Path variables:
  - partitionSetId: The ID of the desired partition set.

Sample request

http://saturn:33000/rest/clusters/app-01/hosts/192.168.0.104:33000/partitionsets/24399ed2-56a3-11e5-885d-feff819cdc9f

Sample response

{
  "id": "24399ed2-56a3-11e5-885d-feff819cdc9f",
  "partitionSize": 2,
  "partitions": [
    {
      "index": 0,
      "hosts": [
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.103",
          "port": 33000
        },
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.104",
          "port": 33000
        }       
      ]
    },
    {
      "index": 1,
      "hosts": [
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.105",
          "port": 33000
        },
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.106",
          "port": 33000
        }       
      ]
    }
  ]
}

Create Partition Set

Allows creating partition sets. The API allows specifying inclusion/exclusion patterns (based on Corus tags) used to determine which nodes should be included in a partition set. Note the following:

  • All nodes are included if NO inclusion and/or exclusion patterns are specified.
  • If inclusion patterns are specified, then only the nodes that match will be included.
  • Exclusion is applied after inclusion, and overrides it (if a node matches both an inclusion and exclusion pattern, the exclusion one will win).
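The three selection rules above can be sketched with glob-style tag matching. This is a hypothetical illustration (node names, tags, and the use of fnmatch for pattern matching are assumptions, not the actual Corus implementation):

```python
from fnmatch import fnmatch

def select_nodes(nodes: dict, includes: list, excludes: list) -> list:
    """Applies the documented rules:
    - no patterns at all: every node is selected;
    - includes: only nodes with a matching tag are kept;
    - excludes: applied after inclusion, and overriding it."""
    def matches(tags, patterns):
        return any(fnmatch(t, p) for t in tags for p in patterns)

    selected = []
    for node, tags in nodes.items():
        if includes and not matches(tags, includes):
            continue  # inclusion patterns given, but none matched
        if excludes and matches(tags, excludes):
            continue  # exclusion wins over inclusion
        selected.append(node)
    return selected

cluster = {"n1": ["master"], "n2": ["slave"], "n3": ["slave"]}
# includes=["*"], excludes=["master"] keeps only n2 and n3, mirroring
# the last sample request shown further below.
```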

PUT
- Permission.....: WRITE
- Behavior.......: SYNC
- Request headers:
  - Accept: application/json

- Resources:
  /partitionsets
        
- Parameters:
  - partitionSize......: The size of the partitions in the partition set.
  - timeout............: The timeout to assign to the partition set, in seconds 
                         (defaults to 300 seconds - or 5 minutes).
  - includes (optional): A comma-delimited list of tags/tag patterns, used to select 
                         the Corus nodes that should be included for partitioning.
  - excludes (optional): A comma-delimited list of tags/tag patterns, used to select 
                         the Corus nodes that should be excluded from the partition set.

Sample Requests

http://saturn:33000/rest/clusters/app-01/hosts/192.168.0.104:33000/partitionsets?partitionSize=2
http://saturn:33000/rest/clusters/app-01/hosts/192.168.0.104:33000/partitionsets?partitionSize=2&includes=master        
http://saturn:33000/rest/clusters/app-01/hosts/192.168.0.104:33000/partitionsets?partitionSize=2&includes=slave        
http://saturn:33000/rest/clusters/app-01/hosts/192.168.0.104:33000/partitionsets?partitionSize=2&includes=*&excludes=master

Sample response

{
  "id": "24399ed2-56a3-11e5-885d-feff819cdc9f",
  "partitionSize": 2,
  "partitions": [
    {
      "index": 0,
      "hosts": [
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.103",
          "port": 33000
        },
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.104",
          "port": 33000
        }
      ]
    },
    {
      "index": 1,
      "hosts": [
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.105",
          "port": 33000
        },
        {
          "cluster": "app-01",
          "corusVersion": "4.5",
          "hostName": "saturn",
          "hostAddress": "192.168.0.106",
          "port": 33000
        }       
      ]
    }
  ]
}

Delete Partition Set

A partition set is deleted by specifying its ID.

DELETE
- Permission.....: WRITE
- Behavior.......: SYNC
- Request headers:
  - Accept: application/json

- Resources:
  /partitionsets/{partitionSetId}
        
- Path variables:
  - partitionSetId: The ID of the desired partition set.

Sample response

{
  "status": 200,
  "feedback": [
  ]
}

Asynchronicity and Polling

Some resources offering an asynchronous behavior return a completion token (a string uniquely identifying an asynchronous operation). Such behavior is marked as "Async with Polling" in the documentation, when implemented for the resource in question.

The completion token that's returned in such a context may be used by clients to poll the server for obtaining progress information for the corresponding operation. The sequence is as follows:

  1. A resource is invoked with the async query parameter set to true. The resource returns either HTTP status code 250 (indicating that the operation is in progress), or an error status code if a problem occurred and the operation could not start. The JSON response that is returned then embeds a completionToken field. The value of this field is the actual completion token to use for polling the server (see next step) and inquire about the status of the ongoing operation.
  2. The client polls the /progress resource, passing to it the completion token as a path parameter - according to the following format: /progress/{completionToken} - see further below for the formal definition of the resource.
  3. The server returns HTTP status code 250 if the operation is still in progress (251 is also possible, this is explained further in the Operation Batching and Error Tolerance section below). The JSON payload then contains data about the ongoing operation, which may be used for debugging or display.
  4. The client polls until either HTTP status code 200 (or 252) or an error code (4xx/5xx) is returned - status code 252 is also explained in the Operation Batching and Error Tolerance section.
A client having obtained a completion token from a Corus server must poll that same server with the said completion token - such tokens are not replicated across a Corus cluster.
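The polling sequence above can be sketched as a small client-side loop. In this hypothetical illustration, fetch stands in for an HTTP GET on /progress/{completionToken} and is stubbed with a canned sequence of status codes:

```python
IN_PROGRESS = {250, 251}  # keep polling (251 is explained further below)

def poll_until_done(fetch) -> int:
    """Polls until a final status code is returned, per the sequence
    documented above: 250/251 mean 'keep polling'; 200/252 mean done;
    anything in the 4xx/5xx range is a failure."""
    while True:
        status = fetch()
        if status in IN_PROGRESS:
            continue  # a real client would sleep between polls
        return status

# Stubbed server responses for one asynchronous operation:
responses = iter([250, 250, 251, 252])
final = poll_until_done(lambda: next(responses))
# final == 252: partial success, polling stops.
```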

The formal definition of the progress resource goes as follows:

GET
- Permission.....: READ
- Behavior.......: SYNC
- Request headers:
  - Accept: application/json
- Response status code:
  - 250: operation has not completed yet and client should keep on polling.
  - 200: operation has completed successfully.
  - 4xx/5xx: operation has completed with a failure.

- Resources:
  /progress/{completionToken}

Operation in progress sample response

{
  "status": 250,
  "completionToken": "24399ed2-56a3-11e5-885d-feff819cdc9f",
  "feedback": [
    "Executing asynchronous operation"
  ]
}

Operation Batching and Error Tolerance

Certain REST resources allow executing operations over the cluster, but only over a given number of hosts at a time. This allows better control of errors in case problems happen at certain nodes: one might want to still go ahead with a deployment, for example, and intervene on the specific hosts later on.

Batching is controlled by the batchSize and minHosts query parameters. Error tolerance is configured with the maxErrors query parameter. More precisely:

  • The batchSize parameter specifies over how many hosts in a cluster at a time an operation should be performed. For example, in the context of a deployment, a batch size of 2 would deploy on two hosts at a time.
  • The minHosts parameter specifies how many hosts should be present in the cluster for the given batch size to apply. If the given minimum is not observed, execution of the operation will then fall back to one host at a time. This minimum is meant to act as a safeguard, in order to prevent performing a given operation on too many hosts at a time.
  • The maxErrors parameter determines the number of errors that are tolerated before the operation is deemed unsuccessful and an error status is returned. More precisely, the parameter applies to batches of hosts (as determined by the batchSize parameter): if errors occur in more batches than the number allowed by maxErrors, an error status is then returned.

Batching (and error tolerance) are reflected in the responses returned by a given REST resource:

  • When an operation is performed in the context of batching, the JSON response that is returned holds the JSON representation corresponding to the last batch(es) that was/were processed.
  • If the number of batches in error does not exceed the value specified by maxErrors, then:
    • a 251 HTTP status code is returned if hosts do remain on which the operation should be performed - this status code should be interpreted as "partial success, operation still in progress".
    • a 252 HTTP status code is returned if no hosts remain on which the operation should be performed - and if the operation was successful on at least one batch of hosts overall.
    • a 500 HTTP status code is returned if the operation resulted in a failure, against all batches of hosts on which it was executed.
  • If the number of batches in error exceeds the value specified by maxErrors, then:
    • a 252 HTTP status code is returned if no hosts remain on which the operation should be performed - and if the operation was successful on at least one batch of hosts overall.
    • a 500 HTTP status code is returned if the operation resulted in a failure, against all batches of hosts on which it was executed.
  • The JSON response will hold a processedHosts field, enumerating the Corus hosts on which the operation was last performed.
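The rules above can be reduced to a small simulation (a hypothetical sketch, for illustration only): given per-batch outcomes and a maxErrors threshold, it derives the final status code:

```python
def run_batches(batches: list, max_errors: int) -> int:
    """Processes per-batch outcomes (True = success) in order, stopping
    once more than max_errors batches have failed, and returns the final
    HTTP status code per the rules documented above:
    - 200: every batch succeeded;
    - 252: partial success (some batches succeeded, some failed);
    - 500: every attempted batch failed."""
    errors = successes = 0
    for succeeded in batches:
        if succeeded:
            successes += 1
        else:
            errors += 1
            if errors > max_errors:
                break  # error threshold reached: stop processing
    if successes == 0:
        return 500
    return 200 if errors == 0 else 252

# run_batches([True, True, True], max_errors=1)  -> 200
# run_batches([True, False, True], max_errors=1) -> 252
# run_batches([False, False], max_errors=0)      -> 500
```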

Illustration of Batching with Error Tolerance

To make the above clearer, sample responses are provided below, illustrating a typical sequence. The scenario involves a client invoking a REST resource with the async flag set to true, with the batchSize parameter set to a given number of hosts to be processed at a time, and the maxErrors parameter also set, in order to allow the operation to continue as long as the number of errors is not greater than the given threshold.

The first response obtained would be:

{
  "status": 250,
  "completionToken": "24399ed2-56a3-11e5-885d-feff819cdc9f",
  "feedback": [
    "Executing asynchronous operation"
  ]
}

The following corresponds to a subsequent response indicating that an operation has been performed successfully on a batch of 2 hosts, but that more batches of hosts are left to process. Note the 250 status, indicating that no error occurred but that the operation has not yet been performed on all hosts, and the processedHosts field, stating which hosts have been contacted for the given batch:

{
  "status": 250,
  "completionToken": "24399ed2-56a3-11e5-885d-feff819cdc9f",
  "feedback": [
    "Executing asynchronous operation"
  ],
  "processedHosts": [
    {
      "cluster": "app-01",
      "corusVersion": "4.5",
      "hostName": "saturn",
      "hostAddress": "192.168.0.103",
      "port": 33000
    },
    {
      "cluster": "app-01",
      "corusVersion": "4.5",
      "hostName": "saturn",
      "hostAddress": "192.168.0.104",
      "port": 33000
    }
  ]
}

Yet another subsequent response, this time indicating that an error occurred for the corresponding batch of hosts, but that the maximum error threshold has not yet been reached (and that more hosts are left to process). Note the 251 status, indicating that an error occurred, but that the operation was not aborted (and should be considered as still in progress):

{
  "status": 251,
  "completionToken": "24399ed2-56a3-11e5-885d-feff819cdc9f",
  "feedback": [
    "Process could not be killed in a timely manner"
  ],
  "processedHosts": [
    {
      "cluster": "app-01",
      "corusVersion": "4.5",
      "hostName": "saturn",
      "hostAddress": "192.168.0.103",
      "port": 33000
    },
    {
      "cluster": "app-01",
      "corusVersion": "4.5",
      "hostName": "saturn",
      "hostAddress": "192.168.0.104",
      "port": 33000
    }  
  ]
}

Yet another subsequent response, this time indicating that an error occurred for the corresponding batch of hosts, and that either the maximum error threshold has been reached, or there are no more hosts to contact. Note the 252 status this time, indicating a partial success (since the operation could previously complete successfully on some hosts). This should be interpreted as a "final" status: the client should stop polling at this point:

{
  "status": 252,
  "completionToken": "24399ed2-56a3-11e5-885d-feff819cdc9f",
  "feedback": [
    "Process could not be killed in a timely manner"
  ],
  "processedHosts": [
    {
      "cluster": "app-01",
      "corusVersion": "4.5",
      "hostName": "saturn",
      "hostAddress": "192.168.0.103",
      "port": 33000
    },
    {
      "cluster": "app-01",
      "corusVersion": "4.5",
      "hostName": "saturn",
      "hostAddress": "192.168.0.104",
      "port": 33000
    }  
  ]
}

If execution in the context of all batches of hosts results in an error, then the final status code will be an HTTP 500 (indeed, in such a case, a partial success would not reflect reality).

The table below summarizes the status codes relevant to batching:

Status Code | Final | Description
250         | No    | No error thus far; operation still in progress.
251         | No    | An error occurred in the context of the batch of hosts corresponding to the response, but the operation is still in progress.
252         | Yes   | Partial success: the operation completed on some hosts, but the maximum error threshold has been reached, or there are no batches of hosts left to contact.
500         | Yes   | Error: execution on all hosts resulted in a failure.
200         | Yes   | Operation completed: execution on all hosts was successful.

Batching without Error Tolerance

When the maxErrors parameter is not specified, an operation is deemed a failure as soon as a single error occurs. In such a case, the final status will always be an error indicated by a 4xx or 5xx HTTP status code. If no error occurs, the final status will be HTTP 200. Intermediary "in progress" responses will have status code 250, as previously documented.

Automatic Diagnostic

Some REST resources allow internally performing a diagnostic check automatically, after execution of their main operation. In such cases, the resources accept a runDiagnostic parameter, whose value must be set to true for the automatic diagnostic to be executed. In addition, such resources also support a diagnosticInterval query parameter, which indicates a time interval (in seconds) to wait for in between diagnostic acquisition checks - until a final diagnostic status is obtained (if not specified, the default value used is 10 seconds).
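Building the corresponding query string can be sketched as follows (hypothetical helper; the parameter names and the 10-second default are those documented above):

```python
from urllib.parse import urlencode

def with_diagnostic(base_url: str, interval_seconds: int = 10) -> str:
    """Appends the runDiagnostic and diagnosticInterval query parameters
    to a resource URL; 10 seconds is the documented default interval."""
    query = urlencode({
        "runDiagnostic": "true",
        "diagnosticInterval": interval_seconds,
    })
    return f"{base_url}?{query}"

# with_diagnostic("http://saturn:33000/rest/clusters/app-01/exec") ->
#   "...?runDiagnostic=true&diagnosticInterval=10"
```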

The outcome of the diagnostic will correspond to either HTTP status code 200 if no anomaly was detected, or HTTP 500 otherwise. The JSON content corresponding to the diagnostic details (as documented in the Diagnostic API) will be embedded in the last JSON response sent back to the client. The sample below illustrates such a response:

{
  "status":200,
  "completionToken":"24399ed2-56a3-11e5-885d-feff819cdc9f",
  "feedback":[
    "Operation completed"
  ],
  "processedHosts":[
    {
      "cluster":"app-01",
      "corusVersion":"4.5",
      "hostName":"saturn",
      "hostAddress":"192.168.0.103",
      "port":33000
    }
  ],
  "diagnostic":{
    "status":"SUCCESS",
    "results":[
      {
        "cluster":"app-01",
        "host":"192.168.0.103:33000",
        "dataType":"diagnostic",
        "data":{
          "classVersion":1,
          "status":"SUCCESS",
          "processDiagnostics":[
            {
              "classVersion":1,
              "status":"SUCCESS",
              "suggestedAction":"NOOP",
              "name":"httpServer",
              "distribution":{
                "name":"demo",
                "version":"1.0"
              },
              "results":[
                {
                  "classVersion":1,
                  "status":"CHECK_SUCCESSFUL",
                  "message":"Found 1 process(es) for [distribution=demo,version=1.0,process=httpServer] (expected 1) - all processes are found running",
                  "diagnosticPort":{
                    "name":"http.server",
                    "value":100
                  },
                  "process":{
                    "classVersion":2,
                    "id":"150822379402",
                    "name":"httpServer",
                    "pid":"4937",
                    "distribution":"demo",
                    "version":"1.0",
                    "profile":"test",
                    "activePorts":[
                      {
                        "name":"http.server",
                        "port":100
                      }
                    ]
                  }
                }
              ]
            }
          ],
          "progressDiagnostics":{
            "classVersion":1,
            "errors":[

            ]
          }
        }
      }
    ]
  }
}

Running Corus Scripts on the Server-Side

Even though the API is meant to give access to all Corus functionality through HTTP (and thus acts as a substitute for the CLI), it may still be convenient to have the ability to work with scripts, even for HTTP clients.

The API thus allows pushing scripts to Corus, for them to be executed on the server-side. Given the implications, this requires ADMIN permissions.

Scripts are given the following as a base directory: $CORUS_HOME/files/uploads. They have access to the Java system properties of the Corus JVM (since they're executed within Corus) and to its environment variables.

The resource to hit is given below. Note that this is a very specific endpoint that differs from the normal shape of this API: since a Corus script is meant to be executed at the current host, there was no point in publishing that functionality under /clusters.

POST
- Permission.....: ADMIN
- Behavior.......: SYNC
- Request headers:
  - Accept: text/plain

- Resource:
  /runscript
 
- Parameters:
  - clusteringEnabled (defaults to true): if set to false, clustering for the commands in the provided script will be disabled

Sample script

echo "test"

Sample response

{
  "status": 200,
  "feedback": [ "test" ]
}
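Posting the sample script above can be sketched as follows. This is a hypothetical illustration: the host, credentials, and the assumption that /runscript is mounted under the /rest base URI are all placeholders, not confirmed specifics:

```python
import urllib.request

def build_runscript_request(host: str, script: str,
                            app_id: str, app_key: str) -> urllib.request.Request:
    """Builds a POST to the /runscript endpoint, sending the script as
    the request body and authenticating with the application-key
    headers described earlier. The /rest prefix is an assumption."""
    req = urllib.request.Request(
        f"https://{host}/rest/runscript?clusteringEnabled=false",
        data=script.encode("utf-8"),
        method="POST",
    )
    req.add_header("X-corus-app-id", app_id)
    req.add_header("X-corus-app-key", app_key)
    req.add_header("Accept", "text/plain")
    return req

req = build_runscript_request("localhost:33443", 'echo "test"',
                              "chef", "akw71sey927a")
# urllib.request.urlopen(req) would then execute the script server-side.
```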