Advanced

This section describes some Corus characteristics that may be considered more "advanced". More content will be added over time.

High-Availability

Corus provides high availability (HA) at the level of the Corus daemon itself, and also at the application process level:

  • When run as a service, the Corus daemon is in fact managed by the Service Wrapper (the latest non-commercial, open-source version, 3.2.3, is used).
  • When a Corus daemon starts an application process, it monitors it, automatically restarting it if a crash occurs.
The following subsections explain both mechanisms in more detail.

Corus Daemon HA

If you install Corus as a service, the Service Wrapper performs the runtime checks needed to make sure that the daemon is always up.

When installed this way, Corus starts when the OS boots. From then on, the Service Wrapper takes over.

Application Process HA

When Corus starts a JVM, that JVM process embeds an agent that implements the Corus Interoperability Protocol. That protocol specifies the interactions that occur between a Corus daemon and the process it starts.

Among other things, the protocol requires a process to poll its Corus daemon at a predefined interval (the Corus Guide describes in more detail how the various runtime parameters are passed by the Corus daemon to the process upon startup).
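The polling requirement can be illustrated with a minimal sketch of the process side. Everything here is hypothetical: DaemonLink stands in for whatever transport the Corus interop agent actually uses, and in a real deployment the embedded agent does the polling for you. The sketch only shows the scheduling pattern, built on the JDK's ScheduledExecutorService.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the channel between a process and its Corus daemon. */
interface DaemonLink {
  void poll(String processId);
}

/** Polls the daemon at a fixed interval, so that the daemon can tell the process is alive. */
final class PollingAgent implements AutoCloseable {
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
    Thread t = new Thread(r, "corus-poller");
    t.setDaemon(true); // polling alone should not keep the JVM alive
    return t;
  });

  PollingAgent(DaemonLink link, String processId, long pollIntervalMillis) {
    scheduler.scheduleAtFixedRate(() -> {
      try {
        link.poll(processId);
      } catch (RuntimeException e) {
        // An exception escaping the task would cancel all later runs, so report it and keep polling.
        System.err.println("poll failed: " + e.getMessage());
      }
    }, 0, pollIntervalMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow(); // stops polling; the daemon will then notice the missing polls
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger polls = new AtomicInteger();
    try (PollingAgent agent = new PollingAgent(id -> polls.incrementAndGet(), "proc-1", 50)) {
      Thread.sleep(275);
    }
    System.out.println("polls sent: " + polls.get());
  }
}
```

The try/catch inside the task matters: with scheduleAtFixedRate, a single uncaught exception silently stops all subsequent polls, which from the daemon's point of view would look exactly like a hung process.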

If Corus detects that a process has not polled within the specified "poll interval", it initiates a restart of the process. It does so by first sending a "kill" event to the process. This is not an OS kill, but a process termination event as specified by the Corus Interop Protocol.

Note that if a process has diagnostics configured, Corus also uses them internally to determine the health of that process. Since interop is disabled for Docker containers, the configurations of Docker-based processes should include a diagnostic check, to ensure that Corus can automatically restart them at runtime.

Upon receiving the termination event, the process is responsible for handling it coherently (usually by releasing all system resources and having its internal application components shut themselves down cleanly). Once the event has been processed internally, the process must send a termination confirmation to Corus and then exit its main loop (at which point the process de facto terminates, from the OS's point of view).
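A minimal sketch of that sequence on the process side follows. ShutdownCoordinator and ConfirmationSender are hypothetical names, not part of the Corus interop API. The sketch assumes application components are registered as AutoCloseable and closed in reverse registration order, and it uses a CountDownLatch to stand in for the main loop's exit condition.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;

/** Hypothetical callback used to confirm termination to the Corus daemon. */
interface ConfirmationSender {
  void confirmTermination(String processId);
}

/** Sketch of how a process might react to a Corus termination event. */
final class ShutdownCoordinator {
  private final Deque<AutoCloseable> components = new ArrayDeque<>();
  private final CountDownLatch mainLoopExit = new CountDownLatch(1);
  private final String processId;
  private final ConfirmationSender sender;

  ShutdownCoordinator(String processId, ConfirmationSender sender) {
    this.processId = processId;
    this.sender = sender;
  }

  /** Components are closed in reverse registration order, like nested try-with-resources. */
  synchronized void register(AutoCloseable component) {
    components.push(component);
  }

  /** Invoked when the termination event arrives from the daemon. */
  synchronized void onTerminationEvent() {
    while (!components.isEmpty()) {
      AutoCloseable component = components.pop();
      try {
        component.close();
      } catch (Exception e) {
        // One failing component must not prevent the others from being released.
        System.err.println("error while closing " + component + ": " + e);
      }
    }
    sender.confirmTermination(processId); // confirm only once cleanup is done
    mainLoopExit.countDown();             // lets the main loop return, so the JVM can exit
  }

  boolean isTerminated() {
    return mainLoopExit.getCount() == 0;
  }

  /** The application's main thread blocks here until termination has been handled. */
  void awaitTermination() throws InterruptedException {
    mainLoopExit.await();
  }

  public static void main(String[] args) throws InterruptedException {
    ShutdownCoordinator coordinator =
        new ShutdownCoordinator("proc-1", id -> System.out.println("confirmed termination of " + id));
    coordinator.register(() -> System.out.println("closing database pool"));
    coordinator.register(() -> System.out.println("stopping HTTP server"));
    // Simulate the termination event arriving from the daemon on another thread.
    new Thread(coordinator::onTerminationEvent).start();
    coordinator.awaitTermination(); // the "main loop" returns here
  }
}
```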

After having received the termination confirmation from the process, Corus starts a fresh instance of it.

But there is a catch: a process might never send a termination confirmation. What if it has really crashed (i.e., disappeared from the OS's process list)? In that case the process is not up at all, so it cannot pick up its termination event.

Corus has a workaround for that specific issue: it waits a predefined amount of time for the termination confirmation. Past that delay, it performs a "hard" kill by invoking the native kill command (on Linux/Unix, Corus in fact performs a kill -9).
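The daemon-side logic described in this subsection (poll timeout, termination event, bounded wait for confirmation, hard kill, restart) can be sketched as follows. ManagedProcess, ProcessWatchdog and the Outcome values are hypothetical, not Corus classes; the current time is passed in as an Instant so the logic can be exercised without real processes. A real hardKill() could, for instance, use the JDK's ProcessHandle.destroyForcibly(), which OpenJDK implements with SIGKILL (the kill -9 signal) on Unix-like systems.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical view of a managed process, from the daemon's point of view. */
interface ManagedProcess {
  Instant lastPoll();                          // when the process last polled the daemon
  void sendTerminationEvent();                 // interop-level "kill" event, not an OS kill
  boolean awaitConfirmation(Duration timeout); // true if the process confirmed termination in time
  void hardKill();                             // native kill (kill -9 on Linux/Unix)
}

final class ProcessWatchdog {
  enum Outcome { HEALTHY, RESTARTED_CLEANLY, RESTARTED_AFTER_HARD_KILL }

  private final Duration pollInterval;
  private final Duration confirmationTimeout;
  private final Runnable restart; // starts a fresh instance of the process

  ProcessWatchdog(Duration pollInterval, Duration confirmationTimeout, Runnable restart) {
    this.pollInterval = pollInterval;
    this.confirmationTimeout = confirmationTimeout;
    this.restart = restart;
  }

  Outcome check(ManagedProcess process, Instant now) {
    // Healthy as long as the last poll is no older than the poll interval.
    if (Duration.between(process.lastPoll(), now).compareTo(pollInterval) <= 0) {
      return Outcome.HEALTHY;
    }
    // First ask the process to shut itself down cleanly.
    process.sendTerminationEvent();
    // A crashed process can never confirm, so only wait a bounded amount of time.
    boolean confirmed = process.awaitConfirmation(confirmationTimeout);
    if (!confirmed) {
      process.hardKill();
    }
    restart.run();
    return confirmed ? Outcome.RESTARTED_CLEANLY : Outcome.RESTARTED_AFTER_HARD_KILL;
  }

  public static void main(String[] args) {
    Instant start = Instant.now();
    ManagedProcess crashed = new ManagedProcess() {
      public Instant lastPoll() { return start; }
      public void sendTerminationEvent() { System.out.println("termination event sent"); }
      public boolean awaitConfirmation(Duration timeout) { return false; } // nobody left to answer
      public void hardKill() { System.out.println("hard kill"); }
    };
    ProcessWatchdog watchdog = new ProcessWatchdog(Duration.ofSeconds(15), Duration.ofSeconds(30),
        () -> System.out.println("starting fresh instance"));
    System.out.println(watchdog.check(crashed, start.plusSeconds(60)));
  }
}
```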

Command Replication

Since there can be many Corus instances in a domain, the command replication algorithm must be efficient. To limit the resources each Corus instance spends on replication, a so-called cascading distributed replication algorithm has been implemented. The diagram below provides an illustration:

  1. When a command with the -cluster switch is typed in the command-line interface (CLI), it is sent to the Corus instance to which the CLI is currently connected.
  2. Upon receiving the command, the Corus instance picks another Corus instance in the domain that has not yet received it. It then sends the command to that instance (that step is performed at each instance, until all instances have been visited).
  3. When a given instance has replicated the command it has received, it executes the command locally.

The deploy command is replicated in the same way.

In other words, each Corus instance replicates a command to only one other Corus instance (if any is left to replicate to).
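The cascade can be simulated in a few lines. This is a sketch under stated assumptions, not the Corus implementation: the command is assumed to carry the set of hosts that have already received it, the next host is simply the first unvisited one in domain order, and an in-memory queue stands in for the network. It requires Java 16 or later (records).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** A command in transit to one host, carrying the hosts that already received it (assumption). */
record Delivery(String target, String command, Set<String> visited) {}

final class CascadeSimulator {
  private final List<String> hosts;                    // all Corus hosts in the domain
  final List<String> executionLog = new ArrayList<>(); // "host:command", in execution order
  final Map<String, Integer> forwardsPerHost = new HashMap<>();

  CascadeSimulator(List<String> hosts) {
    this.hosts = List.copyOf(hosts);
  }

  /** Step 1: the CLI sends a -cluster command to the host it is connected to. */
  void submit(String entryHost, String command) {
    Deque<Delivery> network = new ArrayDeque<>(); // messages in flight
    network.add(new Delivery(entryHost, command, Set.of()));
    while (!network.isEmpty()) {
      Delivery d = network.poll();
      Set<String> visited = new LinkedHashSet<>(d.visited());
      visited.add(d.target());
      // Step 2: forward to exactly one host that has not received the command yet, if any.
      Optional<String> next = hosts.stream().filter(h -> !visited.contains(h)).findFirst();
      next.ifPresent(h -> {
        network.add(new Delivery(h, d.command(), Set.copyOf(visited)));
        forwardsPerHost.merge(d.target(), 1, Integer::sum);
      });
      // Step 3: once the command has been passed on, execute it locally.
      executionLog.add(d.target() + ":" + d.command());
    }
  }

  public static void main(String[] args) {
    CascadeSimulator sim = new CascadeSimulator(List.of("host-a", "host-b", "host-c", "host-d"));
    sim.submit("host-c", "restart");
    System.out.println(sim.executionLog);    // each host executes the command once
    System.out.println(sim.forwardsPerHost); // no host forwards more than once
  }
}
```

In this sketch, a domain of n hosts sees n - 1 forwards in total and each host sends at most one message, which is the property stated above. The trade-off of a chain is that the hops happen one after another, so end-to-end propagation time grows with the size of the domain.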

Application Status

You can integrate application-provided status information with Corus, and then hook up Corus with your monitoring infrastructure, through HTTP.

The first step is to publish application status data through the Corus interop API, in the application's code. To do so, you need the following dependency:

<dependency>
  <groupId>org.sapia</groupId>
  <artifactId>sapia_corus_iop_api</artifactId>
  <version>3.0</version>
</dependency>

The code snippet below, taken from the tutorials' source code, publishes Jetty status data to Corus:

import org.sapia.corus.interop.api.InteropLink;
import org.sapia.corus.interop.api.StatusRequestListener;
import org.sapia.corus.interop.api.message.ContextMessagePart; // assumed: same package as StatusMessageCommand
import org.sapia.corus.interop.api.message.InteropMessageBuilderFactory;
import org.sapia.corus.interop.api.message.StatusMessageCommand.Builder;

...

  // "stats" is presumably a Jetty StatisticsHandler (org.eclipse.jetty.server.handler.StatisticsHandler)
  // wrapping the server's handlers: the getters used below match that class.

  public void init() {

    ...

    // Registers this object (a StatusRequestListener) so that onStatus() is
    // invoked when Corus requests status data from the process.
    InteropLink.getImpl().addStatusRequestListener(this);
  }

  @Override
  public void onStatus(Builder statusBuilder, InteropMessageBuilderFactory factory) {
    // A context groups related parameters under a name.
    ContextMessagePart.Builder contextBuilder = factory.newContextBuilder().name("org.sapia.corus.sample.jetty");
    contextBuilder
      .param("dispatched", String.valueOf(stats.getDispatched()))
      .param("dispatchedActive", String.valueOf(stats.getDispatchedActive()))
      .param("dispatchedActiveMax", String.valueOf(stats.getDispatchedActiveMax()))
      .param("dispatchedTimeMax", String.valueOf(stats.getDispatchedTimeMax()))
      .param("dispatchedTimeTotal", String.valueOf(stats.getDispatchedTimeTotal()))
      .param("dispatchedTimeMean", String.valueOf(stats.getDispatchedTimeMean()))
      .param("requests", String.valueOf(stats.getRequests()))
      .param("requestsActive", String.valueOf(stats.getRequestsActive()))
      .param("requestsActiveMax", String.valueOf(stats.getRequestsActiveMax()))
      .param("requestsTimeMax", String.valueOf(stats.getRequestTimeMax()))
      .param("requestsTimeMean", String.valueOf(stats.getRequestTimeMean()))
      .param("requestsTimeTotal", String.valueOf(stats.getRequestTimeTotal()))
      .param("suspends", String.valueOf(stats.getSuspends()))
      .param("suspendsActive", String.valueOf(stats.getSuspendsActive()))
      .param("suspendsActiveMax", String.valueOf(stats.getSuspendsActiveMax()));
    // Adds the context to the status data sent back to Corus.
    statusBuilder.context(contextBuilder.build());
  }

  ...

As the code shows, in order to publish status data, one needs to implement a StatusRequestListener, which specifies a single onStatus method. That method is passed a status builder (StatusMessageCommand.Builder) and an InteropMessageBuilderFactory. The application should create one or more contexts (through the factory's context builder), add parameters to each, and then add each context to the status builder. The diagram below illustrates that containment relationship:
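To make the containment relationship concrete (a status holds one or more contexts, and each context holds named parameters), here is a plain-Java model of it. StatusModel, Status, Context and the flatten() helper are hypothetical illustrations, not the Corus interop API classes; flatten() merely shows one way such a tree could be turned into flat key/value lines for a monitoring tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the status containment; NOT the Corus interop API classes. */
final class StatusModel {

  /** A named group of parameters, such as "org.sapia.corus.sample.jetty". */
  static final class Context {
    final String name;
    final Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order

    Context(String name) {
      this.name = name;
    }

    Context param(String key, String value) {
      params.put(key, value);
      return this;
    }
  }

  /** The status data of one process: one or more contexts. */
  static final class Status {
    final List<Context> contexts = new ArrayList<>();

    Status context(Context context) {
      contexts.add(context);
      return this;
    }

    /** Hypothetical helper: flattens the tree into "context.param=value" lines. */
    List<String> flatten() {
      List<String> lines = new ArrayList<>();
      for (Context c : contexts) {
        c.params.forEach((key, value) -> lines.add(c.name + "." + key + "=" + value));
      }
      return lines;
    }
  }

  public static void main(String[] args) {
    Status status = new Status().context(
        new Context("org.sapia.corus.sample.jetty").param("requests", "120").param("requestsActive", "3"));
    status.flatten().forEach(System.out::println);
  }
}
```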

Repository

The repository functionality was introduced in version 4.0. The Corus Guide provides exhaustive documentation about what the functionality consists of and how to configure Corus to activate it. We'll nevertheless cover the basics here.

In short, with the repo functionality, some Corus instances act as repository servers (typically one instance per cluster), while the others act as repository clients.

The role of a repo server node (on top of offering the usual Corus functionality) is to push its state (distributions, port range configs, process properties, tags...) to repo client nodes on demand (that is, as it is requested by client nodes, in the form of a "pull").

The pull that is initiated by repo client nodes occurs automatically, when these start up.

This means that new repo client nodes appearing in a cluster need not have an explicit deployment performed on them. Rather, they obtain their state automatically, through the pull at startup.

In addition, as part of that bootstrap phase, client nodes will automatically start processes that require it (see the section on execution configurations for more details about the process "start-on-boot" feature).
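The client-side bootstrap described above can be sketched as follows. All names (ExecConfig, RepoState, RepoServer, RepoClientNode) are hypothetical and the pull is modeled as a plain method call; the sketch only shows the ordering: pull the server's state, apply it locally, and only then start the processes flagged "start-on-boot". It requires Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical execution configuration: a process, and whether it is flagged start-on-boot. */
record ExecConfig(String processName, boolean startOnBoot) {}

/** Hypothetical snapshot of the state a repo server hands out to clients. */
record RepoState(List<String> distributions, Map<String, String> portRanges,
                 Map<String, String> processProperties, Set<String> tags,
                 List<ExecConfig> execConfigs) {}

/** Hypothetical server-side contract: return the current state when a client pulls. */
interface RepoServer {
  RepoState pull();
}

final class RepoClientNode {
  final List<String> distributions = new ArrayList<>();
  final Map<String, String> portRanges = new HashMap<>();
  final Map<String, String> processProperties = new HashMap<>();
  final Set<String> tags = new HashSet<>();
  final List<String> startedProcesses = new ArrayList<>();

  /** Runs once when the node starts: pull, apply locally, then start "start-on-boot" processes. */
  void bootstrap(RepoServer server) {
    RepoState state = server.pull();
    // No explicit deployment: the client copies what the server already has.
    distributions.addAll(state.distributions());
    portRanges.putAll(state.portRanges());
    processProperties.putAll(state.processProperties());
    tags.addAll(state.tags());
    // Processes are started only once their distributions and configuration are in place.
    for (ExecConfig config : state.execConfigs()) {
      if (config.startOnBoot()) {
        startedProcesses.add(config.processName());
      }
    }
  }

  public static void main(String[] args) {
    RepoServer server = () -> new RepoState(
        List.of("webapp-1.0"), Map.of("http", "8100-8110"), Map.of("env", "prod"),
        Set.of("web"), List.of(new ExecConfig("server", true), new ExecConfig("batch", false)));
    RepoClientNode node = new RepoClientNode();
    node.bootstrap(server);
    System.out.println("deployed " + node.distributions + ", started " + node.startedProcesses);
  }
}
```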

The repo functionality was introduced in order to ease the setup of new Corus instances in a domain: the fact that any new instance automatically becomes a copy of the existing ones saves a configuration step for sysadmins.

Also, in cloud environments, where VMs are spawned dynamically, the feature makes Corus deployment a breeze: new Corus instances automatically fetch their state from the existing Corus repo server node, without requiring an explicit application deployment step.

Conclusion

Corus has advanced features that you'll come to appreciate as you work with them. The tutorials and the Corus Guide provide additional information.