
Projects:Import Entry High Availability/Specs



Functional Requirements

Currently, when Openbravo runs in a clustered environment, import entries are always processed on the same single node. The goal of this project is to provide a mechanism ensuring that, if the node in charge of processing the import entries goes down, the system can select another node in the cluster to continue processing them.

Import entries will still be processed on a single node. The mechanism introduced by this project ensures that the system does not end up in a state in which it can no longer process import entries.

Project Scope

The main objective is to provide high availability for import entry processing. Nevertheless, the mechanism will be generic and flexible enough to provide high availability not only for import entry processing but also for other services that could require it, such as background process execution.

Technical Specifications

Single Leader Election Algorithm

The solution is based on a leader election algorithm that uses periodic pings fired by each node present in the cluster.

The key component is a single database table that registers which node is responsible for executing a particular service: handling import entries, executing background processes, etc. Thus, the database is the central point used to manage the execution of these services in the cluster.

Each node will periodically query this table to discover whether it is the node that must manage a particular service or, otherwise, whether it should replace the node currently managing the service because that node is no longer alive.

A node is considered not alive when it has not performed a ping within a given time interval.
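The liveness rule can be illustrated with a minimal Python sketch. The function name `is_alive` and its parameters are hypothetical and not part of the specification; the comparison mirrors the condition `LAST_PING < now() - interval` used later on this page.

```python
from datetime import datetime, timedelta


def is_alive(last_ping: datetime, now: datetime, timeout: timedelta) -> bool:
    """A node counts as alive unless last_ping < now - timeout."""
    return not (last_ping < now - timeout)


# Illustrative values: a 1 minute timeout and the timestamps used on this page.
last_ping = datetime(2017, 10, 25, 17, 55, 11)
timeout = timedelta(minutes=1)

alive_soon_after = is_alive(last_ping, datetime(2017, 10, 25, 17, 55, 41), timeout)
alive_much_later = is_alive(last_ping, datetime(2017, 10, 25, 18, 6, 21), timeout)
print(alive_soon_after, alive_much_later)  # True False
```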

In summary, our algorithm fulfills the following points:

Leader Registering Table

We are going to define a database table with a single row per service. It will have, among others, the following columns:


NODE SERVICE_TYPE LAST_PING
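As a rough illustration, such a table could be declared as follows. This sketch uses Python with an in-memory SQLite database only so that it is runnable; the table name `cluster_service` and the column types are assumptions, and the real table may contain more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cluster_service (          -- hypothetical table name
        node         TEXT NOT NULL,         -- identifier of the leader node
        service_type TEXT NOT NULL UNIQUE,  -- at most one row per service
        last_ping    TEXT NOT NULL          -- time of the leader's last ping
    )
    """
)

# List the column names to confirm the layout.
columns = [row[1] for row in conn.execute("PRAGMA table_info(cluster_service)")]
print(columns)  # ['node', 'service_type', 'last_ping']
```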

Attempt Election

The first ever node trying to discover if it should handle a particular service will not find any record in the table for that service. Therefore it will succeed in inserting the record in the table with its node information:


NODE   SERVICE_TYPE    LAST_PING
Node1  IMPORT_ENTRIES  2017-10-25 17:55:11


Thus, the node with identifier Node1 has registered itself as the node in charge of handling the service IMPORT_ENTRIES. It also sets the last ping to the time at which the record is inserted.

If at that very same moment another node tries to register itself as the leader of the IMPORT_ENTRIES service, its registration will not be completed, because the table ensures a single record per service type by defining the SERVICE_TYPE column as unique.
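The first registration and the effect of the unique constraint can be sketched as follows, again with Python and in-memory SQLite. The helper `try_register` and the table name `cluster_service` are hypothetical; the point is only that the second insert for the same service type is rejected by the database.

```python
import sqlite3


def try_register(conn, node, service_type, now):
    """Insert the leader row; return False if a row for the service already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO cluster_service (node, service_type, last_ping) "
                "VALUES (?, ?, ?)",
                (node, service_type, now),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on service_type rejected the second leader.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cluster_service ("
    "node TEXT NOT NULL, service_type TEXT NOT NULL UNIQUE, last_ping TEXT NOT NULL)"
)

first = try_register(conn, "Node1", "IMPORT_ENTRIES", "2017-10-25 17:55:11")
second = try_register(conn, "Node2", "IMPORT_ENTRIES", "2017-10-25 17:55:11")
print(first, second)  # True False
```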

Rejecting Election

Each node in the cluster (including the leader) will periodically check the status of the leader in order to try to register its own leadership.

Let’s suppose that this check runs at an interval of 1 minute and Node2 is now trying to determine whether it should be the leader. Since LAST_PING < now() - interval 1 minute is false for the IMPORT_ENTRIES service, neither the NODE column nor LAST_PING is changed.

Replacing the Leader

Let’s suppose that, after the 1 minute interval, Node1 is no longer active. The LAST_PING value in the table remains unchanged because it has not been updated.


NODE   SERVICE_TYPE    LAST_PING
Node1  IMPORT_ENTRIES  2017-10-25 17:55:11


But Node2 is still alive, and during its periodic check it detects that LAST_PING < now() - interval 1 minute is now true. It will then try to update the table with its own information in order to become the leader:


NODE   SERVICE_TYPE    LAST_PING
Node2  IMPORT_ENTRIES  2017-10-25 18:06:21
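One possible way to implement the replacement is a single conditional UPDATE, so that the staleness check and the takeover happen atomically. This is an assumption: the specification does not state how the update is issued. In the sketch below, `try_take_over`, the table name `cluster_service`, and the third node `Node3` are hypothetical. Node2 and Node3 both attempt the takeover at the same instant, and only the first attempt succeeds, because afterwards LAST_PING is fresh again.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # timestamps in this format compare correctly as text


def try_take_over(conn, node, service_type, now, timeout):
    """Claim leadership only if the registered leader's ping is older than the timeout."""
    cur = conn.execute(
        "UPDATE cluster_service SET node = ?, last_ping = ? "
        "WHERE service_type = ? AND last_ping < ?",
        (node, now.strftime(FMT), service_type, (now - timeout).strftime(FMT)),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cluster_service ("
    "node TEXT NOT NULL, service_type TEXT NOT NULL UNIQUE, last_ping TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO cluster_service VALUES ('Node1', 'IMPORT_ENTRIES', '2017-10-25 17:55:11')"
)

now = datetime(2017, 10, 25, 18, 6, 21)
timeout = timedelta(minutes=1)
node2_won = try_take_over(conn, "Node2", "IMPORT_ENTRIES", now, timeout)
node3_won = try_take_over(conn, "Node3", "IMPORT_ENTRIES", now, timeout)
leader_row = conn.execute("SELECT node, last_ping FROM cluster_service").fetchone()
print(node2_won, node3_won, leader_row)
```

The same statement also covers the rejection case: while the leader keeps pinging, the WHERE clause matches no row and nothing changes.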

Updating the Ping Timestamp

The next time Node2 attempts to register its own leadership, it will find that it is already the leader and will therefore just update the LAST_PING value:


NODE   SERVICE_TYPE    LAST_PING
Node2  IMPORT_ENTRIES  2017-10-25 18:16:21
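The ping refresh could be sketched as an UPDATE restricted to the row owned by the calling node. The helper `ping` and the table name `cluster_service` are hypothetical. In this sketch, a node that is no longer the leader (Node1 here) changes nothing.

```python
import sqlite3


def ping(conn, node, service_type, now):
    """Refresh last_ping, but only if `node` is the registered leader."""
    cur = conn.execute(
        "UPDATE cluster_service SET last_ping = ? "
        "WHERE service_type = ? AND node = ?",
        (now, service_type, node),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cluster_service ("
    "node TEXT NOT NULL, service_type TEXT NOT NULL UNIQUE, last_ping TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO cluster_service VALUES ('Node2', 'IMPORT_ENTRIES', '2017-10-25 18:06:21')"
)

node2_ok = ping(conn, "Node2", "IMPORT_ENTRIES", "2017-10-25 18:16:21")
node1_ok = ping(conn, "Node1", "IMPORT_ENTRIES", "2017-10-25 18:16:21")
row = conn.execute("SELECT node, last_ping FROM cluster_service").fetchone()
print(node2_ok, node1_ok, row)
```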

Configurations

In order to increase the flexibility of this solution and make it adaptable to the requirements of each environment, it will be possible to configure the timeout, that is, the frequency with which the ping is made.

Note that, if the timeout value is changed, all the nodes in the cluster must be restarted so that the new value takes effect across the entire cluster.

JMX MBean

Useful information about the cluster services and their respective leaders will be provided by an MBean accessible through JMX. Using this technology it will also be possible to:

Note that a security mechanism will also be implemented to prevent deregistering a leader or disabling the ping in a node while the cluster service is processing its tasks (e.g. processing import entries) at that very moment.

Known Limitations

It is important to note that this solution assumes that the clocks of all the nodes in the cluster are set to exactly the same time. It will also be mandatory to configure each node with a name that identifies it uniquely.

Besides, there may be periods of time during which no node is in charge of executing a service (until a node in the cluster notices that it should take control). The maximum duration of a period without a leader will depend on the ping frequency.
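To get a feel for the size of that window, the following sketch computes when a surviving node would take over. It assumes, as in the 1 minute example above, that the check interval equals the timeout; the function `takeover_time` and the use of whole seconds are illustrative only. The leader pings at second 0 and dies right afterwards, while the other node checks every 60 seconds starting at some offset.

```python
def takeover_time(last_ping: int, first_check: int, interval: int) -> int:
    """Return the first check time at which last_ping < now - interval holds."""
    now = first_check
    while not (last_ping < now - interval):
        now += interval
    return now


interval = 60  # seconds; used both as check period and as timeout (assumption)

# Try every possible offset of the surviving node's first check within one period.
takeovers = [takeover_time(0, offset, interval) for offset in range(1, interval + 1)]
best, worst = min(takeovers), max(takeovers)
print(best, worst)  # 61 120
```

Under these assumptions the period without a leader lasts between roughly one and two intervals, so a shorter ping interval shortens the gap at the cost of more frequent database queries.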

Retrieved from "http://wiki.openbravo.com/wiki/Projects:Import_Entry_High_Availability/Specs"

This page was last modified on 1 February 2018, at 15:12. Content is available under Creative Commons Attribution-ShareAlike 2.5 Spain License.