Projects:Background Process Cluster Support/TechSpec
This project aims to allow scheduled background processes, managed by Quartz Scheduler, to be executed from any application server node by using Quartz's built-in clustering functionality.
To enable this, a persistent JobStore needs to be used. Also, the JobStore must be shared between the running instances of the Quartz Scheduler cluster.
- Leader: Mauricio Peccorini <mpe at openbravo dot com>
- Status: In progress
- Branch: https://code.openbravo.com/erp/devel/pi-quartz-cluster
This project will take advantage of the existing clustering functionality of the Quartz Scheduler to allow Background Processes to be executed from any instance of Openbravo when running in a configuration of multiple application servers.
To enhance Openbravo's Load Balancing and High Availability characteristics, the Background Processes Scheduler will be modified to work as a cluster. This will allow processes to be executed in a balanced fashion instead of generating all the load on a single application server. Furthermore, scheduled processes will keep being executed if an application server fails, no matter which instance presents the failure.
In the future, this enhancement will also open the possibility of dynamically creating and destroying application server instances to better cope with user load in the Cloud environment while saving costs when the load is reduced. However, other modifications are required for this to be possible, most prominently sharing HTTP session information between the application servers.
Modify core to use the clustering functionality of the Quartz Scheduler library while maintaining all the functionality already available.
No Job Recoverability
When a node fails while executing a Job, Quartz allows it to be re-executed by one of the remaining nodes. This is set up in the Job by enabling the "requests recovery" property. Since Openbravo does not consider this possibility in the Processes metadata, it will be assumed that Jobs interrupted by a node failure will not be restarted; instead, the Process will not be executed again until its next scheduled time.
This could be modified in the future to allow the administrators to choose whether processes could be recovered in that sense. However, it must be considered that an application server's instance failure could well be caused by a delinquent process so it could be unwise to restart it on another instance without further analysis.
Process Data Serializability
With the current RAMJobStore, Java objects are added to the ProcessBundle, a Map of objects available to the process as execution parameters. These objects can be added to the map without regard to their ability to be serialized.
Since the JDBCJobStore will store this information in the database, it must be able to serialize all the objects passed to the process as parameters. Hence, when using the JDBCJobStore, all the parameters must be serializable, or the scheduling of the process will fail with an appropriate user message.
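The serializability requirement above could be checked before a process is scheduled. The following is a minimal sketch using only standard Java serialization; the class `BundleSerializationCheck` and its methods are hypothetical names introduced for illustration and are not part of Openbravo or Quartz. It attempts a real serialization of each parameter, so non-serializable nested fields are detected as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not an Openbravo or Quartz class.
class BundleSerializationCheck {

  /** Returns the keys whose values cannot be written with Java serialization. */
  static List<String> nonSerializableKeys(Map<String, Object> params) {
    List<String> failed = new ArrayList<>();
    for (Map.Entry<String, Object> entry : params.entrySet()) {
      if (!canSerialize(entry.getValue())) {
        failed.add(entry.getKey());
      }
    }
    return failed;
  }

  /** Tries to serialize the value into an in-memory buffer that is then discarded. */
  static boolean canSerialize(Object value) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(value);
      return true;
    } catch (IOException e) {
      // NotSerializableException is a subclass of IOException.
      return false;
    }
  }

  public static void main(String[] args) {
    Map<String, Object> params = new LinkedHashMap<>();
    params.put("orgId", "0");           // String is Serializable
    params.put("retries", 3);           // Integer is Serializable
    params.put("worker", new Thread()); // Thread is not Serializable
    System.out.println(nonSerializableKeys(params)); // prints [worker]
  }
}
```

A scheduling routine could use the returned keys to build the user message that names the offending parameters.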
Background process clustering in Openbravo depends on version 2.3.0 or higher of Quartz Scheduler. Even though clustering is available in previous versions of Quartz, the Job and Trigger management that takes advantage of the clustering functionality is implemented using version 2.3.0 of Quartz.
All the development must be done on top of core 19Q4 and have changesets generated by this project applied.
Single execution of the process per each scheduled firing
If a process is scheduled to be executed with a given frequency, one and only one instance of the scheduler cluster will execute the process for each scheduled execution time. Under no circumstances may the cluster initiate the process simultaneously in more than one instance and expect the process itself to control this situation.
This is different from concurrent execution caused by a process taking longer than the time between executions. In these circumstances, the same or a different instance will initiate the process as part of the new scheduled execution time. It is the responsibility of the process infrastructure to continue respecting the "Concurrent Execution" setting that already exists in the process metadata.
Random load balancing
There will be no specific strategy for assigning job executions to scheduler instances. Instead, after the execution time of a job has been reached, the next scheduler instance to check will get the job assigned.
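The combination of "first instance to check gets the job" and "exactly one execution per firing" can be illustrated with a small simulation. This is only a sketch under stated assumptions: the in-memory map stands in for the shared JobStore, whereas Quartz's JDBC store coordinates through the shared database; the class `FiringAcquisitionSketch`, the trigger name, and the node ids are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation; the map is a stand-in for the shared, persistent JobStore.
class FiringAcquisitionSketch {

  // Key: "trigger@fireTime", value: id of the node that claimed that firing.
  private final ConcurrentMap<String, String> acquired = new ConcurrentHashMap<>();

  /** Returns true only for the first node that claims the given firing. */
  boolean tryAcquire(String trigger, long fireTime, String nodeId) {
    return acquired.putIfAbsent(trigger + "@" + fireTime, nodeId) == null;
  }

  /** Lets nodeCount nodes race for one firing and returns how many executed it. */
  static int raceForFiring(int nodeCount) {
    FiringAcquisitionSketch store = new FiringAcquisitionSketch();
    AtomicInteger executions = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    ExecutorService nodes = Executors.newFixedThreadPool(nodeCount);
    for (int i = 0; i < nodeCount; i++) {
      String nodeId = "node-" + i;
      nodes.submit(() -> {
        try {
          start.await(); // all nodes check the store at roughly the same time
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        if (store.tryAcquire("nightly-report", 1_000L, nodeId)) {
          executions.incrementAndGet(); // only the winner runs the process
        }
      });
    }
    start.countDown();
    nodes.shutdown();
    try {
      if (!nodes.awaitTermination(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("nodes did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return executions.get();
  }

  public static void main(String[] args) {
    System.out.println(raceForFiring(4)); // prints 1
  }
}
```

Which node wins is not determined by any strategy, which matches the absence of a balancing policy described above.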
Limit to the number of concurrent jobs in a single instance
There must be a limit to the number of concurrent jobs being executed in an instance of the cluster. However, no specific development needs to be done for this as the number of workers can be controlled in Quartz configuration.
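As a sketch of how the worker limit might be expressed, the snippet below builds the Quartz thread pool settings as a `java.util.Properties` object. The property keys `org.quartz.threadPool.class` and `org.quartz.threadPool.threadCount` are standard Quartz configuration keys; the class `WorkerLimitConfig` and the value 10 are assumptions chosen for illustration.

```java
import java.util.Properties;

// Hypothetical helper for illustration; only the property keys come from Quartz.
class WorkerLimitConfig {

  /** Builds thread pool settings that cap the jobs running concurrently on one instance. */
  static Properties threadPoolProperties(int maxConcurrentJobs) {
    if (maxConcurrentJobs < 1) {
      throw new IllegalArgumentException("maxConcurrentJobs must be at least 1");
    }
    Properties props = new Properties();
    props.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
    // Each worker thread runs one job at a time, so this is the per-instance limit.
    props.setProperty("org.quartz.threadPool.threadCount", Integer.toString(maxConcurrentJobs));
    return props;
  }

  public static void main(String[] args) {
    Properties props = threadPoolProperties(10);
    System.out.println(props.getProperty("org.quartz.threadPool.threadCount")); // prints 10
  }
}
```

The same keys can be placed directly in a quartz.properties file; the limit applies to each scheduler instance separately, not to the cluster as a whole.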
Job Store persistence
Instead of relying on memory structures within the JVM, the information of the existing Jobs (processes) and Triggers (schedules) needs to be managed by a persistent store so they can be accessed by all the instances of Quartz.
Currently, there is only one standard Job Store available from Quartz that supports clustering, the JDBCJobStore. This Job Store comes in two flavours depending on how the database transactions will be managed, namely JobStoreTX and JobStoreCMT.
Since Openbravo manages the database connection pool itself instead of registering the Data Source within the container to be accessed via JNDI, JobStoreTX will be used. This poses no additional restrictions, as in no case should a transaction related to business logic be reused by Quartz or vice versa.
Using the JDBCJobStore instead of an in-memory JobStore has a performance impact because Quartz will now run database queries instead of fetching data from RAM. In this document, this overhead was analyzed and determined to be small.
Database structures of the JobStore
To persist the jobs and triggers information, Quartz uses a JobStore that relies on a database connection. This can be implemented in several different ways within Openbravo:
- Separate databases: A separate PostgreSQL database can be created, either managed by DBSourceManager or not.
- Openbravo database, separate tables: Quartz tables can be created within the Openbravo database where they would be managed by the DBSourceManager. A custom table prefix can be assigned to the tables to respect the nomenclature of Openbravo.
- Joined tables: Quartz job and trigger information could be integrated into the existing ad_process, ad_process_request and ad_process_run tables. This would enable the use of DAL to access Quartz information. However, this would mean creating a custom Openbravo JobStore altogether. This involves a serious design and development project to take care of scheduler instances management, process concurrency and other scheduling logic that is already implemented in the JDBCJobStore. Furthermore, this ties Quartz very tightly into Openbravo, reducing our ability to switch to a different job scheduler in the future.
The information in the Quartz tables represents already scheduled jobs and triggers that have gone through the processing of the Quartz API, whereas the information in the Openbravo tables represents processes that could be scheduled, now or in the future, and that have not necessarily gone through the processing of the API. For this reason, the currently preferred alternative is to use the Openbravo database, manage the structures with DBSourceManager to properly support Oracle and PostgreSQL, and keep the Quartz and Openbravo tables separated.
Access to the database structures
The implementation of data access in Quartz comprises two main concepts: the Connection Provider and the Driver Delegate.
The connection provider gives Quartz access to a database connection pool, while the Driver Delegate provides a layer of abstraction to resolve the specific syntactical features of the different database vendors.
The alternatives for the implementation of data access are:
- C3p0 connection pool with Quartz implemented Driver Delegate: By default, Quartz uses a C3p0 connection pool. Also, Quartz already has Driver Delegates to access PostgreSQL and Oracle databases. Having a separate connection pool would just add to the list of dependencies without providing any value, so this is an inconvenience. On the other hand, the already implemented Driver Delegates do not aim to standardize the database structures and that would prevent us from using DBSourceManager.
- Openbravo connection pool with Quartz implemented Driver Delegate: This alternative's only advantage is the minimal development required for it to be implemented. A simple class implementing Quartz ConnectionProvider interface is all it would take. However, it doesn't resolve the problem of having database structures that cannot be managed by DBSourceManager.
- Openbravo connection pool with Openbravo custom Driver Delegate: This alternative avoids the need for the separate connection pool implementation required in the first alternative. Also, since access to the Quartz tables would be implemented in our own Driver Delegate, database structures can be standardized and adjusted for management by DBSourceManager.
Even though the third option is the alternative that requires the most development, this isn't a complex endeavour and it will greatly simplify installation and maintenance operations in the future. Unless a better solution is found, this will be the one implemented by the project.
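To make the chosen alternative more concrete, the sketch below assembles the Quartz settings that would wire a clustered JobStoreTX to a custom Connection Provider and a custom Driver Delegate. The property keys are standard Quartz configuration keys. Everything else is an assumption made for illustration: the class `ClusteredStoreConfig`, the data source name `obDS`, the scheduler name, the table prefix, and the two `org.example` class names are hypothetical and do not reflect the actual implementation of this project.

```java
import java.util.Properties;

// Hypothetical helper for illustration; only the property keys come from Quartz.
class ClusteredStoreConfig {

  // Hypothetical data source name used to link the JobStore to its provider.
  static final String DATA_SOURCE = "obDS";

  static Properties clusteredJobStoreProperties(String providerClass, String delegateClass,
      String tablePrefix) {
    Properties p = new Properties();
    // All nodes of one cluster share the instance name; AUTO generates a unique id per node.
    p.setProperty("org.quartz.scheduler.instanceName", "ExampleClusteredScheduler");
    p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
    // JDBC store that manages its own transactions, with clustering switched on.
    p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
    p.setProperty("org.quartz.jobStore.isClustered", "true");
    // Custom Driver Delegate and table prefix for the Quartz tables.
    p.setProperty("org.quartz.jobStore.driverDelegateClass", delegateClass);
    p.setProperty("org.quartz.jobStore.tablePrefix", tablePrefix);
    // Custom Connection Provider that hands out connections from the existing pool.
    p.setProperty("org.quartz.jobStore.dataSource", DATA_SOURCE);
    p.setProperty("org.quartz.dataSource." + DATA_SOURCE + ".connectionProvider.class",
        providerClass);
    return p;
  }

  public static void main(String[] args) {
    Properties p = clusteredJobStoreProperties(
        "org.example.ExampleConnectionProvider", // hypothetical class name
        "org.example.ExampleDriverDelegate",     // hypothetical class name
        "EX_QRTZ_");                             // hypothetical table prefix
    p.stringPropertyNames().stream().sorted()
        .forEach(key -> System.out.println(key + "=" + p.getProperty(key)));
  }
}
```

A Properties object of this shape can be passed to Quartz's `StdSchedulerFactory`, whose constructor accepts a `Properties` argument, or written out as a quartz.properties file.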
Closed Discussion Items
- JobStore: Data structure used by Quartz to maintain the list of jobs to be executed and the associated schedule of execution.
- Serializability: Ability of a Java object to be converted into a stream of bytes. This imposes several constraints, such as not holding references to open files or network communication sockets.