Projects:Configurable Update Analytics Fact Tables Process/Specs
The objective is to allow the possibility of calculating the records of the Analytics fact tables just for the data that belongs to a particular range of time.
Currently, the existing Update Analytics Fact Tables Process can take several hours if run initially or after the fact tables have been cleared. This process will be adapted to be configurable, in order to calculate the data of the fact tables just for a period starting from a date forward. For this reason, we will rename it as Forward Update Analytics Fact Tables.
Apart from defining an starting date, the process will accept a parameter to define the number of records to be calculated on every execution of the process.
So, this process will allow to calculate a fixed number of records of a fact table starting from a particular date. Thus, it will be possible to schedule the process to be executed along different days, during those hours where the system has less overload (for example at night).
In the same way, a new process called Backward Update Analytics Fact Tables will be implemented. It will allow to calculate the data for a period from an ending date backwards. In this case, it will also be needed to define a starting date as the limit of the backwards execution and the batch size of every individual execution of this process.
If the process has already calculated the facts of a particular date range, and after that the data related to any of the facts in that range is modified, then in the next execution of the process those related facts will not be recalculated again but just deleted.
Application Dictionary Infrastructure
The project requires the creation of new database tables together with new Applicaton Dictionary infrastructure in order to allow users to define the configuration of the processes mentioned above.
A new standard window is going to be available where the processes configuration will be entered. It is not allowed to have multiple configurations for the same client, so within this window, it will be possible to create just one record per client. Thus, if there is a configuration for current client, it will be used, if not a configuration defined at System client is tried to be selected.
The following fields will be present in the new window:
- Forward Start Date: the fact table records to be taken into account will be those which are subsequent to this date.
- Forward Batch Size: number of records that will be processed on each execution of the Forward Update Analytics Fact Tables Process.
- Backward Start Date: if defined, the fact records previous to this date will be calculated also. This field will be displayed just if the Forward Start Date is defined.
- Backward Batch Size: number of records that will be processed on each backwards execution. It will be displayed just if the Forward Start Date is defined.
Add the Configuration into the Openbravo Fact Table Process
It is also required to add this new functionality into the Java classes involved in the fact record calculation process.
We are adding two new instance attributes:
- mode: defines the behavior (FORWARD or BACKWARD) on the record creation.
- updaterConfiguration: contains the configuration definition for the updater process: the values of the FORWARD/BACKWARD dates and the batch sizes.
Together with these new attributes, the following methods are going to be included:
- initializeConfiguration: initializes the updaterConfiguration attribute. This method will be invoked on every execution of the updater, before proceeding with the update of the fact table.
- getConfiguration: retrieves the current process configuration. If it is not defined for the current client, it will try to recover the configuration for the System client.
- getDateRangeQuery: builds an String with a clause that contains the range filters according to the process configuration.
- setConfigParameters: sets the parameters for the clause created with getDateRangeQuery, according to the current execution mode.
- shouldApplyBackwardStartDate: detects if the the start date filter must be applied, i.e., if we are in backward mode and indeed if the backward start date is defined.
- getBatchSize: returns the batch size for the current mode.
- setProcessMode: a setter to define the process mode type.
- getDateSortDirection: returns how dates should be sorted according with current execution mode: ascending if mode is FORWARD and descending in BACKWARD mode. It must be used on the getData() method of the fact updaters in order to retrieve the data properly depending on the mode.
This is the class responsible of iterate over all fact updaters and launch the executeUpdate() method for every one.
This class will define an enum, FactUpdaterProcessMode, used to represent the mode type used by the updaters. It will also define this new method:
- setProcessMode: a setter to define the mode used by the updaters that the handler is going to run.
This is the class which implements the Forward Update Analytics Fact Tables process. It uses an instance of OBAnalyticsFactUpdaterHandler which from now is going to be configured with FORWARD mode.
This is a new class that implements the new Backward Update Analytics Fact Tables process. It extends OBAnalyticsFactUpdaterProcess, so it uses the same code as the Forward Update Analytics Fact Tables process but using the BACKWARD mode instead.
Use the Configuration Feature on Existing Fact Updaters
It is necessary to implement some small changes on the existing updater processes: OBAnalyticsFactDiscountsUpdater, OBAnalyticsFactOrderUpdater and OBAnalyticsFactSalesOrderHeaderUpdater so they can make use of this configuration feature. These changes should be implemented in their getData method as follows:
- Retrieve the date property which is going to be used to compare with the starting forward/backward dates.
- Inside the generated query, use this date property with the inherited getDateRangeQuery method in order to include the clause used to select the records.
- Add a sort clause using the getDateSortDirection, this will allow to sort the records properly based on the mode defined by the updaters handler.
- Finally use the setConfigParameters to perform the replacement of the configuration parameters within the query.
Replace UUID in Time Closure Table with Date Strings
Another performance improvement tested during this project was related with TimeDimensionProvider class.
This class makes use of a cache that maps time closures UUIDs (entries of the OBANALY_Time_Dimension table) with their corresponding date value representation.
Within the Update Time Dimension Refs + Tree Closure Table process the columns based on Time Dimension references are updated. This process can have a high cost in time depending on the number of records to be updated. This process is performing individual updates (setTimeDimensionReferences method), updating the column (Time Dimension) with the UUID of the referenced value in the OBANALY_Time_Dimension table.
The proposed improvement was to replace those individuals updates with just one massive update. To make this massive update possible, the OBANALY_Time_Dimension table UUIDs would be replaced with the string representation of the date.
The table affected mainly by this process is the FACT_ACCT table. Analytics adds a column in this table, em_obanaly_accountingdate, based on this Time Dimension reference.
To validate this possible improvement, it was tested in an environment with the following volumes:
- 444400 records in the FACT_ACCT table (to be updated with the proper value in em_obanaly_accountingdate column)
- 22002 records in the OBANALY_FACT_SALESORDHEADER table (to be inserted)
- 31844 records in the OBANALY_FACT_ORDER (to be inserted)
The result was an improvement of about 2 minutes in the execution of the whole fact data process population.
The changes that this improvement also required (replacing UUIDs with date strings) for environments with fact data already calculated outweigh the improvement achieved in saved time. For this reason, this improvement will not be applied for the moment.
Move the Existing Updater Processes From Analytics to Another Module
As a final stage, it was decided to move the cubes defined in Analytics module together with their related fact tables and updaters to another modules. The idea was to try to keep in Analytics just the basic infrastructure. The planned movements for the database elements were as follows:
- Move the OBANALY_FACT_ORDER, OBANALY_FACT_SALESORDERHEADER, OBANALY_FACT_DISCOUNTS tables to the Retail Analytics module.
- Move the EM_OBANALY_ACCOUNTINGDATE column of the FACT_TABLE table to the Multi-Dimensional Financial Reports Infrastructure.
But several technical problems were found during this task:
- For the fact tables, it was necessary to rename the foreign key and primary key constraints together with their indexes. Those changes could affect seriously to the upgrade process of the module, in terms of performance.
- The Retail Analytics module already has two external columns for each fact table, so it would be necessary to rename those columns, causing data lost in those columns during a possible upgrade.
- For the EM_OBANALY_ACCOUNTINGDATE column, currently is not possible to export it to another module, because of its name (EM_ + DBPREFIX), DBSourceManager will always try to export it to the Analytics module.
Taking into account all of these disadvantages, this movements will not be performed withing the scope of this project.