View source | View content page | Page history | Printable version   
Toolbox
Main Page
Upload file
What links here
Recent changes
Help

PDF Books
Show collection (0 pages)
Collections help

Search

Projects:Configurable Update Analytics Fact Tables Process/Specs

Contents

Functional Requirements

The objective is to allow the possibility of calculating the records of the Analytics fact tables just for the data that belongs to a particular range of time.

Currently, the existing Update Analytics Fact Tables Process can take several hours if run initially or after the fact tables have been cleared. This process will be adapted to be configurable, in order to calculate the data of the fact tables just for a period starting from a date forward. For this reason, we will rename it as Forward Update Analytics Fact Tables.

Apart from defining an starting date, the process will accept a parameter to define the number of records to be calculated on every execution of the process.

So, this process will allow to calculate a fixed number of records of a fact table starting from a particular date. Thus, it will be possible to schedule the process to be executed along different days, during those hours where the system has less overload (for example at night).


Update Analytics Fact Tables process with batch size


In the same way, a new process called Backward Update Analytics Fact Tables will be implemented. It will allow to calculate the data for a period from an ending date backwards. In this case, it will also be needed to define a starting date as the limit of the backwards execution and the batch size of every individual execution of this process.


Backward Update Analytics Fact Tables process with batch size

Limitations

If the process has already calculated the facts of a particular date range, and after that the data related to any of the facts in that range is modified, then in the next execution of the process those related facts will not be recalculated again but just deleted.

Technical Specs

Application Dictionary Infrastructure

The project requires the creation of new database tables together with new Application Dictionary infrastructure in order to allow users to define the configuration of the processes mentioned above.

A new standard window is going to be available where the processes configuration will be entered. It is not allowed to have multiple configurations for the same client, so within this window, it will be possible to create just one record per client. Thus, if there is a configuration for current client, it will be used, if not a configuration defined at System client is tried to be selected.

The following fields will be present in the new window:

Add the Configuration into the Openbravo Fact Table Process

It is also required to add this new functionality into the Java classes involved in the fact record calculation process.

OBAnalyticsFactUpdater

The OBAnalyticsFactUpdater is an abstract class that every fact updater class must implement as explained here.

We are adding two new instance attributes:

Together with these new attributes, the following methods are going to be included:

OBAnalyticsFactUpdaterHandler

This is the class responsible of iterate over all fact updaters and launch the executeUpdate() method for every one.

This class will define an enum, FactUpdaterProcessMode, used to represent the mode type used by the updaters. It will also define this new method:

OBAnalyticsFactUpdaterProcess

This is the class which implements the Forward Update Analytics Fact Tables process. It uses an instance of OBAnalyticsFactUpdaterHandler which from now is going to be configured with FORWARD mode.

OBAnalyticsBackwardFactUpdaterProcess

This is a new class that implements the new Backward Update Analytics Fact Tables process. It extends OBAnalyticsFactUpdaterProcess, so it uses the same code as the Forward Update Analytics Fact Tables process but using the BACKWARD mode instead.

Use the Configuration Feature on Existing Fact Updaters

It is necessary to implement some small changes on the existing updater processes: OBAnalyticsFactDiscountsUpdater, OBAnalyticsFactOrderUpdater and OBAnalyticsFactSalesOrderHeaderUpdater so they can make use of this configuration feature. These changes should be implemented in their getData method as follows:

  1. Retrieve the date property which is going to be used to compare with the starting forward/backward dates.
  2. Inside the generated query, use this date property with the inherited getDateRangeQuery method in order to include the clause used to select the records.
  3. Add a sort clause using the getDateSortDirection, this will allow to sort the records properly based on the mode defined by the updaters handler.
  4. Finally use the setConfigParameters to perform the replacement of the configuration parameters within the query.

Replace UUID in Time Closure Table with Date Strings

Another performance improvement tested during this project was related with TimeDimensionProvider class.

This class makes use of a cache that maps time closures UUIDs (entries of the OBANALY_Time_Dimension table) with their corresponding date value representation.

Within the Update Time Dimension Refs + Tree Closure Table process the columns based on Time Dimension references are updated. This process can have a high cost in time depending on the number of records to be updated. This process is performing individual updates (setTimeDimensionReferences method), updating the column (Time Dimension) with the UUID of the referenced value in the OBANALY_Time_Dimension table.

The proposed improvement was to replace those individuals updates with just one massive update. To make this massive update possible, the OBANALY_Time_Dimension table UUIDs would be replaced with the string representation of the date.

The table affected mainly by this process is the FACT_ACCT table. Analytics adds a column in this table, em_obanaly_accountingdate, based on this Time Dimension reference.

To validate this possible improvement, it was tested in an environment with the following volumes:

The result was an improvement of about 2 minutes in the execution of the whole fact data process population.

The changes that this improvement also required (replacing UUIDs with date strings) for environments with fact data already calculated outweigh the improvement achieved in saved time. For this reason, this improvement will not be applied for the moment.

Move the Existing Updater Processes From Analytics to Another Module

As a final stage, it was decided to move the cubes defined in Analytics module together with their related fact tables and updaters to another modules. The idea was to try to keep in Analytics just the basic infrastructure. The planned movements for the database elements were as follows:

But several technical problems were found during this task:

  1. For the fact tables, it was necessary to rename the foreign key and primary key constraints together with their indexes. Those changes could affect seriously to the upgrade process of the module, in terms of performance.
  2. The Retail Analytics module already has two external columns for each fact table, so it would be necessary to rename those columns, causing data lost in those columns during a possible upgrade.
  3. For the EM_OBANALY_ACCOUNTINGDATE column, currently is not possible to export it to another module, because of its name (EM_ + DBPREFIX), DBSourceManager will always try to export it to the Analytics module.

Taking into account all of these disadvantages, this movements will not be performed withing the scope of this project.

Retrieved from "http://wiki.openbravo.com/wiki/Projects:Configurable_Update_Analytics_Fact_Tables_Process/Specs"

This page has been accessed 1,620 times. This page was last modified on 13 February 2017, at 15:03. Content is available under Creative Commons Attribution-ShareAlike 2.5 Spain License.