Reporting Server/Reporting Table and Load Script Development
This document discusses the two main artifacts related to the reporting database (also called the reporting data warehouse):
- table in the reporting database
- load script to load data from the Openbravo database into the reporting database
This description follows the standard path; special cases (like custom load logic) are discussed in a separate document.
Advice: Table versus Materialized View
When looking at the structure of a reporting database the advice is to do the following:
- in the load SQL script do some simple joins and field transformations (date to string for example)
- the reporting table should be simple and have very few indexes
- create one or more materialized views for aggregations, setting indexes on the materialized view
For example consider a reporting table holding information of c_orderline:
- it is loaded from Openbravo pre-joining information on customer, address and other header information
- the corresponding reporting table is simple and has no indexes
- from the reporting table several materialized views are defined, each for different aggregations: product, store, sales rep, customer
From a performance perspective it is interesting to note that refreshing materialized views is quite fast: a standard PostgreSQL install on an AWS M5 instance can refresh a materialized view at a rate of 2-3 million records/minute.
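As a sketch of this setup (table and column names are hypothetical, not the actual Openbravo reporting model), an aggregating materialized view on top of an orderline reporting table could look like this:

```sql
-- Hypothetical sketch: aggregate the (index-free) orderline reporting
-- table per product, and put the index on the much smaller
-- materialized view instead of on the base table.
CREATE MATERIALIZED VIEW obrtmd_orderline_by_product AS
  SELECT m_product_id,
         sum(linenetamt) AS total_net_amount,
         sum(qtyordered) AS total_qty_ordered
  FROM obrtmd_orderline
  GROUP BY m_product_id;

CREATE INDEX obrtmd_orderline_by_product_idx
  ON obrtmd_orderline_by_product (m_product_id);

-- The refresh is what runs as part of the overall load logic:
REFRESH MATERIALIZED VIEW obrtmd_orderline_by_product;
```

Similar views per store, sales rep and customer would each carry their own indexes, keeping the base reporting table simple.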
Table in the reporting database
Definition
The reporting database is a normal SQL database within the database server.
A reporting table is prefixed by the module DB prefix (similar to an Openbravo table).
Similar to the tables in the Openbravo database, the definition of a table in the reporting database is represented in the source code in two locations:
- as an XML file in the src-db/database/reporting-model/tables folder in your module
- in the application dictionary within Openbravo backoffice
The definition in the application dictionary is visible through the Reporting Tables and Columns window.
Creating, extending
To create or change the reporting table definition do the following steps:
- change the definition directly in the database server (also add indexes there)
- execute ant export.reporting.model.full within the org.openbravo.reporting.tools module. Make sure the module is flagged to be in development. This will read the database model directly from the database server and update the model in the xml files as well as in the application dictionary.
To extend a reporting table with a column through another module, add the column to the table with the 'em_' prefix followed by the module DB prefix. When exporting the reporting model it will then be added to the corresponding module. See here for an example.
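As an illustration (the module prefix OBXYZ and the column are made up), extending the business partner reporting table directly in the database server could look like this, after which the model is exported as described above:

```sql
-- Hypothetical: a module with DB prefix OBXYZ adds a column to the
-- business partner reporting table. After running
-- 'ant export.reporting.model.full' the column is exported to the
-- extending module's XML model and its application dictionary entry.
ALTER TABLE obrtmd_bpartner
  ADD COLUMN em_obxyz_loyalty_level character varying(60);
```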
Creating the Reporting Database
You can create the reporting database directly from the xml definitions by executing this command in the org.openbravo.reporting.tools module folder:
ant create.reporting.database
To update the reporting database, make the required changes to the xml, drop the database from the database server and then run the above command.
Load Script
The load scripts are responsible for loading, updating and refreshing data in the reporting database. Load scripts are executed within the data loader application in a defined order; scripts with the same sorting/order number can be executed in parallel, which provides large load-time benefits.
Definition
In its base form the load script is a sql select statement which is executed in the source Openbravo database and its results are used to insert data into the reporting database. Any valid query is allowed, you can do joins, inner selects etc. The only requirement for the standard load script is that the columns it returns in the select clause should use the alias corresponding to the column name of the reporting table in the target database.
Next to the standard load script there are the following types:
- Custom load script: a load script which uses a custom Java class for its execution
- Standard update script: instead of loading data from the source database this script is run only in the target database and can be used to update data
- Custom update script: similar to the previous script, except that execution is done by a custom Java update class.
- Materialized view: a materialized view does not have a load script, but since refreshing a materialized view is part of the overall load logic it is included in the definition of load scripts.
The other fields of the load script definition:
- script order: is used to control ordering of script execution, for example to make sure that a materialized view is refreshed after a specific table is loaded. It makes sense to give scripts that are independent from each other the same script order number (if possible) to facilitate parallel execution.
- script type: as discussed above
- reporting table: the table in the reporting database which will be updated; relevant for standard load scripts, as they expect a target table to load data into
- always do full reload: do not do an incremental load but always reload the complete table. For smaller tables (fewer than a few tens of thousands of records) it definitely makes sense to always do a full reload. For large tables (hundreds of thousands or millions of records) an incremental load (this field unchecked) performs better.
- custom load class name: you can override the standard load or update class with your own implementation.
- execute once: is used for module-script-like behavior, check this field if you only want to load or update a table once. This can be useful when adding a new column to the reporting database and setting it for all existing records in the reporting database.
- SQL Script: relevant for load script type, the sql script which is executed in the source database.
SQL Script
The standard load script is the most common form. Its SQL script can be any SQL select statement. The SQL script is executed in the source database. The main requirement for the SQL script is that the columns it returns in the select clause should use the alias corresponding to the column name of the reporting table in the target database.
Here is an example:
SELECT bp.c_bpartner_id AS obrtmd_bpartner_id,
       bp.value AS value,
       bp.name AS name,
       bp.iscustomer AS iscustomer,
       bp.isvendor AS isvendor,
       bp.c_bp_group_id AS obrtmd_bp_group_id
       @SELECT_EXTENSION@
FROM c_bpartner bp
     @FROM_EXTENSION@
WHERE bp.ad_client_id IN ('0', $client.id$)
  AND bp.updated > $updatedFrom$
  AND bp.updated < $updatedTo$
  @WHERE_EXTENSION@
The SQL script has extension points defined. When you create your own script make sure to include these in your script. The following extension points are supported:
- @SELECT_EXTENSION@
- @FROM_EXTENSION@
- @WHERE_EXTENSION@
- @GROUPBY_EXTENSION@
Script extension is discussed in more detail below.
Incremental Load and Client Filtering Parameters
The script above also shows three other parameters which are good to use. The parameters are enclosed in $ characters:
- client.id: every query should normally use this parameter to ensure to only read the data of one client. The client id is set in the reporting.properties file.
- updatedFrom/updatedTo: timestamps set automatically by the system to support incremental loading of data. If full reload is checked in the load script, these timestamps are set so wide that everything is read. After a first execution the updatedFrom timestamp is read from the obrtpttl_parameters table.
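As an illustration of what an incremental run of the example script above receives after parameter substitution (the client id and timestamps are made-up values, and the exact substitution mechanism may differ):

```sql
SELECT bp.c_bpartner_id AS obrtmd_bpartner_id, ...
FROM c_bpartner bp
WHERE bp.ad_client_id IN ('0', '42D66394E9A04B9FB2B5D1AF45F60A00')
  AND bp.updated > timestamp '2024-01-01 03:00:00'  -- read from obrtpttl_parameters
  AND bp.updated < timestamp '2024-01-02 03:00:00'  -- start of the current run
```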
Extending
The SQL Script can be extended through custom modules via the Update Script Extension subtab. You can extend the select, from, where and group by clauses.
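For example (the names are hypothetical and the exact delimiter conventions of the extension subtab may differ), a module could contribute fragments like these, which are inserted at the @SELECT_EXTENSION@ and @FROM_EXTENSION@ markers of the base script:

```sql
-- Select extension: one extra column for the extending module,
-- aliased to the em_-prefixed column of the reporting table.
, loc.city AS em_obxyz_city
-- From extension: the join needed by that column.
JOIN c_bpartner_location loc
  ON loc.c_bpartner_id = bp.c_bpartner_id
```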
View resulting Query with extensions
To view the query with its extensions inserted visit the Update Script Content subtab.
Update script
Next to the standard load script the data loader application can also run direct update scripts. Update scripts do not load information from a source transactional database.
Update scripts are part of the overall ETL/ELT functionality and are run in the same ordering as load scripts. So it is for example possible to first load data from the source database and then in a next step update this information.
An update script is run in the reporting database so you can use update with select statements accessing other tables and materialized views in the reporting database. However, you need to make sure (by setting the script order) that your update script runs after all its dependent tables/materialized-views have been updated/refreshed.
An example of an update script is the script to update the organization tree after the tree information has been loaded from the source database.
Some things to note:
- the script type is Standard Update. This type assumes that the SQL script field contains one or more SQL update statements.
- the SQL Script can consist of more than one update statement, as long as the syntax is correct. All update statements within one SQL Script are executed in one transaction.
- no result checking is done (except for low level script/connection errors), so your update script may update zero records without it being reported.
- update scripts do not support extensions through modules. This should not be an obstacle, as modules can provide their own update scripts and in this way extend/add update actions.
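A minimal sketch of such a Standard Update script (table and column names are hypothetical); it runs entirely in the reporting database, so its script order must place it after the scripts that load the tables it reads:

```sql
-- Denormalize the customer name onto the order reporting table.
UPDATE obrtmd_order o
SET customer_name = bp.name
FROM obrtmd_bpartner bp
WHERE bp.obrtmd_bpartner_id = o.obrtmd_bpartner_id;

-- A second statement in the same SQL Script runs in the same
-- transaction; note that updating zero records is not reported.
UPDATE obrtmd_order
SET customer_name = 'Unknown'
WHERE customer_name IS NULL;
```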
Purge script
A custom update script is available that supports purging data of specific tables in the reporting database to lower the stored data volume, resulting in better reporting performance.
To run the purge process, ensure that the Active field of this custom update script definition is checked.
If activated, the custom update will look for a JSON file called data-purge-definition.json with the settings to use, so the first step is to create this file:
- A template called data-purge-definition.json.template can be found in modules/org.openbravo.reporting.tools directory to make this step easier.
- The path where the data-purge-definition.json file has to be located can be checked in the property purge.definition.location of the reporting.properties file.
Consider this definition:
{
  "setup": {
    "purgeFrom": {
      "type": "month",
      "quantity": "13"
    },
    "executionDay": "SUNDAY"
  },
  "steps": [
    {
      "name": "obrts_order_sales",
      "timestampColumn": "created"
    },
    {
      "name": "obrts_orderline_tax",
      "timestampColumn": "",
      "joinTable": {
        "name": "obrts_orderline_sales",
        "parentColumnToJoin": "obrts_orderline_sales_id",
        "columnToJoin": "obrts_orderline_sales_id",
        "timestampColumn": "created"
      }
    }
  ]
}
The different properties are:
- setup.purgeFrom.type: possible values are: year, month, week and day
- setup.purgeFrom.quantity: quantity of "type". In this case, the purger will remove all the data older than 13 months from all the listed tables.
- setup.executionDay: the day of the week on which the purge runs; this prevents execution on a different day than expected. The purging process runs on a weekly basis
- steps: array with the tables to purge. For each element of the array:
- steps.name: name of the table to purge
- steps.timestampColumn: timestamp column used to filter the data to purge. If the table doesn't have a timestamp column this field remains empty; in that case the remaining fields must be filled:
- steps.joinTable.name: name of a related table with a valid timestamp column.
- steps.joinTable.parentColumnToJoin: column of the original table used to join with the joined table
- steps.joinTable.columnToJoin: column of the joined table used to join with the original table
- steps.joinTable.timestampColumn: timestamp column of the joined table
Once the JSON file is created, the purger will be executed during the next execution of the load process and will purge the data according to the definition.
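Conceptually (the actual purge implementation may differ), the two example steps above correspond to deletes like the following, using the 13-month purgeFrom setting:

```sql
-- Step with its own timestamp column:
DELETE FROM obrts_order_sales
WHERE created < now() - interval '13 months';

-- Step without a timestamp column: join to the configured parent
-- table and filter on its timestamp column instead.
DELETE FROM obrts_orderline_tax t
USING obrts_orderline_sales p
WHERE p.obrts_orderline_sales_id = t.obrts_orderline_sales_id
  AND p.created < now() - interval '13 months';
```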
Run the load/update script
You can run a load script directly if the reporting.properties file is correctly set up in your environment. To run the load script, select a record and click the run script button. Select a client and press Done.