Projects:Purge Data From Database/Functional Specification
Contents |
Overview
The purpose of this project is to provide a script to help remove data from the database of an Openbravo environment.
The purge script removes data from the Openbravo database permanently, archiving the data that will be purged is out of the scope of this project. The script can be execute in dry-run mode to verify the data about to be purged is no longer needed in the Openbravo database before actually purging it.
Target users
The script should be used with care, to make sure it purges all the data meant to be removed, and no other records that should be persisted.
Because of this, the target user of the script is a person with a detailed knowledge of the database schema of the Openbravo server where the script will be executed.
In the first stage of the project the purge script will be a database function, that must be executed as the postgresql user (or any other user with superuser privileges).
Requirements
PostgreSQL
In the first phase of the project, the purge script must work in a PostgreSQL database.
Foreign keys
The script must be able to automatically remove data that references rows that are being purged.
In most cases the script can rely on the foreign keys defined in the catalog, but those are not enough.
Consider the following database schema:
The objective would be, starting from the cashups, deleting all the related data (orders that belong to those cashups, its order lines, etc).
- For legacy reasons there is no foreign keys from c_order to obpos_app_cashup, so it is not possible to rely on the foreign keys defined in the database catalog, so it should be possible to define custom foreign keys outside the catalog.
- When an entry from c_order is purged, its c_orderline entries will be deleted by the purge script via the foreign keys defined in the catalog.
- The same will happen when a c_orderline is removed, the c_invoiceline that reference them will be removed using the standard foreign keys.
- When a record from c_invoiceline is removed, we must remove its c_invoice, because otherwise it would be left in an inconsistent state. Because the foreign key between those tables is on the opposite direction (c_invoiceline -> c_invoice), an extra customization is needed so that the scripts removes the entries from c_invoice.
- Finally, there may be records from c_file that indirectly reference the entries removed from c_invoice. It is impossible to create a foreign key in the catalog for this relationship, because the record_id column of c_file can point to entries for several tables, determined by its ad_table_id column. Because of this, it should be possible to define the relationship between this tables, providing a where clause to filter on the relevant records.
To sum it up, the purge script will rely on four mechanism to determine what data should be removed as a consequence of data from a table:
- Foreign keys defined in the catalog: c_orderline -> c_order
- Natural foreign keys defined in custom configuration: c_order -> obpos_app_cashup
- Inverse foreign keys defined in custom configuration: c_invoiceline -> c_invoice
- Foreign keys defined in custom configuration that require extra where clause: c_file -> c_invoice
On Delete Null
Sometimes a record that references a purged record cannot be removed, and instead the column that references the purged record must be nullified. This can be useful, for instance, if the purge script removes orders from 2017, but should not remove the returns done in 2018. In that case, the reference to the purged m_inoutline_id should be set to NULL, instead of removing the line of the return ticket.
Purge script output
As a result of executing the purge script, apart from the removal or the purged rows the following output:
CSV files to identify the purged data
The script will generate files for each purge table, listing the ids of the purged rows. These files will be stored in a predefined folder, that must be given the process permissions so that the postgresql user can place files there.
Additionally, it should be possible to include additional columns in the output file (for instance, it would be useful to include that path of the purged entries from c_file). A configuration table will be provided to define the additional columns to include per purged table.
Summary information of the purged data
The previous output will allow to know in detail whata entries were removed in each purged table. But it will be useful also to include a summary of the purged data, and provide the number of rows removed per table.
Dry run mode
The script should provide a dry-run mode. This will allow to execute the script and analyze the data that would have been removed, without actually removing any data from the database.
Another way to achieve the same, without using the dry-run mode, is to execute the script in a transaction, analyze the relevant data, and rollback the transaction.
No local changes
After executing the purge script there should be no local changes in the database model. The script must make sure it drops all the database components (functions, tables, etc) that it needs to be executed