Projects:Purge Data From Database/Technical Specification
Contents |
Purge script overview
This entry will describe the technical aspects of the purge script.
Auxiliary tables
Five tables are part of the purge script. The first two are automatically populated by the purge script, while the other three are configuration tables that must be populate to configure how the purge should behave.
obpurg_temp_purg_data
Stores information about the records that have been purged, whose dependencies has not been removed yet. Let's consider this schema, and suppose that the data that will be removed is this.
If the script starts removing the cash ups, the evolution of the obpurg_temp_purg_data will be like this:
The cashups are removed, and info about the purged is included in obpurg_temp_purg_data:
table_name | record_id | deleting_dependencies |
---|---|---|
obpos_app_cashup | #1 | N |
obpos_app_cashup | #2 | N |
After this initial steps the scripts goes in a loop. While the obpurg_temp_purg_data is not empty, it picks any record, and sets the deleting_depencies flag of all the records with the same table_name to 'Y', and will start deleting its dependencies.
table_name | record_id | deleting_dependencies |
---|---|---|
obpos_app_cashup | #1 | Y |
obpos_app_cashup | #2 | Y |
The purge script removes the records that reference the purged cashups, and add them to obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
obpos_app_cashup | #1 | Y |
obpos_app_cashup | #2 | Y |
c_order | #1 | N |
c_order | #2 | N |
Once all dependencies of obpos_app_cashup has been removed, its records are removed from obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_order | #1 | N |
c_order | #2 | N |
The next table whose dependencies will be removed is c_order:
table_name | record_id | deleting_dependencies |
---|---|---|
c_order | #1 | Y |
c_order | #2 | Y |
The purge scripts removes the records that reference the purged orders and add them to obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_order | #1 | Y |
c_order | #2 | Y |
c_orderline | #1 | N |
c_orderline | #2 | N |
c_orderline | #3 | N |
c_orderline | #4 | N |
Once all dependencies of c_order has been removed, its records are removed from obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_orderline | #1 | N |
c_orderline | #2 | N |
c_orderline | #3 | N |
c_orderline | #4 | N |
The next table whose dependencies will be removed is c_orderline
table_name | record_id | deleting_dependencies |
---|---|---|
c_orderline | #1 | Y |
c_orderline | #2 | Y |
c_orderline | #3 | Y |
c_orderline | #4 | Y |
The purge scripts removes the records that reference the purged order lines and add them to obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_orderline | #1 | Y |
c_orderline | #2 | Y |
c_orderline | #3 | Y |
c_orderline | #4 | Y |
c_invoiceline | #1 | N |
c_invoiceline | #2 | N |
c_invoiceline | #3 | N |
c_invoiceline | #4 | N |
Once all dependencies of c_orderline has been removed, its records are removed from obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_invoiceline | #1 | N |
c_invoiceline | #2 | N |
c_invoiceline | #3 | N |
c_invoiceline | #4 | N |
The next table whose dependencies will be removed is c_invoiceline:
table_name | record_id | deleting_dependencies |
---|---|---|
c_invoiceline | #1 | Y |
c_invoiceline | #2 | Y |
c_invoiceline | #3 | Y |
c_invoiceline | #4 | Y |
Because of a custom configuration, when an invoice line is removed, its invoices are also removed. Info about the removed records is added to obpurg_temp_purg_data:
table_name | record_id | deleting_dependencies |
---|---|---|
c_invoiceline | #1 | Y |
c_invoiceline | #2 | Y |
c_invoiceline | #3 | Y |
c_invoiceline | #4 | Y |
c_invoice | #1 | N |
c_invoice | #2 | N |
Once all dependencies of c_invoiceline has been removed, its records are removed from obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_invoice | #1 | N |
c_invoice | #2 | N |
The next table whose dependencies will be removed is c_invoice:
table_name | record_id | deleting_dependencies |
---|---|---|
c_invoice | #1 | Y |
c_invoice | #2 | Y |
Once all dependencies of c_invoice has been removed, its records are removed from obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_invoice | #1 | Y |
c_invoice | #2 | Y |
c_file | #1 | N |
c_file | #2 | N |
Once all dependencies of c_invoice has been removed, its records are removed from obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|---|---|
c_file | #1 | N |
c_file | #2 | N |
The next table whose dependencies will be removed is c_file:
table_name | record_id | deleting_dependencies |
---|---|---|
c_file | #1 | Y |
c_file | #2 | Y |
There are no records that reference the purged c_files. Its records are removed from obpurg_temp_purg_data.
table_name | record_id | deleting_dependencies |
---|
When obpurg_temp_purg_data is empty, the purge script is finished.
obpurg_purge_summary
This table will keep track of how many records has been purged per table.
At the end of the purge script execution its contents will be exported to the output folder
obpurg_external_foreign_key
This configuration table must be used to let the purge script know of foreign keys that are not defined in the database catalog.
For instance, to define a virtual foreign key from c_order to obpos_app_cashup, the following row must be inserted:
INSERT INTO obpurg_external_foreign_key(source_table_name, source_column_name, target_table_name) VALUES ('c_order', 'em_obpos_app_cashup_id', 'obpos_app_cashup');
The table admits more complex virtual foreign keys. For instances, it allows to add a foreign key from c_file to c_invoice like this:
INSERT INTO obpurg_external_foreign_key(source_table_name, source_column_name, target_table_name, where_clause) VALUES ('c_file', 'ad_record_id', 'c_invoice', 'ad_table_id = ''318''');
obpurg_table_parent_definition
This configuration table must be used to let the purge script know of foreign keys that should be used to remove data referenced FROM a table that has been purged (by default the purge script automatically that that references a table that has been purged, the other way around).
For instance, to determine that when a c_invoice line is removed its c_invoice should be deleted, the following configuration is needed:
INSERTO INTO obpurg_table_parent_definition VALUES ('c_invoiceline', 'c_invoice_id', 'c_invoice_id','c_invoice');
Four columns must be filled in:
- table_name: the name of the child table
- column_name_in_child_table: the name of the column in the child table that points to the parent table
- column_name_in_parent_table: the name of the primary key column of the parent table
- parent_table_name: the name of the parent table
In most cases the name of the column in that child and parent tables will be the same, but not in all.
obpurg_on_delete_null_refs
When a record is purged, there might be that some of the records that referenced it should not be removed, but the foreign key set to null.
The foreign keys that must be set to null when its referenced record is purged must be configured in the obpurg_on_delete_null_refs, specifying the foreign key by its table name and column name:
INSERT INTO obpurg_on_delete_null_refs VALUES ('c_orderline', 'm_inoutline_id');
obpurg_extra_output_config
When the purge script is executed, it generates files with info about the records that have been removed. By default it includes the table name and the record id. The obpurg_extra_output_config table can be populated to include more information in those files.
Example:
INSERT INTO obpurg_extra_output_config VALUES ('c_file', 'path, name', 'path || ''/'' || name');
Script functions
The script is comprised of several smaller functions. This section will summarize them.
create_purge_temp_tables
Signature:
create_purge_temp_tables() RETURNS void
Creates the tables used by the purge script:
- obpurg_temp_purg_data: Keeps track of the list of purged records, whose dependencies have not been removed yet.
- obpurg_purge_summary: Keeps track of how menu records have been purged per table.
- obpurg_table_parent_definition: Configuration table to specify child-parent relationships between tables where when a record is removed from the child table, its parent row must also be purged. For instance, when an invoice line is removed, its parent invoice should also be purged.
- obpurg_external_foreign_key: Configuration table to define foreign keys to be used by the purge script, that are not defined in the catalog.
- obpurg_extra_output_config: Configuration table to define extra output per table (i.e. we use this to include the file path of the purged entries of c_file)
- obpurg_on_delete_null_refs: Configuration table to specify which foreign keys should behave like a On Delete Null foreign key, instead of the standard On Delete Cascade.
initialize_purge_summary_tables
Signature:
initialize_purge_summary_tables() RETURNS void
Populates the obpurg_purge_summary table, it creates an entry for each table initializing the n_rows_delete column to 0.
nullify_references_of_table
Signature:
nullify_references_of_table(table_name character varying, dependent_table_name character varying, dependent_column_name character varying) RETURNS void
Function invoked to nullify the references to the purged record of a table (table_name) of a given foreign key, described by its table (dependent_table_name) and column (dependent_column_name).
For instance, if an order that has been returned is purged, its return should not be removed by the purge script, but turned into a blind return. A returned order line references the original order via the c_orderline.m_inoutline_id property.
The following query will be executed to nullify those references:
UPDATE c_orderline SET m_inoutline_id = NULL FROM obpurg_temp_purg_data p WHERE m_inoutline_id = p.record_id AND p.table_name = 'm_inoutline' AND p.deleting_dependencies = 'O'
purge_dependencies_of_table
Signature:
purge_dependencies_of_table(table_name character varying, dependent_table_name character varying, dependent_column_name character varying, where_clause character varying, what_to_delete character varying) RETURNS void
This function is invoked after purging data from a table (table_name), to remove its dependencies per foreign key (determined by dependent_table_name, dependent_column_name).
For instance, when entries from c_order are purged, the records from c_orderline that reference them are removed using this query:
WITH deleted AS (DELETE FROM c_orderline USING obpurg_temp_purg_data p WHERE c_order_id = p.record_id AND p.table_name = 'c_order' AND p.deleting_dependencies = 'O' RETURNING c_orderline_id), inserted AS (INSERT INTO obpurg_temp_purg_data (obpurg_temp_purg_data_id, table_name, record_id) SELECT get_uuid(), 'c_orderline', c_orderline_id FROM deleted RETURNING 1);
Note that info about the deleted records is inserted into obpurg_temp_purg_data, so that its dependencies are removed by the purge script in a later step.
If dependent_table_name has been configured so that its parent record must automatically be deleted, the query looks a little bit different to take this into account:
WITH deleted AS (DELETE FROM c_invoiceline USING obpurg_temp_purg_data p WHERE c_orderline_id = p.record_id AND p.table_name = 'c_orderline' AND p.deleting_dependencies = 'O' RETURNING c_invoiceline_id, c_invoice_id), inserted AS (INSERT INTO obpurg_temp_purg_data (obpurg_temp_purg_data_id, table_name, record_id, parent_table_name, parent_record_id) SELECT get_uuid(), 'c_invoiceline', c_invoiceline_id, 'c_invoice', c_invoice_id FROM deleted RETURNING 1);
purge_table
purge_table_and_remove_fks
Signature:
purge_table_and_remove_fks(table_name character varying, where_clause character varying)
Same as the purge_table function, the difference being that purge_table_and_remove_fks removes the foreign keys. It intented to be used just before running update.database, to ensure that the purge script execution has not broken the referencial integrity (otherwise update.database will fail when trying to recreate the foreign key).