Projects:Table Recreation Prevention/Technical Specifications
Contents |
Introduction
There are some tasks, such as installing/uninstalling modules and updating Openbravo or module version, that can require of changes in database model. Those changes are performed by DBSourceManager in update.dabase
task.
When these changes include some modifications (like adding/removing columns) in table model, the strategy used by DBSourceManager is two recreate the whole table, this recreation process involves the following steps:
- A new temporary table is created with the same structure the original had
- Data from original table is moved to new temporary one
- Original table is dropped
- Table is created with the new model
- Data is moved from temporary table to new one
- Temporary table is dropped
- Indexes, constraints and FKs are created in new table
Executing this process has some disadvantages:
- It can be slow, the bigger the amount of data is the slower it becomes. Note it is not only about moving data but it also requires creation of new indexes, constraints, etc, which can be expensive.
- Because we finally have a new table, statistics need to be recomputed. Queries performed while statistics are not up to date can result in suboptimal execution plans.
- ...
The aim of this project is to reduce the cases where tables require to be recreated, improving in this way performance in the update process.
Current state (in 3.0PR15Q1)
The recreation strategy depends on the action and the DB Platform. In 3.0PR15Q1, this is the state:
PostgreSQL | Oracle | |
---|---|---|
Append non mandatory column | recreates | does not recreate |
Append mandatory column | recreates | recreates |
Add any column between others | recreates | recreates |
Add constraints, indexes, FKs... | does not recreate | does not recreate |
Data type changes (different size...) | recreates | recreates |
Drop column | recreates | does not recreate |
Drop constraints, indexes, FKs... | does not recreate | does not recreate |
Technical requirements
The main goal of this project is to prevent table recreation in, at last, the most common situations with the motivation explained in the first section of this document.
Dropping existent columns
Though it is not a very common case, dropping columns is correctly implemented in Oracle and it should also be implemented in PostgreSQL. This is a trivial case to implement where the expected improvement when it occurs is huge because dropping a column, even in tables with big volumes is matter of milliseconds.
Adding new columns
Non mandatory without default value
This is a similar case to dropping, where it is already correctly implemented in Oracle but not in PostgreSQL, and here the improvement is equivalent as described for dropping.
Mandatory and non-mandatory with default value
These cases will require to be measured after implementing to determine how non recreation strategy performs compared to recreation.
Inserting new columns before existent ones
This is a common case when an external module modifies structure of a table delivered by another module. Currently, the columns defined in the module owning the table are physically placed in first position and columns added by external modules are appended grouped by module.
Example:
- Module A 1.0.0
- defines table TA with columns
- CA1
- CA2
- defines table TA with columns
- Module B 1.0.0 (depends on A)
- adds column to table TA
- EM_B_B1
- adds column to table TA
- Module A 1.0.1
- adds a new column to TA
- CA4
- adds a new column to TA
In an instance with the following actions:
- Install A 1.0.0
- Table A has columns CA1, CA2
- Install B 1.0.0
- Table A has columns CA1, CA2, EM_B_B1
- Update A to 1.0.1
- Table A has columns CA1, CA2, CA4, EM_B_B1
Table structure modification to move from CA1, CA2, EM_B_B1
to CA1, CA2, CA4, EM_B_B1
is currently not supported by PostgreSQL nor Oracle.
The approach to follow is not to try to preserve physical column position. So in this example the table would be finally CA1, CA2, EM_B_B1, CA4
. In this manner old column additions are appended.
Performance requirements
This project has two goals:
- Improve, at least in some cases, performance when adding new columns.
- Module scripts are executed in an stable DB where table recreation didn't occur so:
- Foreign Key are the same as they were, this has a functional impact, as currently, if table recreation occurs, module scripts are executed without FKs, if one of these FKs is "on delete cascade" and the script relies on this to do deletions, its behavior can differ depending on the tables that were recreted. Note that appart of the FKs in the recreated tables, also any FK pointing to any recreated table is also dropped.
- Additionally, it has performance impact because at module script execution, statistics for recreated tables might not be recreated yet.
Because of all of this, it is expected to have a big performance improvement when adding new non mandatory columns without default value, this is specially noticeable the bigger the table is. In other cases the requirement is to have, at least, the same performance as before the project.
Limitations
Table recreation will continue in some cases:
- Column data type change
- Column size change
![]() | Some of these limitations will be solved in 3.0PR17Q2 with DBSM Performance Improvements project. |