View source | View content page | Page history | Printable version   

Projects:Support Partial Indexes On Nullable Columns/Specs

Contents

Functional Requirements

The main goal is to add a mechanism that helps to improve the performance of the indexes which can be added to nullable foreign key columns with a high ratio of null values.

Partial Database Indexes

A partial index is an index where it is possible to specify the rows that are indexed. This kind of indexes are useful for commonly used WHERE conditions that use constant values.

An index helps to get faster queries, but depending on the case they can waste a lot of disk space if most of the indexed data is rarely needed. With a partial index it is possible to index just the table data that is most commonly used, this way it will contain fewer rows reducing the amount of disk space used by the index.

Therefore, partial indexes provide the following advantages:

Not Null Partial Indexes On Nullable Columns

As mentioned before, the main goal of this project is to improve the performance of the indexes that are added to nullable foreign keys with a high ratio of null values.

Taking into account the advantages mentioned in the previous section, in this type of cases it seems reasonably to have a partial index to keep just those rows with NOT NULL values on those columns. Depending on the rows we want to access this type of indexes can have some drawbacks, but in general, this will provide us better performance.

For example, lets consider the following scenario where we have data distributed as follows:


Table Name Column Name Total Rows Null Count
c_order c_return_reason_id 6644690 6626194


According to this table, the 99.72% of the rows in c_order has a NULL value on the c_return_reason_id column. This means that most of the orders are barely returned.

Imagine that now we create an index for the c_return_reason_id column on the c_order table:

CREATE INDEX c_order_return_reason ON c_order (c_return_reason_id);

This index will be useful to improve queries where we want to know information about the returned orders, for example:

SELECT dateordered
FROM c_order
WHERE c_return_reason_id = '0B10695DC880488ABA24F70BB2D1442A'; -- 'Incorrect Product' return reason

The new c_order_return_reason index will fulfill its purpose, but it also will include many rows that are never searched by this type of queries, namely all the orders which are not a return from customer/vendor. We are making the query very fast even though it wastes a lot of disk space.

Now, lets suppose that we are able to create a partial index as follows:

CREATE INDEX c_order_return_reason ON c_order (c_return_reason_id)
WHERE c_return_reason_id IS NOT NULL;

In this case, the index only contains the rows that satisfy the WHERE clause, i.e., it only contains the orders that have been returned. Therefore we are achieving the following improvements:

Nevertheless, a partial index can have some drawbacks depending on the case. Following the example, our partial index is not indexing those rows with NULL value on c_return_reason_id, therefore we will have slower queries if they are as follows:

SELECT dateordered
FROM c_order
WHERE c_return_reason_id IS NULL;

This is something that must be taken into account before creating a partial index: we have to analyze which type of queries we are using most in order to decide if we may use a partial index.

Not Null Partial Indexes: Oracle vs PostgresSQL

In an Oracle database, it does not include rows in an index if all indexed columns are NULL. That means that every index is a partial index.

This is not the behavior in PostgresSQL databases, where we will need to define the index as partial.

Affected Foreign Keys

In the following sheet it can be found the nullable foreign key columns present in core which the partial indexes based on NOT NULL expression would be applicable.

Technical Requirements

The effort within this project will be intended to provide the ability of implementing partial indexes in PostgresSQL. Besides, the XML definition of this new elements should not be considered in Oracle but at the same time we must keep the information in order to avoid its losing when exporting the database.

Two different approaches have been considered to decide how partial indexes should be applied on the nullable foreign keys of the core tables: explicit and implicit.

Explicit Approach

In this approach the decision is delegated to the developer. Each developer decides whether the index should be partial or not at the time of its creation. This approach has the following benefits:

Implicit Approach

The partial indexes are created automatically during the update.database process giving us the following benefits:

If this approach is taken, it should also be considered if the indexes must be created for every nullable column (even if it already has an index) or just for those columns which do not have an index.

In the first case, the time spent for the update can grow significantly, because of the fact that some indexes would be recreated.

Further Considerations

Exporting Partial Index Information into XML

As mentioned before, Oracle does not index nullable columns, so the definition of the NOT NULL partial indexes does not apply in this case. But this definition needs to be stored somehow because otherwise when the index is exported from Oracle to the XML files it would lose the partial index information.

To solve this problem, one possible approach could be the same as the one currently used by DBSourceManager in order to keep the operator class information. This approach consists of store this kind of information into a table comment with the following format:

indexName1.indexColumn1.operatorClass=operatorClass1$indexName2.indexColumn2.operatorClass=operatorClass2$...

These comments are updated each time an index that defines an operator class is added or removed, both when it is done at the same time of the table creation or after it. This information must also not be taken into account when comparing indexes in Oracle in order to prevent unneeded index recreations.

The main limitation of this approach is that comments in Oracle has a limited size of 4000 characters. This would limit the number of partial indexes that can be created for every table.

For this reason, it should be considered the possibility of inserting information at column level, using a column comment. This way the number of indexes that could be defined per table would be high enough.

Exporting Changes on Index to a Configuration Script

It must be reviewed the ability of exporting the index definition within a Configuration Script. This way, it would be possible to revert an index definition in a custom template making the database index configuration more adaptable to the particularities of every environment.

Known Limitations

Due to the existing limitation in Oracle which prevents to index more than once the same set of columns, if we define two partial indexes for the same column but with a different where clause it will work in PostgresSQL but that XML model will not be valid in Oracle. This is a known issue reported here.

Retrieved from "http://wiki.openbravo.com/wiki/Projects:Support_Partial_Indexes_On_Nullable_Columns/Specs"

This page has been accessed 1,432 times. This page was last modified on 28 November 2016, at 17:00. Content is available under Creative Commons Attribution-ShareAlike 2.5 Spain License.