Projects:Support Indexes To Speed Up Contains Search/Specs
The main goal of this project is to provide a mechanism that helps to reduce the time spent in those database searches that look for a particular sub-string in a text column.
Examples of these kind of queries are:
-- CONTAINS SEARCH SELECT value FROM c_bpartner WHERE name LIKE '%ali%'
-- ICONTAINS SEARCH (case insensitive) SELECT value FROM c_bpartner UPPER(NAME) LIKE UPPER('%ali%')
The technical requirement behind this project is to include into DBSourceManager the ability to import and export indexes that helps to speed up those database queries which make use of the contains (LIKE/ILIKE) operators.
Index Model Implementation
The indexes intended for fast searching in contains queries will be included as part of the XML model of the database.
Unlike function based indexes and partial indexes there will not be any implementation detail (operator class, where clause) present in the XML definition. Instead, a new attribute called containsSearch will be included in the index XML definition as a flag for this kind of indexes.
The containsSearch property will only be included within the XML definition of the index when it is true. Therefore, it will be displayed just for the new indexes created for this purpose and it will not affect to the definition of the existing indexes.
The implementation of this kind of indexes depends on the platform (Oracle or PostgresSQL) and therefore it is going to be different on each case.
To implement indexes for contains search indexes the pg_trgm extension has been chosen. It provides functions and operators for determining the similarity of ASCII alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.
The main reasons for choosing this extension are:
- It is recommended when you want to speed up LIKE, ILIKE, ~ or ~*.
- It is a feature included as an official extension of PostgresSQL.
- It will be very feasible to integrate with the current Openbravo infrastructure.
In order to include this extension as part of the distribution we will need to:
- Create the extension using the PostgreSQL pre-script.
- Exclude from the model the functions installed with the extension by using an exclude filter.
The pg_trgm extension provides two different index access methods: GIN and GIST. Based on the statistics extracted with some performance tests with high-volumes (see here) we are choosing GIN as the access method for our contains search indexes.
For this reason developers will be able to define an index as follows:
CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
And DBSourceManager should be able to export and import it automatically as a contains search index.
With this kind of indexes we will be able to improve the searching on queries that use the LIKE operator (contains).
In order to improve case insensitive searching (icontains), it will be possible to define function based indexes as contains indexes. Thus, we will be able to combine the indexing of the UPPER()/LOWER() expressions together with the usage of the GIN access method with its particular operator class.
The alternative in Oracle is called Oracle Text. It provides three different types of indexes: CTXCAT, CONTEXT AND CTXRULE. To use them, it is necessary to grant access to the CTX_DDL package.
The index type CTXCAT would be the closest alternative to pg_trgm, but the main problem is that it forces to use the CATSEARCH function instead of the standard LIKE operator. Thus, a CTXCAT based index will not be used by the Openbravo queries transparently, and we would need to change them (including DAL generated ones).
The effort to change the mentioned queries implies a new project by itself, for this reason the Oracle alternative will not be implemented for the moment.
So, this kind of indexes will not be implemented yet in Oracle but still we need to store their definition somehow because otherwise when the index is exported from Oracle to the XML files it would lose the containsSearch property information.
To solve this problem, we will store this information into a table comment with the following format:
The reason of choosing table comments instead of column comments is because this kind of indexes can use virtual columns (function based) and therefore it does not exists a physical column in the database that we can refer to. This is the very same solution that we are using with function based indexes.
- As mentioned before, indexes for contains searching in Oracle are currently not supported: they will be created as a regular index in Oracle.
- Due to the existing limitation present in Oracle which prevents to index more than once the same set of columns, if we define two different indexes (one of them flagged with the containsSearch property as true) for the same column, the model will be valid in PostgresSQL but not in Oracle. This is a known issue reported here.