Big DB/Technical Specifications
Introduction
The final objective of this project is to create a database with enough volume to ensure that projects perform correctly and are scalable.
Approach
After discussing with several people in the team, the approach to take was decided as follows:
Big DB vs Huge DB
The first topic discussed was DB size. The two options were a Huge DB, with millions of records in the transactional tables, versus just a Big DB, with hundreds of thousands of records.
A Huge DB, containing at least the maximum number of records we foresee for each use case, has the advantage of allowing potential performance problems to be detected early.
On the other hand, this approach has some disadvantages:
- It is more difficult to maintain.
- It makes it very difficult or impossible to deploy in development environments; it would consist of a single Huge DB instance used by CI to ensure projects perform correctly.
A just Big DB makes it possible, and easier, to deploy to development environments, where it would be used to test projects during the development stages. This allows potential issues to be discovered earlier.
If the Big DB is big enough, it should be sufficient to detect these problems. Technically, we need to ensure that transactional tables are never read sequentially, i.e. that queries always go through indexes (see the sketch below).
Because of this, we think the best approach is a Big DB.
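As an illustration, a minimal check like the following could be run against representative queries, assuming a PostgreSQL backend and a plain JDBC connection; the class name and the query passed in are examples, not part of this specification.

  import java.sql.Connection;
  import java.sql.ResultSet;
  import java.sql.Statement;

  /**
   * Minimal sketch (PostgreSQL only): runs EXPLAIN on a representative query
   * against a transactional table and reports whether the planner chose a
   * sequential scan instead of an index.
   */
  public class SeqScanCheck {

    public static boolean usesSequentialScan(Connection conn, String query) throws Exception {
      try (Statement st = conn.createStatement();
          ResultSet rs = st.executeQuery("EXPLAIN " + query)) {
        while (rs.next()) {
          // Each row of the EXPLAIN output is one line of the execution plan.
          if (rs.getString(1).contains("Seq Scan")) {
            return true;
          }
        }
      }
      return false;
    }
  }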
Real Customer Data vs Generated Data
The next decision was whether it would be correct to select some real customer data and obfuscate it to guarantee that no sensitive information is deployed.
The advantage is that this approach provides a real case that at least one real-life customer has. The problem is that, given the wide variety of use cases, this approach would focus development on solving the concrete problems of that customer, while other cases might simply not be detected.
On the other hand, if we are able to identify the different common scenarios, we should be able to generate dummy data covering them without focusing on a single case.
The selected approach is Generated Data.
From Scratch vs Cloning Templates
Once we have decided to generate our own data, the next discussion is whether this data can be generated completely from scratch, or based on some "templates".
We think the quality of data generated by cloning templates will be better. The idea is to create a set of sample template data consisting of the cases we identify as significant for each entity. These templates will be cloned to generate the actual DB data.
Metadata vs Process
Cloning existing templates requires some logic (e.g. changing the organization). The transformations can be defined in AD or through code.
Due to the potential complexity of these transformations, we think the best approach is to code them in a Java Process.
Process
Definition
Processes are defined in AD (within a module); they define:
- Name
- Qualifier: a String used by Java as a qualifier to select the process implementation (see the sketch at the end of this section).
- Magnitude (?): a factor that determines the number of clones to be created for each template. Open question: should a subprocess apply this magnitude for each execution of its parent process?
- Base entity: it defines the type of entity used for templates (e.g. an Order Generator Process would define C_Order here).
Processes are defined within a tree with a single root node (Client?); this tree determines the dependencies for process execution. For example, Client would be the root node and Organization a child of it, meaning Organization must be executed after Client because it depends on it.
Templates are defined as children of these processes. A template is a record of the entity defined as the base entity.
Technical difficulties (open questions):
- In the DB, the template table cannot enforce FK integrity because it links to different tables.
- It would be great to be able to visualize it correctly in the UI and to create a generic selector for different entities.
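As an illustration of how the AD Qualifier string could map to Java, here is a minimal sketch; the annotation and class names are assumptions, and in practice the platform's own dependency injection qualifiers could be used instead.

  import java.lang.annotation.ElementType;
  import java.lang.annotation.Retention;
  import java.lang.annotation.RetentionPolicy;
  import java.lang.annotation.Target;

  /**
   * Hypothetical qualifier annotation: the Qualifier string defined in AD is
   * matched against this value to select the Java implementation of a process.
   */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface GeneratorQualifier {
    String value();
  }

  /** Hypothetical implementation resolved through its qualifier. */
  @GeneratorQualifier("OrderGenerator")
  class OrderGeneratorProcess {
    // The base entity declared in AD for this process would be C_Order.
  }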
Implementation
Whenever a template is cloned, a template <- clone relationship is created (in memory, persisted, or both??).
Child processes can use these relationships to create their own clones. It should work something like the following (see also the sketch after this list):
- BPa is a template for the Business Partner clone process, which creates 100 clones per template.
- SOa is a template for the Sales Order clone process, which creates 50 clones per template. This template refers to BPa. The Sales Order clone process needs to be aware of all the clones created for BPa, so that the SOa clones can refer to any of them.
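A minimal in-memory sketch of these relationships (class and method names are assumptions); whether the mapping is also persisted is still an open question, as noted above.

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.Random;

  /**
   * Minimal in-memory registry of the template -> clones relationship, so that
   * a child process (e.g. Sales Order) can pick among the clones created for
   * the template its own template refers to (e.g. BPa).
   */
  public class CloneRegistry {
    private final Map<String, List<String>> clonesByTemplate = new HashMap<>();
    private final Random random = new Random();

    /** Called by the parent process every time it creates a clone. */
    public void register(String templateId, String cloneId) {
      clonesByTemplate.computeIfAbsent(templateId, k -> new ArrayList<>()).add(cloneId);
    }

    /** Returns a random clone of the given template, e.g. a BPa clone for a new SOa clone. */
    public String anyCloneOf(String templateId) {
      List<String> clones = clonesByTemplate.get(templateId);
      return clones.get(random.nextInt(clones.size()));
    }
  }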
There will be some base clone functionality able to clone a template following basic rules. Processes can make use of this functionality, overwriting some properties of the cloned object to apply their specific rules.
Processes should work with the DAL; this way, at least defaults are properly calculated in newer versions of pi.
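A sketch of what this base clone functionality could look like on top of the DAL; BaseCloneProcess and adjustClone are assumed names, and the use of DalUtil.copy as the deep-copy mechanism is an assumption as well.

  import org.openbravo.base.structure.BaseOBObject;
  import org.openbravo.dal.core.DalUtil;
  import org.openbravo.dal.service.OBDal;

  /** Sketch of the base clone functionality; names and the copy mechanism are assumptions. */
  public abstract class BaseCloneProcess {

    /** Clones one template and lets the concrete process adjust the copy before saving it. */
    public BaseOBObject cloneTemplate(BaseOBObject template) {
      // Deep copy of the template through the DAL object model (assumption:
      // DalUtil.copy is suitable here; otherwise the copy can be done property by property).
      BaseOBObject clone = DalUtil.copy(template);
      // Hook for the concrete process (e.g. Sales Order) to apply its specific rules.
      adjustClone(clone);
      OBDal.getInstance().save(clone);
      return clone;
    }

    /** Concrete processes overwrite properties of the cloned object here. */
    protected abstract void adjustClone(BaseOBObject clone);
  }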
Processes should be extendable by other modules, so it is possible, for example, to create a Base Big DB Generator and, based on it, a Retail Big DB Generator. Retail might modify some processes (e.g. Sales Order) to include its own particularities.
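Building on the BaseCloneProcess sketch above, a Retail extension could look like this (all class names are assumptions):

  import org.openbravo.base.structure.BaseOBObject;

  /** Hypothetical base-module implementation (see the BaseCloneProcess sketch above). */
  class SalesOrderCloneProcess extends BaseCloneProcess {
    @Override
    protected void adjustClone(BaseOBObject clone) {
      // Base Sales Order rules would be applied here.
    }
  }

  /** Hypothetical Retail module extension: it reuses the base rules and only adds its own. */
  public class RetailSalesOrderCloneProcess extends SalesOrderCloneProcess {
    @Override
    protected void adjustClone(BaseOBObject clone) {
      super.adjustClone(clone); // keep the base Sales Order rules
      // Then overwrite the Retail-specific properties; the property and value
      // below are purely illustrative.
      clone.set("description", "Retail generated clone");
    }
  }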
When implementing these processes, performance also needs to be taken into account, especially memory management.
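For example, flushing and clearing the DAL session in batches keeps the number of objects held in memory bounded; the batch size, class name and loop structure below are assumptions, and BaseCloneProcess is the sketch from above.

  import java.util.List;

  import org.openbravo.base.structure.BaseOBObject;
  import org.openbravo.dal.service.OBDal;

  /** Sketch of batched cloning: flush and clear periodically so clones do not pile up in memory. */
  public class BatchedCloning {
    private static final int BATCH_SIZE = 100; // assumption, to be tuned

    public void cloneAll(BaseCloneProcess process, Class<? extends BaseOBObject> entityClass,
        List<String> templateIds, int magnitude) {
      int created = 0;
      for (String templateId : templateIds) {
        for (int i = 0; i < magnitude; i++) {
          // Re-read the template by id so it is attached to the current session
          // even after a previous clear().
          BaseOBObject template = OBDal.getInstance().get(entityClass, templateId);
          process.cloneTemplate(template);
          if (++created % BATCH_SIZE == 0) {
            // Send the pending inserts to the DB and detach them from the session.
            OBDal.getInstance().flush();
            OBDal.getInstance().getSession().clear();
          }
        }
      }
      OBDal.getInstance().flush();
    }
  }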
While these processes are executed, triggers are disabled, so, for example, a completed sales order is cloned as completed without the need to execute the Order Post process afterwards.
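A sketch of the intended pattern; TriggerControl is a hypothetical abstraction over whatever mechanism the platform provides to disable and re-enable DB triggers.

  /**
   * Sketch of the trigger handling pattern; TriggerControl is a hypothetical
   * wrapper around the platform mechanism that disables DB triggers.
   */
  public class TriggerAwareCloning {

    interface TriggerControl {
      void disable();
      void enable();
    }

    public void runWithTriggersDisabled(TriggerControl triggers, Runnable cloneWork) {
      triggers.disable();
      try {
        // With triggers disabled, a clone is inserted exactly as its template,
        // e.g. a completed sales order stays completed without re-posting it.
        cloneWork.run();
      } finally {
        // Always re-enable triggers, even if cloning fails.
        triggers.enable();
      }
    }
  }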