
Big DB/Technical Specifications



The final objective of this project is to create a database with enough data volume to verify that projects perform correctly and scale.


After discussion with several people in the team, the approach decided on is as follows:

[Figure: Big DB project decision flow]

Big DB vs Huge DB

The first topic discussed was DB size. The two options were a Huge DB, with millions of records in the transactional tables, and a Big DB, with hundreds of thousands of records.

A Huge DB, containing at least the maximum number of records we foresee for each use case, has the advantage of detecting potential performance problems early.

On the other hand, this approach has some disadvantages:

A merely Big DB is possible, or at least easier, to deploy to development environments, where it would be used to test projects during development. This allows potential issues to be discovered earlier.

If the Big DB is big enough, it should be sufficient to detect these problems. Technically, we need to ensure that transactional tables are never read sequentially.

Because of this, we think the best approach is a Big DB.
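One way to check the "never read sequentially" requirement is to inspect query plans. The following is only a minimal sketch under several assumptions: the database is PostgreSQL, the plan has already been obtained as the text output of `EXPLAIN`, and the class name, method name and table names are hypothetical. It reports which transactional tables appear in a `Seq Scan` node.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any existing code base.
class SeqScanCheck {
  // PostgreSQL text plans contain nodes such as "Seq Scan on c_orderline ol (cost=...)".
  // Note: with EXPLAIN VERBOSE the table is schema-qualified, which this simple
  // pattern does not handle.
  private static final Pattern SEQ_SCAN = Pattern.compile("Seq Scan on (\\w+)");

  /** Returns the transactional tables that the given plan reads sequentially. */
  static Set<String> sequentiallyReadTables(List<String> planLines,
      Set<String> transactionalTables) {
    Set<String> offenders = new TreeSet<>();
    for (String line : planLines) {
      Matcher m = SEQ_SCAN.matcher(line);
      if (m.find()) {
        String table = m.group(1).toLowerCase();
        if (transactionalTables.contains(table)) {
          offenders.add(table);
        }
      }
    }
    return offenders;
  }
}
```

A sequential scan on a small master-data table would be ignored by this check; only the tables passed in as transactional are reported.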

Real Customer Data vs Generated Data

The next decision was whether it would be appropriate to select some real customer data and obfuscate it to guarantee that no sensitive information is deployed.

The advantage of this approach is that it provides a real case that at least one real-life customer has. The problem is that, given the wide variety of use cases, it would focus development on solving that customer's concrete problems, while issues in other cases could go undetected.

On the other hand, if we are able to think of the different common scenarios, we should be able to generate dummy data covering them without focusing on a single case.

The selected approach is Generated Data.

From Scratch vs Cloning Templates

Once we have decided to generate our own data, the next question is whether this data can be generated completely from scratch or should be based on some "templates".

We think data generated by cloning templates will be of better quality. The idea is to create a set of sample template data consisting of the cases we identify as significant for each entity. These templates will be cloned to generate the actual DB data.

Metadata vs Process

Cloning existing templates requires some logic (e.g. changing the organization...). The transformations can be defined in AD or in code.

Due to the potential complexity of these transformations, we think the best approach is to code them in a Java process.



Processes are defined in AD (within a module). They define:

Processes are defined within a tree with a single root node (client?). This tree determines the dependencies for process execution. For example, Client should be the root node and Organization a child of it; this means Organization must be executed after Client because it depends on it.

Templates are defined as children of these processes. A template is a record of the entity defined as the base.
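The dependency rule described above (a parent process runs before its children) corresponds to a pre-order traversal of the process tree. The sketch below is only an illustration: `ProcessNode` and `ExecutionOrder` are hypothetical names, and the real tree would be read from AD rather than built in code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory representation of one node of the process tree.
class ProcessNode {
  final String name;
  final List<ProcessNode> children = new ArrayList<>();

  ProcessNode(String name) {
    this.name = name;
  }

  ProcessNode addChild(ProcessNode child) {
    children.add(child);
    return this;
  }
}

class ExecutionOrder {
  /** Pre-order traversal: every process is listed after the process it depends on. */
  static List<String> of(ProcessNode root) {
    List<String> order = new ArrayList<>();
    visit(root, order);
    return order;
  }

  private static void visit(ProcessNode node, List<String> order) {
    order.add(node.name); // the parent runs first
    for (ProcessNode child : node.children) {
      visit(child, order); // then each dependent process, depth first
    }
  }
}
```

With Client as root, Organization as its child and Sales Order as a child of Organization, the resulting order is Client, Organization, Sales Order.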

?? Technical difficulties:


Whenever a template is cloned, a relationship template <- clone is created (in-memory/persisted/both??).

Child processes can use these relationships to create their own clones. It should work something like:

There will be some base clone functionality able to clone a template following some basic rules. Processes can use this functionality and override some properties of the cloned object to apply their specific rules.
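As a purely hypothetical illustration of the template <- clone relationship, the sketch below keeps it in memory (whether it should also be persisted is left open above). `CloneRegistry`, `ChildCloneDemo` and the identifier format are assumptions made for the example: a child process creates one clone of its own template for every clone of the parent template and links the two.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store of the template <- clone relationship.
class CloneRegistry {
  private final Map<String, List<String>> clonesByTemplate = new HashMap<>();

  void register(String templateId, String cloneId) {
    clonesByTemplate.computeIfAbsent(templateId, k -> new ArrayList<>()).add(cloneId);
  }

  List<String> clonesOf(String templateId) {
    return clonesByTemplate.getOrDefault(templateId, Collections.emptyList());
  }
}

class ChildCloneDemo {
  /**
   * Clones a child template once per clone of its parent template.
   * Returns a map from each new child clone id to the parent clone id it belongs to.
   */
  static Map<String, String> cloneChildren(String childTemplateId, String parentTemplateId,
      CloneRegistry registry) {
    Map<String, String> parentByChildClone = new LinkedHashMap<>();
    int counter = 0;
    for (String parentCloneId : registry.clonesOf(parentTemplateId)) {
      String childCloneId = childTemplateId + "-clone-" + (++counter); // made-up id scheme
      registry.register(childTemplateId, childCloneId);
      parentByChildClone.put(childCloneId, parentCloneId);
    }
    return parentByChildClone;
  }
}
```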
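A minimal sketch of such base clone functionality follows. It assumes, only for the sake of a self-contained example, that an entity is represented as a map of property names to values; a real implementation would work on DAL objects instead. `BaseCloner` and the property names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical base cloning helper; entities are plain maps in this sketch.
class BaseCloner {
  /**
   * Copies all properties of the template, applies one basic rule (a new id),
   * and then lets the calling process override whatever it needs.
   */
  static Map<String, Object> cloneTemplate(Map<String, Object> template,
      Consumer<Map<String, Object>> overrides) {
    Map<String, Object> clone = new HashMap<>(template); // the template itself is not modified
    clone.put("id", UUID.randomUUID().toString()); // basic rule: every clone gets its own id
    overrides.accept(clone); // process-specific rules are applied last
    return clone;
  }
}
```

A process that needs to move the clone to another organization could call `BaseCloner.cloneTemplate(template, c -> c.put("organization", "Org-42"))`, leaving all other properties as they are in the template.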

Processes should work with DAL so that, at least, defaults are properly calculated in newer versions of pi.

Processes should be extensible by other modules, so that it is possible, for example, to create a Base Big DB Generator and, based on it, a Retail Big DB Generator. Retail might modify some processes (e.g. Sales Order) to include its own particularities.

When implementing these processes, performance also needs to be taken into account, especially memory management.

While these processes are executed, triggers are disabled, so that, for example, a completed sales order is cloned as completed without needing to execute the Order Post process afterwards.
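One possible way to obtain this kind of extensibility is plain subclassing with an overridable hook. The sketch below only illustrates that idea and does not describe how modules actually plug in; the class names, the map-based entity representation and the `posTerminal` property are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical process as it might be provided by a base generator module.
class SalesOrderProcess {
  Map<String, Object> cloneOrder(Map<String, Object> template) {
    Map<String, Object> clone = new HashMap<>(template);
    customize(clone); // extension point for other modules
    return clone;
  }

  /** Does nothing in the base module; extending modules override it. */
  protected void customize(Map<String, Object> clone) {
  }
}

// Hypothetical extension as it might be provided by a retail generator module.
class RetailSalesOrderProcess extends SalesOrderProcess {
  @Override
  protected void customize(Map<String, Object> clone) {
    clone.put("posTerminal", "POS-1"); // made-up retail-specific property
  }
}
```

The base cloning logic stays in one place, and the retail module only adds what is specific to it.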


This page has been accessed 1,492 times. This page was last modified on 24 April 2014, at 09:09. Content is available under Creative Commons Attribution-ShareAlike 2.5 Spain License.