Monitoring and Managing Synchronization Errors
In a synchronized multi-tier architecture that allows database modifications on all ends, synchronization errors are bound to happen.
This document describes the tools available to monitor the synchronization errors, as well as the processes available to manage them.
In addition to synchronization errors, you can also view ongoing SymmetricDS operations through several incoming and outgoing batch and data windows.
Monitoring Synchronization Errors from the Central Server
In general, all synchronization errors can be monitored from the central server. There are two data flow directions in the Openbravo multi-store-server environment:
- from the Central Server to the Store Servers
- from the Store Servers to the Central Server
Most replication flows from the central server to the store servers. In both directions, the data flow and potential errors can be monitored from the central server.
Errors are shown in two ways:
- Synchronization Error window: covering the data flow from the Central Server to the Store Servers
- Symmetric DS Incoming Batches window and Symmetric DS Incoming Errors window: covering the data flow from the Store Servers to the Central Server
Together, these windows give a complete view of all the data flows within the Openbravo multi-server architecture.
Synchronization Errors Halt Further Replication: Restarting Synchronization
The central server will retry a replication several times. If the failure persists, the replication is halted. This means that even after the synchronization error is resolved, the replication must be restarted explicitly.
The Registered Server window shows whether replication is halted for a store server. This window also allows stopping and restarting the replication.
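The retry-then-halt behaviour can be pictured with a toy model. This is only an illustrative sketch in Python, not the SymmetricDS implementation: the ReplicationLink class and its methods are hypothetical, and the retry limit of 3 is a placeholder because the actual number of retries is not stated here.

```python
class ReplicationLink:
    """Toy model of replication to one store server (hypothetical)."""

    def __init__(self, max_attempts=3):
        # Placeholder value: the real retry count is not documented here.
        self.max_attempts = max_attempts
        self.failed_attempts = 0
        self.halted = False

    def replicate(self, send):
        """Try to send once; halt after max_attempts consecutive failures."""
        if self.halted:
            return False  # stays halted even if the cause has been fixed
        if send():
            self.failed_attempts = 0
            return True
        self.failed_attempts += 1
        if self.failed_attempts >= self.max_attempts:
            self.halted = True
        return False

    def restart(self):
        """Explicit restart, as done from the Registered Server window."""
        self.failed_attempts = 0
        self.halted = False


link = ReplicationLink(max_attempts=3)
for _ in range(3):
    link.replicate(lambda: False)      # persistent failure
print(link.halted)                     # True
print(link.replicate(lambda: True))    # False: error fixed, but still halted
link.restart()
print(link.replicate(lambda: True))    # True
```

The point of the model is the middle step: once halted, a successful send is not even attempted until the restart is issued.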
Managing Errors When a Store Server Is Registered
The first time a store server is started, it will try to register itself in the central server. Data will not be synchronized between the central and the store server until the store server is registered.
There are two ways to check whether a store server has been registered:
- Look in the Registered Server window in the central server, or
- Run this query in the central server:
SELECT node_id FROM sym_node;
If the store server is registered, then its mobile server key will be included in the list returned by the query.
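The query-based check can also be scripted. The following is a minimal sketch in Python, assuming a DB-API connection to the central server database. An in-memory SQLite database with a one-column stand-in for sym_node is used here only so that the example runs on its own; the function name is_registered and the sample keys Central and Store2 are hypothetical.

```python
import sqlite3


def is_registered(conn, server_key):
    """Return True if server_key appears among the node ids in sym_node."""
    cur = conn.cursor()
    cur.execute("SELECT node_id FROM sym_node")
    node_ids = {row[0] for row in cur.fetchall()}
    return server_key in node_ids


# Stand-in database for illustration only; a real check would connect
# to the central server's database instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sym_node (node_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO sym_node VALUES (?)", [("Central",), ("Store1",)])

print(is_registered(conn, "Store1"))  # True
print(is_registered(conn, "Store2"))  # False
```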
These are the most common causes of a store server not registering properly:
The store server has not been explicitly given registration access
In order to be allowed registration, the Can Be Registered flag of the store server must be checked in the Mobile Server window of the Central Server. If it is not checked, the store server will not be allowed registration and the following messages will be shown in its log:
50151 [main] INFO org.jumpmind.symmetric.service.impl.RegistrationService - This node is unregistered. It will attempt to register using the registration.url
50155 [main] INFO org.jumpmind.symmetric.service.impl.DataLoaderService - Using registration URL of http://localhost:8070/openbravo/org.openbravo.replication.symmetricds/sync/registration?nodeGroupId=StoreServer&externalId=Store1&syncURL=http%3A%2F%2Flocalhost%3A8090%2Fopenbravo-store%2Forg.openbravo.replication.symmetricds%2Fsync&schemaVersion=%3F&databaseType=PostgreSQL&databaseVersion=9.4&symmetricVersion=development&hostName=por0942&ipAddress=192.168.1.104
50167 [main] WARN org.jumpmind.symmetric.service.impl.DataLoaderService - Registration attempt failed. Registration was not open
These messages would be shown in the log of the central server:
123066 [http-8070-2] WARN org.jumpmind.symmetric.service.impl.RegistrationService - Registration not allowed for StoreServer:Store1:? because The server with search key Store1 was not allowed to register, its registration has not been opened in the Mobile Servers window
123067 [http-8070-2] WARN org.jumpmind.symmetric.web.RegistrationUriHandler - StoreServer:Store1:? was not allowed to register.
To fix it, open the ERP in the central server, open the Mobile Server window, select the entry that belongs to the store server that is trying to register, and check the Can Be Registered flag. The next time the store server tries to register (for instance, when its Tomcat is restarted), it will be allowed to register.
The URLs of the servers are not properly set
If the URLs of the servers are not properly set in the Mobile Server window, the store server will not be able to register in the central server. For instance, if the URL of the central server were not properly defined, an error like this would be shown in the log of the store server when it tries to register:
19511 [main] INFO org.jumpmind.symmetric.service.impl.RegistrationService - This node is unregistered. It will attempt to register using the registration.url
19526 [main] INFO org.jumpmind.symmetric.service.impl.DataLoaderService - Using registration URL of WRONG_URL/org.openbravo.replication.symmetricds/sync/registration?nodeGroupId=StoreServer&externalId=Store1&syncURL=http%3A%2F%2Flocalhost%3A8090%2Fopenbravo-store%2Forg.openbravo.replication.symmetricds%2Fsync&schemaVersion=%3F&databaseType=PostgreSQL&databaseVersion=9.4&symmetricVersion=development&hostName=por0942&ipAddress=192.168.1.104
22671 [main] WARN org.jumpmind.symmetric.service.impl.DataLoaderService - Could not connect to the transport because the host was unknown: WRONG_URL
If the URLs were not properly defined only in the store server, fix them and restart Tomcat in the store server. If they were not properly defined in the central server, the current recommendation is to uninstall SymmetricDS from both servers, fix the URLs, and start from scratch.
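Both registration problems leave a recognizable WARN line in the store server log, so a small script can tell them apart. The sketch below is in Python and matches only the two messages quoted in the sample logs above; other SymmetricDS versions may word them differently. The function name diagnose and the cause labels are hypothetical.

```python
import re

# Patterns taken from the sample log messages shown above; other
# SymmetricDS versions may word these messages differently.
CAUSES = [
    ("registration not open", re.compile(r"Registration was not open")),
    ("unknown host", re.compile(r"the host was unknown: (\S+)")),
]


def diagnose(log_lines):
    """Return (cause, detail) pairs for WARN lines matching a known pattern."""
    findings = []
    for line in log_lines:
        if " WARN " not in line:
            continue
        for cause, pattern in CAUSES:
            match = pattern.search(line)
            if match:
                # Only the unknown-host pattern captures a detail (the bad host).
                detail = match.group(1) if pattern.groups else ""
                findings.append((cause, detail))
    return findings


sample = [
    "50151 [main] INFO org.jumpmind.symmetric.service.impl.RegistrationService - This node is unregistered. It will attempt to register using the registration.url",
    "50167 [main] WARN org.jumpmind.symmetric.service.impl.DataLoaderService - Registration attempt failed. Registration was not open",
    "22671 [main] WARN org.jumpmind.symmetric.service.impl.DataLoaderService - Could not connect to the transport because the host was unknown: WRONG_URL",
]
print(diagnose(sample))
```

A "registration not open" finding points to the Can Be Registered flag, while an "unknown host" finding points to the URLs configured in the Mobile Server window.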
The Synchronization Error window
The Synchronization Error window lists all the current outgoing batches that have not been properly imported in the store servers due to synchronization errors.
The following information is shown per synchronization error:
- Node ID: The mobile.server.key of the store server that could not import the batch
- Channel: The channel used to synchronize the data
- DB Table Name: The table where the data couldn't be imported
- Event Type: The event type that triggered the synchronization
- SQL Error Message: The SQL error message obtained when trying to import the data in the store server
- New Data: The data that could not be imported in the store server
- Old Data: The contents of the inserted/updated/deleted row before the modification that triggered the synchronization event.
The window contains a subtab called Data Entries in Batch that lists all the data entries included in the outgoing batch that could not be imported. The information contained in this tab can be useful when deciding how to manage the synchronization error.
How to Manage Synchronization Errors
The Synchronization Error window offers two processes to manage synchronization errors. In addition, in some cases the synchronization error can be resolved by reloading a missing record directly to the store server.
Note: after resolving the synchronization error, the synchronization process for the store server must be restarted through the Registered Server window.
Ignore Whole Batch
This process flags the outgoing batch as 'Completed' in the central server, so it will no longer be sent to the store server. Note that none of the data entries contained in the batch (shown in the Data Entries in Batch subtab) will be synchronized in the store server. If the whole batch can't be ignored, the following process may be useful:
Ignore Data in Batch
This process will ignore only the data entry that caused the synchronization error. The outgoing batch will be rebuilt with the remaining data entries and sent to the store server.
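The difference between the two processes can be pictured with a small model. This is an illustrative sketch in Python only, not the Openbravo or SymmetricDS implementation: the Batch class, its fields, the two functions, and the sample ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    """Hypothetical stand-in for an outgoing batch in error."""
    batch_id: int
    data_ids: List[int]
    failed_data_id: int
    status: str = "Error"


def ignore_whole_batch(batch):
    """Flag the batch as completed; none of its entries reach the store server."""
    batch.status = "Completed"
    return []  # nothing left to send


def ignore_data_in_batch(batch):
    """Drop only the failing entry and return the entries to be resent."""
    return [d for d in batch.data_ids if d != batch.failed_data_id]


batch = Batch(batch_id=42, data_ids=[101, 102, 103], failed_data_id=102)
print(ignore_data_in_batch(batch))  # [101, 103]
print(ignore_whole_batch(batch))    # []
print(batch.status)                 # Completed
```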
Usually, ignoring the problem (be it the whole batch or just part of it) does not address its root cause. Most of the time, synchronization errors are caused by problems in the configuration of the segmentation. Try to understand what happened and why the given record could not be synchronized in the target database (the table name and the error message should come in handy). If the problem is caused by a segmentation error, fix it, then uninstall SymmetricDS from both servers and start from scratch.
A synchronization error can be caused by a missing record in the store server. The missing record can be directly replicated to the store server through the reload remote table window.
Note: set the WHERE clause so that it selects only the record to load into the store server.
Openbravo also provides views, from the receiving (i.e. incoming) side, of any errors that occur on that side. Two windows are important here:
- Symmetric DS Incoming Batches window
- Symmetric DS Incoming Errors window
Errors on the receiving side are shown in several ways:
- on the receiving (target) side the incoming batch window will show the incoming batch record in error status
- on the receiving side an incoming error record is created with the details of the error
- on the outgoing (source) side the related outgoing batch will show an error status
The incoming and outgoing sides use the same batch id, which makes it possible to analyze the cause from both sides and link the related information.
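Because both sides share the batch id, records gathered from each side can be matched on it. The sketch below is a minimal Python illustration, assuming the rows have already been read from each server (for example, exported from the two windows) and that all rows concern a single store server, since batch ids from different sending nodes could otherwise collide. The function name, the dictionary keys, and the sample values are hypothetical.

```python
def pair_by_batch_id(incoming_rows, outgoing_rows):
    """Pair incoming and outgoing batch records that share a batch id."""
    outgoing_by_id = {row["batch_id"]: row for row in outgoing_rows}
    pairs = []
    for inc in incoming_rows:
        out = outgoing_by_id.get(inc["batch_id"])
        if out is not None:
            pairs.append((inc, out))
    return pairs


# Made-up sample rows for illustration only.
incoming = [
    {"batch_id": 7, "status": "error", "sql_message": "foreign key violation"},
    {"batch_id": 8, "status": "ok", "sql_message": ""},
]
outgoing = [
    {"batch_id": 7, "status": "error"},
    {"batch_id": 9, "status": "ok"},
]

for inc, out in pair_by_batch_id(incoming, outgoing):
    print(inc["batch_id"], inc["sql_message"], out["status"])
```

Only batch 7 appears on both sides in this sample, so it is the only pair reported.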
The screenshot below shows an example of an incoming batch error:
The incoming batch window shows all the information from the Symmetric DS Incoming_Batch table.
The Symmetric DS Incoming Errors window then provides additional details. It shows the actual record update, delete, or insert that caused the issue, in addition to the SQL error message.
The window shows all the information from the Symmetric DS Incoming_Errors table.
To analyze incoming errors, the following steps can be taken:
- Receiving/Target side: In the incoming batch window, check the SQL error message, note the node id (which identifies the store that is sending the data), and note the batch id
- Receiving/Target side: In the incoming error window, you can see the exact record, with its data, that causes the error
- Sending/Source side: visit the store server and open the Synchronization Error window.
Depending on the error, different actions should be taken. For example, if it is a foreign key exception (data was not replicated to the central server from the store server), one approach is to ignore this batch, touch the missing record (the one the foreign key refers to) again by making a simple manual change to it, and then touch the previously replicated record again.