
Finding Performance Issues


Overview

This document targets people with a technical background, who are not necessarily developers.

Often it is easy to notice that the application has performance issues, because it behaves more slowly than usual or a process takes longer to finish than it should. However, it is usually not so easy to identify where the problem is. This article aims to be a guide to help find the pieces causing these issues.

This article is a guide to finding the problem; the actual fix is out of the scope of the document. Even so, the information gathered to pinpoint the issue can be really helpful when looking for a proper solution.

All the issues this article discusses are on the server side, so it will not be helpful if the performance problems are in the network or on the client side.

Finding the issue

When a performance issue is detected, these are the steps to take in order to find what is causing it.

Find performance issues flow.png: flowchart with the steps Monitor, Java Monitor, DB Monitor, Thread dump and Profile

Usually, there is more than one performance problem. In that case, this flow should be executed several times to find and fix each of them, starting with the biggest ones.

Monitor

Openbravo is a web application served from a web container (Tomcat); most of the business logic is implemented in Java, and data is stored in a database (PostgreSQL or Oracle). Backend performance problems can therefore be on either of these two sides: Java or the database (DB).

The first thing to do is to identify whether the problem is in the DB or in Java. A first approach is to understand where the system resources are being consumed.

top is a Unix utility that shows which processes are running and how much CPU and memory they use.

This is the output it generates:

Top.png

The important columns to look at are PID (the process identifier, needed later by other tools), USER (the owner of the process), %CPU and %MEM (the resources each process is consuming) and COMMAND (which allows telling the java/Tomcat process apart from the postgres or oracle processes).


Depending on the outcome of this monitoring, the next step is to focus on Java or on the DB.
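
As an alternative to the interactive top view, a one-shot listing sorted by CPU can be taken from the command line. The following is just a minimal sketch, assuming a Linux system with procps, useful to quickly see whether the java (Tomcat) process or the postgres processes are leading the consumption:

ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n 10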

Java Monitor

Once it is known that the problem is on the Java side, the next step is to determine which process is causing the performance degradation.

On some occasions the culprit is directly known, when the rest of the application is working acceptably but:

In either of these two cases, it is possible to directly profile the badly performing process.

When there is high CPU consumption, or the overall application performance degrades over time, but it is not known what is causing it, some monitoring can be done on the application.

There are different approaches to do so: using a tool that facilitates the task, or using from the command line the tools the JDK provides.

VisualVM

VisualVM 1.3.4 058.png

VisualVM is a visual tool that integrates several command-line JDK tools and lightweight profiling capabilities. There are many other similar tools for this purpose; we focus on this one because it is included within the JDK. Note that the purpose of this document is to give an overview of the tools; for a comprehensive guide, please visit the site of each of them.

This tool facilitates monitoring and profiling, and it can be used in both development and production environments because the overhead it adds to the application is small.

Configuration

When running the application in a development environment, or on a server capable of displaying graphics, VisualVM can be run locally. To attach to the application to profile, you only need to select it in the Applications tab.

To attach to an application running on a remote server, there are different approaches; one of them is sketched below.
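
For illustration, one possible approach (an assumption here, using the standard JMX remote system properties; the port and security settings must be adapted to each environment) is to start the Tomcat JVM with JMX remote access enabled:

# insecure settings (no authentication, no SSL): use only on a trusted network
export CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote.port=8999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"

Then, in VisualVM, use File > Add JMX Connection and enter host:8999.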

What to look at

Usually, the problems come from one of two sources: memory or CPU consumption. Depending on which of these two cases the problem resides in, the action to take is different.

If using VisualVM (or any other graphical tool), the Monitor tab shows graphs for CPU and memory consumption.

When memory issues are detected, the recommended step is to take a Memory dump.

If a visual tool is not available, this analysis can be done from the command line. The drawback is that it is not possible to determine how much of the CPU consumed by the java process is spent in GC and how much in actual process execution.

The stacks of the threads currently in execution can be obtained by executing jstack PID, where PID is the one shown in top. Note this command must be executed as the same user that launched the java process. If this user cannot log in, the root user can execute su -s /bin/sh user -c 'jstack PID'. Note this gives just a snapshot of the current threads, but not an idea of how much CPU is consumed by each of them.
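
A rough way to work around this limitation, sketched here assuming a Linux system with procps and a HotSpot JVM, is to list the per-thread CPU usage, convert the id of the hottest thread to hexadecimal and look for that value in the nid= field of the jstack output:

top -H -p PID                               # per-thread view; note the TID of the thread consuming CPU
printf '%x\n' TID                           # convert the decimal TID to hexadecimal
jstack PID | grep -A 20 "nid=0x<hex value>" # stack of that particular thread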

To determine the current heap usage, the following command can be used: jmap -heap PID. This command requires to be executed as root.

Thread dump

A thread dump provides the stacks of all threads in execution at a certain moment, but it does not show how CPU usage is shared among them nor how long each of them has been in execution. It can help when profiling is not possible on a production server. A way of trying to identify the slow parts is to take several dumps in short periods of time (one per second, for example) and compare them, looking for methods that appear in all of them; these methods are probably taking long to finish, so they are candidates for causing performance issues.

Thread dumps are generated quickly so taking them in a production instance does not damage performance.

They can be obtained with VisualVM, in the Threads tab, by clicking on the Thread Dump button.

To get them from the command line, use jstack -F -m -l PID > filename, where filename is the name of the file that will contain the dump.
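
As an illustration of the comparison technique described above, a minimal sketch (assuming a Linux shell and that the current user owns the java process) that takes one dump per second into separate files and then counts the most repeated stack frames:

for i in $(seq 1 10); do
  jstack PID > dump-$i.txt   # one snapshot per iteration
  sleep 1
done
# rough count of which Openbravo stack frames appear most often across the dumps
grep -h "org.openbravo" dump-*.txt | sort | uniq -c | sort -rn | head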

Profile

VisualVM, like many other tools, allows sampling of CPU usage (from the Sampler tab). When sampling is active, it periodically takes thread dumps and, based on this information, it is able to approximately determine the time spent in each method. After some time of sampling, it is possible to take a snapshot that facilitates navigation through threads and methods. In this way it can be known where the CPU resources are spent during some lapse of time.

Memory dump

When the problems are caused by an excessive allocation of memory, the best way to analyze them is to create a memory dump. This dump contains a snapshot of the heap memory at a concrete moment. Note that taking a dump is expensive in terms of time (it can last up to 2 minutes) and, while it is in progress, the JVM is frozen, so the application cannot be used. Therefore, it must be used cautiously in production environments, but it is highly recommendable to take one in case the application becomes unusable because of heap problems.
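
As an optional complement (not required by the steps below), the HotSpot JVM can be told to write a heap dump automatically when an OutOfMemoryError occurs, which is useful when the problem is hard to reproduce; a sketch of the Tomcat JVM options, assuming /tmp has enough free space for the dump:

CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"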

Getting the dump

It can be obtained from VisualVM in the Monitor tab, with the Heap Dump button. Note that by default VisualVM opens the dump just after creating it; this is not necessary. Once created, you can save it (in the left panel, right-click on the new snapshot and Save As) and close VisualVM, as opening the dump also requires a lot of memory as well as some time.

It can also be taken from the command line with the following instruction: jmap -dump:format=b,file=filename.hprof PID (note there is no blank space after format=b,).

The size of the file is similar to the actual heap size in use at that moment. If the file is going to be transferred to be analyzed on another computer, it is recommendable to compress it.

Similarly to the other commands explained above, only the user that owns the java process or root can get the dump. In some cases, when dumping as root, the -F parameter is required; note that -F takes much more time to execute and does not include thread stacks.

Depending on the Openbravo installation, the user that executes tomcat can vary. For example, if it runs as the openbravo user:

$ sudo su - openbravo
$ jmap -dump:format=b,file=filename.hprof PID

or, if it runs as a tomcat user that cannot log in:

$ su -
# su - tomcat -s /bin/bash -c "jmap -dump:format=b,file=filename.hprof PID"

Analyzing it

Mat.png

A very good tool to analyze a memory dump is Eclipse Memory Analyzer (MAT).

When opening an existing dump (File > Open Heap Dump), MAT processes it to collect its information. This process takes some minutes to complete (depending on the size of the dump) and it requires at least the same amount of heap as the dump uses. The maximum heap size for MAT is defined in the MemoryAnalyzer.ini file. The next time the dump is opened, this processing is not required, so it is fast and doesn't need so much memory.
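
For example, a sketch of the relevant lines in MemoryAnalyzer.ini (the value is an assumption; it should be at least the size of the dump to open): the -Xmx argument placed after the -vmargs line controls the heap available to MAT.

-vmargs
-Xmx4g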

The following sections briefly explain some of the most useful features MAT has.

Overview

This first window shows a diagram that visualizes, at a single glance, the biggest objects in memory. If a few objects are consuming most of the memory, they are clear candidates for doing something wrong.

Dominator tree

The dominator tree allows easily identifying the biggest chunks of retained memory and the keep-alive dependencies among objects. In this tree view, parent objects dominate their children, meaning that if the parent were released, all its children could be collected by the garbage collector.

It is important to differentiate between shallow and retained heap. Shallow heap is the memory directly consumed by one object; retained heap is the shallow heap of the object plus the retained heap of all its children in the dominator tree, that is, all the memory that would be freed by the GC when the object is garbage collected.

Thread stacks
Bulbgraph.png   Memory dumps obtained using jmap with the -F parameter do not contain thread stack information
Mat-thread-stack.png

Memory dumps can also contain information about threads! This is very useful to understand which process is generating the big object shown in the dominator tree.

The Thread Stacks view can be opened from Mat-qry-brwsr.png Open Query Browser > Java Basics > Thread Stacks. It displays a stack trace of all threads that were running when the dump was taken, together with the retained heap of each of them. Going down the stack, the values of the local variables of each method are also shown.

Let's go through a very common example (see the image on the right). We can see:

  1. there is a thread retaining 1.5GB
  2. it is performing a DAL flush. Flushes can be invoked programmatically, but they are also automatically called at the end of each request; the time and resources they consume depend on how much memory DAL is using at that moment,
  3. this case is the latter one: the flush is invoked on request finalization, as can be seen because the DalRequestFilter.doFilter method is in the thread stack,
  4. at this point, we want to know which servlet was called and which parameters it received. The DalRequestFilter.doFilter method has a local variable of the RequestFacade class containing the Request object,
  5. this object has a requestDispatcherPath object with a strValue variable that contains the path (URL) of the servlet; its value can be seen by selecting it in the left panel, Value tab. In this case it is /org.openbravo.service.datasource/Invoice, which means the thread retaining 1.5GB is a call to the Invoice datasource. But which parameters did it receive to become so big?
  6. the request object also has the answer to that question: its postData variable contains all the parameters passed in the POST request to the servlet. Selecting it, the left panel will show the contents.

As mentioned above, this case is quite typical: a process working with DAL is making poor use of memory. The Data Access Layer Performance article explains how DAL processes can manage memory better.

DB Monitor

When the problems are on the database side, it is necessary to understand which queries are performing badly. The actions to take to determine this depend on the database in use.

PostgreSQL

Logging slow queries

PostgreSQL can log all queries taking longer than a minimum time, including the values of their parameters. This is very useful to understand which are the slowest queries, how much time they take to execute and how often they are run.

This is defined with the log_min_duration_statement parameter in the postgresql.conf file (the location of this file depends on the system installation). The value assigned to this parameter is an integer number indicating, in milliseconds, the minimum duration above which queries are logged. The threshold to use depends on the system, but 1000 (logging queries longer than 1 second) is a good rule of thumb to start with. Also note that it logs "wall time", not CPU time, so if a lock is affecting a query, that query will appear as slow even though in other circumstances it wouldn't have any problem, and the query to fix might be the one causing the lock.

After modifying this value, it is not necessary to restart PostgreSQL; reloading the configuration is enough, so it can be changed without affecting the users of the application. Depending on the system, the reload can be done by executing /etc/init.d/postgresql reload.

The log file location also depends on the configuration. It can typically be found in the /var/log/postgresql directory.
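
Putting the previous steps together, a minimal sketch (assuming a Debian-like layout; file paths and service commands may differ on other systems):

# postgresql.conf
log_min_duration_statement = 1000   # log every statement taking longer than 1 second

# reload the configuration and watch the log
/etc/init.d/postgresql reload
tail -f /var/log/postgresql/*.log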

Knowing what is going on now

The pg_stat_activity view contains a row per process connected to PostgreSQL, displaying the queries that are currently in execution, the time they were started, and whether they are waiting for some resource to be unlocked.

Querying this view is useful in case of high CPU consumption by a postgres process, to know which queries are taking this CPU time.
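
As an example (a sketch, assuming PostgreSQL 9.6 or later; older versions use slightly different column names), the longest running active queries can be listed with:

SELECT pid, usename, state, now() - query_start AS duration, wait_event, query
  FROM pg_stat_activity
 WHERE state <> 'idle'
 ORDER BY query_start;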

How a query is executed

Once a slow query has been detected, it is possible to know how it is executed; for this purpose, the EXPLAIN command is used.

Note this section is not intended to give detailed information about EXPLAIN usage, but just some hints; for more information, visit the PostgreSQL documentation.

EXPLAIN can help to understand whether a query should be refactored to be improved or whether an index would be useful.

Let's see a simple example:

 
EXPLAIN UPDATE c_bpartner  SET name ='aa' WHERE c_bpartner_id='100';
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Index Scan using c_bpartner_idsalesrep on c_bpartner  (cost=0.00..8.34 rows=1 width=3224)
   Index Cond: ((c_bpartner_id)::text = '100'::text)
(2 rows)

In this case, we are updating a row in the c_bpartner table filtering by its primary key (PK). As PKs define an index by default, this index is used to find the row to update, and the cost of the query is small (8.34).

In the following example:

 
EXPLAIN UPDATE c_bpartner SET name ='--'||name WHERE isactive='N';
                           QUERY PLAN                           
----------------------------------------------------------------
 Seq Scan on c_bpartner  (cost=0.00..8757.82 rows=1 width=3237)
   Filter: (isactive = 'N'::bpchar)
(2 rows)

All inactive rows are updated; the cost is much bigger (8757.82) because a sequential scan is done to determine which rows should be updated. A sequential scan implies looping over all the records in the table, whereas an index scan is able to go directly to the required records.

Now let's try to improve these figures:

 
CREATE INDEX em_mymod_bp_active_idx
   ON c_bpartner (isactive ASC NULLS LAST);
CREATE INDEX
 
EXPLAIN UPDATE c_bpartner SET name ='--'||name WHERE isactive='N';
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Index Scan using em_mymod_bp_active_idx on c_bpartner  (cost=0.00..8.29 rows=1 width=3237)
   Index Cond: (isactive = 'N'::bpchar)
(2 rows)

Creating an index for the isactive column makes a difference: now the table is not scanned sequentially, the index is used instead, reducing the cost from 8757.82 to 8.29.

Just a couple of notes about indexes:

Another way to look for missing indexes

As seen in the previous section, indexes prevent sequential scans, which are very harmful in tables with big amounts of data. It is possible to know how many times a table has been sequentially scanned. Even though this does not directly tell where the index should be added, it gives a hint that one (or more) indexes would reduce these sequential scans.

To see, for each table, how many times it has been read and how these reads have been done since the last time statistics were initialized, this query can be executed:

 
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch 
  FROM pg_stat_user_tables 
 ORDER BY seq_tup_read DESC;

It shows all tables, ordered descending by the number of rows read in sequential scans.

Statistics are reinitialized to 0 with:

 
SELECT pg_stat_reset();
Bulbgraph.png   After resetting the statistics, a VACUUM ANALYZE should be run in the instance; otherwise, the detection of when AUTOVACUUM should be launched will be affected

When a process is known to be slow, it is useful to reset the statistics and then execute it, to see which tables it queries and how.

More information about statistics can be found in the PostgreSQL documentation.

Tracking Functions

Function tracking allows knowing the number of times each PL function has been invoked and how much time it consumed.

 
  SELECT * 
    FROM pg_stat_user_functions 
   ORDER BY self_time DESC

To track this information, the track_functions property needs to be enabled in postgresql.conf.
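
For example (a sketch; the value 'pl' limits the tracking to procedural-language functions, while 'all' also tracks SQL and C functions):

# postgresql.conf
track_functions = all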

Tracking SQL Statements

It is also possible to track SQL statements.

This tracking can optionally include the statements executed within PL functions and triggers, giving in this way information about which parts inside them are slow. For example, executing:

 
  INSERT INTO c_order ...

would show several statements: one for the insert and also all the ones executed by the triggers associated with the c_order table.

To configure this, set the following in postgresql.conf:

shared_preload_libraries = '$libdir/pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
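
Note that shared_preload_libraries only takes effect after restarting PostgreSQL and, depending on the version, the extension may also need to be created in the database before the view becomes available; something along these lines (an assumption about this particular setup):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;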

Once it is active, statistics can be seen with:

 
  SELECT * 
    FROM pg_stat_statements 
   ORDER BY total_time DESC

These statistics are reset separately from the rest by executing:

 
  SELECT pg_stat_statements_reset()

For more information visit PostgreSQL documentation

Oracle

Open cursors

When a PreparedStatement is executed (usually using JDBC directly, without sqlc nor DAL) and it is not closed, a cursor is kept open. This has two problems: one is that the maximum number of open cursors can be reached, causing errors when trying to open more; the other is that the memory used by them is leaked, as it can never be collected.

A query to get open cursors:

 
SELECT  sql_text, COUNT(*)
  FROM v$open_cursor
 GROUP BY sql_text
 ORDER BY COUNT(*) DESC;

It shows each query that has opened a cursor that has not been closed yet and, for each of them, how many open cursors it has. Note that having open cursors while the application is running is normal. What is important here is their number, and whether they are kept and never closed.

Detecting open cursors and knowing the query helps to look for the processes that execute those queries.
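
To put the counts in context, the configured limit can also be checked; a sketch, assuming the session has access to the v$parameter view:

SELECT name, value
  FROM v$parameter
 WHERE name = 'open_cursors';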

