Troubleshooting High CPU Utilisation



Whilst using HANA (for example, running reports or executing queries), you may see an alert in HANA Studio reporting that the system has consumed its CPU resources, and the system reaches full utilisation or hangs.

Before performing any traces, please check whether Transparent Huge Pages (THP) are enabled on your system. THP should be disabled across your landscape until SAP recommends activating them again. Please see the relevant notes on Transparent Huge Pages:

HUGEPAGES 

SAP Note 1944799 - SAP HANA Guidelines for SLES Operating System Installation
SAP Note 1824819 - SAP HANA DB: Recommended OS settings for SLES 11 / SLES for SAP Applications 11 SP2
SAP Note 2131662 - Transparent Huge Pages (THP) on SAP HANA Servers
SAP Note 1954788 - SAP HANA DB: Recommended OS settings for SLES 11 / SLES for SAP Applications 11 SP3
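
As a quick check before tracing, the current THP mode can be read from the kernel's control file. The following is a minimal sketch, assuming the mainline Linux sysfs path /sys/kernel/mm/transparent_hugepage/enabled (some distributions use a different path); the function names are illustrative and not part of any SAP tool. The kernel marks the active mode with square brackets, for example "[always] madvise never".

```python
from pathlib import Path

# Assumption: mainline Linux sysfs location of the THP switch.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from text such as '[always] madvise never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_is_disabled(text):
    """True only when the active mode is 'never'."""
    return active_thp_mode(text) == "never"


if __name__ == "__main__":
    if THP_ENABLED_PATH.exists():
        content = THP_ENABLED_PATH.read_text()
        print("THP mode:", active_thp_mode(content))
        print("THP disabled:", thp_is_disabled(content))
    else:
        print("THP control file not found on this system")
```

The SAP Notes listed above remain the authority on which setting is recommended for your release.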


THP activity can also be checked in the runtime dumps by searching for “AnonHugePages”. While checking THP, it is also recommended to check:

SwapTotal = ??
SwapFree = ??

This will let you know if there is a reasonable amount of memory in the system.
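
The three values mentioned above can be pulled out of a dump in one pass. This is a rough sketch that assumes the runtime dump reproduces them in the /proc/meminfo style ("Name:   value kB"); if your dump formats them differently, the regular expression needs adjusting. The helper names are hypothetical.

```python
import re

# Assumption: lines look like "SwapTotal:       2097148 kB" (the /proc/meminfo layout).
MEMINFO_LINE = re.compile(r"^\s*(\w+):\s+(\d+)\s*(?:kB)?\s*$")


def parse_meminfo(text):
    """Collect 'Name: value' pairs into a dict of integers (kB where a unit is given)."""
    values = {}
    for line in text.splitlines():
        match = MEMINFO_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def summarise(values):
    """Report how much swap is in use and how much memory is backed by THP."""
    swap_total = values.get("SwapTotal", 0)
    swap_free = values.get("SwapFree", 0)
    return {
        "swap_used_kb": swap_total - swap_free,
        "anon_huge_pages_kb": values.get("AnonHugePages", 0),
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print(summarise(parse_meminfo(handle.read())))
```

A non-zero AnonHugePages value suggests THP is in use, and a large gap between SwapTotal and SwapFree suggests the system is swapping.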

Next, check the global allocation limit (GAL): search for IPM, note the limit, and ensure it is not lower than what the process/thread in question is trying to allocate.

Usually it is evident what caused the high CPU usage. In many cases it is caused by the execution of large queries or by running reports from HANA Studio on models.

To be able to use the kernel profiler, you must have the SAP_INTERNAL_HANA_SUPPORT role. This role is intended only for SAP HANA development support.

The kernel profiler collects, for example, information about frequent and/or expensive execution paths during query processing. It is recommended that you start kernel profiler tracing immediately before you execute the statements you want to analyze and stop it immediately after they have finished. This avoids recording irrelevant statements and limits the negative impact this kind of tracing can have on performance.

When you stop tracing, the results are saved to trace files that you can access on the Diagnosis Files tab of the Administration editor.

You cannot analyze these files meaningfully in the SAP HANA studio, but instead must use a tool capable of reading the configured output format, that is KCacheGrind or DOT (default format).

You activate and configure the kernel profiler in the Administration editor on the Trace Configuration tab.

Please be aware that you will also need to generate 2-3 runtime dumps. The Kernel Profiler Trace results are read in conjunction with the runtime dumps to pick out the relevant stacks and thread numbers.

For full information on Kernel Profiler Traces, please see Note 1804811 or follow the steps below:
     
[Image: Kernel Profiler.PNG]

Connect to your HANA database server as user <sid>adm (for example via PuTTY) and start hdbcons by typing the command "hdbcons".
To take a Kernel Profiler Trace of your query, please follow these steps:
1. "profiler clear" - Resets all information to a clear state
2. "profiler start" - Starts collecting information.
3. Execute the affected query.
4. "profiler stop" - Stops collecting information.
5. "profiler print -o /path/on/disk/cpu.dot;/path/on/disk/wait.dot" - writes the collected information into two dot files which can be sent to SAP.
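
The steps above can be sketched as a small helper that assembles the command strings, so the same sequence is typed consistently each time. This is only an illustration: the helper names and the example paths are hypothetical, and passing a command to hdbcons as a single quoted argument (instead of typing it at the interactive prompt) is an assumption you should verify on your system. The script prints the commands and does not run anything.

```python
import shlex


def profiler_commands(cpu_dot, wait_dot):
    """Return the hdbcons profiler commands in the order given in the steps above.

    The affected query has to be executed between 'profiler start' and 'profiler stop'.
    """
    return [
        "profiler clear",
        "profiler start",
        "profiler stop",
        f"profiler print -o {cpu_dot};{wait_dot}",
    ]


def as_shell_lines(commands, hdbcons="hdbcons"):
    """Quote each command for a shell. Assumes hdbcons accepts a command as one argument."""
    return [f"{hdbcons} {shlex.quote(command)}" for command in commands]


if __name__ == "__main__":
    # Hypothetical output locations; use any directory writable by <sid>adm.
    for line in as_shell_lines(profiler_commands("/tmp/cpu.dot", "/tmp/wait.dot")):
        print(line)
```

The quoting matters for the last command, because an unquoted semicolon would be interpreted by the shell as a command separator.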


Once you have this information you will see two dot files called
1: cpu.dot
2: wait.dot.

To read these .dot files you will need to download GVEdit, the graph editor that ships with Graphviz.

Once you open the program it will look something similar to this:

[Image: Graph Viz.PNG]
     
The wait.dot file can be used to analyse a situation where a process is running very slowly without any apparent reason. In such cases, a wait graph can help to identify whether the process is waiting for an IndexHandle, I/O, a savepoint lock, etc.

Once you have opened the Graphviz tool, open the cpu.dot file: File > Open > select the dot file > Open.
Once the file is open you will see a screen such as:

[Image: graphviz 1.PNG]
     

The graph might already be open without being visible, because at the current zoom level it lies outside the visible area. Use the horizontal and vertical scroll bars to find it.

[Image: CPU_DOT 1.PNG]

From here on, it depends on the issue you are investigating.
Normally you will be looking for the process/step with the highest value for
E= …
where "E" means Exclusive.
There is also:
I=…
where "I" means Inclusive.
The Exclusive value is of more interest because it applies to that particular process or step alone, so it indicates whether that step itself is using the memory/CPU. In this example we can see __memcmp_sse4_1 with I = 16.399% and E = 16.399%. By following the RED colouring we can see where most of the utilisation is happening and trace the activity, which leads to the stack in the runtime dump, which also contains the thread number we are looking for.
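
For large graphs it can be quicker to rank the nodes by their E value before opening the file in GVEdit. The sketch below is an assumption-heavy illustration: it presumes each node carries a label="..." attribute whose first line is the function name, followed by "I=…%" and "E=…%" entries as seen in the screenshots. The exact label layout written by the profiler may differ, so treat the regular expressions and the function name as hypothetical.

```python
import re

# Assumed label layout: label="<name>\nI=<pct>% E=<pct>%" (spaces around '=' tolerated).
LABEL = re.compile(r'label\s*=\s*"([^"]*)"')
INCLUSIVE = re.compile(r"\bI\s*=\s*(\d+(?:\.\d+)?)\s*%")
EXCLUSIVE = re.compile(r"\bE\s*=\s*(\d+(?:\.\d+)?)\s*%")


def top_exclusive(dot_text, limit=5):
    """Return up to `limit` (exclusive, inclusive, name) tuples, highest E first."""
    rows = []
    for label in LABEL.findall(dot_text):
        # DOT writes line breaks inside labels as the two characters '\' and 'n' (or 'l').
        text = label.replace("\\n", "\n").replace("\\l", "\n")
        inclusive = INCLUSIVE.search(text)
        exclusive = EXCLUSIVE.search(text)
        if not (inclusive and exclusive):
            continue  # edge labels or nodes without percentages
        name = text.split("\n", 1)[0].strip()
        rows.append((float(exclusive.group(1)), float(inclusive.group(1)), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    with open("cpu.dot") as handle:
        for exclusive, inclusive, name in top_exclusive(handle.read()):
            print(f"E={exclusive:7.3f}%  I={inclusive:7.3f}%  {name}")
```

The node at the top of this list is a reasonable starting point for following the RED path in the graph.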

[Image: CPU_DOT 2.PNG]

[Image: CPU_DOT 3.PNG]





By viewing cpu.dot you have now traced the RED trail to the step with the highest Exclusive value. Now open the RTE (runtime dump). Working from the bottom up, you can get an idea of what the stack will look like in the runtime dump.

[Image: CPU_DOT 4.PNG]




By comparing the RED path, you can see that it matches this stack from the runtime dump exactly. The stack also has the thread number at the top.

You have now found the thread number with which this query was executed. By searching for this thread number in the runtime dump, you can find the parent of this thread and the children related to that parent. The thread number can then be linked back to the query within the runtime dumps, giving you the exact query and also the USER that executed it.
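
Searching a large runtime dump for one thread number by hand is tedious, so a small helper can list every line that mentions it. This is a generic text search, not a parser for the runtime dump format: it makes no assumption about the dump layout beyond the thread number appearing as a standalone number, and the function name and file name are hypothetical.

```python
import re


def lines_mentioning_thread(dump_text, thread_id):
    """Return (line_number, line) pairs where thread_id appears as a whole number.

    The digit lookarounds prevent 1234 from matching inside 12345 or 91234.
    """
    pattern = re.compile(rf"(?<!\d){re.escape(str(thread_id))}(?!\d)")
    return [
        (number, line)
        for number, line in enumerate(dump_text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical file name and thread number; substitute your own.
    with open("runtime_dump.trc", errors="replace") as handle:
        for number, line in lines_mentioning_thread(handle.read(), 1234):
            print(f"{number}: {line}")
```

The matching lines show where the thread appears in the dump, from which the parent, the children, and the query text can be followed up manually.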

For more information or queries on HANA CPU please visit Note 2100040 - FAQ: SAP HANA CPU

I hope you find this instructive,

Thank you,
