Pyrus is used daily by several thousand organizations worldwide. The service’s responsiveness is an important competitive advantage, as it directly affects user experience. Our key performance metric is “percentage of slow queries.”
One day we noticed that our application servers were freezing for about 1000 ms every other minute. During these pauses several dozen queries piled up, and customers occasionally saw random delays in UI response times.
In this post we track down the causes of this erratic behavior and eliminate the bottlenecks in our service caused by the garbage collector: