Most notable web application performance failures – a top 5 list

Unintentional infinite loops

An infinite loop, or unproductive loop, happens in a computer program when a loop never ends. This occurs either because the loop has no terminating condition, has one that can never be met, or has one that causes the loop to start over again. Infinite loops cause programs to consume all of the available processor time, which in turn makes systems unresponsive.
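As a simple illustration, here is a minimal, contrived Java sketch of a terminating condition that can never be met:

```java
public class InfiniteLoopExample {
    public static void main(String[] args) {
        int i = 0;
        // Bug: incrementing by 3 steps over 10 (0, 3, 6, 9, 12, ...),
        // so the condition i != 10 is never false and the loop spins
        // forever, consuming an entire core.
        while (i != 10) {
            i += 3;
        }
        System.out.println("never reached");
    }
}
```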

When such concurrency issues appear, the user typically ends up seeing an error message of some kind in the browser.

Such concurrency issues, if addressed during the testing phase, occur far less often in production and cause significantly less business and reputational damage to organizations. These issues can be reproduced by running realistic concurrency tests. In my next article, I will explain how to run such high-load concurrency tests using JMeter.

Memory Allocations and Garbage Collection

Memory allocation and garbage collection, if not handled properly, consume a lot of processing time and hurt an application's performance. Programs that manipulate strings tend to allocate a lot of memory, particularly if they aren't designed carefully to avoid unnecessary allocations. Allocating memory also requires synchronization, because we must ensure that memory regions allocated by different threads do not overlap.
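As a rough illustration, the Java sketch below (method names are illustrative) contrasts naive string concatenation, which allocates a new string on every iteration, with reuse of a single StringBuilder:

```java
public class AllocationExample {
    // Allocation-heavy: each += creates a brand-new String, so the
    // total number of characters copied grows quadratically with the
    // number of parts, producing a lot of garbage for the collector.
    static String joinNaive(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p;  // allocates a new String every iteration
        }
        return result;
    }

    // Allocation-light: a single StringBuilder grows its internal
    // buffer in place, keeping allocations (and GC work) low.
    static String joinBuffered(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }
}
```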

On the other hand, allocating a lot of memory typically means that we will also need to do a lot of garbage collection to reclaim memory that is no longer in use. If garbage collection dominates the running time of a program, scaling the program up will largely just scale up the garbage-collection work.

Locality Issues

Sometimes modifying a program to run in parallel negatively affects locality. For example, let's say that we want to perform an operation on each element of an array. On a dual-core machine, we could create two tasks: one to perform the operation on the elements with even indices and another to handle odd indices.

As a negative consequence, however, each thread's locality of reference degrades. The elements a thread needs to access are interleaved with elements it does not care about. Each cache line will contain only half as many useful elements as it did previously, which may mean that twice as many memory accesses go all the way to main memory.

In this particular case, one solution is to split up the array into a left half and a right half, rather than odd and even elements. In more complicated cases, the problem and the solution may not be as obvious, so the effect of parallelization on locality of reference is one of the things to keep in mind when designing parallel algorithms.
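A minimal Java sketch of the two partitioning strategies, assuming a dual-core machine and an illustrative element-doubling operation:

```java
public class LocalityExample {
    // Interleaved split: one thread handles even indices, the other odd.
    // Each cache line a thread loads is only half useful to it.
    static void scaleInterleaved(double[] data) throws InterruptedException {
        Thread even = new Thread(() -> {
            for (int i = 0; i < data.length; i += 2) data[i] *= 2;
        });
        Thread odd = new Thread(() -> {
            for (int i = 1; i < data.length; i += 2) data[i] *= 2;
        });
        even.start(); odd.start();
        even.join(); odd.join();
    }

    // Contiguous split: each thread owns one half of the array, so
    // every cache line it loads is fully used.
    static void scaleContiguous(double[] data) throws InterruptedException {
        int mid = data.length / 2;
        Thread left = new Thread(() -> {
            for (int i = 0; i < mid; i++) data[i] *= 2;
        });
        Thread right = new Thread(() -> {
            for (int i = mid; i < data.length; i++) data[i] *= 2;
        });
        left.start(); right.start();
        left.join(); right.join();
    }

    public static void main(String[] args) throws InterruptedException {
        double[] a = new double[1 << 20];
        scaleInterleaved(a);  // poor locality per thread
        scaleContiguous(a);   // good locality per thread
    }
}
```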

Database deadlocks

Database deadlocks happen when two or more processes each wait for another to release a lock, and none of them can proceed with its request until the lock it is waiting on is released.

E.g., Process 1 holds a lock that Process 2 needs while Process 2 holds a lock that Process 1 needs; each waits for the other to finish before it can proceed with its request, and a deadlock is created.
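A minimal Java sketch of this pattern, using plain in-process locks to stand in for database row locks (names and timings are illustrative; the program intentionally hangs):

```java
public class DeadlockExample {
    public static void main(String[] args) {
        final Object lockA = new Object();  // stands in for row/table A
        final Object lockB = new Object();  // stands in for row/table B

        Thread process1 = new Thread(() -> {
            synchronized (lockA) {
                sleep(100);                 // give process2 time to grab lockB
                synchronized (lockB) {      // blocks forever: process2 holds lockB
                    System.out.println("process1 acquired both locks");
                }
            }
        });

        Thread process2 = new Thread(() -> {
            synchronized (lockB) {
                sleep(100);                 // give process1 time to grab lockA
                synchronized (lockA) {      // blocks forever: process1 holds lockA
                    System.out.println("process2 acquired both locks");
                }
            }
        });

        process1.start();
        process2.start();
        // Acquiring locks in one consistent global order (always A before B)
        // would prevent this deadlock.
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }
}
```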

Load Balancing

Load balancing is a computer networking method for distributing workloads across multiple servers, a computer cluster, network links, CPUs, or other resources. Successful load balancing optimizes resource use, maximizes throughput, minimizes response time, and avoids overload. Using multiple components with load balancing instead of a single component may also increase reliability through redundancy. Offloading certain chores from your web servers to the load balancer, such as SSL acceleration, can free up capacity on those servers.

Lack of appropriate load balancing can wreak havoc on systems. Poor visibility and under-utilized features and capabilities can lead to such problems, as can non-optimized balancing algorithms. Having high-capacity CPUs behind your load balancers doesn't always guarantee fast response times.

For effective load balancing, we need to ensure that the load is evenly distributed over the different machines. This is not as simple as it sounds, since different "chunks" of work may differ widely in the time required to execute them. Also, we often don't know how much work a chunk will require until we have executed it to completion.

E.g., if all iterations of a loop required similar processing and took the same amount of time, we could simply divide the index range into as many contiguous sub-ranges as we have cores and assign each sub-range to one core. Unfortunately, that is rarely the case: the work per iteration varies, so one core may end up with many time-consuming iterations while the other cores have far less to do. In an extreme situation, one core may end up with nearly all of the work, and we are back to the sequential case.
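One common mitigation is to hand out work dynamically in small chunks rather than as fixed contiguous ranges, so that a thread that finishes its cheap iterations simply claims more work. A minimal Java sketch of this idea (chunk size and workload are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DynamicPartitioning {
    static final int N = 1_000_000;
    static final int CHUNK = 1_024;                 // illustrative chunk size
    static final AtomicInteger next = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[cores];
        for (int t = 0; t < cores; t++) {
            workers[t] = new Thread(() -> {
                int start;
                // Claim the next chunk of indices; loop until all are handed out.
                while ((start = next.getAndAdd(CHUNK)) < N) {
                    int end = Math.min(start + CHUNK, N);
                    for (int i = start; i < end; i++) {
                        process(i);                 // cost may vary per iteration
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
    }

    // Placeholder for per-iteration work of varying cost.
    static void process(int i) {
        Math.sqrt(i);
    }
}
```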