Risks of Performance Testing in Scaled-Down Environments


Performance testing should be planned around the capacity of the production environment and, ideally, completed well before the go-live date. It needs to be carried out in a production-like environment so that as many performance issues as possible surface during the testing phase. But maintaining a replica of production is expensive, and most companies avoid that expense by switching to a lower-capacity environment. The end result is a scaled-down performance test environment, and it leaves gaping holes in the system –

2 + 2 = 5

Yes, that is exactly how things are in performance testing. Every test (and every factor associated with it) is governed by a number of parameters, so you cannot simply halve (or scale by some other percentage) the CPU and memory of the environment and test it with half (or the same percentage of) the users. One needs to be very cautious here, because this is precisely the approach most application development teams take, and they get it wrong. The performance of a business transaction, as I said, depends on many factors: CPU, memory, the type of load balancing, geographical latency, and so on. By reducing the number of users in the test, we tend to make mistakes like these –

  • Every user who logs into an application from a client consumes resources on the application, database and web servers. The session state associated with the user’s login is the best example: it is held in the server’s memory and stays there for as long as the user is logged in; only when the user logs out does the session expire and its data get cleared. In many applications this accounts for significant memory consumption on the servers. If fewer users are used in performance testing, fewer session and memory issues are identified, so the development team may never learn about them during the tests and customers may end up facing them in production.
  • In addition, by under-representing the number of users, a compromise is made on the number of simultaneous connections the clients make to the servers. Application and web servers may have connection limits, and the issues that arise from crossing those limits do not get exposed during the tests. Furthermore, each simultaneous request may require a database connection, and the database server may cap the number of simultaneous connections it accepts. Ideally, a database should be able to handle the volume of concurrent reads and writes seen during execution, and this concurrency is the primary cause of many load-related failures. (A rough sketch of both effects follows this list.)
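As a rough illustration, here is a back-of-the-envelope sketch in Python. All the numbers are assumptions chosen for illustration (per-user session size, heap size, connection limit, connections per user), not measurements from any real system, and it assumes the test keeps the same server configuration but drives only half the users:

```python
# Back-of-the-envelope sketch: why a halved user count can hide
# memory and connection-limit problems. All numbers are assumptions
# chosen for illustration, not measurements from a real system.

SESSION_STATE_KB = 250          # assumed per-user session footprint on the app server
APP_SERVER_HEAP_MB = 2048       # assumed heap available for session state
DB_CONNECTION_LIMIT = 300       # assumed max simultaneous DB connections

def session_memory_mb(concurrent_users: int) -> float:
    """Memory consumed by session state for the given number of logged-in users."""
    return concurrent_users * SESSION_STATE_KB / 1024

def check(concurrent_users: int, db_connections_per_user: float = 0.05) -> None:
    mem = session_memory_mb(concurrent_users)
    conns = concurrent_users * db_connections_per_user
    print(f"{concurrent_users:>6} users: "
          f"session memory ~{mem:,.0f} MB "
          f"({'OVER' if mem > APP_SERVER_HEAP_MB else 'under'} heap), "
          f"~{conns:,.0f} DB connections "
          f"({'OVER' if conns > DB_CONNECTION_LIMIT else 'under'} limit)")

# Production-like load vs. the "half the users" test.
check(10_000)   # full load: breaches both the heap and the connection limit
check(5_000)    # scaled-down test: stays comfortably under, issues remain hidden
```

In a real test the per-user session figure would come from heap dumps or session monitoring, not from a constant in a script; the point is only that a halved user count can sit comfortably below limits that the full load would breach.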

Thus, when the environment is halved, its capacity is not simply halved, so you cannot test it with the halved load and trust the response-time results. The simple queueing sketch below illustrates why the numbers do not carry across.
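Consider a deliberately simplified single-queue (M/M/1) model; the arrival and service rates below are assumed figures and real systems are far more complex, but the sketch shows that "half the users on half the hardware" keeps utilisation the same while the measured response time changes:

```python
# A simplified M/M/1 queueing sketch (illustrative only): halving both the
# offered load and the service capacity keeps utilisation the same, but the
# response time you measure is NOT the one production will see.

def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("system is unstable (arrival rate >= service rate)")
    return 1.0 / (service_rate - arrival_rate)

# Assumed production figures: 80 requests/s arriving, capacity of 100 requests/s.
prod = mm1_response_time(arrival_rate=80, service_rate=100)

# "Half the users on half the hardware": 40 requests/s against 50 requests/s.
scaled = mm1_response_time(arrival_rate=40, service_rate=50)

print(f"Production-like : utilisation 80%, mean response ~{prod * 1000:.0f} ms")
print(f"Scaled-down test: utilisation 80%, mean response ~{scaled * 1000:.0f} ms")
# Same utilisation, double the response time -- the scaled-down numbers
# cannot simply be read across to production.
```

The exact factor is an artefact of the M/M/1 assumptions; the point is only that response times do not map one-to-one between the two environments.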

Deadlocks and Livelocks

Database deadlocks happen when multiple processes each wait for the other(s) to release a lock, and none of them can complete its request until the other does.

E.g. – Process 1 and Process 2 each hold a lock that the other needs and wait for each other to finish; neither can proceed, and a deadlock is created.
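A minimal sketch of this situation, using Python threading locks to stand in for database row or table locks (the process names and timings are illustrative; timeouts are added only so the demo terminates instead of hanging forever):

```python
# Minimal sketch of the classic two-lock deadlock between two processes.
# Each thread takes one lock and then waits for the other's lock; neither
# can proceed. Timeouts are used only so this demo script terminates.

import threading
import time

lock_a = threading.Lock()   # stands in for Process 1's row/table lock
lock_b = threading.Lock()   # stands in for Process 2's row/table lock

def process_1():
    with lock_a:                                  # step 1: lock resource A
        time.sleep(0.1)                           # give Process 2 time to take B
        if lock_b.acquire(timeout=2):             # step 2: wait for resource B
            lock_b.release()
        else:
            print("Process 1: still waiting for B -- deadlocked, giving up")

def process_2():
    with lock_b:                                  # step 1: lock resource B
        time.sleep(0.1)                           # give Process 1 time to take A
        if lock_a.acquire(timeout=2):             # step 2: wait for resource A
            lock_a.release()
        else:
            print("Process 2: still waiting for A -- deadlocked, giving up")

t1 = threading.Thread(target=process_1)
t2 = threading.Thread(target=process_2)
t1.start(); t2.start()
t1.join(); t2.join()
```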

A livelock is similar to a deadlock, except that the processes are not blocked forever; they keep being scheduled and constantly consume CPU. The primary difference between a deadlock and a livelock is that a livelocked system responds after a long time, whereas a deadlocked one never does.
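The following deterministic simulation (an illustration only, not how a real scheduler behaves) sketches the "two polite processes" pattern behind a livelock: each repeatedly grabs its own lock, notices the other lock is taken, backs off and retries, so the CPU stays busy while no work completes:

```python
# Deterministic lockstep simulation of a livelock (illustration only):
# two "polite" processes each grab their own lock, see that the other lock
# is taken, release, and retry -- forever busy, never progressing.

def livelock_simulation(rounds: int = 5) -> None:
    holding = {"P1": None, "P2": None}      # which lock each process currently holds
    for step in range(1, rounds + 1):
        # Both processes grab their "own" lock at the same time...
        holding["P1"], holding["P2"] = "lock_A", "lock_B"
        # ...notice the lock they also need is taken, and both back off together.
        holding["P1"], holding["P2"] = None, None
        print(f"round {step}: both processes backed off and will retry")
    print("No process ever acquired both locks, yet the CPU was busy the whole time.")

livelock_simulation()
```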

In scaled-down environments, because less transaction load is generated on the system, these process entanglements and locks are sometimes not observed in the tests and only surface in production.

What’s my total Capacity?


With scaled-down environments you will never be able to estimate the true capacity of your environment, because you cannot simply extrapolate the numbers. So in scenarios where the business expects real production capacity figures from you (the capacity engineers), you probably will not be able to give them, or you may end up quoting numbers that are nowhere near the environment’s actual capacity. The sketch below shows how naive extrapolation goes wrong.
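As one illustration of why linear extrapolation misleads, the sketch below applies the Universal Scalability Law with assumed contention and coherency coefficients and an assumed single-node throughput; none of these numbers come from a real system, and the model is only a stand-in for whatever contention your own system exhibits:

```python
# Sketch of why linear extrapolation from a half-size environment misleads.
# Uses the Universal Scalability Law as an illustrative model; the contention
# (alpha) and coherency (beta) coefficients below are assumed, not measured.

def usl_throughput(n: float, single_node_tps: float,
                   alpha: float = 0.05, beta: float = 0.001) -> float:
    """USL relative capacity, scaled by the throughput of a single node."""
    return single_node_tps * n / (1 + alpha * (n - 1) + beta * n * (n - 1))

SINGLE_NODE_TPS = 200          # assumed throughput of one node

half_env = usl_throughput(4, SINGLE_NODE_TPS)     # what the scaled-down test measures
full_env = usl_throughput(8, SINGLE_NODE_TPS)     # what the full-size system delivers
naive_extrapolation = 2 * half_env                # what "just double it" predicts

print(f"Measured on 4 nodes       : {half_env:,.0f} tps")
print(f"Naive 2x extrapolation    : {naive_extrapolation:,.0f} tps")
print(f"Model prediction, 8 nodes : {full_env:,.0f} tps")
```

With these assumed coefficients the naive doubling overestimates the full-size throughput by roughly 20 percent; with real coefficients the gap can be larger or smaller, which is exactly why measured numbers, not extrapolated ones, are needed.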

Many such situations led me to conclude that performance testing in a scaled-down environment is not worth it, so whenever I do not have an environment of sufficient capacity, I use the cloud to test my systems. With the cloud I am able to identify deadlocks and session-related issues. I also make sure that the business and other stakeholders are made aware of the risks of testing on virtual infrastructure (if production runs on physical infrastructure).

I would thus like to end with a note: capacity planning tools give you numbers that fail in production, and on top of that, these tools are far too expensive. If you are a performance test analyst (or have an expert performance tester on your team) with the concepts well understood and ingrained, you should study the system, its architecture and its business users, and understand their requirements. Based on this, plan the test (and hence the environment) in such a way that you replicate most performance issues right in the test environment and never let any of them reach production. In my next article, I will talk about how to plan a test so that it mimics a production-like scenario and helps us identify the maximum number of performance issues.