Friday 28 August 2009

Unit Performance Testing


In my previous post I mentioned the need to test performance early. I always intended to follow up with more detail on how this is achieved, and this entry provides that detail. The fallacy of only looking at performance after system integration testing is now well documented, so there is no need to repeat my earlier points.

Note: I am not saying that system performance testing shouldn’t be done, just that Unit Performance Testing should be done as well to catch problems early on and therefore reduce costs/rework. So Unit Performance Testing should be thought of in the same way as Unit Testing (i.e. not a replacement for system testing, but a precursor to it).

However, the next question is how can you measure TAP (Total Application Performance) at a unit level before the whole application has been put together? To expand on this, suppose you are developing a functional area of the system which must respond to the user within 10 seconds. You therefore have a TAP Metric of 10 seconds for this functional point/use case (this being the typical post-integration performance test target metric).

However, in this functional area there are 5 discrete units of code that need development. These units must all execute to deliver the functionality to the user. They will most likely have different levels of complexity and different performance profiles, and could behave slightly differently (in performance terms) depending on how they invoke each other.

The point is, although you can record the performance of each unit of code, how can you possibly derive a target pass/fail mark from the TAP given the variables above? It is this sticking point which has led many to give up pursuing this further.

The solution is heuristics and probability. Whilst this approach doesn’t guarantee to catch 100% of performance bugs, there is a high probability it will catch the majority of the really bad ones (and it is these which generally require expensive rewrites).




Heuristics:

Using the previous example of a TAP target of 10 seconds comprising 5 separate units of code (A,B,C,D,E), I have presented 3 scenarios above (1-3).

Imagine that only unit “A” has been built so far. What should its pass mark be?

A starting place (without weighting or analysing any of the unit specifications) is 2 seconds. This simply divides each unit up equally (see scenario 1). If “A” is within this metric then it is probably low risk.
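As a rough illustration, here is a minimal sketch (in Python, with made-up unit names and timings) of that equal-split starting point:

```python
# Equal-split heuristic: divide the TAP target evenly across the units
# that make up the functional area, then compare a unit's measured time
# against its share. Names and figures here are illustrative only.

TAP_TARGET = 10.0                      # seconds for the whole use case
UNITS = ["A", "B", "C", "D", "E"]      # discrete units of code

unit_budget = TAP_TARGET / len(UNITS)  # 10 / 5 = 2.0 seconds each

measured_a = 1.6                       # hypothetical timing for unit "A"
if measured_a <= unit_budget:
    print("Unit A within its 2.0s share - probably low risk")
else:
    print("Unit A exceeds its share - flag for inspection")
```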

In Scenario 2, unit “A” is taking 5 seconds to execute and should immediately flag a warning. It may be that taking 50% of the TAP target is not an issue, as this particular unit of code is doing most of the work (e.g. getting data from the database), but it does flag the need for inspection, and it clearly redefines the acceptable uniform performance target for each of the subsequent units (B-E) to 1.25 seconds (the remaining 5 seconds split across the 4 remaining units).
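To make that arithmetic explicit, a small sketch of how the remaining budget could be redistributed (assuming unit “A” genuinely needs its 5 seconds):

```python
# Redistribute what is left of the TAP target across the units that
# have not yet been built. Figures are the Scenario 2 example values.

TAP_TARGET = 10.0
measured_a = 5.0                       # unit "A" observed at 5 seconds
remaining_units = ["B", "C", "D", "E"]

remaining_budget = TAP_TARGET - measured_a                      # 5.0s left
revised_unit_budget = remaining_budget / len(remaining_units)   # 1.25s each

print(f"Each of {remaining_units} now has {revised_unit_budget:.2f}s")
```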

Scenario 3 is actually not uncommon. One unit of code, often a significant one, performs exceptionally badly and uses up (and sometimes exceeds) the entire TAP target. The remaining units of code (B-E) would have to use ZERO time or the TAP target is failed. Given that this is impossible in practice, Unit A needs to be profiled and optimised in order to reduce its execution time.

So, to conclude, a very simple approach (without needing a lot of upfront analysis) provides a mechanism to identify units of code that may contribute to failure when full system performance testing is conducted. The three scenarios are indicative (Green: within the allocated unit limit; Amber: above the allocated unit limit but below the TAP limit; Red: at or exceeding the TAP limit).
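One way to express that Green/Amber/Red flagging as a simple rule (a sketch only, using the thresholds described above):

```python
# RAG classification of a single unit's measured time:
#   Green - within its allocated share of the TAP target
#   Amber - above its share, but still below the TAP limit
#   Red   - at or beyond the TAP limit on its own

def classify(measured: float, unit_budget: float, tap_target: float) -> str:
    if measured <= unit_budget:
        return "GREEN"
    if measured < tap_target:
        return "AMBER"
    return "RED"

TAP = 10.0
BUDGET = TAP / 5                       # equal split across 5 units

print(classify(1.5, BUDGET, TAP))      # Scenario 1 -> GREEN
print(classify(5.0, BUDGET, TAP))      # Scenario 2 -> AMBER
print(classify(10.0, BUDGET, TAP))     # Scenario 3 -> RED
```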

It is also worth noting that sometimes “A” has already been written in a previous development cycle and, whilst performant then, is now performing badly in the context of its usage/invocation by unit “B”. For example, say the purpose of unit “A” was to process thousands of widgets in a day, but unit “B” is asking it to process a single widget over a thousand days. The subtle difference may not have been apparent at the time, but “A” had been optimised to iterate over widgets, not days, and the new use case is causing it to perform badly.

In this example, unit “A” isn’t even on the radar (as it already exists), but through the “RED/AMBER” flagging of unit “B”, the subsequent profiling reveals that 80% of Unit “B”’s processing time is taken up by “A”. In response to this it may be necessary to rewrite/enhance/optimise Unit “A”, which in turn may affect any other units relying on it. Again, all of this is identified long before system testing is conducted.
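As an illustration of how that “80% of B is really A” discovery might surface, here is a hedged sketch using Python’s cProfile; unit_a and unit_b are hypothetical stand-ins for the real units of code:

```python
# Profiling unit "B" to see where its time actually goes. unit_a and
# unit_b are hypothetical stand-ins for real units of code.
import cProfile
import pstats
import time


def unit_a(widget, days):
    # Imagine "A" was optimised to iterate over widgets, not days,
    # so being invoked per-day makes it disproportionately slow.
    for _ in range(days):
        time.sleep(0.001)              # stand-in for real work


def unit_b():
    unit_a(widget="w1", days=200)      # the new, unanticipated usage
    time.sleep(0.05)                   # B's own processing


profiler = cProfile.Profile()
profiler.enable()
unit_b()
profiler.disable()

# The cumulative-time report shows how much of B's time sits inside A.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```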

Weighting:

As shown above, even a simplistic rule-of-thumb approach can identify and capture issues early on, and it is worth piloting to gain further insight into how it can be customised within the context of your development processes.

The unit targets can be further refined with upfront involvement from architects on the unit specifications. Upfront analysis should be able to weight the units in terms of their likely performance impact, and those weightings can then be used to modify the time allocation across the units (i.e. moving from an equal distribution of time to a weighted one).

For example, an obvious default weighting might be 3 for database activity, 2 for processing/business logic, and 1 for GUI/presentation. The higher the weighting, the longer the predicted execution time.

You would then add up the weightings and divide the TAP target by that total. This gives you the value of one weighting point in seconds (or part of a second), which you then apportion to each unit according to its weighting.

For example, say that “A” was getting data from the database and was weighted as 3, while the remaining 4 units (B-E) were simply presentation and were weighted as 1 each. This gives a total weighting of 7; dividing the TAP of 10 seconds by 7 gives an individual weighting value of 1.428 seconds.

From this you derive that the acceptable performance target for Unit “A” is 4.285 seconds (3 x 1.428), with each of the other units limited to 1.428 seconds.
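The same calculation as a small sketch (the weights and the 10-second TAP are the example values above):

```python
# Weighted allocation of the TAP target. The database-heavy unit gets a
# weighting of 3; presentation-only units get 1 each.

TAP_TARGET = 10.0
weights = {"A": 3, "B": 1, "C": 1, "D": 1, "E": 1}

per_weight = TAP_TARGET / sum(weights.values())    # 10 / 7 = 1.428s

budgets = {unit: w * per_weight for unit, w in weights.items()}
# -> A: ~4.29s, B-E: ~1.43s each

for unit, budget in budgets.items():
    print(f"Unit {unit}: {budget:.3f}s")
```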

Whilst no formula is guaranteed, the use and refinement of these techniques will enable you to capture a greater proportion of performance issues early on, reducing significant rework, costs and delays.