Automation in QA has quickly gone from “sophisticated and cutting edge” to “mainstream and necessary”. However, introducing automation into a firm’s regular workflow is not without cost: it requires additional investment in talent acquisition, automation tools, and frameworks. Even with those resources in place, poorly written scripts can cause unnecessary hassle and inefficiency. Measuring automation success is thus crucial to understanding whether automation efforts have been fruitful or deserve a revisit.
A TALE OF THREE CRITERIA
Automation success can be boiled down to the satisfaction of three criteria: More Efficient, More Effective, and Less Manual Effort.
With time being our primary resource of value, efficiency refers to any direct or indirect reduction in wasted time. Effectiveness refers to how successfully our automation scripts identify defects. Less manual effort refers to any direct or indirect reduction in the workload handled by human testers. Understanding these three criteria is important because the metrics we will use to measure automation success all relate to them in one way or another.
METRICS AND MEASUREMENT
While there are no truly “standard”, “one size fits all” metrics for determining the success of automation, the following metrics give us a basic idea of how effective our current implementations are. The best measurement practice is to assign a priority to each metric and take a weighted value, or to combine two or more metrics into hybrid metrics relevant to the specific workings of your firm.
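To make the weighted-value idea concrete, here is a minimal sketch. The function name, metric names, normalized values, and weights below are illustrative assumptions of ours, not a standard; a real implementation would use whichever metrics and priorities fit your firm.

```python
# Sketch: combining several normalized automation metrics (each in 0..1)
# into a single weighted score. Names and weights are illustrative only.

def weighted_automation_score(metrics, weights):
    """Return the weighted average of the given metric values."""
    total_weight = sum(weights[name] for name in metrics)
    weighted_sum = sum(metrics[name] * weights[name] for name in metrics)
    return weighted_sum / total_weight

# Hypothetical readings: 80% coverage, 90% defect identification,
# 60% of the targeted time savings achieved.
metrics = {"coverage": 0.80, "defect_identification": 0.90, "time_saved": 0.60}
weights = {"coverage": 2, "defect_identification": 3, "time_saved": 1}

print(round(weighted_automation_score(metrics, weights), 3))  # 0.817
```

A higher weight simply makes that metric dominate the combined score, which is one straightforward way to encode the priorities mentioned above.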
PERCENTAGE OF AUTOMATED TEST CASES
This metric refers to the share of test cases that have been successfully automated out of the total number of cases. The higher the share of automated test cases, the more successful the automation effort is perceived to be, since automation tends to reduce both time consumed and human effort.
PA = AT / TT
Where PA = Share of automated test cases
AT = Number of test cases that have been automated
TT = Total number of test cases
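As a sketch, the formula can be computed directly; the function name is ours, chosen for illustration.

```python
def automated_share(automated_cases, total_cases):
    """PA = AT / TT — share of test cases that are automated."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return automated_cases / total_cases

# E.g. 150 of 200 test cases automated:
print(automated_share(150, 200))  # 0.75
```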
TEST SCRIPT FRAGILITY
Simply automating test cases isn’t enough. All test cases need to be explored thoroughly by the computer, accommodating any changes the AUT (application under test) may undergo in the meantime. Fragility is a measure of how much time is spent fixing and updating test scripts at each stage. High fragility means that more time and effort are being wasted on script maintenance, which is a highly undesirable scenario.
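The text gives no formula for fragility; one possible way to quantify it, assumed here purely for illustration, is the share of total automation time spent on script maintenance in a given stage.

```python
# Assumed quantification (not from the article): fragility as the
# fraction of automation time consumed by fixing/updating scripts.

def fragility(maintenance_hours, total_automation_hours):
    """Share of automation time lost to script maintenance in a stage."""
    if total_automation_hours <= 0:
        raise ValueError("total_automation_hours must be positive")
    return maintenance_hours / total_automation_hours

# E.g. 12 of 40 automation hours spent patching broken scripts:
print(fragility(12, 40))  # 0.3
```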
AUTOMATED TESTING COVERAGE PERCENTAGE (ATP)
The ATP measures how much of the product’s functionality is covered through automation. It helps us understand the “completeness” of the automated testing being performed. The higher the coverage percentage, the more of the product’s functionality is tested, resulting in more effective testing overall. It is measured using the formula:
ATP = (AC / TC) × 100
Where ATP = Automated Testing Coverage Percentage
AC = Automation Coverage (functionality covered by automated tests)
TC = Total Coverage (total functionality to be tested)
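A minimal sketch of the computation, with illustrative coverage figures of our own choosing (e.g. counts of covered vs. total functional areas):

```python
def atp(automation_coverage, total_coverage):
    """ATP = (AC / TC) * 100 — automated coverage as a percentage."""
    if total_coverage <= 0:
        raise ValueError("total_coverage must be positive")
    return automation_coverage / total_coverage * 100

# E.g. 340 of 400 functional areas exercised by automated tests:
print(atp(340, 400))  # 85.0
```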
DEFECT IDENTIFICATION RATIO (DIR)
One of the most important metrics, the defect identification ratio measures how effective our automated scripts have been at identifying defects. It is also an indicator of the quality of the final product. The formula is as follows:
DIR = TD/(TD+TAD)
Where DIR = Defect Identification Ratio
TD = Number of defects identified during testing
TAD = Number of defects identified after delivery
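A sketch of the ratio in code; the function name avoids shadowing Python’s built-in `dir`, and the sample counts are hypothetical.

```python
def dir_metric(defects_in_testing, defects_after_delivery):
    """DIR = TD / (TD + TAD) — share of all defects caught before delivery."""
    total = defects_in_testing + defects_after_delivery
    if total == 0:
        raise ValueError("no defects recorded")
    return defects_in_testing / total

# E.g. 45 defects caught in testing, 5 escaped to production:
print(dir_metric(45, 5))  # 0.9
```

A DIR close to 1 means nearly all defects were caught before delivery, which is what a successful automation effort should show.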
TIME SAVED
This self-explanatory metric is usually calculated for test cases whose manual testing times we already know, or can accurately estimate. Time Saved is the difference between the known/estimated manual execution time and the time consumed by automation. Fragility is also introduced as a parameter, since fixing and updating scripts consumes a significant amount of time.
TS = ET - (AT + F)
Where TS = Time Saved
ET = Known/estimated manual execution time of a test case
AT = Time taken by the automated script to execute the same test case
F = Time spent fixing and updating the script (fragility)
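As a sketch, with hypothetical hour figures of our own:

```python
def time_saved(estimated_manual_hours, automated_hours, fragility_hours):
    """TS = ET - (AT + F); negative values mean automation cost more time."""
    return estimated_manual_hours - (automated_hours + fragility_hours)

# E.g. 20h of manual testing replaced by 4h of automated runs
# plus 2.5h of script maintenance:
print(time_saved(20.0, 4.0, 2.5))  # 13.5
```

Note that a negative result is meaningful: it signals that maintenance overhead has eaten up the time the scripts were supposed to save.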
DEFECT DISCOVERY RATE (DDR)
The main reason this metric is placed last is that it needs to be taken with a pinch of salt. Defect Discovery Rate refers to the rate at which defects are identified per test case. While it may be acceptable for getting a rough idea of how effective our scripts are, it is not recommended as a major metric, since it is unsuitable where developers write effective, low-defect code from the very beginning.
DDR = Number of Identified Defects / Number of Test Cases Executed
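A sketch of the per-test-case rate, again with hypothetical counts:

```python
def ddr(defects_identified, test_cases_executed):
    """DDR — average number of defects found per executed test case."""
    if test_cases_executed <= 0:
        raise ValueError("test_cases_executed must be positive")
    return defects_identified / test_cases_executed

# E.g. 30 defects found across 120 executed test cases:
print(ddr(30, 120))  # 0.25
```

The caveat above applies here: a low DDR may simply mean the code was clean to begin with, not that the scripts are weak.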
Anything that isn’t measured cannot be improved. The assumption that automation unconditionally leads to better performance is incorrect: automation works only when done right. Firms should keep a constant record of how successful their automation is, making improvements at each stage wherever possible.