Skip to main content

Data needs are flaring up for businesses, with over 95% of businesses facing unprecedented needs to manage unstructured data. Another 40% of businesses indicated the need for regular management of data. In this scenario, where enterprises are increasingly seeking to gain flexibility and agility, Continuous Delivery becomes a sought after feature that could assure higher ROI. This is where ETL plays an indispensable role.

ETL Testing

A complete ETL data validation involves verifying the source and target table structure & data according to the mapping specifications and applied transformation rules. ETL testing is performed to ensure that the data loaded into the target system/warehouse is complete, accurate, and without data losses.

ETL Testing Challenges

Being consistently different from other forms of conventional data testing procedures, and owing to the layers in its testing procedure, it is cast as a vast procedure that also summons in various problems. The vastness of this process also makes it expensive and time-consuming, including the cost of downtime and a labor-intensive procedure. Apart from these, there are technical difficulties that also affect the ETL testing process.

  • Testing of heterogeneous data coming from varied sources
  • Data loss and data bugs following the transformation procedure
  • If the data warehouse system contains historical data, the data might be too large and complex to execute the test in the target system
  • Unstable testing environment
  • Achieving 100% test coverage is almost impossible
  • In cases of manual testing, it is difficult to incorporate source to target data models and mapping requirements, job schedules, Business Intelligence report requirements, and complex SQL data coding systems

Here are the two best methods of ETL Testing:

1. Sampling or Stare & Compare Method

This method involves building a test case based on a review of the data mapping and business rules. First source files are validated to make sure that they are properly structured, have Headers & Footers, and contain data. Next, the data type and format in the source and target are checked for similarity. The length of data types in the source and target is also checked, as per the transformation rules. Also, relationships between tables are verified as per the defined Integrity constraints.

This is followed by creating test cases. Each test case has two SQL queries—one for the source data, and one for the target data warehouse. After performing the tests, the results are collated into two Excel sheets.

This labor-intensive process becomes almost impossible to complete, and eventually, less than 1% of the data might be validated. According to a report by Gartner, nearly 40% of enterprise data is either inaccurate, incomplete, or unavailable, which results in businesses failing to achieve their data-driven goals.

2. Minus Queries

This method uses the MINUS query function in SQL to evaluate the source minus target set and target minus source set. If a resulting set is returned, it displays the data discrepancies.

However, there are multiple business and technical discrepancies in this test case. For example, if there is no resulting set that is returned, it prevents historical review or data analysis from past analyses, which also in turn prevents presenting summary reports of test results. Secondly, it returns only distinct values so that you see the values only in one table. The presence of duplicate rows is not displayed.

An Optimal Solution - Automated ETL Testing

Finding an automated ETL testing solution that is tailored for your business requires professional insight. Incorporating a completely new system over your legacy system can be a cruel process and to stay on the scene, you need a seasoned team of professionals that can help you with making these major decisions according to the whims of the market. This is an important concern as there is no guiding principle to what test cases to automate and what to not automate.

The below figure illustrates the Integrated Automation Platform for ETL Testing

Lastly, it is imperative to remember that the commitment to incorporate automated tools for ETL testing is contingent on an additional budget to meet the requirement. While the cost of bad data can continue to snowball year over year, this blog notes that the savings from accurate data can be as huge as $8,495,000. Ergo, it is better to have a customized automated ETL testing process with sincere guidance and constant support than having no test automation at all.

A Portfolio of Pyramid Consulting

Pyramid Consulting has been operating in the business technology solutions for over two decades through the dotcom bubble burst and the 2008 financial crisis, ushering in new, consistent solutions for clients.

To get a deeper and more technical understanding of how automated ETL testing enhances business processes, data quality & validation, and accelerates time-to-market, read Pyramid Consulting’s comprehensive whitepaper.

Carl Johanson

About the author

Carl Johanson

Practice Director, QA & Testing

Carl has been creating custom, forward-focused and effective QA solutions for Pyramid Consulting since 2012. Implementing the latest tech and thought leadership delights Carl just as it does clients. In his personal life, Carl focuses on doing things right and doing the right things—an important distinction—as efficiently as possible. If you need Carl urgently, check out your local drumming scene or the closest beach, otherwise you can find him at home with his beautiful and ever-growing family.

Cookie Notice

This site uses cookies to provide you with a more responsive and personalized service. By using this site you agree to our privacy policy & the use of cookies. Please read our privacy policy for more information on the cookies we use and how to delete or block them. More info

Back to top