Part 2: The influence of Big Data
Looking at the Internet of Things (IoT), the volume of new data about human behaviour and interactions that it can collect and distribute, and the explosion of social media content, I can see why Big Data is garnering such attention.
We’ll be collecting so much new data and using it in ways we’ve never done before. We’ll face a serious challenge to store the data appropriately, make it available to a large number of consumers and use it to make decisions that were never previously possible.
As the second key influencer, Big Data will help Machine Learning to evolve, which in turn will make IoT devices smarter. This means organisations of every kind will be affected in one way or another by Big Data: they are either generating this new data or they need access to it to deliver their services. Regardless of who owns the data, everyone will have reasons to tap into it, share it, or both.
Impact on Testing
The huge volumes of data captured and shared by the IoT bring a challenge for testing: the data being captured will be used in real time. As batch processing disappears, there will be fewer and fewer opportunities to validate data sources and verify the quality and veracity of the data before it is used.
If we think of Big Data as a fast-flowing river, then Continuous Automation, Performance Testing and Performance Engineering (where we extend testing into Production monitoring and Scalability) will become key services to help keep data flowing. A blockage in the real-time data flow will be catastrophic for business operations. Poor IT performance will be immediately obvious and embarrassing.
Testing with Big Data will require familiarity with new technologies and techniques for storing and processing large data sets, such as Hadoop, NoSQL databases and statistical analysis, and will extend engineering knowledge to languages like R, Erlang and Tcl. Testing will extend beyond validating the functions that use the data and will need to cover the accessibility of data in a wide variety of situations, including transactions performed by machines without human interaction.
Before being stored in one or more databases, the data will arrive in several formats and is unlikely to be well structured. This will create a challenge for performance test engineers when establishing workload models, as the unstructured data will likely break the model, which will then need to be revised many times. Scalability could be an issue for organisations that underestimate the volume of data they collect and the speed at which they are required to share it.
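As a rough illustration of how a workload model might start to account for mixed formats, the sketch below classifies incoming payloads and reports the proportion of each kind. The category names and the functions `classify_payload` and `workload_mix` are hypothetical, chosen purely for this example, and real feeds would need far richer detection rules.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def classify_payload(payload: str) -> str:
    """Return a coarse, hypothetical format label for one raw payload."""
    text = payload.strip()
    if not text:
        return "empty"
    try:
        parsed = json.loads(text)
        # Only treat JSON objects and arrays as structured records.
        if isinstance(parsed, (dict, list)):
            return "json"
    except json.JSONDecodeError:
        pass
    # Assumption: a comma suggests a delimited (CSV-like) record.
    if "," in text:
        return "delimited"
    return "unstructured"


def workload_mix(payloads: Iterable[str]) -> Dict[str, float]:
    """Return the share of each format label across the payloads seen."""
    counts = Counter(classify_payload(p) for p in payloads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}
```

Running `workload_mix` over a captured sample of traffic gives a first estimate of the format mix, which can then be revisited as new sources come online.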
Another challenge when developing in a world of Big Data is that part of the testing focus will be on ensuring data is being extracted from the correct sources. This will change the focus of our testing away from the User Interface. We will increase the amount of testing directly against databases or via APIs with the purpose of validating that data is coming from a single source of truth or at least correlating the various sources.
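To make the idea of correlating sources concrete, here is a minimal sketch of a reconciliation check between a source of truth and a downstream copy. It assumes both have already been read (from a database or an API) into lists of dictionaries sharing a key field. The function name `reconcile` and the sample field names are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def reconcile(source_of_truth: List[Record],
              downstream: List[Record],
              key: str) -> Dict[str, List[Any]]:
    """Compare two record sets by key and report the differences."""
    truth_by_key = {record[key]: record for record in source_of_truth}
    down_by_key = {record[key]: record for record in downstream}

    # Keys present in the source of truth but absent downstream.
    missing = sorted(k for k in truth_by_key if k not in down_by_key)
    # Keys that appear downstream with no origin in the source of truth.
    unexpected = sorted(k for k in down_by_key if k not in truth_by_key)
    # Keys present in both whose records disagree.
    mismatched = sorted(
        k for k in truth_by_key.keys() & down_by_key.keys()
        if truth_by_key[k] != down_by_key[k]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}
```

An automated test could then assert that all three lists are empty, or that their sizes stay under a tolerance agreed with the business.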
This gives us an opportunity to develop automation approaches designed around validating data (and their sources) on a large scale. It should remove the need for us to create and repeat mundane manual tests and enable us to get on to the more complex and positive types of testing around the business processes that will use the data.
Years ago, Business Intelligence (BI) tools seemed to be the ‘must have’ solution that every business was missing. While most organisations have found one way or another of making sense of their own and their clients’ information, we’re going to see a new explosion of Big Data Business Intelligence (Data Analytics) tools, needed for the large volumes of new information that simply won’t work with the existing tools.
One of the reasons for this is that the data itself will change. It will come in many forms, such as images, videos and other content with large file sizes. New roles focusing on Data Science and Predictive Analysis will become more prevalent, extending the typical skillset of test engineers.
Test environments will quickly become a constraint in the development lifecycle if organisations don’t invest in ways to replicate their existing environments. This will become more complicated as the variety of information and its sources rapidly expand and continually change.
Depending on the volume and variety of data, we may have to adopt several approaches to our testing, with ‘sampling’ of the data a likely approach if the risks are acceptable to the business. In time, machine learning algorithms will make this easier but, in the meantime, we’ll need to implement approaches for automating test data creation or sourcing to speed up test preparation.
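One way to sample a data stream whose total size isn't known in advance is reservoir sampling. The sketch below is only an illustration of that technique, not a recommendation for any particular tool. The function names, the fixed default seed and the `is_valid` callback are assumptions made for the example.

```python
import random
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Pick up to k items uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    sample: List[T] = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                sample[slot] = item
    return sample


def failure_rate(sample: List[T], is_valid: Callable[[T], bool]) -> float:
    """Return the fraction of sampled records that fail the validity check."""
    if not sample:
        return 0.0
    failures = sum(1 for record in sample if not is_valid(record))
    return failures / len(sample)
```

The failure rate observed in the sample can then be compared against a threshold the business accepts, rather than validating every record.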