10 Things to Consider when Testing AI
The world we live in has changed significantly, as you have no doubt noticed. Data is available in larger volumes, from more sources and at a faster speed. Data is collected about your actions, in both the real and online worlds, by thousands of organisations and from hundreds of countries. Individually, each person on this planet produces over 300 megabytes of data a day. Processing power is more freely and cheaply available, and thanks to ‘big data’ and ‘data science’, we now have the ability to process and analyse this data at lightning speeds.
Machine Learning has also enabled software to learn, which is a huge step forward as we move away from a world where software developers need to specify exactly how software should behave in every situation. The corollary of this is that artificial intelligence has become an area of great focus across the industry as a whole.
So what does this mean for testing processes? How do you set about testing an AI-being?
1. Your regression testing pack does not need to include a Trivial Pursuit set
I hate scoping regression testing, but I particularly hate it with AI-enabled technologies. Testing people tend to approach the problem by assessing a production solution and checking that a test solution does everything identically.
That’s really overkill with AI. General AI, or ‘Strong AI’, of the type that can talk to you about anything is decades away. Most of the AI solutions that are going to be available in the near future fit into the category of “Narrow AI”: they are applied to a problem domain small enough for the AI problem to be solved. Some products join multiple Narrow AIs together to create the perception of General AI, e.g. Alexa, which contains hundreds of intents you can trigger, such as the intent to play a song.
So when faced with a natural language interface, you don’t need to ask it every question from your general knowledge book. You need to focus on the functionality that it is actually trying to provide.
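As a rough sketch of what that focus might look like, the Python below scopes a regression pack to the intents a product claims to support, plus a single out-of-scope probe to check that it falls back gracefully. The intent names and the keyword-based `classify` function are hypothetical stand-ins for the real system under test.

```python
# Hypothetical intents for an imaginary voice assistant (not from any real product).
SUPPORTED_INTENTS = {"play_song", "set_timer", "weather"}


def classify(utterance: str) -> str:
    """Toy keyword classifier standing in for the system under test."""
    text = utterance.lower()
    if "play" in text:
        return "play_song"
    if "timer" in text:
        return "set_timer"
    if "weather" in text or "rain" in text:
        return "weather"
    return "fallback"


# One phrasing per supported intent, plus one general-knowledge probe
# that should land on the fallback rather than on a real intent.
REGRESSION_PACK = [
    ("Play some jazz", "play_song"),
    ("Set a timer for ten minutes", "set_timer"),
    ("Will it rain tomorrow?", "weather"),
    ("Who won the 1966 World Cup?", "fallback"),
]


def run_pack(pack):
    """Return (utterance, expected, actual) for every case that fails."""
    failures = []
    for utterance, expected in pack:
        actual = classify(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    return failures


def uncovered_intents(pack):
    """Supported intents that no test case in the pack exercises."""
    return SUPPORTED_INTENTS - {expected for _, expected in pack}


failures = run_pack(REGRESSION_PACK)
missing = uncovered_intents(REGRESSION_PACK)
```

The useful number here is coverage of supported intents, not coverage of everything a user could conceivably say.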
2. Repeating the same test with the same inputs may not produce the same results
A machine learning system that keeps learning while in use behaves much like a ‘Markov Process’: its next state depends on its current state, so the last calculation impacts the next. This is really important when both designing and executing your tests. ‘Clearing down’ your algorithm before each run is the obvious way to go, but how well does that scale beyond unit testing? If your product has just started responding the way the users expect, do you really want to delete all that ‘learning’ so that you can run a regression test?
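To make the effect concrete, here is a minimal sketch using a toy learner of my own invention (`RunningMeanModel`, which is not a real library class). Running the same test twice against the live model gives two different answers, while running it against a throwaway copy of a saved baseline gives the same answer every time and leaves the baseline untouched. Copying the model is one possible alternative to clearing it down, assuming the model state is small enough to copy.

```python
import copy


class RunningMeanModel:
    """Toy online learner: predicts the mean of everything it has seen."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def predict(self) -> float:
        return self.total / self.count if self.count else 0.0

    def learn(self, value: float) -> None:
        self.count += 1
        self.total += value


def run_test(model, value):
    """A 'test' that, like real usage, also teaches the model."""
    prediction = model.predict()
    model.learn(value)
    return prediction


# Same test, same input, run twice against the live model.
live = RunningMeanModel()
live.learn(10.0)
first = run_test(live, 20.0)
second = run_test(live, 20.0)

# Same test run against a disposable copy of a saved baseline.
baseline = RunningMeanModel()
baseline.learn(10.0)


def run_isolated(saved_model, value):
    """Run the test on a deep copy so the saved model never changes."""
    return run_test(copy.deepcopy(saved_model), value)


isolated_first = run_isolated(baseline, 20.0)
isolated_second = run_isolated(baseline, 20.0)
```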
3. The more you test something, the more it is going to work
Carrying on the theme that using the software changes the software, you should also be aware that the more you test something, and the AI sees the result as a success, the better that functionality will work. The very act of running a test procedure increases the chances that the functionality will work at a later date, given the same algorithm and data.
4. Managing the outputs is critical
As I mentioned above, your algorithm may improve so much during testing that you don’t want to throw it away, and you want to promote it to production instead. Great, but do you know every test that was executed? Was anything done, say to test some error handling, that might negatively impact the production algorithm? Do you have ‘chain of custody’ on everything the machine learnt? As you can imagine, Configuration Management has a whole litany of questions to answer on this.
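One way to approach the chain of custody question is to record where every piece of learning came from, so that you can answer "what did the machine learn, and from whom?" before promotion. The sketch below is an assumption about how such a log could look; the `LearningEvent` and `CustodyLog` names and the `test:negative` source label are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LearningEvent:
    source: str      # who produced the interaction, e.g. "user" or "test:negative"
    utterance: str   # what the model was shown
    outcome: str     # what the model was told about the result


@dataclass
class CustodyLog:
    events: List[LearningEvent] = field(default_factory=list)

    def record(self, source: str, utterance: str, outcome: str) -> None:
        self.events.append(LearningEvent(source, utterance, outcome))

    def by_source(self) -> Dict[str, int]:
        """Count how many learning events each source contributed."""
        counts: Dict[str, int] = {}
        for event in self.events:
            counts[event.source] = counts.get(event.source, 0) + 1
        return counts

    def promotable(self, excluded_prefixes: Tuple[str, ...] = ("test:negative",)):
        """Events whose source does not start with an excluded prefix."""
        return [e for e in self.events if not e.source.startswith(excluded_prefixes)]


log = CustodyLog()
log.record("user", "play some jazz", "success")
log.record("test:functional", "set a timer", "success")
log.record("test:negative", "%%%garbage%%%", "success")

counts = log.by_source()
safe_events = log.promotable()
```

Whether you then retrain from the promotable events only, or simply use the log as evidence in the promotion decision, depends on how your model is built.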
5. There’s no AI without architecture and design
Anybody who asks you to test any kind of AI technology without some kind of documentation is either using you to test how people ACTUALLY interact with their technology, or doesn’t really have a viable product. It isn’t credible to build an AI solution without putting significant thought into things like dialogue models and algorithm design. In fact, the biggest enabler to AI is data, and therefore arguably the biggest enabler to AI design is information architecture.
6. Don’t expect 100%
Modern machine learning algorithms identify patterns in data and use those patterns as rules, or heuristics. Rather than being explicitly programmed, the algorithm has experiences and learns, and this changes its behaviour in the future. Heuristics are techniques the human brain uses too. They are an imperfect but fast way to find the solution to a problem: mental shortcuts that make it easier to make a decision. These may be optimised with modern AI algorithms, but don’t expect them to get it right 100% of the time. After all, machine learning essentially relies on averages to come up with the ‘best-guess’ response.
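In practice this usually means asserting against an agreed accuracy threshold rather than demanding that every individual answer is right. The sketch below shows the idea; the labels and the 85% threshold are invented for illustration, and the right threshold is something to agree with the product owner.

```python
def accuracy(predictions, expected):
    """Fraction of predictions that match the expected labels."""
    if not expected or len(predictions) != len(expected):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


def meets_threshold(predictions, expected, minimum):
    """Pass when accuracy is at or above the agreed minimum."""
    return accuracy(predictions, expected) >= minimum


# Illustrative data: ten cases, one of which the model gets wrong.
expected_labels = ["cat", "dog"] * 5
predicted_labels = ["cat", "dog"] * 4 + ["cat", "cat"]

score = accuracy(predicted_labels, expected_labels)
passes_at_85 = meets_threshold(predicted_labels, expected_labels, 0.85)
passes_at_95 = meets_threshold(predicted_labels, expected_labels, 0.95)
```

A strict equality check on all ten cases would fail this model outright, while the threshold check tells you whether it is good enough for the purpose.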
7. Is your test data REALLY representative?
This topic is too large to cover in this article, but machine learning (a crucial element of AI technologies) is only as good as the data it is trained and tested with. It is as simple as that. Consider your test data: is it representative of your target user base? Consider your testers: are they representative, and are there edge cases in society that will impact the software you are testing?
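A simple first check, sketched below, is to compare the make-up of your test data with the make-up of the user base you are targeting and flag any group that is badly over- or under-represented. The age bands, the target shares and the 10-point tolerance are all made up for the example; a real check would use whatever attributes matter for your product.

```python
from collections import Counter


def proportions(values):
    """Share of each distinct value in the list."""
    counts = Counter(values)
    total = len(values)
    return {group: count / total for group, count in counts.items()}


def representation_gaps(test_values, target_shares, tolerance=0.10):
    """Groups whose share of the test data differs from the target share
    by more than the tolerance, with the signed difference."""
    actual = proportions(test_values)
    gaps = {}
    for group, target in target_shares.items():
        difference = actual.get(group, 0.0) - target
        if abs(difference) > tolerance:
            gaps[group] = round(difference, 2)
    return gaps


# Invented figures: intended user base versus who is actually in the test set.
target_user_base = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
test_set_ages = ["18-34"] * 6 + ["35-54"] * 4

gaps = representation_gaps(test_set_ages, target_user_base)
```

A positive gap means the group is over-represented in the test data and a negative gap means it is under-represented or missing altogether.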
8. What’s the ethical / legal / societal impact?
Given the much-hyped risks of demented killer robots and rash algorithms ruling our world, it is important to think through the exact risks introduced by any AI tech. As testers we need to consider legal and ethical requirements as ‘unwritten requirements’. We can rely on designers and developers to provide us with the feature requirements, but we have something of a duty to educate ourselves on the ethical, legal and societal requirements, and assess edge cases in software against those. Using technology to do something as simple as deciding whether to offer same-day delivery to certain postcodes can have unexpected impacts.
9. Where are the outputs of your testing?
As I mentioned above, every time you test machine learning, you change it. This is the opposite of when I started testing, where the golden rule was that testing did not improve a product, just provided information about it. Now, testing is part of the feedback loop that improves the model – and it’s important to understand how your testing will change a model, and if that model will be promoted to production. You may not wish to load test negative scenarios if the result will end up skewing production behaviours for the rest of time.
10. Why do I care about all of these things? I’m a tester?
As software becomes more human, and the effort involved in testing becomes increasingly automated, we need to look at our profession carefully. In my opinion, testers are the only people who look at new software critically, before it reaches public use, and with an influence on product development. I think it is key that we evolve our profession to consider the wider societal, legal and ethical influences, before we give our stamp of quality to any technical product. If not us, who exactly?
Adam Smith is Piccadilly’s chief technology officer and leads the company’s technology innovation. Adam also has extensive experience leading, driving and solutioning across a range of testing disciplines, including test automation, performance and penetration testing as well as the traditional functional testing.