
In a world where Artificial Intelligence (AI) is increasingly integrated into everyday life — assisting doctors with disease diagnosis, powering self-driving cars, and more — the reliability of these systems has never been more critical. Yet when AI systems fail, the consequences can be dire. Unlike traditional software, AI systems are complex and constantly evolving, which makes finding and fixing problems particularly challenging.
Traditional software bugs typically manifest as crashes or error messages, but AI bugs often lurk unseen, silently leading to incorrect or unsafe decisions, such as misrouting in self-driving cars. In my previous posts, I discussed the growing need for reliable and safe AI systems, as well as the importance of debugging AI in real time, which involves finding, explaining, and correcting bugs. But debugging AI systems is a different ball game, and in this post, we will dive into the intricacies of the first step: finding the AI bug, also known as bug localization.
To understand the challenges of bug localization in AI systems, let us consider what makes these systems unique. First, AI models rely heavily on the data they are trained on. If that data contains biases or errors, such as the racial bias documented in some facial recognition datasets, the system's decisions are skewed in unpredictable ways, leading to harmful outcomes like unfair denials of loans or services. The reliability of an AI system therefore depends on the quality and relevance of its training data, and anomalies or biases in that data can produce bugs that are hard to trace back to their origin.
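A useful early defense is to audit the training data itself before blaming the model. The sketch below is a minimal, illustrative check for outcome imbalance across groups in a hypothetical loan dataset; the labels, groups, and the 0.2 threshold are stand-ins I chose for illustration, not values from any real pipeline.

```python
from collections import Counter

# Hypothetical training labels and a sensitive attribute (e.g., demographic group).
# In a real pipeline these would come from your dataset, not hard-coded lists.
labels = ["approve", "deny", "approve", "deny", "deny", "approve", "deny", "deny"]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

def approval_rate_by_group(labels, groups):
    """Compute the fraction of 'approve' labels within each group."""
    totals, approvals = Counter(), Counter()
    for label, group in zip(labels, groups):
        totals[group] += 1
        if label == "approve":
            approvals[group] += 1
    return {g: approvals[g] / totals[g] for g in totals}

rates = approval_rate_by_group(labels, groups)
print(rates)  # e.g., {'A': 0.5, 'B': 0.25}

# A large gap between groups in the *training data* can propagate into the
# model's decisions; flagging it here is far cheaper than tracing it later.
if max(rates.values()) - min(rates.values()) > 0.2:  # threshold is illustrative
    print("Warning: approval rates differ notably across groups.")
```

Checks like this will not catch every data problem, but they turn an invisible bias into a visible number before training even begins.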
Moreover, bugs do not always stem from simple code errors. They can also arise from how the system interacts with the hardware it runs on (like a computer or a server) or the environment it is used in, which can vary widely from one location to another. A key finding from my recent research is the concept of “extrinsic bugs”: bugs in AI systems are more likely to be extrinsic than bugs in traditional software systems. These bugs are not in the AI’s programming itself but arise from the AI’s interaction with other system components, such as the device it runs on (like a GPU) or the operating system (Windows, Linux, macOS). Imagine playing a video game that lags not because it is poorly made but because it does not work well with your graphics card. Similarly, an AI might fail not because its algorithm is incorrect but because it is not fully compatible with the hardware it is running on. Finding such bugs therefore demands a different approach: we must investigate not only the code and data but also the broader system interactions and environment.
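One practical consequence: when an AI failure is reported, capture the environment along with it. Below is a minimal sketch of such a snapshot; the function name is my own, and the GPU query assumes PyTorch is installed (it degrades gracefully if not).

```python
import platform
import sys

def environment_snapshot():
    """Collect platform details that often matter for extrinsic bugs."""
    info = {
        "os": platform.platform(),      # OS name and version (Windows, Linux, macOS)
        "python": sys.version.split()[0],
        "machine": platform.machine(),  # CPU architecture, e.g., x86_64
    }
    # GPU details are a common source of extrinsic bugs; this part assumes
    # PyTorch is available and is skipped gracefully otherwise.
    try:
        import torch
        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        info["torch"] = "not installed"
    return info

# Attach this snapshot to every failure report, so that two users seeing
# different behavior can be compared environment by environment.
print(environment_snapshot())
```

When the same model misbehaves on one machine but not another, diffing two of these snapshots is often the fastest route to localizing an extrinsic bug.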
Lastly, AI models often contain millions of interacting parameters, making it extremely difficult to pinpoint which specific combinations contribute to an error. Unlike traditional software, which follows a predictable set of instructions, AI systems learn from data and make decisions based on their own interpretations, which are often opaque. This earns them the nickname “black boxes”. Peering inside these black boxes to understand why an AI made a particular decision can be as tricky as deciphering human thought processes. For example, an AI diagnostic tool might accurately identify disease symptoms yet be unable to explain the reasoning behind its result, which is crucial for debugging the system.
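A simple, model-agnostic way to get a first peek inside such a black box is perturbation: blank out one input at a time and watch how the prediction moves. The sketch below applies this idea to a toy `predict` function; the model, feature names, and weights are illustrative stand-ins, not any real diagnostic tool.

```python
def predict(features):
    """Stand-in for an opaque model: returns a risk score from named features."""
    # The weights here are arbitrary; in practice this would be a trained model.
    return (0.6 * features.get("fever", 0)
            + 0.3 * features.get("cough", 0)
            + 0.1 * features.get("fatigue", 0))

def perturbation_importance(features, baseline=0.0):
    """Score each feature by how much the prediction changes when it is removed."""
    original = predict(features)
    importance = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] = baseline  # "occlude" one feature at a time
        importance[name] = abs(original - predict(perturbed))
    return importance

patient = {"fever": 1.0, "cough": 1.0, "fatigue": 0.0}
print(perturbation_importance(patient))
# e.g., {'fever': 0.6, 'cough': 0.3, 'fatigue': 0.0}
# The largest shifts point to the features driving the decision -- a first
# clue for localizing why the model answered the way it did.
```

This is the intuition behind more sophisticated explainability techniques: vary the inputs systematically, and let the output changes reveal what the model is actually paying attention to.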
Researchers are rising to these challenges with innovative solutions. Visualization tools, for instance, can reveal how an AI interprets data, exposing hidden quirks. Moreover, incorporating explainability — designing AI to clarify its reasoning processes — is crucial for demystifying these “black box” systems.

In conclusion, the ongoing journey towards reliable AI is complex and challenging, yet essential if we are to maximize the potential of AI technologies while ensuring their safe operation in our increasingly AI-driven world. With continued advances in research techniques, AI bugs will soon have nowhere to hide.