
In my last blog post, we explored how AI-based systems present unique challenges in bug reporting and localization. Today, let us shift our focus to another fascinating aspect of bug management: duplicate bug reports. Imagine you are part of a team working on a massive software project like Google Workspace at Google or Slack at Salesforce. Every day, hundreds of bugs are reported by users and developers, each submitted to a bug-tracking system as a bug report. However, there is a catch: many of these reports are duplicates, meaning multiple people report the same bug without knowing it. This duplication creates a significant headache for developers, wasting valuable time and resources.
Why does this happen? Well, the issue lies in how bug reports are written. They are written in natural language, just like we speak or write daily. Because of this, different people describe the same problem in diverse ways. For example, if your car breaks down, one person might say, “The car won’t start,” while another says, “The engine is dead.” Both describe the same issue but with different words. The same thing happens with bug reports, making it hard to spot duplicates.
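To see how little two descriptions of the same problem can overlap, consider a simple word-overlap measure such as Jaccard similarity (a common baseline, not a method from the study itself) applied to the car example above:

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap similarity: |intersection| / |union| of the two word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not (words_a | words_b):
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

# Two reports of the same failure, phrased differently:
report_1 = "the car won't start"
report_2 = "the engine is dead"

# Only "the" is shared: 1 common word out of 7 distinct words.
print(jaccard_similarity(report_1, report_2))  # 0.142857...
```

A score this close to zero is exactly why naive word matching cannot tell that these two reports describe one and the same problem.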
Now, manually sifting through hundreds of bug reports to find duplicates is not practical. This is where automated systems come in. Researchers have developed various techniques to automate this process, drawing on Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML). Each method has its strengths and weaknesses. We can categorize duplicate bug reports into two types. The first type involves reports that describe the same issue using similar words. The second, more challenging type involves reports that describe the same issue using entirely different words. Our research shows that about 19% to 23% of duplicate bug reports fall into this second category, which we call “textually dissimilar duplicates.”
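A minimal sketch of the IR-style approach can make the two categories concrete: represent each report as a TF-IDF vector and score pairs with cosine similarity. The report texts below are invented for illustration, and real systems add much more (stemming, stop-word removal, learned models):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a small corpus of report texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many reports does each term appear?
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

reports = [
    "app crashes when opening settings page",       # query report
    "application crash on opening the settings",    # duplicate, similar wording
    "preferences screen freezes the whole program", # duplicate, dissimilar wording
]
vecs = tfidf_vectors(reports)
print(cosine(vecs[0], vecs[1]))  # shared terms -> positive score
print(cosine(vecs[0], vecs[2]))  # no shared terms -> 0.0, despite being the same bug
```

The first pair shares terms like "opening" and "settings" and gets a positive score, while the textually dissimilar duplicate scores exactly zero, which is precisely the failure mode described above.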
Most existing techniques focus on detecting duplicates that use similar words. To investigate, we analyzed 93,000 bug reports from Eclipse, Firefox, and a mobile system (Android and iOS), and found that traditional techniques struggle with, and often miss, textually dissimilar duplicates. Inspired by domain-specific approaches, we tried something new: we trained our models on bug reports from the same software domain so they could better capture its unique language and context. For instance, Gmail bug reports might include issues like “inbox synchronization errors” or “spam filter misclassifications.” By focusing on these domain-specific terms, our model could detect duplicates more accurately. This approach showed mixed results: it improved the detection of dissimilar duplicates but was less effective for textually similar ones.
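A toy sketch of why training on the right domain matters: a model fitted only on generic reports never even sees domain vocabulary, so it cannot weight or match those terms at all. The report texts below are made up for this illustration, not drawn from the study's data:

```python
def vocabulary(corpus):
    """All distinct lowercase tokens across a corpus of report texts."""
    return {term for doc in corpus for term in doc.lower().split()}

# Hypothetical Gmail-style reports (invented for this example):
gmail_reports = [
    "inbox synchronization stuck after login",
    "spam filter misclassifies my newsletter",
    "synchronization error when switching accounts",
]

# A generic corpus with no mail-specific vocabulary:
general_reports = [
    "button click does nothing",
    "settings menu does not open",
    "app freezes on startup",
]

domain_terms = {"inbox", "synchronization", "spam"}

# Fitting on domain reports puts the domain terms in the model's vocabulary;
# fitting on generic reports leaves them entirely unseen.
print(domain_terms <= vocabulary(gmail_reports))   # True
print(domain_terms & vocabulary(general_reports))  # set()
```

This is only the vocabulary-coverage half of the story; the study's models also learn how domain terms co-occur, which this sketch does not capture.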
In summary, while progress has been made in automating duplicate bug detection, challenges remain, particularly with textually dissimilar duplicates. Our research highlights the need for better techniques to understand the nuances of bug descriptions. As we refine these methods, we aim to make software development more efficient and less frustrating for developers.