
In India, there has been a surge in AI-generated videos featuring businessmen, politicians, journalists, and sports personalities, aimed at trapping people in financial scams. While fact checkers, aided by detection tools, have managed to identify some instances of content manipulation, the effectiveness of these tools has been inconsistent. Detection results often vary across tools when analyzing the same content. The most significant challenge, however, remains the inability of detection tools to accurately analyze and identify AI-generated content in regional languages.
Why Are AI-Detection Tools Failing to Debunk Regional Language Content?
A study published by Sage Journals and reported by Nieman Lab found significant biases in the data used to train detection tools. According to the study, most commercial detection systems—and even models developed by academic researchers—are primarily trained on datasets sourced from the Global North.
Audio detection models are heavily trained on English and other Western languages, while image and video detection models tend to focus on imagery featuring white people and those with lighter skin tones. As a result, cultural nuances and references from the Global South are often overlooked, and the tools are markedly less accurate when journalists try to verify clips in regional Indian languages.
A Broader Examination of Fake News
Research conducted by Sujit Mandava and Sahely Bhadra from the Indian Institute of Technology Palakkad, along with Deepak P. from Queen’s University Belfast, highlights how cultural dominance by the Global North influences the way AI models operate.
Instead of examining deepfake detection specifically, the researchers evaluated a broader category of AI models built to identify fake news. Their focus was on FNDNet, a model released in 2020 that reportedly achieved over 98% accuracy in traditional benchmark tests. However, when tested against content originating from the Global South, FNDNet’s performance faltered.
The researchers suggested that AI models trained mostly on Global North data learn word patterns that do not generalize. Vocabulary common in Global North news becomes strongly tied to the model's judgments about whether a story is real or fake, while the language and cultural styles of the Global South are largely absent from the training data. The result is frequent mistakes, especially false negatives, where fake news is wrongly classified as real. Their tests showed that when FNDNet was applied to fake news from the Global South, it made false-negative errors about 40% more often than false-positive errors.
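To make those two error categories concrete, here is a minimal, purely illustrative sketch of how false-negative and false-positive rates are computed when auditing a binary fake-news classifier. The labels, values, and function name are hypothetical and are not drawn from the FNDNet study.

```python
# Illustrative sketch (not the researchers' evaluation code): computing
# false-negative and false-positive rates for a binary fake-news classifier.
# Hypothetical labels: 1 = fake, 0 = real.

def error_rates(y_true, y_pred):
    """Return (false_negative_rate, false_positive_rate) for binary labels."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # fake called real
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # real called fake
    positives = sum(y_true)                 # genuinely fake items
    negatives = len(y_true) - positives     # genuinely real items
    return fn / positives, fp / negatives

# Toy example: the model misses fake items more often than it flags real ones.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1]  # 2 false negatives, 1 false positive

fnr, fpr = error_rates(y_true, y_pred)
print(f"False-negative rate: {fnr:.0%}, false-positive rate: {fpr:.0%}")
```

In this toy sample the classifier misses 40% of the fake items while wrongly flagging only 20% of the real ones, the same kind of asymmetry the researchers describe for content from the Global South.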
Although the researchers describe their evidence as “coarse-grained” and stress the need for deeper studies, the implications are clear: AI models built and benchmarked using Global North-centric data are likely to reproduce and reinforce existing cultural biases.
How to Overcome the Bias Embedded in AI Algorithms
To tackle this issue, the authors advocate for greater transparency. They propose that AI developers release "geopolitical model cards"—documents that disclose which regions, languages, and demographics are represented in a model’s training data. This transparency, they argue, would help users understand the limitations of AI systems, especially when applied to news content from underrepresented regions.
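As an illustration of what such disclosure could look like, the sketch below shows the kind of fields a geopolitical model card might contain. The structure, field names, and numbers are hypothetical and are not part of the researchers' proposal or any published standard.

```python
# Hypothetical sketch of a "geopolitical model card": a disclosure of which
# regions, languages, and demographics a model's training data represents.
# Every field name and value here is illustrative only.
geopolitical_model_card = {
    "model_name": "example-fake-news-detector",        # placeholder name
    "training_data_region_share": {
        "Global North": 0.85,                          # hypothetical proportions
        "Global South": 0.15,
    },
    "languages_in_training_data": ["English"],
    "languages_not_evaluated": ["Hindi", "Tamil", "Bengali"],  # example gaps
    "demographic_notes": "Image data skews toward lighter skin tones.",
    "known_limitations": [
        "Elevated false-negative rate on content from underrepresented regions.",
    ],
}

# A journalist could consult the card before trusting a verdict on a regional clip.
for language in ["Hindi"]:
    covered = language in geopolitical_model_card["languages_in_training_data"]
    print(f"{language} covered in training data: {covered}")
```

A card like this would let a fact checker see at a glance whether a tool was ever trained or evaluated on the language of the clip in question, which is precisely the limitation the authors want made transparent.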
This perspective aligns with sentiments expressed by Indian researchers. In May 2024, deepfake expert Mayank Vatsa from IIT Jodhpur summed up the concern vividly: “Data is the new oil. But you need to ask yourself—what kind of oil are you putting into your engine? Many existing tools are built for populations very different from those we work with daily.”