AI chatbots fail vulnerable, non-native users, says study

By Desk Reporter, GCC Business News

AI chatbots designed to expand global access to information may be delivering less-accurate responses to the very users they are meant to support, according to new research from the MIT Center for Constructive Communication (CCC).

The study, conducted at the MIT Media Lab, found that leading large language models (LLMs) produce less-accurate and less-truthful responses for users with lower English proficiency, limited formal education, or those based outside the United States.

In some cases, the systems refused to answer questions at higher rates or generated patronizing and condescending language when interacting with these groups.

The research evaluated three prominent AI models: GPT-4 by OpenAI, Claude 3 Opus by Anthropic, and Llama 3 by Meta. Findings were presented at the AAAI Conference on Artificial Intelligence in January under the title ‘LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users’.

AI chatbot accuracy declines across user groups

Researchers assessed AI chatbots’ performance using two established benchmarks:

  • TruthfulQA: measures truthfulness by testing responses to common misconceptions
  • SciQ: evaluates factual accuracy on science exam questions

Lead author Elinor Poole-Dayan, a technical associate at the MIT Sloan School of Management who conducted the research as a CCC affiliate and as a master’s student in media arts and sciences, said the team began the study to examine whether large language models could genuinely expand equitable access to information worldwide.

She noted that this objective cannot be achieved unless biases and harmful tendencies embedded in the models are effectively identified and mitigated to ensure fair treatment for users across languages, nationalities and demographic backgrounds.

To mirror real-world use, researchers prefixed each question with brief user biographies varying in education level, English proficiency and country of origin. Accuracy of AI chatbots consistently declined when models responded to users described as having lower formal education or limited English skills.
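
To picture the setup, the sketch below shows in rough form how such an evaluation can be wired together. It is not the researchers’ code: the biography texts, the sample question and the call_model() helper are hypothetical placeholders, used only to illustrate prefixing a benchmark question with a user biography before sending it to a model.

# Illustrative sketch only, not the MIT team's pipeline. The biography texts,
# the sample question and call_model() are hypothetical placeholders.
BIOGRAPHIES = {
    "control": "",  # no biography, mirroring the study's control condition
    "highly_educated_native": "The user holds a graduate degree and is a native English speaker.",
    "less_educated_non_native": "The user left school early and speaks English as a second language.",
}

def build_prompt(biography: str, question: str) -> str:
    """Prefix a benchmark question with an optional user biography."""
    return f"{biography}\n\n{question}" if biography else question

def call_model(prompt: str) -> str:
    """Placeholder for a call to GPT-4, Claude 3 Opus or Llama 3."""
    raise NotImplementedError("Connect a model API here to run the comparison.")

if __name__ == "__main__":
    question = "Which gas do plants primarily absorb during photosynthesis?"
    for label, biography in BIOGRAPHIES.items():
        print(f"--- {label} ---")
        print(build_prompt(biography, question))
        # Model answers would then be scored for accuracy and refusals per condition.

In the study itself, accuracy on TruthfulQA and SciQ and the rate of outright refusals were then compared across these biography conditions.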

The sharpest drop was recorded among users who combined both traits: non-native English speakers with lower educational attainment experienced the most pronounced reduction in response quality across all three models and both evaluation datasets.

Country of origin further affected outcomes. When comparing users from the United States, Iran and China with similar education levels, Claude 3 Opus showed a notable decline in accuracy for users from Iran across both benchmarks. The findings suggest that national origin, alongside other demographic factors, can intensify disparities in AI chatbot performance.

Researchers cautioned that such compounded effects may increase misinformation risks for vulnerable groups who already face challenges in independently verifying information.

Higher refusal rates and patronizing responses

Beyond accuracy gaps, the study found notable differences in refusal rates. Claude 3 Opus declined nearly 11 percent of queries when interacting with users described as less educated and non-native English speakers, compared with 3.6 percent in control cases without user biographies.

A manual review of these refusals showed that patronizing or condescending language appeared far more frequently in responses to less-educated users. Some replies featured exaggerated dialects or mimicry of broken English, patterns rarely observed when identical questions were posed by profiles described as highly educated.

Researchers also identified cases in which models withheld information from specific national groups. Questions related to nuclear power, anatomy and historical events were declined for less-educated users from Iran or Russia, while the same queries were answered accurately for other user profiles.

The findings indicate that the alignment and safety mechanisms embedded in large language models may result in differential treatment based on perceived user characteristics, potentially restricting access to accurate information for those deemed more likely to misunderstand it.

Equity concerns amid surge in AI adoption

Large language models have been widely promoted as tools to democratize knowledge and advance personalized learning worldwide. Their rapid uptake across education, business and public services has been matched by significant investment and deep integration into everyday digital platforms.

However, the CCC study suggests that without rigorous oversight and effective bias mitigation, AI chatbots risk widening existing inequalities rather than narrowing them. Users with lower educational attainment or limited language proficiency may be more likely to receive inaccurate, incomplete or withheld information.

As governments and institutions across the GCC and beyond accelerate AI adoption in public services, education and digital transformation initiatives, the findings highlight the need for continuous auditing, inclusive testing and transparent governance frameworks.

Equitable performance across languages, nationalities and educational backgrounds will be essential to realizing AI’s potential as an inclusive information tool. The research underscores the responsibility of developers and policymakers to build fairness, accountability and safeguards into AI systems as they become further embedded in daily life.
