When AI Gets It Wrong: Lessons from a Women-Centered Design Experiment Highlight the Need for Inclusive GPTs
In emerging markets, women entrepreneurs continue to face systemic barriers to financial inclusion — from limited access to credit and digital tools to exclusionary banking practices shaped by social norms. Do we really need AI to amplify these disparities? Research has shown that ChatGPT and other large language models (LLMs) can reinforce societal biases, perpetuating harmful stereotypes. When these biases go unchecked, they don’t just constrain access; they limit women’s potential for business growth and long-term financial health.
Researchers have found that LLMs are often guilty of:
- Stereotypical role assignments: For example, AI often associates careers like nursing and teaching with women, while linking engineering or leadership roles with men.
- Language use: LLMs tend to describe men with formal or positive adjectives such as “strong” and “intelligent,” while women are more often described as “caring” or “emotional.”
- Underrepresentation: Women are frequently overlooked in historical or professional contexts, even in fields where they’ve made major contributions.
The list goes on. And these biases have real-world consequences.
A powerful example from the Center for Financial Inclusion highlighted how ChatGPT can reflect gendered assumptions in financial advice. When researchers posed identical financial scenarios, changing only the applicant’s sex, the guidance shifted: Women were more often directed toward meal planning and household budgeting, while men were steered towards asset management, estate planning or work-life balance strategies.
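For readers who want to run this kind of counterfactual check themselves, a minimal sketch in Python follows. It assumes the `openai` client library and an illustrative model name, and the scenario wording is our own, not the Center for Financial Inclusion’s; the idea is simply to hold everything constant except the stated gender and compare the advice that comes back.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Identical scenario; only the stated gender changes between runs.
SCENARIO = (
    "I am a {gender} small-business owner with a modest, irregular income. "
    "What should my top three financial priorities be this year?"
)

def get_advice(gender: str) -> str:
    """Ask the model for advice, varying only the stated gender."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": SCENARIO.format(gender=gender)}],
        temperature=0,  # reduce run-to-run variation so differences reflect the prompt
    )
    return response.choices[0].message.content

for gender in ("female", "male"):
    print(f"--- Advice for a {gender} applicant ---\n{get_advice(gender)}\n")
# Compare the outputs for gendered framing,
# e.g. "household budgeting" vs. "asset management".
```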
On the surface, both sets of recommendations include useful advice. The problem lies in the assumptions — that women’s primary financial role is cost-cutting within the home, while men are expected to accumulate wealth, manage assets and serve as the main breadwinner. These stereotypes reinforce outdated roles. Women miss out on guidance that could strengthen long-term financial growth and inclusion, while men are denied practical advice on everyday financial management. By narrowing opportunities on both sides, AI risks deepening inequities in how financial health is understood and supported.
That’s not just a glitch. It’s a warning.
Which brings us to our own experiment.
Our Experiment: Building a Women-Centered Design GPT
CARE set out to build a women-centered design GPT: one trained exclusively on curated data focused on women’s lived realities and financial health. With the gender bias problems in mainstream AI models clearly documented, we wanted to see if a more intentional tool — one designed specifically to meet the unique needs and preferences of women, particularly in the context of financial inclusion — could help. CARE’s Women’s Entrepreneurship initiative piloted a minimum viable product of a women-centered design (WCD) chatbot using Chatbase, an AI-powered chatbot builder. In collaboration with our Strive Women program, here’s how we approached it:
- Dataset curating: We sourced reports, articles and case studies from organizations focused on advancing women’s opportunities, especially in financial inclusion.
- Pitch testing: We presented the concept to trusted practitioners in our community to validate the idea and gather feedback.
- Prototyping: We trained the model exclusively on our curated dataset to ensure it reflected the perspectives and experiences represented in the data (a simplified sketch of this kind of grounding appears after this list).
- User testing: We invited users, a group of practitioners interested in financial services for women entrepreneurs, to try the chatbot and provide feedback on its accuracy, user experience and areas for improvement.
- Iterative improvements: Over three months, we made changes — including adding source transparency and experimenting with prompts that encouraged context-aware responses and explicitly included women and girls.
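Our pilot relied on Chatbase’s built-in document upload rather than custom code, but the general pattern of grounding answers in a curated corpus can be approximated in a few lines. The sketch below is a simplified, hypothetical illustration: it loads a folder of curated excerpts, retrieves the most relevant ones with naive keyword overlap, and assembles a prompt that tells the model to answer only from those sources and to cite them. Folder and file names are assumptions; a production tool would use embedding-based retrieval rather than word overlap.

```python
from pathlib import Path

# Hypothetical folder of curated reports and case studies saved as plain-text excerpts.
CORPUS_DIR = Path("curated_corpus")

def load_corpus(directory: Path) -> dict[str, str]:
    """Read every .txt file in the curated dataset into memory."""
    return {p.name: p.read_text(encoding="utf-8") for p in directory.glob("*.txt")}

def retrieve(corpus: dict[str, str], question: str, top_k: int = 3) -> list[tuple[str, str]]:
    """Naive retrieval: rank documents by how many words they share with the question."""
    query_words = set(question.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda item: len(query_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str, excerpts: list[tuple[str, str]]) -> str:
    """Constrain the model to the curated excerpts and require source attribution."""
    sources = "\n\n".join(f"[{name}]\n{text[:1500]}" for name, text in excerpts)
    return (
        "Answer using ONLY the excerpts below, which focus on women's financial "
        "inclusion. Cite the source name for every claim. If the excerpts do not "
        f"cover the question, say so.\n\n{sources}\n\nQuestion: {question}"
    )

corpus = load_corpus(CORPUS_DIR)
question = "What barriers do women entrepreneurs face in accessing credit?"
print(build_prompt(question, retrieve(corpus, question)))
```

Constraining answers to curated sources is what keeps responses centered on the intended perspectives — though, as we found, a narrow corpus can also limit cultural context.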
What We Learned
Our key measurement criterion was simple: Would our users return to the WCD GPT even after the pilot had ended — ultimately adopting it as their default resource for product design?
Here’s what we found:
- Users’ experience with AI and GPTs ranged widely, affecting how they interacted with the tool and what they expected from it.
- Many users appreciated the chatbot’s conversational, back-and-forth style, especially those who hadn’t used AI chatbots before.
- Users valued the ability to quickly access industry-specific insights tailored to different client segments, confirming the need for accessible, (ideally!) unbiased tools.
- The chatbot struggled with culturally contextual relevance due to its narrow dataset, and it lacked features like document analysis or visual generation, which limited its usefulness for organizations developing products and services for women entrepreneurs.
While this feedback highlighted real value in the concept, most users defaulted to more fully featured, mainstream AI chatbots like ChatGPT — signaling that we didn’t have the sustained engagement needed to move forward with ours.
As we discovered, the reality is that maintaining a narrow, pre-trained model wasn’t the best use of resources, especially as mainstream tools evolved faster than we could keep up. But what we learned in the process matters just as much as what didn’t work. Because sometimes, innovation isn’t about what you build. It’s about knowing when to let it go.
How to Mitigate Bias in Existing GPTs
While our experiment didn’t yield a perfect solution, it validated the need for AI that reflects diverse user experiences and contexts. We found that effectively prompting existing GPTs often resulted in better, more comprehensive responses than building a specialized tool from scratch.
But even when working with a mainstream GPT, you can take practical steps as an AI user to recognize and mitigate bias against women (a short code sketch for applying these tips programmatically follows the list):
- Be explicit in your prompts: Clearly state your expectations for language, perspectives and representation that includes women and girls. Example: “Summarize the barriers women face in accessing digital financial services, and suggest solutions tailored to their lived experience.”
- Push for other perspectives: Prompt AI to consider women and girls in specific scenarios. Example: “Analyze how social expectations might influence women’s uptake of mobile banking in South Asia.”
- Audit and improve women and girls’ representation: Ask AI to identify and revise misleading or biased language in documents. Example: “Review this product brochure and suggest changes to ensure it uses balanced language.”
- Request sources and verify information: Ask the AI to provide sources and independently verify them to ensure accuracy. Example: “Provide sources for your response.”
- Reflect on your own perspective: Consider your assumptions and the language you use in prompts. Instead of: “Describe the decision-making process of a family when choosing a loan product,” try: “Describe the decision-making process for choosing a loan product in households where a woman is the primary financial decision maker. What unique factors might influence her choices?”
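For teams applying these tips programmatically rather than in a chat window, the same checklist can be folded into a reusable system prompt. The sketch below assumes the `openai` Python client and an illustrative model name; the prompt wording is ours and should be adapted to your own context.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A reusable system prompt that bakes the checklist above into every request.
BIAS_AWARE_SYSTEM_PROMPT = (
    "When answering, explicitly consider women and girls alongside other groups, "
    "avoid stereotyped role assignments and gendered adjectives, note where social "
    "norms may shape outcomes differently for women, and list the sources or "
    "assumptions behind each recommendation."
)

def ask(question: str, model: str = "gpt-4o-mini") -> str:
    """Send a user question with the bias-aware system prompt attached."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": BIAS_AWARE_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask(
    "Suggest product features for a mobile savings app aimed at small retailers in South Asia."
))
```

A system prompt like this doesn’t remove the need for the critical review described below; it simply raises the floor.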
And above all: review AI responses critically. No matter how advanced AI becomes, it’s essential to apply your own judgment and experience.
At the end of the day, AI is a tool. It reflects the data it’s trained on, which often mirrors social norms that pigeonhole both women and men in outdated roles. These stereotypes are harmful in different ways: They constrain women’s access to wealth-building guidance and financial inclusion, while also reinforcing rigid expectations for men. While AI can and does perpetuate these patterns, it also has the potential to challenge them and support more equitable financial health and inclusion. Achieving this requires vigilance, intentionality and ongoing human oversight to ensure that AI systems expand, rather than restrict, opportunities for all.
If you’re interested in experimenting with customized GPTs or running lean trials using human-centered design, please reach out. We’d love to collaborate and learn together.
In the meantime, we welcome you to share your best prompts for using AI chatbots to get tailored, accurate and improved insights, so we can continue to learn and improve as a community.
Koheun Lee is the Human-Centered Program Manager for CARE’s Strive Women Program.
Photo credit: Media Lens King