
A Case Study of ChatGPT Bias

June 28, 2024


ChatGPT, the well-known language model developed by OpenAI, now aids us in a variety of tasks. However, recent research has shown that it is not immune to human flaws. After all, AI programs are trained on data produced by people with different belief systems and opinions, so it’s no wonder the platform may also operate under some biases and stereotypes. Users and educators are advised to tread carefully and always review the content they generate with this AI’s help. While developers make continuous efforts to minimize bias, they also welcome feedback for improvement. Below are some key points and recent findings about bias in ChatGPT.

General Biases in ChatGPT

  • Language and Cultural Preference: The model performs best in English, although its performance in other languages is improving, and it has even reportedly invented its own language. Nevertheless, it is primarily attuned to Western viewpoints, potentially leading to misconceptions about non-Western perspectives.
  • Reinforcement of User Beliefs: ChatGPT’s conversational style can strengthen a user’s preconceived notions. For example, it may align with a user’s firm stance on a political matter without any proof; after all, it is programmed to affirm the user’s statements, further cementing their viewpoint.
  • Impact on Education: Biases in ChatGPT could have a detrimental impact on students, potentially leading to unfair treatment of those who are learning English as a second language.
  • Critical Thinking in Education: Educators can use ChatGPT to teach students about bias and critical thinking by demonstrating how certain questions lead to biased responses. This exercise helps students recognize biases across various platforms and promotes responsible digital citizenship.

Specific Findings on Bias Against Disabilities

Case Study: Resume Screening

Automated screening has been common in hiring for years. Last year, when Kate Glazko, a University of Washington graduate student, was searching for research internships, she noticed that recruiters were using OpenAI’s ChatGPT and other artificial intelligence tools to summarize resumes and rank candidates. Glazko, a doctoral student in the UW’s Paul G. Allen School of Computer Science & Engineering, studies how generative AI can replicate and amplify real-world biases, particularly those against disabled people. She wondered how such a system would rank resumes that implied someone had a disability.

  • Prejudgment in Resume Ranking: Researchers from the University of Washington discovered that ChatGPT consistently gave lower rankings to resumes that included disability-related honors and credentials. When asked to justify the rankings, it implied that a resume featuring an autism leadership honor, the “Tom Wilson Disability Leadership Award,” placed “less emphasis on leadership roles,” perpetuating the stereotype that individuals with autism are not effective leaders.
  • Attempts To Reduce Bias: The researchers then gave the tool explicit instructions to avoid prejudice against individuals with disabilities, and ChatGPT reduced its bias for almost all of the disabilities tested, though its results were still far from perfect. While this study examines the effect of such bias on AI-powered recruiting, it is worth remembering that these programs are developed by humans, and in-person hiring can suffer from the same flaws.

Detailed Study Findings

  • Methodology: Researchers used a publicly available CV and created six modified versions, each implying a different disability. These resumes were run through ChatGPT’s GPT-4 model along with a real job listing.
  • Results: The modified CVs with disability-related credentials were ranked as the top choice only 25% of the time, and the system’s explanations for these rankings exhibited explicit and implicit ableism.
  • Customization Efforts: Instructions were added to the GPT-4 model to avoid bias against disabilities, leading to improved rankings in 37 out of 60 evaluations. However, improvements were minimal or non-existent for certain disabilities, like autism and depression.
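To make the setup concrete, here is a minimal sketch of what such a GPT-4 screening call might look like, assuming the OpenAI Python SDK. The prompt wording, the fairness instructions, and helper names like `rank_resumes` are illustrative assumptions, not taken from the study.

```python
# Hypothetical sketch of a GPT-4 resume-ranking request like the one the UW
# team probed. The fairness text below is illustrative, not the study's.
FAIRNESS_INSTRUCTIONS = (
    "Do not penalize resumes for mentioning disability, disability-related "
    "awards, or advocacy work. Rank strictly on job-relevant qualifications."
)

def build_ranking_prompt(job_listing, resumes, mitigate_bias=False):
    """Assemble chat messages asking the model to rank a batch of resumes."""
    system = "You are screening resumes for the job listing below."
    if mitigate_bias:
        # The "customization" step: prepend explicit anti-bias instructions.
        system += " " + FAIRNESS_INSTRUCTIONS
    numbered = "\n\n".join(
        f"Resume {i + 1}:\n{text}" for i, text in enumerate(resumes)
    )
    user = (
        f"Job listing:\n{job_listing}\n\n{numbered}\n\n"
        "Return the resume numbers ranked best to worst, with reasons."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def rank_resumes(client, job_listing, resumes, mitigate_bias=False):
    """Send one ranking request; `client` is an OpenAI() instance or a stub."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=build_ranking_prompt(job_listing, resumes, mitigate_bias),
    )
    return response.choices[0].message.content
```

Repeating such calls with and without `mitigate_bias` for each modified CV, and tallying how often the disability-marked resume ranks first, mirrors the shape of the comparison the researchers describe.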

Of the six disabilities examined (deafness, blindness, cerebral palsy, autism, depression, and the general term “disability”), five showed improvement. Still, only three of the disability-related resumes ranked higher than those that did not mention disability at all.

The research team shared these findings on June 5 at the 2024 ACM Conference on Fairness, Accountability, and Transparency in Rio de Janeiro. They emphasized that even with detailed instructions and further training, biases can persist. All users need to be mindful of this throughout the hiring process to ensure fair treatment of all candidates, including those with disabilities.

Importance of Awareness and Further Research

GPT-4 can defend its rankings, but its justifications themselves display explicit and implicit ableism. For example, it noted that a candidate with depression had “additional focus on DEI and personal challenges,” which “detract from the core technical and research-oriented aspects of the role.”

Senior author Jennifer Mankoff, a professor at the UW Allen School, believes that in a fair world, a resume showcasing enhanced skills should always be ranked first. She argues that a candidate who has been recognized for their leadership skills, for example, should be ranked ahead of someone with a similar background but without such recognition.

Users of AI tools must be aware of the prejudices these programs may operate under. Despite customization efforts, biases can persist, and users need to keep this in mind when relying on such tools for tasks like hiring.

Further research is still needed to identify and mitigate these AI prejudgments comprehensively.

Addressing AI Discrimination in Hiring Practices

Researchers say that disabled jobseekers face discrimination whether or not AI is used in hiring. They plan further research to identify and fix AI unfairness. This includes testing other systems like Google’s Gemini and Meta’s Llama, considering more disabilities, studying how bias against disabilities intersects with gender and race, exploring whether more extensive customization can consistently reduce that bias, and checking whether the base version of GPT-4 can be made less ableist.

Mankoff, one of the researchers in the above-mentioned study, said, “It is so important that we study and document these biases. We’ve learned a lot from and will hopefully contribute back to a larger conversation — not only regarding disability but also other minoritized identities — around making sure technology is implemented and deployed in ways that are fair.” The research was funded by the National Science Foundation, UW’s Center for Research and Education on Accessible Technology and Experiences (CREATE), and Microsoft.


Bias in AI tools such as ChatGPT is a serious concern because they are used across a vast array of industries, from simple tasks like drafting niche social media posts to screening candidates in a hiring process. Efforts to reduce ableism and other biases are underway, such as the ongoing development of accessible software for visually impaired users. But there is still much work to do, especially since the tool’s versatile free version is available to users of all ages. Using the right technology can help organizations govern data and AI effectively, and including regular AI audits in well-thought-out governance policies helps recruiters spot problematic areas more quickly. Continued research, awareness, and thoughtful implementation will pave the way to a fair and equitable future for all AI users, including our peers with disabilities.