What do chatbots, voice assistants, and predictive text have in common? They all use computer programs called language models. Large language models are new kinds of models that can only be built using supercomputers. They work so well that it can be hard to tell if something was written by a person or by a computer! 

We wanted to understand how a large language model called GPT-3 works. But we wanted to know more than whether GPT-3 could answer questions correctly. We wanted to know how and why it came up with its answers. So we treated GPT-3 like a participant in a psychology experiment. Our results showed that GPT-3 gets a lot of questions right. But we also learned that GPT-3 gets confused very easily. And it doesn’t search for new information as well as people do. Knowing how and why large language models come up with wrong answers helps us figure out how to make even better versions in the future.

About this article

Publication date
November 2023

Check out the blackboard video version of this article.
