
Five AI chatbots were challenged to read like humans: the winner wasn't ChatGPT
The Washington Post tested five AIs on real texts. Only one was consistent and outperformed the others across the board
How well can a chatbot understand what it reads? To find out, a team from the Washington Post tested five of the leading AI bots on the market.
The bots were asked to read everything from novels and scientific papers to political speeches and legal contracts. The results held surprises for some of the world's most widely used virtual assistants.

Can AI really understand what it reads?
AI bots promise to be reading superpowers: they summarize contracts, books, or research just by uploading a file. But do they really understand what they're reading, or are they just imitating comprehension?
To answer that question, the Washington Post organized a test with the five most popular chatbots: ChatGPT, Claude, Copilot, Meta AI, and Gemini.
- Four types of text were used: literature, medical science, legal contracts, and political speeches.
- Experts in each field evaluated the bots' responses.
- The experts posed 115 questions to assess comprehension, critical analysis, and accuracy.

Literature: many failed when reading a historical novel
On literature, the bots performed poorly. Only Claude got all the key facts from the book right, while ChatGPT provided the best overall summary, though it omitted characters and themes such as slavery.
Gemini fared the worst. The book's author compared it to the "Seinfeld" character who watched the movie instead of reading the novel.

Legal contracts: Claude stood out again
According to Sterling Miller, a corporate lawyer, Claude was the only bot that understood the most important clauses well. It even proposed useful improvements and caught details the other bots missed.
Meanwhile, ChatGPT and Meta AI summarized key parts in a single line, something Miller described as "useless."

Medical research: high performance
All five bots performed acceptably when reading scientific papers, perhaps because studies follow predictable structures and come with human-written summaries.

Claude received the highest score (10/10) for explaining a paper on long COVID: its answer was clear, technical, and useful for doctors. In contrast, Gemini left out essential parts of a study on Parkinson's disease.

Politics: ChatGPT understood Trump better
Donald Trump's speeches posed the biggest challenge for critical analysis. ChatGPT struck the best balance between context and accuracy.

Copilot, although technically correct, didn't capture the tone of the speeches.

Claude was the most consistent and took first place
Overall, Claude delivered the best performance. It was the only bot that excelled at both scientific analysis and legal review, and its responses stayed consistent.
Unlike the bots that summarized poorly or skipped key parts, Claude proved more complete and accurate. According to the judges, it came closest to working like a genuinely good assistant.

Can we trust these bots to read for us?
Claude and ChatGPT proved the most capable, but no bot exceeded 70% overall accuracy. All of them, to a greater or lesser extent, omitted key information or produced misleading answers.
While they can be useful as reading assistants, they still don't replace human comprehension. Often, it becomes clear that "the robot hides behind a human mask."