
Five artificial intelligences are challenged to read like humans: the winner was not ChatGPT

The Washington Post tested five AIs with real texts. Only one was consistent and outperformed the others in all areas

How well can a chatbot understand what it reads? To find out, a team from the Washington Post tested five of the leading AI bots on the market.

They analyzed everything from novels and scientific papers to political speeches and legal contracts. The results brought surprises among the world's most widely used virtual assistants.


Can AI really understand what it reads?

AI bots promise to be reading superpowers: they summarize contracts, books, or research just by uploading a file. But do they really understand what they're reading, or are they just imitating comprehension?

To answer that question, the Washington Post organized a test with the five most popular chatbots: ChatGPT, Claude, Copilot, Meta AI, and Gemini.

  • Four types of text were used: literature, medical science, legal contracts, and political speeches.
  • The bots' responses were evaluated by experts in each field.
  • The experts formulated 115 questions to assess comprehension, critical analysis, and accuracy.

Literature: many failed when reading a historical novel

In the literary area, the bots performed poorly. Only Claude got all the key facts from the book right, while ChatGPT provided the best overall summary, although it omitted characters and themes such as slavery.

Gemini was the worst. The book's author compared it to the "Seinfeld" character who watched the movie instead of reading the novel.


Legal contracts: Claude stood out again

According to Sterling Miller, a corporate lawyer, Claude was the only one that understood the most important clauses well. It even proposed useful improvements and caught details the other bots missed.

Meanwhile, ChatGPT and Meta AI summarized key parts in a single line, something Miller described as "useless."

Medical research: high performance

All five bots showed an acceptable level when reading scientific papers, perhaps because studies have predictable structures and human-written summaries.


Claude received the highest score (10/10) for explaining a paper on long COVID. It was clear, technical, and useful for doctors. In contrast, Gemini left out essential parts of the study on Parkinson's.

Politics: ChatGPT understood Trump better

Donald Trump's speeches were the biggest challenge in terms of critical analysis. ChatGPT achieved the best balance between context and accuracy.


Copilot, although technically correct, didn't capture the tone of the speeches.

Claude was the most consistent and took first place

Overall, Claude achieved the best performance. It was the only one that stood out in both scientific analysis and legal writing, and it maintained consistent responses.

Unlike the other bots, which summarized poorly or skipped key passages, Claude proved more complete and accurate. According to the judges, it came closest to being a genuinely useful assistant.

Overall scores ranked Claude first, followed by ChatGPT, Gemini, Copilot, and Meta AI.

Can we trust these bots to read for us?

Claude and ChatGPT proved to be the most capable, but no bot exceeded 70% overall accuracy. All of them, to a greater or lesser extent, omitted key data or produced misleading answers.

While they can be useful as reading assistants, they still don't replace human comprehension. Many times, it's clear that "the robot hides behind a human mask."
