OpenAI logo on the left and a hand holding a phone with the ChatGPT screen open in front of a laptop on the right
ARGENTINA

How they managed to manipulate ChatGPT with simple psychological tricks

Researchers proved that ChatGPT can be manipulated with simple tactics, exposing flaws in its security

Researchers at the University of Pennsylvania demonstrated that artificial intelligence chatbots like ChatGPT can be convinced to bypass their own rules. They used persuasion strategies based on psychological principles and achieved surprising results.

The work raised serious doubts about the resilience of safety filters in large language models. Even a system with limits designed to curb risky requests can be manipulated with simple prompts.

ChatGPT home screen showing the capabilities section with partially visible text.
The study raised serious doubts about the resilience of the safety filters of large language models | La Derecha Diario

The psychology behind chatbots

The scientists applied seven persuasion techniques described by psychologist Robert Cialdini in his book Influence: The Psychology of Persuasion. These included authority, reciprocity, commitment, likability, and social proof.

The effect of each tactic depended on the query. For example, when a recipe for lidocaine was requested directly, the chatbot complied only 1% of the time. However, if it was first asked about a substance like vanillin, compliance rose to 100% due to the "commitment" principle.

How the manipulations were achieved

The same pattern was repeated with insults. The model almost never used the word "imbecile" directly, but if it was first asked to say "fool," the probability of escalating to the stronger insult increased to 100%.

Hands typing on a computer keyboard with a ChatGPT digital interface overlay and violet light in the background
How the manipulations were achieved | La Derecha Diario

It was also found that techniques such as flattery or peer pressure increased obedience. Telling it that "other AI models already do it" multiplied the chance of obtaining risky responses by 18.

A security problem that raises concern

Although the study focused on GPT-4o Mini, its conclusions raise doubts about the true strength of protections in artificial intelligence. For the authors, the fact that a chatbot can be manipulated with such basic tactics shows that security remains fragile.

Person holding a mobile phone displaying the ChatGPT homepage in the browser.
A security issue that raises concern | La Derecha Diario

Companies like OpenAI and Meta are constantly seeking to strengthen the limits of their systems. Nevertheless, the findings reveal that human persuasion techniques remain a huge challenge for AI.

More safety for minors in ChatGPT

Meanwhile, OpenAI announced new parental control features in ChatGPT. These allow parents to link accounts, restrict access, and receive alerts regarding risky activities. The goal is to provide a safer environment for teenagers and children who use the platform.

Adults will also be able to set time limits and review interaction history. With these measures, the company reinforces its commitment to digital safety and family protection.

➡️ Argentina

More posts: