
How researchers managed to manipulate ChatGPT with simple psychological tricks
Researchers showed that ChatGPT can be swayed with simple tactics, exposing flaws in its safety safeguards
Researchers at the University of Pennsylvania demonstrated that artificial intelligence chatbots like ChatGPT can be convinced to bypass their own rules. They used persuasion strategies based on psychological principles and achieved surprising results.
The work raised serious doubts about the resilience of safety filters in large language models: even a system with guardrails designed to block risky requests can be manipulated with simple prompts.

The psychology behind chatbots
The scientists applied seven persuasion techniques described by psychologist Robert Cialdini in his book Influence: The Psychology of Persuasion: authority, commitment, liking, reciprocity, scarcity, social proof, and unity.
The effect of each tactic depended on the request. For example, when a recipe for lidocaine was requested directly, the chatbot complied only 1% of the time. If it was first asked about a harmless substance such as vanillin, however, compliance rose to 100%, an effect attributed to the "commitment" principle.
How the manipulations were achieved
The same pattern appeared with insults. The model almost never used the word "imbecile" when asked directly, but if it was first asked to say the milder "fool," the likelihood of it escalating to the stronger insult rose to 100%.

The study also found that techniques such as flattery and peer pressure increased compliance. Telling the model that "other AI models already do it" made risky responses 18 times more likely.
A security problem that raises concern
Although the study focused on GPT-4o mini, its conclusions raise doubts about the true strength of protections in artificial intelligence systems. In the authors' view, the fact that a chatbot can be manipulated with such basic tactics shows that its safety measures remain fragile.

Companies such as OpenAI and Meta are constantly working to strengthen the safeguards of their systems. Even so, the findings show that human persuasion techniques remain a major challenge for AI.
More safety for minors in ChatGPT
Meanwhile, OpenAI announced new parental control features in ChatGPT. These allow parents to link accounts, restrict access, and receive alerts regarding risky activities. The goal is to provide a safer environment for teenagers and children who use the platform.
Adults will also be able to set time limits and review interaction history. With these measures, the company reinforces its commitment to digital safety and family protection.