Thursday, September 25, 2025

Hallucinations or Lies

My hallucinations by August Natterer

Since ChatGPT appeared in late 2022, I have been warning that the answers provided by Large Language Models (LLMs; I refuse to call these tools Artificial Intelligence) are unreliable and should be treated with the utmost caution. Often these answers seem plausible and are linguistically well-formed, but they are false. Answers of this type have been called hallucinations.

This is not surprising. It is a logical consequence of the algorithm these programs use, which I described in another post in this blog and simulated with a program of only 18 instructions. The algorithm works by appending, one at a time, words chosen from among the most frequent continuations of the preceding words, as observed in billions of files taken from the Internet. It is evident (just think about it) that this algorithm cannot guarantee that the answers these tools provide are true.
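To make the idea concrete, here is a minimal sketch in Python of a frequency-based next-word generator. It is not the 18-instruction program mentioned above, and it conditions only on the last word (real LLMs condition on a long context), but it illustrates the point: the tiny corpus, the function names and the parameters are all invented for this example.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for the "billions of files taken from the Internet".
corpus = (
    "the expedition sailed north to observe the eclipse "
    "the expedition sailed north to the arctic "
    "the eclipse was total and the expedition returned"
).split()

def build_table(words):
    """Count, for each word, how often each other word follows it."""
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

def generate(table, start, length=10, top_k=2):
    """Extend `start` by repeatedly picking one of the most frequent continuations."""
    text = [start]
    for _ in range(length):
        followers = table.get(text[-1])
        if not followers:
            break
        candidates = [w for w, _ in followers.most_common(top_k)]
        text.append(random.choice(candidates))
    return " ".join(text)

table = build_table(corpus)
print(generate(table, "the"))
# The output looks fluent, but the procedure has no notion of truth:
# it may well describe an expedition that never took place.
```

The output is grammatical-sounding text assembled purely from word-frequency statistics, with nothing in the procedure that checks whether what it says is true.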

I've used these tools several times to solve the following kind of problem: I know the plot of a literary work, a story or a novel, but I can't remember the title or the author (or both), and I want the tool to help me find them. I formulate the question by describing the plot and add the author's name if I know it. When the tool offers me an answer, I investigate it to see if it's true. So far, in 100% of the cases in which I've posed a problem of this type, the answer has been hallucinatory. Or rather, false.

Jules Verne

Let me describe the latest case: I wanted to remember the title of a Jules Verne novel I read years ago, in which an expedition attempts to witness a solar eclipse. I provided this description to GEMINI, Google's LLM, which replied that the novel in question is The Green Ray. Since I knew the answer was false (I remember the plot of The Green Ray well, and there is no eclipse in it), I asked the question again with different phrasing, to see whether GEMINI would give me a different answer. It did: this time it told me the novel in question was The Chase of the Golden Meteor. Since I haven't read this novel by Verne, I knew from the start that the answer could be false, but just in case Verne had written two novels with similar plots, I looked up the description of this novel on Wikipedia, which confirmed that it does not feature a solar eclipse.

I then decided to do my own research. Since Wikipedia lists the titles of all of Verne's works (68 Voyages Extraordinaires and a few others), I eliminated the works whose plots I knew well and those I hadn't read, and from those that remained, I chose the one that seemed most likely. I got it right the first time. The work in question is The Fur Country. But GEMINI's failure was absolute, as has been the failure of all these tools whenever I've asked them a question of this type.

It might be said that my experiences as a user aren't statistically significant. True. But it turns out that OpenAI, the creator of ChatGPT, has conducted an in-depth study on the subject and concluded that hallucinations are mathematically inevitable, not just engineering flaws that could be resolved by improving the programs. See this recent article in COMPUTERWORLD, which complements an older article in the same journal pointing out that, in addition to hallucinating, these tools cheat in various circumstances and refuse to admit they are lying when caught. The title is telling: You thought genAI hallucinations were bad? Things just got so much worse.

News of this kind, which warns of the dangers of using LLM-type tools and blindly believing their answers, is mixed with exaggerated stories loudly announcing that these tools will soon pave the way for general artificial intelligence, that is, machines as intelligent as (or more intelligent than) humans. We experts tend to deny that this will happen. Some go so far as to say that LLM research may even be detrimental to that other goal, which many consider impossible, at least in the near future.

Meanwhile, news arrives that an LLM has just been appointed minister in Albania. It seems that the stupidity and incompetence of today's politicians know no bounds.

Speaking of the Turing Test, Evan Ackerman wrote this in IEEE Spectrum in 2014:

The problem with the Turing Test is that it’s not really a test of whether an artificial intelligence program is capable of thinking: it’s a test of whether an AI program can fool a human. And humans are really, really dumb. We fall for all kinds of tricks that a well-programmed AI can use to convince us that we’re talking to a real person who can think.

Unfortunately, as time goes by, we must agree with him.

The same post in Spanish

Thematic Thread about Natural and Artificial Intelligence: Previous Next

Manuel Alfonseca
