[Image: My hallucinations, by August Natterer]
Since ChatGPT appeared in late 2022, I have been warning that the answers provided by Large Language Models (LLMs; I refuse to call these tools Artificial Intelligence) are unreliable and should be treated with the utmost caution. These answers often seem plausible and are linguistically well formed, but they are false. Answers of this kind have come to be called hallucinations.
This is not surprising. It is a logical consequence of the algorithm these programs use, which I described in another post on this blog and simulated with a program of just 18 instructions. The algorithm builds its answer by appending, one at a time, words drawn from among the most frequent continuations of the preceding words, as observed in billions of files taken from the Internet. It is evident (just think about it) that such an algorithm cannot guarantee that the answers these tools provide are true.
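To make the idea concrete, here is a minimal sketch of that kind of word-by-word generation. It is not my 18-instruction program, and of course not a real LLM: the tiny corpus and the names used are invented just for illustration. Each next word is chosen at random, weighted by how often it followed the previous word in the sample text.

```python
# Minimal sketch of frequency-based next-word generation (illustrative only;
# not the 18-instruction program mentioned above, and not a real LLM).
import random
from collections import defaultdict, Counter

# A tiny, made-up corpus standing in for "billions of files from the Internet".
corpus = ("the green ray is a novel by jules verne and "
          "the fur country is a novel by jules verne").split()

# Count how often each word follows each word (a bigram table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=8):
    """Append words drawn at random, weighted by how often they follow the previous word."""
    words = [start]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        candidates, weights = zip(*options.items())
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)

print(generate("the"))
```

Every sentence such a procedure produces looks locally plausible, because each word really did follow the previous one somewhere in the text; nothing in the procedure checks whether the statement as a whole is true.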
I've used these tools several times to solve the following kind of problem: I know the plot of a literary work, a story or a novel, but I can't remember the title or the author (or both), and I want the tool to help me find them. I formulate the question by describing the plot and add the author's name if I know it. When the tool offers me an answer, I investigate it to see if it's true. So far, in 100% of the cases in which I've posed a problem of this type, the answer has been hallucinatory. Or rather, false.
[Image: Jules Verne]
Let me describe the latest case: I wanted to remember the title of a Jules Verne novel I read years ago, in which an expedition attempts to witness a solar eclipse. I provided this description to GEMINI, Google's LLM, which replied that the novel in question is The Green Ray. I knew this answer was false, because I remember the plot of The Green Ray well and there is no eclipse in it, so I asked the question again with different phrasing, to see if GEMINI would give me a different answer. It did; this time it told me the novel in question was The Chase of the Golden Meteor. Since I had not read this novel by Verne, I suspected from the start that the answer could be false, but, just in case Verne had written two novels with similar plots, I looked up the description of this novel on Wikipedia, which confirmed that it does not feature a solar eclipse.
I then decided to do my own research. Since Wikipedia lists the titles of all of Verne's works (68 Voyages Extraordinaires and a few others), I eliminated the works whose plots I knew well and those I hadn't read, and from those that remained I chose the one that seemed most likely. I got it right the first time. The work in question is The Fur Country. GEMINI's failure, however, was absolute, as has been the failure of all these tools whenever I've asked them a question of this type.
It might be said that my user experiences aren't statistically significant. True. But it turns out that OpenAI, the creator of ChatGPT, has conducted an in-depth study on the subject and concluded that hallucinations are mathematically inevitable, not just engineering flaws that could be fixed by improving the programs. See this recent article in COMPUTERWORLD, which complements this older article in the same journal; the latter points out that, in addition to hallucinating, these tools cheat in various circumstances and refuse to admit they're lying when caught. Its title is telling: You thought genAI hallucinations were bad? Things just got so much worse.
News of this kind, which warns of the dangers of using LLM-type tools and blindly believing their answers, is mixed with exaggerated stories loudly announcing that these tools will soon pave the way for artificial general intelligence, that is, machines as intelligent as humans (or more so). We experts tend to deny that this will happen. Some go so far as to say that LLM research may even be detrimental to that other goal, which for many won't even be possible, at least not in the near future.
Meanwhile, news arrives that an LLM has just been appointed minister in Albania. It seems that the stupidity and incompetence of today's politicians know no bounds.
Speaking about the Turing Test, Evan Ackerman wrote this in 2014 in IEEE Spectrum:

The problem with the Turing Test is that it’s not really a test of whether an artificial intelligence program is capable of thinking: it’s a test of whether an AI program can fool a human. And humans are really, really dumb. We fall for all kinds of tricks that a well-programmed AI can use to convince us that we’re talking to a real person who can think.

Unfortunately, as time goes by, we must agree with him.
Thematic Thread about Natural and Artificial Intelligence: Previous Next
Manuel Alfonseca