Thursday, January 4, 2024

Some problems in Automatic Natural Language Generation

Alan Turing

ChatGPT and similar tools have more than passed the Turing test, for they are capable of fooling many human beings (I don’t know how many, but certainly more than 30%) into believing that there is a mind behind such simple algorithms. But, quoting Evan Ackerman (Senior Editor of IEEE Spectrum):

The problem with the Turing Test is that it’s not really a test of whether an artificial intelligence program is capable of thinking: it’s a test of whether an AI program can fool a human. And humans are really, really dumb.

In a previous post I explained the algorithm used by ChatGPT and similar tools to generate human-looking text. It is very simple. A similar (although longer) explanation can be found in this Wolfram article. As I explained in another post, this algorithm, together with a huge amount of data contained in a neural network that stores information and has no intelligence, is capable of generating texts that look very similar to those written by a human being.

The article by Wolfram mentioned above denies, as I do, that these tools are intelligent. He says that their true usefulness lies in having proved that human languages have unknown properties that make it possible for these tools to generate human-looking texts. Here I disagree with Wolfram. I believe it is not necessary to resort to strange properties of languages: the use of an algorithm as simple as the one we both have explained makes it inevitable that the texts produced emulate those of human origin. I’ll go further: the texts ChatGPT generates are actually human texts cleverly put together.

In a comment on the blog post mentioned above, a reader pointed out that ChatGPT had generated a perfect Latin-to-Spanish translation of Phaedrus’s fable The Fox and the Grapes, better than the version generated by Google Translate. When I consulted an expert, he explained that Google Translate actually translates the text it is given, while ChatGPT, when generating the answer to the question "translate this text from Latin to Spanish", resorts to translations of human origin present in its training data, which probably includes part of the World Wide Web. Of course, the element of randomness in the algorithm makes it possible not just to use a single text, but to jump from one to another when both refer to the same thing. Thus, in the translation of the fable, I have found on the Web the possible origin of a couple of its phrases.

We know that the algorithm chooses, from the information used to train the neural network, a word that can follow those it has already generated. If what has been accumulated comes from a specific text, there is a high probability that the next word will also belong to the same text. If this text is under copyright, the tool’s answer will be recognizable and will violate those rights. This has already happened. In fact, many copyrighted texts are known to have been used to train ChatGPT’s neural network, and some authors (and the New York Times) have filed lawsuits against OpenAI (the company behind ChatGPT) for copyright infringement.
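To make this concrete, here is a minimal sketch in Python of the kind of next-word procedure described above, using a toy table of word successors in place of ChatGPT’s huge neural network. The corpus and all names are illustrative inventions of mine, not OpenAI’s actual implementation:

    import random
    from collections import defaultdict

    # Toy training corpus (purely illustrative).
    corpus = "the fox saw the grapes and the fox wanted the grapes".split()

    # Record which words follow each word in the training data.
    followers = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        followers[current_word].append(next_word)

    def generate(start, length=8):
        words = [start]
        for _ in range(length):
            candidates = followers.get(words[-1])
            if not candidates:
                break  # no known continuation in the training data
            # The element of randomness: pick one plausible continuation,
            # which may make the output jump from one source text to another.
            words.append(random.choice(candidates))
        return " ".join(words)

    print(generate("the"))

Because every generated word comes from some training text, a long enough run of choices drawn from a single source simply reproduces a stretch of that source, which is how recognizable copyrighted passages can surface in the output.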

Given this situation, Business Insider reports that OpenAI, together with Microsoft and Google, which also have tools of this type, is trying to shift the blame to end users, rather than the parent company or the programmers, when generative AI tools reproduce copyrighted material and the answer is published.

If this were to happen, anyone who used one of these tools to generate text and wanted to make it public would first have to investigate whether any of its parts infringe copyright; otherwise, they would be exposed to litigation. This could result in fewer people daring to use ChatGPT and its kin.

On the other hand, the announcement of ChatGPT caused alarm in some sectors, such as among teachers, who saw a threat to their way of grading their students: if they gave them an assignment, there was the possibility that the response had been prepared by ChatGPT. In January 2023, OpenAI made publicly available a tool that supposedly distinguished whether a text was of human origin or had been artificially generated. The tool was withdrawn a few months later, because it was about as reliable as flipping a coin.

It seems to me that this has an explanation: since the texts generated by these tools are actually human texts that have been manipulated, it is difficult for other tools to discover their origin. Modern detectors focus more on style. For example, if the user makes a correction to a ChatGPT response, the next answer always begins by apologizing and agreeing with the user. This, and other stylistic properties, can be used to detect the origin of some of the texts.
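As an illustration, a style-based check of this kind can be as simple as looking for stock opening phrases. The marker phrases below are my own guesses, not the rules used by any real detector:

    # Illustrative marker phrases; real detectors use far more
    # sophisticated stylistic features than this naive check.
    STOCK_OPENINGS = (
        "i apologize",
        "you are right",
        "as an ai language model",
    )

    def looks_machine_generated(text):
        # Naive heuristic: flag texts that open with a stock phrase.
        return text.strip().lower().startswith(STOCK_OPENINGS)

    print(looks_machine_generated("I apologize for the confusion..."))  # True
    print(looks_machine_generated("The fox saw a bunch of grapes."))    # False

A heuristic this crude will obviously misfire on human texts that happen to open with the same phrases.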

But they are not fully reliable, for I have seen a case where one of these tools, given a text that began with a typical ChatGPT phrase and continued with a paragraph extracted from Wikipedia, concluded that the first part was of human origin, while the Wikipedia paragraph was artificial. In other words, the conclusion was the opposite of the truth. I have observed that texts from Wikipedia tend to be considered artificial by these tools, perhaps because ChatGPT makes extensive use of them.

The same post in Spanish

Thematic Thread about Natural and Artificial Intelligence: Previous Next

Manuel Alfonseca
