Wednesday, October 26, 2022

Information on the Internet is ephemeral

Manuel Calvo Hernando
I'll start with a personal anecdote:

  • At the end of the 1980s, I became a member of the Spanish Association of Scientific Journalism, which had been created in the early 1970s by the famous Spanish science popularizer Manuel Calvo Hernando, whose articles in a major newspaper I had been following since the 1960s. By then, I was writing many science popularization articles, which were published by another Spanish newspaper, La Vanguardia. Calvo Hernando was delighted to receive me at the Association.
  • Around the year 2000, this association changed its name to the Spanish Association for Scientific Communication (AECC in Spanish). These initials happened to be the same as those of the Spanish Association Against Cancer. Therefore, the acronym has recently been changed to AEC2.
  • In May 2008 I began to contribute to the AECC blog, which was organized by the journalism professor Juan Carlos Nieto. From then until June 2019, I published 121 science popularization posts on this blog (one post per month), several of which were among the blog's most visited posts for a decade.
  • Some of my posts sparked the longest and most passionate debates, and the greatest number of comments, in the blog's history. After the discussion that followed one of them, which dealt with abortion, a member of the board of directors tried to have me expelled from the Association. He did not succeed, because the president at the time (Antonio Calvo Roy, the founder's son) disagreed, although I received an official message enjoining me to put an end to the discussion of that post.
  • In April 2020, after thirty years of active participation, I decided to leave the AEC2. Immediately, all the posts I had written for its blog disappeared, along with all their comments. No trace was left. Fortunately, almost all of them had also been published on my own blog, so they can still be found. My website contains a list of all the posts published on both blogs, and although the links to my posts on the AEC2 blog are broken, I have decided to keep them there as a historical record of their publication.

Information on the World Wide Web is ephemeral. This is a well-known fact. Sometimes documents disappear for compelling reasons, such as possible copyright infringement. This has happened with several scientific articles, especially famous ones, which various universities made public but later had to withdraw because the journals where the articles were originally published protested, as they intend to keep selling electronic copies many decades after publication. A website may also be attacked and lose information, or it may simply disappear.

During the first years of this century, Internet links appeared in many books, usually in footnotes. Later, as many of those links stopped working after some time, it became fashionable to add the following clarification:

Link checked on xx/xx/20xx.

This means that the link is not guaranteed to work after the specified date.

For this reason, it may be inadvisable to include Internet links in printed books, or hyperlinks in electronic publications, since they are unlikely to keep working forever. Sometimes the document can be found at another address (as with my posts for the AEC2 blog), but it is usually more convenient to locate it, if it is still available, with a good search engine.

A figure frequently quoted is the global flow of data over the Internet, which has been growing enormously since the beginning. Thus, the first paragraph of the document Economy of data and artificial intelligence, published by the Spanish Ministry of Economic Affairs and Digital Transformation and dated November 2020, says this:

The volume of data generated in the world in 2018 is estimated at 33 zettabytes. The predicted estimation for 2025 is 175 zettabytes.

where one zettabyte equals 10²¹ bytes. But that is the total volume of data exchanged, in which every transaction counts: each time someone visits a web page, the corresponding flow of data is counted. Most of these data, however, are not stored by anyone; they are transmitted, read and lost. They are ephemeral. I don't know whether this value is useful, except to estimate the total energy consumed by the Internet, which can also be deduced in other ways. In the parallel case of the human nervous system, I have never seen anyone bother to estimate how many nerve impulses are produced per unit of time. If that value were available, I can't see how it could be useful.
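
As a back-of-the-envelope check, the two quoted estimates (33 zettabytes in 2018, 175 predicted for 2025) imply a compound annual growth rate that can be computed directly. The sketch below assumes constant yearly growth between the two dates, which the Ministry document does not claim; the function name `cagr` is just an illustrative label.

```python
# Implied compound annual growth rate between two estimates.
# Assumption: growth is constant from year to year (a simplification).

ZETTABYTE = 10**21  # bytes


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted from the Ministry document: 33 ZB (2018), 175 ZB (2025).
rate = cagr(33 * ZETTABYTE, 175 * ZETTABYTE, 2025 - 2018)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption, the volume would have to grow by roughly 27% every year, that is, it would multiply by about five in seven years.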

A different question is how much data is stored on the WWW. By 2021 there were 1.88 billion websites. From that number, the total amount of information could be deduced by statistically estimating the average amount stored per website. A decade ago, the total had reached one exabyte (10¹⁸ bytes); today it may be about one zettabyte (10²¹ bytes). On the other hand, almost 90% of websites may be inactive (the figures vary depending on the source), so the information actually available may not be that great.
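
Taking these rough figures at face value, a quick calculation shows what they imply: the average amount stored per website, and the yearly growth factor needed to go from one exabyte to one zettabyte in ten years. This is only an order-of-magnitude sketch; "about one zettabyte" and "a decade" are the post's approximations, not measured values.

```python
# Order-of-magnitude estimates derived from the post's approximate figures.

GIGABYTE = 10**9    # bytes
EXABYTE = 10**18    # bytes
ZETTABYTE = 10**21  # bytes

websites = 1.88e9           # websites in 2021
total_stored = ZETTABYTE    # "may be about one zettabyte" today
stored_decade_ago = EXABYTE # "had reached one exabyte" a decade ago

# Average amount of stored data per website, if spread evenly.
average_per_site = total_stored / websites
print(f"Average per website: about {average_per_site / GIGABYTE:.0f} GB")

# Constant yearly factor that multiplies the total by 1000 in 10 years.
yearly_factor = (total_stored / stored_decade_ago) ** (1 / 10)
print(f"Implied yearly growth factor: {yearly_factor:.2f}")
```

The average comes to roughly 530 GB per website, and the growth from exabytes to zettabytes amounts to the stored total roughly doubling every year.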

Manuel Alfonseca
