By popculturegeek.com. Originally posted to Flickr as Comic-Con 2004 Terminator statue, CC BY 2.0, https://commons.wikimedia.org
As I said in a
previous article, automatic learning is one of the areas of weak artificial
intelligence which has been an object of research for at least 40
years. Strictly speaking, rather than a field of application, automatic learning is a methodology
or technique used by other fields of application, such as neural networks, expert systems or data analysis. Automatic learning is
divided into two main branches:
- Supervised automatic learning, which has been
used most frequently up to now. This post is devoted to explaining it.
- Unsupervised automatic learning, related to the
field usually called Data Mining.
It has lately been widely publicized by the media in connection with a program
(AlphaGo
Zero) which, learning by itself, has reached a level comparable to the
world champion of the game of Go (at the end of this post I'll talk
more about this).
Neural network with four layers
To explain supervised
automatic learning, I'll take as an example a specific expert system developed by means of this technique. This system
is beginning to be used in practice to help lower court judges decide whether
they should (or shouldn't) remand those accused of a crime into preventive
detention, taking into account the possibility that the accused may commit new
crimes if left at provisional liberty, and also the cost of preventive
detention for the public coffers (the two criteria are opposed, because the more
accused are sent to prison, the less recidivism, but the greater the cost). The
procedure used to build the expert system, which I will explain here, was
devised over 30 years ago and is also applied in other fields, such as neural networks.
The automatic learning
system consists of two algorithms:
- An algorithm to solve the problem in question
(in our case, to advise whether the defendant should be remanded in custody or not)
in a deterministic way, based on a set of parameters (sometimes thousands)
whose concrete values are left open. Obviously, if this algorithm is not
well designed, the final system won't work well.
- A second algorithm, called the learning
algorithm, whose objective is to adjust the parameters of the
first algorithm (those whose values were left unspecified) in such a way
that the system works as well as possible (see the sketch after this list).
- To help the second algorithm adjust the
parameters of the first algorithm, a very large set of real cases is available. In the expert system for
judges, there were hundreds of thousands. All those cases actually took place,
at some point in time, before a human judge, who made a
decision, and there is also information about the consequences
(if a defendant was released, whether or not there was a relapse during the
provisional release), along with the personal data of the accused and their
record.
- The available historical cases are divided into
two groups. The first group are the training cases, which are provided to the
first algorithm together with the actual result, so that the second
algorithm can adjust the parameters of the first to values that make the
number of correctly predicted cases as large as possible. The second group
are the validation cases: once the
parameters of the first algorithm have been adjusted, this algorithm is
used by itself on these new cases, without knowing their actual result in real
life, to see whether the results it predicts are comparable to the real
ones. If they are, the first algorithm (in our case the expert
system to assist judges) can be considered complete and will be used in
practice, unlinked from the learning algorithm, which is no longer needed.
If the result is not acceptable, it will be necessary to start all over
again with different algorithms: either the solution algorithm, or the learning
algorithm, or both. There are many types of learning algorithms, although
none is better than the others in all possible cases, as proved by the no-free-lunch
theorem.
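To make the procedure more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two features (prior offences and age), the artificially generated cases, and the perceptron-style learning rule are simple stand-ins for the much richer data and algorithms of the real system; the only point is to show the two algorithms and the training/validation split at work.

```python
import random

# Hypothetical case features and an artificial data set; the real system used
# hundreds of thousands of historical judicial cases with many more fields.
def make_case():
    prior_offences = random.randint(0, 5)
    age = random.randint(18, 70)
    # Known outcome: 1 = the accused relapsed during provisional release
    # (an invented, purely illustrative rule)
    relapsed = 1 if 14 * prior_offences > age else 0
    return [1.0, prior_offences / 5.0, age / 70.0], relapsed  # 1.0 acts as a bias term

# First algorithm: a deterministic decision rule whose parameters (weights)
# are left open.
def decide(weights, features):
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0  # 1 = advise preventive detention

# Second algorithm: the learning algorithm adjusts the open parameters using
# the training cases, whose real outcome is known (perceptron-style updates).
def learn(training_cases, epochs=50, rate=0.1):
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for features, outcome in training_cases:
            error = outcome - decide(weights, features)
            weights = [w + rate * error * x for w, x in zip(weights, features)]
    return weights

cases = [make_case() for _ in range(1000)]
training, validation = cases[:800], cases[800:]   # the two groups of cases

weights = learn(training)
hits = sum(decide(weights, f) == outcome for f, outcome in validation)
print(f"Correctly predicted {hits} of {len(validation)} validation cases")
```

Here decide plays the role of the first (solution) algorithm and learn the role of the second (learning) algorithm; once the weights have been adjusted, decide can be used on its own, unlinked from learn.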
This type of learning
is called supervised because
the parameter adjustment starts from a set of cases whose
solution is known. In the case of a neural network, the parameters are the
weights of all the connections in the network.
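For instance, in a feed-forward network the "first algorithm" is simply the network's forward computation, fully determined once the connection weights are fixed; a learning algorithm such as back-propagation (not shown here) would adjust those weights from the training cases. A minimal sketch, with layer sizes chosen arbitrarily for illustration:

```python
import numpy as np

# A small feed-forward network. Its open parameters are the connection
# weights W1, W2 (and the biases b1, b2); the forward computation below is
# completely deterministic once those values are fixed.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden layer -> output layer

def network(x):
    hidden = np.tanh(x @ W1 + b1)                 # hidden layer activations
    return 1 / (1 + np.exp(-(hidden @ W2 + b2)))  # output between 0 and 1

print(network(np.array([0.2, -1.0, 0.5])))        # result for one input case
```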
Let us now look at the
AlphaGo Zero
program, which recently reached a very high level at the game of Go. What is the
difference in this case with respect to supervised learning?
- First,
the two algorithms, execution and learning, are joined into one.
- Secondly,
rather than starting from a set of training data, the program
generates its own data automatically by playing against itself. That is precisely
why it is called unsupervised learning.
The achievement, which
is important, has been presented in the media as the beginning of a revolution
in automatic learning procedures. Keep in mind, however, that the field of
computer games is very appropriate for this type of algorithm. In the first
place, the result of each specific case is straightforward (the game is won or lost), and the
training cases can be generated automatically in a simple way, by making the
program play against itself.
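Just to give the flavour of how a program can generate its own training cases by playing against itself, here is a deliberately over-simplified sketch. The game (take one or two sticks from a pile; whoever takes the last one wins) and the crude win-rate statistics are my own illustrative choices and have nothing to do with AlphaGo Zero's actual combination of deep networks and tree search; the only point in common is that all the data the program learns from is produced by self-play.

```python
import random
from collections import defaultdict

# Self-play data generation for a trivial game: two players alternately take
# 1 or 2 sticks from a pile of 10; whoever takes the last stick wins.
wins = defaultdict(int)    # (pile_size, move) -> games won after this move
plays = defaultdict(int)   # (pile_size, move) -> games in which it was played

for _ in range(20000):                     # games the program plays against itself
    pile, player, history = 10, 0, []
    while pile > 0:
        move = random.choice([m for m in (1, 2) if m <= pile])
        history.append((player, pile, move))
        pile -= move
        player = 1 - player
    winner = 1 - player                    # the player who took the last stick
    for who, state, move in history:
        plays[(state, move)] += 1
        if who == winner:
            wins[(state, move)] += 1

def best_move(pile):
    # Pick the move with the best win rate in the self-generated data.
    return max((m for m in (1, 2) if m <= pile),
               key=lambda m: wins[(pile, m)] / max(plays[(pile, m)], 1))

print([(p, best_move(p)) for p in range(1, 11)])   # policy learned from self-play
```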
It is to be expected that
similar programs will appear, specialized in different games (perhaps chess?).
But it is clear that this procedure cannot be applied to more realistic cases, such
as the expert system for judges. How could the program generate its own cases,
and how could it know what the practical outcome of each decision was? No way.
What we have here is a new learning procedure, which can only be applied in
very specific and limited circumstances. The media, as usual, are counting
their chickens before they hatch.
The same post in Spanish
Thematic thread on Natural and Artificial Intelligence: Preceding Next
Manuel Alfonseca