Polls and opinion surveys often predict results that never happen. Is there a
scientific reason that can explain it? I think so. The problem could be that the
mathematical theories behind the polls are misapplied.
One branch of statistics is called sampling theory. It was invented to solve the
problem of estimating whether the products of a factory are well made or
defective, without having to analyze them one by one, which would be too costly.
Let us say, for example, that a factory produces one million screws a day. In
theory they should be checked one by one, but since that is impossible, only a
part of them is analyzed. Which part? This is the question that sampling theory
tries to answer.
Suppose we analyze just 2000 screws and find that one of them is defective
(0.05%). Can we extend this result to the million screws and assert that the
whole population contains approximately 500 defective screws?
There is a theorem of sampling theory that computes the confidence we can have
in the assertion that the result of the sample applies to the whole population.
Interestingly, if certain conditions are met, with a sample of 2000
“individuals,” regardless of the population size, we can have 95% confidence
that the results of the analysis can be extended to the population, within a
margin of error of roughly two percentage points. In other words, if we analyze
2000 screws, we can have 95% confidence that the result will apply to the
entire set of screws, regardless of whether there are one hundred thousand, one
million or ten million screws.
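To see why the size of the population hardly matters, here is a minimal sketch in Python. It assumes the usual normal approximation for a proportion, the worst case p = 0.5, and z = 1.96 for 95% confidence. The function name margin_of_error and the finite population correction are illustrative choices, not something taken from any particular poll.

```python
import math

def margin_of_error(n, p=0.5, population=None, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: only matters when the sample
        # is a sizeable fraction of the population.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error

for population in (100_000, 1_000_000, 10_000_000):
    moe = margin_of_error(2000, population=population)
    print(f"N = {population:>10,}: +/- {100 * moe:.2f} percentage points")
```

For all three population sizes the margin comes out at about 2.2 percentage points, which is why the sample size, and not the population size, dominates the result.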
Electoral polls often apply the theorems of sampling theory without due
consideration. If we look at the technical data that come with these surveys,
we will see that they often say things like this:
Size of the population surveyed: 2000 people.
Confidence coefficient: 95%.
But let us look again at the phrase "if certain conditions are met" two
paragraphs above. What are the conditions that must be met in order to apply
the theorem? Essentially there are two:
• The population must be uniform.
• The sample must be meaningful.
That the population is uniform means that all the screws must be equivalent in
principle, and that different kinds are not mixed, such as large screws with
small screws.
That the sample must be meaningful means that, before extracting the sample, we
must mix the million screws well; otherwise we could take a sample formed
exclusively by screws produced by one particular machine that has a problem, or
by a perfect machine, while none of those produced by other machines would be
analyzed. In such a case, the results of the analysis could not be extended to
the total population with the same confidence.
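The effect of not mixing the screws can be illustrated with a small simulation. This is only a sketch under invented assumptions: a hypothetical factory with ten machines producing 100,000 screws each, one of which is faulty (5% defects) while the rest produce 0.05% defects. None of these figures come from real data.

```python
import random

random.seed(0)  # fixed seed so that the run is repeatable

# Hypothetical factory: machine 0 is faulty, the other nine are not.
DEFECT_RATE = [0.05] + [0.0005] * 9
SCREWS_PER_MACHINE = 100_000

# Each screw is a pair (machine, is_defective), stored in production
# order: first all the screws of machine 0, then machine 1, and so on.
population = [
    (machine, random.random() < rate)
    for machine, rate in enumerate(DEFECT_RATE)
    for _ in range(SCREWS_PER_MACHINE)
]

def defect_share(screws):
    """Fraction of defective screws in a list of (machine, is_defective) pairs."""
    return sum(defective for _, defective in screws) / len(screws)

true_share = defect_share(population)
mixed_sample = random.sample(population, 2000)  # screws mixed well before sampling
unmixed_sample = population[:2000]              # first 2000 screws: all from machine 0

print(f"whole population: {100 * true_share:.2f}% defective")
print(f"mixed sample:     {100 * defect_share(mixed_sample):.2f}% defective")
print(f"unmixed sample:   {100 * defect_share(unmixed_sample):.2f}% defective")
```

In a run like this, the well-mixed sample should land close to the true share of about 0.5%, while the unmixed sample reports something near the 5% of the faulty machine, even though both samples contain 2000 screws.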
What happens when the theorem is applied to a human population to predict the
outcome of an election?
- The most serious problem is that the population is not uniform. We know
very well that the votes of some people are worth much more than those of
others. In the U.S. elections, for instance, the constituency is the state.
Although states with a large population, such as New York or California, may
elect more representatives, a candidate needs more votes to be elected there
than in states with a smaller population, such as Oklahoma.
- Whether the sample is meaningful depends on the survey being well designed.
For instance, the respondents to a poll should be chosen from all states in
proportion to their populations. But that means that, in a sample of 2000
people, there will be very few from Oklahoma. Can the result of the
election in that state be predicted with a 95% confidence level from such a
small sample? The simple and straightforward answer is that it cannot.
- There is an additional problem: people are not screws. When a screw is
analyzed, it cannot lie; we can trust that the properties we detect
are real, unless we are using defective instruments to measure them.
People, on the other hand, can lie, or they can refuse to say whom they are
going to vote for. Pollsters take this into account, and apply corrections
to estimate the possible vote of those who do not want to give their
opinion. But can it be held that the degree of confidence is still the
same as stated by the theorem? Again, the simple answer is no.
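The second point above, about the few respondents from Oklahoma, can be made concrete with a rough calculation. The sketch below assumes round population figures (about 4 million for Oklahoma and about 330 million for the whole country), the normal approximation with the worst case p = 0.5, and z = 1.96 for 95% confidence. All names and figures are illustrative.

```python
import math

SAMPLE_SIZE = 2000
# Assumed round figures, used only for illustration.
US_POPULATION = 330_000_000
OKLAHOMA_POPULATION = 4_000_000

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Respondents from Oklahoma if the sample is proportional to population.
oklahoma_respondents = round(SAMPLE_SIZE * OKLAHOMA_POPULATION / US_POPULATION)

print(f"Oklahoma respondents: {oklahoma_respondents}")
print(f"national margin:  +/- {100 * margin_of_error(SAMPLE_SIZE):.1f} points")
print(f"Oklahoma margin:  +/- {100 * margin_of_error(oklahoma_respondents):.1f} points")
```

Under these assumptions only about two dozen respondents come from Oklahoma, and the margin of error for that state alone grows to about 20 percentage points, compared with about 2 points for the sample as a whole.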
Manuel Alfonseca
Happy Christmas and New Year
We'll meet again in January