The Law of Large Numbers & Central Limit Theorem:
A Polling Simulation
TruthIsAll
WHO SHOULD READ THIS?
It's for everyone who voted in 2004 or plans to vote in 2006.
It's for those who say: "Math was my worst subject in high school".
If you've ever placed a bet at the casino or race track,
or played the lottery, you already know the basics.
It's about probability.
It's about common sense.
It's not all that complicated.
It's for individuals who have taken algebra, probability and
statistics and want to see how they apply to election polling.
It's for graduates with degrees in mathematics or political science,
MBAs, etc. who may or may not be familiar with simulation concepts.
It's for Excel spreadsheet users who enjoy creating math models.
Simulation is a powerful tool for analyzing uncertainty.
Like coin flipping and election polling.
It's for writers, bloggers and politicians who seek the truth:
Robert Koehler, Brad from BradBlog, John Conyers, Barbara Boxer,
Mark Miller, Fitrakis, Wasserman, USCV, Dopp, Freeman, Baiman, Simon,
Scoop's althecat, Krugman, Keith Olbermann, Mike Malloy, Randi Rhodes,
Stephanie Miller, etc.
It's for Netizens who frequent Discussion Forums.
It's for those in the Media who are still waiting for editor approval
to discuss documented incidents of vote spoilage, vote switching and
vote suppression in recent elections and which are confirmed by
impossible pre-election and exit poll deviations from the recorded vote.
It's for naysayers who promote faith-based hypotheticals in their
unrelenting attempts to debunk the accuracy of the pre-election
and exit polls.
People forget Selection 2000. Gore won the popular vote by 540,000.
But Bush won the election by a single vote.
SCOTUS voted along party lines: Bush 5, Gore 4.
That stopped the Florida recount in its tracks.
Gore won Florida. Why did they do it?
And why did the "liberal" media say he lost?
But Gore voters did not forget 2000.
So in 2004, they came out to vote in droves.
Yet the naysayers claim Gore voters forgot that they voted for him
and told the exit pollsters that they voted for Bush in 2000.
It's the famous "false recall" hypothetical.
The naysayers were forced to use it when they could not come up
with a plausible explanation for the impossible weightings of
Bush and Gore voter turnout in the Final National Exit poll.
Put on the defoggers.
We had enough disinformation.
We had enough obfuscation.
Now we will let the sunshine in.
This is a review of the basics.
________________________________________________________________________
A COIN-FLIP EXPERIMENT
Consider an experiment:
Flip a fair coin 10 times.
Calculate the percentage of heads.
Write it down.
Increase the number of flips to 30.
Calculate the new cumulative percentage of heads.
Write it down.
Keep increasing the number of flips...
Write down the percentage for 50.
Then do it for 80.
Stop at 100.
That's our final coin flip sample-size.
When you're all done, check the percentages.
Is the sequence converging to 50%?
That’s the true population mean (average).
That's the Law of Large Numbers.
The coin-flip is easily simulated in Excel.
Likewise, in the polling simulations which follow,
we will analyze the results of polling experiments
over a range of sample sizes (numbers of trials).
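For readers without Excel, here is a minimal Python sketch of the same coin-flip experiment. The function name and the fixed seed are illustrative assumptions, and the checkpoints beyond 100 flips are an addition to make the convergence easier to see.

```python
import random

def running_heads_percentages(checkpoints, seed=None):
    """Flip a fair coin up to max(checkpoints) times and return the
    cumulative percentage of heads at each checkpoint."""
    rng = random.Random(seed)
    results = {}
    heads = 0
    flips = 0
    for target in sorted(checkpoints):
        while flips < target:
            heads += rng.random() < 0.5  # True counts as 1 head
            flips += 1
        results[target] = 100.0 * heads / flips
    return results

if __name__ == "__main__":
    # 10 to 100 are the checkpoints from the experiment; 1000 and 10000 are extra.
    checkpoints = [10, 30, 50, 80, 100, 1000, 10000]
    for n, pct in running_heads_percentages(checkpoints, seed=1).items():
        print(f"{n:>6} flips: {pct:5.1f}% heads")
```

Any single run can wander at the small checkpoints. The percentages tend to settle near 50% only as the number of flips grows.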
_____________________________________________________
THE POLLING CONTROVERSY
Naysayers have a problem with polls.
Especially when a Bush is running.
Regardless of how many polls or how large the samples,
the results are never good enough for them.
They prefer to cite their two famous, unproven hypotheticals:
Bush non-responders (rBr) and Gore voter memory lapse ("false recall").
How do pollsters handle non-responders?
Simple.
They just... increase the sample-size!
Furthermore, statistical studies indicate that there is no
discernible correlation between non-response rates and survey results.
How do pollsters handle false recall?
Simple.
They know that in a large sample, forgetfulness on the part
of Gore and Bush voters... will cancel each other out!
There's no evidence that Gore voters forget any more than Bush voters.
On the contrary.
If someone you knew robbed you in broad daylight,
would you forget who it was four years later?
Gore was robbed in 2000.
They claim that polling bias favored Kerry
in BOTH the pre-election AND exit polls.
They offer no evidence to back up these claims.
In fact, National Exit Poll data shows a pro-Bush bias.
They maintain that the polls are not random-samples.
Especially when Bush is involved.
_____________________________________________________
THE MARGIN OF ERROR (MOE)
Naysayers ignore the fact that each poll has a Margin of Error (MoE).
Are we to ignore the MoE provided by a professional pollster?
The MoE is the half-width of the interval around the polling sample mean
which has a 95% confidence level (probability) of containing
the TRUE Population Mean.
Here is an example:
Assume a poll with a 2% MoE and Kerry is leading Bush by 52-48%.
Then there is a 95% probability that Kerry's TRUE vote is in the range
from 50% to 54% {52-MoE, 52+MoE}.
Furthermore, the probability is 97.5% that Kerry's vote will exceed 50%.
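The 97.5% figure follows if we treat the MoE as 1.96 standard deviations of a normal distribution. Here is a small Python sketch of that check, assuming Python 3.8 or later for statistics.NormalDist. The function name is illustrative.

```python
from statistics import NormalDist

def prob_share_exceeds(poll_share, moe, threshold=0.50):
    """Probability that the true share exceeds `threshold`, treating the
    poll share as normal with standard deviation moe / 1.96."""
    sigma = moe / 1.96
    return 1.0 - NormalDist(mu=poll_share, sigma=sigma).cdf(threshold)

if __name__ == "__main__":
    # Kerry at 52% in a poll with a 2% MoE, as in the example above.
    p = prob_share_exceeds(0.52, 0.02)
    print(f"P(Kerry's true share > 50%) = {p:.1%}")
```

Only the lower 2.5% tail falls below 50%, which is why the one-sided probability is 97.5% and not 95%.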
Here is the standard formula that ALL pollsters use to calculate MoE:
MoE = 1.96 * sqrt(p*(1-p)/n) * (1+CF)
where
n is the sample size.
p and 1-p are the 2-party vote shares.
CF is an exit poll "cluster effect" factor (see the example below).
The MoE decreases as the sample-size (n) increases.
The poll becomes more accurate as we take more samples.
It's the Law of Large Numbers again.
Makes sense, right?
Remember the coin flips?
The next result is not so obvious:
For a given sample size (n), the MoE is at its maximum value
when p = .50 (the two candidates are tied).
To put it another way:
The more one-sided the poll, the smaller the MoE.
In the 50/50 case, the formula can be simplified:
MoE = 1.96 * .5/sqrt(n) =.98/sqrt(n)
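To see how the MoE shrinks as a poll becomes more one-sided, the formula can be sketched in Python. The function name and the sample size of 1000 are illustrative assumptions, not part of the spreadsheet.

```python
import math

def margin_of_error(p, n, cluster_factor=0.0):
    """95% margin of error: 1.96 * sqrt(p*(1-p)/n) * (1 + CF)."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n) * (1.0 + cluster_factor)

if __name__ == "__main__":
    n = 1000  # illustrative sample size
    for p in (0.50, 0.55, 0.60, 0.70, 0.90):
        print(f"p = {p:.2f}: MoE = {margin_of_error(p, n):.2%}")
    # The 50/50 shortcut agrees with the full formula at p = 0.50.
    print(f"0.98/sqrt(n) = {0.98 / math.sqrt(n):.2%}")
```

The effect is small near a tie. It takes a lopsided poll to move the MoE much.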
Let's calculate the MoE for the 12:22am National Exit poll.
n = 13047 sampled respondents
p = Kerry's true 2-party vote share = .515
1-p = Bush's vote share = .485
MoE = 1.96 * sqrt(.515*.485/13047) = .0086 = 0.86%
Adding a 30% exit poll cluster effect:
MoE = 1.30*0.86% = 1.12%
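The same arithmetic can be checked in a few lines of Python. This is just a sketch of the calculation above. The variable names are illustrative.

```python
import math

n = 13047       # respondents in the 12:22am National Exit Poll
p = 0.515       # Kerry's 2-party vote share
cluster = 0.30  # assumed exit poll cluster effect

base_moe = 1.96 * math.sqrt(p * (1 - p) / n)
adjusted_moe = base_moe * (1 + cluster)

print(f"MoE without cluster effect:  {base_moe:.2%}")
print(f"MoE with 30% cluster effect: {adjusted_moe:.2%}")
```

Carrying full precision through the multiplication gives roughly 1.11%. The 1.12% shown above comes from rounding the base MoE to 0.86% first. The difference is immaterial.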
The cluster effect is highly controversial.
We can only make a rough estimate of its impact on MoE.
The higher the cluster effect, the larger the MoE.
But cluster is only a factor in exit polls.
There is no MoE adjustment in pre-election or approval polls.
Why would a polling firm include the MoE if the poll was
not designed to be an effective random sample?
Pollsters use proven methodologies, such as cluster sampling,
stratified sampling, etc. to attain a near-perfect random sample.
________________________________________________________________
THE MATHEMATICAL FOUNDATION
This model demonstrates the Law of Large Numbers (LLN).
LLN is the foundation and bedrock of statistical analysis.
The model illustrates LLN through a simulation of polling samples.
In a statistical context, LLN states that the mean (average) of a
random sample taken from a large population is likely
to be very close to the (true) mean of the population.
Start of math jargon alert...
In probability theory, several laws of large numbers say that
the mean (average) of a sequence of random variables with
a common distribution converges to their common mean as
the size of the sequence approaches infinity.
The Central Limit Theorem (CLT) is another famous result:
The sample means (averages) of an independent series of
random samples (i.e. polls) taken from the same population
will tend to be normally distributed (form the bell curve)
as the size of each sample increases.
This holds for ALL practical statistical distributions
(any distribution with a finite variance).
End of math jargon alert....
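One way to see the CLT at work is to simulate many independent polls and look at how their sample means spread out. The Python sketch below assumes 2000 polls of 1000 respondents each and a true share of 51.5%. These values and all function names are illustrative.

```python
import random
import statistics

def simulate_poll_means(true_share, sample_size, num_polls, seed=None):
    """Simulate num_polls independent polls; return each poll's sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_polls):
        votes = sum(rng.random() < true_share for _ in range(sample_size))
        means.append(votes / sample_size)
    return means

def share_within_moe(means, true_share, sample_size):
    """Fraction of polls landing within 1.96 standard errors of the true share."""
    std_error = (true_share * (1 - true_share) / sample_size) ** 0.5
    hits = sum(abs(m - true_share) <= 1.96 * std_error for m in means)
    return hits / len(means)

if __name__ == "__main__":
    true_share, n = 0.515, 1000  # illustrative values
    means = simulate_poll_means(true_share, n, num_polls=2000, seed=42)
    theory = (true_share * (1 - true_share) / n) ** 0.5
    print(f"mean of poll means:     {statistics.mean(means):.4f}")
    print(f"std dev of poll means:  {statistics.stdev(means):.4f}")
    print(f"theoretical std error:  {theory:.4f}")
    print(f"polls within the MoE:   {share_within_moe(means, true_share, n):.1%}")
```

If the bell curve is a good description, roughly 95% of the simulated polls should land within the MoE of the true share.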
It's really not all that complicated.
The naysayers never consider LLN or CLT.
They would have us believe that professional pollsters are
incapable of creating accurate surveys (i.e. effectively random
samples) through systematic, clustered or stratified sampling.
Especially when a Bush is running.
LLN and CLT say nothing about bias.
__________________________________________________________________
USING RANDOM NUMBERS TO SIMULATE A SEQUENCE OF POLLS
Random number simulation is the best way to illustrate LLN.
These are the steps:
1) Assume a true 2-party vote percentage for Kerry (e.g. 51.5%).
2) Simulate a series of 8 polls of varying sample size.
3) Calculate the sample mean vote share and win probability for each poll.
4) Confirm LLN by noting that as the poll sample size increases,
the sample mean (average) converges to the population mean ("true" vote).
It's just like flipping a coin.
Let Kerry be HEADS, with a 51.5% chance of winning a random voter.
This is Kerry's TRUE vote (the population mean).
Bush is TAILS with a 48.5% chance.
A random number (RN) between zero and one is generated for each respondent.
If RN is LESS than Kerry's TRUE share, the vote goes to Kerry.
If RN is GREATER than or equal to Kerry's TRUE share, the vote goes to Bush.
For example, assume Kerry's TRUE 51.5% vote share (.515).
If RN = .51, Kerry's poll count is increased by one.
If RN = .53, Bush's poll count is increased by one.
The sum of Kerry's votes is divided by the poll sample size (e.g. 13047).
This is Kerry's simulated 2-party vote share.
It approaches his TRUE 51.50% vote share as the poll sample size increases.
Once again, the LLN applies as it did in the coin flip experiment.
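The steps above can be sketched in Python. This is not the Excel model itself, just a minimal stand-in under stated assumptions. The function name and the seed are illustrative, and the eight sample sizes are borrowed from figures quoted elsewhere in this post. The spreadsheet's own sizes may differ.

```python
import random

def simulate_poll(true_share, sample_size, rng):
    """Return Kerry's simulated 2-party share for one poll."""
    kerry = 0
    for _ in range(sample_size):
        rn = rng.random()     # RN in [0, 1)
        if rn < true_share:   # RN below the TRUE share: vote goes to Kerry
            kerry += 1
        # otherwise the vote goes to Bush
    return kerry / sample_size

if __name__ == "__main__":
    rng = random.Random(2004)  # fixed seed for repeatability (an assumption)
    true_share = 0.515         # Kerry's TRUE 2-party share
    # Assumed sample sizes, taken from polls mentioned in this post.
    sizes = [600, 1000, 1963, 2846, 8349, 11027, 13047, 13660]
    for n in sizes:
        share = simulate_poll(true_share, n, rng)
        print(f"n = {n:>5}: Kerry {share:.2%} (deviation {share - true_share:+.2%})")
```

Each poll is an independent draw, so the deviations do not shrink in lockstep. They tend to get smaller as the sample size grows.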
________________________________________________________________
SIMULATION GRAPHICS
These graphs are a visual summary of the simulation.
________________________________________________________________
RUNNING THE SIMULATION
Press F9 to run the simulation.
Watch the numbers and graphs change.
They should NOT change significantly.
The graphs illustrate polling simulation output for:
Kerry's 2-party vote (true population mean): 51.50%
Exit Poll Cluster effect (zero for pre-election): 30%
The exit poll "cluster effect" is the incremental adjustment
to the margin of error in order to account for the clustering
of individuals with similar demographics at the exit polling site.
Play what-if:
Lower Kerry's 2-party vote share from 51.5% to 50.5%.
Press F9 to run the simulation.
Kerry's poll shares, corresponding win probabilities and
minimal threshold vote (97.5% confidence level), all DECLINE,
reflecting the lowering of his "true vote".
________________________________________________________________
POLLING SAMPLE-SIZE
Just like in the above coin-flipping example, the
Law of Large Numbers takes effect as poll sample-size increases.
That's why the National Exit Poll was designed to
survey at least 13000 respondents.
Note the increasing sequence of polling sample size as we go
from the pre-election state (600) and national (1000) polls
to the state and National exit polls:
Ohio (1963), Florida (2846) and the National (13047).
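To put numbers on this, here is a Python sketch that applies the MoE formula to each of the sample sizes just listed. It assumes a 51.5% share for every poll and applies the 30% cluster effect to the exit polls only. Both are simplifying assumptions, and the names are illustrative.

```python
import math

def margin_of_error(p, n, cluster_factor=0.0):
    """95% margin of error: 1.96 * sqrt(p*(1-p)/n) * (1 + CF)."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n) * (1.0 + cluster_factor)

# (poll, sample size, cluster factor); sizes are the ones listed above.
POLLS = [
    ("Pre-election state poll", 600, 0.0),
    ("Pre-election national poll", 1000, 0.0),
    ("Ohio exit poll", 1963, 0.30),
    ("Florida exit poll", 2846, 0.30),
    ("National exit poll", 13047, 0.30),
]

if __name__ == "__main__":
    p = 0.515  # assumed 2-party share, used for every poll for simplicity
    for name, n, cf in POLLS:
        print(f"{name:<28} n = {n:>5}  MoE = {margin_of_error(p, n, cf):.2%}")
```

Even with the cluster adjustment, the larger exit poll samples come out with a smaller MoE than the pre-election polls.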
Here is the National Exit Poll Timeline:
Updated  ; Respondents ; Vote share
3:59pm   ; 8349        ; Kerry led 51-48
7:33pm   ; 11027       ; Kerry led 51-48
12:22am  ; 13047       ; Kerry led 51-48
1:25pm   ; 13660       ; Bush led 51-48
The final was matched to the vote.
So much for letting LLN and CLT do their magic.
Especially when a Bush is running.
________________________________________________________________
CALCULATING PROBABILITIES
The Kerry win probabilities are the main focus of the simulation.
They closely match theoretical probabilities obtained from
the Excel Normal Distribution function.
The probabilities are calculated using two methods:
1) running the simulation and counting Kerry's total polling votes.
2) calculating the Excel Normal Distribution function:
Prob = NORMDIST(PollPct, 0.50, MoE/1.96, true)
The simulation shows that given Kerry's 3% lead in the 2-party vote
(12:22am National Exit Poll), his popular vote win probability
was nearly 100%. And that assumes a 30% exit poll cluster effect!
For a 2% lead (51-49), the win probability is about 96% (still very high).
For a 1% lead (50.5-49.5), it's 81% (4 out of 5).
For a 50/50 tie, it's 50%. Even money. Makes sense, right?
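The NORMDIST calculation has a direct counterpart in Python (3.8 or later). The sketch below assumes the 12:22am sample size (13047) and a 30% cluster effect for every case. The function names are illustrative, and the exact percentages depend on the MoE assumed.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, cluster_factor=0.0):
    """95% margin of error: 1.96 * sqrt(p*(1-p)/n) * (1 + CF)."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n) * (1.0 + cluster_factor)

def win_probability(poll_share, n, cluster_factor=0.0):
    """Counterpart of NORMDIST(PollPct, 0.50, MoE/1.96, TRUE)."""
    moe = margin_of_error(poll_share, n, cluster_factor)
    return NormalDist(mu=0.50, sigma=moe / 1.96).cdf(poll_share)

if __name__ == "__main__":
    n, cf = 13047, 0.30  # assumed for all four cases
    for share in (0.515, 0.510, 0.505, 0.500):
        lead = 2 * share - 1
        prob = win_probability(share, n, cf)
        print(f"Kerry {share:.1%} (lead {lead:.0%}): win probability {prob:.1%}")
```

A smaller sample or a larger cluster effect widens the MoE and pulls every probability back toward 50%.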
The following probabilities are also calculated for each poll:
1) The 97.5% confidence level for Kerry's vote share.
There is a 97.5% probability that Kerry's true vote will be greater.
The minimum vote share increases as the sample size grows.
2) The probability of Bush achieving his recorded two-party vote (51.24%).
The probability is extremely low that Bush's recorded vote would deviate
this far from his true 48.5% two-party share.
The probability declines as the sample size grows.
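Both quantities can be sketched in Python. This is one reading of the two calculations, not the spreadsheet's actual formulas. It assumes the threshold is the poll share minus the MoE, and that the Bush probability is the normal upper tail beyond 51.24% when his true share is 48.5%. The 30% cluster effect is applied to every sample size for simplicity.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, cluster_factor=0.0):
    """95% margin of error: 1.96 * sqrt(p*(1-p)/n) * (1 + CF)."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n) * (1.0 + cluster_factor)

def kerry_threshold(poll_share, n, cluster_factor=0.0):
    """Share that the true vote exceeds with 97.5% probability."""
    return poll_share - margin_of_error(poll_share, n, cluster_factor)

def prob_bush_reaches(recorded, true_bush, n, cluster_factor=0.0):
    """Upper-tail probability of a share at or above `recorded`
    when the true share is `true_bush` (normal approximation)."""
    sigma = margin_of_error(true_bush, n, cluster_factor) / 1.96
    return 1.0 - NormalDist(mu=true_bush, sigma=sigma).cdf(recorded)

if __name__ == "__main__":
    cf = 0.30  # assumed for every sample size
    for n in (600, 1000, 1963, 2846, 13047):
        low = kerry_threshold(0.515, n, cf)
        tail = prob_bush_reaches(0.5124, 0.485, n, cf)
        print(f"n = {n:>5}: Kerry threshold {low:.2%}, P(Bush >= 51.24%) = {tail:.2e}")
```

The threshold climbs toward 51.5% and the tail probability drops as n grows, which matches the two trends described above.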
________________________________________________________________
DOWNLOADING THE EXCEL MODEL
Wait one minute for the Excel model download.
It's easy.
Just two inputs -
Kerry's 2-party true vote share (51.5%) and
exit poll cluster effect (set to 30%).
Press F9 to run the simulation.
http://us.share.geocities.com/electionmodel/MonteCarloPollingSimulation.xls
Or go here for a complete listing of threads from
TruthIsAll: www.TruthIsAll.net
________________________________________________________________