Once again, don't take it from me. Here are related paragraphs from Mark Blumenthal, the Mystery Pollster, the guy who squashed Steven Freeman and caused him to redo his simpleton analysis. This is the link:
http://www.mysterypollster.com/main/2004/12/what_is_the_sam.html
Sampling Error in Exit Polling
"Unfortunately, calculating the margin of error gets a lot more complicated for an exit poll. The reason is that exit polls use “cluster sampling.” Since it is wildly impractical to station interviewers at every polling place (the U.S. has roughly 225,000 voting precincts), exit pollsters sample in two stages: They first randomly select precincts within a state, then randomly intercept voters as they exit the selected polling places. Unfortunately, this clustering of interviews within precincts adds variability and error. Generally speaking, the additional error is a function of the number of clusters and the variability across clusters of the thing being measured. Error will INCREASE in a cluster sample (as compared to error for simple random sampling) as...
1) The number of clusters decreases relative to the number of interviews or
2) The thing being measured differs across clusters (or precincts)
(12/14: Clarification added in italics above. Another way of saying it: Error will increase as the average number of interviews per cluster increases).
Here is the way the NEP methodology statement describes the potential for error from clustering:
If a characteristic is found in roughly the same proportions in all precincts the sampling error will be lower. If the characteristic is concentrated in a few precincts the sampling error will be larger. Gender would be a good example of a characteristic with a lower sampling error. Characteristics for minority racial groups will have larger sampling errors.
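To illustrate the two conditions above (this sketch is not part of Blumenthal's post): a common textbook approximation, usually credited to Leslie Kish, writes the design effect as deff = 1 + (m - 1) * rho, where m is the average number of interviews per cluster and rho is the intraclass correlation, a measure of how much the thing being measured differs across clusters. The sampling error grows by the square root of deff. The function names and the values of m and rho below are hypothetical, chosen only for illustration; nothing here is claimed to be the formula NEP actually uses.

```python
import math


def design_effect(avg_interviews_per_cluster: float, intraclass_corr: float) -> float:
    """Kish's approximation: deff = 1 + (m - 1) * rho."""
    return 1.0 + (avg_interviews_per_cluster - 1.0) * intraclass_corr


def error_inflation(avg_interviews_per_cluster: float, intraclass_corr: float) -> float:
    """Factor by which sampling error grows relative to simple random sampling."""
    return math.sqrt(design_effect(avg_interviews_per_cluster, intraclass_corr))


# Hypothetical inputs: more interviews per precinct (m) or a more
# clustered characteristic (rho) both push the inflation factor up.
for m in (10, 50):
    for rho in (0.01, 0.05):
        print(f"m={m:>2} rho={rho:.2f} inflation={error_inflation(m, rho):.3f}")
```

With rho near zero (a characteristic like gender that is spread evenly across precincts) the factor stays close to 1; with a larger rho (a characteristic concentrated in a few precincts) it climbs quickly.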
Another problem is that calculating this additional error is anything but straightforward. To estimate the additional error produced by clustering, statisticians calculate something called a “design effect.” The calculation is hard for two reasons: First, it can differ greatly from question to question within the same sample. Second, in order to calculate the design error, you need to know how much the thing being measured varies between clusters. Thus, as this website by the British market research firm MORI explains, “it is virtually impossible to say what ‘between-cluster’ variability is likely to be until one has actually conducted the study and collected the results” (emphasis added). Another academic web site explains that an estimate of the design effect calculated before data is collected must be “based on past survey experience as statistical literature provides little guidance.”
A few weeks ago, I posted my reaction to a paper by Steven Freeman widely circulated on the Internet. Freeman did his own calculations of the significance of exit polls data that ignored the greater rate of error for cluster samples. To make this point, I cited an article that had calculated the design effect for the 1996 exit polls by two analysts associated with VNS (the forerunner to NEP). They had estimated that the clustering “translates into a 30% increase in the sampling error” as compared to simple random sampling. It was the only discussion of the design effect applied to exit polls that I could find in a very quick review of the limited academic literature available to me.
Freeman has recently updated his paper with calculations that rely on this 30% estimate. However, following my post I received an email from Nick Panagakis informing me that the 30% estimate was out of date (Panagakis is the president of Market Shares Corporation, a polling firm that has conducted exit polls in Wisconsin and other Midwestern states). Panagakis had checked with Warren Mitofsky, director of the NEP exit poll, and learned that the updated design effect used in 2004 assumed a 50% to 80% increase in error over simple random sampling (with the range depending on the number of precincts sampled in a given state). Blogger Rick Brady (Stones Cry Out) has subsequently confirmed that information in an email exchange with Mitofsky that he posted on his website.
Thus, the calculations in Freeman’s revised paper continue to understate the sampling error for the 2004 exit polls (more on this in a post to follow).
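To see how much the choice of adjustment matters (again, an illustration added here, not part of the quoted post), the sketch below compares the usual simple-random-sampling margin of error, z * sqrt(p(1-p)/n), with the same figure inflated by 30%, 50%, and 80%. The sample size of 2,000 and the 50% proportion are hypothetical inputs, and the function names are made up for this example.

```python
import math


def srs_margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error under simple random sampling (z=1.96 is about 95% confidence)."""
    return z * math.sqrt(p * (1.0 - p) / n)


def clustered_margin_of_error(p: float, n: int, increase: float, z: float = 1.96) -> float:
    """SRS margin of error inflated by a fractional increase (0.30 means +30%)."""
    return srs_margin_of_error(p, n, z) * (1.0 + increase)


# Hypothetical state exit poll: 2,000 interviews, candidate near 50%.
p, n = 0.5, 2000
print(f"simple random sample: +/- {100 * srs_margin_of_error(p, n):.1f} points")
for increase in (0.30, 0.50, 0.80):
    moe = clustered_margin_of_error(p, n, increase)
    print(f"+{increase:.0%} for clustering: +/- {100 * moe:.1f} points")
```

Under these assumptions, a discrepancy that looks significant against the unadjusted or 30%-adjusted margin can fall inside the margin once the 50% to 80% adjustment is applied.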
All of this brings us to the sampling error table that the NEP included in their statement of methodology, which I have copied below. These are the estimates of sampling error provided to the networks on Election Day for a range of percentages at various sample sizes. They appear to represent an increase in sampling error of at least 50% over simple random sampling. NEP intended the table as a rough guide to the appropriate sampling error for the data they have publicly released, with the additional warning that highly clustered characteristics (such as racial groups) may have even higher error.
The table (available at the link pasted above) assumes a 95% confidence level. However, Nick Panagakis brought another important point to my attention. NEP requires a much higher level of confidence to project winners on Election Night. The reasoning is simple: At a 95% confidence level, one poll in twenty will produce a result outside the margin of error. Since they do exit polls in 50 states plus DC, they could miss a call in 2-3 states by chance alone. To reduce that possibility, NEP uses a 99.5% confidence level in determining the statistical significance of its projections (something I was able to confirm with Joe Lenski of Edison Research, who co-directed the NEP exit polls with Warren Mitofsky). More on the implications of that in my next post."
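The "2-3 states by chance alone" arithmetic can be checked with a short sketch (added here for illustration, not taken from the quoted post). It assumes the 51 polls err independently of one another, which is a simplification, and it uses a two-sided normal critical value to show how much wider the interval gets at 99.5%. The function names are hypothetical.

```python
from statistics import NormalDist


def expected_misses(num_polls: int, confidence: float) -> float:
    """Expected number of polls landing outside their margin of error by chance."""
    return num_polls * (1.0 - confidence)


def prob_at_least_one_miss(num_polls: int, confidence: float) -> float:
    """Chance that at least one poll misses, assuming the polls are independent."""
    return 1.0 - confidence ** num_polls


def z_critical(confidence: float) -> float:
    """Two-sided normal critical value for the given confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


num_polls = 51  # 50 states plus DC
for confidence in (0.95, 0.995):
    print(
        f"confidence={confidence:.1%} "
        f"expected misses={expected_misses(num_polls, confidence):.2f} "
        f"P(at least one)={prob_at_least_one_miss(num_polls, confidence):.2f} "
        f"z={z_critical(confidence):.2f}"
    )
```

The ratio of the two critical values gives a rough sense of how much larger a lead has to be, for the same sample, before it counts as significant at the stricter level.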