Democratic Underground - Actually, let me be more specific:

On the most conservative test I can think of, making no assumptions about distributions, I put the probability of the Ohio exit poll discrepancy being due to chance at about 1 in 16,500. That’s from a chi-square test, assuming, from the ESI chart, that there were 9 blue-shifted precincts and 40 red-shifted ones.

So, yes, the discrepancy was not due to chance, whatever the MoE on the individual precincts.
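
For anyone who wants to check the order of magnitude, here is a minimal sketch of one way to test the 9-versus-40 split against an even split. It assumes a one-degree-of-freedom chi-square goodness-of-fit test and, as a cross-check, an exact two-sided sign test; the variable names are mine. The figure quoted above depends on how the test was set up, which is not spelled out here, so this sketch is not expected to reproduce it exactly.

```python
import math

# Counts read off the ESI chart, as stated in the post.
blue, red = 9, 40
n = blue + red
expected = n / 2  # under pure chance, shifts in either direction are equally likely

# Chi-square goodness-of-fit statistic against a 50/50 split (1 degree of freedom).
chi2 = ((blue - expected) ** 2 + (red - expected) ** 2) / expected

# For 1 df, P(X > chi2) equals P(|Z| > sqrt(chi2)), which is erfc(sqrt(chi2 / 2)).
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

# Exact two-sided sign test: double the binomial tail at or below the smaller count.
tail = sum(math.comb(n, k) for k in range(blue + 1)) / 2 ** n
p_exact = min(1.0, 2 * tail)

print(f"chi-square = {chi2:.2f}, p (chi-square) = {p_chi2:.2e}, p (exact) = {p_exact:.2e}")
```

Either way the probability comes out far below any conventional threshold, which is the only point that matters for the argument.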

But that is precisely why we are having this debate, and why so many of us have spent so much time trying to figure out what caused the discrepancy. ESI concluded:

that the non-response rate theory is much more likely than the fraud accusation theory to account for most, if not all, of the observed discrepancy between the exit polls and the actual results. The more detailed information allowed us to see that voting patterns were consistent with past results and consistent with exit poll results across precincts.

NEDA takes issue with this conclusion, and the present study claims to offer

Virtually Irrefutable Evidence of Vote Miscount.

So how does it do so?

Well, it offers evidence that the discrepancy cannot have been due to chance. Fair enough. But what else?

The paper claims that

In an October 31st paper, NEDA mathematically proved that ESI’s and Mitofsky's analyses were incorrect because many counterexamples exist to its basic premise. In other words, NEDA proved mathematically that ESI's and Mitofsky's analysis of Ohio's and national exit poll data is of no analytical value and no conclusions about the presence or absence of vote fraud can be drawn from them.

This “proof” appears to me to be invalid. Indeed, if it were valid, it would also invalidate the conclusions drawn in the smoking gun paper; but it is not, so it doesn’t, although a number of other things do. So we’ll move on.

The authors then claim:

The ESI report had made no attempt to explain or mathematically analyze the actual 2004 exit poll discrepancies and the ESI report was missing key data. To date, Mitofsky and ESI have provided no explanation for the exit poll discrepancy that is supported by data and analysis.

Which is not true. The ESI report does analyze the 2004 exit poll discrepancies in considerable detail, and while what we have is a “brief” report, I understand a full report is currently undergoing peer review, as such analyses should. I am not clear what “key data” is supposed to be missing from the ESI study, but they studied 49 precincts, and issued tables with data on those 49. The ESI study does not, indeed, provide an explanation; the authors merely conclude that the discrepancy is consistent with the poll-bias explanation and does not support the fraud explanation. Nowhere do they say that it “rules it out”. On the other hand, the January E-M report does provide a fairly full explanation, in the form of crosstabs showing that WPE/WPD was greater where a number of factors likely to affect random sampling were present, notably a low interviewing rate. Like NEDA, I had issues with that report, as no statistical details were given; unlike NEDA, I also do not consider WPE a good measure of precinct-level discrepancy. But it is simply not true that “no explanation” was given.

Now things get difficult, as the smoking gun paper is not couched as a scientific report and it is difficult to find the hypothesis, the methods or the results in any clear order. But picking through the rubble, it appears that the authors attempted to match the ESI precincts with the Roper precincts, and found that they didn’t match; they nonetheless matched them “conservatively”, and used those matches to deduce the sample sizes on the assumption that the total sample was double that in the Roper sample. Well, firstly, their matches are almost certainly wrong: if the Roper samples were half the size of the full samples, then in many cases sampling error in the subsampling process would generate greater discrepancies between the two sets than the matching algorithm assumes. So even if their “doubling” heuristic were justified, the wrong sample sizes would be assigned to the wrong precincts. But in fact the “doubling” heuristic is also wrong, because the subsampling process was designed to net about 50 responses from each precinct for the crosstab analyses. If a total sample was small, more than half might be subsampled; in fact the Roper set might comprise the entire sample. The point is that the NEDA authors don’t know which. So any analysis by precinct is based on two faulty guesses, and the authors do not include this source of error in their probability calculations.
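
To give a feel for how much noise subsampling alone introduces, here is a rough sketch. It assumes a simple random subsample drawn without replacement, and the precinct numbers (100 respondents in total, 50 in the subsample, an even split) are purely illustrative, not taken from the data; the function name is mine.

```python
import math

def subsample_se(p, total, sub):
    """Standard deviation of a candidate's share in a subsample of `sub`
    respondents drawn without replacement from `total` respondents,
    of whom a fraction `p` chose that candidate."""
    if sub >= total:
        return 0.0  # the subsample is the whole sample, so no extra error
    fpc = (total - sub) / (total - 1)  # finite population correction
    return math.sqrt(p * (1 - p) / sub * fpc)

# Hypothetical precinct: 100 respondents, half subsampled, evenly split.
se = subsample_se(0.5, total=100, sub=50)
print(f"subsample share SD: {se:.3f}")
```

Under those assumed numbers the subsample share wanders by roughly 5 percentage points (one standard deviation) around the full-sample share, so a matching rule that expects the two sets to agree closely will often pair the wrong precincts.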

So what follows is based on faulty analysis. However, this doesn’t actually matter, because you can reach the same conclusion on the basis of the chi square I just did. We know that the discrepancy can’t have been chance.

NEDA then presents a plot of WPE against Kerry’s vote share. Bizarrely, they plot this as a bar graph rather than a scatterplot, which leads them into trouble later on as there are precincts that share the same value on the x axis, and when this happens, they even more bizarrely take the mean. They plot a regression line through their bar chart, but do not give the regression coefficient. They then ignore the bar chart for a bit.

Their next plot, page 10, appears to plot Kerry’s official vote values against Kerry’s official vote ranking (although the axes are poorly labelled so I may have got this wrong) as well as Kerry’s exit poll share against Kerry’s official vote ranking. They then fit a polynomial curve (order unspecified, but it looks like a quadratic; they don’t give the equation, but it looks as though only the linear term would be significant) from which, I assume, we are to deduce (ta-da!) that Kerry’s exit poll share tends to be higher than his vote share.

OK, my chi square did that.

Let’s move on.

At this point they say:

Ohio's exit poll discrepancy pattern is statistically implausible and has not been supportably explained in terms of any factors that cause exit poll error. Edison/Mitofsky and their NEP media clients have not publicly released information on the exact sample sizes, type of voting system, locations of precinct, or other exit poll factors to allow investigation or independent analysis.

Quite. So you can’t do the analysis that they claim to have done.

Next, page 11: WPD plotted against vote share. Again it’s a bar graph, so they have to average WPD where precincts share a vote share (which happens more often in this dataset than in actuality as a result of the blurring procedure). Given that they’ve done this it is just as well they don’t offer a correlation coefficient – they just point out that the regression line is negative, which it clearly appears to be. No probability value is given either, which again is just as well, because as I’ve pointed out in my other responses to Ron, the plot is completely meaningless. It just shows that Kerry’s exit poll share will tend to be greater where the exit polls were in his favour, and less where the exit polls were in Bush’s favour. Big people are significantly bigger than little people.

They then assert:

If vote miscounts cause discrepancies, then the trend line is negatively sloped when WPD is ordered by exit poll shares. Combined with a trend line with positive slope when WPD are ordered by official vote and the fact that the discrepancies shift across the 50/50 line when ordered by exit polls, Ohio's WPD is consistent with vote miscounts that altered the outcome of Ohio's presidential election.

So this is the smoking gun! They show an apparently positive regression line (without any statistical backup) between WPD and Kerry’s vote share, and a negative regression line (without any statistical backup) between WPD and Kerry’s exit poll share – and assert that this is “consistent with vote miscounts that altered the outcome of Ohio’s presidential election”.

Well, no.

The first is not statistically significant. And even if it were “approaching significance” (as my statistics tutor used to say, “How do you know it was approaching? How do you know it’s not leaving?”), the WPD itself tends to have a positive correlation with the Democratic candidate’s vote share if there is any bias at all, in either direction, even if that bias were “uniformly” distributed. So that isn’t a smoking gun.

And the second is meaningless, as I’ve just explained.

So no, the combination of the two regression lines is not suggestive of vote miscounts at all. It's what the math will produce in the presence of bias (which we know there was, whether in count or in poll) and sampling error (which, clearly, there was).
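
A small simulation makes the point about the second slope concrete. This is only a sketch under stated assumptions: the votes are counted honestly, the poll overstates one candidate by a uniform 3 points, each precinct yields 50 respondents, and the discrepancy is a plain difference (official share minus poll share) standing in for WPD rather than the actual WPD formula. All names and numbers are hypothetical. It does not try to reproduce the positive drift against vote share, which depends on how WPD itself is calculated.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n_precincts=2000, n_resp=50, bias=0.03, seed=1):
    """Honest count, uniformly biased poll, binomial sampling error."""
    rng = random.Random(seed)
    votes, polls, gaps = [], [], []
    for _ in range(n_precincts):
        vote = rng.uniform(0.2, 0.8)   # true (and official) share
        p = vote + bias                # chance a respondent reports the candidate
        poll = sum(rng.random() < p for _ in range(n_resp)) / n_resp
        votes.append(vote)
        polls.append(poll)
        gaps.append(vote - poll)       # simple discrepancy, not the WPD formula
    return ols_slope(votes, gaps), ols_slope(polls, gaps)

slope_vs_vote, slope_vs_poll = simulate()
print(f"slope against official share: {slope_vs_vote:+.3f}")
print(f"slope against poll share:     {slope_vs_poll:+.3f}")
```

With no miscount anywhere in the simulation, the discrepancy still slopes clearly downward when ordered by poll share, while staying close to flat against the official share. The downward slope comes entirely from sampling error appearing on both axes.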

The fact that the discrepancies “shift across the 50:50 line” is, of course, suggestive of vote miscounts, because there is a significant discrepancy. But it is equally suggestive of poll bias.

And in any case, it's exactly what I demonstrated with my chi square.

This response is already too long. The rest of the paper tiddles about with more regression lines through more bar charts (averaged precincts and all) again, with no statistical tests (they might have tested the significance of the difference between their two regression lines, but they don’t – it wouldn’t have been valid, for the reasons given above – but hey, it would have been a hypothesis, with a test, and a result.)

And they continue to accuse ESI of inadequate work.

As I said elsewhere, the election reform movement deserves better than this. Good, hardworking, honest people are treading Ohio trying to find good evidence of the fraud we all suspect. Many others rely on “experts” to provide them with good evidence of what they feel they know in their gut: that the election was not fair, and may have been stolen outright. For sure many Kerry voters were unable either to cast their votes or to have their votes counted. For sure the DREs and tabulators are insecure, as the GAO report attested. That’s why we need good analyses from good analysts, to tell us what went wrong and what needs fixing. Because sure as hell something needs fixing.

What we don’t need is shoddy analysis from people purporting to be experts, who won’t even verify the limitations of the data they are working with, who produce an incomprehensible report with no decent hypothesis, methods section or results, yet bill it as:

The Gun is Smoking: 2004 Ohio Precinct-level Exit Poll Data Show Virtually Irrefutable Evidence of Vote Miscount.

No wonder it got 123 votes on GD. It's a great headline.

Unfortunately it means that 123 good people were sold a pup.
