as it appeared that it might account for the strange inferences drawn in the Baiman Dopp Smoking Gun paper.
What Dopp appears to have done is to derive a formula with which to simulate the patterns of Within Precinct Discrepancy that would occur for a) bias in the poll and b) vote miscount, or combinations of the above, when WPD is plotted against Kerry's vote share.
Which is fine. I did something similar a while back, as I wanted to know the answer.
She then uses her formula to simulate a number of scenarios. Bizarrely, she plots these as bar graphs, which causes her problems because, as in the Baiman-Dopp paper, precincts that share a value on the x axis cannot be plotted separately. This is why correlations are normally shown as scatterplots, not bar graphs. She solves this problem, however, by taking the mean of two data points whenever they share an x value. If applied to real data (as is done in the Baiman-Dopp paper) this will, of course, invalidate any statistical inference regarding the covariance of the variables. However, as she doesn't draw any at this point, this is fairly trivial.
She then simulates two scenarios where (it would appear) every precinct has a poll bias of 56:50 or 50:56 respectively. She plots a regression line through a plot of the resulting WPDs, which appears to be flat, although she does not give the value of the regression coefficient. She has extensively elongated the X axis, so it is hard to tell whether there is a slope or not. However, the function can in fact be calculated by algebra, and is given in my paper
Edison/Mitofsky Exit Polls 2004: differential non-response or votecount corruption?. It takes the form of an asymmetrical U shape, going, as Kathy rightly observes, to zero at both ends of the X axis, but because it is asymmetrical, a linear regression line through the plot will tend to have a positive slope. If sampling error is included (as it is in this simulation) this will tend to flatten the slope. It will, however, be a real effect, and will be detectable given enough iterations.
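A minimal sketch of that function, under stated assumptions rather than as a reproduction of Dopp's formula: "56:50" is read here as the response rates of Kerry and Bush voters, their ratio is called `alpha`, and WPD is taken as Kerry's margin in the count minus his margin in the poll (so a poll that overstates Kerry gives a negative WPD). The function names are illustrative only.

```python
def wpd_under_uniform_bias(kerry_share, alpha):
    """WPD for a precinct with no miscount and a fixed response-rate ratio alpha."""
    # Expected Kerry share among respondents when Kerry voters respond
    # alpha times as readily as Bush voters (assumed bias model).
    poll_share = alpha * kerry_share / (alpha * kerry_share + (1.0 - kerry_share))
    # Assumed sign convention: Kerry's margin in the count minus his margin in the poll.
    return 2.0 * (kerry_share - poll_share)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


alpha = 0.56 / 0.50  # the 56:50 scenario, read as a response-rate ratio
shares = [i / 100 for i in range(101)]  # Kerry vote share from 0 to 1
wpds = [wpd_under_uniform_bias(k, alpha) for k in shares]

lowest = min(range(len(shares)), key=lambda i: wpds[i])
slope = ols_slope(shares, wpds)
print(f"WPD at the extremes: {wpds[0]:.3f}, {wpds[-1]:.3f}")
print(f"most negative WPD at Kerry share {shares[lowest]:.2f}")
print(f"regression slope of WPD on Kerry share: {slope:.4f}")
```

Because there is no sampling error in this sketch, the curve is exact: it is zero at both ends, its trough sits slightly on the Bush side of 50%, and that asymmetry is what gives the fitted line a small positive slope.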
She then demonstrates that a scenario in which error arises from vote miscounts will produce a marked positive slope. Here, I agree with her, and in fact Josh Mitteldorf demonstrated this some time ago, and I produced a model that also demonstrated it. I sent this to Kathy, and in fact it is still, I believe, downloadable on her website
here. The effect is more pronounced if you use, as I did, ln(alpha) as your measure of bias rather than WPD, but it will be true of WPD as well. The problem with using WPD is that, because WPD will also produce a slight positive slope in the presence of polling bias alone, it makes the effects harder to disambiguate. But I will agree with Kathy that fraud will tend to produce a more marked positive correlation with voteshare than polling bias. Here we do indeed have a potential "fingerprint".
However, she then goes on to plot her simulated WPDs against not vote share, but against
exit poll share. She finds that in the "polling bias" scenario she gets a "slight non-zero slope", although she doesn't explain it. However, it is readily explained by the fact that her simulation formula includes an error term for sampling error. As this error term affects both the values on the y axis (WPD) and the x axis (exit poll share), there will, inevitably, be a non-zero slope. The two variables are mathematically correlated.
She then repeats this for the "vote-miscount" scenario, and again finds a negative slope, for exactly the same reason***.
She then states:
The signature for WPD caused by vote miscounts seems to be a negative slope trend line whenever WPD is plotted by exit poll share irregardless of which candidate vote miscounts benefits.
What in fact Kathy has demonstrated is that the more variance there is in WPD, whether caused by vote miscounts or polling bias, the more WPD will be correlated with exit poll share. This is simply because the "error" term is common to both axes. It will be greater for non-uniform fraud than for uniform bias; similarly it will be greater for non-uniform bias than for uniform fraud.
In other words all the slope tells you is the amount of variance in WPD. Not what causes it. There is no signature in that slope for anything other than variance.
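A rough illustration of the shared error term, assuming the simplest possible case: an unbiased poll, an honest count, and sampling error as the only source of discrepancy. This is a sketch, not Dopp's simulation formula; the sign convention for WPD (Kerry's margin in the count minus his margin in the poll), the precinct counts, the sample sizes and the seed are all assumptions chosen for the example.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate(n_precincts, n_respondents, seed):
    """Unbiased poll, honest count: the only discrepancy is sampling error."""
    rng = random.Random(seed)
    vote, poll, wpd = [], [], []
    for _ in range(n_precincts):
        k = rng.uniform(0.1, 0.9)  # Kerry's true (and counted) share
        hits = sum(rng.random() < k for _ in range(n_respondents))
        p = hits / n_respondents  # Kerry's exit poll share
        vote.append(k)
        poll.append(p)
        wpd.append(2.0 * (k - p))  # assumed sign convention
    return vote, poll, wpd


results = {}
for n_respondents in (100, 25):  # fewer respondents means more variance in WPD
    vote, poll, wpd = simulate(3000, n_respondents, seed=1)
    results[n_respondents] = (
        variance(wpd),
        ols_slope(vote, wpd),  # WPD against vote share
        ols_slope(poll, wpd),  # WPD against exit poll share
    )
    var_wpd, slope_vote, slope_poll = results[n_respondents]
    print(f"n={n_respondents}: var(WPD)={var_wpd:.4f}, "
          f"slope vs vote share={slope_vote:.3f}, "
          f"slope vs exit poll share={slope_poll:.3f}")
```

Neither run contains any bias or miscount, yet the slope against exit poll share is negative in both, and steeper in the run with more sampling variance. The slope against vote share stays close to zero because the error term appears only on the y axis there.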
It remains worth investigating whether redshift in a suspect election is correlated with vote-share. For example: is Bush's vote share as high as it is because fraud put it there? If so, WPD is a poor measure to use, because of its built-in correlation with vote-share. It is not easy to find a measure in which this is not a problem, although I think that a measure developed by Mark Lindeman and myself, and dubbed "tau prime" is fairly sound*. As far as I can tell, however, even using WPD, there is no significant correlation between vote share and WPD in Ohio. Moreover, even after controlling for variance in Bush's vote share predicted by his vote-share in 2000**, as was done by
ESI, and which ought to increase the statistical power, there was apparently no significant correlation.
The remainder of the paper seems to involve various scenarios in which some hypothetical "uniform" bias is subtracted from all polls. As no-one has postulated "uniform" bias (what conceivable factor could produce such a thing? In any case we know from the
E-M report that WPE was not "uniform" - it covaried with a number of important factors, such as interviewing rate and the distance the interviewer stood from the precinct) this must remain a theoretical exercise only - and in any case, the magnitude of the bias cannot be known,
a priori. That is why we bother to investigate in the first place. Finally, she presents a strategy for paper ballot audits, which is fine and useful, and is given in more detail in another of her papers,
How Can Independent Paper Audits Detect and Correct Vote Miscounts?.
In summary: Kathy's conclusions as to the patterns that disambiguate fraud and polling bias are largely invalid; the one valid pattern she notes is not best detected by WPD, and in fact was suggested by Josh Mitteldorf months ago, and indeed tested on the national dataset by Mitofsky, using my proposed measure, and no significant correlation was found.
I do not rule out fraud in the 2004 election; I do not even rule out the possibility that some forms of vote corruption may have contributed to the exit poll discrepancy. But I do not consider that this paper offers any valid way of distinguishing vote corruption from polling bias, and it is thus an inappropriate tool with which to draw "virtually irrefutable" conclusions regarding vote-corruption in Ohio. And, of course, I also reject all the author's allegations concerning my honesty. This is an honest critique.
Elizabeth Liddle
* A section of the paper is also devoted to attempting to demonstrate that the measures devised by myself and Mark Lindeman "distort the effects of random sampling error, and are not useful for analyzing exit poll discrepancy data." I have addressed this critique elsewhere, but briefly: the measure we propose does not distort the effects of sampling error, but reflects it. In contrast, WPD is a distorted measure of the underlying factors that create bias, whether vote miscount or polling bias, giving larger values for a given propensity to bias in the centre of the voteshare distribution than at the extremes.
**Dopp, of course, claims that this procedure is illogical. It is, nonetheless, the basis of all multivariate statistical analysis. If Dopp is correct, her paper means the end of Statistical Analysis As We Know It.
***(on edit) On closer inspection of the various scenarios, there would appear to be a second reason for Dopp's fraud "signature" pattern: she postulates a uniform percentage shift in Kerry's votes to Bush. Clearly, a given percentage of 20% of the votes will result in a smaller WPD than the same percentage of 80% of the votes, so in this scenario, a correlation is built into the model - it models a scenario in which more votes are switched in high Kerry precincts than in high Bush precincts. This, ironically, cancels out the tendency for WPD to be greater in high Bush precincts, and reverses the expected positive correlation between WPD and Kerry's vote share, whereby WPD will be more negative at the Bush end of the spectrum. In this "miscount" scenario, it would appear that the "signature" of fraud is a NEGATIVE correlation between WPD and Kerry's vote share - with more negative WPD at the high Kerry end of the spectrum, the opposite pattern to that Dopp claims is the pattern in Ohio, and which Baiman-Dopp claim is "irrefutable evidence of vote miscounts".
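The arithmetic can be sketched as follows, assuming an unbiased poll with no sampling error, a hypothetical shift of 5% of Kerry's votes to Bush in every precinct, and WPD taken as Kerry's margin in the count minus his margin in the poll. The 5% figure and the function name are illustrative, not taken from Dopp's paper.

```python
def wpd_after_uniform_shift(true_kerry_share, shift):
    """Counted Kerry share and WPD when a fixed fraction of Kerry's votes go to Bush."""
    counted_share = true_kerry_share * (1.0 - shift)
    poll_share = true_kerry_share  # assumed: unbiased poll, no sampling error
    # Assumed sign convention: Kerry's margin in the count minus his margin in the poll.
    return counted_share, 2.0 * (counted_share - poll_share)


shift = 0.05  # hypothetical: 5% of Kerry's votes switched in every precinct
rows = [(k,) + wpd_after_uniform_shift(k, shift) for k in (0.2, 0.5, 0.8)]
for true_share, counted_share, wpd in rows:
    print(f"true Kerry share {true_share:.2f}: "
          f"counted {counted_share:.3f}, WPD {wpd:.3f}")
```

Under these assumptions WPD is simply minus twice the shift times Kerry's true share, so it is four times as negative in an 80% Kerry precinct as in a 20% one, which is the negative correlation with Kerry's vote share described above.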
I give up.