Febble, Donating Member (1000+ posts), Fri Jun-15-07 05:12 PM
Response to Reply #28
29. Well, it would probably be easier to understand if you did not assume a) that I was going off at a tangent and b) that I was being deliberately confusing.

OK, let's forget for now that there is no correlation between the exit poll discrepancy and the increase in Bush's vote share.

You want to know why the reweighted exit polls produced results that you find inexplicable.

Well, first of all, consider how the weighting is done.

Ideally, the voters in the poll should be a random sample of all voters. If the exit poll sample really were a random sample, and if the votes were counted correctly, there should be a very close correspondence between the proportion of respondents who report having voted for each candidate and the proportion of votes counted for each candidate. We could then calculate very precisely how probable it is that a given discrepancy between the sample and the count would have occurred by chance.
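To make the "probable by chance" point concrete, here is a minimal Python sketch of the textbook calculation under the idealised assumption of a simple random sample of voters, which, as explained below, real exit polls are not. The function name and the numbers (2,000 respondents, 51% vs 48% for Kerry) are made up for illustration.

```python
import math


def discrepancy_z_test(sample_share: float, counted_share: float, n: int) -> tuple[float, float]:
    """Hypothetical helper: z-score and two-sided p-value for the gap between a
    candidate's share in the poll and in the count, IF the n respondents were a
    simple random sample of voters."""
    standard_error = math.sqrt(counted_share * (1 - counted_share) / n)
    z = (sample_share - counted_share) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Hypothetical: 2,000 respondents, 51% report voting Kerry, the count gives Kerry 48%.
z, p = discrepancy_z_test(0.51, 0.48, 2000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Under these made-up numbers a 3-point gap would be very unlikely by chance alone, but only because the calculation assumes a truly random sample.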

However, the sample is not a random sample of voters. First, the pollsters select a sample of precincts, and these are not chosen at random but deliberately, so as to be representative of the country. Then, at each precinct, an attempt is made to interview a random sample of voters. However, unless the selection is completely unbiased and the response rate is 100%, this is unlikely to yield a random sample. The pollsters know this, so they ask their interviewers to record the age, race and sex of anyone they select who either refuses or cannot be interviewed because the interviewer is busy with another respondent. Clearly only the visible characteristics of non-respondents can be observed; by definition, the pollsters do not know how the non-respondents voted. But having noted the age, race and sex of the non-respondents, the pollsters can check, when the responses come in, whether, say, the proportion of women among the actual respondents matches the proportion of women among everyone selected, including the non-respondents. If the proportion of women among the respondents is greater than among the whole selected group, they conclude that women are over-represented in the respondent sample.

So they apply a weight to each respondent in their spreadsheet. There is an actual column, headed "weights", in the data that you could once download for free (though you now have to pay), and I spent many hours staring at it. So if you find you have too many women, the thing to do is to upweight the men and/or downweight the women. All the women might get a weight of .9, meaning each woman in the poll represents .9 of a woman voter, while the men might get a weight of 1.1, meaning each man in the poll represents 1.1 male voters.
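A rough sketch of the non-response adjustment just described, for a single variable (sex) in one hypothetical precinct. It assumes each group's weight is simply its share among everyone selected divided by its share among actual respondents; the function name and counts are invented, and how NEP actually combines sex, age and race is not covered here. With 55 women and 45 men responding out of 100 of each selected, it gives weights of roughly .9 and 1.1.

```python
from collections import Counter


def nonresponse_weights(selected: list[str], respondents: list[str]) -> dict[str, float]:
    """Hypothetical helper: weight per group = (group's share among everyone selected,
    including refusals and misses) / (group's share among actual respondents).
    Groups over-represented among respondents end up with a weight below 1."""
    selected_counts = Counter(selected)
    respondent_counts = Counter(respondents)
    n_selected, n_respondents = len(selected), len(respondents)
    return {
        group: (selected_counts[group] / n_selected) / (respondent_counts[group] / n_respondents)
        for group in respondent_counts
    }


# Hypothetical precinct: 100 women and 100 men selected; 55 women and 45 men respond.
selected = ["F"] * 100 + ["M"] * 100
respondents = ["F"] * 55 + ["M"] * 45
weights = nonresponse_weights(selected, respondents)
over_represented = [group for group, w in weights.items() if w < 1]
print(weights, over_represented)
```

Because each weight is a ratio of shares, the weighted respondents still add up to the number of respondents (here 55 × 0.909 + 45 × 1.111 = 100), split 50/50 by sex as in the selected group.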

Same with age band (approximately) and race (approximately).

However, other sources are also used to reweight the sample. One is pre-election polls: if the results in a given area are out of whack with the pollsters' pre-election surveys, they assume that their exit poll is biased (because getting a representative sample face to face is harder than doing a telephone poll), and they adjust the weight of each respondent accordingly. If the pre-election polls are more pro-Bush than the exit poll, they downweight each Kerry respondent a bit and upweight each Bush respondent a bit. Then, as the vote returns come in, the respondents are weighted again, this time to the incoming returns, because, notoriously, the pollsters assume that if there is a discrepancy, the problem lies with the sampling. (By this stage they know there ARE problems with the sampling, from the other two sources, and they regard the incoming returns simply as another source of data on the actual voting population.) So again, if the returns are "redder" than the respondents, Bush voters are upweighted and Kerry voters downweighted.
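The vote-based reweighting can be sketched the same way. This assumes the simplest possible scheme, one multiplicative factor per candidate chosen so that the weighted shares hit a target (a pre-election poll or the incoming returns); the function and the 52/48 vs 49/51 figures are hypothetical, and the actual NEP procedure may well differ.

```python
def reweight_to_target(weights: list[float], votes: list[str], target: dict[str, float]) -> list[float]:
    """Hypothetical helper: rescale each respondent's weight by a per-candidate factor so
    the weighted vote shares equal `target`. Every respondent reporting the same candidate
    gets the same factor. `target` must cover every candidate in `votes` and sum to 1."""
    total = sum(weights)
    current: dict[str, float] = {}
    for w, v in zip(weights, votes):
        current[v] = current.get(v, 0.0) + w / total  # current weighted share per candidate
    factor = {cand: target[cand] / share for cand, share in current.items()}
    return [w * factor[v] for w, v in zip(weights, votes)]


# Hypothetical: 100 respondents split 52 Kerry / 48 Bush; the returns say Kerry 49%, Bush 51%.
votes = ["Kerry"] * 52 + ["Bush"] * 48
new_weights = reweight_to_target([1.0] * len(votes), votes, {"Kerry": 0.49, "Bush": 0.51})
bush_share = sum(w for w, v in zip(new_weights, votes) if v == "Bush") / sum(new_weights)
print(round(bush_share, 3))  # 0.51
```

Each Kerry respondent ends up with a weight of about 0.94 and each Bush respondent about 1.06, regardless of who they are.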

And the cross-tabulations are done on the weighted data. This is easy enough to do: instead of computing the percentages on the raw counts, each respondent's response is multiplied by that respondent's weight before the percentage is calculated. Obviously, if all the weights equal 1 there will be no difference. But if some respondents have weights greater than 1 and some less than 1, the reweighted cross-tabulations will differ from the unweighted ones.
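A weighted cross-tab just sums weights instead of counting heads. A minimal sketch with invented data (55 women, 45 men) and illustrative vote-based weights of 0.9 for Kerry and 1.1 for Bush respondents; the function name is hypothetical.

```python
from collections import defaultdict


def weighted_crosstab(rows: list[tuple[str, str]], weights: list[float]) -> dict[str, dict[str, float]]:
    """Hypothetical helper: weighted percentage of each group giving each vote.
    `rows` holds one (group, vote) pair per respondent; `weights` is aligned with `rows`."""
    group_totals = defaultdict(float)
    cell_totals = defaultdict(float)
    for (group, vote), w in zip(rows, weights):
        group_totals[group] += w         # weighted size of the group
        cell_totals[(group, vote)] += w  # weighted size of the group/vote cell
    return {
        group: {vote: 100 * total / group_totals[group]
                for (g, vote), total in cell_totals.items() if g == group}
        for group in group_totals
    }


rows = ([("women", "Kerry")] * 30 + [("women", "Bush")] * 25
        + [("men", "Kerry")] * 20 + [("men", "Bush")] * 25)
unweighted = weighted_crosstab(rows, [1.0] * len(rows))
# Illustrative vote-based weights: Kerry respondents 0.9, Bush respondents 1.1.
reweighted = weighted_crosstab(rows, [0.9 if vote == "Kerry" else 1.1 for _, vote in rows])
print(unweighted["women"]["Kerry"], reweighted["women"]["Kerry"])
```

With all weights equal to 1 the function returns the ordinary percentages; with the vote-based weights, Kerry's share among women in this made-up sample drops from about 54.5% to about 49.5%.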

The problem, however, is that apart from the reweighting based on the age, race and sex of non-respondents, there is only geographic, not demographic, information on exactly who is supposed to be missing from the poll. All the pollsters know from the pre-election polls and the incoming vote returns is that Kerry voters need to be downweighted and Bush voters upweighted. And there is no guarantee that the missing Bush voters (assumed missing, I mean) are drawn equally from all demographics. If, say, black Bush voters are more likely to evade selection than white Bush voters, then when all Bush respondents in a given region are upweighted equally, white Bush voters will tend to be over-represented in the poll. Ditto with age bands.
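A toy illustration of the black/white example above, with every number hypothetical. Assume 60% of white voters and 10% of black voters really voted Bush (50% overall), all Kerry voters respond at the same rate, but black Bush voters respond at a lower rate (5%) than white Bush voters (about 9%). Applying one uniform Bush factor and one uniform Kerry factor brings the overall poll back to 50%, but not the subgroups.

```python
# Hypothetical respondent counts by (race, reported vote). Assumed truth behind them:
# 60% of white voters and 10% of black voters chose Bush (50% overall); all Kerry voters
# respond at 10%, white Bush voters at about 9%, black Bush voters at only 5%.
sample = {("white", "Kerry"): 32, ("white", "Bush"): 44,
          ("black", "Kerry"): 18, ("black", "Bush"): 1}

n = sum(sample.values())
poll_bush_share = sum(c for (_, vote), c in sample.items() if vote == "Bush") / n

# One uniform factor per candidate, chosen so the overall weighted Bush share equals the 50% count.
factor = {"Bush": 0.5 / poll_bush_share, "Kerry": 0.5 / (1 - poll_bush_share)}


def reweighted_bush_share(race: str) -> float:
    """Bush's weighted share within one racial group after the uniform adjustment."""
    bush = sample[(race, "Bush")] * factor["Bush"]
    kerry = sample[(race, "Kerry")] * factor["Kerry"]
    return bush / (bush + kerry)


for race in ("white", "black"):
    print(f"{race}: {100 * reweighted_bush_share(race):.1f}% Bush after reweighting")
```

In this made-up case the reweighted poll puts Bush at about 60.4% among white voters (true value 60%) and about 5.8% among black voters (true value 10%), even though the overall total is exactly right.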

And clearly, the greater the bias in the sample, the less accurate the reweighted cross-tabulations will be. Add to this the fact that in any one state only a few tens of precincts are sampled, which means that some counties are completely unrepresented and only large urban counties are likely to have more than one precinct in the poll. In fact, the NEP exit poll is in many ways best considered as a large number of very small polls, and even that large number is spread very thinly across America. You cannot do county-level analysis from the exit polls because, as I said, most counties will have one precinct, or none, in the poll.

I hope that is comprehensible. Now I will go off "at a tangent". None of the above means that fraud was not the reason the exit poll had to be so substantially reweighted in 2004. We know the reweighting was substantial, and your contention, I take it, is that it was necessary because the count, not the poll sample, was biased. The contention in the OP is that this is supported by the anomalous-looking cross-tabulations, which were made on the reweighted respondent data. My point is that if the anomalies were due to fraud in particular places (if, in other words, Bush's anomalous-looking increase in vote share among some demographics was due to vote-switching), then it implies that the fraud was non-uniform. It happened in some places but not others. Yes?

Now, fraud will tend to produce a discrepancy between the poll and the count, and the greater the fraud, the greater the discrepancy. In addition, the greater the fraud, the better Bush will tend to do than expected on the basis of his 2000 vote share. So IF we found that Bush's gains were greatest in just those precincts where the discrepancy was greatest, and that he did relatively badly where the discrepancy was least (or even apparently biased the other way), that would be strong support for the fraud hypothesis. The trouble is that there isn't even a hint of that pattern. There is absolutely no tendency, observable in the data, for Bush to do better where the discrepancy was greater, or worse where it was less.

He did tend to improve his vote share most where Gore had done best (which you would expect: he had more votes to win) and least where he himself had done best in 2000 (which you would also expect: he had nowhere to go but down). But the discrepancy does not follow this trend, or show any trend at all with respect to how Bush performed relative to 2000.

So, to summarise: the anomalies you note in the reweighted cross-tabs are perfectly consistent with the extent of the reweighting applied. They would also be consistent with fraud; but if they were due to fraud, we'd also expect to see a correlation between the increase in apparent turnout for Bush and the extent of the precinct-level discrepancy, and we don't see this.

Diary on the way the NEP exit polls work here:

