Democratic Underground - why weight? - Democratic Underground

(Some people have questioned whether OTOH has the cojones to
create long, non-proportional posts with lots of tables.
Certainly I opt for quality over quantity, but lest anyone
doubt my ability to tabulate....)

Here is an attempt to demonstrate succinctly why weighting to
the official results is generally expected to yield more
accurate tabulations. This analysis also provides hints as to how
one might detect certain forms of vote miscount (accidental or
otherwise), but my main point is just that weighting to the
official results is a legitimate method, not a scam --
although, of course, it can yield _less_ accurate tabulations.

Assume two candidates, Able and Baker, and an electorate split
into two groups, athletes (40%) and bookworms (60%). 60% of
athletes vote for Able, 40% of bookworms vote for Able, and
thus Able gets 48% of the vote (Baker gets the other 52%). In
a hypothetical sample of 1000 respondents without bias or
sampling error, the results would look like this. Note that
the percentages are row percentages (% of athletes or of
bookworms).

 (unbiased)       Able             Baker         total 
athletes       240 (60.0%)      160 (40.0%)       400
bookworms      240 (40.0%)      360 (60.0%)       600
––––––––––––––––––––––––––––––––––––––––––––––––––-––
TOTAL          480              520              1000

Now, suppose that Baker voters tend to steer around the
interviewers, so they are underrepresented in the survey.
Specifically, say that 10% (i.e., 52 out of 520) of the Baker
voters represented above don't participate. For effect, I will
take away proportionately more athletes (20 of 160) than
bookworms (32 of 360): so I am assuming (to make the example
more complicated) that athletes who vote for Baker are more
likely to avoid being interviewed.

 (biased)        Able             Baker          total 
athletes       240 (63.2%)      140 (36.8%)       380
bookworms      240 (42.3%)      328 (57.7%)       568
––––––––––––––––––––––––––––––––––––––––––––––––––––-
TOTAL          480              468               948

In this biased tabulation, Able appears to be ahead. (You can
compute the percentage if you want, but I deliberately don't
report it because actual demographic tabs wouldn't report it.)
For present purposes, focus on the Able demographic
percentages: 63.2% among athletes, 42.3% among bookworms. Not
bad, but definitely off.
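
For anyone who does want that overall percentage, here is a minimal Python sketch that rebuilds the biased sample from the unbiased one and computes Able's apparent share. The dictionary layout and variable names are made up for illustration; the counts are the ones in the tables above.

```python
# Counts from the unbiased table, before anyone steers around the interviewers.
unbiased = {
    "athletes": {"Able": 240, "Baker": 160},
    "bookworms": {"Able": 240, "Baker": 360},
}
# Baker voters lost to nonresponse in each group (52 in all, as assumed above).
dropped = {"athletes": 20, "bookworms": 32}

# Only Baker respondents go missing; Able counts are unchanged.
biased = {
    g: {"Able": row["Able"], "Baker": row["Baker"] - dropped[g]}
    for g, row in unbiased.items()
}

n = sum(sum(row.values()) for row in biased.values())
able_total = sum(row["Able"] for row in biased.values())
able_topline = 100 * able_total / n

print(f"respondents: {n}")
print(f"Able overall: {able_topline:.1f}%")  # a bit over 50%, so Able "leads"
```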

Now, the official returns indicate that Baker has won with 52%
of the total vote. To weight to the official returns, we
multiply each Able respondent by a number somewhat less than 1
(about 0.948), and each Baker respondent by a number somewhat
greater than 1 (about 1.053), so that the Able respondents are
48% of the weighted sample.

 (weighted)      Able             Baker          total 
athletes      227.5 (60.7%)     147.5 (39.3%)    375.0
bookworms     227.5 (39.7%)     345.5 (60.3%)    573.0
–––––––––––––––––––––––––––––––––––––––––––––––––––––-
TOTAL         455.0             493.0            948.0
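
To check the arithmetic, the weighting step can be sketched in a few lines of Python. This is only an illustration under the assumptions of this example (one weight per candidate, two groups); the helper name weight_to_official and the dictionary layout are hypothetical, not anything a pollster actually uses.

```python
# Biased sample counts: group -> candidate -> respondents.
biased = {
    "athletes": {"Able": 240, "Baker": 140},
    "bookworms": {"Able": 240, "Baker": 328},
}

def weight_to_official(sample, official_shares):
    """Scale each candidate's respondents so the weighted sample
    matches the official vote shares (hypothetical helper)."""
    n = sum(sum(row.values()) for row in sample.values())
    # Unweighted total for each candidate across all groups.
    totals = {c: sum(row[c] for row in sample.values()) for c in official_shares}
    # One weight per candidate: target count divided by observed count.
    weights = {c: official_shares[c] * n / totals[c] for c in official_shares}
    weighted = {
        g: {c: row[c] * weights[c] for c in official_shares}
        for g, row in sample.items()
    }
    return weights, weighted

weights, weighted = weight_to_official(biased, {"Able": 0.48, "Baker": 0.52})
print({c: round(w, 3) for c, w in weights.items()})
for group, row in weighted.items():
    total = sum(row.values())
    print(group, {c: round(100 * v / total, 1) for c, v in row.items()})
```

The weighted sample keeps the same overall size (948), so only the mix of respondents changes.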

Notice that the percentage estimates are now better (but not
perfect). There is one subtle side effect: athletes now make
up 39.6% of the weighted sample instead of 40%. (If
athletes-for-Baker and bookworms-for-Baker had been equally
likely to avoid being surveyed, then the weighting would
actually retrieve _all_ the original percentages.)
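
That parenthetical claim is easy to check. The sketch below uses a hypothetical variant of the biased sample in which exactly 10% of Baker voters drop out of each group (leaving 144 athletes and 324 bookworms for Baker), then weights to the official 48/52 split. The variable names are just for illustration.

```python
# Hypothetical variant: 10% of Baker voters drop out of BOTH groups
# (athletes 160 -> 144, bookworms 360 -> 324), so the bias is uniform.
uniform = {
    "athletes": {"Able": 240, "Baker": 144},
    "bookworms": {"Able": 240, "Baker": 324},
}
official = {"Able": 0.48, "Baker": 0.52}

n = sum(sum(row.values()) for row in uniform.values())
# One weight per candidate: target count divided by observed count.
weights = {
    c: official[c] * n / sum(row[c] for row in uniform.values())
    for c in official
}

able_pct = {}
group_share = {}
for group, row in uniform.items():
    w_able = row["Able"] * weights["Able"]
    w_baker = row["Baker"] * weights["Baker"]
    able_pct[group] = 100 * w_able / (w_able + w_baker)
    group_share[group] = 100 * (w_able + w_baker) / n
    print(f"{group}: Able {able_pct[group]:.1f}%, "
          f"share of sample {group_share[group]:.1f}%")
```

Under this assumption the weighted tab comes back to 60% and 40% for Able, with athletes at 40% of the sample, just as in the unbiased table.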

Now, what if for some reason the official returns indicate
that Baker won with _53_% of the total vote? (For instance,
what if votes for Able are subject to higher spoilage rates?)
The weights will get a bit more extreme, and we will end up
with something like this:

(misweighted)    Able             Baker          total 
athletes      222.8 (59.7%)     150.3 (40.3%)    373.1
bookworms     222.8 (38.8%)     352.1 (61.2%)    574.9
–––––––––––––––––––––––––––––––––––––––––––––––––––––-
TOTAL         445.6             502.4            948.0

These percentages are still more accurate than in the
unweighted biased tab. Loosely speaking, _if_ the official
results are more accurate than the survey results (as they
were in this case), and _if_ the bias in the poll isn't
_heavily_ concentrated in some demographic groups, weighting
will generally produce more accurate percentages.
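
To put rough numbers on "more accurate," here is a small Python sketch that compares the worst error in the Able percentages across the three tabs: unweighted, weighted to 52% Baker, and weighted to 53% Baker. It assumes the same made-up example as above; the function names are hypothetical, and "worst error" is just one convenient yardstick.

```python
# True Able percentages in each group, from the unbiased table.
TRUE_ABLE_PCT = {"athletes": 60.0, "bookworms": 40.0}

biased = {
    "athletes": {"Able": 240, "Baker": 140},
    "bookworms": {"Able": 240, "Baker": 328},
}

def able_pct_after_weighting(sample, able_share):
    """Able's row percentage per group after weighting to a given
    official Able share (hypothetical helper, two candidates only)."""
    n = sum(sum(row.values()) for row in sample.values())
    able_total = sum(row["Able"] for row in sample.values())
    w_able = able_share * n / able_total
    w_baker = (1 - able_share) * n / (n - able_total)
    return {
        g: 100 * row["Able"] * w_able
        / (row["Able"] * w_able + row["Baker"] * w_baker)
        for g, row in sample.items()
    }

def worst_error(pcts):
    """Largest absolute miss, in percentage points, across groups."""
    return max(abs(pcts[g] - TRUE_ABLE_PCT[g]) for g in pcts)

unweighted = {g: 100 * r["Able"] / sum(r.values()) for g, r in biased.items()}
errors = {
    "unweighted": worst_error(unweighted),
    "weighted to 52% Baker": worst_error(able_pct_after_weighting(biased, 0.48)),
    "weighted to 53% Baker": worst_error(able_pct_after_weighting(biased, 0.47)),
}
for label, err in errors.items():
    print(f"{label}: worst error {err:.2f} points")
```

In this example the correctly weighted tab misses by the least, the misweighted tab by somewhat more, and the unweighted tab by the most.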