Reply #75: Fuller response:

Febble Donating Member (1000+ posts) Tue Jul-19-05 04:13 AM
Response to Reply #71
75. Fuller response:
Edited on Tue Jul-19-05 04:19 AM by Febble
I cannot possibly comment on your optimizer unless I know how it works.

You are sure doing a lot of commenting, though. If you want to know how it works, just replicate the numbers, line by line. You know the formulas.

Do the math. And you will see how it works.

I do not know the formulas in your optimizer, but I have told you mine. And I do not know what your output alphas represent. Each precinct has a different alpha. If you want an aggregate value for a series of values you have to decide on the most appropriate aggregate. You have not yet said which of the possible options your aggregate represents.

The E-M report gives mean, median and absolute WPEs, so any model has to match those, but you say your optimizer is not a model. So I have no views. I do not know what it is trying to do. All I know is that the alphas that you output are not the alphas in the data, and that the alphas in the data do not vary significantly with increase in Bush's vote.

It sure is a model. I never said it wasn't. And for you to infer that it is not a model is disingenuous to say the least. It is OBVIOUSLY a model. Have you ever designed a model?

Of course. It is what I do. And if you are happy to call your optimizer a model I am more than happy to call it one too. It is.

You still do not know what the model is trying to do? That is a very interesting comment. I have given you the goals, assumptions and results of the model. And you still don't know what it is trying to do? Or are you in denial of what the model is coming up with?

No, I do not know. I know you are producing some measure of "alpha" for each category. I do not know which aggregate measure it is an estimate of. It matters. As do the confidence intervals. I am certainly not in denial. I want to know.

The alphas produced are very close to the alphas produced by Ron Baiman. What is YOUR estimate of the mean alpha for each partisan grouping?

I am afraid I cannot give you those. I can however, and have, told you directly how they are computed, from the actual data. I have also told you that there is no significant trend for arctan(alpha) or ln(alpha) to increase with Bush's vote share. You can disbelieve me if you choose.

I am talking about real numbers. Real numbers are exactly what I am talking about.

If you want the algebra, here it is: The alpha for a precinct is:

Kerry responses/Kerry votes divided by Bush responses/Bush votes. I call Kerry responses/Kerry votes "Kp" and Bush responses/Bush votes "Bp", i.e. Kp for Kerry participation rate and Bp for Bush participation rate. So alpha = Kp/Bp.
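
To make the definition concrete, here is a minimal Python sketch of the ratio. The function names and the counts are made up for illustration; they are not taken from the E-M data.

```python
def participation_rate(responses, votes):
    """Share of a candidate's voters who answered the exit poll."""
    if votes <= 0:
        raise ValueError("votes must be positive")
    return responses / votes


def precinct_alpha(kerry_responses, kerry_votes, bush_responses, bush_votes):
    """alpha = Kp / Bp for a single precinct."""
    kp = participation_rate(kerry_responses, kerry_votes)
    bp = participation_rate(bush_responses, bush_votes)
    if bp == 0:
        raise ValueError("alpha is undefined when no Bush voters responded")
    return kp / bp


# Hypothetical precinct: 56 of 100 Kerry voters and 50 of 100 Bush voters respond.
alpha = precinct_alpha(56, 100, 50, 100)
print(round(alpha, 2))
```

An alpha above 1 means Kerry voters took part in the poll at a higher rate than Bush voters in that precinct; an alpha of 1 means equal rates.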

If you want the mean alpha for a category you cannot simply take the mean of all the precinct alphas, because, as you say, it is a ratio, and therefore does not have a normal distribution. What you do is you take the arctangent of the ratios, take the mean of the arctans, then the tangent of the mean. Mean alpha = the tangent of the mean arctan(Kp/Bp).
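
A small Python sketch of that aggregate, using made-up ratios rather than real precinct values. The function name is an assumption for illustration. The example is chosen so that the difference from a plain arithmetic mean is easy to see: 0.5 and 2.0 are mirror images of each other as ratios, so a symmetric measure should land on 1.0.

```python
import math


def arctan_mean(ratios):
    """Tangent of the mean arctangent of a list of positive ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    mean_angle = sum(math.atan(r) for r in ratios) / len(ratios)
    return math.tan(mean_angle)


# Hypothetical precinct alphas: one 2-to-1 in Bush's favour,
# one even, one 2-to-1 in Kerry's favour.
alphas = [0.5, 1.0, 2.0]

plain_mean = sum(alphas) / len(alphas)   # pulled upward by the 2.0
transformed_mean = arctan_mean(alphas)   # treats 0.5 and 2.0 symmetrically

print(f"arithmetic mean: {plain_mean:.4f}")
print(f"arctan mean:     {transformed_mean:.4f}")
```

The arithmetic mean of these three ratios comes out above 1 even though the over- and under-representation cancel, which is the reason for averaging on the transformed scale.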

That statement shows a complete lack of understanding on your part. I don't take the mean of all the alphas. For the umpteenth time, the model derives precinct category alphas which satisfy the constraints. You are so caught up in your transforms, that you cannot grasp that basic fact.

Please be civil. I am not saying you take the mean of all the alphas (although it seems to be how you derived your grand weighted mean). What I am telling you is what I have done. What I am asking you is: what do you think your aggregate alpha will most closely match? The mean? The median? or the measure I suggest? Or some other measure of central tendency?

Febble, I challenge you to prove that the optimizer's category alphas are not feasible. And I challenge you to provide alternate alphas which satisfy the constraints.

They may be feasible given the constraints. All I am saying is that they do not match the means, medians or any of the other aggregates that I have computed from the data. They may match some other aggregate, but unless you suggest what, I cannot tell whether it matches.

Obviously you can't do this because you don't have the data. So you approximate to it. Fair enough, it is the best you can do with what you have got.

The model doesn't NEED the data. It has enough aggregate data to come up with feasible results. Now, Febble, how did the model determine that an aggregate alpha of 1.12 is incorrect and that it must be at least 1.15? How did the model come up with that result without all the data? How would you prove that 1.12 was too low? In fact, when did you ever show that result?

I agree, it is fairly easy to approximate, using my formula (and Ron's, and I assume, your own), the E-M aggregate data, and a guesstimate of the mean partisanship of each precinct. It is fairly clear from the E-M aggregate data that the mean alpha is probably a bit higher than 1.12. And in fact it is.

What you cannot do from the aggregate data is test the hypothesis that alpha trends significantly higher as Bush's share of the counted vote increases. That was an important question, worth testing, that could only be tested on the actual data. And it does not.

But I am talking about real data. Mitofsky has the real data, and he has given the answer for ln(alpha). I also have checked the answer using arctan. There is no significant slope between ln(alpha) or arctan(alpha) and Bush's share of the vote. So yes, I dispute that the means are significantly different between categories. They are different, but not significantly different. It is not an assumption. It is a computation. There is no slope.

Me: I am talking about real data.

Febble, prove that the alphas I present are not correct. Prove that they are infeasible. You can't. Show us your alphas. I will gladly insert them into the model. Then we will see if the constraints are satisfied.

Can't do that, I'm afraid. I am sure yours are feasible. They just don't happen to correspond to values given by any of the measures of central tendency I have proposed.

So whether the optimizer output makes sense or not makes no difference. In the real numbers there is no slope.

Me: There is no slope?

Oh, so alpha or ln alpha is flat across partisanship groups.
Really? From 1.0 for High Kerry to 1.50 for High Bush is flat?

As I've said, the relevant test is a Pearson correlation coefficient. It is not significant. So yes, the regression line is statistically flat. If you look at the WPE_plot you will see that is the case. You will also see why the category estimates are misleading.
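
For readers who want to see what such a test involves, here is a minimal Python sketch. The five data points are invented for illustration and are NOT the E-M precinct data; the variable names are assumptions. The critical value is the standard two-sided 5% table value of t for 3 degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for the null hypothesis r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up illustrative values, not real precincts.
bush_share = [0.1, 0.3, 0.5, 0.7, 0.9]
ln_alpha = [0.2, -0.1, 0.3, 0.0, 0.2]

r = pearson_r(bush_share, ln_alpha)
t = t_from_r(r, len(bush_share))

T_CRIT_DF3 = 3.182  # two-sided 5% critical value, 3 degrees of freedom
print(f"r = {r:.3f}, t = {t:.3f}, significant: {abs(t) > T_CRIT_DF3}")
```

In this toy set the correlation is slightly positive but nowhere near the critical value, which is what "statistically flat" means: a non-zero fitted slope that cannot be distinguished from zero.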

And they have absolutely nothing to do with the time line. They are simply the raw responses, completely unweighted, that went into the estimate. The weights were what produced the timeline changes. The raw data is the raw data. If there is bias in the raw data it is either bias in the poll or bias in the count.

The unweighted data we are working with show, unambiguously, that there was a massively significant red-shift of something around alpha=1.15 on average, with a huge amount of variance, as can be seen from the plot.

But alpha did not vary with partisanship. It varied with lots of other things, but not with that. Not at zero-order correlation level, anyway.

There is no significant slope.

Me: The optimizer has shown that the only partisan alphas which are feasible do in fact vary with partisanship - from 1.0 to 1.50.

There is no significant slope? A monotonic increase across partisanship groups from 1.0 Kerry to 1.18 to 1.50 Bush is not significant?

You cannot tell whether a slope is significant without a measure of the variance. It is why I suggest you produce confidence intervals, using the absolute WPEs, which are the only estimates of variance given in the E-M tables. A small slope can be significant if the variance is small. A large slope can be insignificant if the variance is large. The variance is large.
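
The point can be sketched in Python with two invented data sets that have exactly the same fitted slope but different scatter. Everything here (the x values, the noise pattern, the scale factors) is made up for illustration; the noise pattern is deliberately chosen to be uncorrelated with x so that the least-squares slope is the same in both cases.

```python
import math


def slope_and_t(xs, ys):
    """Least-squares slope of y on x and its t ratio (slope / standard error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


xs = [0.1, 0.3, 0.5, 0.7, 0.9]
noise_pattern = [1, -2, 2, -2, 1]  # sums to zero and is uncorrelated with xs


def make_ys(noise_scale):
    """Same underlying line (slope 0.2), scatter controlled by noise_scale."""
    return [0.1 + 0.2 * x + noise_scale * e for x, e in zip(xs, noise_pattern)]


quiet_slope, quiet_t = slope_and_t(xs, make_ys(0.005))
noisy_slope, noisy_t = slope_and_t(xs, make_ys(0.1))

T_CRIT_DF3 = 3.182  # two-sided 5% critical value, 3 degrees of freedom
print(f"low variance:  slope {quiet_slope:.3f}, significant: {abs(quiet_t) > T_CRIT_DF3}")
print(f"high variance: slope {noisy_slope:.3f}, significant: {abs(noisy_t) > T_CRIT_DF3}")
```

The slope is identical in both runs; only the scatter around the line changes, and that alone decides whether the slope clears the significance threshold.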

What slope are you looking at?

I am, of course, looking at the slope in the actual data. The one in the scatterplot. The one I have also, personally computed. The one between ln(alpha) and Bush's share of the vote. And you cannot test the significance of any slope without a measure of the variance.

Any apparent slope given by category aggregates, whether yours, medians, means, whatever, ignores the extra information, apparent in the actual data, but not in the category aggregates, that the low category alpha in the high Kerry category is being leveraged by a few extremely low alphas in precincts where Kerry had around 99% of the vote. There are some very high alphas in that category as well. There are also some very low alphas at the high end of the mod Rep category. The category boundaries are completely arbitrary. If the X axis was divided into four or six, rather than five, you would get very different answers. The question is, is there a significant monotonic trend? The answer is, no. It is a very good example of why aggregate values can be very misleading in statistics, why continuous variables should not be arbitrarily categorized, and why variance is critical.
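
A toy Python illustration of the category-boundary problem. The ten points below are invented so that the effect is easy to see; they are not the E-M data, and the function names are assumptions. The same points are summarized once with five equal-width categories and once with four.

```python
def category_means(xs, ys, n_bins):
    """Mean of y within n_bins equal-width categories of x on [0, 1]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        i = min(int(x * n_bins), n_bins - 1)  # x == 1.0 goes in the last bin
        sums[i] += y
        counts[i] += 1
    # Empty categories are reported as None (none occur in the example below).
    return [s / c if c else None for s, c in zip(sums, counts)]


def is_increasing(values):
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Made-up illustrative points: Bush share of the vote and ln(alpha).
bush_share = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
ln_alpha = [-0.6, 0.6, 0.3, -0.1, 0.5, -0.1, 0.0, 0.6, 0.4, 0.4]

five = category_means(bush_share, ln_alpha, 5)
four = category_means(bush_share, ln_alpha, 4)

print("five categories:", [round(m, 3) for m in five], "monotonic:", is_increasing(five))
print("four categories:", [round(m, 3) for m in four], "monotonic:", is_increasing(four))
```

With five categories these points produce a tidy rising staircase; with four, the same points do not. Nothing about the data changed, only where the cuts fall.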

So this is why I say: if you want to know what happened in the exit polls, look at the data. And the data tells you that the slope you would expect from widespread, randomly distributed fraud is not there, even though it may appear to be there from estimates of the aggregate alphas. So a viable alternative is that fraud was greatest at the Kerry end.

But if you want to disbelieve the data, fair enough. You have reason enough to, I suppose. You cannot check it. I can only say, I am honest, I've run the regression myself, and the line is statistically flat. Not only that, but if you exclude precincts where Kerry had more than 95% of the vote (there are no equivalently extreme Bush precincts) the line is not only statistically flat but actually tilts the other way. So even the insignificant slope is actually being leveraged not by anything happening in the higher Bush precincts but by something happening in the highest Kerry precincts.
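
Here is a Python sketch of what leverage by a few extreme precincts looks like. The six points are invented, and the sign flip is built into them on purpose; this shows the mechanism only and says nothing about the size of the effect in the real data. The cutoff assumes a two-party share, so "Kerry above 95%" is written as a Bush share below 0.05.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up illustrative points: two extreme Kerry precincts with very low
# ln(alpha), and four ordinary precincts with a gently falling ln(alpha).
bush_share = [0.01, 0.02, 0.2, 0.4, 0.6, 0.8]
ln_alpha = [-0.9, -0.7, 0.3, 0.25, 0.2, 0.15]

slope_all = ols_slope(bush_share, ln_alpha)

# Drop precincts where Kerry had more than 95% (Bush share below 0.05).
kept = [(x, y) for x, y in zip(bush_share, ln_alpha) if x >= 0.05]
slope_trimmed = ols_slope([x for x, _ in kept], [y for _, y in kept])

print(f"slope, all precincts:       {slope_all:+.3f}")
print(f"slope, extreme Kerry removed: {slope_trimmed:+.3f}")
```

Two points at the far end of the x axis are enough to turn a falling line into a rising one, which is why the fitted slope has to be checked with and without them.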

Or not happening. Which might be interesting.

You and other DUers are welcome to PM me if you want to continue.


Edited for typos. Sorry if there are some left.