So I started reading Nate Silver's book and got jealous. I wanted to make my own election model. So I did.
Here are the details: I took all the polls from January in all fifty states and smoothed them using a piecewise cubic interpolation. I then performed a principal components analysis on these data to find states that tended to be correlated (to reduce the number of independent variables). I found there were ~7 distinct groupings of states. The states within each of these 7 groups tended to exhibit similar statistical properties.
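For anyone curious what the smoothing step might look like, here is a minimal sketch in plain Python. It assumes a cubic Hermite interpolant with finite-difference slopes, which is only one of several piecewise cubic schemes and may not be the one I actually used. The function name and the poll numbers are made up for illustration, and the principal components step is not shown.

```python
from bisect import bisect_right


def cubic_hermite_interp(xs, ys, x):
    """Piecewise cubic Hermite interpolation at x.

    xs must be strictly increasing with at least two points.
    Slopes at the knots are estimated by finite differences (an assumption).
    Outside the data range the nearest end value is returned.
    """
    n = len(xs)

    def slope(i):
        # One-sided difference at the ends, centered difference inside.
        if i == 0:
            return (ys[1] - ys[0]) / (xs[1] - xs[0])
        if i == n - 1:
            return (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
        return (ys[i + 1] - ys[i - 1]) / (xs[i + 1] - xs[i - 1])

    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]

    i = bisect_right(xs, x) - 1      # interval [xs[i], xs[i+1]] containing x
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h              # position within the interval, 0..1

    # Standard cubic Hermite basis functions.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * ys[i] + h10 * h * slope(i)
            + h01 * ys[i + 1] + h11 * h * slope(i + 1))


# Made-up example: poll dates (days since the first poll) and spreads in points.
days = [0, 3, 7, 14]
spreads = [1.0, 2.5, 2.0, 4.0]

# Resample the irregular polls onto a daily grid.
daily = [cubic_hermite_interp(days, spreads, d) for d in range(0, 15)]
```

Putting every state on the same daily grid is what makes it possible to compare states against each other afterwards.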
Using the last four polls from each state, I estimated the mean spread between Obama and Romney for each state, as well as the standard deviation of these spreads.
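A rough sketch of that per-state calculation, assuming each poll is stored as an (Obama %, Romney %) pair in date order and that the sample standard deviation is the one wanted. The function name and the numbers are hypothetical.

```python
from statistics import mean, stdev


def spread_stats(polls, n_recent=4):
    """Mean and sample standard deviation of the Obama - Romney spread
    over the most recent n_recent polls (polls are in date order)."""
    recent = polls[-n_recent:]
    spreads = [obama - romney for obama, romney in recent]
    return mean(spreads), stdev(spreads)


# Made-up polls for one state, oldest first: (Obama %, Romney %).
example_polls = [(48, 46), (49, 45), (50, 46), (47, 47), (49, 46)]
mu, sd = spread_stats(example_polls)
```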
I drove each of the 7 groups with its own normally distributed noise process, scaling the noise for each state by that state's standard deviation and adding in that state's mean spread. To be conservative, I scaled the standard deviation by a factor of two (to increase our uncertainty in each state).
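Roughly, one way to set that up is sketched below. It assumes a single standard normal draw per group per run, shared by every state in that group, and counts a state for Obama when its simulated spread is positive. The state names, groups, spreads and electoral vote counts are toy values, not real inputs.

```python
import random


def simulate_once(states, rng, sd_scale=2.0):
    """One simulated election; returns Obama's electoral vote total.

    states maps name -> (group, mean_spread, sd, electoral_votes).
    """
    groups = {group for group, _, _, _ in states.values()}
    # One shared noise draw per group (assumption about how groups are driven).
    noise = {g: rng.gauss(0.0, 1.0) for g in sorted(groups)}

    total = 0
    for group, mean_spread, sd, votes in states.values():
        spread = mean_spread + sd_scale * sd * noise[group]
        if spread > 0:               # positive spread means Obama carries the state
            total += votes
    return total


def run_model(states, n_runs=1000, sd_scale=2.0, seed=0):
    """Run the simulation n_runs times and return the list of totals."""
    rng = random.Random(seed)
    return [simulate_once(states, rng, sd_scale) for _ in range(n_runs)]


# Toy inputs for illustration only.
states = {
    "State A": ("g1", 4.0, 2.0, 20),
    "State B": ("g1", 1.0, 3.0, 15),
    "State C": ("g2", -3.0, 2.5, 10),
}
totals = run_model(states)
```

Setting sd_scale=1.0 uses the actual standard deviations instead of the conservative doubled ones.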
Running the model 1000 times I found the following results:
84% win percentage for Obama.
An average value of 313.6 electoral votes.
A maximum likelihood of 332 electoral votes for Obama (although this likelihood was ~ 15%, so there are clearly many other fairly likely winning combinations for Obama).
If I use the actual standard deviation values, rather than the conservative scaled values, the win percentage goes to 98%.
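Numbers like these can be pulled out of the list of simulated totals with something like the sketch below. It assumes 270 electoral votes are needed to win and takes the "maximum likelihood" outcome to be the single most common total across runs. The function name and the sample totals are made up.

```python
from collections import Counter


def summarize(ev_totals, votes_to_win=270):
    """Return (win percentage, average total, most common total, its frequency)."""
    n = len(ev_totals)
    win_pct = 100.0 * sum(ev >= votes_to_win for ev in ev_totals) / n
    average = sum(ev_totals) / n
    mode, count = Counter(ev_totals).most_common(1)[0]
    return win_pct, average, mode, count / n


# Made-up totals from five runs, just to show the call.
win_pct, average, mode, mode_freq = summarize([332, 332, 303, 260, 290])
```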
It's too late in the season to build a statistical model of the election, but I am interested in doing so. If anyone wants to put together a DU model for use in the next electoral cycle(s) please inbox me. It could be really fun.
But I did write a simple script to calculate the probability of a win based on the last week of polls for arbitrary states. I assumed normal statistics and simply calculated 1 minus the probability that the Obama - Romney spread was less than zero. I call that the probability we are really ahead as of this moment.
Simple really. Anyhow, here are the probabilities that we are currently leading in several swing states:
Interestingly, our probability of being ahead in Ohio is higher than anywhere else. This is in part a consequence of the high polling density that Ohio has experienced.
on edit: sorry, I forgot that "normal statistics" isn't self-explanatory to non-statisticians. Basically I assume that the error in a measurement is distributed according to a bell curve. So mean values are likely to be "close" to the center of the distribution. Large excursions from this value are assumed to fall off, in their probability of occurrence, as the bell curve falls off. Using this curve, it's possible to calculate the probability that the "true mean difference" is actually on the other side of zero (that Romney is really ahead).
Compare the value for Florida from a month ago, in the week after the first debate, to see how far we've come. Back then there was only a 26% chance that we were in the lead.
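For the curious, the calculation can be sketched in a few lines of Python (3.8 or later for NormalDist). This is an illustration, not my exact script: it assumes the bell curve is centered on the mean spread and, by default, uses the standard error of the mean as its width, so states with more polls get a narrower curve. The function name, the flag and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def prob_ahead(spreads, use_standard_error=True):
    """Probability that the true Obama - Romney spread is above zero,
    assuming normally distributed error.

    Needs at least two polls whose spreads are not all identical.
    """
    mu = mean(spreads)
    sd = stdev(spreads)
    if use_standard_error:
        # Width of the curve for the mean, not for a single poll (assumption).
        sd = sd / sqrt(len(spreads))
    # 1 minus the probability that the spread is below zero.
    return 1.0 - NormalDist(mu, sd).cdf(0.0)


# Made-up spreads (Obama minus Romney, in points) from one week of polls.
p = prob_ahead([2.0, 4.0])
```

Passing use_standard_error=False uses the poll-to-poll standard deviation instead, which gives a wider curve and a less confident answer.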