Tuesday, August 12, 2008

It's a Tight Race

I've developed a method to estimate the most likely current state-by-state two-party percentage distribution in the upcoming presidential election. The method takes advantage of the fact that the two-party distribution of votes in any given state is correlated with the distribution of votes in a number of other states. This basic fact means that polling data in, say, California, may be reasonably used to estimate the current distribution of votes in states as geographically disparate as Washington, Vermont, New Mexico and Illinois. This approach makes a relatively large number of polling samples available for each state's estimate and, as long as the polls themselves are relatively unbiased within and across states, the resulting estimates should give a good picture of the actual state of the election in each state.

I have used data from the 1964 to 2004 presidential elections to calculate correlations between each pair of states. I have chosen the 1964 election as the starting point for the data because, arguably, this election represents the first "modern-era" election. That is, the pattern of vote distribution across states established in that election is remarkably similar to the pattern of vote distribution across states in most elections since then. I have used a correlation cutoff of 0.8 to determine whether the polling data of a state would be used to estimate the percentage distribution of votes in the state under consideration. I have used the RealClearPolitics website as my source for state-level polling data. I have used data from polls conducted over the previous 30 days and for each state have simply averaged the polling results across all polls taken in that state (that is, I assume that each poll lacks systematic bias that would, uncorrected, render the model invalid). Since these averages in nearly all cases add to less than 100%, I then adjusted the average figures to derive each major party candidate's share of 100% of the two-party vote in a given state. These percentage shares then became the base percentage shares for that state in the model. Note that where no polling data was available for a given state in the past 30 days, no prior polling data from the state was used (even if available) to estimate the two-party percentage shares in other states. Because recent polling data is available for most states, the lack of recent polling data in some states did not have a major impact on estimating each candidate's most likely share of the two-party vote in any state (in other words, the model is reasonably robust if polling data from some states are not available).
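The two calculations described above, rescaling poll averages to two-party shares and screening states with the 0.8 correlation cutoff, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the use of the Pearson correlation, the "at or above 0.8" reading of the cutoff, and the sample numbers are all hypothetical and are not taken from the actual model.

```python
from math import sqrt

def two_party_share(avg_a, avg_b):
    """Rescale two poll averages so that together they make up 100%."""
    total = avg_a + avg_b
    return 100.0 * avg_a / total, 100.0 * avg_b / total

def pearson(xs, ys):
    """Pearson correlation between two equal-length series (assumed measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def usable_states(target, history, cutoff=0.8):
    """States whose past two-party shares correlate with the target's
    at or above the cutoff (the >= comparison is an assumption)."""
    return [
        state
        for state in history
        if state != target and pearson(history[target], history[state]) >= cutoff
    ]

if __name__ == "__main__":
    # Made-up poll averages that add to 90%, not 100%.
    print(two_party_share(46.0, 44.0))

    # Made-up historical shares for three placeholder states.
    history = {
        "State A": [40.0, 45.0, 50.0, 55.0],
        "State B": [42.0, 46.0, 52.0, 56.0],
        "State C": [55.0, 50.0, 52.0, 41.0],
    }
    print(usable_states("State A", history))
```

With real inputs, `history` would hold one share per election from 1964 to 2004 for every state, and the poll averages would come from the previous 30 days of polls.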

The percentage shares for each state were calculated in a three-step process:

1. Regression models were estimated for each state against all other states. In the case of California, for example, regression models were calculated between California and each other state (California-Alabama; California-Alaska; etc.). These regression models would allow a poll in, say, Alaska, to be used to estimate the two-party distribution of votes in California.
2. The polling data was used to estimate the two-party distribution of votes in each state. Since for each state the results of polls in a number of other states were used, each of these results was in effect a separate sample. The average across all samples was then taken as the preliminary two-party distribution of votes in a given state.
3. The preliminary two-party distribution of votes in each state generated in step 2 above was then used to replace the initial polling data. The model was rerun for each state and the resulting figures were then taken as the final estimation for each state.
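
The three steps above can be sketched as below. This is a rough illustration rather than the model itself: it assumes simple one-variable least-squares regressions with an intercept, assumes that only other states' figures feed a given state's estimate, and uses hypothetical function names (`fit_line`, `estimate_pass`, `three_step_estimate`) and a hypothetical `partners` mapping listing which states passed the correlation cutoff for each target.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def estimate_pass(history, inputs, partners):
    """Predict each target state's share from every partner state that has
    an input value, then average those predictions (one sample per partner)."""
    estimates = {}
    for target, sources in partners.items():
        samples = []
        for source in sources:
            if source in inputs:
                intercept, slope = fit_line(history[source], history[target])
                samples.append(intercept + slope * inputs[source])
        if samples:  # states with no usable partner input get no estimate
            estimates[target] = sum(samples) / len(samples)
    return estimates

def three_step_estimate(history, polls, partners):
    """Steps 1 and 2 use the poll-based shares; step 3 reruns the same
    pass with the preliminary estimates in place of the polls."""
    preliminary = estimate_pass(history, polls, partners)
    return estimate_pass(history, preliminary, partners)

if __name__ == "__main__":
    # Made-up historical shares and poll-based shares for two placeholder states.
    history = {"State A": [40.0, 45.0, 50.0, 55.0],
               "State B": [42.0, 47.0, 52.0, 57.0]}
    partners = {"State A": ["State B"], "State B": ["State A"]}
    polls = {"State A": 50.0, "State B": 52.0}
    print(three_step_estimate(history, polls, partners))
```

Refitting each regression inside the loop keeps the sketch short; with all fifty states it would be more sensible to fit each pair once and reuse the coefficients in both passes.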

Note that I did make further adjustments in a total of five states. In the home states of the two candidates (Arizona for McCain and Illinois for Obama) I added five percentage points to the home state candidate to reflect their likely "home field" advantage. I also added five percentage points to McCain in the Appalachian states of West Virginia, Kentucky and Tennessee to reflect a likely resistance to Obama's candidacy from voters that might otherwise vote for a Democratic candidate in those states.

The maps below show the model's results as of August 12, 2008. The red states are states where McCain is presently estimated to be ahead and the percentages in those states (in black) are McCain's estimated percentage of the votes in those states. The blue states are states where Obama is presently estimated to be ahead and the percentages in those states (in yellow) are Obama's estimated percentage of the votes in those states. A handful of the smaller northeastern states are not listed in the map and are instead shown in the northeastern closeup map below:

The map below gives a closeup for the smaller northeastern states:

For readers who enjoy detail, the table below shows the estimated two-party popular vote in each state, the percentage shares and the allocated electoral votes (please click on the table for a closeup view):

In the all-important electoral college, Barack Obama leads John McCain by a slender 295 to 243 vote margin. This electoral college vote figure is nearly the same as the estimated figure given on the 538.com website as of today. My model also suggests that Obama has a 2.2 percentage point lead in the popular vote, just fractionally over the 2.0 percentage point Obama lead noted on the 538.com website. My own work in setting up this model suggests that Obama's lead has been sublimating away slowly over the past several weeks.

Obama cannot lose much more ground before John McCain moves into the lead. The media continue to treat Barack Obama as the presumptive president-to-be but the data seem to suggest that this presumption is very much misplaced. Obama's lead in Ohio appears to be only about 1.4 percentage points (or about 80,000 votes) while his lead in Missouri appears to be only about 0.8 percentage points (or about 23,000 votes). If both states flipped, their combined 31 electoral votes would give McCain 274 electoral votes and the lead. On the other hand, McCain now appears to have a 2 percentage point cushion in Colorado and a more than 3 percentage point margin in Nevada.
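
The electoral arithmetic in the paragraph above can be checked with a few lines of Python. This is just a sketch: the function names are hypothetical, the 20/11 split of the 31 electoral votes is the 2008 apportionment for Ohio and Missouri, and the "implied" vote totals are simply backed out of the margins quoted above rather than taken from the model's table.

```python
def flip_states(leader_ev, trailer_ev, flipped):
    """Move the electoral votes of the flipped states from leader to trailer."""
    moved = sum(flipped.values())
    return leader_ev - moved, trailer_ev + moved

def implied_two_party_votes(margin_points, margin_votes):
    """Two-party vote total implied by a lead given in points and in votes."""
    return margin_votes / (margin_points / 100.0)

if __name__ == "__main__":
    obama_ev, mccain_ev = flip_states(295, 243, {"Ohio": 20, "Missouri": 11})
    print(obama_ev, mccain_ev)  # 264 274

    # Rough two-party totals implied by the quoted margins.
    print(round(implied_two_party_votes(1.4, 80_000)))   # Ohio
    print(round(implied_two_party_votes(0.8, 23_000)))   # Missouri
```

Since 270 electoral votes are needed to win, losing those two states alone would be enough to put McCain over the line.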
