Strategic Vision has released a July poll that includes the Washington state gubernatorial contest. The poll shows Gov. Christine Gregoire (D) leading challenger Dino Rossi (GOP-Party) 47% to 45%, with 8% “undecided.” The poll of 800 people was taken from July 25th to July 27th and has a margin of error of 3.5%.

This is the fourth July poll in this race. Here are the results from the four polls:

Poll | Start | End | # Polled | MOE (%) | % Gregoire | % Rossi |
---|---|---|---|---|---|---|
Strategic Vision | 25-Jul | 27-Jul | 800 | 3.5 | 47.0 | 45.0 |
SurveyUSA | 13-Jul | 15-Jul | 666 | 3.9 | 49.0 | 46.0 |
Moore Information | 09-Jul | 10-Jul | 400 | 5.0 | 45.0 | 45.0 |
Rasmussen | 09-Jul | 09-Jul | 500 | 4.5 | 49.0 | 43.0 |

Rossi last led in this race thirteen polls ago, back in late February.

I’ll do two Monte Carlo analyses. The first uses the numbers in the new Strategic Vision poll to estimate the probability that Gregoire (and Rossi) would win an election held right now. I simulated a million gubernatorial elections of 800 voters each, where each voter had a 47% chance of voting for Gregoire, a 45% chance of voting for Rossi and an 8% chance of voting for neither.
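The simulation described above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code used for the post: the function name `simulate_elections`, the use of NumPy, and the seed are all illustrative choices. Ties are counted separately and left out of the win percentages, which appears to be how the reported figures were computed (the two win counts in Result 1 sum to less than a million).

```python
import numpy as np

def simulate_elections(n_voters, p_gregoire, p_rossi, n_sims=1_000_000, seed=0):
    """Simulate n_sims elections of n_voters each.

    Returns (gregoire_wins, rossi_wins, ties). Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    p_neither = 1.0 - p_gregoire - p_rossi
    # Each row is one simulated election: vote counts for Gregoire, Rossi, neither.
    counts = rng.multinomial(n_voters, [p_gregoire, p_rossi, p_neither], size=n_sims)
    g, r = counts[:, 0], counts[:, 1]
    return int((g > r).sum()), int((r > g).sum()), int((g == r).sum())

# Fewer simulations than the post's million, to keep the example quick.
g_wins, r_wins, ties = simulate_elections(800, 0.47, 0.45, n_sims=100_000)
decided = g_wins + r_wins  # ties are excluded from the percentages
print(f"Gregoire: {g_wins / decided:.1%}, Rossi: {r_wins / decided:.1%}, ties: {ties}")
```

Exact counts will differ from run to run and from the figures in the post, since they depend on the random seed and the number of simulations.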

**Result 1**: Gregoire won 716,473 of the simulated elections and Rossi won 271,349 times. This suggests that, in an election now, Gregoire would have a 72.5% probability of winning and Rossi would have a 27.5% probability of winning. A statistician would point out that Gregoire’s lead in this poll is within the margin of error (i.e. her probability of winning is less than 95%).

Here is a plot showing the distribution of votes in the million elections (blue bars are wins for Gregoire and red bars are Rossi wins):

The second analysis combines all four polls in the table to give a July score for this race.

The combined polls yield a pool of 1127 (47.6%) votes for Gregoire, 1061 (44.9%) votes for Rossi, and 177 (7.5%) who voted for neither. Again, I simulate 1,000,000 elections.
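The pooled counts appear to come from converting each poll's percentages back into respondent counts and summing them. That reading is an assumption, though it reproduces the counts above. A short sketch, with the poll figures copied from the table and the helper name `pool` chosen for illustration:

```python
# (pollster, respondents, % Gregoire, % Rossi), copied from the table in the post
polls = [
    ("Strategic Vision", 800, 47.0, 45.0),
    ("SurveyUSA", 666, 49.0, 46.0),
    ("Moore Information", 400, 45.0, 45.0),
    ("Rasmussen", 500, 49.0, 43.0),
]

def pool(polls):
    """Turn each poll's percentages into respondent counts and sum across polls."""
    gregoire = sum(n * g / 100 for _, n, g, _ in polls)
    rossi = sum(n * r / 100 for _, n, _, r in polls)
    total = sum(n for _, n, _, _ in polls)
    neither = total - gregoire - rossi
    return gregoire, rossi, neither, total

g, r, neither, total = pool(polls)
print(round(g), round(r), round(neither), total)  # 1127 1061 177 2366
```

The rounded counts sum to 2365 rather than 2366 because each category is rounded separately.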

**Result 2**: Gregoire won 919,335 of the simulated elections and Rossi won 77,493 times. The results suggest that, if a July election were held, Gregoire would have won with a 92.2% probability, and Rossi would have won with a 7.8% probability.

Here is a plot showing the distribution of votes in the million elections for the combined polls:

Strategic Vision also polled for the presidential election in Washington state. Sen. Barack Obama (D) leads Sen. John McCain (R) by a +11% margin, 48% to 37%. Obama’s lead is well outside the margin of error for this poll.

dutch spews:

Looked at the Strategic Vision poll. Interesting numbers, some to be expected, others quite surprising.

Bush has low approval ratings in WA State…to be expected…Congress has even lower approval ratings….hmmm.

Job approval of Chris(tine)…48 vs 45…hmmm toss up within the margin of error…not good for the sitting Gov.

But what about: Is WA moving in the right direction? 26 yes, 65% !!!! no. Oh boy…if that number will impact the 8 percent who are currently un-decided (and heaven forbid…will actually vote)..Chris is in trouble. You guys can’t blame it all on Bush :-)

Darryl spews:

Rick D.

“I find it amusing that Darryl is calling this race in July.”

What the fuck are you babbling about? If you actually read the post, you would find no such “call.”

Sheesh! It’s WingDings like you who must be driving down reading scores….

Darryl spews:

Rick D @ 15,

‘2 mock “results” within the margin of error, but (you’re right), no “call”.’

Still dumber than dirt, eh? The lack of a “call” has nothing to do with margin of error, dipshit. I know analogies are very difficult for people with your particular cognitive impairment to understand, but what I did was akin to announcing a half-time score at a football game.

“So why bother posting the thread?”

Because I wanted to.

“As dutch pointed out, “garbage in, garbage out”.”

Sure…just like the half-time score at a football game is garbage in and garbage out.

“Perhaps just more red meat for Goldy’s circus animals here at HA?”

What the fuck? A statistical analysis is “red meat”?????? What a fucking moron.

Darryl spews:

Dutch,

‘Oh yes, “statistically” there is little to say about the calcultions, but your assumption leave much open.’

And…um, exactly what assumptions are those?

“a) 47 to 45 percent for a sitting governor is not a good result. Based on the success Chris(tine) is claiming…she should be well over 50 percent.”

The statistical analysis is, essentially, blind to such things. And I certainly did not inject any “good” or “bad” into the analysis.

“The 2 percent is within the margin of error so it could easily be 46 to 46 or 45 to 47.”

In fact, I provide that entire distribution of all outcomes based on the new poll (see the top figure). And also the entire distribution of outcomes based on pooling the four July polls (second figure). What’s your point?

“But the biggest flaw in your assumption is that you claim the 8 percent undecided are voting for neither.”

No…I made no such claim or “assumptions.” The 8 percent was empirically observed by Strategic Vision. It is a fact that 8 percent of the 800 people polled chose neither Gregoire nor Rossi.

“How about those 8 percent (or a large percentile of it) will decide last moment to vote for either candidate….majority go to Rossi or Gregoire…that will change everything.”

You are mistaken. If you read the post carefully, you will realize that the post does not predict the outcome of the November election. Rather, I am using the evidence made available via a poll to provide evidence for every possible outcome in a hypothetical election held now. The eight percent who were undecided provide no statistical evidence either way. I could MAKE an assumption on how they might break, but I didn’t. In fact I intentionally avoid making such assumptions because they are not driven by the data and the underlying Bernoulli process.

Darryl spews:

Rick D,

“what statistical analysis, Darryl?”

Don’t know much about statistics, eh Rickster? Or are you just incapable of reading with comprehension? Perhaps you can get your mommy to read it to you.

“you gave us gobbledygoop crap percentages within the statitistical margins of error,”

You don’t have a clue, do ya, Squirt?

“and I’m the moron?”

It would appear so!

proud leftist spews:

Rabbit @ 21

Did I actually read in a post of a day or two ago that you voted for Bob Dole in 1996? How could that have happened? Now, I will say I wasn’t a huge Bill Clinton fan; he was and is considerably to my right. Nonetheless, Bob Dole? I can proudly say I have never voted for a Republican ever, at any level, not even local sewer district. I have voted for third party candidates and written names on the ballot when I don’t like whomever the Democrat is. But, vote Republican? Man, I don’t want to go to Hell.

sarge spews:

At this point, it’s a toss-up. Not much more to it than that.

Darryl spews:

sarge @ 27,

“At this point, it’s a toss-up. Not much more to it than that.”

Not quite. The evidence suggests Gregoire is at a 72.5% probability of winning and Rossi a 27.5%.

If you believe pooling polls over a month is useful, then it’s 92.2% probability for Gregoire and 7.8% for Rossi.

scotto spews:

The Monte Carlo analysis would be correct if you knew with 100% certainty that the voting probability split was truly 47/45/8 (or whatever the numbers were that were measured by the poll).

But those probabilities are themselves only estimates, and there is some probability of them deviating from the measured numbers — this is related to the “margin of error” that pollsters always give.

In order to take that uncertainty into account, you have to do a Bayesian integration over the voting distribution parameter estimates (47/45/8), assuming some probability distribution on their true value, given the estimates generated by the poll.
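A minimal sketch of what such an integration might look like, under assumptions not stated in the comment: a flat Dirichlet(1, 1, 1) prior, the 47/45/8 split converted to counts of 376/360/64 out of 800, and a hypothetical helper name `posterior_win_probability`. It draws plausible "true" vote shares from the posterior and reports how often Gregoire's share exceeds Rossi's.

```python
import numpy as np

def posterior_win_probability(counts, n_draws=200_000, seed=0):
    """Estimate P(true Gregoire share > true Rossi share) given poll counts.

    counts: observed respondents for (Gregoire, Rossi, neither).
    """
    rng = np.random.default_rng(seed)
    # Flat prior: add 1 to each observed count (an assumption for this sketch).
    alpha = np.asarray(counts, dtype=float) + 1.0
    # Each row is one plausible set of true population shares.
    shares = rng.dirichlet(alpha, size=n_draws)
    return float((shares[:, 0] > shares[:, 1]).mean())

prob = posterior_win_probability([376, 360, 64])
print(f"Posterior probability Gregoire leads: {prob:.1%}")
```

With a flat prior and a sample of 800, the result should land close to the 72.5% reported in the post, so for this poll the two approaches would be expected to differ very little.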

Darryl spews:

scotto,

“The Monte Carlo analysis would be correct if you knew with 100% certainty that the voting probability split was truly 47/45/8 (or whatever the numbers were that were measured by the poll).”

Nope…the MC analysis is correct under the assumptions of a standard Bernoulli process. The distribution of outcomes from the MC analysis reflects the sampling error.

“But those probabilities are themselves only estimates”

The probability is an estimate based on observed responses sampled from a much larger population.

“and there is some probability of them deviating from the measured numbers”

That is called “sampling error” and it is the entire reason for doing the MC analysis (as an alternative to relying on asymptotic properties derived from a Bernoulli process).

“this is related to the “margin of error” that pollsters always give.”

Right, the MOE is simply one particular cut-off in the sampling distribution, but pollsters inflate the MOE by assuming the true value of *p* is 0.5, rather than the value computed as *p* = *n*/*N*.

“In order to take that uncertainty into account, you have to do a Bayesian integration over the voting distribution parameter estimates (47/45/8), assuming some probability distribution on their true value, given the estimates generated by the poll.”

Your comment would be appropriate under a modified Bernoulli process where there is heterogeneity in *p*. But under the Bernoulli process, the maximum likelihood estimator for *p* is simply *n*/*N* and the MC analysis yields an unbiased estimate of the sampling distribution.

What you are suggesting (integration over a prior distribution) is appropriate if you believe that *p* itself has some distribution among individuals.

Richard Pope spews:

The race is still too damned close. Only a couple points lead for Gregoire, when voters in this state generally tend to prefer Democrats over Republicans at the statewide level by about 10 points. There is probably nowhere near enough G.O.P. resurgence to change more than a few seats in the legislature, but the Governor’s race is mysteriously looking competitive.

scotto spews:

Suppose I randomly polled 3 voters. There is some chance that all three will say they will vote for Rossi.

Unless I am missing something, the next step in your methodology would be to do 1 million MC runs with p(Rossi)=1 and then you would conclude that there is a 100% chance that Rossi will win.

You’ve got to consider the original sampling error of the polls.

Daddy Love spews:

In other good news (for John McCain) Obama still leads in Florida, Ohio, Pennsylvania. Quinnipiac University Swing State Poll:

FLORIDA: Obama 46 – McCain 44;

OHIO: Obama 46 – McCain 44;

PENNSYLVANIA: Obama 49 – McCain 42

Good news for McCain, right? Gee, I’ll bet Obama’s running scared.

rhp6033 spews:

I think Dino’s hit his high point. He got lots of help from the Seattle Times, which has been publishing his campaign talking points on the front pages of the Sunday editions as if they were the result of some “investigation” by their reporters. But they’ve shot their wad, there’s nothing left to grab on to, and Gregoire’s still in the lead, and she’s just beginning her campaign push. An 8% undecided isn’t very good news for a challenger who’s still behind in the polls, there aren’t that many people who haven’t already made up their minds.

Assuming (for the sake of example) that 8% of the undecideds represented 8 voters, then for Rossi to win he has to get six of them to vote for him, and only two to vote for Gregoire. That isn’t going to happen.

But as I’ve said before, I think current polling methods are under-representing Democratic voters this year. Young people are flocking to Obama, and they all have cell phones, not land lines, so they don’t get polling calls.

One of the things the Democrats need to do is make sure these new voters are properly registered, that they aren’t DE-registered by Republican dirty tricks, and that they are properly educated about the local races.

dutch spews:

Darryl, as long as you claim that the 8 percent do not vote, your assumptions are wrong. As the poll indicated it’s a poll of likely voters, so counting them out scews the data. But yes, if you do this…your analysis will lead to the result you got.

But for now…I would agree with Sarge…result is too close to have a meaningful analysis.

Darryl spews:

scotto,

“Suppose I randomly polled 3 voters. There is some chance that all three will say they will vote for Rossi.

Unless I am missing something, the next step in your methodology would be to do 1 million MC runs with p(Rossi)=1 and then you would conclude that there is a 100% chance that Rossi will win.”

Isn’t this just a degenerate case? I mean, if we fall back to the traditional method and compute a standard error using sqrt(p(1-p)/N), we end up with the same problem, don’t we?
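To spell out the degenerate case with the usual formula (writing $\hat{p}$ for the observed share, notation not used in the comment itself): with three respondents all choosing Rossi, $\hat{p} = 3/3 = 1$ and

$$
\mathrm{SE} = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{N}} = \sqrt{\frac{1 \cdot 0}{3}} = 0,
$$

so the textbook standard error also reports no uncertainty at all for that sample.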

“You’ve got to consider the original sampling error of the polls.”

The MC analysis is being used to estimate the sampling error of the poll.

Darryl spews:

Rick D @ 36

‘It appears you’ve chosen to throw out a bunch of statistics and arrived at your own “confusions”.’

Really? And just what “statistics” have I “thrown out,” there, Squirt?

Richard Pope spews:

Darryl @ 41

A reverse Monte Carlo analysis might be more appropriate.

Darryl spews:

dutch,

“Darryl, as long as you claim that the 8 percent do not vote, your assumptions are wrong.”

Again…this is not an “assumption,” the 8% who chose not to select Rossi and Gregoire are *observed* in the sample, and therefore contribute no information (in the statistical sense) to the Gregoire–Rossi outcome.

“As the poll indicated it’s a poll of likely voters, so counting them out scews the data.”

Screws the data????? The data are the closest thing to reality here. Perhaps you meant “screws the analysis?”

“But yes, if you do this…your analysis will lead to the result you got.”

If I “do” what? You seem to be complaining that I *didn’t do* something. Specifically, you seem to believe that I should make some assumptions about those 8%. What assumptions would you make? That they all break for Rossi? That they all break for Gregoire? Something in between? How would I select the “best” strategy? How could one actually justify layering on assumptions that are not supported by actual data?

A couple of years ago I read a doctoral dissertation where the author used MC analyses to predict electoral college outcomes (using data from 2000 and 2004). The author tried a number of strategies for dealing with the “other” category in polls. His conclusion was that it doesn’t make any difference. (I wish I could point you to it, but I cannot find it now.)

“But for now…I would agree with Sarge…result is too close to have a meaningful analysis.”

Suit yourself.

scotto spews:

Darryl @41,

The problem is that the methodology you’re using can’t tell the difference between polls with large and small margins of error; for a given measured p, the MC runs will get the same answer for a poll of 10 people and a poll of 10 million people. But clearly, you can be more certain about the outcome when you’ve polled 10 million.

Maybe the escape hatch is some assumption that makes MC analysis always legit when the measured p is accurate to within some vanishingly small margin. If so, is there a proof we could read that explains this? And also if so, how does that margin compare to the 3.5-5% margins in the polls used here?
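Whether the simulation is blind to poll size can be checked directly. In the post, each simulated election has as many voters as the poll had respondents (800), so the sketch below holds the 47/45/8 split fixed and varies only that number. It is an illustration under that reading of the method; the function name `gregoire_win_share` and the sample sizes are arbitrary choices.

```python
import numpy as np

def gregoire_win_share(n_voters, n_sims=50_000, seed=0):
    """Share of non-tied simulated elections won by Gregoire for a 47/45/8 split."""
    rng = np.random.default_rng(seed)
    # Each row: simulated vote counts for Gregoire, Rossi, neither.
    counts = rng.multinomial(n_voters, [0.47, 0.45, 0.08], size=n_sims)
    g_wins = (counts[:, 0] > counts[:, 1]).sum()
    r_wins = (counts[:, 1] > counts[:, 0]).sum()
    return float(g_wins / (g_wins + r_wins))

# Same measured split, different numbers of simulated voters.
shares = {n: gregoire_win_share(n) for n in (10, 800, 10_000)}
for n, share in shares.items():
    print(f"{n:>6} voters per simulated election: Gregoire wins {share:.1%}")
```

Under this reading, the win share rises with the number of simulated voters, so a 10-person poll and a 10,000-person poll with the same split would not give the same answer. What the simulation does not do is treat the 47/45/8 split itself as uncertain, which is the separate point about parameter uncertainty raised earlier in the thread.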