Follow-up: Poll aggregation and more on bogus pollsters

I was hoping for a bunch of last-minute polls to be released today, but nada. Perhaps they were all released last night.

I’m a bit unhappy with the way this election cycle is ending. Since I started aggregating state presidential polls in 2007, one of my objectives has been to see how accurate a simple binomial model can be at predicting the presidential election, using only polls as the data. The model did very well in the 2008 and 2012 cycles.

It is also a more difficult business in 2016. Obviously, the demise of the norm that there be a landline in nearly every home is an emerging problem. Many pollsters supplement landline polls with a cell phone sub-sample. This is expensive, because these calls must be done by people. Interactive voice or digit response polls aren’t permitted for cell phones.

A more difficult problem for my analyses is that there has been a proliferation of internet-based polls. You’ve probably seen them: YouGov, Insights West, Google Analytics, Ipsos/Reuters, SurveyMonkey, to name a few. Most of these polls work by surveying a very large opt-in panel and then mathematically adjusting the results to match a demographic profile. So, if not enough 50-60 year old White women respond, the few who do respond are, essentially, cloned.

I’m sure this technology can work very well with a huge panel and careful background work to perfect the methods. It may be that some internet pollsters, say YouGov, have done so. But for every YouGov, there are probably a dozen internet polls that haven’t, and they frequently have bizarre results. I don’t want to be in the business of making subjective judgments about whose polls are “good enough” to accept. Therefore, I simply exclude polls that survey internet panels.

Another problem is that internet polls generally don’t have well-defined sample sizes. My analyses need to know the “effective” size of each poll to work. “Effective” means the equivalent number of individuals who, if live-polled, would generate the same statistical information as the internet poll. This should, in principle, be easy to estimate using simulation methods like bootstrap resampling, but I am not aware of direct analytical solutions (which would, at a minimum, have to use the distribution of weights). In any case, “effective” sample sizes don’t seem to have made it into internet polls yet.
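For what it’s worth, one widely used approximation that depends only on the weights is Kish’s effective sample size: the square of the summed weights divided by the sum of the squared weights. Here is a minimal sketch in Python, assuming respondent-level weights are available (pollsters rarely publish them). The function name and the example panel are made up for illustration, and the approximation ignores any relationship between the weights and the answers people give, so treat it as a rough guide rather than a full solution.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    weights = list(weights)
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty sequence of positive numbers")
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


# Hypothetical panel (not real data): 900 respondents at weight 1.0 and
# 100 under-represented respondents weighted up ("cloned") to 5.0 each.
example_weights = [1.0] * 900 + [5.0] * 100
n_eff = kish_effective_sample_size(example_weights)
print(f"nominal n = {len(example_weights)}, effective n = {n_eff:.1f}")
```

With equal weights the effective size equals the nominal size; the more uneven the weights, the further the effective size falls below the number of respondents.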

Unfortunately, a number of pollsters switched—sometimes right in the middle of the election season—to using an internet panel. Pollsters that come to mind include Rasmussen, Gravis, and Public Policy Polling. These three pollsters make up a substantial fraction of the total state polls done in any season. I suppose I should accept an internet panel as an approximation of the cell phone sub-sample in a poll, but I didn’t this year. Rules and all that….

Finally, there are the bogus pollsters. Regular readers may recall my fisking of a bogus Pennsylvania poll earlier this year. That was easy…they were amateurs. But a bogus poll produced with a bit more competence could be difficult to identify in real time.

If you haven’t read Goldy’s takedown of Remington Research and, to a lesser extent, Trafalgar Group, read it now. This development has me down, because I am now in the predicament of ignoring suspicious polls, or playing “poll cop,” something I detest.

I frequently laugh at (and mock) right-wingers and their beefs with individual polls or pollsters (think “Unskewed Polls” from 2012). The charges usually amount to nothing but hating the fact that their candidate is losing. Get a grip!

But now, that is what I seem to be doing: calling out a pollster (or two) for bogus polls. There is one difference: my motivation is to be as accurate as possible. Bogus polls fuck with me, whether they come up showing the Democrat up or the Republican up. If lefties launched an effort to dump bogus pro-Democrat polls on the market in order to fuck up the aggregated results, I would call them out as well, because their bogus polls would fuck with the accuracy of my polling analyses.

As an aside, remember Birther Queen Orly Taitz? She wrote a blog post a couple of days ago calling upon both Remington Research and Trafalgar Group to swoop into New Hampshire to stop the polling trend there. Here is part of her screed:

However, Democratic Party came up with an absolutely bogus poll from WMUR/UNH, showing Trump behind by 11%, which lowered his average from winning to being 0.6% behind and his total electoral votes from winning 270 to 266. This is a clear psy-op. You can see that all the polls showed Trump leading from 5% to Clinton leading by 1%. This poll showing Clinton ahead by 11% in NH is a complete farce. By the way, Republican senator, Kelly Ayotte, is leading by the same margin and the same WMUR/UNH came with a similar bogus poll to show her far behind.
Remington and Trafalgar pollsters need to post a true poll for New Hampshire ASAP. This poll will move Trump into the lead in NH and his total electoral votes will go up from 266 to winning 270!
Readers of Taitz report are asked to call Remington and Trafalgar pollsters and urge them to publish asap a true poll for New Hampshire

Huh…

So, now I am in the position of having TWO sets of results. One set is based on my stated rules and assumptions, set out over a year ago. The other excludes Remington Research and Trafalgar Group, whose polls I strongly suspect are bogus. Fuck, I hate that!

If these two pollsters are legit (full write-up): Clinton wins with 84.5% probability with, on average, 282 electoral votes. The most likely outcome is 276 electoral votes, with a 9.03% probability.
If they are bogus, as I suspect (full write-up): Clinton wins with 98.7% probability with, on average, 310 electoral votes. The most likely outcome is 305 electoral votes, with a 5.09% probability.
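For readers wondering how summary numbers of this kind (win probability, average electoral votes, most likely single outcome) can come out of a simulation, here is a toy sketch in Python. It is not necessarily how the figures above were produced: it assumes each state already has a win probability and that states are independent, and the three “states” below are invented placeholders, not real polling results. The function names are hypothetical too.

```python
import random
from collections import Counter


def simulate_elections(states, n_sims=10000, seed=0):
    """Simulate elections; each state is (win_probability, electoral_votes)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _ in range(n_sims):
        electoral_votes = 0
        for p_win, votes in states:
            # The candidate carries the state with probability p_win.
            if rng.random() < p_win:
                electoral_votes += votes
        totals.append(electoral_votes)
    return totals


def summarize(totals, needed=270):
    """Return win probability, mean votes, modal outcome, and its frequency."""
    n = len(totals)
    win_prob = sum(t >= needed for t in totals) / n
    mean_votes = sum(totals) / n
    mode_votes, mode_count = Counter(totals).most_common(1)[0]
    return win_prob, mean_votes, mode_votes, mode_count / n


# Invented placeholder "states" (not real data): a safe block, a toss-up,
# and a block the candidate cannot win.
toy_states = [(1.0, 260), (0.5, 20), (0.0, 258)]
totals = simulate_elections(toy_states)
win_prob, mean_votes, mode_votes, mode_share = summarize(totals)
print(f"P(win) = {win_prob:.3f}, mean = {mean_votes:.1f}, "
      f"mode = {mode_votes} ({mode_share:.1%} of simulations)")
```

In this toy setup everything hinges on the one toss-up, so the win probability lands near one half. With fifty-odd real contests the distribution of totals spreads out, which is why the single most likely outcome can have a probability of only a few percent.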

With any luck, we’ll know which version corresponds most closely to reality very soon. Finally, here are the relevant graphics, side by side.

Comments

1

Mark Adams spews:

Tuesday, 11/8/16 at 10:46 pm

The poll that matters is actually almost done. Looks like there has been a problem with the polls this time around, as there has been all this election.

Maybe people are lying to the pollsters. Maybe there are faulty assumptions. Maybe the right voters are not being polled. Or it could just be those pesky third candidates. Not big numbers for those pesky guys, but enough to impact a close election where neither candidate is getting over 50% of the vote in most states, even those they won.

2

Chris Stefan spews:

Wednesday, 11/9/16 at 12:35 pm

As I didn’t follow the individual polls much, how many were likely voter vs. registered voter polls? Did any of the pollsters give insight as to the turnout models they were using?
