Where Does Polling Data Come From?

"Hillary Clinton holds a 12-point lead over Bernie Sanders nationally, but in a hypothetical match-up against Donald Trump, Sanders does much better than the current Democratic front-runner."

That was the lead paragraph from an NBC News article about the 2016 presidential election. And yes, that's what the polling data said. 

But - there's almost always a but - did you see where the data comes from? At the bottom of the article, they disclose that the poll was conducted online among adults who say they're registered to vote, and who take surveys on the SurveyMonkey platform.

Do you see the concerns? 

  • The poll was conducted online, so anyone who doesn't go online could not have been part of the sample.
  • It relied on self-reported data: adults who said they're registered to vote.
  • The only people surveyed were those who take surveys on SurveyMonkey's platform.

To their credit, according to the full methodology, data was "weighted for age, race, sex, education, region, and voter registration status" using Census Bureau and other data "to reflect the demographic composition of the United States." But that doesn't erase the fact that, as they state, "the sample is based on those who initially self-selected for participation," which in this case means "no estimates of sampling error can be calculated."
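To make "weighted after the fact" concrete, here is a minimal sketch of one common approach, post-stratification on a single variable. It is not the pollster's actual procedure, which weighted on several variables at once. Every number below (the age groups, their shares, and the support levels) is invented purely for illustration, and the function names are hypothetical.

```python
# Minimal post-stratification sketch. All figures are MADE UP for
# illustration; none of them come from the NBC News / SurveyMonkey poll.

# Share of respondents in each age group (hypothetical sample).
sample_share = {"18-34": 0.40, "35-64": 0.45, "65+": 0.15}

# Share of each age group in the target population (hypothetical benchmark).
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Fraction of each group's respondents backing some candidate (hypothetical).
support = {"18-34": 0.60, "35-64": 0.45, "65+": 0.35}


def poststratification_weights(sample_share, population_share):
    """Weight each group so its weighted share matches the population."""
    return {g: population_share[g] / sample_share[g] for g in sample_share}


def weighted_estimate(sample_share, weights, support):
    """Overall support after applying the per-group weights."""
    total_weight = sum(sample_share[g] * weights[g] for g in sample_share)
    weighted_sum = sum(
        sample_share[g] * weights[g] * support[g] for g in sample_share
    )
    return weighted_sum / total_weight


weights = poststratification_weights(sample_share, population_share)
raw = sum(sample_share[g] * support[g] for g in sample_share)
adjusted = weighted_estimate(sample_share, weights, support)

print(f"Unweighted support: {raw:.1%}")
print(f"Weighted support:   {adjusted:.1%}")
```

Notice what the weights can and cannot do. They fix the mix of groups, but they assume the people who opted in within each group answer like everyone else in that group. That assumption is exactly what self-selection puts in doubt.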

Are people who take online surveys - often for a chance to win a prize - representative of the general population? Even if they're weighted after the fact, we'd want to take a closer look at the data. Maybe it's perfectly valid. But, as I explain in my book, these are exactly the types of questions you should be asking - as an educated consumer of data - when you look at polls in the news.

Trump and Bush - Tied?

A New York Times article says that Donald Trump is "in a statistical tie" with Jeb Bush. Trump has 17%. Bush has 14%. The difference is within the margin of error for the Suffolk University / USA Today poll.

What's interesting to us is how the data has been cherry-picked by the media. Three different headlines, all based on the same survey, described Trump's standing in three different ways.

Is Trump "on top," "tied" or "on the rise"? Perhaps it depends on your perspective. (It also depends on statistical significance: was the sample size large enough, and how was the margin of error calculated?)
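For readers who want to see the arithmetic behind a "statistical tie," here is a rough sketch using the standard textbook formulas for a simple random sample. The 17% and 14% figures are from the poll as reported; the sample size is not given here, so the value of n below is an assumption chosen only for illustration, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(p, n, z=Z_95):
    """Margin of error for one proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def margin_of_error_of_lead(p1, p2, n, z=Z_95):
    """Margin of error for the gap p1 - p2 when both shares come from the
    same sample (so the two estimates are negatively correlated)."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)


n = 500  # ASSUMPTION: illustrative sample size, not taken from the poll
trump, bush = 0.17, 0.14  # shares reported in the article

lead = trump - bush
lead_moe = margin_of_error_of_lead(trump, bush, n)
is_statistical_tie = abs(lead) <= lead_moe

print(f"Lead: {lead:.1%}, margin of error on the lead: {lead_moe:.1%}")
print("Statistical tie" if is_statistical_tie else "Lead exceeds margin of error")
```

Under that assumed sample size, a 3-point gap falls inside the margin of error on the lead, which is what "statistical tie" means. Note that the margin of error on the gap between two candidates is larger than the margin of error usually quoted for a single percentage, so a lead can look bigger than it really is.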

Oh, and another example of cherry-picking - 30% of Republicans are undecided - roughly the same share as named Trump and Bush combined (31%).

But that doesn't make the headlines.