"Hillary Clinton holds a 12-point lead over Bernie Sanders nationally, but in a hypothetical match-up against Donald Trump, Sanders does much better than the current Democratic front-runner."
That was the lead paragraph from an NBC News article about the 2016 presidential election. And yes, that's what the polling data said.
But - there's almost always a but - did you see where the data came from? At the bottom of the article, NBC News discloses that the poll was conducted online among adults who say they're registered to vote, and who take surveys on the SurveyMonkey platform.
Do you see the concerns?
- The poll was conducted online - which means that anyone who doesn't go online couldn't have been part of the sample
- It relied on self-reported data - that is, adults who claimed that they're registered to vote
- The only people surveyed were those who use SurveyMonkey's platform
To their credit, according to the full methodology, data was "weighted for age, race, sex, education, region, and voter registration status" using Census Bureau and other data "to reflect the demographic composition of the United States." But that doesn't erase the fact that, as they state, "the sample is based on those who initially self-selected for participation," which in this case means "no estimates of sampling error can be calculated."
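To make "weighted" concrete, here is a minimal sketch of one common approach, post-stratification on a single variable. It is not the pollster's actual procedure, which weights on several variables at once. The age groups, population shares, and responses below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical population shares by age group (stand-ins for Census-style targets).
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Hypothetical respondents: (age group, supports the candidate?).
# Younger people are deliberately over-represented, as they might be online.
sample = (
    [("18-34", True)] * 4 + [("18-34", False)] * 1
    + [("35-64", True)] * 1 + [("35-64", False)] * 2
    + [("65+", False)] * 2
)

def poststratification_weights(sample, population_share):
    """Weight for each group = population share / share of the sample."""
    counts = Counter(group for group, _ in sample)
    total = len(sample)
    return {g: population_share[g] / (counts[g] / total) for g in counts}

def weighted_support(sample, weights):
    """Weighted fraction of respondents who support the candidate."""
    numerator = sum(weights[g] for g, supports in sample if supports)
    denominator = sum(weights[g] for g, _ in sample)
    return numerator / denominator

raw = sum(supports for _, supports in sample) / len(sample)
weights = poststratification_weights(sample, population_share)
weighted = weighted_support(sample, weights)

print(f"raw support:      {raw:.3f}")
print(f"weighted support: {weighted:.3f}")
```

In this made-up sample, weighting pulls the estimate down because the over-represented group is also the most supportive one. What weighting cannot do is tell you whether the online survey-takers within each group think like everyone else in that group.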
Are people who take online surveys - often for a chance to win a prize - representative of the general population? Even if they're weighted after the fact, we'd want to take a closer look at the data. Maybe it's perfectly valid. But, as I explain in my book, these are exactly the types of questions you should be asking - as an educated consumer of data - when you look at polls in the news.