Not very reassuring: pollsters are still struggling to model the electorate, and 2020 was somehow even worse than 2016. We'll have to hope they've found the variables they were missing before, the ones behind Trump's overperformance relative to poll predictions. This should make any Democrat very apprehensive.
The Asterisk on Kamala Harris’s Poll Numbers
Pollsters think they’ve learned from their mistakes in 2020. Of course, they thought that last time too.
www.theatlantic.com
One month since she entered the presidential race, Kamala Harris has a small but clear lead over Donald Trump, if the polls are to be trusted. But after the past two presidential elections, trusting the polls might feel like a very strange thing to do.
The 2016 election lives in popular memory as perhaps the most infamous polling miss of all time, but 2020 was quietly even worse. The polls four years ago badly underestimated Trump’s support even as they correctly forecast a Joe Biden win. A comprehensive postmortem by the American Association for Public Opinion Research concluded that 2020 polls were the least accurate in decades, overstating Biden’s advantage by an average of 3.9 percentage points nationally and 4.3 percentage points at the state level over the final two weeks of the election. (In 2016, by contrast, national polling predicted Hillary Clinton’s popular-vote margin quite accurately.) According to The New York Times, Biden led by 10 points in Wisconsin but won it by less than 1 point; he led Michigan by 8 and won by 3; he led in Pennsylvania by 5 and won by about 1. As of this writing, Harris is up in all three states, but by less than Biden was. A 2020-size error would mean that she’s actually down—and poised to lose the Electoral College.
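The size of those misses is simple arithmetic: a state's polling error is the final polling margin minus the actual margin. A minimal sketch using the figures cited above (Wisconsin's "less than 1 point" margin approximated here as 0.6):

```python
# State-level polling error in 2020: Biden's final poll lead minus his
# actual winning margin, in percentage points. Figures are those cited
# in the text; Wisconsin's "less than 1 point" is approximated as 0.6.
states = {
    "Wisconsin":    {"poll_lead": 10, "actual_margin": 0.6},
    "Michigan":     {"poll_lead": 8,  "actual_margin": 3},
    "Pennsylvania": {"poll_lead": 5,  "actual_margin": 1},
}

for name, s in states.items():
    error = s["poll_lead"] - s["actual_margin"]
    print(f"{name}: polls overstated Biden's margin by {error:.1f} points")
```

Errors of this size subtracted from a lead of only a few points is exactly why a modest Harris advantage could evaporate into an Electoral College loss.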
The pollsters know they messed up in 2020. They are cautiously optimistic that they’ve learned from their mistakes. Of course, they thought that last time too.
How did the polls get worse from 2016 to 2020, with everyone watching? In the aftermath of Trump’s surprise 2016 victory, the public-opinion-research industry concluded that the problem was educational polarization. If pollsters had made a point of including enough white people without college degrees in their samples, they wouldn’t have underestimated Trump so badly. During the 2020 cycle, they focused on correcting that mistake.
It didn’t work. Even though polls in 2020 included more white non-college-educated voters, they turned out to be disproportionately the white non-college-educated voters who preferred Biden. The new consensus is that Republican voters are less likely to respond to polls in the first place, even controlling for education level. (To put it more nerdily, partisan preference correlates independently with willingness to take a poll, at least when Trump is on the ballot.) Don Levy, the director of the Siena College Research Institute, which conducts polls on behalf of The New York Times, calls the phenomenon “anti-establishment response bias.” The more someone distrusts mainstream institutions, including the media and pollsters, the more likely they are to vote for Trump.
Read: The polling crisis is a catastrophe for American democracy
Levy told me that, in 2020, the people working the phones for Siena frequently reported incidents of being yelled at by mistrustful Trump supporters. “In plain English, it was not uncommon for someone to say, ‘I’m voting for Trump—**** you,’” and then hang up before completing the rest of the survey, he said. (So much for the “shy Trump voter” hypothesis.) In 2020, those responses weren’t counted. This time around, they are. Levy told me that including these “partials” in 2020 would have erased nearly half of Siena’s error rate.
That still leaves the other half. Another complication is that most pollsters have given up on live calls in favor of online or text-based polls, meaning they have no angry partials to include. And so pollsters are trying variations of the same technique: getting more likely-Trump voters into their data sets. If a lower percentage of Republican-leaning voters respond to polls, then maybe you just need to reach out to a larger number.
This might sound obvious, but it entails an uncomfortable shift for the industry. Public pollsters have traditionally stuck to the politically neutral categories found in the census when assembling or weighting their samples: age, gender, race, and so on. The theory was that if you built your sample correctly along demographic lines—if you called the right number of white people and Latinos, evangelicals and atheists, men and women—then an accurate picture of the nation’s partisan balance would naturally emerge.
“In 2016, the feeling was that the problem we had was not capturing non-college-educated white voters, particularly in the Midwest,” Chris Jackson, the head of U.S. public polling at Ipsos, told me. “But what 2020 told us is that’s not actually sufficient. There is some kind of political-behavior dimension that wasn’t captured in that education-by-race crosstab. So, essentially, what the industry writ large has done is, we’ve started really looking much more strongly at political variables.”
Pollsters were once loath to include such variables, because modeling the partisan makeup of the electorate is an inexact science—if it weren’t, we wouldn’t need polls in the first place. But after its failure in 2020, the industry has little choice. “There’s no avoiding coming up with a hypothesis as to the composition of the electorate,” Matt Knee, who runs polls and analytics for Republican campaigns, told me. “Choosing to throw up your hands on the most important predictor of how someone’s going to vote, and saying ‘That’s not a valid thing to include in my hypothesis,’ just doesn’t make sense.”
Read: Return of the people machine
Some pollsters are leaning on state-level voter files to get the right balance of Democrats and Republicans into their samples. Another approach is to use “recalled vote”: asking people whom they voted for in 2020 and making sure that the mix of respondents matches up with the actual results. (If a state went 60 percent for Trump, say, but only 50 percent of the respondents say they voted for Trump, the pollster would either call more Trump-2020 voters or weight their responses more heavily after the fact.)
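The recalled-vote adjustment described in that parenthetical amounts to post-stratification: each respondent group is weighted by the ratio of the known 2020 vote share to that group's share of the sample. A minimal sketch using the hypothetical 60/50 numbers from the example above:

```python
# Recalled-vote weighting, sketched as simple post-stratification.
# Hypothetical state from the example: Trump took 60% of the 2020 vote,
# but only 50% of poll respondents say they voted for him.
target = {"trump_2020": 0.60, "other_2020": 0.40}   # known 2020 result
sample = {"trump_2020": 0.50, "other_2020": 0.50}   # recalled vote in sample

# Weight each group so the weighted sample matches the known result:
# weight = target share / sample share.
weights = {group: target[group] / sample[group] for group in target}

# Trump-2020 recallers now count 1.2x; everyone else counts 0.8x.
print(weights)
```

In practice, pollsters weight on many variables at once rather than on recalled vote alone, and, as the next paragraph notes, misremembered or misreported votes can make this correction backfire.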
Each technique has its limitations. Party registration doesn’t match up perfectly with voting preferences. Some states, including Michigan and Wisconsin, don’t even have party registration, meaning pollsters have to rely on modeled partisanship based on factors such as age, gender, and religion. Recalled vote might be even shakier: Quite a lot of people misremember or lie about their voting history. Many say they voted when they in fact did not, and some people who voted for the loser will claim that they voted for the winner. Levy told me that when Siena experimented with using recalled vote in 2022, it made some results less accurate.
Still, pollsters see signs of hope. “People who told us they voted for Trump in 2020 are responding at the same rates as people who told us they voted for Biden in 2020,” said Jackson, from Ipsos, which “suggests we’re not having a really strong systemic bias.” The New York Times poll master Nate Cohn made a similar observation in a recent interview with The New Yorker: Democrats were much likelier to respond to Times polls in 2020, but this year, “it’s fairly even—so I’m cautiously optimistic that this means that we don’t have a deep, hidden non-response bias.” Another difference between 2020 and now: There is no pandemic. Some experts believe that Democratic voters were more likely to answer surveys in 2020 because they were more likely than Republicans to be at home with little else to do.
What’s clear at this point is that the election is close, and Harris is in a stronger position than Biden was. Natalie Jackson, a Democratic pollster at GQR Research, told me that if Harris’s numbers were just a result of energized Democrats being in the mood to answer polls, then Democrats would be seeing a comparable bump in generic congressional polls. The fact that they aren’t suggests that the change is real. “Trump’s numbers haven’t moved,” Jackson said. “This is all shifting from third party or undecided to Democrat.”
Like Olympic athletes, political pollsters spend four years fine-tuning their craft, but don’t find out whether their preparations were adequate until it’s too late to do anything differently. The nonresponse bias that bedeviled the polls in 2020 is not an easy thing to fix. By definition, pollsters know very little about the people who don’t talk to them. If Trump outperforms the polls once again, it will be because even after all these years, something about his supporters remains a mystery.