Since the Secretary of State’s office started releasing final numbers this week, it has become clear that R-71 is headed for the ballot. Short of some scandalous revelation (you know, like finding out that the numbers being released are not the final numbers), standard statistical inference says the measure should make the ballot.
(I kid the SoS with that “scandalous revelation” quip. In fact, they have done a remarkable job turning last week’s data disaster around. The data are now provided in excruciating detail and they have carefully described the meanings behind the numbers, both on the official release page and on their blog. David Ammons has been kept busy answering questions in both blog posts and the comment threads. And now Elections Director Nick Handy has a nifty R-71 FAQ.)
Back to the projections. One point that has repeatedly come up in the comment threads is that the signatures sampled so far may not reflect a random sample of all signatures. Thus, the statistical inference may be wrong.
The point is valid because the statistical methods do assume that the sampled signatures approximate a random sample. One can imagine scenarios where the observed error rate would change systematically with time. For example, if petition sheets were checked in chronological order of collection, the duplication rate might increase in later sheets if early signers forgot they had already signed and signed again, or if the pace and sloppiness of collection increased toward the end.
For R-71, we don’t know that the petition sheets are being examined in anything approaching a chronological order. The SoS FAQ states:
Signature petitions are randomly bound in volumes of 15 petition sheets per volume.
Rather than speculate on the systematic error, let’s examine some real data. The SoS office releases data that give the numbers of signatures checked and errors for each bound volume in the approximate chronological order of signature verification. As of yesterday, there were 209 completed volumes covering 35% of the total petition.
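To make the idea concrete, here is a minimal Python sketch of how per-volume counts could be turned into a trend of error rates and a rough projection. Everything in it is an assumption for illustration: the function names, the three volumes of made-up counts, and the total of 6,000 signatures are hypothetical and are not the SoS data. The projection is a naive linear scale-up, which ignores the fact that duplicates become more likely as more signatures are checked, so it is not the method behind the projections in these posts.

```python
from typing import List, Tuple

# Each volume is (signatures_checked, signatures_rejected), listed in the
# approximate order of verification. These counts are hypothetical.
Volume = Tuple[int, int]


def error_rate_trend(volumes: List[Volume]) -> List[Tuple[float, float]]:
    """Return (per-volume rate, cumulative rate) for each volume, in order."""
    trend = []
    total_checked = 0
    total_rejected = 0
    for checked, rejected in volumes:
        total_checked += checked
        total_rejected += rejected
        trend.append((rejected / checked, total_rejected / total_checked))
    return trend


def naive_projection(volumes: List[Volume], total_submitted: int) -> float:
    """Scale the overall acceptance rate up to the whole petition.

    Assumes the checked volumes behave like a random sample and that the
    rejection rate stays constant, which understates duplicates.
    """
    checked = sum(c for c, _ in volumes)
    rejected = sum(r for _, r in volumes)
    return total_submitted * (1 - rejected / checked)


if __name__ == "__main__":
    sample = [(200, 20), (180, 18), (220, 33)]  # hypothetical volumes
    for i, (vol_rate, cum_rate) in enumerate(error_rate_trend(sample), start=1):
        print(f"volume {i}: rate={vol_rate:.3f} cumulative={cum_rate:.3f}")
    print(f"projected valid: {naive_projection(sample, 6000):.0f}")
```

If the per-volume rate drifts steadily up or down while the cumulative rate lags behind it, that would be a hint that the checked volumes are not behaving like a random sample.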
After the fold, I give a brief section on analytical details, and then show graphs of the trends over time in error rates and projected numbers of valid signatures. But first, I give an update on today’s data release.