Since the Secretary of State’s office started releasing final numbers this week, it has become clear that R-71 is headed for the ballot. Short of some scandalous revelation—you know, like finding out that the numbers being released are not the final numbers—the measure should make the ballot using standard statistical inference.
(I kid the SoS with that “scandalous revelation” quip. In fact, they have done a remarkable job turning last week’s data disaster around. The data are now provided in excruciating detail and they have carefully described the meanings behind the numbers, both on the official release page and on their blog. David Ammons has been kept busy answering questions in both blog posts and the comment threads. And now Elections Director Nick Handy has a nifty R-71 FAQ.)
Back to the projections. One point that has repeatedly come up in the comment threads is that the signatures sampled so far may not reflect a random sample of all signatures. Thus, the statistical inference may be wrong.
The point is valid because the statistical methods do assume that the sampled signatures approximate a random sample. One can imagine scenarios where the error rate uncovered would change systematically with time. For example, if petition sheets were checked in chronological order of collection, the duplication rate might increase if early signers forgot they already signed, or if the pace and sloppiness of collection increased toward the end.
For R-71, we don’t know that the petition sheets are being examined in anything approaching a chronological order. The SoS FAQ states:
Signature petitions are randomly bound in volumes of 15 petition sheets per volume.
Rather than speculate on the systematic error, let’s examine some real data. The SoS office releases data that give the numbers of signatures checked and errors for each bound volume in the approximate chronological order of signature verification. As of yesterday, there were 209 completed volumes covering 35% of the total petition.
After the fold, I give a brief section on analytical details, and then show graphs of the trends over time in error rates and projected numbers of valid signatures. But first, I give an update on today’s data release.
The third batch of R-71 (new format) data was released this afternoon. The total number of signatures examined is now 50,493 (about 36.7% of the total). There have been 5,375 invalid signatures found, for a cumulative rejection rate of 10.65%. The invalid signatures include 4,692 that are not found in the voting rolls, 263 duplicates, and 420 that did not match the signature on file. There are also 19 signatures at various stages of processing for a missing signature card.
The 263 duplicates suggest a duplication rate of about 1.73% for the full petition.
Using the V2 estimator, the number of valid signatures is expected to be 121,798, leaving a surplus of 1,221 signatures over the 120,577 needed to qualify for the ballot. The rejection rate for the whole petition should be about 11.54%.
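For readers who want to see the scaling logic at work, here is a back-of-the-envelope sketch. It is not the actual V2 code (see the methodology link below for that), and the total petition size of 137,689 signatures is my own assumption, inferred from 50,493 checked being about 36.7% of the total. The key idea: errors caught one signature at a time (not on the rolls, signature mismatch) project linearly with the fraction checked, while duplicates project roughly quadratically, because both copies of a duplicated signature must be examined before the second one is flagged.

```python
# Back-of-the-envelope projection in the spirit of the V2 estimator.
# N_TOTAL is an assumption: 50,493 checked is ~36.7% of the petition.
N_TOTAL, N_CHECKED = 137_689, 50_493
f = N_CHECKED / N_TOTAL          # fraction of the petition checked so far

not_found  = 4_692 / f           # scales linearly with the checked fraction
mismatch   =   420 / f           # likewise
duplicates =   263 / f ** 2      # both copies must be checked, hence f**2

invalid = not_found + mismatch + duplicates
valid   = N_TOTAL - invalid
print(round(valid), f"{invalid / N_TOTAL:.2%}")
```

Under these assumptions the arithmetic lands within a handful of signatures of the projections quoted above, which suggests the quadratic treatment of duplicates is doing real work.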
A Monte Carlo simulation analysis gives a 95% confidence interval for valid signatures from 121,175 to 122,415. Here is the distribution of valid signatures:
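A simplified version of such a simulation can be sketched in a few lines. This is an illustration of the percentile method, not the post's actual code: it uses a normal approximation for the binomial draws, assumes the same 137,689-signature petition total as above, and models less detail than the real simulation, so its interval comes out somewhat narrower than the one quoted.

```python
# Minimal Monte Carlo sketch: resample each error count at the current
# sample size, project to the full petition, and read off percentiles.
import math
import random

random.seed(2)

N_TOTAL, N_CHECKED = 137_689, 50_493     # petition total is an assumption
ERRORS = {"not_found": 4_692, "mismatch": 420, "duplicate": 263}

def draw_binomial(n, p):
    """Normal approximation to a Binomial(n, p) draw (fine for large n)."""
    return random.gauss(n * p, math.sqrt(n * p * (1 - p)))

def simulate_once():
    f = N_CHECKED / N_TOTAL
    invalid = 0.0
    for kind, count in ERRORS.items():
        sim = draw_binomial(N_CHECKED, count / N_CHECKED)
        # Duplicates scale with the square of the checked fraction;
        # the other error types scale linearly.
        invalid += sim / f ** 2 if kind == "duplicate" else sim / f
    return N_TOTAL - invalid

sims = sorted(simulate_once() for _ in range(10_000))
lo, hi = sims[250], sims[9_750]          # 2.5th and 97.5th percentiles
print(round(sims[5_000]), round(lo), round(hi))
```

The median lands near 121,800, in the same neighborhood as the V2 point estimate.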
Now I turn to an assessment of the trends over time in valid signatures and errors.
Details of the Analysis: This analysis uses the bound-volume numbers released by the SoS office yesterday. For each bound volume* I simulate cumulative error counts for each type of error across 10,000 simulated petitions. The resulting error rates are projected to the expected number of errors of each type for the final petition. In this way, we can see how the projected numbers of errors change as the number of sampled signatures increases. The variability in the numbers of errors for each batch of simulations provides an unbiased estimate of the statistical uncertainty about the median numbers of errors. I present them as 95% confidence intervals in the graphs. Other methodological details can be found here.
* In fact, I don’t show graphs for the first 10 volumes, simply because the numbers of invalid signatures are very small and the confidence intervals are huge.
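The volume-by-volume procedure can be sketched as follows. This is an illustration only: the cumulative figures in the list below are hypothetical except for the final entry, which uses the real running totals for "not found" errors, and the 137,689-signature petition total is again my assumption.

```python
# Sketch of the per-volume projection behind the trend graphs: at each
# completed volume, simulate many error counts at that cumulative sample
# size, project each to the full petition, and take percentiles for the
# confidence band.
import math
import random

random.seed(3)

N_TOTAL = 137_689                # assumed full petition size

def project_with_ci(checked, errors, n_sims=10_000, quadratic=False):
    """Project an error count to the full petition, with a 95% CI."""
    p = errors / checked
    sigma = math.sqrt(checked * p * (1 - p))   # binomial spread, normal approx
    f = checked / N_TOTAL
    scale = f ** 2 if quadratic else f         # duplicates would use f**2
    sims = sorted(random.gauss(errors, sigma) / scale for _ in range(n_sims))
    return sims[n_sims // 2], sims[int(0.025 * n_sims)], sims[int(0.975 * n_sims)]

# Cumulative (signatures checked, "not found" errors) after successive
# volumes; earlier entries are hypothetical, the last uses real totals.
cumulative = [(7_500, 690), (15_000, 1_390), (30_000, 2_790), (50_493, 4_692)]
rows = []
for checked, errors in cumulative:
    mid, lo, hi = project_with_ci(checked, errors)
    rows.append((checked, mid, lo, hi))
    print(f"{checked:>6} checked: {round(mid):,} ({round(lo):,} to {round(hi):,})")
```

The intervals tighten as the checked count grows, which is exactly the narrowing you see in the graphs below.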
Results: This graph shows the trend in the numbers of projected signatures not found on the voter rolls. As seen in all the graphs, the confidence in the final projection increases (confidence intervals decrease) as more volumes are completed.
Aside from some instability when the numbers are small (through about the first 30 volumes), the expected number of “missing from voter roll” signatures is relatively consistent over time. The final number should be about 12,800 ± 500.
Here are the projected final number of duplicates:
Here, again, after volume 30, the projected number of duplicates stabilizes. There is a slight downward trend in duplicates, from about 3,100 around volume 90 to 2,400 in the most recent volumes.
The mismatched-signature projections show a trend in the opposite direction:
There were about 800 projected signature mismatches around volume 90. The cumulative results at present suggest about 1,200 projected mismatches. This trend is almost enough to cancel the decreasing trend in duplicates.
The trend in projected valid signatures shows remarkable stability. The red line is the number of signatures needed to qualify for the fall ballot.
From about volume 30 on, the projected point estimates have always had the measure qualifying, although the sample size was too small for any certainty through volume 130.
Since volume 140, the projections have bounced slightly around 121,800, well within the margin of error (currently ± 650), and with a 95% confidence interval that is well above the minimum number of signatures needed to qualify for the ballot.
The conclusion is that there is little reason to believe R-71 will fail to qualify for the ballot. The opposing trends in duplicates and mismatched signatures are worth watching, but they are probably not sufficient to put R-71 out of business. The projected number of valid signatures has, to date, shown remarkable stability.