Richard L. Smith
Department of Statistics
University of North Carolina, Chapel Hill


One of the strongest allegations of voting irregularities in the recent Presidential election was that a faulty ballot design in Palm Beach County, Florida, caused many voters to cast their ballots for the Reform Party candidate Patrick Buchanan, when they intended to vote for Al Gore. We examine this statistically, by performing a regression analysis of Buchanan's vote over all 67 counties of Florida, using demographic variables (population size, race, age distribution, education level and income) and votes for other candidates as covariates. A critical point of statistical implementation is to choose a suitable transformation of the response variable to achieve approximate homoscedasticity across counties. After considering a number of alternatives, a cube root transformation (of the number of votes cast for Buchanan) is chosen. Variable selection is performed using either backward selection or the Mallows C(p) criterion, leading to similar models. The results confirm that, if the regression model is fitted using all 67 counties, then Palm Beach is an enormous outlier, with a studentized residual of 13.3, for a theoretical significance level of around 10 to the power -67 under the standard normal-theory assumptions. If the regression model is fitted to the remaining 66 counties and used to predict the Buchanan vote in Palm Beach, we obtain a point prediction of 326 and a 95% prediction interval of (181,534), compared with the actual vote reported in initial returns of 3,407. These results demonstrate conclusively that the Palm Beach County vote was indeed anomalous.
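The prediction step described above (fit on the cube-root scale without the county of interest, then back-transform the prediction interval) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis in the paper: it uses a single hypothetical covariate rather than the selected set of demographic and vote covariates, the data are synthetic, the function names fit_line and predict_votes are invented here, and a normal quantile stands in for the Student t quantile so that only the standard library is needed.

```python
import math
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with one covariate."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    return a, b, s, xbar, sxx


def predict_votes(x, votes, x_new, level=0.95):
    """Point prediction and interval for a held-out unit, on the vote scale.

    The model is fitted to cube roots of the votes; the interval is formed
    on that scale and its end points are cubed to return to vote counts.
    """
    y = [v ** (1.0 / 3.0) for v in votes]
    a, b, s, xbar, sxx = fit_line(x, y)
    n = len(x)
    centre = a + b * x_new
    # Standard error of a new observation (not of the fitted mean).
    se = s * math.sqrt(1.0 + 1.0 / n + (x_new - xbar) ** 2 / sxx)
    # Assumption: normal quantile used as an approximation to Student t.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    def cube(t):
        return max(t, 0.0) ** 3  # vote counts cannot be negative

    return cube(centre), cube(centre - z * se), cube(centre + z * se)


# Synthetic data: 12 made-up units, none taken from the Florida data set.
x = list(range(1, 13))
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(len(x))]
votes = [(1.0 + 0.5 * xi + ni) ** 3 for xi, ni in zip(x, noise)]

point, lower, upper = predict_votes(x, votes, x_new=6.5)
print(f"prediction {point:.0f}, interval ({lower:.0f}, {upper:.0f})")
```

Because cubing is monotone, the back-transformed end points keep the nominal coverage of the interval built on the cube-root scale; the interval is asymmetric about the point prediction, as is the interval (181,534) around 326 quoted above.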

Full paper:

postscript - postscript (A4 paper size) - pdf

Other analyses:

Link to Greg Adams' website.

Link to Jonathan O'Keeffe's website.

Data sets:

fldat1.txt Florida elections data set, ready to be read as an S-PLUS data frame.

fldat1.sasdat Florida elections data set, modified for SAS programs (remove header; recode Buchanan variable in line 50 as a missing value so that the variable selection is not distorted by Palm Beach).

fldat2.txt Explanations of Florida elections data set

Some of the programs used in the analysis:

fla.sas SAS program for variable selection using C(p) and backward selection.

fla.lst Output of previous SAS program.

fla1.sas SAS program for detailed model fitting and prediction.

fla1.lst Output of previous SAS program.

sfns.i S-PLUS functions to create studentized residuals and DFFITS from a data object created by 'lm' (run this once, before any other program requiring these quantities)

fl1.i S-PLUS program for data plots, model fitting and various outlier and influence diagnostics. (Warning: This takes some time to run! Edit the line nsim<-1000, defining the number of simulations, to something smaller if you just want to try the program out.)

fl3.i S-PLUS program to create plot of rescaled RSS against transformation parameter lambda.
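For readers without S-PLUS, the two deletion diagnostics named above, studentized residuals and DFFITS, can be sketched in Python for the simplest case of one covariate. This is a minimal illustration, not a translation of sfns.i: the function name deletion_diagnostics and the data are hypothetical, and the residuals are assumed to be externally studentized (each observation is scaled by a variance estimate that excludes it).

```python
import math
from statistics import mean


def deletion_diagnostics(x, y):
    """Return (studentized residuals, DFFITS) for the OLS line y = a + b*x."""
    n, p = len(x), 2  # p counts the intercept and the slope
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar

    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Leverage of each point (diagonal of the hat matrix).
    lever = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    rss = sum(e * e for e in resid)

    t_stats, dffits = [], []
    for e, h in zip(resid, lever):
        # Residual variance estimated with this observation deleted.
        s2_del = (rss - e * e / (1.0 - h)) / (n - p - 1)
        t = e / math.sqrt(s2_del * (1.0 - h))
        t_stats.append(t)
        dffits.append(t * math.sqrt(h / (1.0 - h)))
    return t_stats, dffits


# Synthetic data with one planted outlier at index 6.
x = list(range(1, 11))
y = [2.0 + 0.5 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[6] += 5.0

t_stats, dffits = deletion_diagnostics(x, y)
worst = max(range(len(x)), key=lambda i: abs(t_stats[i]))
print("largest |studentized residual| at index", worst)
```

The deleted variance is obtained from the full fit through the identity RSS(i) = RSS - e_i^2/(1 - h_i), so no refitting is needed for each observation.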
