Evaluation Tools

Breaking down our methods


In the annual analysis, motorists are grouped into four overlapping demographic categories to ensure a large enough sample size for the statistical analysis. Although much of the analysis focuses on stops of black motorists (Hispanic or non-Hispanic) and Hispanic motorists (any race), the analysis is also conducted for an aggregated grouping of all non-white motorists (Hispanic or non-Hispanic) as well as a combined sample of black and Hispanic motorists.

Estimating Disparity

To be identified in an individual test, a department or state police troop must show an estimated disparity (i.e., a higher likelihood of stopping a minority motorist) that is statistically significant at the 95 percent level or higher for either black or Hispanic motorists alone. Put simply, under the rigorous conditions set by each test, there must have been at least a 95 percent chance that either black or Hispanic motorists were stopped (or searched) at a higher rate than white non-Hispanic motorists. Below is a brief summary of the analytical tools used to evaluate racial disparities in traffic stops. Full details for each method are outlined in the appendix of each report.
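The identification rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats each test result as an estimate with a standard error, uses a two-sided normal approximation, and the function names and example numbers are hypothetical rather than taken from the reports.

```python
from statistics import NormalDist

def is_significant(estimate, std_error, level=0.95):
    """True if a positive disparity estimate clears the chosen significance level.

    Assumption: two-sided z-test under a normal approximation.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 at 95 percent
    return estimate > 0 and estimate / std_error > z_crit

def department_identified(results):
    """results maps a group name to (estimate, std_error).

    Only black or Hispanic motorists alone can trigger identification;
    the aggregated groupings are ignored for this purpose.
    """
    return any(
        is_significant(*results[group])
        for group in ("black", "hispanic")
        if group in results
    )

# Hypothetical estimates for one department.
example = {
    "black": (0.30, 0.12),      # z = 2.5, significant
    "hispanic": (0.10, 0.09),   # z is about 1.1, not significant
    "non_white": (0.25, 0.08),  # aggregated grouping, not used for identification
}
print(department_identified(example))
```

The aggregated groupings are still analyzed, but in this sketch they cannot identify a department by themselves, mirroring the rule described above.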

Veil of Darkness

A method referred to as the Veil of Darkness is used to assess the existence of racial and ethnic disparities in stop data. The test is a statistical technique that was developed by Jeffrey Grogger and Greg Ridgeway (2006) and published in the Journal of the American Statistical Association. The Veil of Darkness analysis examines a restricted sample of stops occurring during the “inter-twilight window” and assesses relative differences in the ratio of minority to non-minority stops that occur in daylight as compared to darkness.

The inter-twilight window restricts stops to a fixed window of time throughout the year when visibility varies due to seasonality as well as the discrete daylight saving time shift. This technique relies on the idea that, if police officers are profiling motorists, they are better able to do so during daylight hours when race and ethnicity are more easily observed. After restricting the sample of stops to the inter-twilight window and controlling for factors such as the time of day and day of week, any remaining difference in the likelihood that a minority motorist is stopped during daylight is attributed to disparate treatment. This analytical approach is considered the most rigorous and broadly applicable of all the tests presented in this report.
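The core comparison can be illustrated with a small Python sketch. It assumes the stops have already been restricted to the inter-twilight window and it computes only a raw odds ratio; the controls for time of day and day of week described above are omitted, so this is a simplification and not the reports' estimator. The field names and counts are hypothetical.

```python
import math

def veil_of_darkness_odds_ratio(stops):
    """Odds of a stopped motorist being a minority in daylight relative to darkness.

    stops: iterable of dicts with boolean keys "daylight" and "minority",
    assumed to be pre-filtered to the inter-twilight window.
    Returns (odds_ratio, standard_error_of_log_odds_ratio).
    """
    counts = {(d, m): 0 for d in (True, False) for m in (True, False)}
    for stop in stops:
        counts[(stop["daylight"], stop["minority"])] += 1
    a = counts[(True, True)]    # daylight, minority
    b = counts[(True, False)]   # daylight, non-minority
    c = counts[(False, True)]   # darkness, minority
    d = counts[(False, False)]  # darkness, non-minority
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log

# Hypothetical counts: 40 of 100 daylight stops and 30 of 100 darkness stops
# involve minority motorists.
sample = (
    [{"daylight": True, "minority": True}] * 40
    + [{"daylight": True, "minority": False}] * 60
    + [{"daylight": False, "minority": True}] * 30
    + [{"daylight": False, "minority": False}] * 70
)
ratio, se = veil_of_darkness_odds_ratio(sample)
print(round(ratio, 3), round(se, 3))
```

An odds ratio above 1 means minority motorists make up a larger share of stops when it is light out, which is the pattern the test looks for.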

Synthetic Control Method

Under the synthetic control method, the number of minority traffic stops in a given department is evaluated against a benchmark constructed from stops made by all other departments in Connecticut. Since departments differ in their enforcement activity (e.g., time of stops, reason for stops) and in the underlying demographics of the population on the roadway, this analysis relies on the rich statistical literature on propensity scores. Here, a propensity score is a measure of how similar a stop made outside a given department is to a stop made by the department being analyzed.

These measures of similarity are used to weight stops when constructing an individual benchmark for each department. For example, if the department being analyzed has a high minority population and makes most of its stops on Friday nights at 7 p.m. for speeding violations, then stops made for speeding violations by departments with a similar residential population at this time and day will be given more weight when constructing the benchmark. This methodology ensures an apples-to-apples comparison between the number of minorities stopped in a given town and the town's benchmark, so that any remaining differences can be attributed to possible disparate treatment.
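The weighting idea can be sketched as follows. This is a deliberately simplified illustration, not the reports' procedure: it assumes each stop is summarized by a single discrete "cell" (for example day, hour, and reason), estimates the propensity score as the share of stops in that cell made by the department being analyzed, and weights outside stops by the odds p / (1 - p). All names and data are hypothetical.

```python
from collections import defaultdict

def weighted_benchmark(stops, target):
    """Propensity-weighted minority share among stops made outside `target`.

    stops: iterable of dicts with keys "dept", "cell", and "minority" (bool).
    """
    in_target = defaultdict(int)
    total = defaultdict(int)
    for s in stops:
        total[s["cell"]] += 1
        if s["dept"] == target:
            in_target[s["cell"]] += 1

    numerator = denominator = 0.0
    for s in stops:
        if s["dept"] == target:
            continue
        p = in_target[s["cell"]] / total[s["cell"]]  # below 1: this stop is outside target
        weight = p / (1 - p)  # stops resembling the target's stops count more
        numerator += weight * s["minority"]
        denominator += weight
    return numerator / denominator if denominator else None

def make(dept, cell, minority, n):
    """Helper that builds n identical hypothetical stop records."""
    return [{"dept": dept, "cell": cell, "minority": minority}] * n

stops = (
    make("A", "fri_19_speeding", True, 8)        # department being analyzed
    + make("A", "tue_10_equipment", False, 2)
    + make("other", "fri_19_speeding", True, 1)
    + make("other", "fri_19_speeding", False, 1)
    + make("other", "tue_10_equipment", True, 6)
    + make("other", "tue_10_equipment", False, 2)
)
benchmark = weighted_benchmark(stops, "A")
print(round(benchmark, 3))
```

In this toy data the unweighted minority share of outside stops is 0.70, while the weighted benchmark is 0.55, because the Friday-night speeding stops that resemble department A's activity receive most of the weight.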

Three Benchmarks

The next three methods contained in our reports are descriptive in nature and compare department-level data to three benchmarks (statewide average, estimated commuter driving populations, and resident population). These methods are referred to as population benchmarks and are commonly used to evaluate racial disparities in police data across the country.

The statewide average comparison provides a simple and effective baseline against which each department's stop numbers can be compared. A comparison to the statewide average is presented alongside the context necessary to understand differences between local jurisdictions.

Next, researchers adjust “static” residential census data to approximate the estimated driving demographics in a particular jurisdiction. Residential census data can be modified to create a reasonable estimate of the possible presence of many nonresidents likely to be driving in a given community because they work there and live elsewhere. This estimate reflects the composition of the driving population during typical commuting hours, based on data provided by the U.S. Census Bureau.

The final population benchmark comparison limits the analysis to stops involving only residents of the community and compares them to the community demographics based on the 2010 decennial census for residents age 16 and over. Although no single benchmark is rigorous enough by itself to support conclusions regarding racial disparities, the benchmarks serve as a useful tool when taken together with the more rigorous statistical methods.
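Because these comparisons are descriptive, they reduce to simple arithmetic. The sketch below is illustrative only: the shares are invented, and the function and key names are hypothetical. Each benchmark is paired with its own stop share, since the resident comparison uses only stops of residents.

```python
def benchmark_gaps(comparisons):
    """Gap between the minority share of stops and each benchmark share.

    comparisons maps a benchmark name to (stop_share, benchmark_share),
    both expressed as fractions between 0 and 1.
    Returns percentage-point differences and ratios.
    """
    return {
        name: {
            "difference_pp": round((stop_share - benchmark_share) * 100, 1),
            "ratio": round(stop_share / benchmark_share, 2),
        }
        for name, (stop_share, benchmark_share) in comparisons.items()
    }

# Hypothetical department: minority share of stops versus each benchmark.
gaps = benchmark_gaps({
    "statewide_average": (0.32, 0.25),
    "commuter_population": (0.32, 0.28),
    "resident_population": (0.30, 0.22),  # resident stops vs. residents age 16 and over
})
for name, gap in gaps.items():
    print(name, gap)
```

A positive difference only indicates that a department's stops exceed the benchmark; as noted above, such gaps are not sufficient to draw conclusions without the more rigorous tests.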


Post-Stop Outcomes

We also test for disparities in the outcomes of traffic stops using a model that examines the distribution of dispositions conditional on race and the reason for the stop. Specifically, we test whether traffic stops made of minority motorists result in different outcomes relative to their white non-Hispanic peers.

Lastly, we analyze post-stop outcomes using a hit-rate approach, following a technique published in the Journal of Political Economy by Knowles, Persico and Todd (2001). The hit-rate approach relies on the idea that motorists rationally adjust their propensity to carry contraband in response to their likelihood of being searched by police. Similarly, police officers rationally decide whether to search a motorist based on visible indicators of guilt and an expectation of the likelihood that a given motorist might have contraband.

According to the model, a demographic group of motorists would be searched by police more often than white non-Hispanic motorists if they were more likely to carry contraband. However, the higher level of searches should be exactly proportional to the higher propensity for this group to carry contraband. Thus, in the absence of racial animus, we should expect the rate of successful searches (i.e. the hit-rate) to be equal across different demographic groups regardless of differences in their propensity to carry contraband.
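The equal hit-rate prediction can be checked with a short Python sketch. It assumes search outcomes are available as simple counts per group and uses a pooled two-proportion z-test under a normal approximation; the reports' own specification may differ, and the counts below are hypothetical.

```python
import math
from statistics import NormalDist

def hit_rate_test(hits_a, searches_a, hits_b, searches_b):
    """Compare hit rates (searches that find contraband) between two groups.

    Returns (rate_a, rate_b, z, two_sided_p_value) from a pooled
    two-proportion z-test.
    """
    rate_a = hits_a / searches_a
    rate_b = hits_b / searches_b
    pooled = (hits_a + hits_b) / (searches_a + searches_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / searches_a + 1 / searches_b))
    z = (rate_a - rate_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value

# Hypothetical counts: minority motorists, 30 hits in 200 searches;
# white non-Hispanic motorists, 50 hits in 200 searches.
rate_min, rate_white, z, p_value = hit_rate_test(30, 200, 50, 200)
print(rate_min, rate_white, round(z, 2), round(p_value, 4))
```

Under the model, a significantly lower hit rate for minority motorists is the signal of concern: it suggests those motorists are being searched on weaker evidence than white non-Hispanic motorists.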