Washington University Law CERL
  Center for Empirical Research in the Law
 


EEOC Litigation Analysis
Project Details
TECHNOLOGIES USED
php, mysql, css, x/html
DATA FIELDS
per-case variables collected : 127
DEVELOPMENT TIMEFRAME
backend site : 1 ft / 4 weeks
View the EEOC original DESIGN suggestions:
The EEOC Data Collector
EEOC Backend
Twenty years ago the process of conducting empirical research was vastly different from today. Converting raw data into a computable format was extremely time-consuming and often required expensive statistical software that was difficult to use, while manual entry invited errors. Given the labor and organization a project demanded, the number of variables to be captured was necessarily small. Once data targets were identified, a coding scheme had to be devised, and it had to be easy for human coders to follow. If it was not, the validity of the captured data would be dubious. Finally, aggregating the collected data into a single data file for analysis required a great deal of effort. When mistakes were found, it was often necessary to go back into the field and recode data. This workflow persists today in much empirical work and remains a significant obstacle for large-scale empirical projects. At CERL we provide expertise and infrastructure to scholars so that they can concentrate on the substance of their work.

Using this project as an example, we asked the PIs (Professors Kim, Martin, and Schlanger) to go through several of the relevant cases and note the data they would require to undertake their study. When they turned the variable list over to CERL, there were more than one hundred individual data points to capture. Some of these data points had long and complex sets of possible values (e.g., Management Order Types), and some had to be captured repeatedly within a single case (e.g., Docket Motions). After extending the initial sample to include all the possible variables, we created a database-driven web environment that would allow our more than forty research assistants to wade through more than 2,000 cases over a six-month period, coding over one hundred fields for each case.
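A repeated field such as Docket Motions is naturally stored as a one-to-many relation: one row per case, and any number of motion rows pointing back to it. The sketch below illustrates that idea only. It is written in Python with an in-memory SQLite database rather than the PHP and MySQL the site was built with, and every table, column, and function name in it is hypothetical, not taken from the actual CERL schema.

```python
import sqlite3


def build_schema(conn):
    """Create a hypothetical two-table layout: cases and their motions."""
    conn.executescript(
        """
        CREATE TABLE cases (
            case_id       INTEGER PRIMARY KEY,
            docket_number TEXT NOT NULL
        );
        -- One row per motion, so a case can carry any number of them.
        CREATE TABLE docket_motions (
            motion_id   INTEGER PRIMARY KEY,
            case_id     INTEGER NOT NULL REFERENCES cases(case_id),
            motion_type TEXT NOT NULL
        );
        """
    )


def add_case(conn, docket_number):
    """Insert a case and return its generated id."""
    cur = conn.execute(
        "INSERT INTO cases (docket_number) VALUES (?)", (docket_number,)
    )
    return cur.lastrowid


def add_motion(conn, case_id, motion_type):
    """Attach one more motion to an existing case."""
    conn.execute(
        "INSERT INTO docket_motions (case_id, motion_type) VALUES (?, ?)",
        (case_id, motion_type),
    )


def motions_per_case(conn):
    """Return {docket_number: motion count}, including cases with none."""
    rows = conn.execute(
        """
        SELECT c.docket_number, COUNT(m.motion_id)
        FROM cases c
        LEFT JOIN docket_motions m ON m.case_id = c.case_id
        GROUP BY c.case_id
        ORDER BY c.case_id
        """
    ).fetchall()
    return dict(rows)
```

A coder can then add motions to a case one at a time without the case record needing a fixed number of motion columns; fields with long value lists could be handled the same way with a lookup table.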

For this project, CERL also developed two new pieces of functionality. First, given the large number of RAs involved and the fact that coders were spread over time and distance, we needed a system in which a graduate student project manager (CERL GSA Christina Boyd) could easily distribute work, answer questions from research assistants, and oversee the effort from a single location. Second, the nature of this analysis required the ability to perform inter-coder reliability tests during and after the collection process. This complicated the database and application model significantly: multiple people coded the same information, and their observations had to be kept completely separate for data integrity and subsequent comparisons.
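To make the second requirement concrete, the sketch below keys every observation by case, coder, and field so that no coder's entry can overwrite another's, and then compares two coders on the cases they both coded. The page does not say which reliability statistic the project used; Cohen's kappa is assumed here purely for illustration, and the data layout and function names are hypothetical.

```python
from collections import Counter


def paired_codes(observations, field, coder_a, coder_b):
    """Collect both coders' values for every case they both coded.

    `observations` maps (case_id, coder_id, field) -> coded value, so each
    coder's entry for a case is stored under its own key.
    """
    by_a = {c: v for (c, who, f), v in observations.items()
            if who == coder_a and f == field}
    by_b = {c: v for (c, who, f), v in observations.items()
            if who == coder_b and f == field}
    shared = sorted(by_a.keys() & by_b.keys())
    return [by_a[c] for c in shared], [by_b[c] for c in shared]


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(codes_a)
    # Share of cases on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)
```

Because the two lists are built only from cases present for both coders, the check can be run while collection is still under way as well as after it ends.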

The resulting application, pictured below, served the need well and enabled CERL and its collaborators to sift through more than 2,000 cases and their relevant documents, coding over 120 variables per case, all in a six-month period.

[Screenshot: EEOC Basic Information]
  Washington University / School of Law / Campus Box 1120 / St. Louis MO 63130
cerl@law.wustl.edu