This is Part 2 of a series on storing and managing social science data in relational databases. If you haven’t already, read Part 1 to get up to speed.
To help illustrate some of the concepts introduced in Part 1, I’m going to use the Dynamics of Collective Action (DOCA) dataset to design an efficient relational database. You can download the entire dataset at the DOCA website. For our purposes, I’m going to refer to a 15-case sample, which you can view here. The DOCA dataset consists of data coded from newspaper coverage of protest events, and each case is a distinct protest event. Table 1 describes each of the more than 100 variables in the dataset. For a more detailed description of the dataset, you can browse the DOCA website.
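Because each case is a distinct protest event, the sample in its raw form is essentially one wide table with one row per event. The sketch below shows that shape using Python's built-in `sqlite3` module. It is only an illustration under stated assumptions: the table name `events`, its four columns, and the row values are hypothetical placeholders standing in for the 100+ real DOCA variables, not actual DOCA variable names or data.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")

# Hypothetical flat layout: one row per protest event (case).
# Column names are placeholders, NOT actual DOCA variable names.
conn.execute("""
    CREATE TABLE events (
        event_id         INTEGER PRIMARY KEY,  -- one row per distinct event
        event_date       TEXT,
        state            TEXT,
        num_participants INTEGER
    )
""")

# Dummy values used purely to show the shape of the table (not DOCA data).
rows = [
    (1, "2000-01-01", "AA", 100),
    (2, "2000-01-02", "BB", None),  # missing values stored as NULL
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```

With the full variable list, this single table would have one column per variable, so it would be far wider than the four columns shown here.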