This is part 2 of a series on storing and managing social science data in relational databases. If you haven’t already, read part 1 to get up to speed.
To help illustrate some of the concepts introduced in Part 1, I’m going to use the Dynamics of Collective Action (DOCA) dataset to design an efficient relational database. You can download the entire dataset at the DOCA website. For our purposes, I’m going to refer to a 15-case sample, which you can view here. The DOCA dataset consists of data coded from newspaper coverage of protest events. Each case is a distinct protest event. Table 1 contains a description of each of the more than 100 variables in the dataset. For a more detailed description of the dataset, you can browse the DOCA website.
This is the first of a series of posts about storing social science data in relational databases. I’ve found that grad school has prepared me for many things: using statistical software to do complex analyses, writing up academic papers and submitting them to journals for review, collecting and coding data for research projects, and presenting and networking at conferences. But I never learned how to properly keep track of and store the data I’ve collected throughout my young academic career. Instead, I’ve cobbled together techniques that seem to work for me, borrowing heavily from my skills as an amateur computer and web programmer. In an effort to get more social scientists to think about this, I’ve decided to share some of my techniques, beginning with storing data in databases.