The last five years have seen an explosion in the amount of data available to social scientists. Thanks to Twitter, blogs, online government databases, and advances in text analysis techniques, data sets with millions of observations are no longer a rarity (Lohr, 2012). Although a blessing, these extremely large data sets can cause problems for political scientists working with standard statistical software programs, which are poorly suited to analyzing big data sets. At best, analyzing massive data sets can result in prohibitively long computing times; at worst, it can lead to repeated crashes, making anything beyond the simplest summary statistics impossible to calculate. The volume of data available to researchers is, however, growing faster than computational capacity, making it essential to develop techniques for handling “Big Data.”


Blackwell, Matthew, and Maya Sen. "Large Datasets and You: A Field Guide." The Political Methodologist 20.1 (2012): 2-5.