Prompted by an opportunity
Preservation and reuse of data are topics of growing interest among many disciplines. Much of the discussion initially focused on the large data sets such as those generated by astronomers and computational biologists, where data can easily fall in the terabyte range.
However, there are data sets of all sizes, and many possible solutions to archiving them and making them available for use by others.
Several years ago, users of AgEcon Search, http://ageconsearch.umn.edu/, a subject repository for the full text of conference papers, working papers, and journal articles, began to inquire about the possibility of archiving data. AgEcon Search has been in operation since 1994 and covers agricultural, development, energy, and resource economics. Initial interest in data came from instructors who wanted their students to have access to research papers and the related data sets so that they could learn by replicating the work of the authors.
In 2010 we responded to a specific request from the Federal Council of the Australian Agricultural and Resource Economics Society (AARES). They asked if we would consider housing the data associated with the articles in their journal, the Australian Journal of Agricultural and Resource Economics. We agreed to investigate the possibilities and report back.
Unlike researchers in some other data-rich disciplines, economists do not have large, well-established data repositories that would welcome their data. A few journals provide a Web site for data associated with the papers that they publish, although the preservation practices are often not very robust, and a small but growing number of countries have national repositories that economists can utilize.
An inexpensive solution
As we looked for a solution that would fit our needs, we had no budget, but we did have a librarian with expertise in social sciences data and some staff time to devote to the task.
We considered including the AARES data in the current AgEcon Search software, but this would not have been ideal. We optimized our installation for text-based documents, and the metadata scheme lacks fields for many elements important for properly describing data, such as time period, frequency, smallest geographic area covered, and population studied. Also, while the data would be archived for the long term, reusing it might not be seamless. We also had no way to link records within the software. Since we wanted records for the data that were separate from records for the articles, the ability to link was key.
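The two requirements described above, data-specific descriptive fields and a link between separate article and data records, can be illustrated with a minimal sketch. Every name in it (ArticleRecord, DataRecord, link_records, and all field names and identifiers) is hypothetical and chosen only for illustration; none of it reflects the actual AgEcon Search software, Dataverse, or any particular metadata standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArticleRecord:
    """Hypothetical record for a text document (paper or article)."""
    record_id: str
    title: str
    data_record_id: Optional[str] = None  # link to the related data record


@dataclass
class DataRecord:
    """Hypothetical record for a data set, with the data-specific
    elements that a text-oriented metadata scheme typically lacks."""
    record_id: str
    title: str
    time_period: str
    frequency: str
    smallest_geographic_area: str
    population_studied: str
    article_record_id: Optional[str] = None  # link back to the article


def link_records(article: ArticleRecord, data: DataRecord) -> None:
    """Link the two records in both directions while keeping them separate."""
    article.data_record_id = data.record_id
    data.article_record_id = article.record_id


# Illustrative values only; these are not real records.
article = ArticleRecord(record_id="article-001", title="Example article")
data = DataRecord(
    record_id="data-001",
    title="Replication data for: Example article",
    time_period="2000-2005",
    frequency="annual",
    smallest_geographic_area="region",
    population_studied="farm households",
)
link_records(article, data)
```

Keeping the two record types separate means that each can carry the description appropriate to it, while the paired identifiers let a reader move from the paper to its data and back.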
After investigation, we discovered another economics document repository that had taken on a similar challenge to archive data. Economists Online, http://www.economistsonline.org/, based in Europe, developed a separate site to house the data related to their documents, with links back to the papers, at http://dvn.iq.harvard.edu/dvn/dv/NEEO.
The Economists Online data site uses Dataverse, http://thedata.org/, developed by Harvard’s Institute for Quantitative Social Science (IQSS). IQSS maintains a hosted site that is free for all to use, and organizations may also download and install the open source version of the software.
There are several other economics-related groups that are using the hosted version of Dataverse. They include journals, NGOs, individuals, and research groups. A complete list of those using the hosted Dataverse site is at http://dvn.iq.harvard.edu/dvn/
Features of Dataverse include:
- It is hosted remotely
- It utilizes appropriate standards for social sciences data
- Setting up a section in Dataverse does not require specialized knowledge
- Other economics groups deposit in Dataverse, so it is already a destination
- Data sets in Dataverse are ranked highly in Google searches
- It is free of charge, with good tech support via e-mail
Also, although the URL of any of the hosted Dataverse sites does contain the word Harvard, there is no other mention of it on the individual Dataverse sites. A group’s Dataverse site may be customized easily, as we did, by adding graphics and links to visually tie the data site to the main AgEcon Search platform.
In an ideal future, large and robust repositories for economics data may emerge, and we knew that if they did, we would want to be able to export the records and data in the AgEcon Search data repository. We felt that Dataverse would afford that possibility. It uses the Data Documentation Initiative (DDI), http://www.ddialliance.org/, as the scheme for metadata, which is standard practice for social sciences data.
Small beginning for data archiving
Although data archiving is in its infancy in many of the sciences, social scientists have been working on the associated issues for several decades. There is still much progress to be made, and we hope that the AgEcon Search Dataverse site is a small contribution to that effort.
When we agreed to work with the data from AARES and their journal, we had no idea how large the adoption would be. If the adoption rate was low, then a large investment of time and energy wouldn’t have been appropriate. On the other hand, we wanted to get the most out of the work we did do and to find a solution that would be flexible and allow for growth into the future.
The data sets that economists use are not always available to be shared publicly. Some may contain proprietary corporate information or material that would compromise personal privacy. In some cases, they use data that is already in the public domain, such as government-produced data. A few authors that we contacted were not interested in sharing particular data sets yet, since they were planning to do further analysis and publish additional papers. As a result, the adoption rate has been low among AARES authors, but we now have a platform with which we may approach other AgEcon contributors.
With over 250 groups contributing their documents to AgEcon Search, a next step is to offer them the possibility of including data related to their papers in Dataverse. We will be able to note the growing interest in the reuse and archiving of data as well as the relative ease of making the data ready for inclusion. There is also a growing body of research, including papers by Piwowar and Henneken, which concludes that papers are cited more often if the data behind the work is freely available (1,2).
Figure 1. A record for a document in AgEcon Search
Figure 2. A record for a data set in the AgEcon Search Dataverse
References
1. Piwowar, H.A., R.S. Day, and D.B. Fridsma. Sharing detailed research data is associated with increased citation rate. PLoS ONE 2(3): e308, 2007.
2. Henneken, E.A. and A. Accomazzi. Linking to Data - Effect on Citation Rates in Astronomy. arXiv:1111.3618v1.