Editorial

Data Sharing and Duplication: Is There a Problem?

Christine A. Bachrach, PhD; Rosalind B. King, PhD


Copyright 2004 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.

Arch Pediatr Adolesc Med. 2004;158(9):931-932. doi:10.1001/archpedi.158.9.931

Despite concerns in some scientific fields, data sharing has come of age. In 1985, the National Academy of Sciences' Committee on National Statistics endorsed the goal of wide access to research data and proffered recommendations for how and when data should be shared.1 Nearly 2 decades later, after numerous similar reports,2-4 a clear mandate is emerging. The National Science Foundation,5 the National Institutes of Health,6 and numerous professional societies and scientific journals have all taken steps to encourage, if not require, research scientists to share their research data and materials with other scientists. Some questions remain, however, as scientists struggle to learn how to carry out this mandate in ways that are effective, efficient, and ethically sound.

The article in this issue of the ARCHIVES by van Dyck et al7 uses data from a Maternal and Child Health Bureau study conducted by the National Center for Health Statistics, a pioneer in data sharing. van Dyck and colleagues use data from the National Survey of Children with Special Health Care Needs to provide a descriptive portrait of children with special health care needs and of disparities among these children with respect to access to care, satisfaction with care, and family impact. The publication in the journal Pediatrics of another article8 using the same data just months before the van Dyck article went to press raised the concern that data sharing may create inefficiencies in science by fostering duplication of effort. We examine this concern and what might be done to address it.

The idea that data sharing could lead to duplication was raised in the 1985 National Academy of Sciences report but has not been widely discussed. However, it is closely related to the often-expressed concern that data sharing could enable secondary analysts to publish key results from a data set before the producers of the data are able to do so.1,3 This concern has generally been addressed by allowing a limited period of exclusive use for the original study investigators, usually a year or until the first major publication. However, this practice does not address the more general question of duplication because in most cases the potential uses of a data set far exceed what can be accomplished within this limited period.

Before considering fixes to the problem of potential duplication, it is useful to assess whether and to what extent there is a problem. Consider the article by van Dyck et al.7 Although this analysis uses the same data as the article by Mayer et al8 in Pediatrics, and although both consider some of the same outcomes, the articles are in fact very different. The van Dyck article estimates the percentage of US children who have special health care needs across demographic and economic groups, and it describes a wide range of outcomes for these populations. The Mayer article develops and tests a theory-based predictive model of unmet need for specific types of medical service among children with special health care needs. It provides no descriptive data regarding prevalence and no information on most of the outcomes examined by van Dyck and colleagues. In short, rather than providing an example of duplication, this pair of articles illustrates the value of having data in the public domain: researchers with different ideas, different approaches, and different goals can use the same data to develop different products with very different values and applications for science, practice, and public policy.

Does data sharing generally lead to duplication? Because the uses of shared data are not uniformly tracked, quantitative measures of duplication are out of reach. To address this question in a limited way, we undertook a survey of the published and forthcoming articles emerging from analyses of the National Longitudinal Study of Adolescent Health (Add Health). In this study, J. Richard Udry, PhD, and colleagues collected comprehensive information on the health and health-related behaviors of a large national sample of adolescents as well as information on their social contexts.9 The data were made available to both study and nonstudy scientists at the same time, and currently over 2000 individuals nationwide are analyzing the data. If duplication is a problem, it should be evident here. Our analysis of the first 180 publications from this project (assembled through a query to all users of the data) found none that exactly duplicated each other and 17 pairs of authors who addressed similar questions. Some of these pairs differed in their theoretical and statistical approach, whereas others used similar approaches but elaborated the questions in complementary ways.

In one instance, 2 sets of researchers tackled the question of whether parents encouraged sexual activity in their teenage children by recommending to them a method of contraception. One analysis found a positive effect10; the other, no effect.11 An analysis of the discrepancy showed that it was the result of different ways of defining the study population, defining the outcome, and constructing the model. The fragility of the result in the face of these differences produced knowledge that each analysis standing alone could not have produced. In fact, we don't know for sure whether parental advice on contraception encourages teenage sexual activity. This is important for parents, practitioners, and policy makers to understand because of a widespread assumption that information about sex and contraception is best left to parents.

Our analysis of Add Health data suggests that, when valuable data sets are widely available, researchers will occasionally use the same data to address similar questions. In some cases, this may have costs for the researchers themselves and for science in general. The researchers may suffer if they are unable to publish their work; science may suffer if the time spent in conducting similar analyses could have been spent in conducting analyses that produced greater gains in knowledge. On the other hand, these similar analyses often advance science significantly by highlighting the sensitivity of results to an analytic approach. They reveal what we do not know and stimulate debate about how to advance our scientific theories and methods. They reinforce accountability among researchers and provide an incentive for researchers to communicate with each other.

Could potential duplication be effectively managed in the context of data sharing? Theoretically, data providers might monitor ongoing uses of a data set and prevent new users from undertaking duplicative work. This would require timely, complete, and specific information from secondary users about the analyses they are conducting. Some data providers achieve this by restricting secondary users to specific analyses of the data that are agreed upon in advance. Although this is feasible, it is hardly efficient. The requirements imposed on providers to document and approve specific uses, the risk that productive lines of research will be disallowed, and the inflexibility placed on users create high costs for the research enterprise.

Even in the absence of limits on the uses of shared data, tracking ongoing work would remain difficult. Secondary users may be reluctant to provide detailed analysis plans to data providers: doing so on a continuous basis would divert time and resources from their work, and sharing their original ideas could be seen as risky. The system would impose a substantial burden on data sharers, who would have to maintain elaborate information systems and screen for related interests on an ongoing basis. The costs of a truly effective system would greatly outweigh the benefits, and the requirements it would demand of data users would likely undermine its effectiveness and discourage use of the data.

Although systems to prevent duplication are problematic, steps can be taken (and often are) to help potential users identify whether other scientists have already used a data set to address their questions. In many cases, data sharers maintain lists of publications, presentations, working papers, and dissertations that have used the data set. Some studies maintain listservs that can help data users identify co-users with similar interests. Others hold user conferences at which scientists can network. A further step that, to our knowledge, has not been adopted is asking data users to provide and periodically update keywords or topic sentences summarizing their general research interests and to post these on Web sites available to potential data users. This approach is relatively inexpensive and would provide users a tool for identifying other users with similar interests. It would leave decisions about sharing information concerning specific analyses to the users themselves.

Data sharing is here to stay, but best practices for data sharing remain a work in progress. As technologies, scientific methods, and scientific cultures change, norms for data-sharing practices will inevitably evolve. The concern about duplication may reflect, in part, lingering proprietary attitudes toward investigator-collected data, but it raises legitimate questions about how to manage the sharing of research resources in ways that are optimally efficient for the scientific enterprise. Certainly, more can be done to reduce the likelihood of unprofitable duplication. As such efforts are undertaken, we must be careful to ensure that they are not more costly than the problem they seek to address.

REFERENCES

1. Fienberg SE, Martin ME, Straf ML, eds. Sharing Research Data. Washington, DC: National Academies Press; 1985.
2. Duncan GT, Jabine TB, de Wolf VA, eds. Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics. Washington, DC: National Academies Press; 1993.
3. National Research Council. Finding the Path: Issues of Access to Research Resources. Washington, DC: National Academies Press; 2000.
4. National Research Council. Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. Washington, DC: National Academies Press; 2003.
5. National Science Foundation. Dissemination and sharing of research data. NSF Grant Policy Manual (NSF02-151), Chapter VII. Available at: http://www.nsf.gov/pubs/2002/nsf02151/gpm7.htm#730. Accessed June 23, 2004.
6. National Institutes of Health. Final NIH statement on sharing research data. NIH Guide, NOT-OD-03-032. Available at: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html. Accessed June 23, 2004.
7. van Dyck PC, Kogan MD, McPherson MG, Weissman GR, Newacheck PW. Prevalence and characteristics of children with special health care needs. Arch Pediatr Adolesc Med. 2004;158:884-890.
8. Mayer ML, Skinner AC, Slifkin RT. Unmet need for routine and specialty care: data from the National Survey of Children with Special Health Care Needs. Pediatrics. 2004;113:e109-e115.
9. Add Health: The National Longitudinal Study of Adolescent Health. Available at: http://www.cpc.unc.edu/projects/addhealth. Accessed June 23, 2004.
10. Jaccard J, Dittus PJ. Adolescent perceptions of maternal approval of birth control and sexual risk behavior. Am J Public Health. 2000;90:1426-1430.
11. McNeely C, Shew ML, Beuhring T, Sieving R, Miller BC, Blum RW. Mothers' influence on the timing of first sex among 14- and 15-year-olds. J Adolesc Health. 2002;31:256-265.

AUTHOR INFORMATION

Correspondence: Dr Bachrach, Demographic and Behavioral Sciences Branch, National Institute of Child Health and Human Development, 6100 Executive Blvd, Rm 8B07, MSC 7510, Bethesda, MD 20892-7510 (cbachrach@nih.gov).


