Frequently Asked Questions about Berkeley Lab’s Research Data Policy
The Lab’s Research Data Policy covers the use, management and sharing of research data generated at Berkeley Lab. Its aim is to provide a framework for managing and sharing Research Data in support of Berkeley Lab’s mission of bringing science solutions to the world.
This policy applies to all research data for which Berkeley Lab holds ownership or use rights. It also applies to all persons who work for Berkeley Lab in any capacity, or through any other augmentation to Berkeley Lab staffing levels, and who are involved in the design, conduct or reporting of research, regardless of the funding source for such activities.
The Research Data Policy was developed by a cross-Lab working group consisting of representatives from all Areas as well as the Laboratory Directorate.
Why can’t I copyright raw data? Can I copyright a database of (raw) data?
Under US law, raw data, scientific facts, algorithms/formulae and obvious subject matter cannot be granted IP protection. US case law (prior court decisions) determines the outcomes of nuanced situations. For example, rearranging the order of names and numbers in a telephone book or database doesn't necessarily give you new IP rights over the original.
Does it matter for copyright when the raw data is processed?
Raw data that has been processed may be copyrightable if adequate human interaction or authorship is involved. It does not matter when this human interaction happens (e.g., during manual post-processing, or during the setup and control of a research-specific, in-situ processing pipeline).
Data Management and Data Management Plans
What is a Data Management Plan?
Data Management Plans (DMPs) document the procedures and processes for managing research data, including data collection, storage/protection and processing, as well as data licensing and sharing. Planning ahead for how research data will be handled during and after a research project is useful, and DMPs support the rigor and integrity of the research being undertaken.
Who can I contact for data storage guidance?
For assistance with data management plans or with the process of sharing research data please contact Science-IT at email@example.com.
How do I write a good Data Management Plan?
Tools such as DMPTool (dmptool.org), provided by the University of California, offer templates for the Data Management Plans required by various research funders, including the Department of Energy. DMPTool also serves as a repository for Data Management Plans and allows researchers to share their DMP with a DMP ID (a unique, persistent identifier specific to each Data Management Plan). That identifier helps with tracking compliance with this Research Data Policy. For feedback or questions related to DMPTool, please visit https://dmptool.org/contact-us.
How do I ensure policy compliance for my Data Management Plan?
The Research Data Policy provides only a framework for Data Management Plans and does not prescribe their content. Funder requirements or other regulations may determine what a Data Management Plan must contain. Section E.4 of the Policy describes the various items that should be considered for a Data Management Plan. One requirement is mandatory, however: Data Management Plans must consider the handling of confidential or personal data if such data is collected in a research project.
What does the recommendation that deposited data should use openly documented data formats and metadata standards mean?
If Research Data is being shared, it should preferably be shared in openly documented data formats so that others can reuse it. Sometimes, instruments used in research export data only in proprietary formats, making the use of open data formats impossible. However, where there is a choice, openly documented data and metadata formats are preferred.
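As an illustration of what "openly documented" can look like for simple tabular data, the sketch below writes the data as plain CSV and adds a small JSON file describing each column. This is a minimal sketch, assuming Python 3 and tabular data; the function name `export_open_format`, the sidecar layout and the column names are hypothetical conventions for illustration, not a Lab or community standard. Many disciplines have their own established open formats and metadata standards, which should take precedence.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_open_format(rows, columns, out_dir, name):
    """Write tabular data as CSV plus a JSON sidecar documenting its columns.

    Hypothetical helper: the sidecar layout is an illustrative convention only.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    data_path = out / f"{name}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([col["name"] for col in columns])  # header row
        writer.writerows(rows)

    # Human- and machine-readable description of the data file.
    meta = {
        "data_file": data_path.name,
        "media_type": "text/csv",
        "encoding": "utf-8",
        "columns": columns,
    }
    meta_path = out / f"{name}.json"
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return data_path, meta_path


# Usage with hypothetical column names and values.
columns = [
    {"name": "sample_id", "description": "Sample label"},
    {"name": "temperature_K", "description": "Sample temperature", "unit": "K"},
]
rows = [["s1", 293.15], ["s2", 310.0]]

with tempfile.TemporaryDirectory() as tmp:
    data_path, meta_path = export_open_format(rows, columns, tmp, "measurements")
    with data_path.open(newline="", encoding="utf-8") as f:
        read_back = list(csv.reader(f))
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
```

Both files can be opened with any text editor or standard library, which is the practical point of preferring open formats: reuse does not depend on the original instrument software.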
How do I make my Data Management Plan machine-actionable?
Machine-actionable means that the information in a Data Management Plan is well structured and includes underlying metadata describing the plan's main items. This makes the information easier for machines to parse (machine-readable) and to interpret and act on (machine-actionable). Funders can use machine-actionable Data Management Plans to link plans with research projects, and researchers can use them to find related projects. Machine-actionable DMPs are recommended but not required under the Research Data Policy.
Certain tools that create Data Management Plans, such as DMPTool (dmptool.org) by the University of California, automatically create machine-actionable Data Management Plans.
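To make "structured metadata" concrete, the sketch below builds a small DMP record as JSON. It is a minimal sketch, assuming Python 3, with field names loosely based on the RDA DMP Common Standard for machine-actionable DMPs; it is an illustrative subset, not a validated record, and the function `build_madmp`, the DMP ID, names and email address are all hypothetical. Tools such as DMPTool generate such records for you, so this is only meant to show what a machine would read.

```python
import json
from datetime import datetime, timezone


def build_madmp(title, dmp_doi, contact_name, contact_email, datasets):
    """Return a minimal machine-actionable DMP record as a dict.

    Hypothetical helper. Field names loosely follow the RDA DMP Common
    Standard; this is an illustrative subset, not a complete record.
    """
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "dmp": {
            "title": title,
            "language": "eng",  # ISO 639-3 language code
            "created": now,
            "modified": now,
            "ethical_issues_exist": "unknown",
            # A persistent DMP ID lets tools link the plan to its project.
            "dmp_id": {"identifier": dmp_doi, "type": "doi"},
            "contact": {
                "name": contact_name,
                "mbox": contact_email,
                "contact_id": {"identifier": contact_email, "type": "other"},
            },
            "dataset": [
                {
                    "title": ds["title"],
                    "dataset_id": {"identifier": ds["id"], "type": "other"},
                    # Record whether personal or sensitive data is collected;
                    # values are "yes", "no" or "unknown".
                    "personal_data": ds.get("personal_data", "unknown"),
                    "sensitive_data": ds.get("sensitive_data", "unknown"),
                }
                for ds in datasets
            ],
        }
    }


# Usage with hypothetical values (not a real DMP ID or contact).
record = build_madmp(
    title="Example project data management plan",
    dmp_doi="https://doi.org/10.0000/example-dmp",
    contact_name="Jane Doe",
    contact_email="jane.doe@example.org",
    datasets=[{"title": "Beamline measurements", "id": "dataset-001", "personal_data": "no"}],
)
print(json.dumps(record, indent=2))
```

Because every item sits under a predictable key, a funder's system could, for example, look up `dmp_id` to link the plan to an award, or scan `personal_data` to flag plans that handle personal data.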
More information on machine-actionable DMPs can be found at:
What are best practices for giving and receiving credit for data sharing?
Research data is shared so that others can build on your work efficiently, which helps to advance science. Receiving credit for the data that you share, and giving credit to those whose data you use, is an integral part of the academic system. The article “Ten simple rules for getting and giving credit for data” provides a good introduction to the topic.
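In practice, giving credit usually means citing a dataset much like a publication, including its persistent identifier. The sketch below assembles such a citation. It is a minimal sketch, assuming Python 3, and the format is only loosely modeled on the citation format recommended by DataCite (creator, year, title, version, publisher, identifier); journals and repositories may require their own style. The class, function, creators and DOI shown are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical container for the elements of a data citation."""
    creators: List[str]
    year: int
    title: str
    publisher: str  # e.g., the repository that published the data
    doi: str
    version: Optional[str] = None


def format_data_citation(rec: DatasetRecord) -> str:
    """Build a citation: Creators (Year). Title. Version. Publisher. DOI link."""
    parts = [f"{'; '.join(rec.creators)} ({rec.year}). {rec.title}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(f"https://doi.org/{rec.doi}")  # resolvable DOI link
    return " ".join(parts)


# Usage with hypothetical creators and DOI.
citation = format_data_citation(
    DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2024,
        title="Example measurement dataset",
        publisher="Dryad",
        doi="10.0000/example",
        version="1.0",
    )
)
print(citation)
# Doe, Jane; Roe, Richard (2024). Example measurement dataset. Version 1.0. Dryad. https://doi.org/10.0000/example
```

Including the version matters for datasets that are updated over time, since it tells readers exactly which snapshot underlies the cited work.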
Are there recommended platforms or repositories that researchers can use to archive/publish their data?
For communities with established disciplinary data repositories, we recommend using those repositories for data publishing. If your community does not have a disciplinary repository, publishing data in a general repository is recommended; examples include Dryad, Zenodo and Figshare. We recommend Dryad as the general-subject repository for research data. Berkeley Lab is an institutional member of Dryad, meaning all Berkeley Lab researchers can submit research data (up to 300 GB per dataset) to Dryad at no cost. Publishing data in Dryad ensures long-term preservation and availability. Zenodo offers a convenient mechanism to archive and share snapshots of GitHub repositories and is recommended for software and code sharing. The Software Disclosure and Distribution Policy must be followed before any software is shared.
Archival platforms can include center-specific resources, such as a computing center’s tape archive. Not all archival platforms provide a publishing option, but any repository under consideration needs a method and plan for long-term storage and availability.
How can I reduce the cost to archive/publish research data?
The cost of publishing research data may be covered by research funders. In addition, many disciplinary repositories publish certain datasets free of charge. Otherwise, because Berkeley Lab is an institutional member of Dryad, all Berkeley Lab researchers can submit research data (up to 300 GB per dataset) to Dryad at no cost.
What “minimal datasets” do I need to share at the time of publication?
As part of the Lab’s mission to bring science solutions to the world, publishing our research includes providing the data underlying the published findings, so that others can replicate or build on our work. The policy requires sharing minimal datasets and metadata in support of the published work, for example the data depicted in, or underlying, the figures or charts in a paper. There is no need to share raw data if the standard in the field is to share processed data, and there is no need to share the full dataset of an investigation if only parts of the data were reported in the published work. However, the policy does require sharing data according to field-specific standards, which may include raw data. See also the separate FAQ on recommended platforms or repositories that researchers can use to archive/publish their data.
What is the best way to share software?
Zenodo is a research data and code repository that fulfills the recommended standards for data sharing and offers a convenient way to pull code from GitHub repositories into the repository.
The University of California, Berkeley, has drafted a Software Sharing Guide that explains how to make your code citable. It walks you step by step through archiving your code on the data and code archiving platform Zenodo and obtaining a persistent identifier that can be used to cite the software.
“Research Data or academic software published by others used for research conducted at Berkeley Lab must be acknowledged where appropriate for academic credit, for example through data or software citations.” What software does this include? Does it include general purpose software like Excel, Matlab, LabVIEW, etc. and all types of data acquisition software?
Credit must be given to data or software that has been shared as part of the research process, for example by other researchers as part of an academic collaboration. In these situations, providing credit through citations is expected. On the other hand, if software has been purchased or licensed, there is usually no expectation of academic credit.
Cornell University has assembled a practical guide on how to use data citations.
Scripts, interactive documents or notebooks that analyze Research Data can be published under the license of the dataset. What if I am unsure if this applies to my software or want to publish it under a different license?
The guidance is that if scripts or interactive notebooks work only or primarily with a specific dataset, it makes sense to keep them with the dataset and under its license. If you are unsure whether your planned scripts or interactive notebooks fall into this category, please reach out to the Intellectual Property Office at firstname.lastname@example.org.
If the software is more general or should use a different license, follow the Software Disclosure and Distribution Policy.
How do I share methods and protocols for my research?
Sharing methods and protocols supports the reproducibility and replicability of research. Research articles in the scientific literature typically have a ‘methods’ section documenting how the research was performed, but these sections often do not describe research methods in sufficient detail. For fuller documentation, some scientific journals publish detailed methods or protocols as full research papers, which also provides academic credit. Another option is protocols.io, a platform for developing, collaborating on and sharing research methods in a citable manner. The University of California has subscribed to protocols.io, providing LBNL users with premium access.
When a PI leaves the Lab – what does “custodial ownership” mean?
Custodial ownership goes beyond digital ownership of a shared dataset. The custodian of an object is responsible for the obligations that arise from the ownership, for example to ensure the security of the data, and to ensure storage in compliance with retention requirements. For physical objects such as written Lab notes, custodians retain the physical object.
Who to Contact
Who do I contact for further information?
Questions about data storage? For assistance with data management plans or with the process of sharing research data please contact Science-IT at email@example.com.
Security concerns, or an accidental disclosure of personally identifiable information or protected health information? Contact firstname.lastname@example.org.
Concerns or disputes over the control, use, and integrity of Research Data or violations of the Research Data Policy? Contact the Research Integrity Officer (RIO) at email@example.com. The RIO may discuss concerns informally, which may include discussing them anonymously and/or hypothetically. If the circumstances described by an individual do not suggest research policy violations, the RIO will refer the matter to other offices or officials with responsibility for resolving a concern.