Metadata

Metadata helps researchers understand the content, context, and structure of a dataset. It provides details such as variables, units of measurement, data sources, and data collection methods.

As interdisciplinary research becomes more common, metadata becomes even more critical, because datasets from various sources may be combined and analyzed together. Good metadata helps researchers in one field understand and use data produced in another.

Documenting your data at the very beginning of your research project, and incorporating changes as the project progresses, ensures accuracy and completeness. It also makes the process much easier: constructing metadata at the end of a project is painful, and by then important details may have been lost or forgotten.

Stanford University also offers excellent information about metadata; the site is referenced below.

We will continue to add guidance and tools for creating metadata as they become available.

More About Metadata

Metadata can include information about data ownership, licensing, and ethical considerations. Based on these disclosures, researchers can determine whether they have the right to use and share the data, and they can cite your dataset accurately, giving you credit for your work and contributions to the field.

Data should be housed in a secure location prior to sharing; consult your department head, department administrator, or colleagues about available options. In addition, researchers can use the following tools for data collection and management.

REDCap is an electronic data collection tool with a user-friendly interface that allows researchers to build and manage online surveys and databases. All REDCap projects are subject to fees: $150 per project for the first year and $75 per year thereafter.

LabMaC, a UConn Health service, centralizes lab management tasks, including the metadata management required under NIH's mandate. Beyond meeting NIH's requirements, LabMaC aims to facilitate rigorous biomedical research and enhance reproducibility.

Data Standards

FAIRsharing.org maintains a registry of terminology artifacts, models/formats, reporting guidelines, and identifier schemas. A filtered search of the registry displays 60+ data standards that are:

recommended by a data policy from a journal, journal publisher, or funder.
actively maintained by a representative of the resource.
active and ready for use.

Additional filters (by subject, domain, species, etc.) are available to narrow down your choices. The FAIRsharing Standards Overview is available at https://doi.org/10.5281/zenodo.8186982

The README File

A README.txt file provides the information needed to make working with Digital Research Objects (DROs), such as numerical data, images, and spreadsheets, easier, and it increases their accessibility for users and researchers. The following guidelines will help you craft a comprehensive document to assist users. A separate README file is recommended for each distinct dataset; however, if the same data collection is repeated several times during your project, a single README file is sufficient for the whole set. The document may contain any or all of the following information:

Keywords: Terms or phrases that describe the subject, domain, and/or content of the data.
Persistent Identifiers (PIDs): Unique identifiers, such as ORCID iDs and DOIs (Digital Object Identifiers).
Naming Conventions: Conventions used to name, organize, and identify folders and files, including version control.
Data Ownership: Details regarding the creator, ownership/source(s), and rights associated with the data.
Data Content/Quality: Information on data validation, anomalies, accuracy, precision, and completeness.
Time Intervals: Information about the time resolution and frequency of data collection or timestamps indicating when data was collected or recorded.

Creating a README file at the beginning of your research process, and updating it consistently throughout your research, will help you to compile a final README file when your data is ready for deposit.

Publish your README file as a plain-text file, avoiding proprietary formats such as Microsoft Word whenever possible. The .txt format is recommended due to its generic, interoperable nature, which makes it ideal for sharing. If you’ve used (or prefer) a proprietary format, save the document in .txt format before sharing.
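If you prefer to start from a template, a short script can generate a plain-text README.txt skeleton with one placeholder per section listed above, which you then fill in and update as the project progresses. The Python code below is a minimal sketch, not an official UConn Health template; the section hints, the function names build_readme and write_readme, and the layout are illustrative assumptions.

```python
"""Write a plain-text README skeleton for one dataset.

Minimal sketch: section names mirror the README guidance; the function
names, hint text, and layout are hypothetical, not an official template.
"""
from datetime import date
from pathlib import Path

# Sections suggested for a README; add or drop sections as your project requires.
SECTIONS = [
    ("Keywords", "Terms or phrases describing the subject, domain, and content."),
    ("Persistent Identifiers (PIDs)", "ORCID iDs of creators, dataset DOI, etc."),
    ("Naming Conventions", "How folders and files are named and versioned."),
    ("Data Ownership", "Creator, owner/source(s), and rights or license."),
    ("Data Content/Quality", "Validation steps, known anomalies, accuracy, completeness."),
    ("Time Intervals", "Time resolution, collection frequency, timestamp conventions."),
]


def build_readme(dataset_title: str, updated: date) -> str:
    """Return README text: a title block, then one placeholder per section."""
    lines = [dataset_title, "=" * len(dataset_title), f"Last updated: {updated.isoformat()}", ""]
    for heading, hint in SECTIONS:
        lines.append(heading)
        lines.append("-" * len(heading))
        lines.append(f"[TODO: {hint}]")
        lines.append("")
    return "\n".join(lines)


def write_readme(folder: Path, dataset_title: str) -> Path:
    """Write README.txt into `folder` as plain UTF-8 text (no proprietary format)."""
    path = folder / "README.txt"
    path.write_text(build_readme(dataset_title, date.today()), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Print the skeleton instead of writing a file, so running the script has no side effects.
    print(build_readme("Example Survey Dataset", date.today()))
```

For example, write_readme(Path("survey_2024"), "Survey 2024") would create README.txt inside an existing survey_2024 folder (a hypothetical name). Regenerating only the "Last updated" line is left to you, so that text you have already filled in is never overwritten.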

The Data Dictionary

A data dictionary is a structured collection of metadata specific to the data elements within your dataset. It helps users understand the context of each data element, along with its attributes, relationships, and definitions. Entries may include:

Data Element Name: The name of the data element.
Definition/Description: A description of the data element, its purpose, and its context (e.g., weight in kilograms, height in centimeters).
Data Type: The type of data that can be stored in the field (e.g., text, numeric, or date, along with its format).
Values and Anomalies: The values used for a particular data element, and any deviations from standards, norms, or expected results.
Data Structure/Groups: Groups of data elements that together describe a unit in the system, and/or the relationships between data elements.
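A data dictionary can also be kept in machine-readable form, which lets you share it as a CSV file and check records against it for anomalies. The Python code below is a minimal sketch under stated assumptions: the DataElement fields map onto the entries listed above, "values" are expressed as optional numeric minimum/maximum bounds, and the example variables (weight_kg, height_cm, visit_date) and their ranges are hypothetical.

```python
"""Minimal data dictionary sketch: one entry per data element.

Field names follow the data dictionary entries; the example variables,
their ranges, and the CSV layout are hypothetical.
"""
import csv
import io
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class DataElement:
    name: str                          # Data Element Name
    description: str                   # Definition/Description, including units
    data_type: str                     # Data Type: "numeric", "text", or "date"
    group: str                         # Data Structure/Group the element belongs to
    valid_min: Optional[float] = None  # Values: lowest expected value (numeric only)
    valid_max: Optional[float] = None  # Values: highest expected value (numeric only)


DICTIONARY = [
    DataElement("weight_kg", "Body weight in kilograms", "numeric", "anthropometrics", 20, 300),
    DataElement("height_cm", "Height in centimeters", "numeric", "anthropometrics", 50, 250),
    DataElement("visit_date", "Date of visit (YYYY-MM-DD)", "date", "visit"),
]


def to_csv(entries: list) -> str:
    """Serialize the dictionary as CSV (None bounds become empty cells)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(DataElement)])
    for e in entries:
        writer.writerow([getattr(e, f.name) for f in fields(DataElement)])
    return buf.getvalue()


def find_anomalies(record: dict, entries: list) -> list:
    """Return messages for values that are missing or violate their declared type or range."""
    problems = []
    for e in entries:
        raw = record.get(e.name, "")
        if raw == "":
            problems.append(f"{e.name}: missing")
            continue
        if e.data_type == "numeric":
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"{e.name}: not numeric ({raw!r})")
                continue
            if e.valid_min is not None and value < e.valid_min:
                problems.append(f"{e.name}: {value} below minimum {e.valid_min}")
            if e.valid_max is not None and value > e.valid_max:
                problems.append(f"{e.name}: {value} above maximum {e.valid_max}")
        elif e.data_type == "date":
            try:
                date.fromisoformat(raw)
            except ValueError:
                problems.append(f"{e.name}: not an ISO date ({raw!r})")
    return problems


if __name__ == "__main__":
    print(to_csv(DICTIONARY))
    # A record with a likely unit error (height in meters) and an impossible month.
    print(find_anomalies({"weight_kg": "72.5", "height_cm": "1.8", "visit_date": "2024-13-01"}, DICTIONARY))
```

Keeping the dictionary in code or CSV alongside the data means the same definitions that document the dataset can also flag anomalies during data collection; flagged values still need human review rather than automatic correction.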

From Stanford University: Creating Metadata for Scientific Research

Creating metadata manually can be a confusing and time-consuming task. Stanford University offers information about the process, including existing tools that help researchers automate the creation of metadata.

Create metadata for your research project - Guides from Stanford University

We will update this page as we gain more knowledge on this topic.
