The European Genome-phenome Archive (EGA) provides services for archiving, processing and distributing all types of potentially identifiable human genetic and phenotypic data at the European Bioinformatics Institute (EBI).
1. Data sharing policies
Journals and funders increasingly require researchers to have a data sharing plan:
Wellcome Trust's "Policy on data management and sharing"
Nature "Availability of data and materials"
Public Library of Science (PLoS) "Sharing of Materials, Methods, and Data"
The EBI has run public databases that disseminate data to the wider scientific community for many years.
The EGA is designed to provide an appropriate archive for data on subjects who have consented to the use of their individual genetic data for biomedical research, but not for unlimited public data release.
Data can be submitted to the EGA prior to publication, at other significant milestones, and at study close in accordance with the Toronto statement.
The most suitable EBI archive for your data depends on the type of data you wish to submit and on whether the data require public or controlled access. Public access is defined as complete and open access to all files submitted. Controlled access, in the context of the EGA, requires a formal application to be made to access the submitted data files.
Whether data require controlled access is determined by the original informed consent agreements signed by the participants in your study; these consents prevent the derived data files from being distributed through open, public access. Controlled access data often consists of human data derived from medical research and consortium projects. All data submitted to the EGA MUST be subject to controlled access as defined by the original informed consents. If in doubt, consult the informed consent agreements that apply to your study.
Controlled access is not the same as holding a release until publication. All EBI archive resources allow you to hold a submission until publication.
As part of the submission process, submitted data files are packaged into datasets. Access to each dataset is controlled by a Data Access Committee (DAC), which must be registered as part of the submission process. A DAC may consist of one or several committee members, who are responsible for making data access decisions in response to applications from individuals wishing to access data. A DAC may be responsible for approving access to a single dataset or to multiple datasets.
An overview of the EGA data distribution model
A named individual within the Data Access Committee (DAC), referenced in the DAC access policy document, is given access to the EGA DAC admin tools. These tools allow EGA accounts to be created and managed, with access permissions for the datasets that fall under the responsibility of the DAC.
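The relationship between datasets, a DAC and approved applicants can be sketched in a few lines of Python. This is only an illustrative model of the access rules described above, not the EGA's own software: the class name, field names, account names and dataset identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccessCommittee:
    """Hypothetical model of a DAC: one or more members, one or more datasets."""

    name: str
    members: list[str]
    dataset_ids: set[str] = field(default_factory=set)
    # Maps an applicant's account name to the dataset IDs they may access.
    approvals: dict[str, set[str]] = field(default_factory=dict)

    def approve(self, applicant: str, dataset_id: str) -> None:
        """Record an approval; a DAC can only decide on its own datasets."""
        if dataset_id not in self.dataset_ids:
            raise ValueError(f"{dataset_id} is not controlled by {self.name}")
        self.approvals.setdefault(applicant, set()).add(dataset_id)

    def can_access(self, applicant: str, dataset_id: str) -> bool:
        """Access is denied unless the DAC has explicitly approved it."""
        return dataset_id in self.approvals.get(applicant, set())


# Placeholder identifiers, not real EGA accessions or accounts.
dac = DataAccessCommittee(
    name="Example DAC",
    members=["chair@example.org"],
    dataset_ids={"DATASET-A", "DATASET-B"},
)
dac.approve("researcher-1", "DATASET-A")
```

In this sketch an approval for one dataset grants nothing for another dataset under the same DAC, which mirrors the per-dataset access permissions described above.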
Data types accepted by the EGA can be split into three categories: Sequence, Array-based and Phenotypes.
All manufacturer-specific raw data formats for the major next-generation sequencing platforms are accepted, as are aligned BAM files and variation files in VCF format.
Data from all array-based technologies are accepted, which may include raw data, intensity and analysis files; there are no restrictions on the data formats accepted.
Phenotype and clinical data for secondary use are accepted, which may be submitted as phenopackets or another data format.
All samples submitted to the EGA must include the attributes of gender, donor ID (anonymised individual identifier) and phenotype information critical for facilitating analysis (for example, defining tumour and non-tumour samples and/or defining disease state) using controlled ontology terms.
The EGA recommends using the Experimental Factor Ontology (EFO) for describing your sample phenotypes.
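Before submission, it can help to check that every sample record carries the required attributes. The following is a minimal sketch of such a check, assuming samples are held as Python dictionaries. The key names (gender, donor_id, phenotype) and the placeholder ontology term are illustrative assumptions and may not match the names used in the EGA submission schema.

```python
# Attribute names are illustrative; the EGA submission schema may name them differently.
REQUIRED_ATTRIBUTES = ("gender", "donor_id", "phenotype")


def missing_attributes(sample: dict) -> list[str]:
    """Return the required attributes that are absent or blank in a sample record."""
    return [
        key for key in REQUIRED_ATTRIBUTES
        if not str(sample.get(key) or "").strip()
    ]


samples = [
    # The phenotype value stands in for a real controlled ontology term.
    {"alias": "sample-1", "gender": "female", "donor_id": "DONOR-001",
     "phenotype": "ONTOLOGY_TERM_PLACEHOLDER"},
    {"alias": "sample-2", "gender": "male", "donor_id": "", "phenotype": None},
]

for sample in samples:
    problems = missing_attributes(sample)
    status = "ok" if not problems else "missing: " + ", ".join(problems)
    print(f"{sample['alias']}: {status}")
```

The check only confirms that the attributes are present. It does not verify that a phenotype value is a valid ontology term, which would need a lookup against the chosen ontology.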