Sequence variations in VCF
Metadata for VCF files must be submitted using a combination of Webin and XML submission to the REST server. This page describes the workflow.
Large-scale and/or frequent submitters may wish to submit all of their metadata programmatically to our REST server.
**Metadata submitted as XMLs or through the Webin tool will be made publicly available to view on the EGA website and other EBI resource/partner websites.**

The metadata objects required for VCF (analysis) submissions are as follows:
- Study: information about the sequencing study
- Samples: information about the sequencing samples
- Analysis: references the analysis (VCF) files; associated with samples and study
- DAC: contains information about the Data Access Committee (DAC)
- Policy: contains the Data Access Agreement (DAA); associated with DAC
- Dataset: contains the collection of run/analysis data files to be subject to controlled access; associated with Policy
**Study, samples, DAC and policy metadata can all be registered prior to uploading files.**
**The Analysis object must be submitted as an Analysis XML to the REST server; all other objects may be submitted using Webin.**
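The associations listed above form a small dependency graph: an object can only be registered once the objects it is associated with already exist. As a minimal sketch (our reading of this page, not an EGA tool), the Python below encodes those associations and uses the standard-library `graphlib` module (Python 3.9+) to derive one registration order that respects them.

```python
from graphlib import TopologicalSorter

# Our reading of the associations described on this page (not an EGA API):
# each key can only be registered once every object in its set exists.
DEPENDENCIES: dict[str, set[str]] = {
    "Study": set(),
    "Samples": set(),
    "DAC": set(),
    "Policy": {"DAC"},
    "Analysis": {"Study", "Samples"},
    "Dataset": {"Policy", "DAC", "Analysis"},
}


def registration_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one order in which the objects can be registered."""
    # static_order() yields each node only after all of its predecessors
    # and raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(registration_order(DEPENDENCIES))
```

Any order in which each object follows its dependencies is valid. Only the Analysis step goes through the REST server; the other objects are registered in Webin.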
1) Register your Study, Samples, DAC and Policy using Webin
Go to the EGA Webin page and log in using your submission account name and password.
Components can be registered individually (e.g. Study, Samples, DAC and Policy) or together by selecting Experiments and reads if your data files have been uploaded, and can be registered in any order.
- Study
- Samples
- Data Access Committee (DAC)
- Data access policy
Register your Study
- Go to the New Submission tab
- Choose Register study (project), click Next and complete the web form
- Click submit to accession your study
To use the study accession number in a publication, we suggest the following format:
"Sequence data has been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGASXXXXXXXXXXX. Further information about EGA can be found at https://ega-archive.org and in 'The European Genome-phenome Archive of human data consented for biomedical research' (http://www.nature.com/ng/journal/v47/n7/full/ng.3312.html)."
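If you generate this statement from a manuscript template or script, a small helper can check that the accession looks like a study accession before inserting it. This is a sketch under an assumption: that study accessions are 'EGAS' followed by 11 digits, matching the EGASXXXXXXXXXXX placeholder above. `format_citation` is a hypothetical helper name.

```python
import re

# Assumed shape of a study accession: 'EGAS' followed by 11 digits,
# matching the EGASXXXXXXXXXXX placeholder used on this page.
STUDY_ACCESSION = re.compile(r"EGAS[0-9]{11}")

CITATION_TEMPLATE = (
    "Sequence data has been deposited at the European Genome-phenome "
    "Archive (EGA), which is hosted by the EBI and the CRG, under "
    "accession number {accession}."
)


def format_citation(accession: str) -> str:
    """Return the suggested data-availability sentence for a study accession."""
    accession = accession.strip().upper()
    if not STUDY_ACCESSION.fullmatch(accession):
        raise ValueError(f"not an EGA study accession: {accession!r}")
    return CITATION_TEMPLATE.format(accession=accession)


if __name__ == "__main__":
    # Dummy accession, for illustration only.
    print(format_citation("EGAS00000000001"))
```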
Register your Samples
All samples should have 'Gender', 'Donor id (Subject id)' and 'Phenotype' attributes.
Gender should be described as 'male', 'female' or 'unknown'. If 'unknown' due to a known sex chromosome aneuploidy, please create a user defined attribute called 'Sex chromosome karyotype' and add the appropriate value, for example, 'XXY'.
Donor id (Subject id) should be a de-identified subject handle. If unknown, please add 'unknown' to the field.
Phenotypes should, where possible, be Experimental Factor Ontology (EFO) accessions. If no term describes your phenotype, please use free text. All sample phenotypes considered important for further analysis of the data (for example, tumour type) should be provided. Additional phenotype attributes can be created by defining your own attributes, using the notation 'phenotype2', 'phenotype3', etc.
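Before registering many samples, it can help to check these attribute rules locally. The sketch below is a hypothetical pre-flight check: it assumes each sample is held as a plain dictionary whose keys (`gender`, `subject_id`, `phenotype`, and optional `phenotype2`, `phenotype3`, ...) mirror the attribute names on this page. These keys are illustrative and are not Webin field identifiers.

```python
import re

ALLOWED_GENDERS = {"male", "female", "unknown"}
# Extra phenotype attributes use the 'phenotype2', 'phenotype3', ... notation.
EXTRA_PHENOTYPE = re.compile(r"phenotype([2-9]|[1-9][0-9]+)")


def check_sample(sample: dict[str, str]) -> list[str]:
    """Return the problems found in one sample's attributes (empty if none).

    The dictionary keys are hypothetical local names, not Webin fields.
    """
    problems = []
    gender = sample.get("gender", "")
    if gender not in ALLOWED_GENDERS:
        problems.append(f"gender must be male, female or unknown, got {gender!r}")
    # The subject id is required; the literal 'unknown' is accepted.
    if not sample.get("subject_id"):
        problems.append("subject_id is missing (use 'unknown' if not known)")
    # EFO accessions are preferred but free text is allowed, so only
    # presence is checked here.
    if not sample.get("phenotype"):
        problems.append("phenotype is missing")
    for key in sample:
        is_extra = key.startswith("phenotype") and key != "phenotype"
        if is_extra and not EXTRA_PHENOTYPE.fullmatch(key):
            problems.append(f"unexpected phenotype attribute name {key!r}")
    return problems


if __name__ == "__main__":
    print(check_sample({"gender": "unknown", "subject_id": "unknown",
                        "phenotype": "free-text phenotype"}))
```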
- Go to the New Submission tab
- Choose Register samples and click Next
Register your Data Access Committee (DAC)
Further information on the role of your DAC can be found here.
- Go to the New Submission tab
- Choose Register Data Access Committee (DAC), click Next and follow the online prompts
Register your Data access policy
Your Data access policy provides the terms and conditions of data use; this is also referred to as the Data Access Agreement (DAA).
Completion of a DAA by the applicant/s should form part of the application process to the Data Access Committee (DAC).
- Go to the New Submission tab
- Choose Register Data access policy, click Next and follow the online prompts
2) Submit your Submission and Analysis XML to the REST server
Webin does not currently support the submission of Analysis objects (VCF files). We are working on adding this functionality to Webin, but in the meantime, all submitters must prepare an Analysis XML and upload it to the REST server.
Below you will find a step by step guide of the process. Please contact helpdesk@ega-archive.org should you require additional support.
i) Prepare a Submission XML and an Analysis XML; click on the links to be taken to a description and example of each XML. The latest XML schemas can be found here.
ii) Upload your Submission XML and Analysis XML to the REST production server: https://www.ebi.ac.uk/ena/submit/drop-box/submit/
**The field marked 'Location in the drop box' can be left blank**
iii) Upon successful submission to the REST server you will obtain analysis accessions (EGAZXXXXXXXXXXX) for use in your dataset. Be sure to keep a copy of these accessions for later use.
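Steps i) to iii) can also be scripted. The sketch below uses only the Python standard library to build a minimal Analysis XML for one encrypted VCF file and a matching Submission XML, prepare (without sending) a multipart POST to the drop-box URL above, and extract analysis accessions from the returned receipt. Several details are assumptions rather than documented EGA behaviour: the element and attribute names follow our understanding of the SRA analysis and submission schemas, so always validate against the latest schemas; the PROTECT action, the SUBMISSION/ANALYSIS form-field names and HTTP Basic authentication mirror the ENA REST submission service; and the receipt is assumed to be a RECEIPT element with a `success` attribute and one ANALYSIS child per object. All aliases, center names, file names and checksums are placeholders.

```python
import base64
import urllib.request
import uuid
import xml.etree.ElementTree as ET

DROP_BOX_URL = "https://www.ebi.ac.uk/ena/submit/drop-box/submit/"


def build_analysis_xml(alias, center, title, study_acc, sample_accs,
                       assembly, filename, checksum, unencrypted_checksum):
    """Step i): a minimal ANALYSIS_SET for one encrypted VCF (assumed schema)."""
    root = ET.Element("ANALYSIS_SET")
    analysis = ET.SubElement(root, "ANALYSIS", alias=alias, center_name=center)
    ET.SubElement(analysis, "TITLE").text = title
    ET.SubElement(analysis, "DESCRIPTION").text = title  # placeholder: reuse title
    ET.SubElement(analysis, "STUDY_REF", accession=study_acc)
    for acc in sample_accs:
        ET.SubElement(analysis, "SAMPLE_REF", accession=acc)
    analysis_type = ET.SubElement(analysis, "ANALYSIS_TYPE")
    variation = ET.SubElement(analysis_type, "SEQUENCE_VARIATION")
    assembly_el = ET.SubElement(variation, "ASSEMBLY")
    ET.SubElement(assembly_el, "STANDARD", refname=assembly)
    files = ET.SubElement(analysis, "FILES")
    ET.SubElement(files, "FILE", filename=filename, filetype="vcf",
                  checksum_method="MD5", checksum=checksum,
                  unencrypted_checksum=unencrypted_checksum)
    return ET.tostring(root, encoding="unicode")


def build_submission_xml(alias, center, analysis_source="analysis.xml"):
    """Step i): a SUBMISSION_SET that adds the analysis and protects it."""
    root = ET.Element("SUBMISSION_SET")
    submission = ET.SubElement(root, "SUBMISSION", alias=alias, center_name=center)
    actions = ET.SubElement(submission, "ACTIONS")
    add_action = ET.SubElement(actions, "ACTION")
    ET.SubElement(add_action, "ADD", source=analysis_source, schema="analysis")
    # PROTECT is assumed to mark the objects as controlled access.
    ET.SubElement(ET.SubElement(actions, "ACTION"), "PROTECT")
    return ET.tostring(root, encoding="unicode")


def build_upload_request(username, password, xml_files, url=DROP_BOX_URL):
    """Step ii): build (but do not send) a multipart POST of the XML files.

    xml_files maps an assumed form-field name ("SUBMISSION", "ANALYSIS")
    to a (filename, xml_text) pair.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for field, (filename, xml_text) in xml_files.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/xml\r\n\r\n"
            f"{xml_text}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    # HTTP Basic authentication with the submission account (assumed).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data="".join(parts).encode("utf-8"),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}",
                 "Authorization": f"Basic {token}"},
    )


def analysis_accessions(receipt_xml):
    """Step iii): map each analysis alias to its accession in the receipt."""
    root = ET.fromstring(receipt_xml)
    if root.get("success", "").lower() != "true":
        # Report any ERROR messages carried by the (assumed) receipt format.
        errors = [e.text or "" for e in root.iter("ERROR")]
        raise RuntimeError("submission failed: " + "; ".join(errors))
    return {node.get("alias"): node.get("accession")
            for node in root.findall("ANALYSIS") if node.get("accession")}
```

To submit, pass both XML strings as `{"SUBMISSION": ("submission.xml", submission_xml), "ANALYSIS": ("analysis.xml", analysis_xml)}`, open the request with `urllib.request.urlopen`, and give the decoded response body to `analysis_accessions`. Building the request separately from sending it lets you inspect the payload before anything reaches the server.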
3) Submit your dataset using Webin
The dataset describes the data files that make up the dataset, identified by their run (EGARXXXXXXXXXXX) and analysis (EGAZXXXXXXXXXXX) accessions, and links the collection of data files to a specified Data Access Committee and Data access policy.
As a result, you must have registered your Analysis, Data Access Committee (DAC) and Data access policy before submitting your Dataset.
Please consider how many datasets your submission consists of; for example, a case-control study is likely to consist of at least two datasets. We also suggest describing separate datasets for studies that use the same samples but different sequencing technologies. Please contact the EGA Helpdesk for further assistance.
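One way to plan datasets before opening the Webin form is to group your run and analysis accessions by the factors that should separate them, such as case/control status or sequencing technology. The sketch below is hypothetical: the `DataFile` record and its `group` and `technology` labels are names you would maintain yourself, and the prefix check assumes run and analysis accessions start with EGAR and EGAZ, as on this page.

```python
from collections import defaultdict
from typing import NamedTuple


class DataFile(NamedTuple):
    """Hypothetical local record for one run or analysis."""
    accession: str   # run (EGAR...) or analysis (EGAZ...) accession
    group: str       # e.g. "case" or "control"
    technology: str  # e.g. the sequencing technology used


def plan_datasets(files: list[DataFile]) -> dict[tuple[str, str], list[str]]:
    """Propose one dataset per (group, technology) combination."""
    datasets: dict[tuple[str, str], list[str]] = defaultdict(list)
    for f in files:
        # Assumed prefixes for run and analysis accessions.
        if not f.accession.startswith(("EGAR", "EGAZ")):
            raise ValueError(f"{f.accession} is not a run or analysis accession")
        datasets[(f.group, f.technology)].append(f.accession)
    return dict(datasets)
```

Each key in the result is a candidate dataset to register with the same DAC and Data access policy; whether to split further is a judgement for your study.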
- Go to the New Submission tab
- Choose Submit Dataset and click Next
- Select/Register Data Access Committee (DAC) and Data access policy
- Register your dataset
- After submitting your dataset you should contact the EGA Helpdesk to provide a release date for your dataset.
Datasets are automatically held (i.e. not released) unless they are affiliated to a study that has already been released.
What happens after the submission of a dataset?
All datasets affiliated to unreleased studies are automatically placed on hold until the authorised submitter or DAC contact instructs our helpdesk (helpdesk@ega-archive.org) to release the study.
Datasets affiliated to released studies will automatically be released.
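The hold and release rules above can be summarised in a few lines. This is a sketch of the documented behaviour only, with hypothetical names; the release itself is carried out by EGA, not by submitter code.

```python
from enum import Enum


class Status(Enum):
    HELD = "held"
    RELEASED = "released"


def dataset_status(study_released: bool, release_instructed: bool = False) -> Status:
    """Expected status of a submitted dataset (hypothetical helper).

    release_instructed means the authorised submitter or DAC contact has
    asked the helpdesk to release the study and that request has been acted on.
    """
    # Datasets follow their study: released studies release their datasets,
    # otherwise datasets stay on hold until the study is released.
    if study_released or release_instructed:
        return Status.RELEASED
    return Status.HELD
```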
Once your study is released, the named DAC contacts will be given access to the EGA DAC admin tools to create and manage EGA accounts with access permissions to the dataset(s) affiliated with the study.
Further information regarding the role of the Data Access Committee can be found here
Finally, your data is archived within our databases and prepared for encrypted distribution upon the request of permitted EGA account holders.
We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.
- © COPYRIGHT 2023. EGA CONSORTIUM