Submitting array-based metadata
The metadata required for an array-based submission must be submitted using the EGA Submitter Portal and by completing the Array-based format (AF) spreadsheet. This page describes the guidelines for this workflow.
**Metadata submitted as XMLs or through the EGA Submitter Portal will be made publicly available on the EGA website and on other resources and partner websites.**
Use the Submitter Portal to register your Study, Samples, Data Access Committee (DAC) and Policy. This online interface enables you to create new and edit existing submissions.
Go to the EGA Submitter Portal page and log in using your assigned ega-box and password.
- Go to the “New Submission” tab
- Choose “Register study” (project) and complete the web form
- Save the object and click Submit (the blue arrow)
- Take a note of your study accession number (EGASXXXXXXXXXXX)
To use the study accession number in a publication, we suggest the following format:
"Sequence data has been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGASXXXXXXXXXXX. Further information about EGA can be found at https://ega-archive.org and in 'The European Genome-phenome Archive of human data consented for biomedical research' (http://www.nature.com/ng/journal/v47/n7/full/ng.3312.html)."
- Choose “Register samples” and complete the web form
- You can register them manually, populating the fields for each sample individually
- You can register your samples using a template (recommended for large batches of samples)
- Once all samples are registered, submit them to obtain the list of accession numbers (EGANXXXXXXXXXXX).
- You can submit each sample manually by clicking the blue arrow for each object
- Alternatively, go to the “Edit title and description” tab and click the “I’m done. Process this submission” button to submit all samples at once. It may take a few minutes for the whole submission to be processed.
- Save the list of accession numbers. These will be needed for the spreadsheet.
Registering Data Access Committee
Further information on the role of your DAC can be found here.
- Choose “Register Data Access Committee (DAC)” and follow the online prompts
- Take a note of the DAC accession number (EGACXXXXXXXXXXX)
Your Data Access Policy provides the terms and conditions of data use. This is also referred to as the Data Access Agreement (DAA).
Completion of a DAA by the applicant(s) should form part of the application process to the Data Access Committee (DAC).
- Choose the “Register data access policy” tab and follow the online prompts
- Take a note of the Policy accession number (EGAPXXXXXXXXXXX)
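Before filling in the AF spreadsheet, it can help to check that the accession numbers you noted down are well formed. The Python sketch below is a hypothetical helper, not an EGA tool: it recognises the four prefixes used on this page (EGAS, EGAN, EGAC, EGAP) and assumes, based on the EGASXXXXXXXXXXX-style placeholders, that each prefix is followed by exactly 11 digits.

```python
import re

# Prefixes taken from this guide; the 11-digit suffix is an ASSUMPTION
# based on the EGASXXXXXXXXXXX-style placeholders.
ACCESSION_TYPES = {
    "EGAS": "study",
    "EGAN": "sample",
    "EGAC": "DAC",
    "EGAP": "policy",
}
ACCESSION_RE = re.compile(r"(EGA[SNCP])([0-9]{11})")


def accession_type(accession: str) -> str:
    """Return the object type of an accession number, or raise ValueError."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a recognised accession number: {accession!r}")
    return ACCESSION_TYPES[match.group(1)]


# Hypothetical accession numbers, for illustration only.
for acc in ["EGAS00001000001", "EGAC00001000002", "EGAP00001000003"]:
    print(acc, "->", accession_type(acc))
```

Using `fullmatch` rejects values with extra characters on either side, which catches copy-paste mistakes such as a truncated or concatenated accession.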
Complete the Array-based format (AF) spreadsheet
Once you have registered your Study, Samples, DAC and Policy using the Submitter Portal, you must complete and return the AF spreadsheet.
The AF spreadsheet consists of four tabs:
- Tab 1) Webin accessions: Provide the accession numbers for your study, DAC and policy, and add your ega-box number.
- Tab 2) Samples & phenotypes: Leave this tab blank. All samples MUST be registered via the Submitter Portal.
- Tab 3) Datasets: Describe the dataset(s) to be created.
- Tab 4) Data files: Define how your data will be organised into datasets and packets for distribution (the linkage between samples and files).
If you need further assistance after reading the guide below, please contact the EGA helpdesk.
Once the AF spreadsheet is populated, send it to the EGA helpdesk for validation.
AF spreadsheet: Webin accessions
If your submission requires multiple DACs or policies, separate the accession numbers with a semicolon (‘;’).
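A minimal Python sketch of building and reading such a semicolon-separated cell. The function names are hypothetical; the sketch writes no spaces around the semicolon, since this guide does not say whether spaces are accepted, and it drops duplicate accessions while keeping their original order.

```python
def join_accessions(accessions: list[str]) -> str:
    """Join accession numbers into one cell, separated by ';' (no spaces).

    Blank entries are skipped and duplicates are dropped, keeping the
    order in which accessions first appear.
    """
    cleaned = [acc.strip() for acc in accessions if acc.strip()]
    # dict.fromkeys keeps insertion order and discards repeated keys.
    return ";".join(dict.fromkeys(cleaned))


def split_accessions(cell: str) -> list[str]:
    """Split a ';'-separated cell back into a list of accession numbers."""
    return [acc.strip() for acc in cell.split(";") if acc.strip()]


# Hypothetical DAC accessions, for illustration only.
cell = join_accessions(["EGAC00001000001", " EGAC00001000002 ", "EGAC00001000001"])
print(cell)  # EGAC00001000001;EGAC00001000002
```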
AF spreadsheet: Samples & phenotypes
AF spreadsheet: Datasets
We suggest that each dataset consist of a common set of data. The example below consists of two datasets, grouped by shared data type, technology and case/control status.
We also ask you to provide the number of unique samples that make up each dataset, the Data Access Committee (DAC) responsible for providing the named dataset, and its policy (EGAP).
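The unique-sample count is easy to get wrong when a sample is linked to several files. The hypothetical Python model of one Datasets row below uses field names that are assumptions (they may differ from the real spreadsheet headers) and counts each sample accession only once.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRow:
    """One row of the Datasets tab; the field names are hypothetical."""

    title: str
    description: str
    dac_accession: str
    policy_accession: str
    sample_accessions: list[str] = field(default_factory=list)

    @property
    def unique_sample_count(self) -> int:
        # A sample listed more than once (e.g. linked to several files)
        # is counted only once.
        return len(set(self.sample_accessions))


# Hypothetical dataset of case samples, for illustration only.
cases = DatasetRow(
    title="Cases, genotyping array",
    description="Genotype calls for case samples (illustrative only)",
    dac_accession="EGAC00001000001",
    policy_accession="EGAP00001000001",
    sample_accessions=["EGAN00001000001", "EGAN00001000002", "EGAN00001000001"],
)
print(cases.unique_sample_count)  # 2
```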
AF spreadsheet: Data files
The following explains how to map your samples to the array-based files added to your upload account (4th tab).
Below are some practical examples of how to register the linkage between samples and files.
Case 1) One sample, or a list of samples, in different datasets:
If you have a list of samples that belong to different datasets, repeat each sample's accession number in the first column, once for every dataset it belongs to, and link it to the corresponding dataset in that row.
Each row is one sample-file-dataset linkage.
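A minimal Python sketch of Case 1. It assumes hypothetical column headers: this page only describes the first column (the sample accession) and the “Additional files” column. Each (sample, file, dataset) linkage becomes its own row, so a sample that belongs to two datasets appears twice.

```python
import csv
import io

# Hypothetical column headers: use those in your copy of the AF spreadsheet.
COLUMNS = ["Sample accession", "Files", "Additional files", "Dataset"]


def linkage_rows(links: list[tuple[str, str, str]]) -> list[dict[str, str]]:
    """Build one Data files row per (sample, file, dataset) linkage.

    A sample that belongs to several datasets simply appears in several rows.
    """
    return [
        {"Sample accession": sample, "Files": filename,
         "Additional files": "", "Dataset": dataset}
        for sample, filename, dataset in links
    ]


def to_csv(rows: list[dict[str, str]]) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example: one sample shared by two datasets.
links = [
    ("EGAN00001000001", "sample1_a.gpg", "dataset_A"),
    ("EGAN00001000001", "sample1_b.gpg", "dataset_B"),
]
print(to_csv(linkage_rows(links)))
```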
Case 2) One sample linked to several files:
To link multiple files to one sample, you MUST separate the filenames with a semicolon (“;”). Example: file1.gpg;file2.gpg;file3.gpg
To add an extra file to the sample (for example, a phenotype or .Rdata file), use the “Additional files” column.
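For Case 2, the hypothetical Python helper below builds a single row: the files are joined with ';' without spaces, following the example above, and an optional extra file goes in the “Additional files” column. The other column names and all example values are assumptions.

```python
from typing import Optional


def sample_row(sample: str, files: list[str], dataset: str,
               additional_file: Optional[str] = None) -> dict[str, str]:
    """Build one Data files row linking a sample to several files.

    Filenames are joined with ';' and no spaces; an optional extra file
    (e.g. phenotypes or .Rdata) goes in the "Additional files" column.
    Column names other than "Additional files" are hypothetical.
    """
    for name in files:
        # A ';' inside a filename would be read as a separator.
        if ";" in name:
            raise ValueError(f"filename must not contain ';': {name!r}")
    return {
        "Sample accession": sample,
        "Files": ";".join(files),
        "Additional files": additional_file or "",
        "Dataset": dataset,
    }


row = sample_row(
    "EGAN00001000001",  # hypothetical sample accession
    ["file1.gpg", "file2.gpg", "file3.gpg"],
    "dataset_A",  # hypothetical dataset name
    additional_file="sample1_phenotypes.gpg",  # hypothetical filename
)
print(row["Files"])  # file1.gpg;file2.gpg;file3.gpg
```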
Important note: You MUST upload the md5sum values of both the encrypted and the unencrypted version of every file uploaded to your submission account, using the filename nomenclature file.gpg, file.gpg.md5 and file.md5. Your submission will not be processed unless md5 values are supplied for all files in the CORRECT format.
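The md5 values can be computed with any standard tool. A minimal Python sketch using the standard-library hashlib module follows; the function names are hypothetical, and the sketch only computes the two checksums, so naming and uploading them still follows the nomenclature in the note above.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_pair(unencrypted: Path, encrypted: Path) -> dict[str, str]:
    """Return the MD5 values of a file before and after encryption."""
    return {
        "unencrypted": md5sum(unencrypted),
        "encrypted": md5sum(encrypted),
    }
```

Reading in chunks keeps memory use constant, so large array files do not need to fit in RAM.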
What happens after the submission of a dataset?
All datasets affiliated with unreleased studies are automatically placed on hold until the authorised submitter or DAC contact instructs our EGA helpdesk to release the study.
Datasets affiliated with released studies are released automatically.
Once your study is released, the named DAC contacts will be given access to the EGA DAC admin tools, where they can create and manage EGA accounts with access permissions to the dataset(s) affiliated with the study.
Further information regarding the role of the Data Access Committee can be found here.
Finally, your data is archived in our databases and prepared for encrypted distribution at the request of authorised EGA account holders.
We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.