The EGA Submitter Portal provides tools to facilitate the submission of metadata for human data to the European Genome-phenome Archive (EGA). This page provides a video tutorial on how to use the EGA Submitter Portal and is divided into ordered sections that follow the steps of a submission.
This tutorial demonstrates how to use the Submitter Portal to register your metadata. The video focuses on run-based submission (for raw FASTQ files and aligned BAM/CRAM data); analysis-based submission (for BAM/BAI pairs, variation (VCF) files and phenotype files) is described below.
Before registering the metadata, it is very important that all submitters have encrypted their files and uploaded them to their EGA submission account (ega-box).
The EGA is a shared, public service with limited resources. To manage the available resources, we enforce a soft limit of 10Tb per submission account at any one time. Please do not exceed this limit. If you are approaching it, please contact the EGA Helpdesk so that we can advise on how to register the associated metadata and trigger the archiving of files, allowing you to continue with your submission. If we note that your submission account exceeds 10Tb on a consistent basis, your password will be changed until metadata is associated.
Please note that some metadata (run and analysis objects) cannot be registered until at least 24 hours after the files have been uploaded to your box. Additionally, submissions to the EGA can take approximately one month, so please allow plenty of time for the submission and archiving processes.
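The 24-hour waiting period can be checked with simple date arithmetic. The following is a minimal Python sketch, not part of any EGA tool: the function names and the upload timestamp are hypothetical, and only the 24-hour figure comes from the text above.

```python
from datetime import datetime, timedelta, timezone

# Waiting period taken from the text above: at least 24 hours after upload.
MIN_WAIT = timedelta(hours=24)

def earliest_registration(uploaded_at):
    """Earliest time at which run/analysis objects for a file can be registered."""
    return uploaded_at + MIN_WAIT

def can_register(uploaded_at, now=None):
    """Return True once at least 24 hours have passed since the upload."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= earliest_registration(uploaded_at)

# Hypothetical upload time, used only for illustration.
uploaded = datetime(2024, 1, 10, 9, 30, tzinfo=timezone.utc)
print(earliest_registration(uploaded).isoformat())  # 2024-01-11T09:30:00+00:00
```

Using timezone-aware timestamps avoids ambiguity when the upload and the check happen in different time zones.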
The metadata objects required for read submissions are as follows:
- Study: Information about the sequencing study.
- Samples: Information about the sequencing samples.
- Experiments: Information about the sequencing methods, protocols and machines. Experiments generate the linkage between samples and study. Only necessary for FASTQ and BAM/CRAM submissions.
- Runs: Samples, experiments and files are linked through runs. These are the appropriate objects for FASTQ and BAM/CRAM submissions.
- Analysis: References the analysis (BAM) files; associated with samples and study.
- DAC: Contains information about the Data Access Committee (DAC).
- Policy: Contains the Data Access Agreement (DAA); associated with DAC.
- Dataset: Contains the collection of runs/analysis data files to be subject to controlled access; associated with Policy.
- **Study, samples, DAC and policy metadata can all be registered prior to uploading files**
If you are performing array-based submission(s), the Submitter Portal should only be used to register the Study, Samples, Data Access Committee (DAC) and Policy metadata objects. We are currently working on features to support the creation of array metadata submissions using the portal.
EGA objects can be identified by their unique accession IDs. These IDs are displayed everywhere, shared among all EGA locations, and specific to each object type (see the list below).
| EGA Accession ID | EGA Object description |
|---|---|
| EGAS | EGA Study Accession ID |
| EGAC | EGA DAC Accession ID |
| EGAP | EGA Policy Accession ID |
| EGAN | EGA Sample Accession ID |
| EGAR | EGA Run Accession ID |
| EGAX | EGA Experiment ID |
| EGAZ | EGA Analysis Accession ID |
| EGAD | EGA Dataset Accession ID |
| EGAB | EGA Submission ID |
| EGAF | EGA File Unique Accession ID |
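When handling many accessions in a script, the prefix table above can be turned into a simple lookup. The sketch below is illustrative: the function name is hypothetical, and the assumption that an accession is a four-letter prefix followed only by digits is not stated on this page.

```python
import re

# Prefix -> object type, copied from the table above.
EGA_PREFIXES = {
    "EGAS": "Study",
    "EGAC": "DAC",
    "EGAP": "Policy",
    "EGAN": "Sample",
    "EGAR": "Run",
    "EGAX": "Experiment",
    "EGAZ": "Analysis",
    "EGAD": "Dataset",
    "EGAB": "Submission",
    "EGAF": "File",
}

# Assumption: an accession is one of the prefixes followed by one or more digits.
_ACCESSION_RE = re.compile(r"^(EGA[A-Z])(\d+)$")

def object_type(accession):
    """Return the object type for an accession, or None if it is not recognised."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return EGA_PREFIXES.get(match.group(1))
```

For example, `object_type("EGAD00000000001")` (a made-up accession) returns `"Dataset"`, while a string with an unknown prefix returns `None`.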
The 12 short videos below provide a worked example, with detailed instructions on how to use the EGA Submitter Portal to perform metadata submissions to the EGA.
Points to Notice
There is a strong relationship among EGA metadata objects. Unless the primary objects (study, samples and DAC) are properly submitted, the secondary objects linked to them (experiments, runs, analyses or policies) will not validate. The tertiary metadata object (dataset) requires all the other objects to be submitted before it can be validated and submitted. Should you prefer to submit everything at once, please generate all the objects without validating them, then go to the "Edit title and description" tab and click "I'm done". This will validate and submit all objects together.
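The dependency between object types can be pictured as a small graph. The Python sketch below is illustrative only and is not how the portal is implemented: the names `DEPENDS_ON` and `submission_order` are hypothetical, and the links are encoded as described on this page (experiments link samples and study, runs link samples and experiments, analyses are associated with samples and study, a policy with a DAC, a dataset with runs/analyses and a policy). Listing both runs and analyses as dataset dependencies is a simplification, since a dataset may contain only one of them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each object type maps to the types that must be submitted before it.
DEPENDS_ON = {
    "study": set(),
    "sample": set(),
    "dac": set(),
    "experiment": {"study", "sample"},
    "run": {"sample", "experiment"},
    "analysis": {"study", "sample"},
    "policy": {"dac"},
    "dataset": {"run", "analysis", "policy"},
}

def submission_order():
    """Return one valid order in which the object types can be submitted."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())

print(submission_order())
```

Any order produced this way places the primary objects before the objects that reference them and the dataset last, which mirrors what clicking "I'm done" has to achieve when everything is submitted together.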
The EGA Submitter Portal video focuses on a single use case: the submission of runs.
Aligned BAM files are expected to be submitted as runs (1-to-1 cardinality with samples). Analyses should only be used for linking BAM/BAI pairs, VCF and phenotype files to samples. The analysis is an EGA-specific metadata object that links samples to files. This object also stores some metadata about your experiments, such as the experiment type, genome reference, or the platform used. **If only BAM or CRAM alignment files are submitted but not the original unaligned FASTQ files, then please make sure that the BAM or CRAM files also contain the unaligned reads. This is critical to enable primary re-analysis and re-alignment of the dataset using new tools or future genome assemblies.**
Prior to defining the Analysis
In order to register your analysis, you should first:
Please note that the EGA allows for the re-use of registered metadata. Therefore the previously registered Study, DAC, Policy or samples can be re-used for the analysis data submission.
Defining the Analysis
- In the Submitter Portal accordion, select the option "Link files and samples" and click "Analysis Data".
- Start by selecting the sample(s) to be linked to the file, and populate the required attribute fields. Please note the existence of mandatory fields. These must be populated.
- Finally, select the file and file type to be associated with the sample. If you wish to add additional files, click the button "Add additional files".
- Your analysis will be created in draft status. To learn more about validating, editing or deleting the analysis, view the Submitter Portal video section above.
Points to notice
When populating the chromosome field (mandatory), please press the ENTER key after selecting the chromosome(s) in order to save your selection.
Submitter Portal - Guided Documentation
The EGA Submitter Portal credentials are provided by the Helpdesk team when a submission account is requested.
Main page: when you log in to the Submitter Portal, you will see the following page (with your submissions):
On the main page you can see the open submissions in your ega-box. A submission can have different statuses depending on the objects in it:
- Draft: the objects in the submission have been created but not validated or registered (submitted).
- Validated: the objects in the submission have been created and validated. The submission will be in validated status once all objects are validated (V).
- Validated with errors: the objects in the submission have been created, but validation could not be completed due to an error.
- Submitted: all the objects in the submission have been created and registered (submitted).
- IMPORTANT: once an object is submitted, it is assigned a unique accession number. The registered object is then automatically added to our databases.
- Submitted Partially: one or more objects in the submission are submitted, but other objects are still not submitted (draft or validated).
- Submitted draft: when a submitted submission is modified, the status turns into submitted draft.
- IMPORTANT: submit the modified object again in order to return to submitted status.
- Submitted validated with errors: when a submitted submission is modified and validated, the status turns into submitted validated with errors if the modification contains an error.
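One possible reading of how the status of a submission follows from the statuses of its objects is sketched below in Python. This is an interpretation of the list above, not the portal's actual logic: the per-object labels (`"draft"`, `"validated"`, `"error"`, `"submitted"`) and the function name are hypothetical, the precedence between rules is an assumption, and the "Submitted draft" and "Submitted validated with errors" states are left out because they depend on modification history.

```python
def submission_status(object_statuses):
    """Derive an overall submission status from per-object statuses.

    object_statuses: iterable of strings, each one of
    "draft", "validated", "error" or "submitted" (hypothetical labels).
    """
    statuses = list(object_statuses)
    if not statuses:
        # Assumption: a submission with no objects yet is treated as a draft.
        return "Draft"
    if all(s == "submitted" for s in statuses):
        return "Submitted"
    if any(s == "submitted" for s in statuses):
        # At least one object is submitted, but others are still pending.
        return "Submitted Partially"
    if any(s == "error" for s in statuses):
        return "Validated with errors"
    if all(s == "validated" for s in statuses):
        return "Validated"
    return "Draft"

print(submission_status(["submitted", "validated", "draft"]))  # Submitted Partially
```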
1) Submissions: Click this button to see all submissions in your ega-box.
By clicking on the option in the circle you can filter your submissions depending on their status:
- Open submissions: Draft, Validated, Validated with errors, Submitted draft or Submitted validated with errors
- Close submissions: Submitted
- All submissions: all status together
2) Submitted objects: You can also see your objects (studies, samples, files, experiments, analyses, DACs, policies and datasets):
For example, samples. You can also filter your samples depending on their status:
Moreover, you can also filter your samples by different options: Status, EGA ID, Alias, Subject ID, Updated, Created
3) New submission: Click this button when you need to start a new submission.
- Add a title for your submission. This way you can easily distinguish between different submissions (in case you are working on multiple submissions in the same ega-box).
- You can start a submission at different steps. For example, you can start a submission at the samples step if you already have a study registered.
In the submission there are several tabs (one for each object)
- Register study: study
- Register samples: sample
- Define one or more experiments: experiment
- Link files and samples: run and/or analysis
- Register data access committee: dac
- Register data access policy: policy
- Submit dataset: dataset
When registering an object, some fields are mandatory (marked with a *). If these mandatory fields are not populated, you will not be able to save the object:
As you can see, the ‘Save study’ button is greyed out because a mandatory field (Study type) is still empty. Once this field is filled in, the object can be saved by clicking on ‘Save study’.
There are several actions available for a created object:
- 1) Validate: clicking the green tick requests validation of the object
- 2) Submit: clicking the blue arrow requests submission of the object
- 3) Edit: clicking the yellow pencil lets you edit the object
- 4) Delete: clicking the red cross requests deletion of the object
Each object is linked in a unidirectional way to another object. Map of the linkage of objects in a submission:
For example, a study is not directly linked to a dataset. A dataset is linked to runs (the linkage between samples and files). These runs are linked to experiments, and these experiments are the ones directly linked to a study.
Reusing Registered Objects
In the EGA we strongly encourage reusing registered objects where possible. How can you do that? In each tab you can find a checkbox that displays all objects in the ega-box. For instance, if you want to reuse an old study but need to register a new experiment, in the experiment tab you will find the following checkbox:
By clicking the Show all box’s studies:
The same goes when reusing a sample in the linkage with the files:
The same applies to any combination of objects.
For this reason, if you already have previously submitted objects (via Webin or the Submitter Portal) you can reuse them without having to register them all over again.
How to submit all objects
How to submit all objects (the whole submission) at once? By clicking on the ‘I’m done. Please, process this submission’ button on the first tab of the submission tab list:
When you try to validate a run and the samples used are not registered (submitted):
Click on validate and a message box will appear saying that the submission request was sent:
After a few minutes the following message will appear:
These error messages state that the validation failed because the referenced alias could not be found. This happens because the sample DOES NOT EXIST in our database (where the call is sent to validate or submit your object). If you submit your sample first (by clicking on the blue arrow on the sample object):
Then, when you click to submit the run, it will work this time (as the sample is now registered and added to our database):
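The behaviour described above, where a run only validates once the samples it references have been submitted, can be illustrated with a short Python sketch. Everything here is hypothetical (the function name, the dictionary layout, the aliases and the wording of the error message); it only mimics the idea of looking up referenced aliases among already submitted objects.

```python
def validate_run(run, submitted_sample_aliases):
    """Return a list of error messages; an empty list means the run validates."""
    errors = []
    for alias in run["sample_aliases"]:
        if alias not in submitted_sample_aliases:
            # The referenced sample has not been submitted yet.
            errors.append(f"referenced sample alias '{alias}' could not be found")
    return errors

submitted = set()  # no samples submitted yet
run = {"alias": "run-1", "sample_aliases": ["sample-A"]}

first_attempt = validate_run(run, submitted)   # fails: sample-A is unknown
submitted.add("sample-A")                      # submit the sample first
second_attempt = validate_run(run, submitted)  # now passes
```

After the sample is added to the set of submitted aliases, the same run produces no errors, which corresponds to submitting the sample with the blue arrow before submitting the run.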
Also, the experiment is submitted automatically with the validation and submission of its linked objects (samples and runs).
IMPORTANT: If there are several runs in different statuses, the experiment will duplicate itself in different statuses. This is OK. The experiment object will submit itself once the submission is completed and all samples and runs are submitted.
Finally, to view the error messages, go to the ‘Submission errors console’ tab.
EGA Webin is an online tool that can be used to submit metadata (associated with sequence files) to the EGA. It can also be used to register Study (EGAS), Data Access Committee (DAC) and Policy (EGAP) objects for all array-based submissions.
The Webin platform is a historical tool that preceded the current Submitter Portal. It was developed and used to register metadata associated with sequence files. Detailed documentation about Webin and how to use it can be obtained here.
Click on the links below for guides on submitting specific metadata using Webin:
Analysis: Aligned (BAM)
Analysis: Variant (VCF)
Webin will no longer be maintained; however, it can be used as a backup tool if there is any issue with your submission via the Submitter Portal. Please contact the EGA Helpdesk with any related queries.