How to search on EGA

Tip 1: Use simple and clear keywords

Whether you are looking for a study, a dataset or a committee, the search looks recursively through all the fields of all the entities in the database, so there is no need to specify which type of data you want. Use single, specific words and avoid long, complex expressions.

Try it out: cancer rna-seq instead of cancer studies using rna-seq


Tip 2: Narrow your results using search operators

Search operators give you more control over the results by determining how keywords are combined to restrict the search. By default, the search engine applies an OR operator when a space is inserted between two words. Include AND (capitalised) between the keywords in your search to show only pages containing both terms.

Try it out: Search for all the studies based on transcriptomic data, including those that use RNA sequencing to obtain this data
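As a minimal sketch of the AND operator described above, the following Python snippet joins keywords into a single query string. The helper name join_with_and is hypothetical and only builds text to paste into the search box; it does not call any EGA service.

```python
def join_with_and(*keywords):
    """Join keywords with a capitalised AND so that every term must match.

    Hypothetical helper: it only builds the query text.
    """
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [k.strip() for k in keywords if k.strip()]
    return " AND ".join(cleaned)


query = join_with_and("cancer", "rna-seq")
print(query)  # cancer AND rna-seq
```

Without the explicit AND, the same two words separated by a space would be combined with the default OR.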


Tip 3: Do not worry about details

The search engine automatically checks for the most common spelling mistakes of a given word and suggests the correct version. It also ignores capital letters and most punctuation, so none of these mistakes will hamper the accuracy of your search. In addition, when you use a combination of words, the search engine will suggest similar combinations that return a higher number of results.


Tip 4: Use words that are likely to appear within the text

Unfortunately, our search engine cannot yet interpret the meaning of words. Words that do not appear in the body text will therefore return no results, even if they denote a group of terms that are present. For instance, if you are looking for studies focused on neurological diseases, neurological is not the best keyword. Instead, use the name of each neurological disease, such as bipolar disorder, schizophrenia or depression.

In the same fashion, incomplete words do not return any results. Include an asterisk before and/or after the incomplete word to allow partial matching.

Try it out: canc* instead of canc (without a wildcard). You can also try *oma, or even *geneti*
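To get a feel for how the asterisk behaves, the sketch below imitates partial matching locally with Python's standard fnmatch module. It assumes the search wildcard works like a simple glob (the asterisk stands for any run of characters); the word list is made up for illustration and the real engine may differ in details.

```python
from fnmatch import fnmatchcase


def matches(pattern, word):
    """Return True if word matches the wildcard pattern, ignoring case."""
    return fnmatchcase(word.lower(), pattern.lower())


# Illustrative words only; not taken from the EGA database.
words = ["cancer", "carcinoma", "genetics", "epigenetic"]

for pattern in ["canc", "canc*", "*oma", "*geneti*"]:
    hits = [w for w in words if matches(pattern, w)]
    print(pattern, "->", hits)
```

In this local imitation, canc matches nothing because no word is exactly "canc", while canc* matches cancer, *oma matches carcinoma, and *geneti* matches both genetics and epigenetic.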

Advanced Searches

Additional example queries:

Datasets and DACs related to DUO:0000005 and the word lung

Search for a GWAS study in dbGaP

Search for a GWAS study in EGA

Search for a dataset that is in the EGA Beacon, has type "Whole genome sequencing", has between 100 and 2000 samples, and uses the technology "Illumina HiSeq 2000"

  • example query
  • Or write: is_in_beacon:true AND dataset_type:"Whole genome sequencing" AND samples:[100 TO 2000] AND dataset_technology:"Illumina HiSeq 2000"
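The fielded query above can also be assembled programmatically. The following is a minimal sketch, assuming only the field names and syntax shown in the example (field:value, quoted phrases, and [low TO high] ranges); the helpers term and value_range are hypothetical and just build the query text.

```python
def term(field, value):
    """Build a field:value clause, quoting values that contain spaces."""
    value = str(value)
    if " " in value:
        value = f'"{value}"'
    return f"{field}:{value}"


def value_range(field, low, high):
    """Build an inclusive range clause such as samples:[100 TO 2000]."""
    return f"{field}:[{low} TO {high}]"


# Combine the clauses with AND so that every condition must hold.
beacon_query = " AND ".join([
    term("is_in_beacon", "true"),
    term("dataset_type", "Whole genome sequencing"),
    value_range("samples", 100, 2000),
    term("dataset_technology", "Illumina HiSeq 2000"),
])
print(beacon_query)
```

Quoting matters here: without the quotation marks, a multi-word value would be split into separate keywords.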

Search for DACs whose email domain ends in “.es” and that contain the word sclerosis

Search for DACs edited between 2018-01-01 and 2019-09-01 with a domain ending in “.ca” and related to cancer

  • example query
  • Or write: edited_time:[2018-01-01T00:00:00Z TO 2019-09-01T00:00:00Z] type:dac domain:*.ca cancer
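The timestamps in the edited_time range follow the pattern YYYY-MM-DDTHH:MM:SSZ. As a sketch, they can be generated from Python datetime objects as shown below; the helper utc_timestamp is hypothetical, and the field names are taken from the example query above.

```python
from datetime import datetime, timezone


def utc_timestamp(dt):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = utc_timestamp(datetime(2018, 1, 1, tzinfo=timezone.utc))
end = utc_timestamp(datetime(2019, 9, 1, tzinfo=timezone.utc))

# Same clauses as the example: date range, entity type, domain wildcard, keyword.
dac_query = f"edited_time:[{start} TO {end}] type:dac domain:*.ca cancer"
print(dac_query)
```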