Synthetic - FEGA Sweden Heilsa synthetic dataset December 2023

Synthetic - This submission contains a subset of a synthetic dataset derived from the project Heilsa Tryggvedottir, a Nordic collaboration on sharing sensitive human data. Heilsa Tryggvedottir is funded by the Nordic e-Infrastructure Collaboration (NeIC), the ELIXIR nodes of Finland, Norway, and Sweden, Computerome in Denmark, and the Estonian Scientific Computing Infrastructure (ETAIS). The synthetic data were created to balance usability of the datasets (e.g. technical FEGA development, testing, user training, and basic bioinformatics) against compliance with GDPR. File names and file content (e.g. headers in fastq) are anonymized. Moreover, the X, Y, and mitochondrial sequences have been discarded from the original data, since they can be used to trace maternal, paternal, or ethnic origin. The dataset does not follow the natural haplotype distribution (inherent to imputation panels). The only inputs derived from real sequence data are the variant distribution density per chromosome and the learned sequencing error models. The synthetic dataset consists of two fastq files, a cram file, a vcf file, and two index files.

FEGA Sweden Synthetic Data Policy

This Policy implies no restriction on data use (DUO_0000004). However, when using the data, the following should be acknowledged:

1. The project: Heilsa Tryggvedottir – Nordic collaboration on sharing sensitive human data.
2. The funders of Heilsa Tryggvedottir: The Nordic e-Infrastructure Collaboration (NeIC), the ELIXIR nodes of Finland, Norway, and Sweden, Computerome in Denmark, and the Estonian Scientific Computing Infrastructure (ETAIS).

If there are any questions, contact ega-se@nbis.se.

DATA DISCLAIMER: FEGA Sweden assumes no responsibility or liability for any errors or omissions in the information contained in the data and metadata, or for the results obtained from using the information. All information is provided "as is", with no guarantee of completeness, accuracy, timeliness, or results obtained from using the information.

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matched cancer and normal genomes from patients.

Study ID Study Title Study Type
EGAS50000000086 Whole Genome Sequencing

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size Located in
EGAF50000057493 fastq.gz 31.9 GB
EGAF50000057494 fastq.gz 32.7 GB
EGAF50000057495 vcf.gz 349.4 MB
EGAF50000057496 tbi 1.6 MB
EGAF50000057497 cram 11.8 GB
EGAF50000057498 crai 1.5 MB
6 Files (76.7 GB)
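
As a quick sanity check on the listing above, the per-file sizes can be summed and compared with the stated total. The sketch below is illustrative only: the sizes are copied from the table, the names `FILES`, `UNIT_TO_GB`, and `total_size_gb` are hypothetical, and decimal units (1 GB = 1000 MB) are an assumption, since the page does not say which convention it uses.

```python
# File sizes copied from the public file table above.
# Assumption: decimal units (1 GB = 1000 MB); the page does not state this.
FILES = {
    "EGAF50000057493": ("fastq.gz", 31.9, "GB"),
    "EGAF50000057494": ("fastq.gz", 32.7, "GB"),
    "EGAF50000057495": ("vcf.gz", 349.4, "MB"),
    "EGAF50000057496": ("tbi", 1.6, "MB"),
    "EGAF50000057497": ("cram", 11.8, "GB"),
    "EGAF50000057498": ("crai", 1.5, "MB"),
}

UNIT_TO_GB = {"GB": 1.0, "MB": 1.0 / 1000}


def total_size_gb(files):
    """Sum the listed sizes after converting each one to GB."""
    return sum(size * UNIT_TO_GB[unit] for _, size, unit in files.values())


total = total_size_gb(FILES)
print(f"{len(FILES)} files, about {total:.2f} GB")
```

Because each listed size is itself rounded, the sum is expected to agree with the stated 76.7 GB only to within about a tenth of a gigabyte.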