Federated EGA


Federated EGA Vision Statement

The Federated EGA is the primary global resource for discovery of and access to sensitive human omics and associated data consented for secondary use, delivered through a network of national human data repositories, to accelerate disease research and improve human health.

Over the last 10 years, most individual-level human omics data have been generated in the context of research consortia and shared via global repositories such as the European Genome-phenome Archive (EGA). Many countries now have emerging personalized medicine programmes which are generating data from national or regional initiatives. Thus, human genomics is undergoing a step change from being a research-driven activity to one funded through healthcare initiatives.

Genetic data generated in a healthcare context is subject to more stringent information governance than research data and often must comply with national legislation. To address this need, the Federated EGA provides a network of connected resources to enable transnational discovery of and access to human data for research while also respecting jurisdictional data protection regulations. By providing a solution to emerging challenges around secure and efficient management of human omics and associated data, the Federated EGA fosters data reuse, enables reproducibility, and accelerates biomedical research.


Overview

The EGA project is currently a collaboration between EMBL-EBI and the CRG, regulated by agreements between the two institutions. The Federated European Genome-phenome Archive (EGA) will be a distributed network of repositories for sharing human omics data and phenotypes. Typically, a node is an organization or project that hosts human genetic data so that the data can remain within its jurisdiction. Federated EGA gathers metadata on omics data collections stored in national or regional archives and makes them discoverable across the EGA network.

EGA is contributing the Federated EGA model, requirements, and experiences to several communities and projects, such as GA4GH, the ELIXIR Federated Human Data Implementation Study, and the ELIXIR Federated Human Data community.


Documentation


Title | Version | Description
Structure and Organization
EGA Federation: Structure and Organization | 1.1 | The structure of an EGA federated network and service expectations. The EGA is organised into three types of nodes: Central EGA, Federated EGA nodes, and EGA Community nodes. The document outlines the goals of this organization and summarizes the commitments and services provided by the nodes.
Strategic Committee
EGA Federation Strategic Committee | 1.1 | The terms of reference describe the purpose and objectives of the committee, which is to provide direction and strategic planning for the Federated EGA project. The committee receives input from the EGA Strategic Committee and provides feedback for the EGA strategic roadmap.
Operations Committee
EGA Federation Operations Committee | 1.1 | The terms of reference describe the purpose and objectives of the committee, which is to review operational performance and coordinate the technical implementation roadmaps of EGA Federated and Community nodes. The committee receives advice from the EGA Federation Strategic Committee and provides operational reporting to it.
Guidelines
Node Operations guidelines | 2.0 | An overview of the operational areas that require resources in order to create a Federated EGA node. The document is based on more than 10 years of experience establishing and operating the EBI and CRG Central EGA nodes. It breaks the operational areas of responsibility down into Helpdesk Services, Technical Operations, Software Development, and IT Infrastructure.


Available Software

LocalEGA is federated storage software for sensitive data.

Software: Main LocalEGA software repository
Documentation: Main LocalEGA software documentation


Local EGA Software

A portable toolkit to securely deposit and share human sensitive data - Local EGA, Mini-Symposium Federated Human Data, ELIXIR All Hands Meeting, 2020


Federated EGA APIs

Below is a list of the GA4GH standards and APIs implemented by the Federated EGA. Visit EGA-GA4GH for the full list of standards that are currently available or planned for implementation at EGA.


Standard | Purpose | Specification Version | Supported Version | Implementation
Beacon | Supports discovery of genomic variants and individuals. | V1.0.1 | V0.3 | Specification, Documentation, Endpoint
Crypt4GH | Enables direct byte-level compatible random access to encrypted genetic data stored in community standards (e.g. CRAM, VCF). | V1.0 | V1.0 | Specification, Documentation, Endpoints
Data Use Ontology (DUO) | Allows users to semantically tag datasets with usage restrictions so that datasets can be automatically discovered based on a researcher's authorization level or intended use. | 2021-02-23 | 2021-02-23 | Specification, Documentation, Endpoint
htsget | A protocol for secure, efficient, and reliable access to sequencing read and variation data. | V1.3.0 | V1.0.0 | Specification, Documentation, Endpoint
refget | Enables access to reference sequences using an identifier derived from the sequence itself. | V1.2.6 | N/A | Specification, Documentation, Endpoint
Researcher IDs (passport, visa) | Specify the collection of researchers that may access a dataset at any given time, and the credentials they must supply. | V1.0.1 | V1.0.1 | Specification, Documentation, Endpoint
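
The refget entry above describes identifiers derived from the sequence itself. As a minimal sketch, assuming the two checksum schemes described in the refget v1 specification (MD5, and TRUNC512, which is the hex encoding of the first 24 bytes of a SHA-512 digest) and assuming the sequence is normalised to uppercase before hashing, such identifiers could be computed as shown here. The function name refget_checksums is hypothetical, and this is not EGA's implementation.

```python
import hashlib


def refget_checksums(sequence: str) -> dict:
    """Return content-derived checksums for a reference sequence.

    Assumption: the sequence is normalised to uppercase ASCII with
    surrounding whitespace removed before hashing.
    """
    normalised = sequence.strip().upper().encode("ascii")
    sha512_digest = hashlib.sha512(normalised).digest()
    return {
        "md5": hashlib.md5(normalised).hexdigest(),
        # TRUNC512: hex encoding of the first 24 bytes of the SHA-512 digest.
        "trunc512": sha512_digest[:24].hex(),
    }


if __name__ == "__main__":
    # The same sequence always yields the same identifiers,
    # regardless of letter case or trailing newline.
    print(refget_checksums("ACGT"))
    print(refget_checksums("acgt\n") == refget_checksums("ACGT"))
```

Because the identifier depends only on the sequence content, two repositories holding the same reference sequence arrive at the same identifier without coordinating names.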

API | Purpose | EGA API Version | Implementation
Submission API | For submitting metadata to EGA following the INSDC object schemas. Implements DUO. | V1.0.0 | Specification, Documentation, Endpoint
Permissions API | For getting and setting permissions to EGA objects. Implements Researcher ID. | V1.0.0 | Specification, Documentation, Endpoint
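
The Permissions API is listed as implementing Researcher IDs. As a rough sketch of how a service might read such credentials, the following assumes visa claims shaped like the GA4GH Passport ga4gh_visa_v1 object (with type and value fields, plus a standard exp expiry claim) whose signatures have already been verified elsewhere. The function name, the placeholder dataset identifiers, and the use of the visa value as a dataset identifier are illustrative assumptions, not EGA's documented behaviour.

```python
import time


def granted_datasets(visa_claims, now=None):
    """Collect dataset identifiers from unexpired ControlledAccessGrants visas.

    `visa_claims` is a list of decoded visa payloads (plain dicts).
    Assumption: signature verification has already happened elsewhere;
    this sketch only inspects the claims.
    """
    now = time.time() if now is None else now
    granted = set()
    for claims in visa_claims:
        visa = claims.get("ga4gh_visa_v1", {})
        if visa.get("type") != "ControlledAccessGrants":
            continue  # other visa types do not grant dataset access
        if claims.get("exp", 0) <= now:
            continue  # expired or missing expiry: treat as not granted
        value = visa.get("value")
        if value is not None:
            granted.add(value)
    return granted


if __name__ == "__main__":
    # Hypothetical claims; "DATASET-A" and "DATASET-B" are placeholders.
    example = [
        {"exp": 2000, "ga4gh_visa_v1": {"type": "ControlledAccessGrants", "value": "DATASET-A"}},
        {"exp": 500, "ga4gh_visa_v1": {"type": "ControlledAccessGrants", "value": "DATASET-B"}},
        {"exp": 2000, "ga4gh_visa_v1": {"type": "AffiliationAndRole", "value": "faculty@example.org"}},
    ]
    print(sorted(granted_datasets(example, now=1000)))  # prints ['DATASET-A']
```

Treating a missing expiry as "not granted" is a deliberately conservative choice for this sketch; a real service would follow the validation rules of the Passport specification.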