Single cell transcriptomics of hESC-derived midbrain dopaminergic neurons generated by a new human development-based protocol

Single-cell RNA-seq at D0, D11, D16, D21 and D28 of dopaminergic differentiation from the hESC lines H9 and HS980 using current protocols. The time points along the differentiation of each cell line were multiplexed using the BD™ Single-Cell Multiplexing Kit for use with the 10x Chromium™ Single Cell 3’ Reagent Kit v2.
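
The multiplexing described above means that each cell has to be assigned back to its time point from its sample-tag reads. The following is a minimal Python sketch of one simple way to do this, not the submitters' pipeline. The tag names, the tag-to-time-point mapping, the thresholds and the toy counts are all hypothetical, and the rule used (take the dominant tag, flag mixed cells) is a deliberately simplified stand-in for dedicated demultiplexing tools.

```python
# Hypothetical mapping from sample-tag name to differentiation time point.
# The real tag assignment is not stated in the dataset description.
TAG_TO_TIMEPOINT = {
    "SampleTag01": "D0",
    "SampleTag02": "D11",
    "SampleTag03": "D16",
    "SampleTag04": "D21",
    "SampleTag05": "D28",
}


def assign_timepoint(tag_counts, min_fraction=0.75, min_reads=10):
    """Assign one cell to a time point from its sample-tag read counts.

    tag_counts maps tag name -> read count for a single cell barcode.
    Cells with too few tag reads are "undetermined"; cells whose top tag
    holds less than min_fraction of the tag reads are called "multiplet".
    Both thresholds are illustrative assumptions.
    """
    total = sum(tag_counts.values())
    if total < min_reads:
        return "undetermined"
    top_tag, top_count = max(tag_counts.items(), key=lambda kv: kv[1])
    if top_count / total < min_fraction:
        return "multiplet"
    return TAG_TO_TIMEPOINT.get(top_tag, "undetermined")


# Toy per-cell tag counts (made-up barcodes and numbers).
cells = {
    "AAAC": {"SampleTag01": 120, "SampleTag03": 4},
    "AAAG": {"SampleTag02": 55, "SampleTag05": 50},
    "AACT": {"SampleTag04": 3},
}

assignments = {barcode: assign_timepoint(counts) for barcode, counts in cells.items()}
for barcode, label in assignments.items():
    print(barcode, label)
```

In this toy input the first cell is dominated by one tag, the second carries two tags at similar levels, and the third has too few tag reads to call.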

8 samples
DAC: EGAC00001002707
Technology: Illumina NovaSeq 6000

NOVEL STRATEGIES FOR CELL-BASED NEURAL RECONSTRUCTION (NSC-RECONSTRUCT) - INITIAL DMP – KAROLINSKA INSTITUTET

1. DATA SUMMARY

Provide a summary of the data addressing the following issues:
● State the purpose of the data collection/generation
● Explain the relation to the objectives of the project
● Specify the types and formats of data generated/collected
● Specify if existing data is being re-used (if any)
● Specify the origin of the data
● State the expected size of the data (if known)
● Outline the data utility: to whom will it be useful

PURPOSE AND OBJECTIVES
The overall goal of the NSC-Reconstruct project is to develop new regenerative approaches to combat brain damage inflicted by neurodegeneration or injury, using innovative technologies to develop next-generation cell-based therapies for neuronal replacement and circuitry repair. The project has four overarching goals:
i. The development of cellular products, reprogramming methods and research tools with broad potential for brain repair using innovative cellular and genomic technologies;
ii. The development of novel strategies for cell-based repair in order to pave the way for the next generation of cell-based replacement therapies for the treatment of major neurodegenerative and traumatic diseases;
iii. To achieve integration and functional reconstruction of complex circuits/pathways using transplants of diverse subtypes of stem cell-derived neurons;
iv. To promote the translation and commercialization of new cell and gene products, research tools and therapies for clinical trials and market approval.

In this proposal we aim to generate high-quality data as well as neuronal and circuit repair strategies. This will be achieved by using cutting-edge technologies to gain an improved understanding of the cellular and molecular mechanisms controlling development and repair, and by implementing such knowledge to engineer novel and more efficient cell-based therapies capable of modifying the disease process.
EXPERIMENTAL TYPE OF DATA
Experimental data will be obtained from:
● gene expression data from qPCR machines
● images of cells and tissues from microscopes
● flow cytometry data from flow cytometers
● single-cell RNA-seq and ATAC-seq data from sequencing machines

Observational data will be collected by single-cell RNA-seq and ATAC-seq from human postmortem tissues.

Simulation data will be generated by computational models such as machine learning (logistic regression).

Datasets to be re-used, compiled and mined: We will use gene expression datasets from open online resources (e.g. Allen Brain Atlas), our previous publications (e.g. La Manno et al., 2016) and datasets published by other research teams.

DATA FORMAT:
● Text: e.g. txt, HTML, PDF, docx
● Numeric: e.g. xlsx, csv
● Audiovisual: e.g. jpeg, png, tiff, mp3, mp4, avi
● Flow cytometry data will be in .fcs format
● Computer code (Python, R, Linux)

EXPECTED DATA SIZE: About 1 TB

2. FAIR DATA

2.1 Making data findable, including provisions for metadata:
● Outline the discoverability of data (metadata provision)
● Outline the identifiability of data and refer to standard identification mechanism. Do you make use of persistent and unique identifiers such as Digital Object Identifiers?
● Outline naming conventions used
● Outline the approach towards search keyword
● Outline the approach for clear versioning
● Specify standards for metadata creation (if any). If there are no standards in your discipline describe what metadata will be created and how

Our work will follow the FAIR principles, as recommended by KI. These include making our data FINDABLE by:
1) Assigning data and metadata a unique and persistent identifier.
2) Describing the data with rich metadata using standard or otherwise specified terminologies.
3) Including in the metadata the identifier of the data it describes.
4) Indexing the data in a searchable resource.

Data will be documented following the MINSEQE standard recommendations (http://fged.org/projects/minseqe/).
We will record information in a format amenable to future deposit in databases such as GEO or EGA.

2.2 Making data openly accessible:
● Specify which data will be made openly available? If some data is kept closed provide rationale for doing so
● Specify how the data will be made available
● Specify what methods or software tools are needed to access the data? Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?
● Specify where the data and associated metadata, documentation and code are deposited
● Specify how access will be provided in case there are any restrictions

Our work will follow the FAIR principles, as recommended by KI. These include making our data ACCESSIBLE in the following way:
1) Data and metadata will be retrievable by their unique and persistent identifier.
2) Metadata will be accessible, even when the data is no longer available.
3) Data and metadata will be evaluated for copyright and/or intellectual property right issues, which may delay data sharing and accessibility as described below.
4) Data and datasets that do not contain personal information, as well as code, will be:
● openly accessible online, prior to publication, after all datasets are collected and analysed
● accessible to collaborators in the project when necessary to accomplish our objectives.
5) Datasets containing personal information will be deposited in European repositories such as EGA, and made accessible exclusively to authorised and authenticated personnel (see below).

DATA STORAGE: Human sequencing data from NGI will be transferred, processed and temporarily stored (fastq and analysis files) on the Bianca server for sensitive data at Uppmax (Uppsala Multidisciplinary Center for Advanced Computational Science), which has several layers of security. Data will be further processed on the Monod computer in our division, which is protected by a firewall and has controlled access.
Personal identity is protected through pseudonymisation. All data (raw and processed) will be stored on the department server at KI (G network, MBB mount) with controlled access via personal KI-ID and password. KI ELN will be used for the documentation of all analyses and results. Raw genetic data for which there is informed consent will be permanently stored in controlled-access repositories (e.g. European Genome-Phenome Archive, https://ega-archive.org). Qualified researchers can obtain the data after signing an agreement that includes privacy protection compliant with GDPR, including adequate data protection and the possibility to withdraw consent.

PUBLICATION: Only processed human sequencing data that cannot be de-anonymized (e.g. gene expression matrices) will be published. Manuscripts with processed data and metadata, scripts and protocols will be made available through open access repositories, such as bioRxiv, GitHub and protocols.io, prior to publication. Raw genetic data for which there is informed consent will be permanently stored at KI (G network, MBB mount) and in controlled-access repositories (e.g. European Genome-Phenome Archive, https://ega-archive.org). Qualified researchers can obtain the data after signing an agreement that includes privacy protection compliant with GDPR, including adequate data protection and the possibility to withdraw consent.

DATA ACCESS: Access to the documentation stored in ELN is restricted to group members. Access to the data saved on the Monod computer in our unit is restricted to group members and authorized personnel. We only work with pseudonymized data, with the key stored in remote locations to which we do not have access: brain banks at NIH, Cambridge University or the Netherlands Brain Bank, and iPS cell repositories at NINDS and the Jackson Laboratory.

2.3 Making data interoperable:
● Assess the interoperability of your data.
● Specify what data and metadata vocabularies, standards or methodologies you will follow to facilitate interoperability.
● Specify whether you will be using standard vocabulary for all data types present in your data set, to allow inter-disciplinary interoperability? If not, will you provide mapping to more commonly used ontologies?

Our work will follow the FAIR principles, as recommended by KI. These include making our data INTEROPERABLE as follows:
1) (Meta)data will use a formal, accessible, shared, and broadly applicable language for knowledge representation.
2) (Meta)data will use vocabularies that follow FAIR principles.
3) (Meta)data will include qualified references to other (meta)data.

New scripts and protocols will be made available through open repositories such as GitHub and protocols.io.

2.4 Increase data re-use (through clarifying licenses):
● Specify how the data will be licenced to permit the widest reuse possible
● Specify when the data will be made available for re-use. If applicable, specify why and for what period a data embargo is needed
● Specify whether the data produced and/or used in the project is useable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why
● Describe data quality assurance processes
● Specify the length of time for which the data will remain re-usable

Our work will follow the FAIR principles, as recommended by KI. These include making our data RE-USABLE as follows:

Data will be quality-checked at collection/generation by validation against controls or publicly available databases. Images will be inspected for artifacts and the results will be recorded in a spreadsheet file. RNA-seq data will be quality controlled using standard protocols (FastQC…). Only high-quality data will be included in the subsequent analysis.

Metadata: Associated experimental metadata, protocols and scripts will be stored in the Karolinska Institutet ELN system.
Legal aspects of data/metadata sharing: Sensitive personal data/metadata will be handled according to KI's guidelines (https://staff.ki.se/gdpr). We will only work with pseudonymized data, and the key is held only in a remote location to which we have no access. Data/metadata transfer or processing agreements will be made in the context of our consortium agreement. If necessary, agreements for data transfer between our research group and collaborators will be made after approval by KI's legal department. IP rights will be handled according to KI's guidelines (https://staff.ki.se/guidelines-on-intellectual-property-and-corporate-collaborations).

Long-term storage/access of data/metadata: Processed data and metadata will be published in bioRxiv and open access journals. Scripts and protocols will additionally be published in full detail through open access repositories, such as GitHub and protocols.io. Raw genetic data will be permanently stored at KI (G network, MBB mount) and in controlled-access repositories (e.g. European Genome-Phenome Archive, https://ega-archive.org).

3. ALLOCATION OF RESOURCES

Explain the allocation of resources, addressing the following issues:
● Estimate the costs for making your data FAIR. Describe how you intend to cover these costs
● Clearly identify responsibilities for data management in your project
● Describe costs and potential value of long term preservation

KI provides the infrastructure and resources required for making our data FAIR. No specific resources have been allocated for data management in this particular project, but resources such as salaries for all participants contributing data to the project (researchers, lab manager and PI) are available. Each of the lab members will contribute to the process of making their data FAIR. Data management is primarily done by the researcher involved in the analysis, with the help of the lab manager and under the supervision of the PI.
The PI will be responsible for data management, long-term storage and archiving. As a consortium we will work to develop a plan for sharing data and to facilitate achieving our objectives in the project. We may need to employ a data manager as the volume of data and the number of researchers involved in the project grow.

4. DATA SECURITY

Address data recovery as well as secure storage and transfer of sensitive data

DATA STORAGE AND RECOVERY
Human sequencing data from NGI will be transferred, processed and temporarily stored (fastq and analysis files) on the Bianca server for sensitive data at Uppmax (Uppsala Multidisciplinary Center for Advanced Computational Science), which has several layers of security. Data will be further processed on the Monod computer in our division, which is protected by a firewall and has controlled access. Personal identity is protected through pseudonymisation. All data (raw and processed) will be stored on the department server at KI (G network, MBB mount) with controlled access via personal KI-ID and password. KI ELN will be used for the documentation of all analyses and results. Raw genetic data for which there is informed consent will be permanently stored in controlled-access repositories (e.g. European Genome-Phenome Archive, https://ega-archive.org). Qualified researchers can obtain the data after signing an agreement that includes privacy protection compliant with GDPR, including adequate data protection and the possibility to withdraw consent.

Publication: Only processed data that cannot be de-anonymized (e.g. gene expression matrices) will be made public. Scripts and protocols will be made available through open repositories such as GitHub and protocols.io.

SECURITY
Access to the documentation stored in ELN is restricted to group members. Access to the data saved on the Monod computer in our unit is restricted to group members and authorized personnel.
We only work with pseudonymized data, with the key stored in remote locations to which we do not have access: brain banks at NIH, Cambridge University or the Netherlands Brain Bank, and iPS cell repositories at NINDS and the Jackson Laboratory.

5. ETHICAL ASPECTS

To be covered in the context of the ethics review, ethics section of DoA and ethics deliverables. Include references and related technical aspects if not covered by the former

Patient data will be pseudonymized by the brain or iPS cell banks. The material will arrive at KI coded. The code will remain in a remote and protected location which will not be accessible to researchers in the group. Sequencing of brain samples has been approved by the Swedish Ethical Review Authority (EPN, approval no. 2019/2152-31). Results will only be presented at an aggregated level (i.e. data matrices) without any possibility of backward identification. If needed, Data Transfer/Processing agreements will be signed prior to any data sharing.

6. OTHER

Refer to other national/funder/sectorial/departmental procedures for data management that you are using (if any)

We use the data management procedures recommended by Karolinska Institutet, which follow the Swedish and European procedures:
https://staff.ki.se/plan-your-research-data-management-0
https://staff.ki.se/guidelines-for-research-documentation-and-data-management

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID	Study Title	Study Type
EGAS00001006313	Single cell transcriptomics of hESC-derived midbrain dopaminergic neurons generated by a new human development-based protocol	Other

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID	File Type	Size
EGAF00006184190	bam	29.5 GB
EGAF00006184191	fastq.gz	1.7 GB
EGAF00006184192	fastq.gz	5.4 GB
EGAF00006184193	fastq.gz	13.1 GB
EGAF00006184194	fastq.gz	1.7 GB
EGAF00006184195	fastq.gz	5.3 GB
EGAF00006184196	fastq.gz	13.1 GB
EGAF00006184197	bam	42.8 GB
EGAF00006184198	fastq.gz	2.3 GB
EGAF00006184199	fastq.gz	7.7 GB
EGAF00006184200	fastq.gz	18.9 GB
EGAF00006184201	fastq.gz	2.3 GB
EGAF00006184202	fastq.gz	7.7 GB
EGAF00006184203	txt	1.7 kB
14 Files (151.7 GB)
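
The file listing above can be cross-checked with a short script. The following Python sketch parses the rows as shown and sums the sizes per file type. It assumes decimal units (1 GB = 10^9 bytes), which the page does not state, and it works only from the rounded sizes displayed, so a small gap from the listed total is to be expected.

```python
# Rows copied from the file table above: file ID, file type, size, unit.
FILE_TABLE = """
EGAF00006184190 bam 29.5 GB
EGAF00006184191 fastq.gz 1.7 GB
EGAF00006184192 fastq.gz 5.4 GB
EGAF00006184193 fastq.gz 13.1 GB
EGAF00006184194 fastq.gz 1.7 GB
EGAF00006184195 fastq.gz 5.3 GB
EGAF00006184196 fastq.gz 13.1 GB
EGAF00006184197 bam 42.8 GB
EGAF00006184198 fastq.gz 2.3 GB
EGAF00006184199 fastq.gz 7.7 GB
EGAF00006184200 fastq.gz 18.9 GB
EGAF00006184201 fastq.gz 2.3 GB
EGAF00006184202 fastq.gz 7.7 GB
EGAF00006184203 txt 1.7 kB
"""

# Assumption: decimal (SI) units; the page does not say GB versus GiB.
UNIT_BYTES = {"kB": 1e3, "MB": 1e6, "GB": 1e9}

rows = []
for line in FILE_TABLE.strip().splitlines():
    file_id, file_type, value, unit = line.split()
    rows.append((file_id, file_type, float(value) * UNIT_BYTES[unit]))

# Sum the sizes per file type and overall, in GB.
gb_by_type = {}
for _, file_type, size_bytes in rows:
    gb_by_type[file_type] = gb_by_type.get(file_type, 0.0) + size_bytes / 1e9
total_gb = sum(gb_by_type.values())

print(f"{len(rows)} files, about {total_gb:.1f} GB from rounded sizes")
for file_type, size_gb in sorted(gb_by_type.items()):
    print(f"  {file_type}: {size_gb:.1f} GB")
```

The rounded sizes add up to about 151.5 GB, slightly below the listed 151.7 GB; the difference is plausibly due to per-file rounding in the display.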