Characterization of Structural Variants in Acute Myeloid Leukemia Patients

Acute myeloid leukemia (AML) is the most common form of acute leukemia in adults and is expected to increase in frequency as the population ages. Most subtypes of AML are difficult to treat, in part due to a lack of understanding of possible molecular targets. Furthermore, the peak incidence of AML is in the sixth decade, and patients over the age of 65 do not tolerate chemotherapeutic regimens well. Moreover, AML has been extensively studied as a model for other cancers. An RNA Biology Program has been established to develop comprehensive datasets by profiling all classes of RNA expression, RNA editing, alternative splicing, and non-coding RNAs in CD34+ normal hematopoietic stem cells (HSC), their differentiated progeny, and AML clinical samples. These datasets can be used to understand leukemia.

These datasets will allow us to computationally identify and characterize various RNA species and processes, enabling the development of effective therapeutics for AML, leukemia, and other types of cancer. Epigenetic factors such as methylation and histone modifications will also be characterized from these clinical samples, allowing integrated analysis of the cross-talk between RNA and DNA/epigenomics in leukemia.

This is a subsidiary study of the program described above, in which whole-genome sequencing was performed on samples collected from two AML patients using Illumina and Oxford Nanopore sequencing technologies. The aim is to characterize the roles of structural variants (SVs) in RNA biology that are potentially important for AML development, using our own long-read SV characterization tool, NanoVar.

The advent of Oxford Nanopore sequencing technologies has opened a new avenue for better SV characterization by providing longer sequencing reads. NanoVar is an accurate and rapid SV characterization tool that utilizes low-depth Nanopore sequencing data. NanoVar makes use of split-reads and hard-clipped reads for SV discovery and employs a simulation-trained neural network classifier to enrich for true positives. In simulated data, NanoVar exhibited higher accuracy and faster computation than other long-read SV detection tools. In AML patient data, NanoVar displayed excellent accuracy in SV characterization (16/16 SVs validated in two patients) using only 4x sequencing depth. In summary, NanoVar improves the accuracy and speed of SV characterization at a lower sequencing cost, an approach compatible with clinical studies.
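
To illustrate what a hard-clipped read looks like in alignment data, the following is a minimal sketch that measures the clipped bases at each end of a SAM-format CIGAR string and flags reads with a long clip as possible SV evidence. It is not NanoVar's implementation: the function names and the 50-base threshold (`min_clip`) are assumptions chosen for illustration only.

```python
import re

# A CIGAR string is a sequence of <length><operation> tokens (SAM format).
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_lengths(cigar):
    """Return (left_clip, right_clip): bases clipped (S or H) at each read end."""
    ops = [(int(n), op) for n, op in CIGAR_TOKEN.findall(cigar)]
    left = 0
    for n, op in ops:
        if op not in "SH":
            break
        left += n
    right = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        right += n
    return left, right


def has_hard_clip(cigar):
    """True if any part of the read was hard-clipped by the aligner."""
    return "H" in cigar


def is_sv_candidate(cigar, min_clip=50):
    """Hypothetical filter: flag reads whose longest end clip is >= min_clip bases."""
    return max(clip_lengths(cigar)) >= min_clip


if __name__ == "__main__":
    for cigar in ("3000M1500H", "20S4000M", "100H2000M50S"):
        print(cigar, clip_lengths(cigar), has_hard_clip(cigar), is_sv_candidate(cigar))
```

A long clip at one end of a read suggests that the remainder of the read aligns elsewhere in the genome, which is the kind of signal a split-read approach builds on. A real caller would also need the alignment positions of the other segments.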

Type: Cohort
Archiver: The database of Genotypes and Phenotypes (dbGaP)