PPM1-1229-150022

Header Information

Proposal Number:

PPM1-1229-150022
Program Cycle:

PPM 01
Submitting Institution Name:

Sidra Medicine
Project Status:

Award Tech. Completed
Start Date:

03/11/2016
Lead Investigator:

Dr. Khalid Fakhro
Project Duration:

2 Year(s)
End Date:

02/03/2020
Submission Type:

New
Proposal Title:

A high resolution map of structural variation in Qatari genomes and their contribution to quantitative traits and disease

Project Summary

Proposal Description:

The rapid proliferation of next-generation sequencing technologies over the past decade has enabled whole genome sequencing (WGS) to advance at a phenomenal scale, uncovering remarkable levels of genomic diversity among humans. Yet although global efforts such as the 1000 Genomes Project have identified millions of variants in multi-ethnic populations, the Arab world remains poorly represented in public databases [1-3]. We recently demonstrated that a subset of the Qatari population represents one of the most ancient and genetically diverse populations in the world [4], explaining the extensive genetic heterogeneity underlying disease in this part of the world [5-8]. It is therefore of paramount importance to generate definitive genetic variant databases for this population if the promises of precision health and personalized medicine are to be realized for Arabs. The Qatar Genome Program (QGP) is one major effort toward this goal. In its pilot phase, >2,500 genomes will be sequenced and analyzed in light of the extensive deep phenotyping available at the Qatar Biobank. In addition to the single- and multi-nucleotide variants called in this dataset, there will be an urgent need to continue our work on generating a comprehensive Structural Variant (SV) map for the Qatari population [9]. This will have implications not only for human health and disease (e.g. finding ‘knockout’ humans with homozygous genic deletions), but also for generating the backbone structure of a future Qatari-specific reference genome. This proposal goes beyond our previous work and is novel on many levels, primarily because of the technical challenges of SV prediction in such a large WGS dataset. Indeed, unlike small-variant calling, for which best practices are well established [10], there are as yet no standardized pipelines for SV calling from thousands of genomes.
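As an illustration of the ‘knockout’ search mentioned above, the following is a minimal sketch, not the project's pipeline. It assumes SV calls have already been reduced to per-sample deletion intervals with an estimated copy number; the `Interval` and `Deletion` types, the `find_knockouts` helper, and the gene and sample names are all hypothetical. A sample counts as a knockout for a gene only when a copy-number-0 deletion spans the entire gene, which is one conservative definition among several possible ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class Deletion:
    sample: str
    region: Interval
    copy_number: int  # assumed encoding: 0 = homozygous, 1 = heterozygous


def covers(outer: Interval, inner: Interval) -> bool:
    """True if `outer` fully contains `inner` on the same chromosome."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and outer.end >= inner.end
    )


def find_knockouts(deletions, genes):
    """Return sorted (sample, gene) pairs where a homozygous deletion spans the gene."""
    hits = set()
    for deletion in deletions:
        if deletion.copy_number != 0:
            continue  # heterozygous deletions leave one intact copy
        for name, gene in genes.items():
            if covers(deletion.region, gene):
                hits.add((deletion.sample, name))
    return sorted(hits)


# Toy data, invented for illustration only.
genes = {
    "GENE_A": Interval("chr1", 1000, 2000),
    "GENE_B": Interval("chr1", 5000, 6000),
}
deletions = [
    Deletion("S1", Interval("chr1", 900, 2100), 0),   # homozygous, spans GENE_A
    Deletion("S2", Interval("chr1", 900, 2100), 1),   # heterozygous, ignored
    Deletion("S3", Interval("chr1", 5200, 5400), 0),  # homozygous but partial
]
knockouts = find_knockouts(deletions, genes)
print(knockouts)
```

The nested loop is adequate for a toy example; with thousands of genomes an interval index would be the more natural choice.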
Additionally, there is as yet no unified environment for the management and analysis of the big genomics datasets required to achieve this goal. While the R Project for Statistical Computing has played a fundamental role in supporting the development of methods for statistical-genetics analyses of high-throughput datasets, the processing of large WGS data often exceeds R’s capabilities and is therefore carried out with standalone programs. We have identified the ROOT framework, developed by the European Organization for Nuclear Research (CERN), as a potential platform for the management, compression, and analysis of such large datasets. ROOT is an object-oriented framework written in C++, originally conceived in the high-energy particle physics community, where it is routine to store, analyze and visualize petabytes of data in an efficient, collaborative way. In ROOT, data are stored as instances of C++ classes in a hierarchical object-oriented database optimized for data analysis, and are thus highly compressed. The ROOT framework can call R functions and allows users to share novel libraries and functions, akin to the philosophy underlying R’s open-source, community-driven success. Moreover, analyses in ROOT can be run in parallel on clusters of computers or multi-core machines, even when these are in different geographical locations, which is ideal for our proposed collaboration. Our project can therefore be summarized in the following 3 objectives: 1) In collaboration with CERN, we will implement a novel data representation (including relevant wrappers for standard read data formats) optimized for the compression and storage of 2,500 Qatari subjects’ WGS data in ROOT. We will also adapt SV prediction algorithms that use combined detection approaches to work with ROOT. 2) Through joint efforts between the CERN and Sidra data centers, SV prediction will be carried out using the ROOT distributed parallel processing environment.
We will integrate the resulting SV call files to generate a comprehensive map of SVs at base-pair resolution in the Qatari population. The Database of Genomic Variants (dgv.tcag.ca) will support us in hosting this rich dataset and serving it to the research community through a web-based browser for Qatari SVs [cite: 24174537]. 3) We will run a genome-wide association analysis to investigate the contribution of SVs to Cardiovascular Disease (CVD) risk factors in the Qatari population (which have been extensively collected by the Qatar Biobank). We will then attempt to replicate the most significantly associated SVs in the TwinsUK cohort, an independent cohort of >3,000 individuals from a different genetic background and under different environmental pressures, for whom WGS and similar CVD-related traits are already available. Replicated SVs will be validated by wet-lab analysis (qPCR/long-range PCR) and their breakpoints confirmed. Altogether, we believe our project will have high impact for both basic science and translational medicine. First, we will provide a proof of concept for the use of the ROOT framework as a shared platform for the management and analysis of big genomics data. This could be of immense value to the community as we enter the WGS era, in which raw data will grow to unprecedented sizes and multinational collaborations will become the norm. Second, we will generate the definitive database of structural variants in Qataris, an evolutionarily ancient population sharing extensive ancestry with the rest of the Arab world. Third, we will leverage the deep phenotyping of these samples to estimate the contribution of SVs to CVD, a leading cause of death and morbidity worldwide, and to related cardio-metabolic traits. Finally, we will involve and develop trainees to contribute to a knowledge-based economy in Qatar. Thus, the sum of our study will be greater than its individual parts, and will have a profound impact on personalized medicine and precision health in Qatar.
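To make the association step in objective 3 concrete, the following is a minimal sketch of a single-variant test, assuming each SV is summarized as an integer copy number per individual and the trait is quantitative. It is not the proposal's actual method: the function name `sv_association` and the toy data are hypothetical, no covariates or relatedness are modelled, and the p-value uses a normal approximation to the t distribution that is only reasonable for large samples.

```python
import math
from statistics import NormalDist, fmean


def sv_association(copy_numbers, trait):
    """Regress a quantitative trait on SV copy number (simple least squares).

    Returns (beta, standard_error, p_value). The p-value is a two-sided
    normal approximation, an assumption suited to large cohorts only.
    """
    n = len(copy_numbers)
    if n != len(trait) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")

    mean_x, mean_y = fmean(copy_numbers), fmean(trait)
    sxx = sum((x - mean_x) ** 2 for x in copy_numbers)
    if sxx == 0:
        raise ValueError("copy number does not vary; nothing to test")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(copy_numbers, trait))

    beta = sxy / sxx                   # trait change per extra copy
    intercept = mean_y - beta * mean_x
    rss = sum(
        (y - (intercept + beta * x)) ** 2 for x, y in zip(copy_numbers, trait)
    )
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, se, 0.0           # perfect fit: treat as p = 0
    z = abs(beta / se)
    p_value = 2.0 * (1.0 - NormalDist().cdf(z))
    return beta, se, p_value


# Toy data, invented for illustration: copy number 0/1/2 and a trait value.
copy_numbers = [0, 1, 2, 0, 1, 2]
trait = [1.0, 2.1, 2.9, 0.9, 1.9, 3.1]
beta, se, p_value = sv_association(copy_numbers, trait)
print(f"beta={beta:.3f} se={se:.3f} p={p_value:.3g}")
```

In a genome-wide scan this test would be repeated for every SV, so a multiple-testing correction would be needed before calling any association significant.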
Research Area Keywords:

Cardiovascular disease; Copy number variation; Single nucleotide polymorphism; Qatari Genome project; Rare diseases
Research Area Keywords by PM:

traits; genetic heterogeneity; heterogeneity
Research Type:

Translational Research / Experimental Development

Research Area	Sub Research Area	Sub Speciality	Primary	Secondary
3. Medical and Health Sciences	3.4 Medical Biotechnology	Gene-Based Diagnostics and Therapeutic Interventions	Yes	No
3. Medical and Health Sciences	3.4 Medical Biotechnology	Health-Related Biotechnology	No	Yes

Institution

Institution	Country	Institution Role
King's College London	United Kingdom	Collaborative Institution
Sidra Medicine	Qatar	Submitting Institution

Personnel

Role	Name	Affiliation
Lead PI	Dr. Khalid Fakhro	Sidra Medicine
PI	Dr. Mario Falchi	King's College London
PI	Dr. Puthen Veettil Jithesh	Hamad Bin Khalifa University
PI	Dr. Ammira Al-Shabeeb Akil	Sidra Medicine
PI	Dr. Charbel Abi Khalil	Weill Cornell Medical College in Qatar
Consultant	Dr. Charbel Abi Khalil	Weill Cornell Medical College in Qatar

Outputs/Outcomes

Output Type	Publication Title	Authors	Reference No
Online Paper	Ethnic-specific association of amylase gene copy number with adiposity traits in a large Middle Eastern biobank	Rossi N, Aliyev E, Visconti A, Akil ASA, Syed N, Aamer W, Padmajeya SS, Falchi M, Fakhro KA.	DOI:10.1038/s41525-021-00170-3
