Project Summary/Abstract
Genomic alterations drive cancer development in both pediatric and adult patients, but pediatric cancers carry relatively few somatic variants. This observation, together with the earlier age of onset seen in familial cancer syndromes, suggests that germline variants contribute to the development of many pediatric cancers. Many known pathogenic germline variants share biological features, such as loss of heterozygosity in the tumor and rarity in the general population, and some cancer-causing germline variants also cause cancer when acquired somatically. Identifying pathogenic germline variants from whole-exome sequencing (WES) data would clarify the pathogenesis of pediatric cancers. It could also support improved treatments for subtypes with poor prognosis, such as relapsed acute lymphoblastic leukemia (ALL), high-grade glioma, and ependymoma. The long-term goal of this research is to use genomic data to understand the role of germline variants in pediatric cancer development. The objective of this application is to develop a computational tool that identifies pathogenic germline variants from WES data and to apply it to germline variants in relapsed ALL, high-grade glioma, and ependymoma. The central hypothesis is that pathogenic germline variants have biological and cohort-specific features that distinguish them from benign variants, so that a machine learning pipeline trained on these features can predict pathogenic germline variants. Aim 1 will call germline variants in WES data from (1) pediatric patients with known cancer-causing germline variants and (2) control individuals without cancer; patients in data set (1) will be randomly divided into training and test sets. A bioinformatics pipeline will call variants, annotate them, and filter out low-quality calls. Aim 2 will use the training set to train and optimize a machine learning algorithm that predicts high-confidence germline driver variants.
Aim 3 will apply the final pipeline to validation sets of pediatric relapsed ALL, high-grade glioma, and ependymoma samples. Together, these aims will produce a computational pipeline, developed in a pediatric cancer cohort, that predicts the pathogenicity of germline variants and improves understanding of how germline variants contribute to multiple pediatric cancers.