day-8 – Intensive School for Advanced Graduate Studies – Machine Learning – 2021

Day 8

Applications in Medical Genomics

Organized by Luisa Bernardinelli and Davide Gentilini

Machine learning in cancer pharmacogenomics
In this half day we will see how supervised machine learning can help to predict a response to treatment based on the genomic profile of a tumour.
Large-scale cancer pharmacogenomic screening experiments profile hundreds of cancer cell lines versus hundreds of clinically approved or experimental compounds. This is a multi-task multi-view prediction problem, which can be addressed by high-dimensional multivariate regression, where the response variables are potentially highly correlated.
We will see how we can improve model interpretability and prediction performance over default machine learning approaches by introducing some of the structure of the data into the model and thereby tailoring the model to the data setup. Specifically, we will start from penalised regression and its connection to Bayesian variable selection and show how to use structured penalties/ priors to incorporate existing knowledge about relationships between genomic input features and treatment outcomes. This prior knowledge can come for example from molecular biology (e.g., signalling pathways or protein networks), pharmacology (target pathways of drugs), or experimental design.
I will introduce the application area by means of some show cases from our own work in collaboration with biomedical colleagues at Oslo University Hospital.

Speaker:

Manuela Zucknick

Manuela Zucknick is Associate Professor at the Oslo Centre for Biostatistics and Epidemiology, University of Oslo. Before that, she worked as a researcher at the German Cancer Research Centre in Heidelberg after completing her PhD in Biostatistics at Imperial College London. She is interested in statistical machine learning methods for translational oncology, with a focus on exploring statistical methods for integrating different sources of genomic data and incorporating biological knowledge via structured penalties and priors, e.g. for modelling and prediction of treatment effects in anti-cancer drug screens.

Affiliation
Oslo Centre for Biostatistics and Epidemiology,
Faculty of Medicine, University of Oslo, Norway
Email: manuela.zucknick@medisin.uio.no
URL: http://www.med.uio.no/imb/english/people/aca/manuelkz

References:

Ickstadt K, Sch afer M, Zucknick M (2018). Toward integrative Bayesian analysis in molecular biology. ARSIA. 5:141-167.
Menden MP, et al (2019). Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nature Communications 10:2674.
Zhao Z, Zucknick M (2020). Structured Penalized Regression for Drug Sensitivity Prediction. JRSSC. 69(3): 525-545. DOI:10.1111/rssc.12400

C4X discovery using novel mathematics to discover disease sub-populations and prioritize novel drug targets
The workshop will focus on how C4X discovery uses novel mathematics to discover disease sub-populations and prioritize novel drug targets. The first part of the workshop will introduce the concept of genetic analysis and showcase conventional genetic analysis approaches highlighting advantages and disadvantages.The second part of the workshop will give an overview of the mathematical principles behind Taxonomy3, the technology that we use in C4X Discovery to discover novel drug targets. The third part of the workshop will focus on network approaches and bioinformatics tools that are developed to progress the genetic findings into potential drug targets. The last part of the talk will focus on progressing population stratification findings into the clinic and the challenges posed by data availability and interpretation

1. Introduction

a. Introduction to the drug discovery industry and processes.
b. Why is genetic data important for drug discovery? – A discussion on an increased success in clinical trials based on drug targets with genetic support.
c. What are GWAS studies and how they work? – A basic introduction to univariate genome-wide association study methods (with demonstration) and their usage for drug discovery.
d. What is epistasis and why is it important? – Reviewing the literature on methods used to highlight important genetic interactions for a GWAS study.
e. What is an omnigenic model? – Omnigenic models consider a hierarchical categorization of the genetic factors explaining the heritability of complex disorders, which involves a discussion on ‘missing heritability’

2. Taxonomy3

a. How does Taxonomy3 work and where do we add value compared to conventional genetic analysis?
i. A mathematical formulation of the method (with demonstration) will be discussed which will highlight a. Transforming the data by retaining the information content from the genetic variants b. Applying a multivariate dimensionality reduction approach to account for correlations across the entire genome c. Defining statistical interactions which represent the synergistic effect of pairs of genetic variants. d. Variable selection for synergistic effects. e. Identifying disease modality sub-populations. f. Significance assessment of variants and interaction pairs.
ìì. Real-life examples of targets being progressed by C4X discovery identified using the above methodology. Real-life examples of sub-populations identified by the above technology which are currently being commercialized by C4X discovery.

b. How do we improve QC and ethnical homogeneity selection?
ì. Demonstration on how Taxonomy3 is used to identify further technological bias in genetic data not captured by conventional quality control steps.
ìì. Demonstration on how Taxonomy3 is used for ethnic co-analysis for ethnic outlier detection and ethnic sub-population selection, which is not captured by conventional ethnicl co-analysis.

c. Discussion on how significant genetic variants and variant interactions fit with the omnigenic model.

3. Target Prioritization – How to get from a genetic signal to a potential drug discovery program?

a. Bioinformatics tools
ì. Mapping SNP-to-Gene-to-Pathway is a crucial process for successful drug discovery – we review the various aspects of these maps and give real-life examples of identified targets due to different mapping strategies. This discussion will involve linkage disequilibrium block, eQTL associations, and chromatin interactions.
ìì. Gene Ontology enrichment techniques are used to identify relevant biological processes to interpret findings. Review of methods and databases.
ììì. Construction of biological networks from identified targets explains further the biological convergence of the genetic findings. Review of methods and databases.
ìv. Pathway association can highlight relevant cells in which targets are expressed. Review of methods and databases.

b. In vitro experiments are used to validate targets: examples from C4X Discovery CRISPR and compound experiments will be discussed

4. How to identify population stratification relevance for the disease?

1 – A discussion on how the association of disease sub-populations with clinical phenotypes and/or genomic data is key for understanding the complexity of the disease and the development of stratified medication. Examples of in-house sub-population associations and challenges posed by data availability and quality.
2 – Discussion on progressing drug targets for stratified populations and how this fits with the omnigenic model.

Zhana Kuncheva

Zhana Kuncheva is currently working as a Genetics Data Science Lead at C4X discovery overlooking the mathematical, scientific and technical aspects of Taxonomy3.
Before this, Dr Kuncheva completed her PhD in Statistics at Imperial College London with a focus on modelling complex genomic and neuroscience networks. She also has experience with neuroscience imaging data as she spent her MSc year at the University of Warwick researching the topic.

Affiliation: Genetics Data Science Lead, Clinical Development and Mathematics, C4X Discovery
Email: zhana.kuncheva@C4XDISCOVERY.COM
url: www.c4xdiscovery.com

References:

1) Nelson, Matthew R., et al. “The support of human genetic evidence for approved drug indications.” Nature genetics 47.8 (2015): 856.
2) King, Emily A., J. Wade Davis, and Jacob F. Degner. “Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval.” PLoS Genetics 15.12 (2019).
3) Boyle, Evan A., Yang I. Li, and Jonathan K. Pritchard. “An expanded view of complex traits: from polygenic to omnigenic.” Cell 169.7 (2017): 1177-1186.
4) Reed, Eric, et al. “A guide to genome‐wide association analysis and post‐analytic interrogation.” Statistics in medicine 34.28 (2015): 3769-3792.
5) Turner, Stephen, et al. “Quality control procedures for genome‐wide association studies.” Current protocols in human genetics 68.1 (2011): 1-19.
6) Delrieu, Olivier, and Clive Bowman. “Visualizing gene determinants of disease in drug discovery.” (2006): 311-329.
7) Delrieu, Olivier, and C. Bowman. “On using the correlations of divergences.” Systems Biology and Statistical Bioinformatics (2007): 27-35.
8) Charalambous, Christakis, Olivier Delrieu, and Clive Bowman. “Whole genome scan algebra and smoothing.” The Art and Science of Statistical Bioinformatics (2008): 21-27.