A mixed-model approach for powerful testing of genetic associations with cancer risk incorporating tumor characteristics

Posted: 2020-01-10

Guanghua Forum: Celebrities and Entrepreneurs Forum, No. 5685

Topic: A mixed-model approach for powerful testing of genetic associations with cancer risk incorporating tumor characteristics

Speaker: Dr. Haoyu Zhang (张豪宇), Harvard University

Host: Yaowu Liu (刘耀午), School of Statistics

Time: Monday, January 13, 2020, 15:00-16:00

Venue: Conference Room 1007, Guanghua Building, Guanghua Campus, Southwestern University of Finance and Economics

Organizers: Joint Laboratory of Data Science and Business Intelligence; School of Statistics; Office of Scientific Research

About the speaker:

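Dr. Haoyu Zhang is a postdoctoral fellow in the Department of Biostatistics at the Harvard School of Public Health, advised by Academician Xihong Lin. He completed his undergraduate studies in the Department of Mathematics at Zhejiang University and received his Ph.D. from the Department of Biostatistics at Johns Hopkins University, advised by Professor Nilanjan Chatterjee. His main research interest is statistical genetics.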

Abstract:

Cancers are routinely classified into subtypes according to various features, including histopathological characteristics and molecular markers. Previous genome-wide association studies (GWAS) have reported heterogeneous associations between loci and cancer subtypes. However, it is unclear what the optimal modeling strategy is for handling correlated tumor features, missing data, and the increased degrees of freedom in the underlying tests of association. We propose score tests for genetic associations using a mixed-effect two-stage polytomous model (MTOP). In the first stage, a standard polytomous model is used to specify all possible subtypes defined by the cross-classification of the tumor characteristics of interest. In the second stage, the subtype-specific case-control odds ratios are specified using a more parsimonious model based on the case-control odds ratio for a baseline subtype and the case-case parameters associated with tumor markers. Further, to reduce the degrees of freedom, we specify case-case parameters for additional exploratory markers using a random-effect model. We use the Expectation-Maximization (EM) algorithm to account for missing data on tumor markers. Through simulations across a range of realistic scenarios and data from the Polish Breast Cancer Study (PBCS), we show that MTOP outperforms alternative methods for identifying heterogeneous associations between risk loci and tumor subtypes. Using both standard methods and MTOP, we also identified 32 novel breast cancer susceptibility loci in a GWAS analysis of European-ancestry participants that included 133,384 breast cancer cases and 113,789 controls, plus 18,908 BRCA1 mutation carriers (9,414 with breast cancer).
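The second-stage idea in the abstract can be illustrated with a small sketch. It is not the speaker's implementation and leaves out the random effects, the EM step for missing markers, and the score test. It assumes binary tumor markers and an additive second stage with main effects only. The marker names and the values in `theta` are hypothetical, chosen only to show how a few second-stage parameters determine the log odds ratios of every cross-classified subtype.

```python
import itertools
import numpy as np

def second_stage_design(n_markers):
    """Enumerate all subtypes from binary markers and build the
    second-stage design matrix Z (intercept + marker main effects).
    Assumption: markers are binary and enter additively."""
    # Each row is one subtype, e.g. (0, 1, 0) for three markers.
    subtypes = np.array(list(itertools.product([0, 1], repeat=n_markers)))
    # First column: baseline case-control effect shared by all subtypes.
    # Remaining columns: case-case effect of each marker.
    Z = np.column_stack([np.ones(len(subtypes), dtype=int), subtypes])
    return subtypes, Z

# Hypothetical marker names, for illustration only.
markers = ["ER", "PR", "HER2"]
subtypes, Z = second_stage_design(len(markers))

# Hypothetical second-stage parameters:
# [baseline log OR, case-case effect per marker].
theta = np.array([0.10, 0.05, -0.02, 0.08])

# Subtype-specific case-control log odds ratios implied by the model.
beta = Z @ theta

for s, b in zip(subtypes, beta):
    print(dict(zip(markers, s.tolist())), round(float(b), 3))
```

With three binary markers, the first stage has 8 subtype-specific log odds ratios, while this additive second stage describes them with 4 parameters. That reduction in degrees of freedom is what the abstract refers to.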
