Examination Committee
Prof Robert K M KO, LIFS/HKUST (Chairperson)
Prof Weichuan YU, ECE/HKUST (Thesis Supervisor)
Prof Xiaodan FAN, Department of Statistics, The Chinese University of Hong Kong (External Examiner)
Prof Matthew McKAY, ECE/HKUST
Prof Ling SHI, ECE/HKUST
Prof Huamin QU, CSE/HKUST
Abstract
Genome-wide association studies (GWASs) are widely used to discover single nucleotide polymorphisms (SNPs) associated with diseases. A multi-stage setting is commonly used to discover associations and to validate the findings: associations are discovered in a primary study and validated in a replication study. Only the associations that are statistically significant in both studies are regarded as true findings. In this dissertation, we study three statistical issues in multi-stage GWASs. A related issue is how to improve power when multiple GWAS data sets are available, so this dissertation also proposes a novel joint analysis method that uses summary statistics from multiple GWASs.
First, we study how to estimate the power of replication studies in multi-stage GWASs. We propose an Empirical Bayes (EB)-based method to estimate the power of a replication study for each association. Experiments show that our method overcomes the winner's curse (the upward bias in effect sizes estimated from statistically significant findings) better than traditional estimators and gives more accurate power estimates.
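For orientation, the kind of traditional estimator referred to here can be sketched as a plug-in power calculation for a two-sided z-test. This is only an illustrative sketch, not the EB-based method of the dissertation. The function name `replication_power` and its parameters are hypothetical, and the sketch assumes a normally distributed test statistic with a known standard error in the replication study.

```python
from statistics import NormalDist


def replication_power(effect, se_rep, alpha=0.05):
    """Plug-in power of a two-sided z-test in a replication study.

    effect : effect size plugged in (e.g. the primary-study estimate);
             hypothetical input, not the dissertation's EB estimate.
    se_rep : assumed standard error of the estimate in the replication study.
    alpha  : significance level used in the replication study.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    ncp = effect / se_rep                           # expected z-score under the plugged-in effect
    # Probability that |Z| exceeds the critical value when Z ~ N(ncp, 1).
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)
```

Because the primary-study estimate of a significant association tends to be inflated by the winner's curse, plugging it in directly tends to overstate the power. That bias is the motivation for a shrinkage-type estimator such as the EB-based one described above.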
Second, we study the probability that a primary association (i.e., an association that is statistically significant in the primary study) is validated in the replication study. This dissertation proposes a Bayesian probabilistic measure, named the replication rate (RR), to quantify this probability. We further provide an estimation method for the RR that uses the summary statistics from the primary study. The estimated RR can be used to determine the sample size of the replication study and to check the consistency between the results of the primary study and those of the replication study.
Third, we study how to determine significance levels in multi-stage settings. We propose a novel method to determine significance levels jointly. Experiments show that our method can provide more power than traditional methods and that the false discovery rate (Fdr) is well controlled.
Finally, we study joint analysis methods using summary statistics from multiple GWASs. We propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the Fdr at a certain level.
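As a rough illustration of the idea behind a joint local false discovery rate, the sketch below computes the posterior probability of the null hypothesis given the z-scores of one SNP from two studies, under a toy two-group model. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the name `joint_lfdr`, the null proportion `pi0`, independent z-scores across studies, and fixed alternative means `mu1` and `mu2`.

```python
from statistics import NormalDist


def joint_lfdr(z1, z2, pi0=0.9, mu1=3.0, mu2=3.0):
    """Posterior probability of the null given z-scores from two studies.

    Toy two-group model (all parameters are hypothetical):
      null:        z1, z2 ~ N(0, 1), independent, with prior probability pi0
      alternative: z1 ~ N(mu1, 1), z2 ~ N(mu2, 1), independent
    """
    null = NormalDist(0.0, 1.0)
    f0 = null.pdf(z1) * null.pdf(z2)  # joint density under the null
    f1 = NormalDist(mu1, 1.0).pdf(z1) * NormalDist(mu2, 1.0).pdf(z2)  # under the alternative
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)
```

In this toy setting, a smaller value indicates stronger joint evidence against the null for that SNP. How the rejection threshold is chosen so that the Fdr is controlled is part of the proposed method and is not reproduced here.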