Abstract
In genetic association studies, one analyzes associations between a (potentially very large) set of genetic markers and a phenotype of interest. This constitutes a multiple test problem with several challenging aspects, for instance the high dimensionality of the statistical parameter and the discreteness of the statistical model. In this chapter, we discuss how to fine-tune the multiple tests described theoretically in Part I in order to address these challenges. In particular, we propose the use of realized randomized \(p\)-values in data-adaptive multiple tests, and we show how linkage disequilibrium among genetic markers can be employed to construct simultaneous test procedures and to establish probability bounds that lead to effective numbers of tests. Finally, we analyze (positive) dependency properties among test statistics and the applicability of standard margin-based multiple tests. The methods are applied to two real-life datasets.
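To illustrate what a realized randomized \(p\)-value is, the following minimal sketch uses the standard textbook construction for a discrete test statistic: the strict tail probability plus a uniform random fraction of the probability mass at the observed value. It assumes a one-sided binomial test purely for illustration, which is not the contingency table setting of the chapter, and the function names are hypothetical.

```python
from math import comb
import random


def binom_pmf(k, n, p):
    """Probability mass function of the binomial distribution Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_p_value(t_obs, n, p0, u):
    """Upper-tailed realized randomized p-value for T ~ Bin(n, p0).

    Computes P(T > t_obs) + u * P(T = t_obs), where u is a realization of
    a Uniform(0, 1) variable drawn independently of the data.
    Illustrative assumption: a one-sided binomial test.
    """
    strict_tail = sum(binom_pmf(k, n, p0) for k in range(t_obs + 1, n + 1))
    point_mass = binom_pmf(t_obs, n, p0)
    return strict_tail + u * point_mass


def conservative_p_value(t_obs, n, p0):
    """Usual non-randomized p-value P(T >= t_obs), i.e. the case u = 1."""
    return randomized_p_value(t_obs, n, p0, u=1.0)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the realization is reproducible
    u = rng.random()
    print("randomized:  ", randomized_p_value(8, 10, 0.5, u))
    print("conservative:", conservative_p_value(8, 10, 0.5))
```

For every realization of u, the randomized \(p\)-value is at most the conservative one, and under the point null hypothesis it is exactly uniformly distributed on [0, 1]. This removes the conservativeness that discreteness otherwise introduces.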
Acknowledgments
This chapter makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the Wellcome Trust Case Control Consortium project was provided by the Wellcome Trust under award 076113. Parts of this chapter originated from joint work with Klaus Straßburger, Daniel Schunk, Carlos Morcillo-Suarez, Thomas Illig, Arcadi Navarro and Jens Stange. I am grateful to Mette Langaas and Øyvind Bakke for inviting me to the Norwegian University of Science and Technology (NTNU), for their hospitality during my visit, for many fruitful discussions and for some valuable references.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Dickhaus, T. (2014). Genetic Association Studies. In: Simultaneous Statistical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45182-9_9
DOI: https://doi.org/10.1007/978-3-642-45182-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45181-2
Online ISBN: 978-3-642-45182-9
eBook Packages: Biomedical and Life Sciences (R0)