Comparison of methods aiming to detect causal genes in datasets including rare variants
Published: 13 September 2012
Background: Rare causal variants are believed to account for a substantial part of the observed gap between heritability estimates of common diseases or quantitative traits and the variance explained by the common genetic variants discovered so far. Next-generation sequencing methods allow rare variants to be identified in a reasonable number of individuals, and these data can be screened for disease variants or genetic modifiers of traits. Statistical methods to detect these variants are required and should be appropriately compared and characterized.
Method: The publicly available GAW17 dataset is based on a preliminary 1,000 Genomes dataset of 697 individuals combined with a simulated complex disease model comprising intermediate quantitative phenotypes. This dataset was used to assess and compare strategies for selecting candidate loci. Our approaches are either genome-wide, considering all markers simultaneously, or gene-centric, i.e. we aim to select candidate genes rather than markers. For this purpose, we analyse and compare a number of uni- and multivariate methods: marginal correlation, the Hotelling test, a combination of both, LASSO, Boosting, the correlation-adjusted t-score (CAT score) and the correlation-adjusted marginal correlation (CAR score). Methods are evaluated on the basis of top-gene lists for three different phenotypes, including both categorical and continuous traits.
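To illustrate what a gene-centric marginal statistic can look like, the following minimal sketch scores each gene by the largest absolute Pearson correlation between any of its markers and the phenotype, then ranks genes by that score. The aggregation by maximum, the function names, and the toy genotypes are assumptions made for illustration only; the abstract does not specify how marker-level statistics were combined per gene, and the data are not taken from GAW17.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant phenotype
    return sxy / sqrt(sxx * syy)


def gene_scores(genotypes, gene_of_marker, phenotype):
    """Score each gene by the maximum |correlation| over its markers.

    Taking the maximum is an illustrative assumption, not the
    aggregation rule used in the study.
    """
    scores = {}
    for marker, values in genotypes.items():
        gene = gene_of_marker[marker]
        r = abs(pearson(values, phenotype))
        scores[gene] = max(scores.get(gene, 0.0), r)
    return scores


def top_genes(scores, k):
    """Return the k highest-scoring genes (the 'top-gene list')."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: minor-allele counts (0, 1, 2) for six individuals.
genotypes = {
    "m1": [0, 0, 1, 0, 2, 0],
    "m2": [0, 1, 0, 0, 0, 0],
    "m3": [1, 0, 0, 1, 0, 1],
}
gene_of_marker = {"m1": "GENE_A", "m2": "GENE_A", "m3": "GENE_B"}
phenotype = [0.1, 0.3, 1.2, 0.0, 2.1, 0.2]  # hypothetical quantitative trait

scores = gene_scores(genotypes, gene_of_marker, phenotype)
ranking = top_genes(scores, k=1)
print(ranking)
```

A statistic of this kind is driven by the single strongest marker in a gene, whereas a multivariate test such as the Hotelling test pools evidence across the gene's markers.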
Results and Discussion: We detect clear differences between methods. Detailed analysis of the causal gene characteristics reveals conditions under which particular methods perform well. For example, in gene-wise analysis, the marginal statistic was superior when the gene contains a single causal marker with a dominating effect and a relatively liberal cut-off is applied to the gene list, while the Hotelling test was superior when the gene contains several independent causal markers and a stringent cut-off is applied. Interestingly, in gene-wise analysis, more elaborate regression methods (LASSO, Boosting, and CAT / CAR scores) did not generally perform better than these two statistics. We discuss recommendations for applying the methods when screening for disease variants or genetic modifiers of quantitative traits.
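The dependence on the cut-off can be made concrete with a small sketch that counts how many known causal genes appear in the top of a ranked gene list. The gene names, the two rankings, and the function name `causal_hits` are hypothetical and chosen only to show how one ranking can win at a stringent cut-off while another wins at a liberal one; they are not results from the study.

```python
def causal_hits(ranked_genes, causal_genes, cutoff):
    """Count causal genes among the first `cutoff` entries of a ranking."""
    return sum(1 for gene in ranked_genes[:cutoff] if gene in causal_genes)


# Hypothetical set of truly causal genes (known in a simulation setting).
causal = {"G1", "G2", "G3", "G4"}

# Two hypothetical rankings produced by two different methods.
ranked_a = ["G1", "G2", "G7", "G8", "G9", "G3"]
ranked_b = ["G7", "G1", "G3", "G4", "G2", "G9"]

results = {}
for cutoff in (2, 5):
    results[cutoff] = (
        causal_hits(ranked_a, causal, cutoff),
        causal_hits(ranked_b, causal, cutoff),
    )
    print(cutoff, results[cutoff])
```

In this toy example the first ranking recovers more causal genes at the stringent cut-off of 2, while the second recovers more at the liberal cut-off of 5, which is why a comparison of methods should report performance at more than one list length.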