Objective: To extract SNPs relevant to alcoholism using sib-pair IBD profiles of pedigrees. Methods: We used the ensemble decision approach, a supervised learning approach based on decision forests, to locate alcoholism-relevant SNPs in genome-wide SNP data. Results: Application to a large, publicly available dataset of 100 simulated replicates for three American populations (http://www.gaworkshop.org/) demonstrates that the proposed approach successfully located all of the simulated true loci. Conclusion: The numerical results establish the proposed decision forest analysis as a powerful and practical alternative for large-scale family-based association studies.
Objective: To develop novel strategies for identifying relevant molecular signatures of complex human diseases based on identical-by-descent (IBD) profiles and genomic context. Methods: In the proposed strategies, we define four relevancy criteria for mapping SNP-phenotype relationships: point-wise IBD mean difference, averaged IBD difference over a window, Z curve, and averaged slope over a window. Results: Applying these criteria and a permutation test to 100 simulated replicates for two hypothetical American populations, in order to extract the SNPs relevant to alcoholism from sib-pair IBD profiles of pedigrees, demonstrates that the proposed strategies successfully identified most of the simulated true loci. Conclusion: This data mining exercise suggests that the IBD statistic and genomic context can be used as information for locating the genes underlying complex human diseases. Compared with the classical Haseman-Elston sib-pair regression method, the proposed strategies are more efficient for large-scale genomic mining.
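
To make the first two relevancy criteria and the permutation test more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact definitions, so the sketch assumes that the point-wise IBD mean difference at a SNP is the mean IBD sharing of phenotypically concordant sib pairs minus that of discordant pairs, that the windowed criterion is a plain moving average of those differences, and that the permutation test shuffles the concordance labels across pairs. All function names, parameters, and the toy data are hypothetical. The Z curve and averaged slope criteria are not sketched because their definitions cannot be inferred from the text.

```python
import random
from statistics import mean


def pointwise_ibd_mean_difference(ibd, concordant):
    """Assumed criterion: per-SNP mean IBD of concordant minus discordant pairs.

    ibd[i][j]     -- estimated IBD sharing proportion of sib pair i at SNP j
    concordant[i] -- True if sib pair i is phenotypically concordant
    """
    conc = [row for row, c in zip(ibd, concordant) if c]
    disc = [row for row, c in zip(ibd, concordant) if not c]
    n_snps = len(ibd[0])
    return [
        mean(r[j] for r in conc) - mean(r[j] for r in disc)
        for j in range(n_snps)
    ]


def window_average(values, half_width):
    """Assumed windowed criterion: moving average, truncated at the ends."""
    out = []
    for j in range(len(values)):
        lo = max(0, j - half_width)
        hi = min(len(values), j + half_width + 1)
        out.append(mean(values[lo:hi]))
    return out


def permutation_p_values(ibd, concordant, n_perm=200, seed=0):
    """One-sided per-SNP p-values obtained by shuffling concordance labels."""
    rng = random.Random(seed)
    observed = pointwise_ibd_mean_difference(ibd, concordant)
    exceed = [0] * len(observed)
    labels = list(concordant)
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any real link between IBD and phenotype
        permuted = pointwise_ibd_mean_difference(ibd, labels)
        for j, (obs, perm) in enumerate(zip(observed, permuted)):
            if perm >= obs:
                exceed[j] += 1
    # Add-one correction keeps p-values strictly positive.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data (fabricated purely for illustration): 4 sib pairs, 2 SNPs.
toy_ibd = [[1.0, 0.5], [1.0, 0.5], [0.0, 0.5], [0.0, 0.5]]
toy_concordant = [True, True, False, False]

diffs = pointwise_ibd_mean_difference(toy_ibd, toy_concordant)
smoothed = window_average(diffs, half_width=1)
p_values = permutation_p_values(toy_ibd, toy_concordant)
```

In this toy setting the first SNP separates concordant from discordant pairs and the second does not, so the first SNP receives the larger difference and the smaller permutation p-value. In a genome-wide setting the per-SNP p-values would additionally need a multiple-testing correction, which is omitted here.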