Reconstruction of genetic networks is one of the key scientific challenges in functional genomics. This paper describes a novel approach for addressing the regulatory dependencies between genes whose activities can be delayed by multiple units of time. The aim of the proposed approach, termed TdGRN (time-delayed gene regulatory networking), is to reverse-engineer the dynamic mechanisms of gene regulation. This is realized by identifying time-delayed gene regulations through supervised decision-tree analysis of a newly designed time-delayed gene expression matrix derived from the original time-series microarray data. A permutation technique is used to determine the statistical classification threshold of a tree, from which one or more gene regulatory rules are extracted. The proposed TdGRN is a model-free approach that attempts to learn the underlying regulatory rules without relying on any model assumptions. Compared with model-based approaches, it has several significant advantages: it requires neither an arbitrary threshold for discretizing gene transcriptional values nor a predefined number of regulators (k). We have applied this method to publicly available data for the budding yeast cell cycle. The numerical results demonstrate that most of the identified time-delayed gene regulations are supported by current biological knowledge.
JIANG Wei1,2, LI Xia1,2,3,4, GUO Zheng1,2,3, LI Chuanxing1, WANG Lihong1 & RAO Shaoqi1,5 1. Department of Bioinformatics, Harbin Medical University, Harbin 150086, China
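The TdGRN abstract above does not spell out how the time-delayed expression matrix is laid out or how the permutation threshold is computed, so the following is only a minimal sketch under stated assumptions. It assumes that each row of the matrix is a time point t and each column holds one candidate regulator at one delay d (its expression at t - d), and that the threshold is taken as an upper quantile of scores obtained after shuffling the target profile. The function names (time_delayed_matrix, best_abs_correlation, permutation_threshold) are hypothetical, and the correlation score is a simple stand-in for the decision-tree classification score used in the paper.

```python
import numpy as np


def time_delayed_matrix(expr, max_delay):
    """Build a lagged feature matrix from a (n_genes, n_timepoints) array.

    Row i corresponds to time point t = max_delay + i. Each column holds
    one gene g at one delay d (its value at time t - d), for d = 1..max_delay.
    Returns the matrix and a list of (gene_index, delay) column labels.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes, n_time = expr.shape
    if not 1 <= max_delay < n_time:
        raise ValueError("max_delay must satisfy 1 <= max_delay < n_timepoints")
    blocks, labels = [], []
    for d in range(1, max_delay + 1):
        # Values at t - d for t = max_delay .. n_time - 1.
        blocks.append(expr[:, max_delay - d : n_time - d].T)
        labels.extend((g, d) for g in range(n_genes))
    return np.hstack(blocks), labels


def best_abs_correlation(X, y):
    """Stand-in score (assumption): largest |Pearson r| over all columns."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    r = np.divide(Xc.T @ yc, denom, out=np.zeros_like(denom), where=denom > 0)
    return float(np.max(np.abs(r)))


def permutation_threshold(score_fn, X, y, n_perm=200, alpha=0.05, seed=0):
    """Upper (1 - alpha) quantile of scores after shuffling the target."""
    rng = np.random.default_rng(seed)
    null = [score_fn(X, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.quantile(null, 1.0 - alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expr = rng.normal(size=(2, 20))
    expr[1, 2:] = expr[0, :-2]          # gene 1 follows gene 0 with delay 2
    X, labels = time_delayed_matrix(expr, max_delay=2)
    y = expr[1, 2:]                     # target profile aligned with rows of X
    observed = best_abs_correlation(X, y)
    threshold = permutation_threshold(best_abs_correlation, X, y)
    print("observed score:", observed, "permutation threshold:", threshold)
```

A candidate regulation would be kept only when the observed score exceeds the permutation threshold. Substituting a decision-tree score for best_abs_correlation would bring the sketch closer to the approach the abstract describes.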
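Microarray technology provides a powerful tool for studying disease heterogeneity. Current methods based on traditional cluster analysis generally use the large number of genes on the array as features to discover disease subtypes, and therefore do not account for the fact that the many irrelevant genes among the features can mask meaningful partitions of the disease samples. To avoid this shortcoming, we propose a heterogeneity analysis method based on coupled two-way clustering (Heterogeneous Analysis Based on Coupled Two-Way Clustering, HCTWC), which searches for meaningful gene clusters in order to uncover the intrinsic partition of the samples. The method was applied to a diffuse large B-cell lymphoma (DLBCL) microarray dataset. Clustering the DLBCL samples with the identified gene clusters as features revealed two DLBCL subtypes with survival of 55% and 25%, respectively (P<0.05). HCTWC is therefore effective for resolving disease heterogeneity.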
Identifying disease-relevant genes and functional modules, based on gene expression profiles and gene functional knowledge, is of high importance for studying disease mechanisms and subtyping disease phenotypes. Using gene categories of biological process and cellular component in Gene Ontology, we propose an approach to selecting functional modules enriched with differentially expressed genes, and identifying the feature functional modules of high disease discriminating abilities. Using the differentially expressed genes in each feature module as the feature genes, we reveal the relevance of the modules to the studied diseases. Using three datasets for prostate cancer, gastric cancer, and leukemia, we have demonstrated that the proposed modular approach is of high power in identifying functionally integrated feature gene subsets that are highly relevant to the disease mechanisms. Our analysis has also shown that the critical disease-relevant genes might be better recognized from the gene regulation network, which is constructed using the characterized functional modules, giving important clues to the concerted mechanisms of the modules responding to complex disease states. In addition, the proposed approach to selecting the disease-relevant genes by jointly considering the gene functional knowledge suggests a new way for precisely classifying disease samples with clear biological interpretations, which is critical for the clinical diagnosis and the elucidation of the pathogenic basis of complex diseases.
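The abstract above does not state which statistic is used to decide that a functional module is "enriched with differentially expressed genes". One common choice, assumed here purely for illustration, is a one-sided hypergeometric test on the overlap between a module's genes and the list of differentially expressed genes. The function names and the toy gene identifiers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_deg, module_size, module_deg):
    """P(X >= module_deg) for X ~ Hypergeometric(n_total, n_deg, module_size).

    n_total:     number of genes in the background
    n_deg:       number of differentially expressed genes in the background
    module_size: number of background genes annotated to the module
    module_deg:  number of differentially expressed genes in the module
    """
    upper = min(n_deg, module_size)
    tail = sum(
        comb(n_deg, k) * comb(n_total - n_deg, module_size - k)
        for k in range(module_deg, upper + 1)
    )
    return tail / comb(n_total, module_size)


def enriched_modules(modules, degs, background, alpha=0.05):
    """Return (module_name, p_value) pairs with p <= alpha, smallest p first.

    modules:    dict mapping module name -> set of gene identifiers
    degs:       set of differentially expressed gene identifiers
    background: set of all gene identifiers measured on the array
    """
    degs = degs & background
    results = []
    for name, genes in modules.items():
        genes = genes & background
        if not genes:
            continue  # module has no measured genes; nothing to test
        p = hypergeom_enrichment_p(
            len(background), len(degs), len(genes), len(genes & degs)
        )
        if p <= alpha:
            results.append((name, p))
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    background = {f"g{i}" for i in range(10)}
    degs = {"g0", "g1", "g2"}
    modules = {"A": {"g0", "g1", "g3", "g4"}, "B": {"g5", "g6", "g7", "g8"}}
    print(enriched_modules(modules, degs, background, alpha=0.5))
```

In a real analysis with many Gene Ontology categories, the p-values would normally be corrected for multiple testing before any module is selected. That step is omitted from this sketch.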
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach that we proposed previously, is based on gene expression similarity and the concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend the method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and calls on genes annotated to nearby classes to support predictions for small, specific classes in GO. Based on yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the "biological process" ontology using three measures designed specifically for functional classes organized in GO. The results show that our method is powerful for broadly predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict functions for some proteins whose functions were unknown at that time, and some of these predictions have been confirmed by the new SGD annotation data published in April 2006.
GAO Lei1, LI Xia1,2, GUO Zheng1,2, ZHU MingZhu1, LI YanHui1 & RAO ShaoQi1,3 1 Department of Bioinformatics, Harbin Medical University, Harbin 150086, China
Selecting differentially expressed genes (DEGs) is one of the most important tasks in microarray applications for studying multi-factor diseases, including cancers. However, the small samples typically used in current microarray studies may only partially reflect the widely altered gene expressions in complex diseases, which would lead to low reproducibility of the gene lists selected by statistical methods. Here, by analyzing seven cancer datasets, we showed that, in each cancer, a wide range of functional modules have altered gene expressions and thus have high disease classification abilities. The results also showed that seven modules are shared across diverse cancers, hinting at common mechanisms of cancers. Therefore, instead of relying on a few individual genes whose selection is hardly reproducible in current microarray experiments, we may use functional modules as functional signatures to study core mechanisms of cancers and build robust diagnostic classifiers.
YAO Chen, ZHANG Min, ZOU JinFeng, LI HongDong, WANG Dong, ZHU Jing & GUO Zheng