Having emerged from the downturn in the net economy, B2C e-commerce is about to enter a period of rapid development. Information extraction techniques will be one of the most important factors in promoting B2C e-commerce. In this paper, we review recent progress in information extraction techniques applied to B2C e-commerce and appraise the characteristics of each technique.
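This paper proposes an algorithm for generating Chinese concept sets based on association rules. The algorithm first produces the Chinese keyword set of each document and represents the document with the vector space model (VSM). Then, treating Chinese keywords as transaction items and Chinese documents as transactions, it applies a mature association rule algorithm to discover frequent sets of Chinese keywords. Next, it generates the original concept sets and clusters them, thereby generating Chinese concept sets automatically. The algorithm can also incorporate incremental updating, so that the concept sets are updated incrementally. Experiments show that the algorithm generates Chinese concept sets effectively. It can be used for semantic dimensionality reduction of the high-dimensional feature vectors that represent Chinese documents, and it has some practical value.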
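
The pipeline described in this abstract (documents as transactions, keywords as items, frequent keyword sets, then clustering into concept sets) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the function names, the toy English keywords standing in for Chinese ones, the use of a plain Apriori-style search as the "mature association rule algorithm", the choice of maximal frequent sets as the original concept sets, and the greedy Jaccard merge used as the clustering step.

```python
from itertools import combinations


def frequent_keyword_sets(docs, min_support):
    """Apriori-style search. docs is a list of keyword sets, one per document."""
    n = len(docs)
    # Level 1 candidates: every single keyword seen in any document.
    current = {frozenset([kw]) for d in docs for kw in d}
    frequent = {}
    k = 1
    while current:
        # Support = fraction of documents that contain the whole candidate set.
        level = {}
        for cand in current:
            support = sum(1 for d in docs if cand <= d) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-sets into (k+1)-candidates, keeping only those
        # whose k-subsets are all frequent (the Apriori pruning step).
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent


def maximal_sets(frequent):
    """Assumed choice of original concept sets: frequent sets with no frequent superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


def merge_concepts(raw_sets, threshold):
    """Assumed clustering step: greedily merge sets whose Jaccard overlap is high."""
    clusters = []
    for s in sorted(raw_sets, key=lambda x: (-len(x), sorted(x))):
        for i, c in enumerate(clusters):
            if len(s & c) / len(s | c) >= threshold:
                clusters[i] = c | s
                break
        else:
            clusters.append(set(s))
    return clusters


# Toy corpus: each document is already reduced to its keyword set.
docs = [
    {"stock", "market", "price"},
    {"stock", "market", "fund"},
    {"stock", "market", "price"},
    {"football", "league", "goal"},
    {"football", "league"},
    {"football", "goal", "league"},
]
frequent = frequent_keyword_sets(docs, min_support=0.3)
concepts = merge_concepts(maximal_sets(frequent), threshold=0.5)
```

Under these assumptions, each resulting concept set could serve as one dimension of a reduced document representation, replacing the several keyword dimensions it covers. The incremental update mentioned in the abstract is not sketched here.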