Traditional machine-learning algorithms struggle to handle the very large volume of data generated on the internet. Real-world applications urgently need machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing delivers computing and storage as a service to a heterogeneous community of recipients, and it has recently aroused much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system that comprises three subsystems: a parallel web crawler; parallel text extract, transform, and load (ETL) and modeling; and parallel text mining and applications. The complete system enables a variety of real-world web-mining applications on massive data.
Attribute reduction is a form of data reduction, usually applied as a preprocessing step in data mining. Its job ...
Shifei Ding (1,2), Hao Ding (1). 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116; 2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Learning to rank is designed to determine a ranking for the target objects according to some rule. Specifically...
DING Shi-Fei (1,2), LIU Xiao-Liang (1), ZHANG Li-Wen (1). 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221008, P.R. China; 2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, P.R. China