Learning outliers to refine a corpus for Chinese webpage categorization

Authors	Luo, Dingsheng; Wang, Xinhao; Wu, Xihong; Chi, Huisheng
Year	2005
Abstract	Webpage categorization has become an important topic in recent years. Text is usually the main content of a webpage, so automatic text categorization (ATC) is the key technique for this task. For Chinese text categorization, and likewise for Chinese webpage categorization, one of the most basic and urgent problems is the construction of a good benchmark corpus. In this study, a machine learning approach is presented to refine a corpus for Chinese webpage categorization, in which the AdaBoost algorithm is adopted to identify outliers in the corpus. The standard k-nearest-neighbor (kNN) algorithm under a vector space model (VSM) is adopted to construct the webpage categorization system. Simulation results, together with manual inspection of the identified outliers, indicate that the presented method works well. © Springer-Verlag Berlin Heidelberg 2005.
Language	English
Content type	Other
Source URL	http://ir.pku.edu.cn/handle/20.500.11897/295273
Collection	信息科学技术学院 (School of Electronics Engineering and Computer Science)
Recommended citation (GB/T 7714)	Luo, Dingsheng, Wang, Xinhao, Wu, Xihong, et al. Learning outliers to refine a corpus for Chinese webpage categorization. 2005-01-01.
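
The abstract mentions a standard kNN classifier under a vector space model. As a rough illustration only, the sketch below shows one common way such a classifier can be put together: documents become term-frequency vectors, similarity is measured by cosine, and the k most similar labelled documents vote. This is not the authors' implementation. The function names, the toy English corpus, the raw term-frequency weighting, and the choice of k are all assumptions made for the example, and real Chinese webpages would first need word segmentation. The AdaBoost-based outlier identification step is not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (pre-tokenized document, category label).
TOY_CORPUS = [
    (["football", "match", "goal"], "sports"),
    (["team", "goal", "league"], "sports"),
    (["stock", "market", "price"], "finance"),
    (["bank", "stock", "loan"], "finance"),
]


def tf_vector(tokens):
    """Represent a document as a sparse term-frequency vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_tokens, corpus, k=3):
    """Return the majority label among the k most similar corpus documents."""
    query = tf_vector(query_tokens)
    scored = sorted(
        ((cosine(query, tf_vector(tokens)), label) for tokens, label in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Calling `knn_classify(["goal", "team", "match"], TOY_CORPUS, k=3)` lets the two sports documents outvote the single finance neighbour. A classifier of this kind is sensitive to mislabelled training documents, which is one reason a corpus refinement step is useful.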
