Relative term-frequency based feature selection for text categorization
Yang, SM; Wu, XB; Deng, ZH; Zhang, M; Yang, DQ
2002
Keywords | text categorization; feature selection; relative term frequency
Abstract | Automatic feature selection methods such as document frequency (DF), information gain (IG), and mutual information (MI) are commonly applied in the preprocessing stage of text categorization, both to reduce the originally high feature dimension to a manageable level and to reduce noise and thereby improve precision. These methods generally assess a specific term by counting its occurrences among individual categories or in the entire corpus, where "occurring in a document" is simply defined as occurring at least once. A major drawback of this measure is that, within a single document, it counts a recurrent term the same as a rare one, even though the recurrent term is clearly more informative and should be less likely to be removed. In this paper we propose an approach to overcome this problem: the occurrence count is adjusted according to the relative term frequency, thereby emphasizing the recurrent words in each document. Although the adjustment can be applied to any feature selection method, we implemented it on several of them and observed notable improvements in performance. | Subject categories: Automation & Control Systems; Computer Science, Artificial Intelligence; Computer Science, Cybernetics; Engineering, Electrical & Electronic | Indexed by: CPCI-S (ISTP)
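The abstract's core idea, replacing the binary "occurs at least once" document count with a count weighted by each term's relative frequency within a document, can be sketched as follows. The paper's exact weighting formula is not given in this record, so this is one plausible reading in which each document contributes term count divided by document length; the function name and the toy corpus are illustrative assumptions.

```python
from collections import Counter

def relative_tf_document_frequency(docs):
    """Weighted document frequency (illustrative sketch, not the
    paper's exact formula): instead of each containing document
    contributing 1 to a term's count, it contributes the term's
    relative frequency in that document (count / document length),
    so recurrent terms are stressed over rare ones."""
    scores = {}
    for doc in docs:
        counts = Counter(doc)      # term -> occurrences in this document
        length = len(doc)          # total tokens in this document
        for term, c in counts.items():
            scores[term] = scores.get(term, 0.0) + c / length
    return scores

# Toy corpus: "cat" recurs within a document, "dog" appears once.
docs = [
    ["cat", "cat", "cat", "dog"],
    ["cat", "bird"],
]
scores = relative_tf_document_frequency(docs)
# "cat" now outscores "dog" within the first document's contribution,
# whereas plain DF would count each containing document equally.
```

Plain DF would assign "cat" a count of 2 and "dog" a count of 1; here "cat" receives 3/4 + 1/2 = 1.25 while "dog" receives 0.25, reflecting its recurrence. The same weighted counts could then feed IG or MI in place of the binary occurrence counts.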
Language | English
Content Type | Other
Source URL | [http://ir.pku.edu.cn/handle/20.500.11897/293890]
Collection | School of Information Science and Technology
Recommended Citation (GB/T 7714) | Yang, SM, Wu, XB, Deng, ZH, et al. Relative term-frequency based feature selection for text categorization. 2002-01-01.