A neural topic model with word vectors and entity vectors for short texts
Zhao, Xiaowei (1); Wang, Deqing (1); Zhao, Zhengyang (1); Liu, Wei (2); Lu, Chenwei (1); Zhuang, Fuzhen (3,4)
Journal: INFORMATION PROCESSING & MANAGEMENT
2021-03-01
Volume: 58  Issue: 2  Pages: 11
Keywords: Topic model; Short text; Variational auto-encoder; Word embedding; Entity embedding
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2020.102455
Abstract: Traditional topic models are widely used for semantic discovery from long texts. However, they usually fail to mine high-quality topics from short texts (e.g., tweets) because of feature sparsity and the lack of word co-occurrence patterns. In this paper, we propose a Variational Auto-Encoder Topic Model (VAETM for short) that combines word vector representations and entity vector representations to address these limitations. Specifically, we first learn embedding representations of each word and each entity from a large-scale external corpus and a large, manually edited knowledge graph, respectively. We then integrate these embedding representations into the variational auto-encoder framework and propose an unsupervised model, VAETM, to infer the latent representation of topic distributions. To further boost VAETM, we propose an improved supervised VAETM (SVAETM for short) that uses the label information in the training set to supervise both the inference of the latent representation of topic distributions and the generation of topics. Finally, we propose KL-divergence-based inference algorithms to infer the approximate posterior distributions of both models. Extensive experiments on three common short-text datasets demonstrate that VAETM and SVAETM outperform various state-of-the-art models in terms of perplexity, NPMI, and accuracy.
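The abstract describes a topic model trained in a variational auto-encoder framework and evaluated by perplexity and NPMI. As a rough illustration only, the pure-Python sketch below shows generic pieces that VAE-based topic models commonly share: a Gaussian latent variable sampled with the reparameterization trick, softmax topic proportions, a bag-of-words reconstruction term, and the closed-form KL divergence to a standard normal prior, plus an ELBO-based perplexity bound and document co-occurrence NPMI. It is not the authors' VAETM or SVAETM: the word and entity embeddings, the label supervision, and the encoder network are all omitted, and every function name (`negative_elbo`, `npmi_coherence`, etc.) and the `beta_logits` parameter are hypothetical.

```python
import math
import random
from itertools import combinations


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def negative_elbo(bow, mu, logvar, beta_logits, rng):
    """One-sample negative ELBO for a single document (generic sketch).

    bow: word counts of length V; mu/logvar: hypothetical encoder outputs of
    length K; beta_logits: hypothetical K x V topic-word logits.
    """
    theta = softmax(reparameterize(mu, logvar, rng))  # topic proportions
    topics = [softmax(row) for row in beta_logits]    # each topic sums to 1
    word_probs = [
        sum(theta[k] * topics[k][v] for k in range(len(theta)))
        for v in range(len(bow))
    ]
    # Multinomial reconstruction term over the observed word counts.
    reconstruction = -sum(c * math.log(p) for c, p in zip(bow, word_probs) if c > 0)
    return reconstruction + gaussian_kl(mu, logvar)


def perplexity(neg_elbos, doc_lengths):
    """ELBO-based perplexity bound: exp(total negative ELBO / total tokens)."""
    return math.exp(sum(neg_elbos) / sum(doc_lengths))


def npmi_coherence(top_words, documents):
    """Mean pairwise NPMI of a topic's top words (needs >= 2 words).

    Probabilities are document co-occurrence frequencies in `documents`.
    """
    docs = [set(d) for d in documents]
    n = len(docs)

    def prob(*words):
        return sum(1 for d in docs if all(w in d for w in words)) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: minimum NPMI (convention)
        elif p_ij == 1.0:
            scores.append(1.0)   # co-occur in every document: maximum NPMI
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two topics, three-word vocabulary, uniform topic-word logits.
    loss = negative_elbo([2, 1, 0], [0.0, 0.0], [0.0, 0.0], [[0.0] * 3, [0.0] * 3], rng)
    print(perplexity([loss], [3]))  # approximately 3.0
    docs = [["apple", "banana"], ["apple", "banana"], ["cherry"], ["cherry"]]
    print(npmi_coherence(["apple", "banana", "cherry"], docs))  # approximately -1/3
```

With uniform topic-word logits every word receives probability 1/3 regardless of the sampled topic proportions, and a zero-mean, unit-variance posterior has zero KL, so the toy document's perplexity is about 3 and easy to check by hand. In a real model the KL term would come from an encoder's outputs and both metrics would be averaged over a held-out corpus.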
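Funding: National Key R&D Program of China [2019YFA0707204]; National Natural Science Foundation of China [U1836206]
WOS Research Areas: Computer Science; Information Science & Library Science
Language: English
Publisher: ELSEVIER SCI LTD
WOS Accession Number: WOS:000612229800005
Content Type: Journal Article
Source URL: [http://119.78.100.204/handle/2XEOYT63/16197]
Collection: Institute of Computing Technology, Chinese Academy of Sciences
Corresponding Author: Zhuang, Fuzhen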
Author Affiliations:
1. Beihang Univ, Sch Comp Sci, Beijing 100191, Peoples R China
2. Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing 100029, Peoples R China
3. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, CAS, Beijing 100190, Peoples R China
4. Chinese Acad Sci, Xiamen Data Intelligence Acad ICT, Beijing, Peoples R China
Recommended Citation
GB/T 7714
Zhao, Xiaowei, Wang, Deqing, Zhao, Zhengyang, et al. A neural topic model with word vectors and entity vectors for short texts[J]. INFORMATION PROCESSING & MANAGEMENT, 2021, 58(2): 11.
APA: Zhao, Xiaowei, Wang, Deqing, Zhao, Zhengyang, Liu, Wei, Lu, Chenwei, & Zhuang, Fuzhen. (2021). A neural topic model with word vectors and entity vectors for short texts. INFORMATION PROCESSING & MANAGEMENT, 58(2), 11.
MLA: Zhao, Xiaowei, et al. "A neural topic model with word vectors and entity vectors for short texts." INFORMATION PROCESSING & MANAGEMENT 58.2 (2021): 11.