Chinese named entity recognition via word boundary based character embedding (词边界字向量的中文命名实体识别)
Authors: Yao Lin (姚霖); Liu Yi (刘轶); Li Xinxin (李鑫鑫); Liu Hong (刘宏)
Journal: CAAI Transactions on Intelligent Systems (智能系统学报)
Year: 2016
Keywords: machine learning; Chinese named entity recognition; deep neural networks; feature vector; feature extraction
DOI: 10.11992/tis.201507065
Issue 1, pages 37-42, volume 11
Funding: Original Project Research and Development and Intangible Cultural Heritage Industrialization Funding Project (YC2015057)
Indexed in: A Guide to the Core Journals of China (PKU); China Science and Technology Core Journals (ISTIC); Chinese Science Citation Database (CSCD)
Abstract: Most machine-learning-based Chinese named entity recognition systems rely on a large number of manually extracted features, and feature extraction is time-consuming and laborious. To reduce this dependence, this paper presents a Chinese named entity recognition system based on word-boundary character embeddings. A neural network automatically extracts the feature information contained in a large amount of unlabeled data and generates character feature vectors. Because the Chinese character is not the most basic unit of Chinese semantics, plain character vectors can confuse meanings when a single character has several senses. Since the same character usually carries different meanings at different positions within a word, the method adds each character's position within its word to its feature vector, forming word-boundary character vectors, which are then used to train a deep neural network. The system achieves an F1 score of 89.18% on the SIGHAN Bakeoff-3 (2006) MSRA corpus, close to the current state of the art, which shows that it both removes the dependence on manual feature extraction and reduces the semantic ambiguity caused by polysemous characters.
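One plausible reading of the word-boundary character vector described in the abstract is that each character is tagged with its position inside its word (B = begin, M = middle, E = end, S = single-character word), and each (character, position) pair then gets its own vector. The sketch below assumes already-segmented input and this B/M/E/S tag set; the function names are hypothetical, and the random lookup table only stands in for the vectors the paper learns from unlabeled text with a neural network. An alternative reading would concatenate a separate position feature onto a plain character vector; the exact construction is not given in this record.

```python
import random
from typing import Dict, List


def boundary_tags(word: str) -> List[str]:
    """Tag each character of one word with its position: B, M, E or S (assumed tag set)."""
    if len(word) == 1:
        return ["S"]  # single-character word
    # first character begins the word, last one ends it, the rest are in the middle
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def to_boundary_tokens(words: List[str]) -> List[str]:
    """Turn a segmented sentence into 'character_position' tokens."""
    tokens: List[str] = []
    for word in words:
        for char, tag in zip(word, boundary_tags(word)):
            tokens.append(f"{char}_{tag}")
    return tokens


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer ID to every distinct (character, position) token."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for token in to_boundary_tokens(sentence):
            vocab.setdefault(token, len(vocab))  # keep the first ID seen
    return vocab


def init_embeddings(vocab: Dict[str, int], dim: int, seed: int = 0) -> List[List[float]]:
    """Random lookup table, one row per token ID (placeholder for pretrained vectors)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(len(vocab))]


def embed(words: List[str], vocab: Dict[str, int], table: List[List[float]]) -> List[List[float]]:
    """Look up one vector per boundary-tagged character of a segmented sentence."""
    return [table[vocab[token]] for token in to_boundary_tokens(words)]


if __name__ == "__main__":
    # 长 means "head" in 校长 (principal) but "long" in 长城 (Great Wall)
    sentence = ["校长", "参观", "长城"]
    print(to_boundary_tokens(sentence))
    # ['校_B', '长_E', '参_B', '观_E', '长_B', '城_E']
    vocab = build_vocab([sentence])
    table = init_embeddings(vocab, dim=4)
    vectors = embed(sentence, vocab, table)
    print(vocab["长_E"], vocab["长_B"])
    # 1 4
```

With this scheme, 长 in 校长 and 长 in 长城 map to different vocabulary entries (长_E and 长_B), so a downstream network can learn a separate vector for each sense. This is the kind of ambiguity reduction the abstract attributes to word-boundary character vectors.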
Language: Chinese
Content type: Journal article
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/448882
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院)
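Recommended citation
GB/T 7714: Yao Lin, Liu Yi, Li Xinxin, et al. Chinese named entity recognition via word boundary based character embedding[J]. CAAI Transactions on Intelligent Systems, 2016.
APA: Yao, L., Liu, Y., Li, X., & Liu, H. (2016). Chinese named entity recognition via word boundary based character embedding. CAAI Transactions on Intelligent Systems.
MLA: Yao, Lin, et al. "Chinese Named Entity Recognition via Word Boundary Based Character Embedding." CAAI Transactions on Intelligent Systems (2016).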