CORC > Xiamen University > School of Software - Theses and Dissertations
Title: 基于情感和异源异构数据融合的潜在关系发现模型研究; Research of Latent Semantic Discovery Model Based on Sentiment and Heterogeneous Data Integration
Author: Zhang Xiaoxia (张晓霞)
Defense date: 2015; 2014
Advisor: Wu Qingqiang (吴清强)
Keywords: Latent Semantic Analysis; Sentiment Analysis; Data Fusion; Heterogeneous Data Integration
Abstract: The volume of biomedical data is growing explosively. This wealth of data gives medical scientists rich support for researching new drugs, but even researchers who read the literature day and night cannot keep pace with its growth, let alone extract the information hidden in it. Systems that automatically extract and analyze information from biomedical data are therefore becoming increasingly important. This dissertation studies three topics: the expression of sentiment relationships between biological ontologies in the scientific literature, the extraction of latent relationships, and the integration of heterogeneous data from different sources. As information is stored in increasingly diverse forms, extracting information from a single data source sometimes cannot meet researchers' knowledge needs, so scientific databases and the scientific literature must be integrated to support knowledge discovery across heterogeneous databases.
The dissertation studies two latent semantic analysis models for this purpose: one based on data integration and one based on result integration. The former integrates the preprocessed data sources into a single data set and then analyzes that data set; the latter analyzes each data source independently and integrates the results at the end. Worked examples verify the feasibility and effectiveness of both integration methods.
The dissertation uses a graph-based semi-supervised learning algorithm, label propagation, to automatically identify expressions of sentiment relationships between biological entities. Automatically extracting sentiment relationships between entities from text is an important direction in text mining. Most current studies use supervised learning, which usually performs well, but supervised relation extraction models need a large amount of labeled training data, and producing it costs considerable manpower and time and reduces efficiency. Label propagation instead passes label information from any node in the graph along weighted edges to neighboring nodes, iteratively, until the graph reaches a globally stable state, from which the labels of the unlabeled nodes are inferred. This improves learning performance when training data is scarce.
The dissertation uses a context-based ABC model to discover latent relationships. The model can mine latent relationships among entities at multiple levels and so yields more comprehensive results. Instead of following the traditional approach of retrieving disease-drug relationships directly, the dissertation uses data sets that do not directly relate diseases to drugs, namely disease-gene and gene-drug relationships, as its data sources, so that latent relationships between diseases and drugs can be analyzed more comprehensively.
Degree: Master of Engineering; Department and major: School of Software, Software Engineering; Student ID: 24320111152294
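The label propagation step described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not say how the graph is built, how edges are weighted, or which variant of the algorithm is used. The sketch assumes a common clamped variant in which edge weights are row-normalized, labeled (seed) nodes are reset to their known labels after every pass, and iteration stops once the scores stop changing. The function name `label_propagation`, its parameters, and the toy graph are all hypothetical.

```python
def label_propagation(weights, seed_labels, num_classes, max_iter=1000, tol=1e-6):
    """Propagate labels over a weighted graph (illustrative clamped variant).

    weights:     n x n list of lists of non-negative edge weights (assumed input)
    seed_labels: dict mapping node index -> class index for the labeled nodes
    Returns a list with one predicted class index per node.
    """
    n = len(weights)

    # Row-normalize the edge weights so each node averages over its neighbors.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])

    def clamp(dist):
        # Labeled nodes always keep their known label.
        for node, cls in seed_labels.items():
            dist[node] = [1.0 if c == cls else 0.0 for c in range(num_classes)]

    # Unlabeled nodes start with a uniform distribution over classes.
    scores = [[1.0 / num_classes] * num_classes for _ in range(n)]
    clamp(scores)

    for _ in range(max_iter):
        # Each node takes the weighted average of its neighbors' scores.
        new_scores = [
            [sum(transition[i][j] * scores[j][c] for j in range(n))
             for c in range(num_classes)]
            for i in range(n)
        ]
        clamp(new_scores)
        change = max(
            abs(new_scores[i][c] - scores[i][c])
            for i in range(n) for c in range(num_classes)
        )
        scores = new_scores
        if change < tol:  # scores have stabilized
            break

    return [max(range(num_classes), key=lambda c: scores[i][c]) for i in range(n)]


# Toy graph: two tight clusters {0, 1, 2} and {3, 4, 5} joined by one weak edge.
toy_weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
# Only node 0 (class 0) and node 5 (class 1) are labeled.
print(label_propagation(toy_weights, {0: 0, 5: 1}, num_classes=2))
# → [0, 0, 0, 1, 1, 1]
```

With only two labeled nodes, the four unlabeled nodes take the label of the cluster they are strongly connected to, which is the behavior the abstract relies on when labeled training data is scarce.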
Language: zh_CN
Source: http://210.34.4.13:8080/lunwen/detail.asp?serial=43951
Content type: Dissertation
Source URL: [http://dspace.xmu.edu.cn/handle/2288/82986]
Collection: School of Software - Theses and Dissertations
Recommended citation
GB/T 7714
Zhang Xiaoxia (张晓霞). 基于情感和异源异构数据融合的潜在关系发现模型研究, Research of Latent Semantic Discovery Model Based on Sentiment and Heterogeneous Data Integration[D]. 2015, 2014.