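Title: Research on Association Pattern Mining Methods Based on Biological Networks (基于生物网络的关联模式挖掘方法研究)
Author: Zuo Xiaohan (左晓晗)
Degree type: Master of Engineering
Defense date: 2012-05-24
Degree-granting institution: Graduate University of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisor: Yang Yiping (杨一平)
Keywords: Artificial Intelligence; Data Mining; Feature Selection; Biological Network; Zheng (Syndrome) Pattern; Inquiry Diagnosis Model; Phenotype Pattern; Phenotype; Gene
Alternative title: Pattern Mining based on Biological Network
Degree discipline: Computer Application Technology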
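Chinese abstract: Pattern recognition is an important research area within artificial intelligence; applying data mining methods to the pattern recognition problems of a research object is called pattern mining. Traditional Chinese Medicine (TCM) has a well-developed theoretical basis and a mature body of methods for diagnosing and treating coronary heart disease. However, its basic concepts, theories, and methods rest on the ancient Chinese philosophy of yin-yang and the Eight Trigrams, so descriptions of syndrome (Zheng) patterns are hard to interpret and quantify, and the knowledge exists in unstructured form. Taking the biological network formed by entities such as syndromes, phenotypes, and genes as its basis, this thesis studies the associations between related concepts of Chinese and Western medicine in the diagnosis and treatment of coronary heart disease. Using feature selection, relation extraction, and related methods, it constructs syndrome-phenotype and syndrome-gene association patterns and examines the relation extraction and computation methods in depth. The main work is divided into the following parts:
I. Pattern mining based on feature selection
· Constructing coronary heart disease syndrome-phenotype patterns by feature selection: when building disease syndrome-phenotype patterns, the phenotypes (features) depend on and influence one another, so common feature selection methods no longer apply. This thesis proposes an improved Markov Blanket algorithm to analyze the associations between TCM syndromes and phenotypes, determine the set of phenotypes related to each syndrome, and construct syndrome-phenotype association patterns.
· Constructing a syndrome inquiry model with classification algorithms: once the syndrome-phenotype patterns are determined, and taking coronary heart disease as the example, syndrome classifiers are built with neural networks, support vector machines, decision trees, and Bayesian networks. These classifiers determine the syndrome diagnosis for given case data, which realizes the syndrome inquiry model.
II. Pattern mining based on text mining and inference networks
· Constructing phenotype-gene patterns by text mining: the OMIM database is carefully curated, promptly updated, and highly reliable. Text mining is used to extract the phenotype-gene associations implicit in it, and these associations serve as the basis for constructing phenotype-gene association patterns.
· Mining potential association patterns with a label propagation algorithm: OMIM covers only a limited number of phenotypes and genes. For phenotype-gene relations it does not contain, this thesis uses the topological information among genes that is implicit in the protein interaction network, applies a network label propagation algorithm to mine potential phenotype-gene association patterns, proposes a corresponding algorithm, and gives prediction results.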
English abstract: The theory of Artificial Intelligence (AI) has been thoroughly researched and successfully applied to extracting relationships between many kinds of items. Traditional Chinese Medicine (TCM) and Western medicine each have their own theoretical basis and well-developed systems for disease diagnosis and therapy, but some concepts in TCM rest on the philosophy of ancient China, so they are difficult to interpret and hard to quantify. This thesis focuses on the relationships within the biological network of Zheng, phenotypes, and genes, and aims to derive Zheng-phenotype and phenotype-gene patterns. We propose two algorithms for the problems that arise in pattern generation and present the results obtained with our methods. The main work of this thesis comprises:
I. Pattern mining based on feature selection
· Zheng-phenotype pattern mining based on feature selection: The mutual dependence and influence among phenotypes is a major obstacle in constructing Zheng-phenotype patterns, so ordinary feature selection algorithms cannot be used here. We propose an improved feature selection algorithm based on the Markov Blanket and use it to analyze the correlation between Zheng and phenotypes, compute the feature subset relevant to each Zheng, and generate Zheng-phenotype patterns.
· Construction of a diagnosis model based on classification: Based on the Zheng-phenotype patterns, we trained six classifiers using Bayesian network, Naive Bayes, logistic regression, support vector machine (SVM), K-nearest neighbor (KNN), and decision tree, and present the classification results for new patients' records.
II. Pattern mining based on text mining and inference network
· Construction of phenotype-gene patterns based on text mining: The records of Online Mendelian Inheritance in Man (OMIM) are manually maintained by experts in the field and are highly reliable, so we used these records to mine the relationships between phenotypes and genes. The relationships mined from OMIM were treated as the foundation of the phenotype-gene patterns.
· Mining implied patterns using a Label Propagation algorithm: The patterns mined from OMIM cover only a small part of the phenotypes and genes. For the remaining phenotypes and genes, we propose a Label Propagation algorithm based on the topology of the protein-protein interaction (PPI) network to generate phenotype-gene patterns.
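The label propagation step in the abstract can be illustrated with a small sketch. This record does not give the thesis's actual algorithm, so the code below assumes a standard iterative scheme, f ← α·S·f + (1 − α)·y with symmetric degree normalization, applied to a toy interaction graph. The function name `propagate_labels`, the gene names, and the parameter values are hypothetical and chosen only for illustration.

```python
def propagate_labels(edges, seeds, alpha=0.8, tol=1e-9, max_iter=1000):
    """Score genes for one phenotype by spreading seed labels over a PPI graph.

    edges: iterable of (gene_a, gene_b) interaction pairs (undirected).
    seeds: set of genes already known to be linked to the phenotype.
    Returns a dict mapping each gene to its propagated score.
    """
    # Build an undirected adjacency list from the interaction pairs.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    for s in seeds:
        neighbors.setdefault(s, set())  # keep isolated seed genes in the result
    if not neighbors:
        return {}
    degree = {g: len(n) for g, n in neighbors.items()}

    # Initial labels: 1 for known phenotype genes, 0 for all others.
    y = {g: (1.0 if g in seeds else 0.0) for g in neighbors}
    f = dict(y)
    for _ in range(max_iter):
        new_f = {}
        for g in neighbors:
            # Symmetric normalization: edge (g, h) has weight 1/sqrt(d_g * d_h).
            spread = sum(f[h] / (degree[g] * degree[h]) ** 0.5
                         for h in neighbors[g])
            new_f[g] = alpha * spread + (1 - alpha) * y[g]
        delta = max(abs(new_f[g] - f[g]) for g in neighbors)
        f = new_f
        if delta < tol:  # stop once the scores no longer change
            break
    return f


# Toy usage: a four-gene chain with one known phenotype gene (assumed data).
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
scores = propagate_labels(toy_edges, seeds={"G1"})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 4))
```

In this toy chain the scores fall off with distance from the seed gene, so unlabeled genes can be ranked as candidate associations for the phenotype. The parameter `alpha` sets how much weight goes to network neighbors versus the original labels.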
Language: Chinese
Other identifier: 200928014629091
Content type: Thesis
Source URL: [http://ir.ia.ac.cn/handle/173211/7621]
Collection: Graduates_Master's Theses
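Recommended citation
GB/T 7714
Zuo Xiaohan (左晓晗). Research on Association Pattern Mining Methods Based on Biological Networks (基于生物网络的关联模式挖掘方法研究) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences. 2012.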