CORC > Xiamen University > Information Technology - Theses
Title: 基于Tiling Array的拟南芥基因结构分析; Analysis of Arabidopsis Genes Structure Based on Tiling Array
Author: Lin Changsheng (林常胜)
Defense date: 2008
Advisor: Ji Guoli (吉国力)
Keywords: Bioinformatics; Sequence Segmentation; Analysis of Arabidopsis Genes Structure
Abstract: This thesis was carried out in cooperation with Dr. Q. Quinn Li of the Department of Botany, Miami University. It studies genome re-annotation using gene-analysis software, computational methods, and mathematical algorithms. The work builds on two resources provided by Dr. Li's research group: transcription sample data from different cell types of Arabidopsis thaliana, comprising four sample types (wild type, mutant, complemented, and DNA), and the group's research results on the transcript structure of these samples. In research on gene-finding algorithms, on algorithms for predicting protein structure and function, and on data visualization, it has long been an interesting and challenging problem to extract useful information and knowledge from large, incomplete, noisy, fuzzy, and random data, to locate the regions of a genome sequence that code for protein and RNA genes, and to clarify the nature of the information carried by the abundant non-coding regions.
With the development of biology and bioinformatics, gene segmentation, an important preliminary step in gene structure analysis, is attracting growing attention, and higher accuracy and validity are demanded of it. Determining the start and stop positions of coding regions and the boundaries between introns and exons by comparison with the known whole-genome annotation, and verifying the accuracy and validity of the segmentation through data visualization, has important applications in the analysis of gene function and of transcripts. However, because of the intrinsic defects and noise of biochips, and because eukaryotic gene structure is dispersed, diverse, and complex, little is known about the errors that unknown elements introduce into gene structure analysis or about how to choose an optimal stopping criterion, and sequence segmentation and alignment remain slow and inefficient. To date, the author has seen no formal published report that analyzes Arabidopsis gene structure using data from Tiling Array hybridization experiments. This thesis uses several bioinformatics software packages and mathematical algorithms to explore effective methods for Arabidopsis gene structure analysis and for the corresponding data visualization. First, the large probe data set is preprocessed with the DNA Reference algorithm, implemented with the Partek software, so that the signal for a target sequence no longer depends on the affinity of the individual oligonucleotide probes and the signals of different probes become quantitatively comparable. The probe data are then reduced to suit the practical situation of this work, and an SCM model built on dynamic programming is used to segment the data.
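The record does not give the details of the SCM model, so the following is only a minimal sketch of dynamic-programming segmentation under stated assumptions: each segment is modeled as a constant mean, the fit criterion is the within-segment sum of squared deviations, and the number of segments is supplied by the caller. The function name `segment` and the toy intensity values are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def segment(values: Sequence[float], n_segments: int) -> Tuple[List[int], float]:
    """Split `values` into `n_segments` contiguous pieces that minimise the
    total within-segment sum of squared deviations from each segment mean.

    Returns (starts, cost): the start index of every segment and the total cost.
    Assumption: a piecewise-constant model, chosen here only for illustration.
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(values)")

    # Prefix sums of x and x^2 give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations of values[i:j] from their mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting values[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    # back[k][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Walk the back-pointers from the end to recover the segment starts.
    starts: List[int] = []
    j = n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    starts.reverse()
    return starts, best[n_segments][n]


# Toy probe intensities along one strand: low, high, low.
intensities = [0.1, 0.0, -0.1, 2.0, 2.1, 1.9, 0.0, 0.1]
starts, total_cost = segment(intensities, 3)
print(starts)  # [0, 3, 6]
```

The sketch runs in O(k·n²) time for n probes and k segments, which illustrates why efficiency matters for a large probe set. It takes the number of segments as given; choosing that number corresponds to the stopping-criterion question raised above and is not addressed here.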
Extensive statistical analysis of the data is used to estimate the state parameters of the model and to obtain the segmentation points. Probe intensities, annotation information, and segment information are stored in a MySQL database, and the self-developed ProbeViewer software displays the segmentation results to help biologists analyze gene structure intuitively.
Degree: Master of Engineering
Department and major: School of Information Science and Technology, Department of Automation, Systems Science
Student ID: 22320051302489
Language: zh_CN
Source: http://210.34.4.13:8080/lunwen/detail.asp?serial=20384
Content type: Thesis
Source URL: [http://dspace.xmu.edu.cn/handle/2288/50505]
Collection: Information Technology - Theses
Recommended citation
GB/T 7714
Lin Changsheng (林常胜). 基于Tiling Array的拟南芥基因结构分析 (Analysis of Arabidopsis Genes Structure Based on Tiling Array)[D]. 2008.