Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads
Ye CX[*]1; Ma ZS[*]2
Journal: PeerJ
2016
Volume: 4; Issue: X; Pages: e2016
Keywords: Consensus algorithm; Genome assembly; Variant discovery; Single molecular sequencing; Third generation sequencing technology
Corresponding authors: cxy@umd.edu; ma@vandals.uidaho.edu
Collaboration status: Other
Abstract (English): Motivation. Third generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated to be in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences, which must be extracted from these long but erroneous 3GS sequences. Results. We describe Sparc, a versatile and efficient linear-complexity consensus algorithm, to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path through a sparsity-induced reweighted graph, which approximates the most likely genome sequence, is taken as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments show that Sparc can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with an error rate below 0.5%. With the more challenging Oxford Nanopore data, Sparc can achieve a similar error rate when combined with NGS data. Compared with existing approaches, Sparc calculates the consensus with higher accuracy and uses approximately 80% less memory and time.
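The graph-and-heaviest-path idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `consensus`, the parameters `k`, `g`, and `penalty`, and the example reads are all hypothetical. It assumes every read starts at the same offset and contains only substitution errors, and it uses a constant penalty subtracted from each edge weight as a stand-in for the reweighting step, whose actual scheme the abstract does not specify.

```python
from collections import defaultdict


def consensus(reads, k=2, g=2, penalty=1):
    """Toy consensus: heaviest path through a sparse, position-aware k-mer graph.

    Assumption: reads share a start offset and have substitution errors only,
    so a k-mer's offset in a read equals its offset in the target region.
    """
    # Edge (src, dst, link) -> number of supporting reads. A node is
    # (offset, k-mer); k-mers are sampled only every g bases (the sparsity).
    edges = defaultdict(int)
    best = {}  # node -> (score, sequence spelled by the best path so far)
    for read in reads:
        prev = None
        for pos in range(0, len(read) - k + 1, g):
            node = (pos, read[pos:pos + k])
            if prev is None:
                best[node] = (0, node[1])  # a path may start at any offset-0 node
            else:
                link = read[prev[0] + k:pos + k]  # the g bases that extend prev
                edges[(prev, node, link)] += 1
            prev = node

    # Every edge runs from offset p to p + g, so ordering edges by source
    # offset is a valid topological order for the dynamic program.
    for (src, dst, link), weight in sorted(edges.items(), key=lambda e: e[0][0][0]):
        score = best[src][0] + weight - penalty  # assumed constant reweighting
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, best[src][1] + link)

    return max(best.values(), key=lambda entry: entry[0])[1]


# Hypothetical reads of the region "ACGTACGT"; two carry one substitution each.
reads = ["ACGTACGT", "ACGTACGT", "ACCTACGT", "ACGTAGGT", "ACGTACGT"]
result = consensus(reads, k=2, g=2, penalty=1)
print(result)
```

Because each read contributes one edge per sampled k-mer and the dynamic program visits each edge once, the work in this sketch grows linearly with the total input length, apart from the sort. Real long reads contain insertions and deletions, so a practical tool would need an alignment step that this sketch omits.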
Indexing category: Other
Funding: The research received funding from the following sources: NSFC (Grant No: 61175071 & 71473243) and the "Exceptional Scientists Program of Yunnan Province, China."
Language: English
Content type: Journal article
Source URL: http://159.226.149.26:8080/handle/152453/10032
Collections: Kunming Institute of Zoology_Computational Biology and Bioinformatics
Kunming Institute of Zoology_State Key Laboratory of Genetic Resources and Evolution
Author affiliations: 1. Department of Computer Science, University of Maryland, College Park, MD, USA
2. Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
Recommended citation
GB/T 7714
Ye CX[*],Ma ZS[*]. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads[J]. PeerJ,2016,4(X):e2016.
APA Ye CX[*],&Ma ZS[*].(2016).Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads.PeerJ,4(X),e2016.
MLA Ye CX[*],et al."Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads".PeerJ 4.X(2016):e2016.