Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads | |
Ye CX[*]1; Ma ZS[*]2 | |
Journal | PeerJ |
2016 | |
Volume | 4 Issue: X Pages: e2016 |
Keywords | Consensus algorithm; Genome assembly; Variant discovery; Single molecular sequencing; Third generation sequencing technology |
Corresponding author | cxy@umd.edu ; ma@vandals.uidaho.edu |
Collaboration status | Other |
Abstract | Motivation. The third generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear complexity consensus algorithm Sparc to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path which approximates the most likely genome sequence is searched through a sparsity-induced reweighted graph as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve similar error rate when combined with NGS data. Compared with the existing approaches, Sparc calculates the consensus with higher accuracy, and uses approximately 80% less memory and time. |
Indexed in | Other |
Funding | The research received funding from the following sources: NSFC (Grant No: 61175071 & 71473243) and "Exceptional Scientists Program of Yunnan Province, China." |
Language | English |
Content type | Journal article |
Source URL | [http://159.226.149.26:8080/handle/152453/10032] |
Collection | Kunming Institute of Zoology_Computational Biology and Bioinformatics; Kunming Institute of Zoology_State Key Laboratory of Genetic Resources and Evolution |
Author affiliations | 1.Department of Computer Science, University of Maryland, College Park, MD, USA 2.Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China |
Recommended citation GB/T 7714 | Ye CX[*],Ma ZS[*]. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads[J]. PeerJ,2016,4(X):e2016. |
APA | Ye CX[*],&Ma ZS[*].(2016).Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads.PeerJ,4(X),e2016. |
MLA | Ye CX[*],et al."Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads".PeerJ 4.X(2016):e2016. |
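
The abstract describes the consensus as the heaviest path through a reweighted sparse k-mer graph. The record gives no detail on how the graph is built or reweighted, so the following is only a toy sketch of that general idea, not Sparc's implementation. It assumes equal-length reads that are already aligned column by column (substitution errors only). The function name `consensus_heaviest_path`, the sampling gap `g`, and the flat `penalty` subtracted from each edge weight are hypothetical choices made for illustration.

```python
from collections import defaultdict


def consensus_heaviest_path(reads, k=2, g=1, penalty=1):
    """Toy consensus: heaviest path in a position-specific k-mer graph.

    Assumptions (not taken from the record): reads have equal length and
    are pre-aligned; k-mers are sampled every g bases with 1 <= g <= k;
    reweighting subtracts a flat `penalty` from every edge count.
    Bases after the last sampled k-mer are not emitted.
    """
    if not reads:
        raise ValueError("need at least one read")
    if not 1 <= g <= k:
        raise ValueError("this sketch requires 1 <= g <= k")
    length = len(reads[0])
    if length < k or any(len(r) != length for r in reads):
        raise ValueError("reads must share one length >= k")

    positions = list(range(0, length - k + 1, g))

    # Edge weight = number of reads supporting the k-mer transition.
    weights = defaultdict(int)
    for read in reads:
        for p in positions[:-1]:
            u = (p, read[p:p + k])
            v = (p + g, read[p + g:p + g + k])
            weights[(u, v)] += 1

    # best[node] = (best score of any path ending at node, predecessor).
    best = {(0, read[:k]): (0, None) for read in reads}

    # Relax edges in order of source position (a topological order here).
    for (u, v), w in sorted(weights.items(), key=lambda e: e[0][0][0]):
        score = best[u][0] + (w - penalty)  # reweighted edge
        if v not in best or score > best[v][0]:
            best[v] = (score, u)

    # Pick the best-scoring node in the last column and trace back.
    last = positions[-1]
    node = max((n for n in best if n[0] == last), key=lambda n: best[n][0])
    path = []
    while node is not None:
        path.append(node)
        node = best[node][1]
    path.reverse()

    # Spell the path: first k-mer, then the g new bases of each later k-mer.
    return path[0][1] + "".join(kmer[-g:] for _, kmer in path[1:])


reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC"]
print(consensus_heaviest_path(reads, k=2, g=1, penalty=1))
```

Subtracting a penalty makes weakly supported edges contribute little or nothing to a path score, so the path favors transitions seen in several reads. Real long reads contain insertions and deletions, which this sketch does not handle.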