Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads


Title	Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads
Authors	Ye CX (1); Ma ZS (2)
Journal	PeerJ
Year	2016
Volume	4 Issue: X Pages: e2016
Keywords	Consensus algorithm; Genome assembly; Variant discovery; Single molecular sequencing; Third generation sequencing technology
Corresponding author	cxy@umd.edu ; ma@vandals.uidaho.edu
Collaboration status	Other
Abstract	Motivation. Third generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated to be in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences, which must be extracted from these long but erroneous 3GS sequences. Results. We describe Sparc, a versatile and efficient linear-complexity consensus algorithm that facilitates de novo genome assembly. Sparc builds a sparse k-mer graph from a collection of sequences covering a targeted genomic region. The consensus sequence is the heaviest path through a sparsity-induced reweighted graph; this path approximates the most likely genome sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments show that Sparc can efficiently provide high-quality consensus sequences with both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with an error rate below 0.5%. With the more challenging Oxford Nanopore data, Sparc can achieve a similar error rate when combined with NGS data. Compared with existing approaches, Sparc calculates the consensus with higher accuracy and uses approximately 80% less memory and time.
Indexing category	Other
Funding	The research received funding from the following sources: NSFC (Grant No: 61175071 & 71473243) and "Exceptional Scientists Program of Yunnan Province, China."
Language	English
Content type	Journal article
Source URL	[http://159.226.149.26:8080/handle/152453/10032]
Collection	Kunming Institute of Zoology_Computational Biology and Bioinformatics; Kunming Institute of Zoology_State Key Laboratory of Genetic Resources and Evolution
Author affiliations	1. Department of Computer Science, University of Maryland, College Park, MD, USA 2. Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
Recommended citation (GB/T 7714)	Ye CX, Ma ZS. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads[J]. PeerJ, 2016, 4(X): e2016.
APA	Ye CX, & Ma ZS. (2016). Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads. PeerJ, 4(X), e2016.
MLA	Ye CX, et al. "Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads." PeerJ 4.X (2016): e2016.
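
The abstract describes building a sparse k-mer graph and taking the heaviest path through a reweighted version of it as the consensus. The Python sketch below illustrates that general idea only; it is not the authors' implementation. It assumes, as simplifications not stated in the record, that all reads start at the same coordinate and contain substitution errors only (no insertions or deletions), that k-mers are sampled every `g` bases with `g <= k`, and that reweighting means subtracting a fixed `penalty` from every edge weight. All function names, parameters, and the example reads are hypothetical.

```python
from collections import defaultdict


def build_sparse_kmer_graph(reads, k=3, g=2):
    """Count read support for links between k-mers sampled every g bases.

    Nodes are (position, k-mer) pairs. Assumes gap-free reads that all
    start at position 0 (an illustrative simplification).
    """
    edges = defaultdict(int)
    for seq in reads:
        nodes = [(i, seq[i:i + k]) for i in range(0, len(seq) - k + 1, g)]
        for u, v in zip(nodes, nodes[1:]):
            edges[(u, v)] += 1
    return edges


def heaviest_path(edges, penalty=1):
    """Return the highest-scoring path after subtracting penalty from each edge."""
    out = defaultdict(list)
    nodes = set()
    for (u, v), weight in edges.items():
        out[u].append((v, weight - penalty))  # reweighting step (assumed form)
        nodes.update((u, v))
    # Every edge goes from position p to p + g, so sorting by position
    # yields a topological order of this acyclic graph.
    order = sorted(nodes)
    best = {n: 0 for n in order}  # a path may start at any node with score 0
    prev = {}
    for u in order:
        for v, weight in out[u]:
            if best[u] + weight > best[v]:
                best[v] = best[u] + weight
                prev[v] = u
    end = max(order, key=lambda n: best[n])
    path = [end]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return path[::-1]


def consensus(reads, k=3, g=2, penalty=1):
    """Spell the sequence along the heaviest path (requires g <= k)."""
    assert g <= k, "this sketch recovers bases from overlapping k-mers only"
    path = heaviest_path(build_sparse_kmer_graph(reads, k, g), penalty)
    return path[0][1] + "".join(kmer[-g:] for _, kmer in path[1:])


# Hypothetical toy reads: three error-free copies and two with one substitution.
toy_reads = ["ACGTTGCAA", "ACGTTGCAA", "ACGATGCAA", "ACGTTGCAA", "ACCTTGCAA"]
print(consensus(toy_reads))  # prints ACGTTGCAA
```

Because each node and edge is visited once, the path search in this sketch runs in time linear in the size of the graph, which is in the spirit of the linear complexity the abstract claims. Subtracting a penalty makes weakly supported edges contribute nothing or count against a path, so branches created by sequencing errors are less likely to be chosen.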