Improving performance of matrix multiplication and FFT on GPU

	Cui, Xiang ; Chen, Yifeng ; Mei, Hong
	2009
Abstract	In this paper we discuss our experience in improving the performance of two key algorithms using CUDA: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and the single-precision FFT. The former is computation-intensive, while the latter is memory-bandwidth- or communication-intensive. A peak performance of 393 Gflops is achieved on an NVIDIA GeForce GTX280 for the former, about 5% faster than the CUBLAS 2.0 library. Better FFT performance results are obtained for a range of dimensions. Some common principles are discussed for the design and implementation of many-core algorithms. © 2009 IEEE. Indexed by: EI.
Language	English
DOI	10.1109/ICPADS.2009.8
Content type	Other
Source URL	http://ir.pku.edu.cn/handle/20.500.11897/153569
Collection	School of Electronics Engineering and Computer Science
Recommended citation (GB/T 7714)	Cui, Xiang,Chen, Yifeng,Mei, Hong. Improving performance of matrix multiplication and FFT on GPU. 2009-01-01.
