Improving performance of matrix multiplication and FFT on GPU
Cui, Xiang ; Chen, Yifeng ; Mei, Hong
2009
Abstract: In this paper we discuss our experience in improving the performance of two key algorithms using CUDA: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and the single-precision FFT. The former is computation-intensive, while the latter is memory-bandwidth- or communication-intensive. For the former, a peak performance of 393 Gflops is achieved on an NVIDIA GeForce GTX280, about 5% faster than the CUBLAS 2.0 library. Better FFT performance results are obtained for a range of dimensions. Some common principles are discussed for the design and implementation of many-core algorithms. © 2009 IEEE. Indexed by: EI.
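Language: English
DOI: 10.1109/ICPADS.2009.8
Content type: Other
Source URL: [http://ir.pku.edu.cn/handle/20.500.11897/153569]
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院)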
Recommended citation
GB/T 7714
Cui, Xiang, Chen, Yifeng, Mei, Hong. Improving performance of matrix multiplication and FFT on GPU. 2009-01-01.