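Title: Establishment of GMD 2.0 – GPU Acceleration of PME and Other Improvements
Author: Shi Jing (石静)
Degree type: Master's
Defense date: 2012-05-31
Degree-granting institution: Graduate University of the Chinese Academy of Sciences
Advisor: Li Xiaoxia (李晓霞)
Keywords: molecular dynamics simulation (MD), PME, GMD, CUDA, GPU
Other title: Establishment of GMD 2.0 Version – GPU-enabled Implementation of PME and other Extensions
Degree major: Materials Engineering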
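Abstract (translated from Chinese): Molecular dynamics (MD) simulation is a widely used class of molecular simulation methods and provides an important route to simulating biological systems. Because MD is computationally intensive, the spatial and temporal scales it can currently reach still fall short of what real physical processes require. In recent years, the rapid development of the GPU as an accelerator for the CPU has opened new possibilities for increasing MD computing capability. Studying how to use GPUs to improve the performance of MD programs is therefore important for extending the spatial and temporal scales accessible to MD. Electrostatic effects are present in all aspects of biological phenomena, such as polypeptide chain folding, enzyme activity and protein self-assembly, and simulating them accurately is an important part of MD. Electrostatic interactions are long-range and are the most time-consuming part of an MD simulation. When they are computed with a cut-off, GPU parallelization achieves considerable speedup, but the cut-off introduces large errors. The Particle-Mesh-Ewald (PME) method is one of the recognized algorithms for treating electrostatic interactions accurately. Building on GMD 1.0, the GPU-accelerated MD program previously developed in the research group, this thesis presents a strategy for implementing the PME algorithm on the GPU with NVIDIA CUDA. While integrating PME into GMD, several improvements were made to GMD 1.0 to increase its generality, resulting in GMD 2.0. The main work includes: (1) The PME algorithm for accurate electrostatics was implemented in the CUDA development environment. The main difficulty in GPU programming is how to decompose the computation and map it onto the GPU sensibly, choose suitable memory types, and carefully balance data transfer against instruction throughput so as to reach the GPU's maximum performance. Different task-organization strategies were adopted for the three components of the PME electrostatic calculation, namely the real-space sum, the Fourier-space sum and the energy correction term, according to their respective characteristics. The Fourier-space part, which is the most complex, consists mainly of charge spreading, FFT, potential energy calculation and force calculation; for most of these stages, different GPU implementation strategies are proposed. Each stage of the Fourier-space calculation was tested on systems of different sizes, and speedups of several-fold to tens-fold were obtained in all cases. (2) PME was correctly embedded into GMD 1.0. On this basis, the generality of GMD 1.0 was significantly extended by modifying the method for resolving read-write conflicts in bonded interactions, adding the calculation of improper dihedrals and of Coulomb and van der Waals 1-4 interactions, and building neighbor lists in units of charge groups. The resulting GMD 2.0 currently provides basic support for the Dreiding II and Amber 03 force fields. (3) GMD 2.0 was validated on different test cases. Comparison of energy and temperature data with Gromacs shows that GMD 2.0 results are essentially consistent with the Gromacs 4.5.3 CPU version, and that its accuracy is better than that of the OpenMM 2.0 based Gromacs 4.5.3 GPU version. Overall performance was tested with the dhfr benchmark used by most MD software; compared with Gromacs 4.5.3, GMD 2.0 achieves a clear performance gain, with speedups of 3.93 over a single CPU core, 1.5 over 8 CPU cores and 1.87 over the GPU version.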
English abstract: Molecular dynamics (MD) is a basic method for molecular modelling that offers a computational approach to studying the behavior of biomolecules in atomic detail. Because MD is computationally intensive, however, such simulations are still too limited in size and timescale to reach the spatio-temporal scales of real-world physical processes. Owing to recent advances in hardware and software architecture, the graphics processing unit (GPU) has shown its potential to accelerate MD simulation. Maximizing the performance of GPU-enabled MD programs is therefore critical for enlarging the spatial and temporal scales of MD. Electrostatic effects play an important role in various biological processes, such as polypeptide chain folding, enzyme activity and protein self-assembly, so accurate simulation of electrostatic interactions is essential for MD. With a cut-off method, the calculation of electrostatic interactions can be accelerated considerably on the GPU, as for van der Waals interactions. However, the cut-off method is approximate, and more accurate algorithms are generally preferred for biological systems. Particle-Mesh-Ewald (PME) is such an algorithm for the accurate calculation of electrostatic interactions in MD. This thesis presents a GPU-enabled implementation of PME and its embedding into GMD, a GPU-based molecular dynamics program. On this basis, some improvements are made to establish GMD 2.0, which greatly extends the former GMD 1.0 version. The work can be summarized as follows. The challenges in GPU programming are how to distribute computing tasks and map them onto a suitable thread organization and memory hierarchy on the GPU, while balancing memory transfer against instruction throughput for best performance. Strategies were therefore designed and implemented separately for the direct space sum, the reciprocal space sum and the energy correction term in PME to improve overall performance.
The most challenging part is the reciprocal space sum, which can be divided into charge spreading, 3D FFT, the mesh sum for the energy, and the Coulomb force calculation. For each part, more than one GPU-based parallel strategy has been investigated, and speedups ranging from several-fold to dozens-fold are obtained in performance tests. The GPU-enabled PME code has been embedded into GMD. The GMD program is further extended by first modifying the method for resolving read-write conflicts in bonded interaction computation, then adding calculations of improper dihedral angle interactions and LJ and Coulomb 1-4 interactions, and finally building neighbor lists based on charge groups, which results in GMD 2.0. The energy and temperature data calculated by GMD 2.0 agree well with the Gromacs 4.5.3 CPU version, and agree with it better than the Gromacs 4.5.3 GPU version does. On dhfr, the de facto standard benchmark, GMD 2.0 obtains speedups of 3.93, 1.5 and 1.87 compared with the Gromacs 4.5.3 single-core CPU version, 8-core CPU version and GPU version (OpenMM 2.0 based), respectively.
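The abstract splits the electrostatic calculation into a direct space sum, a reciprocal space sum and an energy correction term. As a rough illustration of that three-way split only, the sketch below evaluates a plain Ewald sum in pure Python. It is not the thesis's GPU code and not PME proper: the reciprocal part is summed directly over wave vectors instead of using charge spreading and a 3D FFT. The function name `ewald_energy`, its parameters, the cubic box, and Gaussian units (Coulomb constant equal to 1) are all assumptions made for this example.

```python
import cmath
import math
from itertools import product


def ewald_energy(positions, charges, box, beta, n_img=1, m_max=10):
    """Plain Ewald energy of point charges in a cubic periodic box.

    Illustrative assumptions: Gaussian units, cubic box of edge `box`,
    neutral system, splitting parameter `beta`. Returns the three terms
    (real-space, reciprocal-space, self-correction) separately.
    """
    n = len(charges)
    volume = box ** 3

    # 1) Direct (real) space: erfc-screened pair sum over nearby periodic images.
    e_real = 0.0
    for shift in product(range(-n_img, n_img + 1), repeat=3):
        for i in range(n):
            for j in range(n):
                if i == j and shift == (0, 0, 0):
                    continue  # no self-pair in the home cell
                d = [positions[i][c] - positions[j][c] + shift[c] * box
                     for c in range(3)]
                r = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
                e_real += 0.5 * charges[i] * charges[j] * math.erfc(beta * r) / r

    # 2) Reciprocal space: direct sum over wave vectors k = 2*pi*m/box, m != 0.
    #    PME replaces this loop by charge spreading onto a mesh plus a 3D FFT.
    e_recip = 0.0
    for m in product(range(-m_max, m_max + 1), repeat=3):
        if m == (0, 0, 0):
            continue
        k = [2.0 * math.pi * mc / box for mc in m]
        k2 = k[0] ** 2 + k[1] ** 2 + k[2] ** 2
        # Structure factor S(k) = sum_j q_j * exp(i k . r_j)
        s = sum(q * cmath.exp(1j * (k[0] * p[0] + k[1] * p[1] + k[2] * p[2]))
                for q, p in zip(charges, positions))
        e_recip += math.exp(-k2 / (4.0 * beta ** 2)) / k2 * abs(s) ** 2
    e_recip *= 2.0 * math.pi / volume

    # 3) Correction: remove each Gaussian cloud's interaction with its own charge.
    e_self = -beta / math.sqrt(math.pi) * sum(q * q for q in charges)

    return e_real, e_recip, e_self


# Test system (an assumption for this example): the 8-ion NaCl unit cell
# with unit box edge, so the nearest-neighbour distance is 0.5.
NACL_POS = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5),
            (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5), (0.5, 0.5, 0.5)]
NACL_Q = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]


def total_energy(beta):
    """Sum of the three Ewald terms for the NaCl test cell."""
    return sum(ewald_energy(NACL_POS, NACL_Q, 1.0, beta))
```

Two checks are possible on this sketch: the total should not depend on the splitting parameter once both sums have converged (for example `total_energy(5.0)` against `total_energy(6.0)`), and for this cell it should match the value implied by the NaCl Madelung constant of about 1.7476. The splitting parameter only moves work between the real-space and reciprocal-space terms, which is why the two parts can be organized and tuned separately.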
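Language: Chinese
Publication date: 2013-09-25
Content type: Thesis
Source URL: [http://ir.ipe.ac.cn/handle/122111/1808]
Collection: Institute of Process Engineering_Institute (batch import)
Recommended citation
GB/T 7714
Shi Jing. Establishment of GMD 2.0 – GPU Acceleration of PME and Other Improvements[D]. Graduate University of the Chinese Academy of Sciences. 2012.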