Title: Research and Application of Accelerating Improved PAM Clustering Algorithm by GPU (original title: GPU加速的改进PAM聚类算法研究与应用)
Authors: Zhou Enbo (周恩波); Mao Shanjun (毛善君); Li Mei (李梅); Sun Zhenming (孙振明)
Journal: Journal of Geo-information Science (地球信息科学学报)
Year: 2017
Keywords: K-Medoids; GPU; simulated annealing; parallel computing; spatial clustering analysis
DOI: 10.3969/j.issn.1560-8999.2017.06.007
Funding: National Key Research and Development Program of China, Key Special Project (国家重点研发计划重点专项)
Indexed in: Chinese Science Citation Database (CSCD)
Volume: 19
Issue: 6
Pages: 782-791
Abstract: Spatial clustering is one of the most important methods in spatial data mining. K-Medoids, a common and powerful spatial clustering algorithm, is applied in many fields, such as the generalization of spatial entity information, spatial point pattern analysis, and epidemiology. However, K-Medoids faces two inherent challenges. First, it is sensitive to the choice of initial medoids: different initial medoids may produce different clustering results, and some of those results are not optimal. Second, its time efficiency is unsatisfactory, because finding the most suitable partition takes a large number of iterations. Existing studies of K-Medoids do not address validity and time efficiency at the same time. Optimization methods such as the genetic algorithm improve the validity of K-Medoids, but their time efficiency is not acceptable as data volumes grow. The MapReduce model can handle high-volume data, but it does not suit settings where no computer cluster is available. To improve both the validity of the results and the time efficiency, this paper revises the traditional K-Medoids algorithm Partitioning Around Medoids (PAM) with ideas from the simulated annealing algorithm (SAA) and proposes a parallel Simulated Annealing Partitioning Around Medoids (SAPAM) algorithm that is implemented efficiently on Graphics Processing Units (GPUs).
The SAA is used to search for the initial medoids, which supports the validity of the algorithm. The stochastic factor introduced by the SAA makes it possible to escape local optima and reach the globally optimal clustering result of PAM. To accelerate clustering, the parallel SAPAM algorithm is designed to use a large number of GPU threads that execute simultaneously. By analogy with matrix multiplication, a new matrix computation method is defined to reduce the time spent transferring data between the GPU's global memory and shared memory. The method reuses data held in shared memory and computes the distances between the medoids and many points at once, which markedly improves the algorithm's performance. To validate the proposed algorithm, eight datasets with different attributes and sizes were generated randomly, and experiments on them compared the parallel SAPAM algorithm with the traditional PAM algorithm, the sequential SAPAM algorithm, and the parallel genetic K-Medoids algorithm. The results showed that SAPAM attained more accurate clustering results than the traditional PAM algorithm and the parallel genetic K-Medoids algorithm. The proposed algorithm also outperformed the sequential SAPAM algorithm and the parallel genetic K-Medoids algorithm in time efficiency, and the GPU-based SAPAM algorithm was four to eight times faster than the traditional PAM algorithm. These results indicate that the proposed method executes efficiently and attains valid results. Finally, SAPAM was applied to the safety supervision hazard data of the work-safety supervision system of Guizhou Province to find the clustering pattern of safety hazards. The analysis revealed several hazard clusters and their centers, which may offer practical value for government decision-making.
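The abstract describes the idea of combining PAM's medoid swaps with simulated annealing but does not give the procedure itself. The following is a minimal sequential sketch of that general idea, not the authors' GPU implementation. It assumes Euclidean distance, a geometric cooling schedule, and a Metropolis acceptance rule, and it applies annealing directly to the swap search; none of these details are stated in the abstract. The names sa_pam, total_cost, t_start, t_end, cooling, and swaps_per_temp are hypothetical.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def sa_pam(points, k, t_start=1.0, t_end=1e-3, cooling=0.95,
           swaps_per_temp=20, seed=0):
    """Illustrative simulated-annealing PAM (assumed schedule and parameters).

    Requires k < len(points) so that a non-medoid is always available.
    Returns (sorted medoid indices, total cost) of the best state visited.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    cost = total_cost(points, medoids)
    best, best_cost = list(medoids), cost
    t = t_start
    while t > t_end:
        for _ in range(swaps_per_temp):
            # PAM-style move: replace one medoid with one non-medoid.
            out_pos = rng.randrange(k)
            candidate = rng.choice(
                [i for i in range(len(points)) if i not in medoids])
            trial = list(medoids)
            trial[out_pos] = candidate
            trial_cost = total_cost(points, trial)
            delta = trial_cost - cost
            # Metropolis rule: always accept improvements; accept a worse
            # swap with probability exp(-delta / t) to escape local optima.
            if delta < 0 or rng.random() < math.exp(-delta / t):
                medoids, cost = trial, trial_cost
                if cost < best_cost:
                    best, best_cost = list(medoids), cost
        t *= cooling  # geometric cooling (an assumption of this sketch)
    return sorted(best), best_cost


# Usage: three well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
medoids, cost = sa_pam(points, k=3)
print(medoids, cost)
```

Each call to total_cost recomputes every point-to-medoid distance, which is the step the abstract says the parallel version speeds up by computing many distances at once on the GPU.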
Language: English
Content type: Journal article
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/465849
Collection: School of Earth and Space Sciences, Peking University
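Recommended citation
GB/T 7714: Zhou Enbo, Mao Shanjun, Li Mei, et al. Research and Application of Accelerating Improved PAM Clustering Algorithm by GPU[J]. Journal of Geo-information Science, 2017.
APA: Zhou, E., Mao, S., Li, M., & Sun, Z. (2017). Research and application of accelerating improved PAM clustering algorithm by GPU. Journal of Geo-information Science.
MLA: Zhou, Enbo, et al. "Research and Application of Accelerating Improved PAM Clustering Algorithm by GPU." Journal of Geo-information Science (2017).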