Mining maximal correlated member clusters in high dimensional database | |
Jiang, LZ ; Yang, DQ ; Tang, SW ; Ma, XL ; Zhang, DH | |
2006 | |
Abstract | Mining high-dimensional data is an urgent problem of great practical importance. Although data mining models such as frequent patterns and clusters have proven very successful for analyzing very large data sets, they have limitations. Frequent patterns are inadequate for describing the quantitative correlations among nominal members. Traditional cluster models ignore the distances between some pairs of members, so two members of one large cluster may be far apart. As a combination of, and complement to, both techniques, we propose the Maximal-Correlated-Member-Cluster (MCMC) model in this paper. The MCMC model is based on a statistical measure reflecting the relationship between nominal variables, and every pair of members in a cluster satisfies unified constraints. Moreover, to improve the algorithm's efficiency, we introduce pruning techniques to reduce the search space. In the first phase, a Tri-correlation inequation is used to eliminate unrelated member pairs, and in the second phase, an Inverse-Order-Enumeration-Tree (IOET) method is designed to share common computations. Experiments on both synthetic and real-life datasets are performed to examine our algorithm's performance. The results show that our algorithm is much more efficient than the naive algorithm, and that the model can discover meaningful correlated patterns in high-dimensional databases.; Computer Science, Artificial Intelligence; Computer Science, Information Systems; SCI(E); CPCI-S(ISTP); 1 |
Language | English |
Content type | Other |
Source URL | [http://ir.pku.edu.cn/handle/20.500.11897/292204] |
Collection | School of Information Science and Technology |
Recommended citation (GB/T 7714) | Jiang, LZ,Yang, DQ,Tang, SW,et al. Mining maximal correlated member clusters in high dimensional database. 2006-01-01. |
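
The abstract's central constraint, that every pair of members in a cluster must satisfy the same condition and that clusters be maximal, can be illustrated with a small sketch. This is not the paper's algorithm: the record does not specify the statistical measure, the Tri-correlation inequation, or the IOET method, so none of them is reproduced here. The sketch assumes a hypothetical pairwise `score` function and `threshold`, and swaps in plain Bron-Kerbosch maximal-clique enumeration, which is roughly the kind of naive baseline the pruning techniques are meant to improve on.

```python
from itertools import combinations


def maximal_pairwise_clusters(members, score, threshold):
    """Return all maximal sets in which every pair scores >= threshold.

    `score` and `threshold` are hypothetical stand-ins for the paper's
    correlation measure and constraint, which this record does not define.
    """
    # Keep only the pairs that satisfy the assumed pairwise constraint.
    adj = {m: set() for m in members}
    for a, b in combinations(members, 2):
        if score(a, b) >= threshold:
            adj[a].add(b)
            adj[b].add(a)

    clusters = []

    def expand(current, candidates, excluded):
        # Bron-Kerbosch without pivoting: `current` is a clique, `candidates`
        # can still extend it, `excluded` were already tried.
        if not candidates and not excluded:
            if current:
                clusters.append(frozenset(current))
            return
        for m in list(candidates):
            expand(current | {m}, candidates & adj[m], excluded & adj[m])
            candidates = candidates - {m}
            excluded = excluded | {m}

    expand(set(), set(members), set())
    return clusters


# Made-up pairwise scores, for illustration only.
pair_scores = {
    frozenset("ab"): 0.9,
    frozenset("ac"): 0.8,
    frozenset("bc"): 0.85,
    frozenset("cd"): 0.7,
}


def demo_score(x, y):
    # Pairs that are not listed are treated as uncorrelated.
    return pair_scores.get(frozenset((x, y)), 0.0)


demo_clusters = maximal_pairwise_clusters(list("abcd"), demo_score, 0.6)
```

With these made-up scores, `d` correlates only with `c`, so `{a, b, c}` and `{c, d}` come out as separate maximal clusters rather than being merged into one large cluster containing the distant pair `a` and `d`.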