• Article •    

Parallel frequent pattern growth algorithm optimization in cloud manufacturing environment

WANG Jie, DAI Qing-hao, ZENG Yu, YANG Dong-ri

  1. School of Management, Capital Normal University, Beijing 100089, China; 2. Beijing Computing Center, Beijing 100094, China; 3. College of Computing and Communication Engineering, Graduate University of the Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2012-09-15 Published: 2012-09-25

云制造环境下并行频繁模式增长算法优化

王洁, 戴清灏, 曾宇, 杨东日

  1. 首都师范大学 管理学院, 北京 100089; 2. 北京市计算中心, 北京 100094; 3. 中国科学院研究生院 计算与通信工程学院, 北京 100049

Abstract: To support massive data mining tasks in a cloud manufacturing environment, the implementation of the existing parallel frequent pattern growth algorithm and its shortcomings were analyzed, and its counting and grouping steps were optimized using a key-value storage system. Exploiting the simple, auto-incrementing, and ordered storage model of a key-value database, the counting and grouping information was stored in the key-value database. By reducing reads and writes to the Distributed File System (DFS) and executing the counting and grouping processes in parallel, the optimized algorithm lowered the network and memory overhead of the storage nodes. Experiments on real datasets compared the performance and file system I/O cost of the algorithm before and after optimization.

Key words: cloud manufacturing, parallel frequent pattern growth algorithm, key-value storage system, data mining, algorithm optimization
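
The counting and grouping steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the abstract does not name the key-value store, its key schema, or the grouping rule, so `InMemoryKV`, the `count:` and `group:` key prefixes, and the round-robin group assignment are hypothetical stand-ins rather than the authors' implementation. The idea shown is that workers increment per-item counters directly in the store instead of writing intermediate files to the DFS, and that group assignments are then stored alongside the counts.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import threading


class InMemoryKV:
    """Hypothetical stand-in for a key-value store with atomic increments."""

    def __init__(self):
        self._data = defaultdict(int)
        self._lock = threading.Lock()

    def incr(self, key, amount=1):
        # Atomic counter update, as many key-value stores provide.
        with self._lock:
            self._data[key] += amount
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def scan(self, prefix):
        # Ordered scan over all keys that share a prefix.
        with self._lock:
            return sorted(
                (k, v) for k, v in self._data.items() if k.startswith(prefix)
            )


def count_shard(kv, shard):
    """Counting step: one worker adds its shard's item supports to the store."""
    for transaction in shard:
        for item in set(transaction):  # count each item once per transaction
            kv.incr("count:" + item)


def build_groups(kv, min_support, num_groups):
    """Grouping step: rank frequent items and store a group id for each."""
    prefix = "count:"
    frequent = [
        (key[len(prefix):], count)
        for key, count in kv.scan(prefix)
        if count >= min_support
    ]
    # Descending support, ties broken by item name (assumed ordering).
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    for rank, (item, _) in enumerate(frequent):
        kv.put("group:" + item, rank % num_groups)  # assumed round-robin rule
    return [item for item, _ in frequent]


def run_demo():
    # Two shards of a toy transaction database, counted by two workers.
    shards = [
        [["a", "b", "c"], ["a", "b"]],
        [["a", "c"], ["b", "d"]],
    ]
    kv = InMemoryKV()
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(lambda shard: count_shard(kv, shard), shards))
    frequent_items = build_groups(kv, min_support=2, num_groups=2)
    return kv, frequent_items
```

Calling `run_demo()` returns the store and the list of frequent items ordered by support; the later frequent pattern growth step would read the `group:` entries to route each transaction's items to the worker that owns the group.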

摘要: 针对云制造环境下的海量数据挖掘,分析了现有并行频繁模式增长算法的实现和不足。研究了利用键值存储系统对其中的计数和分组部分进行优化。利用键值型数据库存储简单、自动增长且有序的方式,将计数和分组的信息存储在了键值型数据库上。通过减少对分布式文件系统的读写,并将计数过程和排序过程并行化执行,优化后的算法减小了存储节点的网络及内存开销。在真实数据集上,通过实验对比了优化前后算法的性能以及对于文件系统I/O的开销。

关键词: 云制造, 并行频繁模式增长算法, 键值存储系统, 数据挖掘, 算法优化
