• Paper •

Regression-based exception-finding algorithm in multi-level data cubes

HU Kong-fa, DING You-wei, CHEN Ling, SONG Ai-bo

  1. College of Information Engineering, Yangzhou University, Yangzhou 225009, China; 2. School of Computer Science & Engineering, Southeast University, Nanjing 210096, China
  • Online: 2009-12-15 Published: 2009-12-25

Abstract: To mine the data in a data cube quickly and effectively, two exception-finding methods based on regression analysis were proposed: threshold exception and interval exception. Using the regression coefficients, they help users quickly locate exceptional data within data cells. The threshold-exception method identifies a data point as an exception by comparing its normalized residual with a user-specified deviation threshold, whereas the interval-exception method compares the absolute value of the point's residual with a confidence interval. Finally, the performance of both algorithms was analyzed, and theoretical analysis and experimental results verified their validity and efficiency.

Key words: data mining, data cube, regression analysis, exception finding, threshold, confidence interval
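The abstract names the two tests without giving their formulas. The Python sketch below shows one plausible reading under stated assumptions: a simple linear regression of a cell's measure against one ordered dimension (for example, time); the normalized residual taken as e_i / s, where s is the residual standard error; and the confidence interval taken as a (1 − α) prediction interval at each x_i. All function and parameter names (`fit_line`, `threshold_exceptions`, `interval_exceptions`, `tau`, `alpha`, `crit`) are hypothetical, and the default critical value uses a normal approximation rather than the Student-t quantile.

```python
import math
from statistics import NormalDist

# Hypothetical sketch: assumes simple linear regression y = a + b*x over one
# ordered dimension; not the authors' published implementation.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residual_stats(xs, ys):
    """Return the residuals e_i and the residual standard error s."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate s")
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # s = sqrt(SSE / (n - 2)): two coefficients (a, b) were estimated.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return residuals, s


def threshold_exceptions(xs, ys, tau):
    """Threshold exception (assumed form): flag index i when |e_i / s| > tau."""
    residuals, s = residual_stats(xs, ys)
    if s == 0:
        return []  # perfect fit: no point deviates
    return [i for i, e in enumerate(residuals) if abs(e / s) > tau]


def interval_exceptions(xs, ys, alpha=0.05, crit=None):
    """Interval exception (assumed form): flag index i when |e_i| exceeds the
    half-width of a (1 - alpha) prediction interval at x_i."""
    residuals, s = residual_stats(xs, ys)
    if s == 0:
        return []
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if crit is None:
        # Large-sample normal approximation; for small n pass the
        # Student-t quantile with n - 2 degrees of freedom instead.
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    flagged = []
    for i, (x, e) in enumerate(zip(xs, residuals)):
        half_width = crit * s * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / sxx)
        if abs(e) > half_width:
            flagged.append(i)
    return flagged


# Example: y = 2x except for a spike at x = 5 (index 4).
xs = list(range(1, 11))
ys = [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]
print(threshold_exceptions(xs, ys, tau=2.0))  # [4]
print(interval_exceptions(xs, ys))            # [4]
```

In this example the spike's normalized residual is about 2.68, so it is flagged at τ = 2 but not at τ = 3: the outlier also inflates s, which is why the choice of threshold matters. Passing `crit=2.306` (the two-sided 95% Student-t quantile for 8 degrees of freedom) still flags only index 4.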
