"Enhancing Outlier Detection for Uncertain Data by Improved iForest and KNN Optimization"

Abstract

As data acquisition and processing technologies advance, uncertain data has become common in fields such as finance, military applications, logistics, and telecommunications. Traditional data management methods are not designed to handle uncertain data effectively, so uncertain data management has become a growing focus of data mining research. Within this field, outlier detection is particularly important because it identifies data points that deviate from the norm, with key applications in network intrusion detection and sensor network monitoring. Although outlier detection for deterministic data has seen significant progress, uncertain data presents unique challenges. In this study, we propose a new outlier detection method for attribute-level uncertain data based on the possible world model. First, we adapt the anomaly score calculation of Isolation Forest (iForest) so that it applies to uncertain data. Next, we redefine the concept of a local outlier in the context of uncertain data. To improve efficiency, we combine iForest with k-nearest-neighbour (KNN) query optimization to reduce the candidate set without expanding the possible worlds. Experimental results show that the proposed algorithm significantly improves detection accuracy and reduces time complexity, improving overall outlier detection performance on uncertain data.
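The abstract does not give the improved score or the redefined local outlier, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each uncertain object is a list of (instance, probability) pairs, builds each isolation tree on one sampled possible world, and takes a probability-weighted average of the standard iForest score 2^(-E[h(x)]/c(psi)). It then keeps the top-scoring objects as candidates and refines them with a k-nearest-neighbour distance on expected positions. All function names and parameters (`build_forest`, `expected_score`, `top_candidates`, `knn_distance`, `psi`, `m`) are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Normalising term of the standard iForest score: average path length
    of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def sample_world(objects, rng):
    """Draw one possible world: one instance per uncertain object."""
    world = []
    for instances in objects:
        points = [p for p, _ in instances]
        weights = [w for _, w in instances]
        world.append(rng.choices(points, weights=weights, k=1)[0])
    return world

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x):
    """Depth at which x lands, plus c(size) for the unsplit remainder."""
    depth = 0
    while "split" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c(tree["size"])

def build_forest(objects, n_trees=100, psi=64, seed=0):
    """Assumption: one sampled possible world per tree; needs >= 2 objects."""
    rng = random.Random(seed)
    psi = min(psi, len(objects))
    max_depth = math.ceil(math.log2(psi))
    forest = []
    for _ in range(n_trees):
        world = sample_world(objects, rng)
        forest.append(build_tree(rng.sample(world, psi), 0, max_depth, rng))
    return forest, psi

def expected_score(forest, psi, instances):
    """Probability-weighted iForest score over an object's instances
    (an illustrative choice, not the paper's improved score)."""
    total = 0.0
    for point, prob in instances:
        mean_path = sum(path_length(t, point) for t in forest) / len(forest)
        total += prob * 2.0 ** (-mean_path / c(psi))
    return total

def top_candidates(objects, forest, psi, m):
    """Keep the m highest-scoring objects as the reduced candidate set."""
    scores = [expected_score(forest, psi, inst) for inst in objects]
    ranked = sorted(range(len(objects)), key=lambda i: scores[i], reverse=True)
    return ranked[:m], scores

def expected_point(instances):
    dims = len(instances[0][0])
    return tuple(sum(p[d] * w for p, w in instances) for d in range(dims))

def knn_distance(objects, i, k):
    """Distance from object i to its k-th nearest neighbour, measured
    between expected positions (a simplification of uncertain KNN)."""
    centre = expected_point(objects[i])
    dists = sorted(math.dist(centre, expected_point(o))
                   for j, o in enumerate(objects) if j != i)
    return dists[k - 1]
```

In this sketch the forest pass is cheap and touches every object, while the more expensive neighbour computation runs only on the `m` candidates returned by `top_candidates`, which mirrors the candidate-set reduction the abstract describes.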


How to Cite

Vishwash Singh (2025-02-17). "Enhancing Outlier Detection for Uncertain Data by Improved iForest and KNN Optimization". Abhi International Journal of Computer Science and Engineering, Issue 1.