
Mitigating concept drift in data streams: an incremental decision tree approach

  • Application of soft computing
  • Published:
Soft Computing

Abstract

Recognizing the central role of data in machine learning, we address the challenge of concept drift in dynamic data streams, which is vital to maintaining the quality and accuracy of predictive models. We propose an incremental decision tree algorithm for learning regression trees and model trees from time-varying data streams. The algorithm is designed to operate at high speed and to accommodate data arriving at any scale, including potentially unbounded streams. Its key innovations include a probabilistically defined sampling strategy that improves the learning process and an automatic method for handling non-stationary data distributions. The primary innovation, however, is the method for detecting concept drift. Unlike conventional methods, which react only after errors have increased, we take a proactive approach: the quality of individual subtrees is monitored by tracking how their error evolves. This allows changes in the objective function to be detected promptly and the model structure to be adapted in time. Extensive experiments demonstrate the effectiveness of the proposed algorithm in terms of prediction accuracy, model size, and change detection. The algorithm represents a significant advance in handling concept drift in data streams and offers a competitive alternative to existing stream classifiers.
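The abstract does not say which statistic is used to track subtree error. As a minimal sketch, assuming the Page-Hinkley test (the detector that FIMT-DD-style regression trees run at each node on the absolute prediction error), per-subtree monitoring could look like this. The class name, the `delta` and `lam` values, and the synthetic error stream are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of an error signal.

    Assumption: this stands in for the paper's (unstated) subtree error
    monitor. delta is the tolerated change, lam the alarm threshold.
    """
    delta: float = 0.005
    lam: float = 50.0
    n: int = 0
    mean: float = 0.0
    cum: float = 0.0      # m_T = sum over t of (x_t - mean_t - delta)
    cum_min: float = 0.0  # M_T = smallest m_t seen so far

    def update(self, error: float) -> bool:
        """Add one absolute error; return True when drift is signalled."""
        self.n += 1
        self.mean += (error - self.mean) / self.n  # running mean of errors
        self.cum += error - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.lam


# Hypothetical usage: a tree would keep one detector per internal node and
# feed it |y - y_hat| for every example routed through that node's subtree.
left_subtree_monitor = PageHinkley()

# Synthetic errors for one subtree: stable at 0.1, then an abrupt jump to 1.0.
errors = [0.1] * 200 + [1.0] * 200
alarm_at = None
for t, err in enumerate(errors):
    if left_subtree_monitor.update(err):
        alarm_at = t  # a learner would now adapt or replace this subtree
        break
print("drift signalled at example", alarm_at)
```

Because the test accumulates only deviations above the running mean, the constant error stream above keeps the statistic at zero, while the sustained rise crosses `lam` after several dozen examples. A larger `lam` trades longer detection delay for fewer false alarms.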
The algorithm shows superior performance in terms of precision, recall, F-measure, and scalability, which underscores its potential to improve decision-making across various domains by adapting quickly to changing data patterns while maintaining high accuracy. Its incremental learning of decision rules, combined with an adaptive extension for handling concept drift, makes it promising for real-world applications where accurate and timely insights are essential. Overall, the algorithm's robustness, adaptability, and efficiency make it a valuable asset for stream data classification and decision support systems.
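Learning split rules incrementally from a potentially unbounded stream means a leaf cannot store the examples it has seen. A common approach for regression trees, assumed here rather than taken from the paper, keeps constant-memory running statistics (count, mean, and Welford's sum of squared deviations) for each candidate branch and scores a split by its standard deviation reduction (SDR). `RunningStats`, `sdr`, and the four-example stream are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-memory statistics of the target values seen at one node."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean (Welford)

    def add(self, y: float) -> None:
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)

    def std(self) -> float:
        # Population standard deviation; 0.0 for an empty node.
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


def sdr(parent: RunningStats, left: RunningStats, right: RunningStats) -> float:
    """Standard deviation reduction of splitting parent into left and right."""
    if parent.n == 0:
        return 0.0
    return (parent.std()
            - left.n / parent.n * left.std()
            - right.n / parent.n * right.std())


# Hypothetical stream of (feature, target) pairs and candidate split x <= 0.5.
stream = [(0.1, 1.0), (0.2, 1.2), (0.8, 5.0), (0.9, 5.2)]
parent, left, right = RunningStats(), RunningStats(), RunningStats()
for x, y in stream:
    parent.add(y)
    (left if x <= 0.5 else right).add(y)  # each example is used once, then discarded

score = sdr(parent, left, right)
print(f"SDR of split x <= 0.5: {score:.4f}")
```

Memory per candidate split stays constant however long the stream runs. Hoeffding-style learners typically commit to a split only once the best candidate's advantage over the runner-up exceeds a confidence bound; that step is omitted here.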


Figs. 1–14


Data availability

The datasets generated and/or analyzed during the current study are not publicly available due to [REASON(S) WHY DATA ARE NOT PUBLIC] but are available from the corresponding author on reasonable request.


Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by H. Tarazodar, K. Bagherifard, and S. Nejatian. The first draft of the manuscript was written by H. Tarazodar, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Karamollah Bagherifard.

Ethics declarations

Conflict of interests

The authors declare they have no relevant financial interests.

Non-financial interests

The authors have no relevant non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tarazodar, H., Bagherifard, K., Nejatian, S. et al. Mitigating concept drift in data streams: an incremental decision tree approach. Soft Comput (2024). https://doi.org/10.1007/s00500-024-09921-7


  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00500-024-09921-7

Keywords