Abstract
Concept drift is a central challenge when learning from dynamic data streams, and addressing it is vital to the quality and accuracy of predictive models. We propose an incremental decision tree algorithm for learning regression trees and model trees from time-varying data streams. The algorithm is designed to operate at high speed and to accommodate data arriving at any scale, including potentially unbounded streams. Its contributions include a probabilistically defined sampling strategy that enhances the learning process and an automatic method for handling non-stationary data distributions. The primary innovation, however, is the method for detecting concept drift. Unlike conventional methods that react to an increase in error, our approach is proactive: it monitors the quality of individual subtrees by tracking the evolution of their error. This allows changes in the objective function to be detected promptly, leading to timely adaptations of the model structure. Through extensive experimentation, we evaluate the proposed algorithm in terms of prediction accuracy, model size, and change detection capability. The results indicate that it is a competitive alternative to existing stream classifiers.
The algorithm shows superior performance in precision, recall, F-measure, and scalability, which underscores its potential to support decision-making across domains by adapting swiftly to changing data patterns while maintaining high accuracy. Its incremental learning of decision rules, coupled with an adaptive extension for handling concept drift, holds promise for real-world applications where accurate and timely insights are paramount. Overall, the algorithm's robustness, adaptability, and efficiency make it a valuable tool for stream data classification and decision support systems.
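The abstract does not state which statistic is used to track the error of each subtree. Purely as an illustration, the sketch below uses the Page-Hinkley test, a change detector commonly applied to error streams, and assumes that each internal node keeps one detector fed with the absolute prediction errors of the examples passing through it. The names `PageHinkley`, `first_alarm`, `delta`, and `threshold`, and the default values, are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a value stream.

    Assumption: one instance would be attached to each subtree and fed the
    absolute error of every example routed through that subtree.
    """
    delta: float = 0.005     # tolerated change magnitude (illustrative value)
    threshold: float = 5.0   # alarm threshold (illustrative value)
    n: int = 0               # number of values seen so far
    mean: float = 0.0        # running mean of the values
    cum: float = 0.0         # cumulative deviation from the running mean
    cum_min: float = 0.0     # smallest cumulative deviation seen so far

    def update(self, value: float) -> bool:
        """Consume one value; return True if a change is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(errors: Iterable[float], detector: PageHinkley) -> Optional[int]:
    """Return the index of the first value that triggers an alarm, or None."""
    for i, err in enumerate(errors):
        if detector.update(err):
            return i
    return None


if __name__ == "__main__":
    # Synthetic error stream: small errors, then a sustained jump.
    stream = [0.1] * 200 + [2.0] * 200
    print("first alarm at index:", first_alarm(stream, PageHinkley()))
```

Under this assumption, an alarm raised deep in the tree would localize the drift to the affected region of the input space, so only that subtree would need to be adapted rather than the whole model.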
Data availability
The datasets generated and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by H. Tarazodar, K. Bagherifard, and S. Nejatian. The first draft of the manuscript was written by H. Tarazodar, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interests
The authors declare they have no relevant financial interests.
Non-financial interests
The authors have no relevant non-financial interests to disclose.
About this article
Cite this article
Tarazodar, H., Bagherifard, K., Nejatian, S. et al. Mitigating concept drift in data streams: an incremental decision tree approach. Soft Comput (2024). https://doi.org/10.1007/s00500-024-09921-7
DOI: https://doi.org/10.1007/s00500-024-09921-7