Context-based abrupt change detection and adaptation for categorical data streams

From National Research Council Canada

DOI	https://doi.org/10.1007/978-3-319-67786-6_1
Author	D’ettorre, Sarah; Viktor, Herna L.; Paquet, Eric¹
Affiliation	National Research Council of Canada. Digital Technologies
Format	Text, Book Chapter
Subject	data streams; categorical data; concept drift; context-based change detection; unsupervised learning; ensembles; online learning
Abstract	The identification of changes in data distributions associated with data streams is critical in understanding the mechanics of data generating processes and ensuring that data models remain representative through time. To this end, concept drift detection methods often utilize statistical techniques that take numerical data as input. However, many applications produce data streams containing categorical attributes, where numerical statistical methods are not applicable. In this setting, common solutions use error monitoring, assuming that fluctuations in the error measures of a learning system correspond to concept drift. Context-based concept drift detection techniques for categorical streams, which observe changes in the actual data distribution, have received limited attention. Such context-based change detection is arguably more informative as it is data-driven and directly applicable in an unsupervised setting. This paper introduces a novel context-based algorithm for categorical data, namely FG-CDCStream. In this unsupervised method, multiple drift detection tracks are maintained and their votes are combined in order to determine whether a real change has occurred. In this way, change detections are rapid and accurate, while the number of false alarms remains low. Our experimental evaluation against synthetic data streams shows that FG-CDCStream outperforms the state of the art. Our analysis further indicates that FG-CDCStream produces highly accurate and representative post-change models.
Publication date	2017-09-16
Publisher	Springer
In	Discovery Science (16 September 2017): 3–17.
Series	Lecture Notes in Computer Science, no. 10558.
Language	English
Peer reviewed	Yes
NPARC number	23002678
Record identifier	726339b4-e95e-4473-9f1e-fb8671ed4b0e
Record created	2017-12-20
Record modified	2020-06-18

Date modified: 2025-04-04