Outlier Detection vs Data Drift Detection vs Concept Drift
Created: 24 Apr 2023, 11:27 AM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, GeneralDL
Or: Out-of-Distribution Detection vs Data Drift
Data drift detection helps detect when the overall distribution of the input data has changed. We design this test to be robust to outliers so that it alerts only to meaningful shifts. We would typically react to drift by retraining or updating the model.
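A minimal sketch of such a batch-level check, assuming a single numeric feature and the two-sample Kolmogorov-Smirnov (KS) statistic as the test (the note does not prescribe a specific test). Because KS compares whole empirical distributions, a few extreme values in the new batch barely move it. The function names and the 0.2 threshold are hypothetical; in practice the threshold is tuned or derived from a significance level.

```python
import bisect
import random


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # The CDFs only change at observed values, so the maximum gap is found there.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def detect_data_drift(reference, current, threshold=0.2):
    """Flag drift when the batch-level distributions differ by more than
    `threshold` (a hypothetical cut-off; tune it for your data)."""
    return ks_statistic(reference, current) > threshold


random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(1000)]
# Same distribution plus five extreme outliers: should NOT count as drift.
with_outliers = [random.gauss(0.0, 1.0) for _ in range(995)] + [50.0] * 5
# The whole distribution has moved: should count as drift.
shifted = [random.gauss(1.0, 1.0) for _ in range(1000)]

print(detect_data_drift(reference, with_outliers))  # False
print(detect_data_drift(reference, shifted))        # True
```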
Outlier detection helps detect individual unusual data inputs. We design this test to be sensitive enough to catch a single deviating input. We would typically react to outliers by applying some business logic or by processing the individual object manually to reach a decision.
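A minimal sketch of a per-input check, assuming a single numeric feature and the modified z-score (based on the median and the median absolute deviation, MAD) as the outlier score; this is one common choice, not the only one. The class name is hypothetical, and 3.5 is a widely used rule-of-thumb cut-off.

```python
import statistics


class RobustOutlierDetector:
    """Scores each incoming value individually with the modified z-score,
    computed from the median and the median absolute deviation (MAD)."""

    def __init__(self, reference, threshold=3.5):
        self.median = statistics.median(reference)
        self.mad = statistics.median(abs(x - self.median) for x in reference)
        self.threshold = threshold  # 3.5 is a common rule of thumb

    def score(self, x):
        if self.mad == 0:
            # Degenerate reference: anything different from the median is unusual.
            return 0.0 if x == self.median else float("inf")
        return 0.6745 * (x - self.median) / self.mad

    def is_outlier(self, x):
        return abs(self.score(x)) > self.threshold


reference = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1, 10.0]
detector = RobustOutlierDetector(reference)

for value in [10.05, 9.6, 14.0]:
    # Each value is judged on its own; one deviating input is enough to flag.
    print(value, detector.is_outlier(value))
```

Unlike a batch-level drift check, this scores every value on its own, so a single unusual input is enough to raise a flag.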
Then what is concept drift?
Definitions differ across research communities, textbooks, and production environments. For example, “concept drift” is used as an umbrella term in online learning, whereas batch learning papers refer to the same thing as “dataset drift” (e.g., here and here).
The Difference Between Data Drift And Real Concept Drift
- In (real) concept drift, the conditional distribution P(Y|X), and with it the decision boundary, changes. In data drift (or virtual drift), P(X) changes but the boundary stays the same.

- Another difference is the cause: in data drift, it is usually internal to the process of collecting and processing data and training our model on it, whereas concept drift is usually triggered by an external event.
- With data drift only the features are affected, while with concept drift the labels, the features, or both can be affected.
From <https://deepchecks.com/data-drift-vs-concept-drift-what-are-the-main-differences/>
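A toy simulation of the first bullet above, assuming one feature, deterministic labels, and a model that learned the original decision boundary exactly (all hypothetical). Shifting P(X) leaves accuracy intact because P(Y|X) is unchanged, while changing the labelling rule degrades it even though P(X) is untouched.

```python
import random

random.seed(42)


def old_concept(x):
    # Ground-truth labelling rule before any drift: P(Y|X) is a step at 0.
    return int(x > 0.0)


def new_concept(x):
    # After real concept drift the rule itself changes: the step moves to 1.
    return int(x > 1.0)


def model(x):
    # A model that learned the old decision boundary perfectly.
    return int(x > 0.0)


def accuracy(xs, label_fn):
    return sum(model(x) == label_fn(x) for x in xs) / len(xs)


n = 10_000
original = [random.gauss(0.0, 1.0) for _ in range(n)]
# Data (virtual) drift: P(X) moves, the labelling rule P(Y|X) stays the same.
virtual_drift = [random.gauss(1.5, 1.0) for _ in range(n)]

print("no drift      :", accuracy(original, old_concept))
print("data drift    :", accuracy(virtual_drift, old_concept))
print("concept drift :", accuracy(original, new_concept))
```

Real models often do lose accuracy under data drift too, for example when inputs move into regions the training data covered poorly; this idealized setup deliberately hides that effect.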
In data drift, the input has changed, so the trained model may no longer be relevant to the new data.
In concept drift, the data distribution hasn’t changed. Rather, the relationship between inputs and outputs is different from before. In practice, this means that what we are trying to predict has changed. A classic example is spam detection: over time, spammers try new tactics, so spam filters need to be retrained to react to these new patterns.
For a deep dive on how to build a drift-aware production ML system, check out this blog.
For a super-simple explanation of these concepts, check out Meor Amer’s brilliant illustration of this idea:

From <https://www.iguazio.com/questions/what-is-the-difference-between-data-drift-and-concept-drift/>
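Because concept drift can occur without any change in P(X), an input-only drift check may miss it; one common approach is to monitor the model's error rate on labelled feedback (for the spam example, e.g., user reports). The sketch below is a simplified sliding-window check in the spirit of error-rate detectors such as DDM, not DDM itself; the class name, window size, margin, and the toy stream are hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical concept-drift check: compares the recent error rate of a
    deployed model against its error rate at deployment time."""

    def __init__(self, baseline_error, window=200, margin=0.10):
        self.baseline_error = baseline_error  # error rate measured at deployment
        self.margin = margin                  # tolerated increase before alerting
        self.recent = deque(maxlen=window)    # 1 = wrong prediction, 0 = correct

    def update(self, prediction, true_label):
        """Record one labelled prediction; return True if drift is suspected."""
        self.recent.append(int(prediction != true_label))
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        error_rate = sum(self.recent) / len(self.recent)
        return error_rate > self.baseline_error + self.margin


# Toy stream of (prediction, true_label) pairs: a spam filter with 5% errors,
# then spammers change tactics and errors rise to 30% on similar inputs.
before = ([(1, 1)] * 19 + [(1, 0)]) * 20    # 400 items, 5% errors
after = ([(1, 1)] * 7 + [(1, 0)] * 3) * 40  # 400 items, 30% errors

monitor = ErrorRateMonitor(baseline_error=0.05)
first_alert = None
for step, (pred, label) in enumerate(before + after):
    if monitor.update(pred, label) and first_alert is None:
        first_alert = step
print("first alert at step", first_alert)
```

The window size trades detection delay against false alarms: a longer window smooths out noise but reacts later to a genuine change in P(Y|X).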

