KL Divergence
Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge
Overview
Related fields
Introduction

- Kullback–Leibler (KL) divergence (aka relative entropy)
- Statistical measure from information theory
- ⇒ quantifies how much one probability distribution differs from a second, reference distribution (see the definition after this list)
- It is an asymmetric divergence measure!
- asymmetric ⇒ for distributions P and Q, the divergence of P from Q is in general different from the divergence of Q from P
- Can be used to measure the difference between feature distributions
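For reference, the standard discrete definition (the continuous case replaces the sum with an integral):
$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
$$
Swapping $P$ and $Q$ generally gives a different value, which is the asymmetry noted above.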
Use cases
- Monitoring data drift
- applied to continuous data by discretising it into bins
- data points are binned per feature to form discrete distributions
- the per-bin divergence contributions are summed to give a single drift score per feature (see the sketch after this list)
- Loss function
- Variational Auto-Encoder (VAE) optimisation: the KL term regularises the learned latent distribution towards the prior (closed-form term after this list)
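A minimal sketch of the binning approach for drift monitoring, assuming NumPy/SciPy; the bin count, the epsilon smoothing, and the sample arrays are illustrative choices, not taken from the sources below.
```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes D_KL(p || q)

def kl_drift_score(reference, production, n_bins=20, eps=1e-9):
    """KL divergence between the binned distributions of one feature."""
    # Use the same bin edges for both samples so the bins line up.
    edges = np.histogram_bin_edges(reference, bins=n_bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(production, bins=edges)
    # Normalise counts to probabilities and smooth empty bins,
    # otherwise a zero bin in q makes the divergence infinite.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return entropy(p, q)  # sums p_i * log(p_i / q_i) over the bins

# Example: drift between a training sample and a shifted production sample.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)
prod = rng.normal(0.5, 1.2, 10_000)
print(kl_drift_score(train, prod))
```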
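For the VAE use case, the KL term has a closed form when the encoder outputs a diagonal Gaussian $\mathcal{N}(\mu, \sigma^2)$ and the prior is a standard normal (a standard result, not specific to the references below):
$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big) = -\tfrac{1}{2} \sum_{i} \left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right)
$$
This term is added to the reconstruction loss and pulls the latent codes towards the prior.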
Limitations
- cannot be used strictly as a distance measure, since a true distance should be symmetric, i.e., the same from either perspective
- if the data samples are drawn from distributions with different parameters (mean and variance), the KL divergence estimate may not be reliable
- KL is very sensitive to regions that have very small probability under q(x) (say 0.0001% of values) and slightly less small probability under p(x) (say 0.01% of values); those terms can dominate the score (toy illustration after this list)
- estimates of p(x) and q(x) in those low-probability regions are probably noisy, and they represent rare data which may not even be relevant to your model’s performance
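A toy illustration of that sensitivity (the numbers are arbitrary, chosen only to make the point): a single rare bin dominates the score, and shrinking its probability under q swings the result.
```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

# Distributions that agree everywhere except a very rare last bin.
p  = np.array([0.9899,   0.01, 0.0001])
q1 = np.array([0.98999,  0.01, 0.00001])   # rare bin 10x smaller under q
q2 = np.array([0.989999, 0.01, 0.000001])  # rare bin 100x smaller under q
print(kl(p, q1))  # the last bin contributes nearly all of the score
print(kl(p, q2))  # shrinking q there again roughly doubles that contribution
```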
Alternatives
- D_1 Distance by Google
- Earth mover’s distance (aka Wasserstein distance; see the example after this list)
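For comparison, SciPy's one-dimensional earth mover's (Wasserstein) distance works directly on samples and is symmetric; a small sketch with made-up samples:
```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 10_000)
b = rng.normal(0.5, 1.2, 10_000)
# Symmetric: swapping the arguments gives the same value.
print(wasserstein_distance(a, b), wasserstein_distance(b, a))
```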
Questions
- is KL divergence a similarity / distance metric? ✅ 2023-12-29
- yes, but with caveats (see Limitations above)
Theoretical References
Papers
Articles
- Intuitive Guide to Understanding KL Divergence | by Thushan Ganegedara | Towards Data Science
- KL Divergence in Machine Learning | Encord
- Gantry.io | You’re probably monitoring your models wrong