KL Divergence


Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a") Tags: knowledge


Overview

Introduction

  • Kullback-Leibler (KL) divergence (aka relative entropy)
  • Statistical measure from information theory
  • Quantifies how much one probability distribution differs from a second, reference distribution
  • It is an asymmetric divergence measure!
    • asymmetric: for distributions P and Q, the divergence of P from Q generally differs from the divergence of Q from P, i.e. D_KL(P‖Q) ≠ D_KL(Q‖P)
  • Can be used to measure the distance between feature distributions (see the definition sketch below)
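
For quick reference, the standard definition for discrete distributions P and Q over the same support (the continuous case replaces the sum with an integral):

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \,\log \frac{P(x)}{Q(x)}
$$

It is an expectation of the log-ratio taken under P, so swapping P and Q generally changes the value, hence the asymmetry noted above.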

Use Cases

  • Monitoring data drift
    • applied to data in discrete form by binning it
    • data points are binned per feature to form discrete distributions for the reference and the current data
    • per-bin divergence contributions are summed to give a single drift score per feature (see the sketch after this list)
  • Loss function (minimising cross-entropy is equivalent to minimising KL divergence against the empirical distribution)
  • Variational Auto-Encoder (VAE) optimisation, where a KL term pulls the learned latent distribution towards the prior
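
A minimal sketch of the binned-KL drift check described above, using NumPy; the bin count, the epsilon used to guard empty bins, and the function name `kl_drift_score` are illustrative choices, not a fixed recipe:

```python
import numpy as np

def kl_drift_score(reference: np.ndarray, current: np.ndarray,
                   bins: int = 10, eps: float = 1e-6) -> float:
    """Approximate D_KL(current || reference) for one feature via histogram binning."""
    # Shared bin edges derived from the reference data so both samples use the same bins
    # (current values outside the reference range fall outside these bins and are ignored here)
    edges = np.histogram_bin_edges(reference, bins=bins)
    p, _ = np.histogram(current, bins=edges)
    q, _ = np.histogram(reference, bins=edges)
    # Normalise counts to probabilities; eps avoids division by zero / log of zero in empty bins
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    # Per-bin contributions p * log(p / q), summed into a single drift score
    return float(np.sum(p * np.log(p / q)))

# Hypothetical usage: compare a training-time feature against freshly collected values
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
live_feature = rng.normal(0.5, 1.2, 10_000)  # shifted and widened distribution
print(kl_drift_score(train_feature, live_feature))
```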

Limitations

  • cannot be used as a strict distance metric, since a “distance” between two entities should be the same from either perspective, whereas KL divergence is asymmetric
  • if the data samples are drawn from distributions with different parameters (mean and variance), the KL divergence estimate may not be reliable
  • KL is very sensitive to values that are very small in q(x) (say 0.0001% of values) and only slightly less small in p(x) (say 0.01% of values); see the illustration after this list
    • estimates of p(x) and q(x) in those regions are probably noisy, and they represent rare data that may not even be relevant to the model’s performance
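
A small numeric illustration of that sensitivity; the bin probabilities below are made up to mirror the 0.01% / 0.0001% figures above:

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """Discrete D_KL(p || q) over aligned bins."""
    return float(np.sum(p * np.log(p / q)))

# Two distributions that agree almost everywhere...
p = np.array([0.50, 0.49, 0.01])      # rare bin carries 0.01 of p's mass
q = np.array([0.50, 0.4999, 0.0001])  # the same rare bin carries only 0.0001 in q
# ...yet the rare bin alone contributes 0.01 * ln(100) ≈ 0.046 to the score
print(kl(p, q))
```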

Alternatives

Questions

  • is KL divergence a similarity / distance metric? ✅ 2023-12-29
    • yes, it can be used as one, but with caveats (see Limitations above)

Theoretical References

Papers

Articles

Courses


Code References

Methods

Tools, Frameworks