Softmax temperature
Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge
Overview
Introduction
- Softmax normalizes a model's raw outputs (logits) so they sum to 1 and can be interpreted as probabilities.
- Softmax temperature is a variant where all the logits are divided by a temperature parameter T before the softmax is applied; for T > 1 the division penalizes bigger logits more than smaller ones, shrinking the gaps between them.
- Temperature controls the smoothness of the output distribution:
    - T < 1: distribution becomes sharper / harder (more peaked).
    - T = 1: identical to the standard softmax.
    - T > 1: distribution becomes smoother / softer (closer to uniform).
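In symbols, softmax with temperature $T$ over logits $z$ is:
$$
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
$$
Setting $T = 1$ recovers the plain softmax; as $T \to \infty$ the distribution approaches uniform, and as $T \to 0^+$ it collapses onto the argmax.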

```python
# source: https://jdhao.github.io/2022/02/27/temperature_in_softmax/#fn1
import math

def softmax(vec, temperature):
    """Turn vec (a list of logits) into a normalized probability distribution."""
    # Each logit is divided by the temperature before exponentiating,
    # which is what makes the result sharper (T < 1) or softer (T > 1).
    sum_exp = sum(math.exp(x / temperature) for x in vec)
    return [math.exp(x / temperature) / sum_exp for x in vec]
```
Why needed?
- to control the entropy of the output distribution (see the demo below)
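A quick demo of that entropy effect, reusing the `softmax` defined above (the logits here are made up for illustration):
```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]  # arbitrary example logits
for t in (0.5, 1.0, 5.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, entropy={entropy(probs):.3f}")
# T=0.5 -> peaky distribution, low entropy; T=5 -> near-uniform, high entropy.
```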
Questions
- how related is it to Neural Network Calibration?
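Likely answer, for future reference: "temperature scaling" (Guo et al., 2017, On Calibration of Modern Neural Networks) calibrates a trained network by fitting a single T on held-out logits so that confidences match accuracies. A minimal sketch, reusing `softmax` from above, with a crude grid search standing in for the paper's LBFGS fit:
```python
import math

def nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of labels under temperature-scaled softmax."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[y])
    return total / len(labels)

def fit_temperature(logit_rows, labels):
    """Pick the T that minimizes held-out NLL (grid search over a fixed range)."""
    grid = [0.5 + 0.1 * i for i in range(30)]  # candidate T values in [0.5, 3.4]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# toy held-out set (made-up numbers): confident logits, one confidently wrong
val_logits = [[5.0, 1.0, 0.0], [5.0, 1.0, 0.0], [1.0, 5.0, 0.0]]
val_labels = [0, 1, 1]
print(fit_temperature(val_logits, val_labels))  # well above 1: the model was overconfident
```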
Theoretical References
Papers
Articles
- DistilBERT uses Softmax temperature (see the distillation sketch after this list)
- Softmax with Temperature Explained - jdhao’s digital space
- You Don’t Really Know Softmax - Sewade Ogun’s Website
- to read!
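As a rough sketch of how distillation-style training (the DistilBERT link above) uses temperature: the teacher's logits are softened with T > 1 to give the student richer targets than one-hot labels. It reuses `softmax` from above; the logits and T value are illustrative, not DistilBERT's actual settings:
```python
teacher_logits = [4.0, 1.5, 0.5]  # hypothetical teacher outputs for 3 classes
hard_targets = softmax(teacher_logits, temperature=1.0)
soft_targets = softmax(teacher_logits, temperature=2.0)  # T > 1 softens the distribution
print(hard_targets)  # ~[0.90, 0.07, 0.03] -- nearly one-hot
print(soft_targets)  # ~[0.68, 0.20, 0.12] -- relative class similarities preserved
# The student is trained to match soft_targets (e.g. via cross-entropy / KL),
# so it also learns the teacher's beliefs about the non-argmax classes.
```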