Logit soft-capping

Logit soft-capping (introduced in Gemma 2) is a technique that bounds logit values between $-\text{soft\_cap}$ and $+\text{soft\_cap}$:

$$ \text{logits} \leftarrow \text{soft\_cap} \cdot \tanh\!\left(\frac{\text{logits}}{\text{soft\_cap}}\right) $$

From: Methods of Improving LLM Training Stability