From Google:
the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.
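As a quick illustration of that definition (a minimal sketch; the example logits are made up), softmax turns a vector of raw logits into probabilities that sum to 1:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, -1.0])  # raw model outputs, one per class
probs = softmax(logits)
print(probs)        # ~[0.705, 0.259, 0.035]
print(probs.sum())  # 1.0
```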
Logit softcapping (introduced in Gemma 2) is a technique that caps the logit values between $-\text{soft\_cap}$ and $+\text{soft\_cap}$:
$$ \text{logits} \leftarrow \text{soft\_cap} \cdot \tanh\left(\frac{\text{logits}}{\text{soft\_cap}}\right) $$
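Because $\tanh$ maps into $(-1, 1)$, the output stays strictly within $(-\text{soft\_cap}, +\text{soft\_cap})$, while small logits pass through nearly unchanged (since $\tanh(x) \approx x$ for small $x$). A minimal sketch (the function name is mine; the cap of 30.0 matches the value Gemma 2 uses for its final logits, with 50.0 used for attention logits):

```python
import numpy as np

def soft_cap(logits: np.ndarray, cap: float) -> np.ndarray:
    # tanh is bounded in (-1, 1), so the result is bounded in (-cap, +cap).
    return cap * np.tanh(logits / cap)

logits = np.array([-100.0, -10.0, 0.0, 10.0, 100.0])
print(soft_cap(logits, cap=30.0))
# ~[-29.92  -9.65   0.     9.65  29.92]
```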