Activation functions, part 1: softmax and sigmoid for classification output layers in deep learning

Jan 31, 2025

In this first article, we will study the concept of a logit and two activation functions, softmax and sigmoid. These concepts are essential in deep learning for classification problems.

Let’s start with the basics: what is a logit?

A logit is the input of the sigmoid and softmax functions. It is a raw score: an unbounded real number (it can be negative or larger than 1) that is not normalized, so it cannot be read directly as a probability. The model produces it, typically as a linear combination of the features (or of the last hidden layer's activations) with the weights, plus a bias. An activation function such as sigmoid or softmax then transforms it into an interpretable probability.
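To make this concrete, here is a minimal sketch of how a single logit could be computed as a linear combination. The feature values, weights and bias below are made-up numbers chosen only for illustration, and the helper name `logit` is our own. In a real network the features come from the input or the previous layer, and the weights and bias are learned during training.

```python
# Hypothetical feature vector, weights and bias (illustrative values only).
features = [0.5, -1.2, 3.0]
weights = [0.8, 0.1, 0.4]
bias = 0.2

def logit(x, w, b):
    """Raw score z = w . x + b, computed before any activation function."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

z = logit(features, weights, bias)
print(round(z, 2))  # 1.68
```

Nothing constrains z to lie between 0 and 1. With other weights it could just as well be -7 or 42, which is why an activation function is needed before it can be interpreted. In a multi-class model, each class has its own weight vector and bias, and therefore its own logit.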

What is the softmax function?

The softmax activation function is used in multi-class classification. It takes one logit for EACH class and converts them into probabilities using softmax(z_i) = exp(z_i) / Σ_j exp(z_j). Each probability lies between 0 and 1, all of them sum to 1, and each one indicates how likely the observation is to belong to the corresponding class.
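A minimal softmax in plain Python (standard library only) could look like this. The function name is our own. The subtraction of the largest logit is an extra precaution that the article does not mention: it is a common trick that prevents `math.exp` from overflowing on large logits and leaves the probabilities unchanged.

```python
import math

def softmax(logits):
    """Turn a list of logits (one per class) into probabilities that sum to 1."""
    # Subtract the largest logit first: exp() can overflow for large inputs,
    # and the shift cancels out in the ratio, so the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([3.0, 1.0, -2.0])
print(probs, sum(probs))         # every value is in (0, 1); the sum is 1 up to rounding
print(softmax([1000.0, 999.0]))  # works, whereas math.exp(1000.0) alone raises OverflowError
```

Because exp is increasing, a higher logit always gets a higher probability. The predicted class is therefore simply the one with the largest logit.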

Example:

Suppose our animal classification model gives the following raw scores for an image:

Logits = 2.0 for cat, 1.0 for dog, 0.1 for rabbit

a) Compute the exponentials:

exp(2.0) ≈ 7.39
exp(1.0) ≈ 2.72
exp(0.1) ≈ 1.11

b) Sum the exponentials (the denominator of the softmax):

7.39+2.72+1.11=11.22

c) Divide each exponential by the sum to get the final probabilities:

Cat: 7.39 / 11.22 ≈ 0.66 (66%)
Dog: 2.72 / 11.22 ≈ 0.24 (24%)
Rabbit: 1.11 / 11.22 ≈ 0.10 (10%)

Result:

The image is most likely a cat (66%).
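The cat/dog/rabbit computation can be checked with a few lines of Python. The logits are the ones from the example, and the code mirrors steps a) to c) before picking the class with the highest probability.

```python
import math

# Logits from the example.
logits = {"cat": 2.0, "dog": 1.0, "rabbit": 0.1}

# a) Exponentiate each logit.
exps = {name: math.exp(z) for name, z in logits.items()}

# b) Sum the exponentials (the softmax denominator).
total = sum(exps.values())

# c) Divide each exponential by the sum.
probs = {name: e / total for name, e in exps.items()}

for name, p in probs.items():
    print(f"{name}: {p:.2f}")  # cat: 0.66, dog: 0.24, rabbit: 0.10

# The predicted class is the one with the highest probability.
prediction = max(probs, key=probs.get)
print("prediction:", prediction)  # cat
```

Without rounding the exponentials first, the sum is about 11.21 rather than 11.22. The rounded probabilities are the same.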

What is the sigmoid function?

The sigmoid activation function is used in binary classification problems (e.g., yes/no, true/false). It transforms a single logit z into a probability between 0 and 1 using σ(z) = 1 / (1 + e^(-z)).

If the sigmoid output is close to 0, the model “thinks” the observation belongs to the negative class.
If the sigmoid output is close to 1, the model “thinks” the observation belongs to the positive class.
An output of exactly 0.5, which corresponds to a logit of 0, means maximal uncertainty: the model cannot decide between the two classes.
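Here is a minimal plain-Python sketch of the sigmoid and of the decision rule described above. The names `sigmoid` and `predict_label` are our own. The 0.5 threshold is an assumption: it is a common default and can be tuned in practice. The two-branch form of the sigmoid is a standard numerical-stability precaution, not something the article prescribes.

```python
import math

def sigmoid(z):
    """Map a logit z to a probability between 0 and 1: 1 / (1 + e^(-z))."""
    # Two algebraically equal forms, chosen so that exp() never receives a
    # large positive argument (math.exp overflows above roughly 709).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_label(z, threshold=0.5):
    """Return 1 (positive class) if sigmoid(z) >= threshold, else 0 (negative class)."""
    return 1 if sigmoid(z) >= threshold else 0

for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), predict_label(z))
```

Negative logits give probabilities below 0.5 (label 0), and positive logits give probabilities above 0.5 (label 1). A logit of exactly 0 gives 0.5. With the `>=` rule, that tie goes to the positive class, which is an arbitrary convention.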

Example:

Suppose our model must predict whether an image represents a dog or not (yes = 1 for “dog”, no = 0 for “not a dog”).

The model produces the following raw score (logit):

Logit = 2.5

a) Compute the exponential:

e^(-2.5) ≈ 0.082

b) Apply the sigmoid function:

σ(2.5) = 1 / (1 + 0.082) = 1 / 1.082 ≈ 0.92

Result:

The probability that the image is a dog is 0.92 or 92%.
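The dog example can be reproduced directly in Python. The logit 2.5 comes from the example, and the code follows steps a) and b). The final "dog / not a dog" decision assumes a 0.5 threshold. The last line is an extra check, not part of the example: taking the log-odds log(p / (1 - p)) of the probability recovers the original logit, because that function is the inverse of the sigmoid.

```python
import math

z = 2.5  # logit from the example

# a) Compute e^(-z).
e = math.exp(-z)

# b) Apply the sigmoid: 1 / (1 + e^(-z)).
p_dog = 1.0 / (1.0 + e)

print(f"e^(-z) = {e:.3f}")      # 0.082
print(f"P(dog) = {p_dog:.2f}")  # 0.92
print("dog" if p_dog >= 0.5 else "not a dog")  # assumed 0.5 threshold

# The log-odds of the probability gives back the logit.
print(round(math.log(p_dog / (1.0 - p_dog)), 6))  # 2.5
```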

Conclusion: comparison table

Sirine Amrane