understanding layers in dl: input, hidden, and output layers

Sirine Amrane
5 min read · Feb 2, 2025


a neural network is structured in successive layers that process data hierarchically. each layer consists of neurons that perform mathematical transformations on the input data. the more layers there are, the deeper the network is and the more capable it becomes of learning complex relationships.

there are 3 types of layers:

  1. the input layer
  2. the hidden layers (these exist only in neural networks, i.e. in deep learning. traditional machine learning models such as decision trees, linear or logistic regression, or svm have no concept of hidden layers because they have no neural layers)
  3. the output layer

input layer

  • this is the first layer of the network
  • it does no computation.
  • it receives input data (features, images, text, etc.).
  • it simply passes the raw data to the first hidden layer.
  • example: if we have a black-and-white image of 28x28 pixels, the input layer will have 784 input variables, so 784 neurons (one per pixel). for a color image, there are three channels (red, green, blue), so 3 × 28 × 28 = 2352 neurons. in the case of tabular data (like columns in an excel table), each column corresponds to an input neuron.
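
to make this concrete, here is a minimal numpy sketch (random arrays stand in for real images) showing how those examples translate into input-layer sizes:

```python
import numpy as np

gray_image = np.random.rand(28, 28)       # black-and-white image, 28x28 pixels
x_gray = gray_image.flatten()             # 784 values -> 784 input neurons
print(x_gray.shape)                       # (784,)

color_image = np.random.rand(3, 28, 28)   # 3 channels (red, green, blue)
x_color = color_image.flatten()           # 3 * 28 * 28 = 2352 input neurons
print(x_color.shape)                      # (2352,)
```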

hidden layers (relu)

  • these are the intermediate layers between the input and output.
  • this is where learning happens.
  • they extract complex features by transforming data through mathematical operations.
  • each hidden layer contains multiple neurons.
  • each neuron receives input values, transforms them via a weighted linear combination (weights + bias), applies an activation function (relu, sigmoid, tanh), and then transmits the result to the neurons of the next layer.
  • the more hidden layers there are, the deeper the network is (hence deep learning). each hidden layer learns increasingly abstract representations of the data.
  • the first layers capture low-level features.
  • the deeper layers capture more complex patterns.

formula:

z = X * W + b
a = f(z)

  • X is the input vector
  • W is the weight vector (a matrix when the layer has several neurons)
  • b is the bias
  • f(z) is the activation function
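
a minimal numpy sketch of this forward pass, with invented sizes (4 inputs feeding a hidden layer of 7 neurons):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)   # relu keeps positive values, zeroes out the rest

X = np.random.rand(4)         # 4 input features
W = np.random.rand(4, 7)      # weights: 4 inputs -> 7 neurons
b = np.zeros(7)               # one bias per neuron

z = X @ W + b                 # weighted linear combination: z = X * W + b
a = relu(z)                   # activation function: a = f(z)
print(a.shape)                # (7,) -> one value per neuron, passed to the next layer
```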

output layer (sigmoid, softmax)

this is the last layer of the network.
the number of neurons depends on the type of problem:

  • regression → 1 neuron (continuous output).
  • binary classification → 1 neuron with sigmoid activation.
  • multi-class classification → one neuron per class with softmax activation.
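
here is a small numpy sketch of these output activations (the logits are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by the max for numerical stability
    return e / e.sum()

logit = np.array([0.8])               # binary classification: 1 neuron
print(sigmoid(logit))                 # probability of the positive class

logits = np.array([2.0, 0.5, -1.0])   # multi-class: one neuron per class
print(softmax(logits))                # probabilities summing to 1

# regression: 1 neuron, no activation -> the raw value z is the prediction
```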

why does the number of neurons vary from one layer to another?

the choice of the number of neurons in each layer depends on several factors related to the network architecture. here’s an example:

a) feature extraction
the first hidden layer (5 neurons) is often designed to learn basic representations of the input data.
the second layer (7 neurons) can extract more abstract or complex representations by combining the outputs of the first layer.
in general, increasing the number of neurons in the initial hidden layers helps improve the network’s learning capacity.

b) dimensionality reduction and generalization
the output layer should have a number of neurons that matches the desired output dimension (3 here, possibly for a classification problem with three classes or a three-dimensional output vector).
in the last hidden layer (7 neurons), feature combinations are refined to produce an optimal output.
progressively reducing the number of neurons helps prevent overfitting by forcing the network to retain only the most essential features.

c) heuristic approach
sometimes, the number of neurons is chosen empirically, starting with a high number and gradually reducing it to find a good balance between performance and complexity.

another quick example of a neural network structure:

  • input layer: 10 neurons (10 features)
  • hidden layers: 64 → 32 → 16 neurons (gradual reduction)
  • output layer: 3 neurons (for a 3-class classification with softmax)
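
as a sketch, this structure could be written in pytorch (the framework is my choice here, not prescribed by the text):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # input layer: 10 features -> first hidden layer
    nn.Linear(64, 32), nn.ReLU(),   # gradual reduction
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 3),               # output layer: 3 neurons, one per class
    nn.Softmax(dim=-1),             # often omitted in practice: CrossEntropyLoss applies log-softmax internally
)
print(model)
```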

why are there no neuron layers in machine learning?

in classical machine learning, there are no neuron layers like in deep learning. ml models such as random forest, xgboost, svm, or linear regression rely on predefined mathematical algorithms and do not pass data through successive activation layers. they process data directly using statistical methods and decision rules, with no activation functions or backpropagation. unlike deep learning, where information flows through multiple layers of neurons, ml models operate without hierarchical transformations. this makes them computationally efficient, but it often requires more manual feature selection, which is why they are usually paired with feature engineering.

are there any layers at all in machine learning ?

yes, but they are not neuronal. some models, such as decision trees or ensemble learning methods, can be represented as hierarchical levels (for example, the trees in a random forest). however, these “layers” contain no neurons activated by nonlinear functions.

do neural networks have activation functions in every layer?

yes, most layers in a neural network use an activation function, but not all.

a. layers where an activation function is used:

  • hidden layers → always have an activation function to introduce non-linearity (e.g., relu, tanh, sigmoid).
  • output layer → depends on the task type:
      • softmax (multi-class classification).
      • sigmoid (binary classification).
      • linear (regression, no activation).

b. layers without activation:

  • output layers for regression → no activation (linear function).
  • some transformation layers (e.g., normalization, dropout, attention layers) → no activation needed.

in short, each hidden layer generally has an activation function to help the network learn complex relationships, but the output layer may not always have one (e.g., in regression).
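
a short pytorch sketch contrasting the two cases, with invented layer sizes: an activated hidden layer, a dropout layer with no activation, and an activation-free regression output:

```python
import torch.nn as nn

regressor = nn.Sequential(
    nn.Linear(10, 16), nn.ReLU(),   # hidden layer: activation introduces non-linearity
    nn.Dropout(p=0.2),              # transformation layer: no activation needed
    nn.Linear(16, 1),               # regression output: linear, no activation
)
```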

conclusion

deep learning is all about stacking layers, each transforming data step by step to uncover patterns. the input layer is just a gateway, the hidden layers do the real work, and the output layer gives the final prediction. the more hidden layers, the deeper the network, and the more powerful (but also more complex) the model becomes.
