In Understanding the Difficulty of Training Deep Feedforward Neural Networks, Xavier Glorot and Yoshua Bengio investigate the effect of different activation functions (the sigmoid, hyperbolic tangent (tanh), and softsign) on the saturation of activations in a 4-layer MLP toy network trained for image classification.
- They point out that the softmax + negative log-likelihood loss induces a steeper loss landscape than the quadratic (MSE) loss (which was apparently the standard choice for classification problems before this; they cite Rumelhart et al., 1986). A rough gradient sketch of why is given after this list.
- They show that the sigmoid activations of the last hidden layer (Layer 4) all saturate at zero (there is essentially no variance) pretty much immediately when training their toy network: this is a problem because the model is relying on the biases of the output (softmax) layer to perform the classification, ignoring the features coming from the hidden layers below.
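As a rough sketch of the intuition behind the first point (a standard textbook derivation, not lifted from the paper itself): with a sigmoid output unit and quadratic cost, the gradient with respect to the pre-activation carries a sigma-prime factor that vanishes whenever the unit saturates, even if the prediction is badly wrong; with softmax + negative log-likelihood that factor cancels, so confidently wrong predictions still receive a large gradient.

```latex
% Quadratic (MSE) cost with a sigmoid output: the sigma'(z) factor
% shrinks the gradient whenever the unit saturates.
\[
L_{\mathrm{MSE}} = \tfrac{1}{2}\bigl(\sigma(z) - y\bigr)^2,
\qquad
\frac{\partial L_{\mathrm{MSE}}}{\partial z} = \bigl(\sigma(z) - y\bigr)\,\sigma'(z).
\]

% Softmax + negative log-likelihood: the gradient w.r.t. the logits is
% simply (prediction - target), with no vanishing factor.
\[
L_{\mathrm{NLL}} = -\sum_{k} y_k \log p_k,
\qquad
p_k = \frac{e^{z_k}}{\sum_j e^{z_j}},
\qquad
\frac{\partial L_{\mathrm{NLL}}}{\partial z_k} = p_k - y_k.
\]
```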
Fan-in and Fan-out
In deep learning, in the context of initialization, I often encounter "fan_in" and "fan_out" parameters. I recently discovered this comes from electronic engineering (and something to do with gates).
Could you please explain what these parameters are in the context of initialising layers in a neural network, and explain why they are named as such - making reference to the origin of these terms from electronics
ChatGPT's response:
In the context of initializing layers in a neural network, the terms "fan_in" and "fan_out" refer to the number of inputs and outputs, respectively, of a particular layer. These terms were indeed borrowed from electronic engineering, specifically from the field of digital logic gates.
In electronics, a gate is a fundamental building block of digital circuits that performs a specific logical operation. Gates have input and output lines, and the terms "fan-in" and "fan-out" describe a gate's connectivity: the "fan-in" of a gate is the number of input lines connected to it, while the "fan-out" is the number of gate inputs that its output can drive.
When these concepts were borrowed for use in deep learning, they were applied to describe the connectivity between layers in a neural network. In a neural network, each layer consists of multiple artificial neurons (also known as nodes or units) that take inputs and produce outputs. The "fan-in" of a layer refers to the number of connections from the previous layer to the neurons in the current layer. Conversely, the "fan-out" of a layer represents the number of connections from the neurons in the current layer to the next layer.
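As a concrete illustration (using PyTorch purely as an example framework, not something the response above assumes): a `Linear` layer stores its weight as `(out_features, in_features)`, so fan-in and fan-out can be read straight off the weight shape.

```python
import torch.nn as nn

# Hypothetical layer: 256 incoming connections, 128 output units.
layer = nn.Linear(in_features=256, out_features=128)

# PyTorch stores Linear weights as (out_features, in_features),
# so fan_out is the first dimension and fan_in the second.
fan_out, fan_in = layer.weight.shape
print(fan_in)   # 256 -> connections feeding into each unit of this layer
print(fan_out)  # 128 -> connections leaving this layer for the next one
```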
In the context of weight initialization in deep learning, the terms "fan-in" and "fan-out" are commonly used to determine the appropriate scale of random weights. Initializing the weights randomly is crucial for training neural networks effectively. The magnitude of the weights at initialization should be carefully chosen to ensure proper learning dynamics and prevent issues like vanishing or exploding gradients.
By considering the "fan-in" and "fan-out" of a layer, you can adjust the initialization scheme accordingly. For example, one popular initialization method is Xavier initialization (also known as Glorot initialization), which scales the randomly initialized weights based on the "fan-in" and "fan-out" values of the layer. This helps to maintain the variance of activations and gradients throughout the network, promoting stable and efficient training.
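A minimal sketch of that scaling rule (assuming PyTorch again; the sizes are made up and `xavier_uniform` here is my own helper, not the library function):

```python
import math
import torch

def xavier_uniform(fan_in: int, fan_out: int) -> torch.Tensor:
    # Glorot/Xavier uniform: W ~ U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
    # which gives Var(W) = 2 / (fan_in + fan_out) and keeps activation and
    # gradient variances roughly constant from layer to layer.
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return torch.empty(fan_out, fan_in).uniform_(-a, a)

W = xavier_uniform(fan_in=256, fan_out=128)
print(W.std())  # roughly sqrt(2 / 384) ≈ 0.072

# PyTorch ships the same rule as torch.nn.init.xavier_uniform_(tensor),
# applied in place to an existing weight tensor.
```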
In summary, the terms "fan-in" and "fan-out" used in deep learning initialization come from electronic engineering and describe the number of inputs and outputs, respectively, of a layer in a neural network. These terms were borrowed due to the analogy between the connectivity in electronic circuitry (gates) and neural networks (layers of artificial neurons), and have since become standard terminology in the field.