# How do you choose an activation function?

### Transfer functions

The formal static neuron is a technical model of the biological neuron and forms the basis of a large number of artificial neural networks; the complex biological structure is greatly simplified. Since the formal static neuron is modeled on biological structures, its components have biological equivalents. The presentation of an input **x** corresponds to the transmission of information at the synapses, and the weight vector **w** symbolizes the dendrites of a biological neuron. The cell body (soma) of a neuron i is activated by the activation function z_i = f(**x**, **w**_i); the information processing at the axon hillock and the transmission along the axon are modeled by the output function y_i = f(z_i). By analogy with biological neurons, the output can be interpreted as the mean spike frequency. A formal static neuron can thus be represented schematically as follows.

The activation function z_i = f(**x**, **w**_i) and the output function y_i = f(z_i) are referred to as transfer functions. The various transfer functions are presented in detail below.

### 1. Activation functions

The activation function z_i = f(**x**, **w**_i) links the weights **w**_i of a neuron i with the input **x** and determines from them the activation, or state, of the neuron. Activation functions can be divided into two basic groups, which work in fundamentally different ways: scalar product activation (section 1.1) and activation based on distance measures (section 1.2).

**1.1 Dot product activation ("dot product")**

The scalar product activation of a neuron corresponds to the weighted sum of its inputs:

z_i = **w**_i · **x** = Σ_j w_ij · x_j
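The weighted sum above can be sketched in a few lines of Python; the function name `dot_product_activation` is illustrative and not part of the applet:

```python
def dot_product_activation(x, w):
    """Scalar product activation: weighted sum of the inputs."""
    return sum(wj * xj for wj, xj in zip(w, x))

# An input lying on the hyperplane w . x = 0 yields zero activation:
print(dot_product_activation([0.5, 0.5], [0.5, -0.5]))  # 0.0
# An input off the hyperplane yields a nonzero activation:
print(dot_product_activation([1.0, 0.0], [0.5, -0.5]))  # 0.5
```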

For the interpretation of the scalar product activation it must be taken into account that an equation of the form **w**_i · **x** = 0 defines a (hyper)plane through the coordinate origin. That is, the activation z_i = f(**x**, **w**_i) takes the value zero whenever the input lies on the hyperplane determined by the weights. The activation increases or decreases with growing distance from the plane, and its sign indicates on which side of the hyperplane the input **x** lies; the input space is thereby divided into two regions.

To allow hyperplanes in general position, each neuron i must additionally receive a threshold value T_i.

This results in the following description of the hyperplanes:

**w**_i · **x** − T_i = 0

Using a neural network with two layers of trainable weights, n input neurons, one output neuron, and inputs restricted to the interval [0, 1], the input space can be represented as an n-dimensional cube. This space is then separated by (n−1)-dimensional hyperplanes, which are determined by the hidden neurons. The output neuron in turn defines a hyperplane in the space formed by the hidden neurons, which allows various sub-areas of the input space to be combined.

The following figure shows the hyperplanes or straight lines in the input space of a corresponding network that can handle the logical AND operation.

In combination with the scalar product activation, output functions should be used that take the sign of the activation into account, since otherwise it is not possible to differentiate between the subspaces separated by the hyperplanes.
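As a minimal sketch of the AND example, a single neuron with scalar product activation and a threshold separates the unit square so that only the input (1, 1) lies on the positive side of the line. The weights and threshold below are chosen for illustration, not taken from the figure:

```python
def and_neuron(x0, x1, w0=1.0, w1=1.0, threshold=1.5):
    """Single neuron: outputs 1 iff w0*x0 + w1*x1 exceeds the threshold."""
    activation = w0 * x0 + w1 * x1
    return 1 if activation > threshold else 0

for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x0, x1), "->", and_neuron(x0, x1))
# Only (1, 1) yields 1, i.e. the logical AND of the two inputs.
```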

**1.2 Activation functions based on distance measures**

Activation functions based on distance measures are an alternative to scalar product activation. Here the weight vectors of the neurons represent data points in the input space, and a neuron i is activated according to the distance of its weight vector **w**_i from the input **x**. Various measures can be used to determine this distance; a selection is presented below.

**Euclidean distance ("euclidian distance")**

The neurons are activated solely according to the spatial distance between their weight vectors and the input; the direction of the difference vector **x** − **w**_i is irrelevant here.

**Mahalanobis distance ("mahalanobis distance")**

The Mahalanobis distance is a statistically corrected Euclidean distance of the weight vector of a neuron i from the input **x**, based on an estimation model of the data distribution represented by this neuron. The estimation model is described by the covariance matrix **C**_i.

**Maximum activation ("max distance")**

The neurons are activated according to the maximum absolute value of the components of the difference vector **x** − **w**_i.

**Minimum activation ("min distance")**

The activation of a neuron corresponds to the minimum absolute value of the components of the difference vector **x** − **w**_i.

**Manhattan distance or absolute-sum norm ("manhattan distance")**

The Manhattan distance yields an activation based on the sum of the absolute values of the components of the difference vector **x** − **w**_i.

### 2. Output functions

The output function defines the output y_i of a neuron i as a function of its activation z_i(**x**, **w**_i). In general, monotonically increasing functions are used. By analogy with biological neurons, this yields a growing readiness to spike (in the form of a higher mean spike frequency) with increasing activation. An upper limit on the output of a neuron can be used to mimic the refractory period of a biological neuron. A selection of possible output functions is presented below.

**Identity function ("linear")**

The identity function has no threshold values and produces an output y_i that is identical to the activation z_i(**x**, **w**_i).

**Step function ("step")**

This is the classic all-or-nothing threshold function. It resembles the ramp function, but changes its value abruptly as soon as the activation rises above a threshold value T.

**Ramp function ("ramp")**

The ramp function combines the step function with a linear output function. As long as the activation stays below the threshold T_1, the neuron produces the output y_i = 0; if the activation exceeds the threshold T_2, the output is y_i = 1. For activations in the interval between the two thresholds (T_1 ≤ z_i(**x**, **w**_i) ≤ T_2) the output is interpolated linearly.

**Fermi function ("sigmoidal")**

This sigmoid function can be configured with two parameters: T_i ("shift") shifts the function by the amount −T_i, and c ("steepness") is a measure of its steepness. Provided c is greater than zero, the function is strictly monotonically increasing, continuous, and differentiable over its entire domain, which is why it is of particular importance for backpropagation networks.

**Gaussian function**

The Gaussian function reaches its maximum value at an activation of zero. It is an even function (f(−x) = f(x)). As the absolute value of the activation increases, its value falls. This drop can be adjusted via the parameter σ; higher values of σ delay the decrease of the function values away from the maximum.
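The output functions above can be sketched as follows. The exact formulas used by the applet are not given in the text, so the standard forms are assumed here (shift T, steepness c, width σ):

```python
import math

def step(z, T=0.0):
    """All-or-nothing: output jumps from 0 to 1 when z exceeds T."""
    return 1.0 if z > T else 0.0

def ramp(z, T1, T2):
    """0 below T1, 1 above T2, linear interpolation in between."""
    if z < T1:
        return 0.0
    if z > T2:
        return 1.0
    return (z - T1) / (T2 - T1)

def fermi(z, T=0.0, c=1.0):
    """Sigmoid, shifted by -T and scaled in steepness by c (assumed form)."""
    return 1.0 / (1.0 + math.exp(-c * (z + T)))

def gaussian(z, sigma=1.0):
    """Even function with maximum 1 at z = 0; sigma controls the fall-off."""
    return math.exp(-(z ** 2) / (2 * sigma ** 2))

print(step(0.3, T=0.2))     # 1.0
print(ramp(0.5, 0.0, 1.0))  # 0.5
print(gaussian(0.0))        # 1.0
```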

Please open the simulation (Java applet) first. This Java applet can be used to calculate and visualize the outputs of a neuron for various activation and output functions. The neuron receives two-dimensional inputs with the components x_0 and x_1 from the interval [0, 1]; its weights are denoted w_0 and w_1.

The diagram shows the output ("Netoutput") for 21×21 inputs distributed equidistantly over the input space. A context menu, opened by right-clicking on the image, offers various display settings.

The parameters of the activation function (left) and the output function (right) can be varied below the diagram; depending on the selected function, different parameters are available. The slider to the right of the diagram moves a virtual cutting plane parallel to the x_0-x_1 plane (red: outputs below the cutting plane, green: above it).

1. Set the weights to the values w_0 = 0.5 and w_1 = -0.5! As the activation function ("activation function"), select the dot product activation ("dotproduct")!

a) Set the output function ("output function") to the identity ("linear")! Where does the straight line defined by the weights of the neuron run? Select a step function ("step") with the threshold value T = 0 as output function and verify your result!

b) Increase the threshold T to 0.2! Where does the straight line defined by the weights of the neuron now run? How can the difference from the plot in exercise 1a be explained?

c) Set the output function to the Fermi function ("sigmoidal") and choose the parameters T_i ("shift") and c ("steepness") such that the result is comparable to exercise 1b! Explain the result!

d) Select a Gaussian function with σ = 0.1 as the output function! Does the use of a Gaussian function in combination with the scalar product activation make sense? Justify your answer!

2. Set both weights to the value 0.5 and the activation function to the Euclidean distance ("euclidian distance")!

a) Select a Gaussian function with σ = 0.1 as the output function! Compare the result with exercise 1d! What is the main difference?

b) Now choose a Fermi function ("sigmoidal") with the parameters T_i = -0.1 ("shift") and c = 25 ("steepness") as the output function! Compare the result with exercise 2a! Which output function (Fermi or Gaussian) is more suitable for a neural network in which a high output is supposed to represent a great similarity between the weight vector of a neuron and the input?

c) Try to reproduce the behaviour from exercise 2b as closely as possible with a ramp function ("ramp") as the output function! Which values must the two thresholds T_1 and T_2 receive?

3. Set both weights to the value 0.5 and the output function to the step function ("step")! Compare the output for the activation functions "euclidian distance", "manhattan distance", "max distance", and "min distance", varying the threshold value T in the interval [0.2, 0.4]! Which geometric shapes describe the regions of the input space in which the neuron delivers the output zero? Why do these regions have their respective shapes?
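For readers without the applet, the 21×21 grid from the exercises can be reproduced numerically. The sketch below combines the Euclidean distance activation with a step output, as in exercise 3; the threshold T = 0.29 is chosen inside the interval [0.2, 0.4] so that no grid point falls exactly on the boundary circle:

```python
def euclidean_activation(x, w):
    """Euclidean distance between input and weight vector."""
    return sum((xj - wj) ** 2 for xj, wj in zip(x, w)) ** 0.5

def step(z, T):
    return 1.0 if z > T else 0.0

w = [0.5, 0.5]                       # both weights 0.5, as in exercise 3
T = 0.29                             # threshold within [0.2, 0.4]
grid = [i / 20 for i in range(21)]   # 21 equidistant points in [0, 1]

# Count the grid points where the neuron outputs zero; they fill a
# circular disc of radius T around the weight vector, which is why the
# zero region for "euclidian distance" is a circle.
zeros = sum(
    1
    for x0 in grid
    for x1 in grid
    if step(euclidean_activation([x0, x1], w), T) == 0.0
)
print(zeros)  # 101 of the 441 grid points
```

Swapping in the Manhattan, maximum, or minimum distance from section 1.2 changes the shape of the zero region (diamond, square, or band-shaped, respectively), which is the point of exercise 3.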
