How do you choose an activation function

Transfer functions

The formal static neuron is a technical model of the biological neuron and forms the basis of a large number of artificial neural networks. The complex biological structure is greatly simplified in this model. Because the formal static neuron is derived from biological structures, its components have biological equivalents. The presentation of an input $x$ corresponds to the transmission of information at the synapses. The weight vector $w$ symbolizes the dendrites of a biological neuron. The activation of the cell body (soma) of a neuron $i$ is modeled by the activation function $z_i = f(x, w_i)$, and the information processing in the axon hillock together with the transmission along the axon is modeled by the output function $y_i = f(z_i)$. By analogy with biological neurons, the output can be interpreted as the mean spike frequency. A formal static neuron can thus be represented schematically as follows.

The activation function $z_i = f(x, w_i)$ and the output function $y_i = f(z_i)$ are jointly referred to as transfer functions. Various transfer functions are presented in detail below.

1. Activation functions

The activation function $z_i = f(x, w_i)$ combines the weights $w_i$ of a neuron $i$ with the input $x$ and from this determines the activation, or state, of the neuron. Activation functions fall into two basic groups: scalar product activation (see section 1.1) and activation based on distance measures (see section 1.2). The two groups work in fundamentally different ways.

1.1 Dot product activation ("dot product")

The scalar product (dot product) activation of a neuron corresponds to the weighted sum of its inputs, where $x_j$ are the components of the input $x$ and $w_{ij}$ the components of the weight vector $w_i$:

$$z_i = \sum_j w_{ij} x_j$$

When interpreting the scalar product activation, note that an equation of the form

$$\sum_j w_{ij} x_j = 0$$

defines a (hyper)plane through the coordinate origin. That is, the activation $z_i = f(x, w_i)$ takes the value zero if an input lies on the hyperplane determined by the weights. The activation increases or decreases with increasing distance from the plane. The sign of the activation indicates on which side of the hyperplane the input $x$ lies, which means that the input space is divided into two regions.

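To allow hyperplanes in general position, each neuron $i$ must additionally receive a threshold value $T_i$. The threshold shifts the hyperplane away from the origin. With the weighted sum $\sum_j w_{ij} x_j$ of the inputs, the hyperplanes are then described by

$$\sum_j w_{ij} x_j - T_i = 0$$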
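The geometric reading of the scalar product activation can be checked numerically. The following is a minimal sketch, not the applet's implementation: the function name `dot_product_activation` and the convention of subtracting the threshold from the weighted sum are assumptions made for illustration.

```python
def dot_product_activation(x, w, threshold=0.0):
    """Weighted sum of the inputs minus a threshold.

    Subtracting the threshold is an assumed sign convention.
    """
    return sum(wj * xj for wj, xj in zip(w, x)) - threshold


w = [0.5, -0.5]  # these weights define the line x0 = x1 through the origin

on_plane = dot_product_activation([0.3, 0.3], w)  # input lies on the line
above = dot_product_activation([0.8, 0.2], w)     # one side: positive sign
below = dot_product_activation([0.2, 0.8], w)     # other side: negative sign

# A threshold of 0.2 moves the line to x0 - x1 = 0.4.
shifted = dot_product_activation([0.8, 0.2], w, threshold=0.2)

print(on_plane, above, below, shifted)
```

Inputs on the line give an activation of zero, and the sign of the activation distinguishes the two half-planes.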

Consider a neural network with two layers of trainable weights, $n$ input neurons, and one output neuron, with the inputs restricted to the interval [0, 1]. The input space can then be represented as an $n$-dimensional cube. This space is divided by $(n-1)$-dimensional hyperplanes, which are determined by the hidden neurons. The output neuron in turn defines a hyperplane in the space spanned by the hidden neurons, which makes it possible to combine various subregions of the input space.

The following figure shows the hyperplanes, or straight lines, in the input space of a corresponding network that implements the logical AND operation.

In combination with the scalar product activation, output functions should be used that take the sign of the activation into account, since otherwise it is not possible to differentiate between the subspaces separated by the hyperplanes.
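As an illustration of a sign-sensitive output function, the sketch below separates the AND case with a single neuron. The weights (1, 1) and the threshold 1.5 are hypothetical values chosen for this example and are not taken from the figure; the function name `and_neuron` is likewise an assumption.

```python
def and_neuron(x0, x1, w=(1.0, 1.0), threshold=1.5):
    """Single neuron with dot product activation and a sign-based step output.

    The weights and threshold are illustrative values, not the ones
    used in the original figure.
    """
    z = w[0] * x0 + w[1] * x1 - threshold  # activation relative to the line
    return 1 if z > 0 else 0               # only the sign of z is used


# Evaluate the neuron at the four corners of the unit square.
table = {(a, b): and_neuron(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

The line x0 + x1 = 1.5 leaves only the corner (1, 1) on its positive side, so the sign of the activation alone reproduces the AND truth table.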

1.2 Activation functions based on distance measures

Activation functions based on distance measures are an alternative to scalar product activation. The weight vectors of the neurons represent data points in the input space. A neuron $i$ is activated according to the distance between its weight vector $w_i$ and the input $x$.

Various metrics can be used to determine the distance. A selection is presented below.

Euclidean distance ("euclidian distance")

The activation of a neuron depends solely on the spatial distance between its weight vector and the input; the direction of the difference vector $x - w_i$ is irrelevant:

$$z_i = \sqrt{\sum_j (x_j - w_{ij})^2}$$

Mahalanobis distance ("mahalanobis distance")

The Mahalanobis distance is a statistically corrected Euclidean distance between the weight vector of a neuron $i$ and the input $x$, based on an estimated model of the data distribution represented by this neuron. The model is described by the covariance matrix $C_i$:

$$z_i = \sqrt{(x - w_i)^\top C_i^{-1} (x - w_i)}$$
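A minimal sketch of the Mahalanobis distance for the two-dimensional case follows. The function name `mahalanobis_distance` and the restriction to 2×2 covariance matrices (so that the inverse can be written out by hand) are assumptions made to keep the example self-contained.

```python
import math


def mahalanobis_distance(x, w, cov):
    """Mahalanobis distance between 2-D vectors x and w.

    cov is a 2x2 covariance matrix given as nested sequences.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix must be invertible")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - w[0], x[1] - w[1]]
    # Quadratic form dx^T * inv * dx.
    q = sum(dx[r] * inv[r][s] * dx[s] for r in range(2) for s in range(2))
    return math.sqrt(q)


# With the identity matrix the result equals the Euclidean distance.
d_identity = mahalanobis_distance([1.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# A variance of 4 along the first axis halves distances in that direction.
d_scaled = mahalanobis_distance([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])

print(d_identity, d_scaled)
```

Directions in which the data represented by the neuron vary strongly therefore contribute less to the distance.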

Maximum activation ("max distance")

The activation of a neuron is the largest absolute value among the components of the difference vector $x - w_i$:

$$z_i = \max_j |x_j - w_{ij}|$$

Minimum activation ("min distance")

The activation of a neuron is the smallest absolute value among the components of the difference vector $x - w_i$:

$$z_i = \min_j |x_j - w_{ij}|$$

Manhattan distance or absolute-value sum norm ("manhattan distance")

With the Manhattan distance, the activation of a neuron is the sum of the absolute values of the components of the difference vector $x - w_i$:

$$z_i = \sum_j |x_j - w_{ij}|$$
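The component-wise distance measures described above can be compared side by side. This is a sketch under the definitions given in the text; the function names are chosen for illustration and the input values are arbitrary.

```python
import math


def euclidean_distance(x, w):
    """Length of the difference vector x - w."""
    return math.sqrt(sum((xj - wj) ** 2 for xj, wj in zip(x, w)))


def manhattan_distance(x, w):
    """Sum of the absolute component differences."""
    return sum(abs(xj - wj) for xj, wj in zip(x, w))


def max_distance(x, w):
    """Largest absolute component difference."""
    return max(abs(xj - wj) for xj, wj in zip(x, w))


def min_distance(x, w):
    """Smallest absolute component difference."""
    return min(abs(xj - wj) for xj, wj in zip(x, w))


x = (3.0, 0.0)
w = (0.0, 4.0)  # difference vector is (3, -4)

results = {
    "euclidean": euclidean_distance(x, w),
    "manhattan": manhattan_distance(x, w),
    "max": max_distance(x, w),
    "min": min_distance(x, w),
}
print(results)
```

For the same pair of vectors the four measures give different activations, which is why the regions of equal activation around a weight vector differ in shape from one measure to the next.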

2. Output functions

The output function determines the output $y_i$ of a neuron $i$ as a function of its activation $z_i(x, w_i)$. In general, monotonically increasing functions are used for this. By analogy with biological neurons, this yields a growing tendency to spike (a higher mean spike frequency) with increasing activation. An upper limit on the output of a neuron can be used to model the refractory periods of a biological neuron. A selection of possible output functions is presented below.

Identity function ("linear")

The identity function has no thresholds and yields an output that is identical to the activation: $y_i = z_i(x, w_i)$.

Step function ("step")

This function is the classic all-or-nothing threshold function. It resembles the ramp function, except that its value changes abruptly as soon as the activation rises above a threshold value $T$.

Ramp function ("ramp")

The ramp function combines the step function with a linear output function. As long as the activation is below the threshold $T_1$, the neuron outputs $y_i = 0$; if the activation exceeds the threshold $T_2$, the output is $y_i = 1$. For activations between the two thresholds ($T_1 \le z_i(x, w_i) \le T_2$), the output is linearly interpolated.
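The three output functions described so far can be sketched as follows. The function names are chosen for illustration. For the step function, the behavior at exactly $z = T$ is not specified in the text; the sketch assumes the output switches to 1 only when the activation is strictly above the threshold.

```python
def identity(z):
    """Output equals the activation."""
    return z


def step(z, t=0.0):
    """All-or-nothing output; returning 0 at z == t is an assumption."""
    return 1.0 if z > t else 0.0


def ramp(z, t1, t2):
    """0 below t1, 1 above t2, linear interpolation in between."""
    if not t1 < t2:
        raise ValueError("t1 must be smaller than t2")
    if z < t1:
        return 0.0
    if z > t2:
        return 1.0
    return (z - t1) / (t2 - t1)


# Sample the ramp below, inside, and above the interval [0.2, 0.4].
samples = [ramp(z, 0.2, 0.4) for z in (0.1, 0.3, 0.5)]
print(samples)
```

As the two ramp thresholds move closer together, the ramp approaches the step function.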

Fermi function ("sigmoidal")

This sigmoid function is configured by two parameters. $T_i$ ("shift") shifts the function by the amount $-T_i$; the parameter $c$ ("steepness") is a measure of its steepness. For $c > 0$, the function is strictly monotonically increasing, continuous, and differentiable over its entire domain. This is why it is of particular importance for backpropagation networks.
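A possible form of the Fermi function is sketched below. The exact formula used by the applet is not given in the text; the sketch assumes the logistic form $1 / (1 + e^{-c (z + T_i)})$, which places the midpoint of the curve at $z = -T_i$ and thus matches the description of a shift by $-T_i$. The parameter names `shift` and `steepness` follow the applet labels.

```python
import math


def fermi(z, shift=0.0, steepness=1.0):
    """Logistic output function (assumed form).

    The midpoint (output 0.5) lies at z = -shift.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (z + shift)))


# With shift = -0.1 the midpoint moves to z = 0.1.
midpoint = fermi(0.1, shift=-0.1, steepness=25.0)

# A larger steepness makes the transition around the midpoint sharper.
soft = fermi(0.15, shift=-0.1, steepness=5.0)
sharp = fermi(0.15, shift=-0.1, steepness=25.0)

print(midpoint, soft, sharp)
```

For very large negative arguments `math.exp` can overflow; the activations considered here stay well within a safe range.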

Gaussian function

The Gaussian function reaches its maximum value at an activation of zero. It is an even function ($f(-x) = f(x)$). As the absolute value of the activation increases, the function value falls. This decay is controlled by a width parameter; higher values of this parameter slow the decline of the function values away from the maximum.
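The properties listed above can be checked with a short sketch. The formula is an assumption: the text does not give the applet's exact expression, so the common unnormalized form $e^{-z^2 / (2\sigma^2)}$ with a maximum of 1 is used here, and the name `sigma` for the width parameter is likewise assumed.

```python
import math


def gaussian(z, sigma=0.1):
    """Unnormalized Gaussian output function (assumed form, maximum 1)."""
    return math.exp(-(z * z) / (2.0 * sigma * sigma))


peak = gaussian(0.0)                   # maximum at zero activation
left = gaussian(-0.1)                  # even function: same value ...
right = gaussian(0.1)                  # ... on both sides of zero
narrow = gaussian(0.2, sigma=0.1)      # small width: fast decay
wide = gaussian(0.2, sigma=0.3)        # larger width: slower decay

print(peak, left, right, narrow, wide)
```

Because the output depends only on the absolute value of the activation, the sign information of a scalar product activation is lost, whereas a distance-based activation is non-negative to begin with.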

Please open the simulation (Java applet) first. The applet calculates and visualizes the outputs of a neuron for various activation and output functions. The neuron receives two-dimensional inputs with the components $x_0$ and $x_1$ from the interval [0, 1]. Its weights are denoted $w_0$ and $w_1$.

The diagram shows the output ("Netoutput") for 21 × 21 inputs distributed equidistantly over the input space. A context menu, opened by right-clicking the diagram, offers various display settings.

The parameters of the activation function (left) and the output function (right) can be varied below the diagram. Which parameters are available depends on the selected function. The slider to the right of the diagram moves a virtual cutting plane parallel to the $x_0$-$x_1$ plane (red: outputs below the cutting plane; green: outputs above it).
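For readers without access to the applet, the data behind such a diagram can be approximated in a few lines. The sketch below is not the applet's code: it evaluates one neuron on a 21 × 21 grid over the unit square, using a Euclidean distance activation and an assumed Gaussian output function with hypothetical parameter values.

```python
import math


def euclidean_distance(x, w):
    """Length of the difference vector x - w."""
    return math.sqrt(sum((xj - wj) ** 2 for xj, wj in zip(x, w)))


def gaussian(z, sigma):
    """Unnormalized Gaussian output function (assumed form)."""
    return math.exp(-(z * z) / (2.0 * sigma * sigma))


STEPS = 21          # 21 equidistant values per axis: 0.00, 0.05, ..., 1.00
w = (0.5, 0.5)      # illustrative weights
sigma = 0.1         # illustrative width parameter

# grid[r][c] holds the output for x0 = c / 20 and x1 = r / 20.
grid = [
    [
        gaussian(euclidean_distance((c / (STEPS - 1), r / (STEPS - 1)), w), sigma)
        for c in range(STEPS)
    ]
    for r in range(STEPS)
]

peak = max(max(row) for row in grid)
print(peak, grid[10][10], grid[0][0])
```

Swapping in a different activation or output function changes only the expression inside the nested list comprehension.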

1. Set the weights to $w_0 = 0.5$ and $w_1 = -0.5$. Select the dot product activation ("dotproduct") as the activation function ("activation function").

a) Set the output function ("output function") to the identity ("linear"). Where does the straight line defined by the weights of the neuron run? Select a step function ("step") with the threshold value $T = 0$ as the output function and verify your result.

b) Increase the threshold $T$ to 0.2. Where does the straight line defined by the weights of the neuron run now? How can the difference from the plot in exercise 1a be explained?

c) Set the output function to the Fermi function ("sigmoidal") and choose the parameters $T_i$ ("shift") and $c$ ("steepness") so that the result is comparable to that of exercise 1b. Explain the result.

d) Select a Gaussian function with a width parameter of 0.1 as the output function. Does it make sense to combine a Gaussian function with the scalar product activation? Give reasons for your answer.

2. Set both weights to 0.5 and the activation function to the Euclidean distance ("euclidian distance").

a) Select a Gaussian function with a width parameter of 0.1 as the output function. Compare the result with exercise 1d. What is the main difference?

b) Now choose a Fermi function ("sigmoidal") with the parameters $T_i = -0.1$ ("shift") and $c = 25$ ("steepness") as the output function. Compare the result with exercise 2a. Which output function (Fermi or Gaussian) is more suitable for a neural network in which a high output is meant to represent great similarity between the weight vector of a neuron and the input?

c) Try to approximate the curve from exercise 2b as closely as possible with a ramp function ("ramp") as the output function. What values must the two thresholds $T_1$ and $T_2$ take?

3. Set both weights to 0.5 and the output function to the step function ("step"). Compare the output for the activation functions "euclidian distance", "manhattan distance", "max distance" and "min distance". Vary the threshold $T$ within the interval [0.2, 0.4]. Which geometric shapes describe the regions of the input space in which the neuron outputs zero? Why do these regions have their respective shapes?