How do you make a neural network?


Understanding Neural Networks - How? Just try!

Kai Spriestersbach

Kai Spriestersbach is the owner of SEARCH ONE, an online magazine about building successful websites, and works as a freelancer for eology GmbH, where he handles employee training and development and gives talks at online marketing conferences and company events. He also advises companies on technology and marketing issues and works as an affiliate publisher. In total, Kai Spriestersbach has more than 16 years of experience in online marketing and web development.


Since Google announced in October 2015 that its RankBrain algorithm uses so-called artificial neural networks to improve search results, artificial intelligence and machine learning have moved into the focus of many online marketers. But truly understanding the underlying concepts is not easy. As always, you understand things best when you can try them out in practice, and that is now possible quite easily, and without much prior knowledge, on a playground built on Google's open-source framework TensorFlow. There you can experiment with such a network directly in your browser, observe its working principles in detail and thus develop your own, better understanding. Neural networks for everyone, so to speak, to try out. And best of all: it is completely free to use. Kai Spriestersbach shows you how best to approach it step by step.

Google's CEO Sundar Pichai announced at the end of 2015, during the discussion of the quarterly figures, with the concise sentence "Machine learning is a core, transformative way by which we're rethinking how we're doing everything" that the search engine giant sees great potential in the use of machine learning. Machine learning is suitable for improving existing products as well as for developing completely new ones; many features and approaches only become possible on the basis of these technologies. It is therefore not surprising that the company from Mountain View also has one of the most advanced frameworks for researching and developing application scenarios in the field of machine learning. This framework, called TensorFlow, was released a month later as open-source software under the Apache 2.0 license and has been available to everyone ever since.

Info: The story of TensorFlow

The TensorFlow framework is already the second generation of Google's development tools for machine learning. Its immediate predecessor, DistBelief, was created in 2011 by Jeff Dean and Greg Corrado as large-scale deep-learning software on Google's cloud-computing infrastructure, developed within an internal project called "Google Brain". After Google won the AI researcher Geoffrey Hinton for this project, his approach of so-called "backpropagation" provided a breakthrough in improving the results: the error rate of the existing speech recognition software, for example, could be reduced by 25% in one fell swoop. Dean and Hinton rewrote the DistBelief framework into a more flexible, faster, more robust and, above all, production-ready software library, which was subsequently published as TensorFlow. The framework makes it possible to use various approaches to machine learning and is not restricted to artificial neural networks. In addition, it was designed so that the development and communication processes between researchers and developers take place directly within the application, which allows newly gained knowledge to flow into practical applications immediately. Ultimately, applications can even be improved at runtime using new training data, and several versions of a learning model can be tested in parallel.

Google's AI playground

Google has a keen interest in getting more people involved with artificial intelligence, for several reasons: on the one hand, users' acceptance of these technologies should be increased in the long term; on the other hand, urgently needed developers must be introduced to this booming branch of technology.

For ordinary users, Google set up an AI playground with experiments at aiexperiments.withgoogle.com in November 2016, where one can playfully explore the possibilities of artificial intelligence. Unfortunately, the underlying mechanics of these admittedly very interesting and entertaining gadgets remain hidden from the user.

For interested developers, students and researchers, a dedicated "Neural Network Playground" was set up in July at playground.tensorflow.org. In this JavaScript web app you can play around with an artificial neural network (ANN) and thereby gain an understanding of how it works.

"Put simply, a neural network is a function that learns the expected output using training data for certain input values."

Example 1: Classification of data points from two separate Gaussian distributions

Figure 1 shows a simple classification problem. In the data set at hand, each data point has two values: x1 (horizontal axis) and x2 (vertical axis). It contains two types of data points, orange and blue.

For this relatively uncomplicated example, one could write a simple algorithm that separates the two groups with a diagonal line and uses this boundary to determine which group a given data point belongs to. As shown in Figure 2, the boundary can be determined with a simple line function using the formula

x1 + x2 > b

For more flexible use, this formula is extended by the weights w1 and w2, so that

w1 · x1 + w2 · x2 > b

results. With the values for w1 and w2, the angle of the line can thus be set freely, and with the value for b its position can be shifted. This formula is now suitable for any data set that can be separated into two groups by a straight line. The programmer now "only" has to supply suitable values for w1, w2 and b to tell the computer how to classify the data points. A neural network can calculate these values itself by using the training data to learn where the boundaries lie.
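To make this concrete, here is a minimal sketch in Python/NumPy, not the Playground's actual code, of how a single neuron can learn w1, w2 and b from training data. The data set, learning rate and iteration count are illustrative assumptions:

```python
# Minimal perceptron sketch: learn w1, w2 and b from labeled 2-D points.
# Illustrative only - data, learning rate and iterations are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian clusters: label 1 ("blue") around (2, 2), label 0 ("orange") around (-2, -2)
X = np.vstack([rng.normal(2, 1, (100, 2)), rng.normal(-2, 1, (100, 2))])
y = np.array([1] * 100 + [0] * 100)

w = np.zeros(2)  # weights w1, w2
b = 0.0          # bias b, moved to the left-hand side of w1*x1 + w2*x2 > b
lr = 0.1         # learning rate

for _ in range(100):
    for xi, yi in zip(X, y):
        pred = 1 if w @ xi + b > 0 else 0  # which side of the line is xi on?
        # Perceptron rule: only shift the line when a point is misclassified
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

print("learned weights:", w, "bias:", b)
```

The perceptron rule used here only nudges the line when a point is misclassified; it is one of the simplest ways such weights can be learned, not the gradient-descent procedure the Playground actually uses.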

At simply.st/tensor2, after a click on the play button, you can watch how the neural network changes the weights on the two input neurons for x1 and x2 until the line sits at the correct angle and the test data is classified correctly. At the beginning the angle of the line changes very quickly, i.e. the network is still learning fast, but in the end it needs more than 10,000 iterations to distinguish blue from orange without errors (Figure 3).

For this example, using an ANN would certainly not be particularly efficient in practice. But with non-linear problems, i.e. when the boundary between the groups cannot be described by a simple line, the self-learning approach plays to its full strengths. The exciting part: most real-world problems are non-linear, and most real-world data sets have far more than two dimensions. This makes the manual creation of algorithms, formulas and weightings extremely time-consuming or even impossible.

Example 2: Classification of data points arranged in a circle

Not all problems are linear and therefore simple. In the following example in Figure 4, the approach of finding a meaningful boundary by rotating and shifting the line fails. To solve this problem, the ANN has to be extended by a hidden layer of neurons, which can, in effect, add further lines.

If you add three neurons in a hidden layer to the network, the ANN succeeds within a few hundred iterations in finding a kind of triangle, i.e. the combination of three lines, to separate the inner circle of blue points from the outer circle of orange ones (Figure 5).
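If you want to reproduce this architecture outside the browser, a hedged sketch in Keras could look as follows. The synthetic circle data is an assumption, and the Playground itself runs its own JavaScript implementation:

```python
# Sketch of the Example 2 architecture: inputs x1 and x2, one hidden
# layer with three Tanh neurons, one output neuron for blue vs. orange.
import numpy as np
import tensorflow as tf

# Assumed circle data: points inside radius 1.5 are "blue" (label 1)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (500, 2)).astype("float32")
y = (np.hypot(X[:, 0], X[:, 1]) < 1.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="tanh"),    # hidden layer: three neurons
    tf.keras.layers.Dense(1, activation="sigmoid")  # output: probability of "blue"
])
model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=200, verbose=0)
```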

Example 3: Simple regression problem

But neural networks cannot only perform classifications. In the Playground web app, regression is also available under Problem type. In contrast to classification, where each data point is either blue or orange, the data points in a regression can also take on any value in between. The simplest data set (Figure 6) can again be solved with the two input neurons for x1 and x2, without a hidden layer.
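Since no hidden layer is involved, this simplest regression case amounts to fitting y ≈ w1·x1 + w2·x2 + b. As an illustration with made-up data, the weights can even be computed in closed form by least squares rather than by iterative learning:

```python
# Least-squares sketch for the no-hidden-layer regression case.
# The data and its true coefficients are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (200, 2))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)  # noisy linear target

A = np.column_stack([X, np.ones(len(X))])      # columns: x1, x2, 1 (for the bias)
w1, w2, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"w1={w1:.2f}, w2={w2:.2f}, b={b:.2f}")  # recovers roughly 0.8, -0.5, 0.0
```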

Example 4: Complex regression problem

The second data set available for regression, however, can only be solved with two hidden layers of at least three and two neurons on top of the linear input neurons for x1 and x2. This results, in effect, in six open triangles that describe the data set well, but not without errors (Figure 7).
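A sketch of this architecture in Keras might look as follows; the Tanh activation is an assumption based on the Playground's default setting:

```python
# Sketch of the Example 4 architecture: two hidden layers with three
# and two neurons on top of the linear inputs x1 and x2.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="tanh"),  # first hidden layer
    tf.keras.layers.Dense(2, activation="tanh"),  # second hidden layer
    tf.keras.layers.Dense(1)                      # linear output for regression
])
model.compile(optimizer="sgd", loss="mse")
```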

More data sets to experiment with

In addition to the two data sets for regression, a total of four different data sets are available for the classification problem. The Gaussian data set was already solved in the first example (Figure 3) and the circular one in the second (Figure 5). For the "exclusive or" (XOR) data set, the linear input functions x1 and x2 are also still sufficient: with the configuration in Figure 8, the problem is solved error-free using three hidden layers with six, five and two neurons.
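For reference, the Figure 8 configuration could be sketched like this. Only the layer sizes come from the article; the data generation and training settings are illustrative assumptions:

```python
# Sketch of the XOR configuration: hidden layers with six, five and two neurons.
import numpy as np
import tensorflow as tf

# Assumed XOR-style data: quadrants where x1 and x2 share a sign are "blue"
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, (400, 2)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(6, activation="tanh"),
    tf.keras.layers.Dense(5, activation="tanh"),
    tf.keras.layers.Dense(2, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=300, verbose=0)
```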

To tackle the problem of data points arranged in a spiral, which can no longer be solved with the linear input functions alone, you should experiment with the possibilities of the network yourself. In the simplest case, you switch on all input functions and equip the network with the maximum of six hidden layers with eight neurons per layer.

Thanks to the self-regulating properties of an artificial neural network, unnecessary input signals are gradually ignored by way of a low weighting, and after a few hundred iterations through the training data, unnecessary neurons can likewise be identified by their low weighting (Figure 9). The input function of multiplication (x1 * x2), for example, receives a relatively low weight and can even be deactivated without significantly affecting the quality of the results.

Further setting options for the network

For further experiments with the network, you can define what percentage of the data set is used for training and what percentage for testing the ANN (Ratio of training to test data), how much noise there is in the data set (Noise), and after how many data records (between 1 and 30) the ANN may change its weights (Batch size). The properties of the neurons themselves can also be tuned: the learning rate (Learning rate) affects the speed at which the weights change and can be varied between 0.00001 and 10, and the activation function of the neurons can be selected as well.
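To relate these knobs to ordinary training code, here is a hedged sketch of how the same settings would appear in Keras; all concrete values are illustrative assumptions:

```python
# How the Playground's settings map onto ordinary training code (sketch).
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, (500, 2)).astype("float32")
y = (np.hypot(X[:, 0], X[:, 1]) < 1.5).astype("float32")
y = np.where(rng.uniform(size=y.shape) < 0.1, 1 - y, y)  # "Noise": flip 10% of labels

split = int(0.7 * len(X))  # "Ratio of training to test data": 70/30
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="tanh"),  # "Activation"
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.03),  # "Learning rate"
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train,
          batch_size=10,  # "Batch size": update the weights every 10 records
          epochs=100,
          validation_data=(X_test, y_test), verbose=0)
```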

Small digression: activation functions in detail

As activation functions (Activation) for the neurons, besides the default hyperbolic tangent (Tanh), Sigmoid, ReLU and Linear are available. Varying the activation functions leads to very different results on the data sets and provides a first sense of the complexity of the subject.

Sigmoid and hyperbolic tangent

The hyperbolic tangent is basically also a sigmoid function, since it likewise describes an S-curve; unlike Tanh, however, the Sigmoid delivers no negative values (Figure 10). Sigmoid activation functions are used in practice in most models that simulate cognitive processes, since biological neurons in the brain have similar characteristics. In contrast to the linear activation function, the activity level of the sigmoid functions is bounded both above and below. This not only suggests greater biological plausibility (cf. the limited intensity of the action potential of biological neurons), but also has the advantage that the activity in the network cannot unintentionally "spill over" and produce nothing but error values. Because of the horizontal asymptotes, however, the gradients often vanish when sigmoid functions are used, and learning then comes to a standstill in the affected part of the network.
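For reference, the two S-curves can be written out as follows; note that only Tanh reaches negative values:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}} \in (0,\,1)
\qquad\qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \in (-1,\,1)
```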

Linear function and ReLU

The linear activation function is ideal for solving the linear regression problem from the regression example in Figure 6. The sigmoid functions can only approximate this linear transition, but never reproduce it exactly. For non-linear problems, however, the linear activation function is completely unsuitable, which is why it is not used in practice.

The ReLU function (rectified linear unit) is the common activation function in so-called convolutional neural networks (CNNs), which are used, for example, to recognize handwriting and objects in images. This function describes a simple straight line that maps any negative input to 0 (Figure 11). The constant slope of the straight line leads to faster and more consistent learning, and the fact that the ReLU function delivers no negative values helps stabilize the weights during the learning phase.
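Written out in NumPy, the four activation functions offered by the Playground look like this (a sketch for illustration; the Playground itself implements them in JavaScript):

```python
# The Playground's four activation functions, written out in NumPy.
import numpy as np

def linear(x):
    return x                             # unbounded straight line

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # S-curve with values in (0, 1)

def tanh(x):
    return np.tanh(x)                    # S-curve with values in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)            # straight line, negatives clipped to 0

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [0.  0.  0.  1.5]
```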

Possibilities of regulation

Last but not least, the regularization (Regularization) and its regularization rate (Regularization rate) can be set. Regularization can be understood as a form of smoothing of the boundaries that have been found; it prevents the problem of so-called overfitting. Overfitting occurs when an ANN is trained on a finite amount of training data and the resulting boundaries describe this data exactly, but are drawn too narrowly and no longer generalize to other data.
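Formally, regularization adds a penalty for large weights to the training loss. For L2 regularization, one of the options offered in the Playground, this can be written as follows (a standard formulation, not taken from the article):

```latex
L_{\text{total}} = L_{\text{data}} + \lambda \sum_{i} w_i^{2}
```

The regularization rate corresponds to the factor λ: the larger it is, the more strongly large weights are penalized and the smoother the resulting boundary becomes.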

Conclusion: The TensorFlow Playground's easy-to-understand visualization is particularly well done. Both the input functions and the resulting combinations within the neurons are shown graphically, so even less mathematically versed laypeople can form a picture of the resulting decision boundaries. In addition, as learning progresses, the weights within the network are mapped onto the thickness of the connecting lines between neurons, giving an impression of which neurons matter more and which less.

Unfortunately, the transition from this playground to real-world use is difficult in practice. TensorFlow is not as approachable as PHP, where you can simply start out and get meaningful results right away. In particular, choosing the right architecture for the ANN, i.e. how many input neurons, hidden layers and neurons per layer to create, requires just as much in-depth knowledge of the topic as analyzing the existing data and normalizing it for use in the ANN.

As Google's Greg Corrado said in an interview with Bloomberg, machine learning is unfortunately not a magical syrup that you simply pour over a problem to solve it. Rather, it takes a lot of good thinking and care to build something that really pays off. Google's playgrounds are certainly suitable for a first introduction to the topic, but anyone who is serious about developing something with artificial intelligence or machine learning cannot avoid solid training in this area.
