How does biological evolution inspire machine learning?

Evolutionary Algorithms

Transcript

Evolutionary Algorithms
Seminar paper by Patrick Flick
Faculty of Computer Science, Institute for Theoretical Computer Science
Supervising professor: Juniorprof. Dr. Henning Meyerhenke
Supervising staff: Dr. Roland Glantz
KIT, University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association

Declaration
I affirm that I have written this work without outside help and without using any sources other than those specified, and that the work has not been submitted in the same or a similar form to any other examining authority and accepted by it as part of an examination. All statements adopted literally or in substance from other sources are marked as such. Karlsruhe, August 24, 2012 (Patrick Flick)


Table of Contents
1 Introduction
2 Basic methods
3 Classical Algorithms
4 Gene Expression Programming
5 Sample application - symbolic regression
6 Conclusion
Literature

1 Introduction

Metaheuristics are problem-independent methods for solving optimization problems by iteratively improving candidate solutions with respect to a given quality measure. In contrast to S-metaheuristics, which iteratively modify a single solution, population-based metaheuristics maintain a population of several solutions and modify them iteratively in order to find a good solution. Evolutionary algorithms (EA) are population-based metaheuristics. An EA operates on a population P of candidate solutions e_i ∈ P for the optimization problem or heuristic to be solved. In the context of population-based metaheuristics, a solution is also referred to as an individual. Evolutionary algorithms are applied to combinatorial optimization problems, machine learning, engineering, and much more.

1.1 Biological background

Evolutionary algorithms are motivated by biological evolution. The driving forces of biological evolution are variation, selection, and genetic drift.

Variation: Before any selection can take place, there must first be variation in the organism that is subject to evolution. Only heritable variations in the genetic material (DNA) play a role in biological evolution. Variation is created by mutations in the DNA and by recombination during reproduction. Both processes are undirected and have no goal; selection decides whether a variation in the gene pool is retained over the long term.

Selection: Within a population of different individuals, certain characteristics, or a complex interplay of characteristics, improve the chances of survival and of finding a mate. This creates selective pressure on these characteristics. Suppose, in a simplified model, that a characteristic can be represented by a single numerical value, that this characteristic is approximately uniformly distributed in an initial population, and that individuals with a smaller value are disadvantaged by selection. Gene combinations coding for a more pronounced trait then become more common among the offspring, and the mean value of the characteristic shifts permanently towards larger values in subsequent generations.

Genetic drift: Genetic drift describes random influences in evolution that cause a population to drift in a random direction. This development has nothing to do with selection or fitness. The effects of genetic drift are particularly pronounced for small population sizes and are decisive for biological evolution. In human evolution, for example, the influence of genetic drift is estimated to be significantly greater than the influence of selection [AC].

1.2 Process of an evolutionary algorithm

An evolutionary algorithm essentially goes through the following steps (see Figure 1):

Initialization: First, a population consisting of random individuals is initialized.

Evaluation: The fitness f(e_i) is evaluated for all e_i ∈ P. The fitness function describes the value to be optimized (the evaluation function) of the optimization problem or heuristic.

Figure 1: Rough sequence of an evolutionary algorithm.

Selection: Based on the fitness values f(e_i), individuals are chosen at random to serve as parents for the next generation. Individuals with a high fitness f(e_i) are preferred.

Reproduction: The offspring are created from the selected parents. This can be done by simply copying or by recombining two or more parents.

Recombination and mutation: The offspring are recombined with one another (a binary operation) and individual offspring are mutated (a unary operation). This step creates the evolutionary variation.

Replacement: The offspring replace part of the current population or the entire population.

These steps (with the exception of the initialization) are repeated until a termination criterion is met. Such a criterion can be, for example, a maximum number of iterations or a target fitness value.

1.3 Overview

In the following, the basic methods of evolutionary algorithms are first presented and biologically motivated (Section 2). Building on these methods, two classical evolutionary algorithms are presented in Section 3: genetic algorithms (GA) and genetic programming (GP).
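The loop described above can be summarized in a few lines of code. The following Python sketch is purely illustrative and not part of the original paper; random_individual, fitness, select_parents, recombine, and mutate are hypothetical placeholders that a concrete evolutionary algorithm has to supply.

import random

def evolutionary_algorithm(random_individual, fitness, select_parents,
                           recombine, mutate, pop_size=100, generations=200):
    # Initialization: a population of random individuals
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):                              # termination: max iterations
        scores = [fitness(e) for e in population]             # evaluation
        parents = select_parents(population, scores, pop_size)  # selection
        offspring = []
        for i in range(0, pop_size, 2):                       # reproduction / recombination
            a, b = parents[i], parents[(i + 1) % pop_size]
            offspring.extend(recombine(a, b))                 # recombine returns two children
        offspring = [mutate(e) for e in offspring]            # mutation
        population = offspring[:pop_size]                     # generational replacement
    return max(population, key=fitness)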

The biological relationship between genotype and phenotype is then discussed in order to motivate Gene Expression Programming (GEP), whose ideas and methods are presented in Section 4. Finally, a widespread application of GP and GEP, symbolic regression, is presented as an example (Section 5).

2 Basic methods

2.1 Representation

The representation specifies how a solution, i.e. an individual of the population, is encoded. Several representations are covered in this paper; the algorithmic differences lie mainly in selection, mutation, and recombination. First, however, a brief look at the biological representation, the DNA.

DNA: DNA stands for deoxyribonucleic acid and is the molecule that carries the genetic information in all living cells. DNA has the shape of a double helix of two strands, in the middle of which the coding bases A, T, C, and G are arranged and paired with one another. DNA stores its information redundantly: while one of the two strands codes directly, the other is an inverse, complementary copy, in which A pairs with T and C with G. The two strands bind tightly to one another and thus form the double helix (see Figure 2) [WC53].

Figure 2: Left: DNA double helix (source: Wikipedia). Right: DNA base pairs.

The path from DNA to the solution, the phenotype, is a long one. This very complex relationship is discussed in more detail in Section 4.1.

Linear representation: Analogous to the linear sequence over {A, T, C, G} in DNA, a solution can be represented as a string of fixed length over a symbol set V. This is called the linear representation. It has the form a_0 a_1 a_2 ... a_{l-1} with a_i ∈ V, where l is the constant length of the representation [Hol75].

The binary representation is the most common form of the linear representation [Tal09]. In this case, V = {0, 1}; with l = 16, a binary representation is thus simply a string of 16 bits.

Tree representation: Another common representation is the tree. Programs or formulas, for example, can be represented as trees. In genetic programming, individuals are whole, executable (computer) programs represented as syntax trees [KP05]. The syntax tree of a simple formula can be seen in Figure 3.

Figure 3: Syntax tree for (a + b) · (c − d).

2.2 Selection

During selection, individuals are chosen from the current population to act as parents for the next generation. Stronger individuals are favored; in evolutionary algorithms this means that individuals with a higher fitness value f(e_i) have a greater chance of being selected.

Biology: In biology, selection is subject to very complex circumstances, and fitness can in general not be traced back to a single characteristic. A small example illustrates what such a fitness function could look like in biology: a population consists of beetles that differ only in their color, and in the initial population all colors between brown and green are equally represented. Assume that this beetle population lives in a green meadow. Brown beetles then have a selective disadvantage, since they are easily spotted by birds, and the fitness function can essentially be derived from the color (see Figure 4). If a tree with a brown trunk is added to this example world, brown beetles sitting on the trunk have a selective advantage over green ones. In this very simplified example, the fitness function no longer depends only on the color, but also on the location and the environment.

Figure 4: Selection in beetles (source: The University of California Museum of Paleontology, Berkeley).
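To make the tree representation concrete, the following short sketch (illustrative only, not part of the original paper) encodes the syntax tree from Figure 3 as nested Python tuples and evaluates it recursively.

import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

# Inner nodes are (operator, left, right); leaves are variable names.
tree = ('*', ('+', 'a', 'b'), ('-', 'c', 'd'))   # (a + b) * (c - d)

def evaluate(node, env):
    if isinstance(node, tuple):                   # inner node: apply operator to children
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env[node]                              # leaf: look up the variable value

print(evaluate(tree, {'a': 1, 'b': 2, 'c': 5, 'd': 3}))   # (1+2)*(5-3) = 6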

Roulette selection: According to [Tal09], roulette selection is the most widely used selection method in evolutionary algorithms. The idea is that an individual with better fitness should have a greater probability of being selected. Roulette selection was already used (though not under this name) by J. H. Holland in [Hol75] as a selection strategy. Roulette selection can be pictured as spinning a roulette wheel: all individuals are placed on the wheel, and individuals with greater fitness occupy a larger share of it. The selection is made by repeatedly spinning the wheel (see Figure 5). Formally, let f(e_i) be the fitness of individual e_i. The probability of selecting e_i is then

p_i = f(e_i) / Σ_j f(e_j),

i.e. the selection probability is proportional to the fitness.

Figure 5: Roulette selection.

In roulette selection, a single individual with p_i > 0 can be selected again and again, so that the entire next population may consist only of copies of this individual. This problem is called unlimited spread [Bak87].

Stochastic universal sampling: Stochastic universal sampling (SUS) was designed by Baker to keep the spread to a minimum and thus to mitigate this problem of roulette selection [Bak87]. For this purpose, all individuals for the next generation are selected at the same time: if a total of k individuals are to be selected, k equally spaced pins simultaneously determine k positions on the roulette wheel and thus the k selected individuals (see Figure 6).

Figure 6: Stochastic universal sampling.

Tournament selection: In tournament selection, t individuals are drawn at random from the population. The fitness values of these t individuals are compared with one another, and the individual with the best fitness is selected. If k individuals are to be selected, k tournaments are carried out.

Rank-based selection: Instead of choosing the selection probability proportional to the fitness value f(e_i), rank-based selection considers only the rank of the individuals with respect to their fitness. This is particularly advantageous when a few individuals in the population have a very high fitness value: in roulette selection and stochastic universal sampling these individuals are strongly favored, and the population quickly loses its diversity. Exactly this problem no longer occurs with rank-based selection.
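The two fitness-proportional schemes can be sketched in a few lines. This is an illustrative Python sketch, not code from the paper, and it assumes strictly positive fitness values.

import random

def roulette_selection(population, fitness_values, k):
    # Spin the wheel k times; selection probability is proportional to fitness.
    total = sum(fitness_values)
    chosen = []
    for _ in range(k):
        r = random.uniform(0, total)
        acc = 0.0
        for individual, f in zip(population, fitness_values):
            acc += f
            if acc >= r:
                chosen.append(individual)
                break
    return chosen

def stochastic_universal_sampling(population, fitness_values, k):
    # One spin, k equally spaced pins: limits the spread of single individuals.
    total = sum(fitness_values)
    step = total / k
    start = random.uniform(0, step)
    pins = [start + i * step for i in range(k)]
    chosen = []
    acc, idx = fitness_values[0], 0
    for pin in pins:
        while acc < pin:
            idx += 1
            acc += fitness_values[idx]
        chosen.append(population[idx])
    return chosen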

2.3 Reproduction - Mutation

Mutation is a unary operation that transforms an individual (a solution) e_i into a mutated individual e_i'. There are algorithm- and problem-specific mutation probabilities, which are usually chosen so that about one or two mutations occur per individual [Tal09].

Biology: DNA mutations are changes in the DNA. The simplest form is the point mutation, in which one base (A, T, C, G) is replaced by another base. Mutations can be caused by radioactive and other ionizing radiation, mutagenic chemicals, viruses, and errors during DNA replication.

Criteria: In evolutionary algorithms there are some criteria that a mutation operator should satisfy. First, a mutation should only generate solutions that are still in the search space and represent legal solutions. Second, a mutation should not cause too large a change; rather, a solution should mutate into a solution e_i' that lies in the local neighborhood of e_i in the search space. If mutations cause large jumps in the search space, the evolutionary algorithm quickly degrades into a random search [Tal09].

Mutation in the linear representation: Mutation in the linear representation corresponds to the point mutation in genetics. At random positions i in the representation a_0, a_1, ..., a_i, ..., a_{l-1}, the symbol a_i ∈ V is replaced by a random new symbol a_i' ∈ V [Hol75]. In the binary representation, the most commonly used mutation operator is the bit flip [Tal09]: analogous to the biological point mutation, one bit is exchanged for the other, i.e. a 0 is replaced by a 1 and a 1 by a 0. The bit-flip operator can become problematic if the solution contains one (or more) binary-coded numerical values. A bit flip near the most significant bit (MSB) creates a mutated solution that differs greatly from the original one; in particular, the mutated solution is no longer in the local neighborhood of the original solution. A bit-flip operator must therefore be adapted carefully to the underlying interpretation of the solution so that it only generates local solutions that are still legal and inside the search space, for example by restricting flips to the less significant bits.
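A minimal sketch of the point mutation for the binary representation (illustrative only, not from the paper): each bit is flipped independently with a small probability, here chosen so that roughly one flip occurs per individual on average.

import random

def bit_flip_mutation(bits, per_bit_probability=None):
    # bits: list of 0/1 symbols (linear binary representation).
    # Default probability ~ 1/l, so about one bit flips per individual on average.
    l = len(bits)
    p = per_bit_probability if per_bit_probability is not None else 1.0 / l
    return [(1 - b) if random.random() < p else b for b in bits]

individual = [0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1]   # l = 16
print(bit_flip_mutation(individual))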

Mutation in the tree representation: As a mutation operator, Koza [KP05] suggests choosing a random mutation point in the syntax tree and replacing the subtree rooted there with a randomly generated subtree (see Figure 7).

Figure 7: Mutation in trees according to Koza (source: [KP05]).

Talbi describes further mutation operators for trees [Tal09]:

Grow: A random leaf of the tree is replaced by a randomly generated subtree.
Shrink: A randomly selected inner node is replaced by a random terminal leaf.
Switch: The child nodes of a randomly selected inner node are exchanged.
Cycle: Two nodes whose functions take the same number of arguments are selected at random, and the functions of the two nodes are exchanged.

2.4 Reproduction - Recombination

Recombination is a binary operation that combines two individuals with one another and thus creates new individuals, each of which carries characteristics of both parents.

Biology: A (diploid) cell has 2n chromosomes; the chromosomes come in nearly identical pairs, one from each parent. During the so-called meiosis, the parental chromosomes are distributed and cells with only n chromosomes, the so-called gametes (sperm or egg cells), are formed [Alb08]. Considering only this division of the parental chromosomes, there are 2^n different possible combinations for the formation of a gamete. During reproduction, a sperm cell fuses with an egg cell, creating a fertilized cell (the child) which again has 2n chromosomes. For two given parents there are thus 2^(2n) possible combinations. In humans, for example, n = 23, so there are at least 2^46 possible combinations.

However, there is another form of recombination. Before the parental chromosomes are split up, they are recombined with one another during the so-called crossing-over (see Figure 8): whole parts of the chromosomes are exchanged with one another. Crossing-over happens up to several times per chromosome and increases the number of possible combinations by several orders of magnitude.

Figure 8: Crossing-over (source: University of Waikato).

Recombination in the linear representation, n-point crossover: The n-point crossover is a generalization of the 1-point crossover presented in [Hol75] and is a recombination operator for the linear representation, motivated by the biological crossing-over [Tal09]. Given two individuals, their representations are recombined with one another, creating two new individuals: the linear vectors are divided into n + 1 pieces by n cuts and then reassembled alternately (see Figure 9).

Figure 9: Example of a 2-point crossover with binary representation.

Uniform crossover: In the uniform crossover, each individual symbol is taken from one of the two parents, chosen randomly and uniformly. Thus both parents contribute an approximately equal share to each child.
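The two crossover operators for the linear representation can be sketched as follows (illustrative Python, not code from the paper).

import random

def n_point_crossover(parent_a, parent_b, n):
    # Cut both parents at the same n random positions and reassemble the
    # n + 1 pieces alternately, producing two children.
    cuts = sorted(random.sample(range(1, len(parent_a)), n))
    points = [0] + cuts + [len(parent_a)]
    child_a, child_b = [], []
    for i in range(len(points) - 1):
        piece_a = parent_a[points[i]:points[i + 1]]
        piece_b = parent_b[points[i]:points[i + 1]]
        if i % 2 == 0:
            child_a += piece_a
            child_b += piece_b
        else:
            child_a += piece_b
            child_b += piece_a
    return child_a, child_b

def uniform_crossover(parent_a, parent_b):
    # Each symbol is inherited from a uniformly chosen parent.
    return [a if random.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

a = [0] * 16
b = [1] * 16
print(n_point_crossover(a, b, 2))
print(uniform_crossover(a, b))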

Recombination in the tree representation: Koza defines a crossover variant for trees as the recombination operator in the tree representation [KP05]. For this purpose, a random subtree is chosen in each of the two parents, and these two subtrees are then exchanged (see Figure 10; a short code sketch of this operator follows at the end of this section).

Figure 10: Recombination via crossover in trees according to Koza.

2.5 Replacement strategies

Replacement strategies in evolutionary algorithms determine how the current population is replaced by the offspring. In the following, two alternative strategies are presented [Tal09].

Generational replacement: In generational replacement, the offspring generation replaces the entire population. Since the population size is usually fixed in evolutionary algorithms, a fixed number of offspring must be generated. Biologically, a complete replacement of the whole generation does not at first appear realistic; for many plant species, however, such a complete generation change is not unusual. The so-called annual plants bloom only once in summer, and only the seeds survive the winter. Wheat, for example, is such a plant [SSZ98].

Steady-state replacement: In steady-state replacement, only one individual is exchanged per generation, for example the weakest individual. In biology, this strategy can best be compared with a small group of mammals, in which deaths and births are rare events.

Further strategies: Generational replacement and steady-state replacement are extreme approaches. There are other approaches [Tal09] that strike a middle ground, replacing some number λ of individuals with 1 < λ < |P| per generation. In the case of elitism, the λ weakest individuals are always replaced; this can lead to too rapid a convergence. In order to keep the search space large, a stochastic approach to replacement can be used here instead.
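As announced above, the following sketch (illustrative only, not code from [KP05]) shows Koza-style subtree crossover on the nested-tuple trees used in Section 2.1: a random subtree is picked in each parent and the two subtrees are swapped.

import random

def subtrees(tree, path=()):
    # Enumerate all (path, subtree) pairs; a path is a sequence of child indices.
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new_subtree):
    # Return a copy of tree in which the subtree at the given path is replaced.
    if not path:
        return new_subtree
    node = list(tree)
    node[path[0]] = replace(node[path[0]], path[1:], new_subtree)
    return tuple(node)

def subtree_crossover(parent_a, parent_b):
    # Pick a random crossover point in each parent and swap the subtrees.
    path_a, sub_a = random.choice(list(subtrees(parent_a)))
    path_b, sub_b = random.choice(list(subtrees(parent_b)))
    return replace(parent_a, path_a, sub_b), replace(parent_b, path_b, sub_a)

p1 = ('*', ('+', 'a', 'b'), ('-', 'c', 'd'))
p2 = ('+', 'a', ('*', 'b', 'c'))
print(subtree_crossover(p1, p2))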

3 Classical Algorithms

3.1 Genetic Algorithms

Genetic algorithms (GA) are a very popular group of evolutionary algorithms [Tal09] for solving discrete optimization problems. In genetic algorithms, solutions are represented as binary strings (see Section 2.1). Genetic algorithms were originally presented and popularized by J. H. Holland [Hol75]. A fitness-proportional method such as roulette selection or stochastic universal sampling (Section 2.2) is generally used as the selection algorithm; roulette selection was introduced in [Hol75] without this name. There are many different mutation and recombination operators in the various flavors of genetic algorithms. The operators already presented in Sections 2.3 and 2.4, namely n-point crossover, uniform crossover, and point mutation, are the most frequently used. The 1-point crossover operator, the point mutation, and an inversion operator not covered in this paper were originally presented in [Hol75].

3.2 Genetic programming

Genetic programming (GP) comes from the field of machine learning. In genetic programming, individuals do not consist of strings of fixed length, but of executable programs that in turn solve a task. The programs are represented as syntax trees and are themselves subject to evolution: the evolutionary algorithm modifies the programs directly in their tree representation, using the mutation and recombination operators for trees explained in Sections 2.3 and 2.4 [KP05]. To determine the fitness of an individual, the programs are executed and assessed in a problem-specific way. A disadvantage of genetic programming is that very large populations are required to maintain the necessary genetic diversity. The syntax trees can also become very large, which makes the evaluation of the fitness function computationally intensive and the entire process slow [Tal09]. Section 5 shows by example how genetic programming can be applied to the problem of symbolic regression.
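Genetic programming also needs a way to create the initial population of random programs. The following is an illustrative sketch (not from the paper) of one simple initialization scheme, often called the "grow" method, for expression trees in the nested-tuple representation of Section 2.1; the function and terminal sets are assumptions chosen for the example.

import random

FUNCTIONS = ['+', '-', '*']        # assumed binary function set
TERMINALS = ['x', '1.0', '2.0']    # assumed terminal set (variable and constants)

def grow_tree(max_depth, p_terminal=0.3):
    # Below max_depth, terminals are chosen with some probability, so that
    # trees of varying shape and size are produced.
    if max_depth == 0 or random.random() < p_terminal:
        return random.choice(TERMINALS)
    return (random.choice(FUNCTIONS),
            grow_tree(max_depth - 1, p_terminal),
            grow_tree(max_depth - 1, p_terminal))

initial_population = [grow_tree(max_depth=4) for _ in range(10)]
for program in initial_population[:3]:
    print(program)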

3.3 Further evolutionary algorithms

Two further evolutionary algorithms that are not dealt with in more detail in this paper are evolution strategies and evolutionary programming. Evolution strategies are used for continuous optimization problems, with mutation and recombination operators specially developed for continuous values. Evolutionary programming was developed to find rules for finite state machines [Tal09].

4 Gene Expression Programming

4.1 Biological Motivation - Genotype and Phenotype

In biology a distinction is made between the genotype and the phenotype. The genotype of an organism is its genetic makeup, i.e. the genetic information, the DNA, of the organism. The phenotype, on the other hand, describes the expressed traits of the organism, from the proteins of its cells to its external appearance and its psychological properties. The difference is already clearly visible at the protein level. The DNA codes for the amino acid sequences of the proteins; these sequences fold into complex structures and thus form functional units, the proteins of the cell. The proteins carry out the functions of the cell: from motor proteins that transport other substances through the cell, to proteins involved in metabolism, to proteins that repair defective DNA. There is thus a clear difference between the genotype (the base sequence in the DNA) and the phenotype (the proteins and their function). During evolution it is not the protein that is mutated and recombined, but the DNA. A mutation in the coding DNA can have no effect at all, or it can have a very drastic effect that permanently changes the protein structure and thus alters the protein or renders it completely inoperable.

In the evolutionary algorithms discussed so far, the representation is chosen so that it directly represents the solution. In genetic programming, the solution consists of syntax trees that represent programs; the recombination and mutation operators operate directly on the syntax trees and thus directly on the solution. There is therefore no indirect expression as in biology. Gene Expression Programming (GEP) [Fer01], presented below, is, like genetic programming, an evolutionary algorithm that works with syntax trees or entire programs. However, Gene Expression Programming encodes the syntax trees in linear chromosomes and thus achieves an indirect representation similar to that in biology.

4.2 Overview

Gene Expression Programming was presented by Cândida Ferreira in [Fer01].

Just as in genetic programming, the individuals in gene expression programming represent abstract syntax trees. For example, programs are sought that solve a specific problem; the evolutionary algorithm works with whole populations of programs, represented as syntax trees, in order to find a good solution. Another application is symbolic regression (see Section 5), in which mathematical formulas are represented as trees and the EA searches for a formula that best reproduces given measurement data. With the same field of application as genetic programming, gene expression programming achieves a performance that is about two orders of magnitude better than genetic programming [Fer01]. This approach is explained in more detail below; the presentation follows [Fer01].

4.3 Representation

In gene expression programming, the syntax trees are encoded in linear chromosomes of fixed length. A special tree coding is used for this, in which the nodes are listed top-down and level by level: first the root, then all nodes of level 1 from left to right, then all nodes of level 2, and so on. In [Fer01] this coding is called a K-expression. Consider the syntax tree for the expression (a + b) · (c − d) shown in Figure 11.

Figure 11: Syntax tree for (a + b) · (c − d).

The associated linear K-expression coding, the chromosome, is:

* + - a b c d

This mechanism enables a separation of genotype (the linear chromosome) and phenotype (the syntax tree). The length of the chromosome is fixed, but a syntax tree can take on different shapes and sizes. To make this possible, non-coding regions are allowed in the linear chromosome. The chromosome is divided into two parts: the front part, the head, may contain all possible symbols, terminals as well as non-terminals, while the rear part, the tail, contains only terminal symbols. This restriction ensures that every possible chromosome decodes into a legal syntax tree, provided the tail is long enough.

For a given head size h and a maximum number of function arguments (arity of the non-terminals) n, the tail size t must satisfy

t = h · (n − 1) + 1     (1)

so that only legal syntax trees can be encoded. Consider a gene with h = 10, n = 2 and therefore t = 11 (in the printed gene, the tail is set off typographically from the head):

/ b * aabaabaabbaaab

The chromosome encodes the syntax tree shown in Figure 12. Only the first eleven characters of the chromosome code for the syntax tree; the remaining characters are non-coding.

Figure 12: Syntax tree encoded by the coding region of the gene.

After a single point mutation at the 10th position, from a b to a +, the end of the coding region slips two characters further to the right, and a different syntax tree results (see Figure 13).

Figure 13: Syntax tree encoded by the gene after the point mutation.
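To make the K-expression coding concrete, here is a small illustrative sketch (not from [Fer01]) that decodes a gene into a nested-tuple syntax tree by filling the tree level by level; all non-terminals are assumed to be binary.

from collections import deque

FUNCTIONS = {'+', '-', '*', '/'}   # assumed binary non-terminals (arity 2)

def decode_k_expression(gene):
    # Read symbols left to right and attach them breadth-first: each function
    # node consumes the next two unread symbols as its children.
    symbols = deque(gene)

    def arity(s):
        return 2 if s in FUNCTIONS else 0

    root_node = [symbols.popleft()]          # mutable node: [symbol, child, child, ...]
    queue = deque([root_node])
    while queue:
        node = queue.popleft()
        for _ in range(arity(node[0])):
            child = [symbols.popleft()]
            node.append(child)
            queue.append(child)
    # Any symbols still unread form the non-coding part of the gene.

    def freeze(node):                        # convert to the nested-tuple form used earlier
        return node[0] if len(node) == 1 else tuple([node[0]] + [freeze(c) for c in node[1:]])
    return freeze(root_node)

# The K-expression from Figure 11; trailing tail symbols would simply remain unread.
print(decode_k_expression("*+-abcd"))        # ('*', ('+', 'a', 'b'), ('-', 'c', 'd'))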

Just as a syntax tree can grow through a single mutation, it can also shrink: if, for example, the root mutates from a non-terminal into a terminal symbol, only the first symbol of the gene is coding and the remaining part becomes non-coding.

Chromosomes with multiple genes: In gene expression programming, several genes can be accommodated in one chromosome. Each gene codes for one subtree, and the subtrees are combined with a fixed linking operation: for arithmetic expressions (as in the previous example) the addition (+), for Boolean expressions the Boolean OR. A gene in a multigene chromosome can thus solve a subproblem of the overall problem. This structure of several genes makes a hierarchical composition of basic building blocks into a complex system possible; the building blocks evolve separately and are exchanged between individuals during gene recombination. Ferreira argues [Fer01] that complex problems can be solved more efficiently this way. Such a chromosome looks, for example, as follows:

b * b a b b a b * b + a b b b a - * a b b a b a

4.4 The algorithm

Like any other evolutionary algorithm, the gene expression programming algorithm starts with an initial population of fixed size λ. First, the individuals are evaluated, i.e. the programs are executed, and the fitness of each individual is determined. Using roulette selection, λ offspring are determined. During reproduction there is initially no recombination; the selected individuals are simply copied. Recombination and mutation operations are then applied to the entire set of offspring. For each of these operations there is a fixed probability with which it occurs in an individual; with a mutation probability of 0.7, for example, around 0.7 λ individuals of the offspring population are mutated. To close the loop, the population is then replaced by the offspring population, which is evaluated in turn.

4.5 Mutation operators

Several different mutation operators are used in gene expression programming: the point mutation, which corresponds to the mutation for the linear representation explained in Section 2.3, and various transpositions.

Point mutation: In a point mutation, a symbol at a random position in the chromosome is replaced by a random different symbol; in the tail, only terminal symbols may serve as replacements. Ferreira chooses the mutation probability so that around two point mutations occur per chromosome. If a symbol in the non-coding part of the chromosome is mutated, there is no effect on the syntax tree; if a symbol in the coding part is mutated, the effect on the syntax tree can be very drastic.

Transposition: Transposition is biologically motivated: during transposition, parts of a gene or entire genes change their position in the chromosome [Wat87]. In gene expression programming this mutation is used in three different variants:

1. IS elements: A short fragment is transposed into the head (excluding the root) of the gene. All symbols before the insertion point are retained, all symbols after the insertion point are shifted to the right by the number of transposed elements, and the tail remains completely intact.

2. RIS elements: As with the IS element, a short fragment is transposed, but with the root insertion sequence (RIS) the fragment is transposed to the root position.

3. Gene transposition: In a multigene chromosome, an entire gene is transposed to the beginning of the chromosome, so that the order of the genes in the chromosome changes. With addition as the linking operation of the subtrees, this initially has no effect on the phenotype, but in combination with recombination it can lead to new individuals.
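The following is a minimal illustrative sketch of the IS-element transposition, based on my reading of the description above rather than on Ferreira's code: a short fragment of the chromosome is copied to a random position in the head (never the root), the downstream head symbols shift right, the head is truncated back to its fixed length h, and the tail is left untouched. The toy gene and head length are assumptions for the example.

import random

def is_transposition(chromosome, head_length, fragment_length=3):
    # chromosome: list of symbols; head = chromosome[:head_length], tail = rest.
    head, tail = chromosome[:head_length], chromosome[head_length:]
    # Pick a random fragment anywhere in the chromosome ...
    start = random.randrange(0, len(chromosome) - fragment_length + 1)
    fragment = chromosome[start:start + fragment_length]
    # ... and a random insertion point in the head, excluding the root.
    insert_at = random.randrange(1, head_length)
    new_head = head[:insert_at] + fragment + head[insert_at:]
    new_head = new_head[:head_length]          # the head keeps its fixed length
    return new_head + tail                     # the tail remains completely intact

gene = list("*+-abcdabbabbab")                 # toy gene with assumed head length 7
print("".join(is_transposition(gene, head_length=7)))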

4.6 Recombination

Three different recombination operators are used: the 1-point crossover, the 2-point crossover (see Section 2.4), and gene recombination. These operators select two random individuals from the population and recombine them into two new individuals, which replace the two selected individuals in the population.

1-point and 2-point crossover: For a detailed description of the n-point crossover operator see Section 2.4. In the 1-point crossover the chromosomes of the parents are cut into two partial sequences, in the 2-point crossover into three. The partial sequences are attached to one another alternately, so that two new, recombined chromosomes result.

Gene recombination: With multigene chromosomes, whole genes are exchanged, similar to the n-point crossover. From the chromosomes abcd and ABCD, for example, a recombined individual such as AbCd can arise, where the letters a, b, c, d stand for whole genes and upper and lower case indicate which parent a gene comes from.

4.7 Performance

Ferreira defines a performance measure F_z that counts how often the fitness function has to be evaluated:

F_z = G · P · C · R_z     (2)

where G is the number of generations, P the population size, C the number of fitness evaluations per individual per generation (the number of test cases), and R_z the number of runs needed so that a correct solution is found by generation G with probability z = 0.99. Using this performance measure, Ferreira shows on various sample applications that gene expression programming is consistently at least two orders of magnitude better than genetic programming [Fer01]. One of these sample applications, symbolic regression, is explained in more detail in the following section; the other applications used for the comparison are sequence induction, block stacking, Boolean concept learning, and cellular automata rules.

5 Sample application - symbolic regression

5.1 Symbolic regression

Symbolic regression refers to the task of finding a mathematical formula (e.g. 3x² + e^sin(x)) that matches given measurement data; no assumption is made about the distribution of the data. Genetic programming (GP) was developed by Koza [KP05] to solve symbolic regression with an evolutionary algorithm: a population of trees representing formulas is mutated and recombined in order to obtain a symbolic model of the measurement data.
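To illustrate how such a search is driven, the following sketch (illustrative only, not from the paper) measures the fitness of a candidate expression tree as the summed squared error against sample points of the target function g(x) = x³ + 2x² + 3x used in the example below; the nested-tuple tree form from Section 2.1 is reused.

import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}   # assumed function set

def evaluate(node, x):
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if node == 'x' else float(node)       # leaves: variable or constant

def fitness(candidate, samples):
    # Smaller error gives higher fitness; samples are (x, g(x)) measurement pairs.
    error = sum((evaluate(candidate, x) - y) ** 2 for x, y in samples)
    return 1.0 / (1.0 + error)

g = lambda x: x**3 + 2 * x**2 + 3 * x              # target of the example in Section 5.2
samples = [(x / 2.0, g(x / 2.0)) for x in range(-10, 11)]

# A perfect candidate: x * ((x * (x + 2)) + 3) == x^3 + 2x^2 + 3x
perfect = ('*', 'x', ('+', ('*', 'x', ('+', 'x', '2')), '3'))
print(fitness(perfect, samples))                   # -> 1.0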

5.2 Example

With the open-source program GPdotNET, symbolic regression can be solved with genetic programming. As a first example, measurement values were generated from the simple function g(x) = x³ + 2x² + 3x. A perfect solution was found after fewer than 20 generations; Figure 14 shows two different solutions to this problem.

Figure 14: Simple example with GPdotNET. Two different solutions to the same problem.

With a more complex data model, the program could not find a perfectly fitting function even after 500 generations. The measurement values and the best function found after 500 generations can be seen in Figure 15. The GP algorithm offers a lot of problem-specific tuning potential; this was not exploited here, so the results were obtained with the default settings.

Figure 15: Example with GPdotNET on difficult data. The graphic shows the measurement data in blue and the model found by GP in orange.

6 Conclusion

Evolutionary algorithms are used to solve many optimization problems, especially those for which no (good) specialized methods are available. EAs provide a general scheme that can be adapted to a very large number of problems. The associated parameter tuning is usually done by trial and error, and adapting an EA to a specific problem often requires a lot of work and experience. An EA does not guarantee that an optimal solution is found in finite time, and its runtime suffers from large population sizes. However, the population-based approach makes an EA easy to parallelize.

Literature

[AC] Rebecca Rogers Ackermann and James M. Cheverud: Detecting genetic drift versus selection in human evolution.

[Alb08] Bruce Alberts: Molecular Biology of the Cell. Garland Science, Taylor & Francis, New York, 5th edition, 2008.

[Bak87] James E. Baker: Reducing bias and inefficiency in the selection algorithm. In: Proceedings of the Second International Conference on Genetic Algorithms and their Application, pages 14-21, Hillsdale, NJ, USA, 1987. L. Erlbaum Associates Inc.

[Fer01] Cândida Ferreira: Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. Complex Systems, 13(2): 87-129, 2001.

[Hol75] John H. Holland: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. University of Michigan Press, Ann Arbor, 1975.

[KP05] John Koza and Riccardo Poli: Genetic Programming. Springer US, 2005.

[SSZ98] E. Strasburger, P. Sitte and H. Ziegler: Textbook of Botany for Universities. G. Fischer, 1998.

[Tal09] El-Ghazali Talbi: Metaheuristics: From Design to Implementation. Wiley Publishing, 2009.

[Wat87] James D. Watson et al.: Molecular Biology of the Gene. Benjamin/Cummings, Menlo Park, CA, 4th edition, 1987.

[WC53] J. D. Watson and F. H. C. Crick: Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature, 171: 737-738, 1953.