What is automated machine learning

Automated machine learning - the holy grail?

Machine learning is on the rise. It is the method of choice for recognizing patterns in data and is used, for example, in fraud detection, image recognition, predictive machine maintenance, and forecasting train delays.

There is a whole toolbox of algorithms behind machine learning, each of which can be tuned with so-called hyperparameters. With the common “k-Nearest Neighbors” algorithm, for example, the number of neighbors considered, “k”, can be set; with a neural network, the hyperparameters include the entire architecture of the network. An important task of a data scientist is to find the right algorithm for the problem at hand and to configure it correctly. In fact, the range of tasks is significantly larger: a data scientist must understand the “business” perspective of a problem, assess the available data, preprocess it appropriately, and arrive at a model that can be evaluated. Typically this is a cyclical process that follows the “Cross-industry standard process for data mining” (CRISP-DM).
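To make the role of a hyperparameter concrete, here is a minimal sketch of k-Nearest Neighbors in plain Python. The function name knn_predict and the toy data are made up for illustration and do not come from any particular library. The point is only that the same algorithm on the same data can give different answers depending on k.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two clusters, plus one point labelled "b"
# that sits in the middle of cluster "a".
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (3.0, 3.0), (3.2, 2.9), (2.9, 3.1),
          (1.1, 1.0)]
labels = ["a", "a", "a", "b", "b", "b", "b"]

query = (1.08, 1.0)
pred_k1 = knn_predict(points, labels, query, k=1)  # follows the single closest point
pred_k5 = knn_predict(points, labels, query, k=5)  # majority of five neighbors
print(pred_k1, pred_k5)
```

With k=1 the prediction follows the single stray point, while k=5 lets the surrounding cluster outvote it. Neither setting is right in general, which is why k has to be chosen per problem.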

Accordingly, projects in the field of machine learning are complex: they are demanding in terms of content and require time from people with different qualifications (business, IT, data scientist). In addition, it is often unclear at the beginning which result will ultimately be achieved - in this sense, the projects are also risky.

Data science projects as a whole still cannot be automated today. However, certain steps of a project can in some cases be automated; this is what the term "Automated Machine Learning" (AutoML) refers to. AutoML can, for example, provide assistance in choosing the right algorithm. A data scientist usually compares the results of many algorithms on the problem and selects one based on various criteria (such as quality, complexity / runtime, robustness). The setting of hyperparameters can also be automated in certain cases: many algorithms expose tuning knobs whose settings can be optimized for the specific problem.
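The two automatable steps just described, comparing algorithms and tuning their hyperparameters, can be sketched as one small search loop. This is a simplified illustration in plain Python, not how Auto-sklearn, DataRobot or RapidMiner work internally. All names (knn_predict, centroid_predict, loo_accuracy, search) and the toy data are assumptions made for this example, and quality is the only selection criterion; runtime and robustness are ignored.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Majority vote among the k training examples closest to `query`."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train, query):
    """Assign `query` to the class whose mean point (centroid) is closest."""
    by_class = {}
    for point, label in train:
        by_class.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in by_class.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

def loo_accuracy(predict, data):
    """Leave-one-out accuracy: predict each example from all the others."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if predict(rest, point) == label:
            hits += 1
    return hits / len(data)

def search(data, candidates):
    """Score every candidate and return (name of the best one, all scores)."""
    scores = {name: loo_accuracy(predict, data)
              for name, predict in candidates.items()}
    best = max(scores, key=scores.get)  # on ties, the first candidate wins
    return best, scores

# Toy data (hypothetical): two clusters, one "b" point inside cluster "a".
data = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.1), "a"),
        ((1.1, 1.0), "b"),
        ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.9, 3.1), "b")]

# Search space: one algorithm with three hyperparameter settings, plus a second algorithm.
candidates = {f"knn(k={k})": (lambda train, q, k=k: knn_predict(train, q, k))
              for k in (1, 3, 5)}
candidates["nearest_centroid"] = centroid_predict

best_name, scores = search(data, candidates)
print(best_name, scores)
```

Real AutoML tools search much larger spaces and use smarter strategies than exhaustive enumeration, but the structure is the same: a set of candidates, an evaluation procedure, and a rule for picking the winner.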

AutoML helps and accelerates data science projects by automating parts or individual steps of the project, thereby increasing productivity. It is of great help, for example, when evaluating algorithms. Many libraries and tools have therefore added AutoML to their feature set. Examples include Auto-sklearn (in the Python ecosystem) and DataRobot, which specializes in AutoML. The following example from RapidMiner shows how you can quickly compare different algorithms on a specific problem with the help of wizards:

However, AutoML should not be understood as a universal solution that completely automates data science projects and makes data scientists superfluous.

As in other specialist areas, automation is particularly useful for tedious, repetitive technical work, where otherwise highly trained specialists would have to systematically try out certain parameter sets and then compare the results - a job that is better left to a machine.

Many challenges remain that still have to be addressed by people. These begin with actually understanding the problem and extend through various, mostly very time-consuming data engineering tasks to deployment. AutoML is a helpful tool, but so far not the holy grail.