How does Apache Airflow compare to Luigi

Netflix provides the Metaflow data science framework as open source

Streaming provider Netflix has released its in-house framework Metaflow as open source. Netflix has been using the data science tool internally for two years, in projects such as natural language processing (NLP) and operations research.

At its core, Metaflow is a simple Python library that lets users define their workflow as a directed acyclic graph (DAG) in Python code. Metaflow thus operates in much the same space as Apache Airflow or Luigi, but according to Netflix it has a few extra tricks up its sleeve.
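To illustrate what it means to describe a workflow as a DAG in Python, here is a minimal sketch that uses only the standard library (`graphlib`, Python 3.9 or later). It does not use Metaflow's own API. The step names and the `run_order` helper are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a step, each value is the set of
# steps that must finish before it can run.
dag = {
    "start": set(),
    "load_data": {"start"},
    "train_model": {"load_data"},
    "evaluate": {"load_data"},
    "end": {"train_model", "evaluate"},
}


def run_order(graph):
    """Return one valid execution order for the steps.

    Raises graphlib.CycleError if the graph contains a cycle,
    i.e. if it is not actually a DAG.
    """
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)
```

Because `train_model` and `evaluate` depend only on `load_data`, a scheduler would be free to run them in parallel. This is the kind of structure that workflow tools in this category exploit.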

Metaflow in detail

Metaflow grew out of the work of the machine learning infrastructure team at Netflix. The development goal was to put data scientists' productivity first. To that end, developers store data and models as ordinary Python instance variables, which then also work on distributed platforms. Metaflow also has a built-in artifact store, which is meant to simplify the otherwise tedious management of artifacts.
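The idea of treating instance variables as artifacts can be sketched in a few lines of plain Python. The following is an assumption-laden toy model, not Metaflow's implementation: `ArtifactStore`, `ToyFlow` and `run` are hypothetical names, and the "store" is just an in-memory dictionary of pickled state.

```python
import pickle


class ArtifactStore:
    """In-memory stand-in for an artifact store (hypothetical), keyed by step name."""

    def __init__(self):
        self._blobs = {}

    def save(self, step_name, state):
        # Serialize the instance variables as they were after this step.
        self._blobs[step_name] = pickle.dumps(state)

    def load(self, step_name):
        return pickle.loads(self._blobs[step_name])


class ToyFlow:
    """Hypothetical two-step workflow; state lives in ordinary attributes."""

    def start(self):
        self.data = [1, 2, 3, 4]

    def train(self):
        # The "model" here is just the mean of the data.
        self.model = sum(self.data) / len(self.data)


def run(flow_cls, step_names, store):
    """Run the steps in order, snapshotting instance variables after each one."""
    state = {}
    for name in step_names:
        flow = flow_cls()               # fresh object, as on a separate worker
        flow.__dict__.update(state)     # restore artifacts from the previous step
        getattr(flow, name)()
        store.save(name, dict(flow.__dict__))
        state = store.load(name)        # round trip through serialization
    return state


store = ArtifactStore()
final_state = run(ToyFlow, ["start", "train"], store)
print(final_state)
```

Since every step starts from a deserialized snapshot, the steps do not need to share a process. That is why this pattern carries over to distributed execution.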

Because Netflix relies on Amazon Web Services (AWS) for its cloud applications, Metaflow comes with AWS integration. Users can have their code and data automatically stored as snapshots in Amazon S3. This is meant to enable versioning and tracking of experiments without any extra effort from developers.

Netflix also promises a high-performance S3 client that can load data at up to 10 GB/s. For general data processing, the framework connects to the AWS Batch service. In addition, integration with the common machine learning frameworks is said to work without problems.

A blog post by the Netflix team provides a detailed overview of how Metaflow works. (bbo)
