Amazon Managed Workflows for Apache Airflow (MWAA) Q&A

Q: What is Amazon MWAA?

Amazon Managed Workflows for Apache Airflow (MWAA) is a managed service for Apache Airflow that lets you orchestrate workflows: sequences of tasks that combine, enrich, and transform data to produce business insights. The service takes over the administration, configuration, and scaling of the Airflow environment, so you can focus on orchestrating data processing workflows and managing their execution with AWS logging and monitoring features. You can run your existing Airflow workflows on Amazon MWAA and interact with your environments through the AWS console, or programmatically through the API and command line interface (CLI).

Q: When should I use managed workflows?

Use Amazon MWAA when you want your data engineering and data science teams to spend more time building workflows and less time managing infrastructure and the Airflow environment, while getting consistent performance from a managed service. These teams use Airflow, a leading open source orchestration environment, to create and run workflows that define ETL (extract, transform, load) jobs and machine learning data pipelines. Airflow lets you programmatically author, schedule, and monitor workflows written in Python, a language widely used for data work. Its task plugin model and open architecture allow you to build custom workflows, including ones that reach on-premises data sources. However, a team that wants to take advantage of Airflow's programmatic interface must first set up and maintain the servers and monitoring it runs on. Many customers assign data engineers to manage the worker fleet, install dependencies, scale the system up and down, and restart the scheduler. With Amazon MWAA, these steps are no longer necessary, because you get a managed Airflow environment that is highly available, monitored, and automatically scaled.

Q: What does Amazon MWAA manage for me?

Amazon MWAA manages the work associated with setting up Airflow, from provisioning the infrastructure capacity (server instances and storage) and installing the software to providing simplified user management and authorization through AWS Identity and Access Management (IAM) and single sign-on (SSO). Once your Airflow environment is up and running, Amazon MWAA scales your workers to match the volume of running workflows and automates common management tasks such as patching the operating system and updating the Airflow software.

Q: How does this service work with other AWS services?

Amazon MWAA is a workflow environment that enables data engineers and data scientists to build workflows that span AWS, on-premises, and other cloud services. For example, an Amazon MWAA workflow can use Athena queries to pull input from sources such as S3, perform transformations on EMR clusters, and use the resulting data to train machine learning (ML) models on SageMaker. Workflows in Amazon MWAA are written in Python as directed acyclic graphs (DAGs). A key advantage of Airflow is its open extensibility through plugins, which lets you create task plugins for any AWS or on-premises resources your workflows need, including Athena, Batch, CloudWatch, DynamoDB, DataSync, EMR, ECS/Fargate, EKS, Firehose, Glue, Lambda, Redshift, SQS, SNS, SageMaker, and S3.
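
To make the "directed acyclic graph" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the Airflow API; the task names are hypothetical and only mirror the Athena, EMR, and SageMaker example above. It shows how declaring dependencies between tasks determines a valid run order, and why a cycle is rejected.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on.
PIPELINE = {
    "athena_query": set(),                  # pull input data
    "emr_transform": {"athena_query"},      # needs the query results
    "sagemaker_train": {"emr_transform"},   # needs the transformed data
}


def execution_order(dependencies):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is why a workflow graph has to be acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE)
print(order)  # ['athena_query', 'emr_transform', 'sagemaker_train']
```

In a real Airflow DAG file the same relationships would be declared between operator tasks, and the scheduler would work out the order for you.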

Q: How are new Airflow versions, patches and upgrades handled?

Amazon MWAA applies minor version upgrades and patches automatically by default, and lets you specify a maintenance window in which they are performed. The maintenance window lets you control when software patches are applied, whether you requested them or they are required. If a maintenance event is scheduled for a given week, it starts and completes within the maintenance window you specified. Maintenance windows last 2 hours.
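
As an illustration of the 2-hour window, the following sketch checks whether a given moment falls inside a weekly maintenance window. It assumes the window start is written as `DAY:HH:MM` in UTC (for example `TUE:03:30`), which is how the service API expresses it as far as we know; the function name is hypothetical and not part of any AWS SDK.

```python
from datetime import datetime, timedelta

DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
WINDOW_LENGTH = timedelta(hours=2)  # duration stated in the answer above


def in_maintenance_window(window_start: str, moment: datetime) -> bool:
    """Return True if `moment` (assumed UTC) is inside the weekly window.

    `window_start` is assumed to use the DAY:HH:MM form, e.g. "TUE:03:30".
    """
    day, hour, minute = window_start.split(":")
    # Midnight on the Monday of the week that contains `moment`.
    week_start = (moment - timedelta(days=moment.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = week_start + timedelta(
        days=DAYS.index(day), hours=int(hour), minutes=int(minute)
    )
    # Also check last week's window, which can spill from Sunday into Monday.
    candidates = (start, start - timedelta(weeks=1))
    return any(s <= moment < s + WINDOW_LENGTH for s in candidates)
```

A check like this could be used, for instance, to avoid scheduling a long-running workflow so that it overlaps the window.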

Q: How can I access my Amazon MWAA environments?

Amazon MWAA provides access to your Airflow environments through the AWS Management Console, AWS CLI, and SDK. The Airflow user interface can be configured for direct internet access, VPC access, or both. Airflow command line instructions are available through an API call and the AWS CLI.
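
The sketch below shows one way the "Airflow command through an API call" flow can look. It rests on assumptions not stated in this Q&A: that a short-lived CLI token and the web server hostname are obtained first (for example with the boto3 MWAA client's `create_cli_token`), that the command is POSTed as plain text to the `/aws_mwaa/cli` path on that host, and that the reply is JSON with base64-encoded `stdout` and `stderr`. The helper names are hypothetical, and the code only builds the request and decodes a reply; it does not contact AWS.

```python
import base64
import json
import urllib.request


def build_cli_request(web_server_hostname: str, cli_token: str,
                      command: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that carries an Airflow CLI command."""
    return urllib.request.Request(
        url=f"https://{web_server_hostname}/aws_mwaa/cli",  # assumed endpoint path
        data=command.encode("utf-8"),                       # e.g. "dags list"
        headers={
            "Authorization": f"Bearer {cli_token}",
            "Content-Type": "text/plain",
        },
        method="POST",
    )


def decode_cli_response(body: bytes) -> tuple[str, str]:
    """Decode the assumed JSON reply into (stdout, stderr) text."""
    payload = json.loads(body)
    stdout = base64.b64decode(payload.get("stdout", "")).decode("utf-8")
    stderr = base64.b64decode(payload.get("stderr", "")).decode("utf-8")
    return stdout, stderr
```

To actually run a command you would pass the request to `urllib.request.urlopen` and feed the response body to `decode_cli_response`.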

Q: What Airflow plugins does the service support?

Amazon MWAA supports the more than 100 Airflow community plugins developed to date, as well as any custom plugins you create, which you add by placing them in an S3 bucket.
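
As a rough sketch of preparing a custom plugin for upload, the code below packages plugin source files into a zip archive in memory. It assumes the service expects custom plugins bundled as a zip file in the bucket; the plugin name, file name, and helper function are hypothetical, and the upload itself (for example with an S3 client) is not shown.

```python
import io
import zipfile


def build_plugins_zip(files: dict[str, str]) -> bytes:
    """Package {archive path: source text} into a zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, source in files.items():
            archive.writestr(name, source)
    return buffer.getvalue()


# Hypothetical minimal plugin module. It is stored as text here, so Airflow
# does not need to be installed to build the archive.
PLUGIN_SOURCE = '''\
from airflow.plugins_manager import AirflowPlugin


class ExamplePlugin(AirflowPlugin):
    name = "example_plugin"
'''

archive_bytes = build_plugins_zip({"example_plugin.py": PLUGIN_SOURCE})
```

The resulting bytes can then be written to a file or uploaded to the bucket that your environment is configured to read plugins from.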

Q: How can I monitor my Amazon MWAA service and workflow execution?

You can open any Airflow environment directly from the Amazon MWAA management console or through the Airflow user interface. Airflow metrics are published to Amazon CloudWatch Metrics, and logs are published to CloudWatch Logs.

Q: When should I use Amazon MWAA versus AWS Step Functions?

You should use Amazon MWAA if you value open source and portability. Airflow has a large and active open source community that regularly contributes new features and integrations. Because Amazon MWAA supports existing Airflow workflows and integrations with no code changes, migration is easy and the environment is familiar.

You should use Step Functions when your priority is cost and performance. For example, if you are processing streaming data and transforming it in several steps before writing it to DynamoDB or S3, Step Functions is the better choice because it offers higher performance at a lower cost.