Source code: Airflow/ssh_connectivity.py in the AccentFuture-dev/Airflow repository on GitHub.

Workshop Agenda

1. Introduction

2. Key Technologies Overview

3. Architecture

4. Key Concepts

5. Hands-On Session: Real-World Use Case

6. Best Practices and Optimization Tips

7. Q&A and Wrap-Up

Introduction

Apache Airflow is a powerful orchestration tool for authoring, scheduling, and monitoring workflows. A common use case is executing commands on remote servers via SSH, which makes it easy to automate script execution, file transfers, and system management.

In this blog, we demonstrate how to set up SSH connectivity in Airflow using the SSHOperator, which allows shell commands to be executed securely on remote servers.

Key Technologies Overview

This setup relies on the following core components:

  • Apache Airflow – A workflow automation platform for managing data pipelines.
  • SSH (Secure Shell) – A secure protocol for accessing remote machines.
  • SSHOperator – An Airflow operator that executes commands on remote servers over SSH.
  • Airflow Connections – A secure mechanism for storing credentials and configuration for external systems.

Architecture

The SSH-based task execution workflow in Airflow follows this structure:

  1. Apache Airflow DAG: Defines the workflow sequence and dependencies.
  2. Airflow Scheduler: Manages task execution timing.
  3. SSHOperator: Establishes an SSH connection and executes commands remotely.
  4. Airflow Connection Manager: Stores SSH credentials securely.
  5. Remote Server: The system where commands are executed.
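To make steps 3–5 concrete: the SSHOperator internally builds an SSHHook from the stored connection and runs the command through a paramiko SSH client. The following is a minimal sketch of that flow, not the operator's actual source, and it assumes an ssh_test connection has already been created:

from airflow.providers.ssh.hooks.ssh import SSHHook

# Roughly what the SSHOperator does behind the scenes: resolve the stored
# connection into an SSH client, execute the command, and read its output.
hook = SSHHook(ssh_conn_id='ssh_test')
client = hook.get_conn()  # returns a paramiko.SSHClient
try:
    stdin, stdout, stderr = client.exec_command('echo "hello from Airflow"')
    print(stdout.read().decode())
finally:
    client.close()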

Key Concepts

1. DAG (Directed Acyclic Graph): A DAG in Airflow defines the execution sequence of tasks. Here, we create a DAG to execute a command remotely via SSH.

2. SSHOperator: An operator from the apache-airflow-providers-ssh package that enables remote command execution over SSH.

3. SSH Connection ID: A stored Airflow connection that securely holds SSH credentials and host details.

4. Retries & Scheduling: Configures retry attempts in case of failure and defines the scheduling interval.

Implementation: Real-World Example

Step 1: Define Default Arguments

The following Python code defines default settings for the DAG, including retry policies and email notifications.

from airflow.models import DAG
from datetime import timedelta
from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow.utils.dates import days_ago

# Default settings applied to every task in the DAG.
default_args = {
    'owner': 'venkat',
    'start_date': days_ago(0),            # start from the current date
    'email': ['[email protected]'],    # notification recipient
    'retries': 1,                          # retry once on failure
    'retry_delay': timedelta(minutes=5),   # wait 5 minutes between retries
}

Step 2: Create the DAG

The DAG below includes an SSH task that connects to a remote server and executes a test command.

with DAG('ssh_connectivity', default_args=default_args, schedule_interval=None) as dag:
    # Connect to the host defined by the 'ssh_test' connection and run a
    # simple command to verify connectivity.
    SSHOperator(
        task_id='test_ssh_remotely',
        ssh_conn_id='ssh_test',
        command='echo "Testing SSH connectivity"',
    )
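If you are on Airflow 2.5 or newer, one convenient way to smoke-test this DAG locally is dag.test(), which executes it in a single process without a scheduler. This is a sketch and assumes the ssh_test connection already exists in your local environment:

# Optional local check (Airflow 2.5+): run the whole DAG in-process.
if __name__ == "__main__":
    dag.test()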

Step 3: Explanation of the Code

  • DAG Definition: The DAG is named ssh_connectivity and has no predefined schedule, so it runs only when triggered manually.
  • SSHOperator Task:
      • task_id='test_ssh_remotely': Identifies the task within the DAG.
      • ssh_conn_id='ssh_test': Specifies the SSH connection stored in Airflow.
      • command='echo "Testing SSH connectivity"': Runs a basic test command remotely.
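The same pattern extends naturally to multiple remote commands. Below is a minimal sketch of chaining a second SSH task after the connectivity check; the task name check_remote_host and its diagnostic commands are illustrative placeholders, not part of the original example:

with DAG('ssh_connectivity', default_args=default_args, schedule_interval=None) as dag:
    test_ssh = SSHOperator(
        task_id='test_ssh_remotely',
        ssh_conn_id='ssh_test',
        command='echo "Testing SSH connectivity"',
    )
    # Hypothetical follow-up task on the same host.
    check_host = SSHOperator(
        task_id='check_remote_host',
        ssh_conn_id='ssh_test',
        command='uptime && df -h',  # placeholder diagnostics
    )
    # '>>' declares the dependency: diagnostics run only after the echo succeeds.
    test_ssh >> check_host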

Best Practices for Using SSH in Airflow

  1. Use SSH Keys Instead of Passwords: Enhances security by avoiding plaintext passwords.
  2. Securely Store Credentials: Utilize Airflow’s Connection Manager or a secrets backend (see the sketch after this list).
  3. Enable Retries: Configure automatic retries to handle transient network failures.
  4. Monitor Task Logs: Check Airflow logs to debug SSH execution failures.
  5. Optimize Network Configuration: Ensure firewall rules allow traffic between Airflow workers and the remote host.
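One way to apply the first two practices together: Airflow can read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID>, so the key-based ssh_test connection never has to live in DAG code. The host, user, and key path below are placeholders, not values from this post:

import os

# Airflow resolves connections from environment variables of the form
# AIRFLOW_CONN_<CONN_ID>. This defines 'ssh_test' as an SSH connection
# that authenticates with a private key instead of a password.
# (Placeholders: airflow_user, remote-host, and the key path.)
os.environ["AIRFLOW_CONN_SSH_TEST"] = (
    "ssh://airflow_user@remote-host:22?key_file=%2Fhome%2Fairflow%2F.ssh%2Fid_rsa"
)

In practice you would export this variable in the scheduler and worker environments (or use a secrets backend) rather than setting it from Python.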

Q&A and Wrap-Up

This guide demonstrated how to set up SSH connectivity in Apache Airflow using the SSHOperator. By implementing this approach, users can automate remote server management, script execution, and system administration tasks. If you have any questions, feel free to reach out.

Conclusion

This blog covered the process of configuring SSH connectivity in Apache Airflow using the SSHOperator. This approach simplifies system management, reduces manual effort, and ensures efficient workflow automation. By following best practices such as using SSH keys and securing credentials, organizations can optimize remote command execution in Airflow.

Should you encounter any issues, our team is readily available to provide guidance and prompt support. Please do not hesitate to reach out to us at any time at [email protected].

Airflow Training:

AccentFuture delivers comprehensive Apache Airflow training online, helping students master the automation of complex workflows with Airflow. Our Airflow Online Training covers both fundamental and advanced techniques so you can manage workflow operations efficiently. Through hands-on workshops led by experienced instructors, AccentFuture offers flexible learning options and equips you with the skills needed to succeed in real-world environments.