Recent Talks

Data Pipeline Patterns from ETL to ML to LLM Applications

Event: Microsoft India Development Center Hyderabad
Date: December 2023
Co-presenter: Vikram Koka, SVP of Engineering at Astronomer

Abstract: As excitement around Large Language Models (LLMs) grows and the first LLM applications are deployed, more teams want to understand how to operationalize them. The data pipelines that LLM applications require can build on prior experience with data pipelines in Apache Airflow. This talk focuses on the applicable data pipeline patterns and uses AskAstro, a real-life open-source application currently deployed on the Apache Airflow Slack, as its running example.

Learning Outcomes:

  • Data practitioners will learn how to extend ETL pipeline knowledge to build ML and LLM pipelines for production
  • Non-practitioners will gain high-level knowledge of LLM applications and deployment challenges
  • Attendees will understand how to leverage existing data teams to build AI applications

📊 Presentation


Love for Writing Deferrable Operators: Why and How to Defer

Event: Airflow Summit 2022
Date: August 2022

Problem Statement: Have you ever had 100 worker slots available to run tasks, but 100 DAGs each waiting on a Sensor that is running but idle? Every slot is held by a Sensor that does no real work, so the entire Airflow cluster sits essentially idle while tasks wait.

Solution: Deferrable Operators solve this problem by handing the wait over to an asynchronous trigger, which frees the worker slot until the awaited condition is met.
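
The idea behind the solution can be illustrated with plain Python asyncio, without any Airflow code. The sketch below is not Airflow's API and is not taken from the talk: `wait_for_condition` and `run_all_waits` are hypothetical names, and `asyncio.sleep` stands in for whatever external condition a Sensor would poll. It only shows why 100 idle waits can share a single event loop instead of holding 100 worker slots.

```python
import asyncio
import time


async def wait_for_condition(name: str, ready_after: float) -> str:
    """Hypothetical stand-in for a trigger: waits without blocking a thread."""
    # asyncio.sleep hands control back to the event loop, so every other
    # pending wait keeps making progress in the meantime.
    await asyncio.sleep(ready_after)
    return f"{name}: ready"


async def run_all_waits(count: int, ready_after: float) -> list[str]:
    """Run many idle waits concurrently on one event loop."""
    waits = [wait_for_condition(f"sensor_{i}", ready_after) for i in range(count)]
    # gather returns the results in the same order as the awaitables passed in.
    return await asyncio.gather(*waits)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_all_waits(count=100, ready_after=0.1))
    elapsed = time.perf_counter() - start
    # The waits overlap, so total time is close to one wait, not 100 of them.
    print(f"{len(results)} waits finished in {elapsed:.2f}s")
```

In Airflow itself, triggers run on an asyncio event loop inside the triggerer process, which is why a single triggerer can hold many deferred tasks while worker slots stay free for real work.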

Topics Covered:

  • Introduction to deferrable operators and triggers
  • Real-world use cases and implementation strategies
  • Writing custom deferrable operators
  • Deferrable S3 Operator example
  • Advantages over Smart Sensors and reschedule mode
  • Implementation guide using the astronomer-providers repository
  • Concurrency concepts and Python asyncio

🎥 Video 1 🎥 Video 2 📊 Presentation