Xcom In Airflow -
@task def extract() -> dict: return "user_id": 123, "name": "Alice" # pushed automatically
✅ or ensure upstream dependencies with >> . ❌ Using XComs for many small values across many tasks Each XCom is a DB row. 10 000 tasks × 5 XComs = 50 000 rows – fine. But 100 000 tasks × 10 XComs = 1 million rows – slow. Advanced: XCom Backends Airflow 2.0+ lets you store XComs outside the metadata DB. Useful if you need slightly larger values or lower DB load. xcom in airflow
Now go build DAGs that actually share information – cleanly and reliably. @task def extract() -> dict: return "user_id": 123,
Here, each mapped task gets its own XCom value, and aggregate receives a list of all results. ❌ Passing large data # BAD – will bloat metadata DB @task def bad_task(): return large_dataframe.to_dict() # can be MB/GB ✅ Better: Store data in S3/GCS and pass the path as an XCom. ❌ Pulling from a task that hasn’t run @task def step_one(): return 1 @task def step_two(x): # If step_one failed or was skipped, this will raise an error return x + 1 But 100 000 tasks × 10 XComs = 1 million rows – slow
push = PythonOperator(task_id='push_task', python_callable=push_function) pull = PythonOperator(task_id='pull_task', python_callable=pull_function)
XCom (short for cross‑communication ) is Airflow’s built‑in mechanism for exchanging small pieces of data between tasks. When used wisely, they unlock powerful patterns. When abused, they break your DAGs. Let’s see how to use them correctly. XComs are key‑value pairs stored in Airflow’s metadata database. A task can push an XCom (write a value under a key), and another task can pull that value (read it).
