There are so many orchestrators for data teams to choose from these days: Airflow, Dagster, Prefect, and Mage AI, just to name a few.

This weekend, I decided to build a mini project to test out two of them: Dagster and Prefect.

The project is similar to my earlier one: extract sports news from different news outlets and send it automatically to Discord via a webhook.
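
To make the Discord side concrete, here is a minimal sketch of posting headlines through a webhook using only the standard library. The function names, the message layout, and the User-Agent string are my own assumptions for illustration, not the project's actual helpers; the webhook URL is whatever Discord generates for your channel.

```python
import json
import urllib.request
from typing import List

# Discord rejects message content longer than 2000 characters.
DISCORD_CONTENT_LIMIT = 2000


def build_payload(headlines: List[str], title: str = "Sports news") -> dict:
    """Turn a list of headlines into the JSON body a Discord webhook expects."""
    lines = [f"**{title}**"] + [f"- {headline}" for headline in headlines]
    content = "\n".join(lines)
    # Truncate rather than fail if the message is too long.
    return {"content": content[:DISCORD_CONTENT_LIMIT]}


def send_to_discord(webhook_url: str, payload: dict) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: an explicit User-Agent avoids the default urllib one being blocked.
            "User-Agent": "news-bot/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status  # Discord normally answers 204 No Content
```

Calling `send_to_discord(webhook_url, build_payload(["Headline A", "Headline B"]))` would post one message with a bold title and one bullet per headline.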

Let’s jump right in.

Prerequisites

  • Python 3.8+

Dagster

pip install dagster

To spin up the project, install it in editable mode and start the local Dagster UI:

pip install -e ".[dev]"

dagster dev

I have refactored the folder structure a bit, using Poetry instead of the original config.

Dagster Folder Structure

.
├── README.md
├── app
├── dagster_discord_news
│   ├── __init__.py
│   ├── common.py
│   ├── helpers
│   ├── jobs.py
│   └── main.py
├── news_data
│   ├── bbcf1.txt
│   ├── bbcfootball.txt
│   └── officialf1.txt
├── poetry.lock
├── pyproject.toml
├── setup.cfg
├── setup.py
├── tests
│   └── test_assets.py
└── workspace.yaml

Dagster UI

[Screenshots: Dagster UI]

Photo updated in January 2024.

Prefect

pip install prefect

Prefect provides a free tier of Prefect Cloud, which anyone can register for.

Then all we need to do is log in from the command line and start an agent for the job:

prefect cloud login

prefect agent start --pool "default-agent-pool"

Prefect Folder Structure

.
├── app
│   ├── common.py
│   ├── errors.py
│   ├── helpers
│   │   ├── bbcnews.py
│   │   ├── discordapi.py
│   │   └── f1news.py
│   ├── main.py
│   └── subflows.py
├── publish_officialf1_news-deployment.yaml
└── publish_sports_news_to_dc-deployment.yaml

Prefect UI

[Screenshots: Prefect UI]

And here is the news from BBC Sports!

[Screenshot: BBC Sport news posted to Discord]

Conclusion

I found Prefect a bit easier to set up for my project this time. With the free-tier Prefect Cloud account, everything just comes together easily. On the other hand, I like the asset concept in Dagster, and I think its integration with dbt is better.

Both Prefect and Dagster have the majority of their community on Slack, which is different from Airflow, where users typically ask debugging questions on forums like Stack Overflow. Since Slack conversations rarely show up in search results, googling an error is a bit trickier.

However, the documentation for both is clear, so for a mini project like this one I could get it over the line pretty easily. For an enterprise-level project where we only want to use the community version and deploy on our own infrastructure, it may be less straightforward than with Airflow.

That’s it for now.

Thank you for reading and have a nice day!

If you want to support my work,

Buy me a coffee

Honestly, if you made it this far you already made my day :)