Mage is a hybrid framework for transforming and integrating data. It combines the best of both worlds: the flexibility of notebooks with the rigor of modular code.
- Extract and synchronize data from 3rd party sources.
- Transform data with real-time and batch pipelines using Python, SQL, and R.
- Load data into your data warehouse or data lake using our pre-built connectors.
- Run, monitor, and orchestrate thousands of pipelines without losing sleep.
Plus hundreds of enterprise-class features, infrastructure innovations, and magical surprises.
For teams | Fully managed platform for integrating and transforming data. |
Self-hosted | System to build, run, and manage data pipelines. |
Documentation | Get a 5 min overview | Play with live tool | Get instant help
1. We designed an easy developer experience that you'll enjoy.
2. Get instant feedback from your code each time you run it.
3. Easy for a solo developer or large team to scale up and manage thousands of pipelines.
Mage is an open-source data pipeline tool for transforming and integrating data.
The recommended way to install the latest version of Mage is through Docker with the following command:
docker pull mageai/mageai:latest
You can also install Mage using pip or conda, though this may cause dependency issues without the proper environment.
pip install mage-ai
conda install -c conda-forge mage-ai
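Because dependency conflicts tend to appear when pip installs land in a shared interpreter, it can help to check what the current Python environment sees before and after installing. The following is a minimal sketch using only the standard library. The helper name `describe_environment` is hypothetical, and the distribution name `mage-ai` is taken from the pip command above.

```python
import sys
from importlib import metadata


def describe_environment(distribution: str = "mage-ai") -> dict:
    """Report whether this interpreter runs inside a venv and which
    version of the given distribution (if any) is installed in it."""
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    # Note: conda environments are not detected by this comparison.
    in_virtualenv = sys.prefix != sys.base_prefix
    try:
        version = metadata.version(distribution)
    except metadata.PackageNotFoundError:
        version = None  # not installed in this environment
    return {"in_virtualenv": in_virtualenv, "version": version}


if __name__ == "__main__":
    print(describe_environment())
```

A `version` of `None` means the package is not visible to this interpreter, which usually indicates that it was installed into a different environment.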
Looking for help? The fastest way to get started is by checking out our documentation here.
Looking for quick examples? Open a demo project right in your browser or check out our guides.
Build and run a data pipeline with our demo app.
WARNING
The live demo is public to everyone. Please don't save anything sensitive (e.g. passwords or secrets).
- Load data from API, transform it, and export it to PostgreSQL
- Integrate Mage into an existing Airflow project
- Train model on Titanic dataset
- Set up dbt models and orchestrate dbt runs
Features
Orchestration | Schedule and manage data pipelines with observability. |
Notebook | Interactive Python, SQL, & R editor for coding data pipelines. |
Data integrations | Synchronize data from 3rd party sources to your internal destinations. |
Streaming pipelines | Ingest and transform real-time data. |
dbt | Build, run, and manage your dbt models with Mage. |
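A sample data pipeline defined across 3 files:
- Load data
    import polars as pl

    # data_loader is a decorator provided by Mage.
    @data_loader
    def load_csv_from_file() -> pl.DataFrame:
        return pl.read_csv('default_repo/titanic.csv')
- Transform data
    import polars as pl

    # transformer is a decorator provided by Mage.
    @transformer
    def select_columns_from_df(df: pl.DataFrame, *args) -> pl.DataFrame:
        return df[['Age', 'Fare', 'Survived']]
- Export data
    import polars as pl

    # data_exporter is a decorator provided by Mage.
    @data_exporter
    def export_titanic_data_to_disk(df: pl.DataFrame) -> None:
        # polars writes CSV with write_csv (to_csv is the pandas method).
        df.write_csv('default_repo/titanic_transformed.csv')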
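To make the data flow between the three blocks concrete, here is a minimal sketch that chains a loader, a transformer, and an exporter in plain Python. It is not Mage's implementation: the `block` decorator is a hypothetical stand-in that only tags a function, the data is an in-memory CSV string rather than `default_repo/titanic.csv`, and the standard `csv` module replaces polars so the example runs without extra dependencies.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]


def block(func: Callable) -> Callable:
    """Hypothetical stand-in for Mage's block decorators. It only tags the
    function and does not reproduce Mage's behavior."""
    func.is_block = True
    return func


# Assumption: reuse the same stand-in for all three block types.
data_loader = transformer = data_exporter = block

SAMPLE_CSV = "Name,Age,Fare,Survived\nAlice,29,7.25,1\nBob,40,8.05,0\n"


@data_loader
def load_csv() -> List[Row]:
    # Read from an in-memory string instead of a file on disk.
    return list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


@transformer
def select_columns(rows: List[Row]) -> List[Row]:
    keep = ("Age", "Fare", "Survived")
    return [{key: row[key] for key in keep} for row in rows]


@data_exporter
def export_csv(rows: List[Row]) -> str:
    # Return the CSV text rather than writing it to disk.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_pipeline() -> str:
    # Each block's output becomes the next block's input.
    return export_csv(select_columns(load_csv()))


if __name__ == "__main__":
    print(run_pipeline())
```

In Mage itself the wiring between blocks is declared in the pipeline rather than by nesting function calls; the nesting here only illustrates that each block consumes the previous block's output.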
What the data pipeline looks like in the UI:
New? We recommend reading about blocks and learning from a hands-on tutorial.
Core design principles
Every user experience and technical design decision adheres to these principles.
Easy developer experience | Open-source engine that comes with a custom notebook UI for building data pipelines. |
Engineering best practices built-in | Build and deploy data pipelines using modular code. No more writing throwaway code or trying to turn notebooks into scripts. |
Data is a first-class citizen | Designed from the ground up specifically for running data-intensive workflows. |
Scaling is made simple | Analyze and process large data quickly for rapid iteration. |
Core abstractions
These are the fundamental concepts that Mage uses to operate.
Project | Like a repository on GitHub; this is where you write all your code. |
Pipeline | Contains references to all the blocks of code you want to run, charts for visualizing data, and organizes the dependency between each block of code. |
Block | A file with code that can be executed independently or within a pipeline. |
Data product | Every block produces data after it's been executed. These are called data products in Mage. |
Trigger | A set of instructions that determine when or how a pipeline should run. |
Run | Stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc. |
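The relationship between pipelines, blocks, data products, and runs can be sketched with a few dataclasses. This is an illustrative model only, not Mage's internal data structures: every class, field, and method name below is hypothetical, and projects and triggers are left out for brevity. It assumes Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class Block:
    """A unit of code that can run on its own or within a pipeline."""
    name: str
    func: Callable[..., Any]
    upstream: List[str] = field(default_factory=list)


@dataclass
class Run:
    """Records the status of one execution and the data each block produced."""
    status: str = "pending"
    data_products: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Pipeline:
    """Holds blocks and the dependencies between them."""
    blocks: Dict[str, Block] = field(default_factory=dict)

    def add(self, block: Block) -> None:
        self.blocks[block.name] = block

    def execution_order(self) -> List[str]:
        # Map each block to the set of blocks it depends on, then sort so
        # that upstream blocks always come first.
        graph = {b.name: set(b.upstream) for b in self.blocks.values()}
        return list(TopologicalSorter(graph).static_order())

    def run(self) -> Run:
        run = Run(status="running")
        for name in self.execution_order():
            block = self.blocks[name]
            inputs = [run.data_products[up] for up in block.upstream]
            # The value a block returns is stored as its data product.
            run.data_products[name] = block.func(*inputs)
        run.status = "completed"
        return run


pipeline = Pipeline()
pipeline.add(Block("load", lambda: [1, 2, 3]))
pipeline.add(Block("double", lambda xs: [x * 2 for x in xs], upstream=["load"]))
pipeline.add(Block("total", lambda xs: sum(xs), upstream=["double"]))
result = pipeline.run()
```

Here `result.data_products` holds one entry per block, which mirrors the idea that every block produces a data product once it has executed.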
Add features and instantly improve the experience for everyone.
Check out the contributing guide to set up your development environment and start building.
Individually, we're a mage.
Mage
Magic is indistinguishable from advanced technology. A mage is someone who uses magic (aka advanced technology).
Together, we're Magers!
Magers (/ˈmājər/)
A group of mages who help each other realize their full potential!
Let's hang out and chat together.
For real-time news, fun memes, data engineering topics, and more, join us on GitHub or Slack.
Check out our FAQ page to find answers to some of our most asked questions.
See the LICENSE file for licensing information.