Code
import polars as pl
import duckdb as db
import globAn Overview
Jesus LM
Jan, 2025
This project explores how to leverage the strengths of DuckDB and MotherDuck to build a robust data processing and storage solution. DuckDB excels at fast in-memory analytics, while MotherDuck provides a scalable and cost-effective cloud data warehouse. By combining these technologies, you can achieve optimal performance for both local and cloud-based data operations.
MotherDuck is serverless, easy-to-use, cloud-based platform, based on the DuckDB query engine.
MotherDuck makes it easy to connect to your data, explore it, and ask questions about it.
It is a great option for businesses of all sizes who need to do basic data analysis.
Known for its speed and ease of use, and provides a managed service in the cloud. This means you don’t have to worry about setting up and maintaining your own DuckDB instance.
MotherDuck can function as a data warehouse, allowing you to store and analyze large datasets. It can also be used as a query engine for data lakes, which are central repositories for storing various data formats.
One of MotherDuck’s strengths is collaboration. You can share snapshots of your databases with colleagues, enabling everyone to work from the same data view. This is useful for data exploration and building machine learning models.
MotherDuck can handle data residing in the cloud (on MotherDuck itself) and data stored locally on your device. It intelligently figures out the most efficient way to run your queries, considering where the data resides.
Just like DuckDB, MotherDuck is known for being user-friendly. This makes it accessible to data analysts who may not be experts in complex database systems.
[(0, './datasets/watercollection.csv'),
(1, './datasets/ContainsNull.csv'),
(2, './datasets/sales_info.csv'),
(3, './datasets/cdmx-subway.csv'),
(4, './datasets/airports.csv'),
(5, './datasets/colors.csv'),
(6, './datasets/sets.csv'),
(7, './datasets/appl_stock.csv'),
(8, './datasets/sales.csv')]
┌──────────────┐
│ name │
│ varchar │
├──────────────┤
│ retail_sales │
└──────────────┘
┌─────────────┐
│ name │
│ varchar │
├─────────────┤
│ restaurants │
└─────────────┘
┌───────────────┐
│ database_name │
│ varchar │
├───────────────┤
│ my_database │
└───────────────┘
┌──────────────────┐
│ name │
│ varchar │
├──────────────────┤
│ airports │
│ appl_stock │
│ cdmx_subway │
│ colors │
│ contains_null │
│ people │
│ prevalencia │
│ restaurants │
│ retail_sales │
│ sales │
│ sales_info │
│ sets │
│ water_collection │
├──────────────────┤
│ 13 rows │
└──────────────────┘
| name | rating_count | cost | city | cuisine | rating |
|---|---|---|---|---|---|
| str | i64 | f64 | str | str | i64 |
| "The Golden Wok" | 1477 | 33.620488 | "Berlin" | "American" | 5 |
| "Greek Gyros" | 770 | 68.388874 | "New York" | "French" | 1 |
| "Taste of Italy" | 4420 | 88.23168 | "Amsterdam" | "Chinese" | 0 |
| "Midnight Diner" | 2155 | 12.965985 | "Lisbon" | "Mexican" | 1 |
| "Taste of Italy" | 3375 | 52.785226 | "Sydney" | "Chinese" | 1 |
airports_pl = conn.sql('select * from airports').pl()
appl_stock_pl = conn.sql('select * from appl_stock').pl()
cdmx_subway_pl = conn.sql('select * from cdmx_subway').pl()
colors_pl = conn.sql('select * from colors').pl()
contains_null_pl = conn.sql('select * from contains_null').pl()
people_pl = conn.sql('select * from people').pl()
prevalencia_pl = conn.sql('select * from prevalencia').pl()
restaurants_pl = conn.sql('select * from restaurants').pl()
retail_sales_pl = conn.sql('select * from retail_sales').pl()
sales_pl = conn.sql('select * from sales').pl()
sales_info_pl = conn.sql('select * from sales_info').pl()
sets_pl = conn.sql('select * from sets').pl()
water_collection_pl = conn.sql('select * from water_collection').pl()┌──────────────────┐
│ name │
│ varchar │
├──────────────────┤
│ airports │
│ appl_stock │
│ cdmx_subway │
│ colors │
│ contains_null │
│ people │
│ prevalencia │
│ restaurants │
│ retail_sales │
│ sales │
│ sales_info │
│ sets │
│ water_collection │
├──────────────────┤
│ 13 rows │
└──────────────────┘
By combining DuckDB’s in-memory processing capabilities with MotherDuck’s cloud-based data warehousing, you can create a powerful and flexible data processing and storage solution. This approach allows you to efficiently handle both local and cloud-based data operations, optimize performance, and scale your data infrastructure as needed.
MotherDuck is a cloud service that makes working with DuckDB, a fast and easy relational database, even easier.
MotherDuck provides a managed DuckDB experience, so you don’t have to worry about setup or maintenance.
You can use it for data warehousing, analytics, and data exploration in collaboration with others.
Even if your data is spread between the cloud and your device, MotherDuck can handle it.
Jesus LM
Economist & Data Scientist