Show code
import numpy as np
import pandas as pd
import polars as pl
import duckdb as dbA valuable tool for data scientists working with Python due to its speed, ease of use, and tight integration.
Jesus LM
Mar, 2024
DuckDB excels in handling large datasets efficiently due to its columnar storage and vectorized execution engine. It is well-suited for scenarios where you need a fast and embedded database solution within your Python application. Here are some key things you can do with it:
Execute SQL Queries
Leverage In-memory and Persistent Databases
Work with Various Data Formats
Build Complex Queries Incrementally
Utilize DuckDB’s SQL Dialect
Load and manipulate data
Explore and analyze data
Prototype and experiment
You can create in-memory or persistent tables using SQL syntax from Python
Create table
Insert data
Retrieve data
Create table
Insert data
Retrieve data
Create table
Insert data
Retrieve data
Create table
Insert data
db.sql('''
INSERT INTO delivery
VALUES (1, 1, 1, '2005-10-21', 240),
(1, 1, 2, '2005-10-21', 48),
(1, 2, 3, '2005-10-22', 60),
(1, 4, 5, '2005-10-22', 4),
(2, 2, 3, '2005-10-23', 48),
(2, 2, 5, '2005-10-23', 2),
(2, 4, 1, '2005-10-24', 480),
(2, 4, 2, '2005-10-24', 72),
(3, 3, 3, '2005-10-24', 48),
(3, 3, 4, '2005-10-25', 20)
''')Retrieve data
| CodE | CodB | CodC | Delivery_date | Quantity | |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 2005-10-21 | 240 |
| 1 | 1 | 1 | 2 | 2005-10-21 | 48 |
| 2 | 1 | 2 | 3 | 2005-10-22 | 60 |
| 3 | 1 | 4 | 5 | 2005-10-22 | 4 |
| 4 | 2 | 2 | 3 | 2005-10-23 | 48 |
| 5 | 2 | 2 | 5 | 2005-10-23 | 2 |
| 6 | 2 | 4 | 1 | 2005-10-24 | 480 |
| 7 | 2 | 4 | 2 | 2005-10-24 | 72 |
| 8 | 3 | 3 | 3 | 2005-10-24 | 48 |
| 9 | 3 | 3 | 4 | 2005-10-25 | 20 |
Also, you can execute SQL queries within python environment
Obtain the name of the employees who delivered to the Stop bar during the week of October 17 to 23, 2005.
┌───────────────┬──────────┬─────────┐
│ Delivery_date │ Name │ Name │
│ date │ varchar │ varchar │
├───────────────┼──────────┼─────────┤
│ 2005-10-21 │ John Doe │ Stop │
│ 2005-10-21 │ John Doe │ Stop │
└───────────────┴──────────┴─────────┘
Obtain the CIF and name of the bars to which bottle-type beer with a capacity of less than 1 liter has been distributed, ordered by location.
┌─────────┬────────────┬─────────┬──────────┬──────────────┐
│ Cif │ Name │ Package │ Capacity │ Location │
│ varchar │ varchar │ varchar │ float │ varchar │
├─────────┼────────────┼─────────┼──────────┼──────────────┤
│ 333333Z │ Otra Ronda │ Botella │ 0.2 │ La Esponja │
│ 333333Z │ Otra Ronda │ Botella │ 0.33 │ La Esponja │
│ 111111X │ Stop │ Botella │ 0.2 │ Villa Botijo │
│ 111111X │ Stop │ Botella │ 0.33 │ Villa Botijo │
└─────────┴────────────┴─────────┴──────────┴──────────────┘
Obtain the deliveries (name of the bar, container and capacity of the drink, date and quantity) made by Prudencio Caminero.
┌────────────┬─────────┬──────────┬───────────────┬──────────┬──────────┬───────┐
│ Name │ Package │ Capacity │ Delivery_date │ Quantity │ Name │ CodE │
│ varchar │ varchar │ float │ date │ int32 │ varchar │ int32 │
├────────────┼─────────┼──────────┼───────────────┼──────────┼──────────┼───────┤
│ Stop │ Botella │ 0.2 │ 2005-10-21 │ 240 │ John Doe │ 1 │
│ Stop │ Botella │ 0.33 │ 2005-10-21 │ 48 │ John Doe │ 1 │
│ Las Vegas │ Lata │ 0.33 │ 2005-10-22 │ 60 │ John Doe │ 1 │
│ Otra Ronda │ Barril │ 60.0 │ 2005-10-22 │ 4 │ John Doe │ 1 │
└────────────┴─────────┴──────────┴───────────────┴──────────┴──────────┴───────┘
Obtain the bars to which bottle-type containers with a capacity of 0.2 or 0.33 have been distributed.
┌────────────┬─────────┬──────────┐
│ Name │ Package │ Capacity │
│ varchar │ varchar │ float │
├────────────┼─────────┼──────────┤
│ Stop │ Botella │ 0.2 │
│ Stop │ Botella │ 0.33 │
│ Otra Ronda │ Botella │ 0.2 │
│ Otra Ronda │ Botella │ 0.33 │
└────────────┴─────────┴──────────┘
Name of the employees who have distributed bottled beers to the “Stop” and “Las Vegas” bars.
┌──────────┬─────────┬─────────┐
│ Name │ Name │ Package │
│ varchar │ varchar │ varchar │
├──────────┼─────────┼─────────┤
│ John Doe │ Stop │ Botella │
│ John Doe │ Stop │ Botella │
└──────────┴─────────┴─────────┘
Obtain the name and number of trips that each employee has made outside of Villa Botijo.
┌──────────────┬─────────┐
│ Name │ Travles │
│ varchar │ int64 │
├──────────────┼─────────┤
│ John Doe │ 1 │
│ Tom Simpson │ 2 │
│ Vicent Meren │ 2 │
└──────────────┴─────────┘
Obtain the name and location of the bar that has purchased the most liters of beer.
┌─────────────┬──────────────┬────────┐
│ Name │ Location │ Liters │
│ varchar │ varchar │ int32 │
├─────────────┼──────────────┼────────┤
│ Otra Ronda │ La Esponja │ 480 │
│ Stop │ Villa Botijo │ 240 │
│ Las Vegas │ Villa Botijo │ 60 │
│ Club Social │ Las Ranas │ 48 │
└─────────────┴──────────────┴────────┘
Obtain bars that have purchased all types of beer with bottle packaging and a capacity less than 1 liter.
┌─────────────┬─────────┬──────────┐
│ Name │ Package │ Capacity │
│ varchar │ varchar │ float │
├─────────────┼─────────┼──────────┤
│ Stop │ Botella │ 0.2 │
│ Stop │ Botella │ 0.33 │
│ Las Vegas │ Lata │ 0.33 │
│ Las Vegas │ Lata │ 0.33 │
│ Otra Ronda │ Botella │ 0.2 │
│ Otra Ronda │ Botella │ 0.33 │
│ Club Social │ Lata │ 0.33 │
└─────────────┴─────────┴──────────┘
Raise the salary of the employee who has worked the most days by 5%.
┌───────┬──────────────┬──────────┐
│ CodE │ Name │ Salary │
│ int32 │ varchar │ float │
├───────┼──────────────┼──────────┤
│ 1 │ John Doe │ 120000.0 │
│ 2 │ Vicent Meren │ 110000.0 │
│ 3 │ Tom Simpson │ 100000.0 │
│ 1 │ John Doe │ 126000.0 │
│ 2 │ Vicent Meren │ 115500.0 │
└───────┴──────────────┴──────────┘
Insert a new distribution from the employee “Vicent Meren” to the “Stop” bar of 48 canned beers on 2005-10-26.
┌───────┬───────┬───────┬───────────────┬──────────┐
│ CodE │ CodB │ CodC │ Delivery_date │ Quantity │
│ int32 │ int32 │ int32 │ date │ int32 │
├───────┼───────┼───────┼───────────────┼──────────┤
│ 2 │ 1 │ 3 │ 2005-10-26 │ 48 │
└───────┴───────┴───────┴───────────────┴──────────┘
DuckDB emerges as a powerful asset for data analysis in Python.
Its in-memory processing engine delivers lightning-fast query execution, while the familiar SQL interface makes data manipulation a breeze.
The seamless integration with Python, including support for Pandas and Polars DataFrames and Python DuckB API compliance, further strengthens its appeal.
By incorporating DuckDB into your Python workflows, you can achieve significant performance gains and streamline your data analysis tasks, ultimately leading to more efficient and productive data exploration.
Jesus LM
Economist & Data Scientist