Stop Building Dashboards

Abstract

In today’s data-driven world, Business Intelligence (BI) tools promise powerful insights and streamlined reporting. Yet the humble spreadsheet and slideshow persist in the white-collar world. While BI tools manage complex analysis and visualization, spreadsheets and slideshows offer unique advantages that keep them firmly entrenched in our workflows.

Overview

Some reasons why spreadsheets and slideshows persist in the office workflow include:

Familiarity and Ease of Use
Flexibility and Control
Storytelling and Communication
Collaboration and Sharing
Ad-hoc Analysis and Exploration
Cost and Accessibility

While BI tools may be transforming the way we analyze data, it’s clear that spreadsheets and slideshows aren’t going anywhere anytime soon. They serve a different purpose, filling a gap that BI tools often miss.

Building a Python-Powered Data Pipeline

Business Intelligence (BI) tools are powerful, but they can also be expensive and complex. What if you could build a custom, flexible, and potentially more cost-effective solution using Python?

This post explores how you can leverage Python to collect, transform, and deliver data to Google Sheets and Google Slides for compelling presentations.

Why Python?

Python has become a powerhouse in data science and automation. Its rich ecosystem of libraries makes it ideal for data manipulation and for pushing results into the spreadsheets that feed your slides. This combination offers a powerful alternative to traditional BI tools for certain use cases.

In this post we propose using Python (to collect, cleanse, and transform data), Google Sheets (to store the transformed data), and Google Slides (to showcase visualizations).

Proposed Workflow

Imagine you need to generate a weekly sales report and all you have to do is run the following command:

%%bash
jupyter-execute ./projects/weekly-report.ipynb

And voilà! Your weekly report is updated and ready to present in Google Slides.

Environment settings

# Import authenticator and gspread to manage g-sheets
from oauth2client.service_account import ServiceAccountCredentials
import gspread

# Import other libraries
import numpy as np
import pandas as pd
import polars as pl
import duckdb as db
import json
import warnings
warnings.filterwarnings('ignore')

# get token
filename = 'credentials.json'

# read json file
with open(filename) as f:
    keys = json.load(f)

# read credentials
token = keys['md_token']
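
Keeping the token in a local JSON file works, but the notebook is easier to schedule if the token can also come from the environment. A minimal sketch, assuming a hypothetical helper `load_token` and an environment variable named `motherduck_token` (the name is an assumption here, so adjust it to your setup), with the `credentials.json` file and its `md_token` key from above as the fallback:

```python
import json
import os


def load_token(path='credentials.json', key='md_token', env_var='motherduck_token'):
    """Return the token from the environment if set, otherwise from the JSON file.

    `load_token` and the `env_var` default are illustrative assumptions,
    not part of the original notebook.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    # Fall back to the credentials file used in the notebook
    with open(path) as f:
        return json.load(f)[key]
```

With this in place, `token = load_token()` could replace the file-reading cell above.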

Extract Phase

# connect to motherduck cloud
conn = db.connect(f'md:?motherduck_token={token}')

conn.sql('show databases')

┌───────────────────────┐
│     database_name     │
│        varchar        │
├───────────────────────┤
│ md_information_schema │
│ my_db                 │
│ my_portfolio          │
│ sample_data           │
└───────────────────────┘

Table 1: Databases

# select specific database
conn.sql('use my_portfolio')

# show tables in database
conn.sql('show tables')

┌──────────────────┐
│       name       │
│     varchar      │
├──────────────────┤
│ airports         │
│ appl_stock       │
│ cdmx_subway      │
│ colors           │
│ contains_null    │
│ houses           │
│ people           │
│ prevalencia      │
│ restaurants      │
│ retail_sales     │
│ sales            │
│ sales_info       │
│ sets             │
│ water_collection │
├──────────────────┤
│     14 rows      │
└──────────────────┘

Table 2: Tables in database

# dataset
dataset = conn.sql('select * from restaurants').df()

(
    dataset.head()
        .style
        .hide()    
        .format({'rating_count': '{:,.0f}', 'cost': '${:.2f}'})
)

name	rating_count	cost	city	cuisine	rating
The Golden Wok	1,477	$33.62	Berlin	American	5
Greek Gyros	770	$68.39	New York	French	1
Taste of Italy	4,420	$88.23	Amsterdam	Chinese	0
Midnight Diner	2,155	$12.97	Lisbon	Mexican	1
Taste of Italy	3,375	$52.79	Sydney	Chinese	1

Table 3: Dataset Preview

Transform Phase

Which restaurant chain has the maximum number of restaurants?

chains = (
    conn.sql('''
    select name, count(name) as no_of_chains
    from restaurants
    group by name
    order by no_of_chains DESC
    limit 10
    ''').df()
)
chains

	name	no_of_chains
0	The Burger Joint	721
1	Pizza Palace	703
2	Greek Gyros	696
3	Cafe Delight	692
4	French Delights	681
5	The BBQ Shack	671
6	The Golden Wok	667
7	Ocean Breeze	665
8	Spice & Bloom	665
9	Midnight Diner	657

Table 4: Data grouped by restaurant chains

Which restaurant chain has generated the most revenue?

revenue = (
    conn.sql('''
    select name, sum(rating_count * cost) as revenue
    from restaurants
    group by name
    order by revenue DESC
    limit 10
    ''').df()
)

(
    revenue
        .style
        .hide()    
        .format({'revenue': '{:,.2f}'})
)

name	revenue
The Burger Joint	108,820,424.64
Pizza Palace	100,382,853.92
Cafe Delight	98,037,063.93
Greek Gyros	96,403,445.25
The BBQ Shack	96,286,414.02
Ocean Breeze	95,664,612.77
The Golden Wok	94,860,125.75
Spice & Bloom	91,824,854.56
French Delights	91,236,701.38
Midnight Diner	91,170,162.30

Table 5: Data grouped by restaurant and revenue
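
Note that the dataset preview has no sales column, so `revenue` in these queries is a proxy: `rating_count * cost`, summed per group. As a sanity check, the same aggregation can be sketched in plain Python on the five preview rows from Table 3 (costs are rounded to two decimals as displayed, so the totals are illustrative only):

```python
from collections import defaultdict

# (name, rating_count, cost) taken from the preview in Table 3
preview = [
    ('The Golden Wok', 1477, 33.62),
    ('Greek Gyros', 770, 68.39),
    ('Taste of Italy', 4420, 88.23),
    ('Midnight Diner', 2155, 12.97),
    ('Taste of Italy', 3375, 52.79),
]


def revenue_by_name(rows):
    """Sum rating_count * cost per name and sort descending, like the SQL query."""
    totals = defaultdict(float)
    for name, rating_count, cost in rows:
        totals[name] += rating_count * cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for name, total in revenue_by_name(preview):
    print(f'{name}: {total:,.2f}')
```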

Which city has generated the most revenue?

cities = (
    conn.sql('''
    select city, sum(rating_count * cost) as revenue
    from restaurants
    group by city
    order by revenue DESC
    limit 10
    ''').df()
)

(
    cities
        .style
        .hide()    
        .format({'revenue': '${:,.2f}'})
)

city	revenue
Amsterdam	$148,839,878.62
Tokyo	$148,035,421.32
Madrid	$141,487,618.97
Paris	$141,219,374.56
London	$140,876,613.54
Rome	$139,622,129.63
New York	$138,621,609.28
Lisbon	$136,814,247.27
Berlin	$136,434,163.25
Sydney	$131,656,513.97

Table 6: Data grouped by city and revenue
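
The chain and city revenue queries differ only in the grouping column, so they can be generated from one template. A minimal sketch, assuming a hypothetical helper `top_revenue_query` (not part of the original notebook) that checks the column name against an allow-list before interpolating it into the SQL:

```python
ALLOWED_GROUPS = {'name', 'city', 'cuisine'}  # columns seen in the dataset preview


def top_revenue_query(group_col, limit=10):
    """Build the revenue-by-group query used above for a given grouping column."""
    if group_col not in ALLOWED_GROUPS:
        raise ValueError(f'unsupported grouping column: {group_col!r}')
    return (
        f'select {group_col}, sum(rating_count * cost) as revenue\n'
        f'from restaurants\n'
        f'group by {group_col}\n'
        f'order by revenue desc\n'
        f'limit {int(limit)}'
    )
```

With the open connection, `conn.sql(top_revenue_query('city')).df()` should then return the same kind of table as `cities`.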

Load Phase

# Create scope to authenticate
SCOPES = ['https://www.googleapis.com/auth/spreadsheets', 'https://www.googleapis.com/auth/drive']

# Read credentials
GOOGLE_SHEETS_KEY_FILE = 'arkham-538.json'
credentials = ServiceAccountCredentials.from_json_keyfile_name(GOOGLE_SHEETS_KEY_FILE, SCOPES)
gc = gspread.authorize(credentials)

import pytz
import datetime

tz = pytz.timezone('America/Mexico_City')
update = datetime.datetime.now(tz).strftime('%b %d, %Y')
period = update
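
On Python 3.9 or later, the standard-library `zoneinfo` module can replace `pytz` for this step. A minimal sketch, assuming a hypothetical helper `report_stamp`; the fixed UTC-6 fallback is an assumption for systems without a time zone database, and it matches Mexico City only while the city observes no daylight saving time:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def report_stamp(now=None, tz_name='America/Mexico_City'):
    """Format a timestamp like 'Feb 24, 2025' in the given time zone."""
    try:
        tz = ZoneInfo(tz_name)
    except ZoneInfoNotFoundError:
        # Assumption: fall back to a fixed UTC-6 offset if tzdata is unavailable
        tz = timezone(timedelta(hours=-6))
    if now is None:
        now = datetime.now(tz)
    else:
        # Convert an aware datetime supplied by the caller to the target zone
        now = now.astimezone(tz)
    return now.strftime('%b %d, %Y')


update = report_stamp()
period = update
```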

def save_to_gsheets(df, sheet_name, worksheet_name, period):
    creds = ServiceAccountCredentials.from_json_keyfile_name(GOOGLE_SHEETS_KEY_FILE, SCOPES)
    client = gspread.authorize(creds)
    sheet = client.open(sheet_name)
    worksheet = sheet.worksheet(worksheet_name)

    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Convert datetimes to strings in advance
    for column in df.columns[df.dtypes == 'datetime64[ns]']:
        df[column] = df[column].astype(str)

    # Header row followed by the data rows, with missing values as empty strings
    data = [df.columns.values.tolist()] + df.fillna('').values.tolist()

    # Freeze the first four rows, then write the table starting at row 4.
    # Keyword arguments avoid relying on the positional order of update(),
    # which differs between gspread 5.x and 6.x.
    worksheet.freeze(4)
    worksheet.update(values=data, range_name='A4:M')

    # Write the date of the last update in A1:A2
    worksheet.update(values=[['Last update'], [period]], range_name='A1:A2')

    print(f'DataFrame uploaded to: workbook: {sheet_name}, sheet: {worksheet_name}')
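
The values sent to Google Sheets travel as JSON, which is presumably why `save_to_gsheets` turns datetimes into strings and missing values into empty strings before uploading. The same cleaning step can be sketched without pandas, assuming a hypothetical helper `to_sheet_values` that receives the column names and rows as plain Python lists:

```python
import math
from datetime import date, datetime


def clean_cell(value):
    """Convert one cell into a JSON-friendly value."""
    if value is None:
        return ''
    if isinstance(value, float) and math.isnan(value):
        return ''
    if isinstance(value, (datetime, date)):
        return str(value)
    return value


def to_sheet_values(columns, rows):
    """Return the header row followed by the cleaned data rows."""
    return [list(columns)] + [[clean_cell(v) for v in row] for row in rows]
```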

save_to_gsheets(dataset, 'restaurants', 'data', period)

DataFrame uploaded to: workbook: restaurants, sheet: data

save_to_gsheets(chains, 'restaurants', 'chains', period)

DataFrame uploaded to: workbook: restaurants, sheet: chains

save_to_gsheets(revenue, 'restaurants', 'revenue', period)

DataFrame uploaded to: workbook: restaurants, sheet: revenue

save_to_gsheets(cities, 'restaurants', 'cities', period)

DataFrame uploaded to: workbook: restaurants, sheet: cities
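
The four calls above follow one pattern, so they can be driven from a mapping of worksheet names to DataFrames. A minimal sketch, assuming a hypothetical helper `upload_all` that receives the upload function as a parameter (in the notebook this would be `save_to_gsheets`), which also makes the loop easy to try without touching Google Sheets:

```python
def upload_all(frames, upload, sheet_name, period):
    """Call `upload` once per worksheet and return the worksheet names processed.

    frames: dict mapping worksheet name -> DataFrame (or any tabular object)
    upload: callable with the same parameters as save_to_gsheets
    """
    done = []
    for worksheet_name, frame in frames.items():
        upload(frame, sheet_name, worksheet_name, period)
        done.append(worksheet_name)
    return done
```

In the notebook this would be called as `upload_all({'data': dataset, 'chains': chains, 'revenue': revenue, 'cities': cities}, save_to_gsheets, 'restaurants', period)`.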

Close connection

# close connection
conn.close()

Retrieve data from gsheets

# Open the spreadsheet by its ID
df_id = '1JNAWb2QkFwh61v7QwEEVZnNhTPS0csbdMdll9y1csEg'
df_workbook = gc.open_by_key(df_id)
# Select the 'data' worksheet
df = df_workbook.worksheet('data')
# Read every cell as a list of rows
df = df.get_all_values()
# Build a DataFrame, using the first sheet row as the column names
df = pd.DataFrame(df[1:], columns=df[0])

df.head()

	Last update
0	Feb 24, 2025
1
2	name	rating_count	cost	city	cuisine	rating
3	The Golden Wok	1477	33.62048759	Berlin	American	5
4	Greek Gyros	770	68.38887409	New York	French	1

Table 7: Data Saved on Google Sheets
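
Because `save_to_gsheets` writes the update stamp in rows 1 and 2 and the table header in row 4, the first sheet row ends up as the DataFrame header in the preview above. A minimal sketch of separating the two, assuming a hypothetical helper `split_report` that works on the list of rows returned by `get_all_values()` and takes the header position as a parameter:

```python
def split_report(values, header_row=4):
    """Split sheet rows into (last_update, header, records).

    values: list of rows as returned by worksheet.get_all_values()
    header_row: 1-based sheet row holding the column names (row 4 in this layout)
    """
    last_update = values[1][0] if len(values) > 1 and values[1] else None
    header = values[header_row - 1]
    # Keep only non-empty rows below the header
    records = [row for row in values[header_row:] if any(cell != '' for cell in row)]
    return last_update, header, records


# Sample rows mirroring the layout shown in Table 7
sample = [
    ['Last update', '', '', '', '', ''],
    ['Feb 24, 2025', '', '', '', '', ''],
    ['', '', '', '', '', ''],
    ['name', 'rating_count', 'cost', 'city', 'cuisine', 'rating'],
    ['The Golden Wok', '1477', '33.62048759', 'Berlin', 'American', '5'],
    ['Greek Gyros', '770', '68.38887409', 'New York', 'French', '1'],
]
last_update, header, records = split_report(sample)
```

`pd.DataFrame(records, columns=header)` would then give a frame with the expected column names. The values are still strings at that point and need converting, for example with `pd.to_numeric`.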

Google Sheets Report Data

Figure 2: Google Sheets Data for Presentation Report

Sync between Google Sheets and Google Slides

We simply copy each table and chart from Google Sheets, paste it into Google Slides with the link to the spreadsheet kept, and then customize our slides.

Figure 3: Synchronization between Google Sheets and Slides

Google Slides

You can see the final report on Google Slides.

Conclusions

While BI tools are valuable, Python offers a compelling alternative for building custom data pipelines. Libraries such as polars and duckdb handle data collection and transformation, libraries like plotly can handle visualization, and gspread delivers the results to Google Sheets. Syncing those sheets with Google Slides then keeps presentations up to date, giving you a flexible, cost-effective, and automated reporting solution.

This approach empowers you to take control of your data and create highly tailored reporting solutions without paying for BI licenses.

Contact

Jesus LM
Economist & Data Scientist

Medium | Linkedin | Twitter