import pandas as pd
from snowflake.snowpark import Session, Window
import snowflake.snowpark.functions as F
import json
import warnings
warnings.filterwarnings('ignore')Overview
Companies use data warehouses or data lakes for centralized data storage, consistency, data quality, scalability and easy access. Business Intelligence (BI) solutions in conjunction with data warehouses are used to make more informed, data-driven decisions by means of dahsboards for stakeholders.
In this fashion, a dashboard is created using a BI application, connected to a data warehouse with the aim to be consumed by end users for their business activities.
It is an open secret, nonetheless, that staff steadily use spreadsheets to store information and manipulate data sets coming from data warehouses, other information systems and dashboards.
In a similar fashion, staff steadily use slide presentations to showcase insights and reports to managers and other stake holders. This means there are countless presentations and data analyses stored in local Excel and PowerPoint files.
Beyond the Dashboard: Reporting with Slideshows
Dashboards can be used to gather and analyze data, while slideshows can be used to present the findings in a clear and concise manner. Besides, it enables you to use your existing data insights in the tools you’re most familiar with, without having to switch to more complex ones.
You can just do your data analysis in Excel and then present it in PowerPoint. This provides you with just the flexibility you need for.
While a dashboard is a centralized section that displays your data visually typically by using a license BI tool (Tableau, Power BI), staff prefers to present data insights to potential customers or coworkers in Excel or PowerPoint by copying and pasting charts and tables from dashboards.
Indeed, BI tools like Tableau or Power BI offer options to download data to Excel or csv files, PowerPoint and images.
Automating Spreadsheet Data with Python
I propose the process of creating an ETL from the data warehouse to a spreadsheet using Python, and synchronizing tables and charts from Excel to PowerPoint to get an automated reporting in a local file with the needs of the end user.
ETL (Extract, Transform, Load) is a data integration process that involves 03 main steps:
- Extract Phase. Retrieving data from a source system (in this case, a data warehouse).
- Transform Phase. Manipulating, cleaning, and aggregating the extracted data.
- Load Phase. Storing the transformed data into a target system (in this case, an Excel file).
Case Study
I’ll show an example using Snowflake as data warehouse, Python for ETL process, and Excel as destination. Finally, PowerPoint will present the data insights.
By foregoing BI tools, we can substantially reduce project expenses.
- Import libraries
- Read credentials
# Credentials
file = 'credentials.json'
# read file
with open(file) as f:
keys = json.load(f)- Connect to Data Warehouse using Snowpark
# Snowflake's Snowpark Connection
connection = {
"account": keys['account'],
"user": keys['user'],
"role": keys['role'],
"authenticator": keys['authenticator'],
"warehouse": keys['warehouse'],
"database": keys['database'],
"schema": keys['schema'],
}
def snowflake_connection():
try:
session = Session.builder.configs(connection).create()
print("Connection successful!")
except:
raise ValueError("Connection failed!")
return session
session = snowflake_connection()- Extract data using Snowpark
# Extract sales data
sales = conn.sql('''
SELECT
store_id,
SUM(sales_amount) AS total_sales
FROM
dm_sales
WHERE
YEAR(dm_sales) = YEAR(CURDATE())
GROUP BY
store_id
''')
# Extract stores data
stores = (
conn.table('dm_stores')
.select('location','id','responsible')
)
# Label stores region
stores = stores.with_column(
'region',
F.when(F.col('region_abv')=='AS', 'ASPAC')
.when(F.col('region_abv')=='LA', 'LATAM')
.when(F.col('region_abv')=='EU', 'EMEA')
.when(F.col('region_abv')=='NA', 'NA')
.otherwise('null')
)
# Combine tables
data = (sales.join(stores, sales.store_id == stores.id))
# Save to pandas
data = data.to_pandas()- Transform data using Pandas
# Calculate top 10 products
top_10 = data.nlargest(10, 'total_sales')- Load data using Pandas
# Write to Excel
top_10.to_excel('top_10_products.xlsx', index=False)From Excel to PowerPoint: Automating Report Creation
Now, you can customize your tables and charts in Excel with the data saved from Python.
To automate your customized tables and charts created in Excel onto PowerPoint, you just need to follow the next steps:
Finally, after customizing your slideshow, you will get a PowerPoint like the following:
Conclusions
The use of spreadsheets and slideshows in businesses is not going to disappear soon. Hence, even if BI tools like Tableau or Power BI are used in businesses, Excel and PowerPoint are going to be used by staff to reporting and presentations to coworkers and managers.
By following the above steps and leveraging the power of Python, you can efficiently extract, transform, and load data from your data warehouse into Excel for further analysis and insights that fulfills end user’s requirements.
Maybe is it time to recalculate the cost-benefit implications for companies to abandon expensive BI licenses in favor of flexible, cost-effective open-source solutions like Python.
References
- Business (2023) How to Design a Dashboard Presentation: A Step-by-Step Guide in slidemodel.com
- Karlson, P. (2022) Are Spreadsheets Secretly Running Your Business? in Forbes
- Moore J. (2024) But, Can I Export it to Excel? in Do Mo(o)re with Data
- Schwab, P. (2021) Excel dominates the business world.. and that’s not about to change in Into the Minds
Contact
Jesus LM
Economist & Data Scientist