Visualizing SQL data


I worked with data analysis using Panda, matplotlib and more during my studies at UW, but subsequently focused on cloud infrastructure and securing resources. Building out my database and SQL skills, I’m coming full circle to using visualization to better understand data- this is helpful in all sorts of ways!

Here are some links to resources that are helpfuL

In this exercise, I imported sqlite and pandas and then used sakila database located within same folder:

import sqlite3
import pandas as pd

con = sqlite3.connect('sakila.db')

def sql_to_df(sql_query):
    df = pd.read_sql(sql_query, con)
    return df

and now a SQL query:

query = '''
    SELECT *
    FROM sqlite_master
    WHERE type = 'table';
'''

tables = sql_to_df(query)
tables

result:

Let’s try another query:

query = '''
    SELECT first_name, last_name
    FROM customer
'''

customer_names = sql_to_df(query)
customer_names

result:

A little more complicated query:

 query = '''
    SELECT *
    FROM film
    WHERE description
    LIKE '%Pastry%'
'''

pastry_films = sql_to_df(query)
pastry_films

Let’s drill down:

query = '''
    SELECT
        COUNT(title) AS Count,
        rating
    FROM film
    WHERE description
    LIKE '%Pastry%'
    GROUP BY rating
    ORDER BY Count DESC;
'''
pastry_films_by_rating = sql_to_df(query)
pastry_films_by_rating

Additional Work

import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

con = sqlite3.connect('sakila.db')

def sql_to_df(sql_query):
    df = pd.read_sql(sql_query, con)
    return df
query = '''
    SELECT
        strftime('%Y-%m', payment_date) AS Date, ROUND(SUM(amount), 0) AS Sales
    FROM payment
    GROUP BY Date
    ORDER BY Date ASC;
'''

Now let’s run that:

Let’s use matplotlib functionality to visualize the data:

Let’s limit the data being provided to just four months:

Matplotlib and subplots

Time for some additional functionality! We’ll use this interesting library to better understand the data:

Here I used a 10×5 subplot. The for loop is doing something interesting. The for loop in our code loops over the sales_per_month DataFrame. For each item, it creates an annotation on the plot of the amount of sales per month, at the location specified by the data and the sales amount for that month. 

I’m going to include the following code simply as a personal reference:

query = '''
    SELECT
        cat.name category_name,
        sum( IFNULL(pay.amount, 0) ) revenue
    FROM category cat
    LEFT JOIN film_category flm_cat
    ON cat.category_id = flm_cat.category_id
    LEFT JOIN film fil
    ON flm_cat.film_id = fil.film_id
    LEFT JOIN inventory inv
    ON fil.film_id = inv.film_id
    LEFT JOIN rental ren
    ON inv.inventory_id = ren.inventory_id
    LEFT JOIN payment pay
    ON ren.rental_id = pay.rental_id
    GROUP BY cat.name
    ORDER BY revenue DESC
    limit 5;
'''

categories_by_gross = sql_to_df(query)
categories_by_gross

fig, ax = plt.subplots(figsize=(10, 5))

ypos = np.arange(len(categories_by_gross["revenue"]))
bars = ax.bar(ypos, categories_by_gross["revenue"].round(3), width=0.50)
ax.set_xticks(ypos)
ax.set_xticklabels(categories_by_gross["category_name"])
ax.set_ylim(ymin=3000, ymax=6000)
ax.set_title("gross by category", fontsize=14)
ax.set_ylabel("gross sales", fontsize=12)

for bar in bars:  # add data labels
    height = bar.get_height()
    ax.annotate(f"{height}",
                xy=(bar.get_x() + bar.get_width() / 2, height),
                xytext=(0, 3),  # 3 points vertical offset
                textcoords="offset points",
                ha="center", va="bottom",
                fontweight="semibold")

plt.show()

Here’s the result

And a pie chart:

explode = np.zeros(len(categories_by_gross["category_name"]))
explode[0] = 0.1
print(explode)

fig, ax = plt.subplots()
ax.pie(categories_by_gross["revenue"].round(3), explode=explode, labels=categories_by_gross["category_name"], 
       autopct='%1.1f%%', shadow=True, startangle=90)
ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()

Another query:

query = '''
    SELECT
        COUNT(title) AS Count,
        rating AS Rating
    FROM film
    WHERE description
    LIKE '%Pastry%'
    GROUP BY rating
    ORDER BY Count DESC;
'''

df = sql_to_df(query)
df.set_index('Rating', inplace=True)

num_adult_pastry = df.loc['NC-17', 'Count']
total = df['Count'].sum()
labels = ['all other', 'adult pastry']
nums = np.array([total, num_adult_pastry])
nums

explode = [0, 0.2]

fig, ax = plt.subplots()
ax.pie(nums, labels=labels, explode=explode, shadow=True)
ax.axis('equal')

plt.show()

And the result:


Leave a comment