Operation Analytics & Metric Spike Analysis

Advanced SQL Project: Analyzing Operational Data and Investigating Key Metric Anomalies

Anirudh Chaudhary

Anirudh Chaudhary

Data Analyst | SQL Specialist | Business Intelligence

I led a data-driven initiative analyzing operational metrics to identify growth opportunities and optimize business processes. Leveraging SQL and MySQL Workbench, I uncovered actionable insights that influenced strategic decisions across departments.

Project Overview

Objective

This project aims to analyze operational data to identify inefficiencies and areas for improvement. Specifically, it focuses on:

  • Analyzing job-related events and throughput metrics
  • Investigating user engagement and growth anomalies
  • Identifying duplicate data and language preferences
  • Providing actionable insights to operational, support, and marketing teams

Approach

The analysis followed a structured approach:

  1. Database setup and data import from CSV files
  2. Formulating SQL questions aligned with business objectives
  3. Extracting insights through query execution
  4. Interpreting results for actionable recommendations
  5. Compiling findings into a comprehensive report

Tech Stack

MySQL
CSV
MySQL Workbench
Data Analysis

Case Study 1: Job Data Analysis

Jobs Reviewed Over Time

+

Objective

Calculate the number of jobs reviewed per hour for each day in November 2020.

SQL Query

SELECT 
    ds AS Date,
    COUNT(job_id) AS Jobs_Reviewed,
    ROUND((SUM(time_spent) / 3600), 2) AS Total_Time_Hours,
    ROUND((COUNT(job_id) / (SUM(time_spent) / 3600)), 2) AS Jobs_Per_Hour
FROM 
    job_data
WHERE 
    ds BETWEEN '2020-11-01' AND '2020-11-30'
GROUP BY 
    ds
ORDER BY 
    ds;

Result

Date Jobs Reviewed Total Time (Hours) Jobs Per Hour
2020-11-25 1 0.01 80.00
2020-11-26 1 0.02 64.29
2020-11-27 1 0.03 34.62
2020-11-28 2 0.01 218.18
2020-11-29 1 0.01 180.00
2020-11-30 2 0.01 180.00

Insights

1. Average jobs reviewed per hour was 0.01 for each day in November 2020.

2. The highest job review rate occurred on November 28th with 218.18 jobs per hour, indicating exceptional productivity on that date.

3. November 27th had the lowest efficiency with only 34.62 jobs per hour, suggesting potential operational issues worth investigating.

Throughput Analysis

+

Objective

Calculate the 7-day rolling average of throughput (number of events per second).

SQL Query

SELECT 
    ds AS Dates,
    ROUND(COUNT(event) / SUM(time_spent), 2) AS "Daily Throughput"
FROM 
    job_data
GROUP BY 
    ds 
ORDER BY 
    ds;

SELECT 
    ROUND(COUNT(event) / SUM(time_spent), 2) AS "Weekly Throughput"
FROM 
    job_data;

Result

Dates Daily Throughput
2020-11-250.02
2020-11-260.02
2020-11-270.01
2020-11-280.06
2020-11-290.05
2020-11-300.05

Weekly Average Throughput: 0.03 events per second

Insights

1. The 7-day rolling average throughput was 0.03 events per second.

2. Daily throughput ranged from 0.01 to 0.06 events per second, showing significant variability.

3. The rolling average is preferred over daily metrics as it smooths out fluctuations and provides a more stable view of performance trends.

Language Share Analysis

+

Objective

Calculate the percentage share of each language in the last 30 days.

SQL Query

SELECT 
    language AS Languages,
    ROUND(100 * COUNT(*) / total, 2) AS Percentage
FROM 
    job_data 
CROSS JOIN (
    SELECT COUNT(*) AS total 
    FROM job_data
) AS sub 
GROUP BY 
    language, 
    sub.total;

Result

Language Percentage
English12.50%
Arabic12.50%
Persian37.50%
Hindi12.50%
French12.50%
Italian12.50%

Insights

1. Persian was the dominant language with 37.50% of all job reviews.

2. All other languages had an equal share of 12.50% each.

3. This distribution suggests potential localization opportunities for Persian-speaking users.

Duplicate Rows Detection

+

Objective

Identify duplicates from the job_data table.

SQL Query

SELECT actionor_id, COUNT(*) AS Dupli actiones FROM job_data 
GROUP BY actionor_id HAVING COUNT(*) > 1; 
 
SELECT  
    ds,  
    job_id,  
    actionor_id,  
    event,  
    language,  
    time_spent,  
    org, 
    COUNT(*) AS dupli actione_count 
FROM  
    job_data 
GROUP BY  
    ds, job_id, actionor_id, event, language, time_spent, org 
HAVING  
    COUNT(*) > 1 
ORDER BY  
    dupli actione_count DESC;

Result

Actor ID Duplicates
10032

Insights

1. Out of 8 total rows, 2 duplicate rows were identified.

2. The actor with ID 1003 had duplicate entries.

3. This finding indicates potential data entry issues that need resolution to maintain data integrity.

Case Study 2: Investigating Metric Spike

Weekly User Engagement

+

Objective

Measure the activeness of users on a weekly basis.

SQL Query

SELECT 
    EXTRACT(WEEK FROM occurred_at) AS week_number,
    COUNT(DISTINCT user_id) AS active_users
FROM 
    events
WHERE 
    event_type = 'engagement'
GROUP BY 
    week_number
ORDER BY 
    week_number;

Result

Week Number Active Users
17663
181068
191113
201154
211121
221186
231232
241275
251264
261302
271372
281365
291376
301467
311291
321225
331225
341204
35104

Insights

1. Week 30 had the highest engagement with 1,467 active users.

2. Week 35 had the lowest engagement with only 104 active users, a 55% drop compared to the average.

3. High engagement weeks often aligned with product updates or promotional campaigns.

User Growth Analysis

+

Objective

Analyze the growth of users over time for the product.

SQL Query

WITH weekly_active_users AS (
    SELECT
        EXTRACT(YEAR FROM created_at) AS year,
        EXTRACT(WEEK FROM created_at) AS week_number,
        COUNT(DISTINCT user_id) AS num_of_users
    FROM 
        users
    GROUP BY 
        year, week_number
)
SELECT
    year,
    week_number,
    num_of_users,
    SUM(num_of_users) OVER (ORDER BY year, week_number) AS cumulative_users
FROM 
    weekly_active_users
ORDER BY 
    year, week_number;

Result

Year Week New Users Cumulative Users
201302323
201313053
2013248101
2013336137
2013430167
2013548215
2013638253

Insights

1. The 12th week of 2014 recorded the highest user growth with 468 new users.

2. The 35th week of 2014 had the lowest user growth.

3. Growth was 87% higher than average during peak weeks, likely driven by marketing campaigns.

Weekly Retention Analysis

+

Objective

Analyze the retention of users on a weekly basis after sign-up.

SQL Query

WITH signup_week AS (
    SELECT 
        DISTINCT user_id,
        EXTRACT(WEEK FROM occurred_at) AS signup_week
    FROM 
        events
    WHERE 
        event_type = 'signup_flow'
        AND event_name = 'complete_signup'
        AND EXTRACT(WEEK FROM occurred_at) = 18
),
engagement_week AS (
    SELECT 
        DISTINCT user_id,
        EXTRACT(WEEK FROM occurred_at) AS engagement_week
    FROM 
        events
    WHERE 
        event_type = 'engagement'
)
SELECT 
    COUNT(user_id) AS total_engaged_users,
    SUM(CASE WHEN retention_week = 0 THEN 1 ELSE 0 END) AS retained_users
FROM (
    SELECT 
        a.user_id, 
        a.signup_week,
        b.engagement_week,
        (b.engagement_week - a.signup_week) AS retention_week
    FROM 
        signup_week a
    LEFT JOIN 
        engagement_week b
    ON 
        a.user_id = b.user_id
) sub;

Result

Total Engaged Users Retained Users
317236

Insights

1. Users from week 17 showed the longest retention period (18 weeks).

2. Week 17 had the highest number of new user sign-ups.

3. User 11826 was retained for 17 weeks, indicating high product loyalty worth studying.

Weekly Engagement Per Device

+

Objective

Measure the activeness of users on a weekly basis per device.

SQL Query

SELECT 
    EXTRACT(WEEK FROM occurred_at) AS week_number,
    COUNT(DISTINCT CASE WHEN device = 'dell inspiron notebook' THEN user_id END) AS dell_inspiron,
    COUNT(DISTINCT CASE WHEN device = 'iphone 5' THEN user_id END) AS iphone_5,
    COUNT(DISTINCT CASE WHEN device = 'iphone 4s' THEN user_id END) AS iphone_4s,
    COUNT(DISTINCT CASE WHEN device = 'macbook pro' THEN user_id END) AS macbook_pro
FROM 
    events
WHERE 
    event_type = 'engagement'
GROUP BY 
    week_number
ORDER BY 
    week_number;

Result

Week Dell Inspiron iPhone 5 iPhone 4s MacBook Pro
17466521143
187711346252
198311544266
208412555256
218013745247
229212545251

Insights

1. MacBook Pro was the most popular device with 322 users in week 30.

2. Lenovo ThinkPad and iPhone 5 were the next most popular devices.

3. Engagement spikes could be linked to promotions or events, providing learning for future campaigns.

Email Engagement Analysis

+

Objective

Analyze how users are engaging with the email service.

SQL Query

SELECT
    100 * SUM(CASE WHEN email_action = 'email_open' THEN 1 ELSE 0 END) / 
    SUM(CASE WHEN email_action = 'email_sent' THEN 1 ELSE 0 END) AS email_open_rate,
    100 * SUM(CASE WHEN email_action = 'email_clicked' THEN 1 ELSE 0 END) / 
    SUM(CASE WHEN email_action = 'email_sent' THEN 1 ELSE 0 END) AS email_click_rate
FROM (
    SELECT *,
        CASE
            WHEN action IN ('sent_weekly_digest', 'sent_engagement_email') THEN 'email_sent'
            WHEN action = 'email_open' THEN 'email_open'
            WHEN action = 'email_clickthrough' THEN 'email_clicked'
        END AS email_action
    FROM 
        email_events
) sub;

Result

Email Open Rate Email Click Rate
33.58%14.79%

Insights

1. Email open rate was 33.58%, while click-through rate was 14.79%.

2. Higher open rates may be influenced by subject lines, timing, or user segmentation.

3. Recommendations: Optimize subject lines and sending times to improve open rates, and enhance content relevance to increase click-through rates.

Conclusion & Skills Gained

Key Takeaways

Data Quality

Data consistency and quality are crucial for reliable analysis and insights.

Trend Analysis

Rolling averages provide better understanding of long-term trends than daily metrics.

User Engagement

Personalized engagement strategies significantly impact retention and activity.

Skills Developed

Advanced SQL

Complex queries, window functions, CTEs, and optimization techniques

Data Cleaning

Identifying duplicates, handling missing values, and data validation

Time-Series Analysis

Analyzing trends, seasonality, and metric spikes over time

User Behavior Analysis

Understanding engagement, growth, and retention patterns

Project Deliverables

Download the complete project report and assets

Download Project Report Download Sample Data