This project analyzes Udemy course data using SQL to identify trends in course popularity, pricing patterns, and subscriber engagement. SQL queries were written to clean, transform, and analyze data to support data-driven insights.
- Source: Kaggle – Udemy Courses Dataset
- Public dataset used for educational and portfolio purposes
- Records include:
- Course title, subject, level
- Price (free vs paid)
- Number of subscribers, reviews, and lectures
- Course duration and published date
- SQL Server (SSMS)
- SQL concepts:
- Data cleaning & standardization
- CAST / CASE statements
- Common Table Expressions (CTEs)
- Window functions (
ROW_NUMBER) - Aggregations (
COUNT,SUM,AVG) - Grouping and filtering
- Standardized course levels (Beginner / Intermediate / Advanced)
- Converted numeric columns to correct data types
- Handled free vs paid course pricing
- Removed records with missing subscriber data
- Created a cleaned dataset separate from raw data
- How many courses are available on Udemy?
- Which subjects have the highest number of subscribers?
- Do paid courses attract more subscribers than free courses?
- Which course level performs best on average?
- What are the top 5 courses per subject by subscribers?
- Which subjects show the highest average engagement?
- Course popularity analysis by subject
- Free vs paid course comparison
- Subscriber engagement analysis
- Course ranking using window functions
- Data aggregation using GROUP BY
- Data cleaning and transformation using CASE and CAST
This query standardizes course levels, converts data types, handles missing values, and creates a cleaned dataset for reliable analysis.
WITH cleaned_courses AS (
SELECT
course_id,
course_title,
subject,
-- Standardize course levels
CASE
WHEN LOWER(level) LIKE 'beginner%' THEN 'Beginner'
WHEN LOWER(level) LIKE 'intermediate%' THEN 'Intermediate'
WHEN LOWER(level) LIKE 'advanced%' THEN 'Advanced'
ELSE 'Unknown'
END AS level_cleaned,
is_paid,
-- Convert price to numeric format
CASE
WHEN is_paid = 1 THEN CAST(price AS DECIMAL(10,2))
ELSE 0
END AS price_cleaned,
-- Convert numeric fields to integers
CAST(num_subscribers AS INT) AS num_subscribers_cleaned,
CAST(num_reviews AS INT) AS num_reviews_cleaned,
CAST(num_lectures AS INT) AS num_lectures_cleaned,
-- Convert timestamp to date
CAST(published_timestamp AS DATE) AS published_date
FROM Udemy_Courses
WHERE num_subscribers IS NOT NULL
)
SELECT *
INTO cleaned_courses_table
FROM cleaned_courses;This query identifies the most popular course subjects by calculating the number of courses and total subscribers for each subject.
SELECT
subject,
COUNT(*) AS course_count,
SUM(num_subscribers_cleaned) AS total_subscribers,
AVG(num_subscribers_cleaned) AS avg_subscribers
FROM cleaned_courses_table
GROUP BY subject
ORDER BY total_subscribers DESC;This query compares free and paid courses to understand pricing patterns and subscriber distribution.
SELECT
is_paid,
COUNT(*) AS course_count,
AVG(price_cleaned) AS avg_paid_price,
SUM(num_subscribers_cleaned) AS total_subscribers
FROM cleaned_courses_table
GROUP BY is_paid;This query ranks courses within each subject using a window function to identify the top-performing courses based on subscriber count.
WITH ranked_courses AS (
SELECT
subject,
course_title,
level_cleaned AS level,
price_cleaned AS price,
num_subscribers_cleaned AS num_subscribers,
ROW_NUMBER() OVER (
PARTITION BY subject
ORDER BY num_subscribers_cleaned DESC
) AS subject_rank
FROM cleaned_courses_table
)
SELECT *
FROM ranked_courses
WHERE subject_rank <= 5
ORDER BY subject, subject_rank;- udemy_sql_analysis.sql – SQL queries for cleaning and analysis
- README.md – Project documentation
- Import the dataset into SQL Server
- Execute the SQL script udemy_sql_analysis.sql
- Review query outputs for analysis results
- Beginner-level courses tend to attract higher average subscribers
- Technology-related subjects consistently dominate subscriber engagement
- Free courses drive higher enrollment volume, while paid courses create revenue opportunities
Suja
Data Analyst | SQL | Power BI | Data Visualization
Open to Remote Opportunities