Databricks Code Practice

Get fluent in Databricks by typing, not watching.

104 exercises + 5 production-grade pipeline labs. All on Databricks Free Edition.

Clone once, import into Databricks, pick a folder. Exercises fail loud until your code is right; labs ship with synthetic data so you build production-style pipelines, not toy ones.
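To give a feel for the format, here is a minimal sketch of an exercise cell with the TODO already filled in; the table, columns, and values are made up for illustration, not taken from the repo:

```python
# Hypothetical exercise cell (runs on Databricks, where `spark` is predefined).
from pyspark.sql import functions as F

orders_df = spark.createDataFrame(
    [(1, "completed", 2, 9.99), (2, "cancelled", 1, 4.50)],
    ["order_id", "status", "quantity", "unit_price"],
)

# TODO (your code): keep only completed orders and add a total_price column
orders_clean = (
    orders_df
    .filter(F.col("status") == "completed")
    .withColumn("total_price", F.col("quantity") * F.col("unit_price"))
)

# Assertions fail loudly until the transformation is right
assert "total_price" in orders_clean.columns
assert orders_clean.filter(F.col("status") != "completed").count() == 0
print("Exercise passed")
```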

New (18 April 2026): 5 full-scale pipeline labs + 1 benchmark deep-dive just landed. If you starred this repo for the exercises, they're still here - now alongside end-to-end project work.


Author

Jakub Lasak - Databricks Data Engineer. Helping you interview, execute, and think like a senior.

Prepping for interviews? Writing code is one half of the battle - knowing the questions that actually come up is the other. I maintain Databricks Interview Cheat Sheets by seniority level (junior / mid / senior / bundle).

What's Inside

Fluency comes from reps, not reading. Three structured paths:

  • exercises/ - focused reps on a single concept. LeetCode-style, 5-30 min each.
  • pipeline-labs/ - end-to-end medallion pipelines on a business scenario. 2-3 hours each.
  • deep-dives/ - measure the impact of a technique with numbers. 1-2 hours each.
|  | Exercises | Pipeline Labs | Deep-Dives |
|---|---|---|---|
| Format | Single notebook, one TODO per exercise | Multi-notebook guided project | Single-topic deep investigation |
| Time | 5-30 min per exercise | 2-3 hours per lab | 1-2 hours |
| Scope | One concept (MERGE, window functions, ...) | End-to-end project (ingestion -> bronze -> silver -> gold) | One topic measured in depth |
| Narrative | None. "Given table X, write..." | Business scenario. "You're building a streaming pipeline for..." | Benchmark-driven. "Apply technique, measure the delta." |
| Order | Pick any, skip around | Sequential notebooks that build on each other | Sequential; each step layers on the last |
| Goal | Drill a skill until it's automatic | See how concepts fit in a real project | Prove what a technique actually buys you |

Catalog

Exercises (exercises/)

| Topic | Notebooks | Exercises | Description |
|---|---|---|---|
| Delta Lake | 6 | 51 | MERGE operations, time travel, schema enforcement, OPTIMIZE, liquid clustering, change data feed (see the MERGE sketch below) |
| ELT | 7 | 53 | Spark SQL joins, window functions, PySpark transformations, Auto Loader, batch ingestion, medallion architecture, complex data types |

Total: 13 notebooks, 104 exercises
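As a taste of the Delta Lake track, here is a hedged sketch of an upsert with MERGE; the table and column names are hypothetical, not the exercises' actual schema:

```python
# Hypothetical upsert with Delta MERGE -- names are illustrative only.
spark.sql("""
    MERGE INTO customers AS t
    USING customer_updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET t.email = s.email
    WHEN NOT MATCHED THEN INSERT (customer_id, email) VALUES (s.customer_id, s.email)
""")
```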

More exercise topics coming - next up: Streaming, Unity Catalog, Performance, and DLT.

Pipeline Labs (pipeline-labs/)

Multi-notebook, end-to-end medallion pipelines with a business scenario. Each runs 2-3 hours and ships with a synthetic data generator.

| Lab | What You Build | Focus |
|---|---|---|
| Apparel Retail 360 (DLT) | End-to-end retail analytics pipeline on Delta Live Tables with a full medallion architecture. | DLT, Medallion, SCD Type 2, Streaming, Data Quality Expectations |
| Fintech Transaction Monitoring | Real-time fraud-monitoring pipeline for a payment processor handling 500K+ transactions/day. | Structured Streaming, Rescued Data, Watermarked Dedup, Stream-Static Joins, Liquid Clustering (see the streaming sketch below) |
| DE Associate Certification Prep | Production-grade pipeline covering every exam domain of the Databricks Data Engineer Associate cert. | Auto Loader, COPY INTO, Medallion, SCD2, Jobs, Unity Catalog |
| PySpark Developer Cert Prep | E-commerce analytics pipeline covering every domain of the Spark Developer Associate cert. | DataFrame API, Structured Streaming, Data Skew, Performance Tuning |
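For a sense of two techniques the Fintech lab focuses on, here is a hedged sketch of watermarked deduplication plus a stream-static join; the table and column names are assumptions, not the lab's actual schema:

```python
# Hypothetical streaming dedup + stream-static join (illustrative names only).
transactions = (
    spark.readStream.table("bronze_transactions")      # streaming Delta source
    .withWatermark("event_time", "10 minutes")         # bound the dedup state
    .dropDuplicates(["transaction_id"])                 # drop replays within the watermark
)

merchants = spark.table("dim_merchants")                # static dimension table

enriched = transactions.join(merchants, "merchant_id", "left")  # stream-static join
```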

Deep-Dives (deep-dives/)

Single-topic labs that measure the impact of a technique with numbers, not intuition.

| Lab | What You Build | Focus |
|---|---|---|
| 6 Delta Optimization Techniques | Iteratively apply and measure core Delta performance levers on a synthetic 50M-row dataset. | Partitioning, Z-Order, OPTIMIZE, Auto Optimize, Liquid Clustering, VACUUM (see the sketch below) |
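The deep-dive's apply-then-measure loop looks roughly like this sketch; the table name, predicate, and timing helper are hypothetical, not the lab's actual code:

```python
# Hypothetical measure -> optimize -> re-measure loop (names are made up).
import time

def timed_scan(predicate: str) -> float:
    start = time.time()
    spark.sql(f"SELECT count(*) FROM events_50m WHERE {predicate}").collect()
    return time.time() - start

before = timed_scan("device_id = 'D-1042'")
spark.sql("OPTIMIZE events_50m ZORDER BY (device_id)")  # co-locate rows by device_id
after = timed_scan("device_id = 'D-1042'")
print(f"filtered scan: {before:.1f}s before, {after:.1f}s after")
```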

How to Use

  1. Sign up for Databricks Free Edition (free, no credit card)
  2. Clone or import this repo into Databricks (Workspace -> Create -> Git folder)
  3. Navigate to the folder you want, open its README, follow the instructions

Everything runs on Free Edition: serverless compute, Unity Catalog, Delta Lake. No cloud account, no cluster config.

Which Should I Start With?

  • Want to drill a specific skill until it's automatic? Start in exercises/.
  • Want to see how the concepts fit together in a real project? Pick a pipeline lab.
  • Want proof of what an optimization technique actually buys you? Run the deep-dive.

Stay in the Loop

New exercises and labs ship regularly. Follow on LinkedIn or subscribe to the Substack newsletter to be notified when new content drops.

Feedback

Found a bug? Have a suggestion? Open an issue.


Disclaimer: This is an independent educational resource created by Jakub Lasak. Not affiliated with, endorsed by, or sponsored by Databricks, Inc. "Databricks" and "Delta Lake" are trademarks of their respective owners.
