Sam-24-dev/README.md

👋 About Me

I don't just analyze data. I build the systems that make analysis possible.

Junior Data Engineer & Analyst | 7th Semester — Computer Engineering, ESPOL, Ecuador

I am an engineer focused on the complete data lifecycle: from designing ETL/ELT pipelines and data quality frameworks to performing deep exploratory data analysis (EDA) and deploying Machine Learning models in production-ready environments.

My approach combines data engineering (pipeline automation, query optimization, CI/CD) with business intelligence (stakeholder reporting, KPI dashboards). This allows me to not only build robust data infrastructure but also extract actionable insights that support business decisions.

💼 What I Bring to the Table:

  • Data Pipeline Automation: End-to-end architectures with strict data validation gates (Pandera) and 133+ automated tests.
  • Database Engineering: Implementing 3NF design and query optimization, achieving 40% performance boosts via indexing strategies.
  • Machine Learning: Building predictive models for dynamic pricing on real-world datasets of 1.2M+ records.
  • Business Intelligence: Uncovering $16K+ financial gaps and delivering clear, interactive reporting for stakeholders.

🔭 Current Focus

  • 📘 Certification Track: PL-300, Microsoft Power BI Data Analyst. Strengthening advanced modeling, DAX, and business storytelling for decision-focused dashboards.
  • ☁️ Learning Path: Cloud + dbt. Building stronger foundations in modern data stack practices, transformation workflows, and analytics engineering standards.
  • 🧩 Career Optimization: Portfolio optimization for job applications. Refining project narratives, measurable impact, and recruiter-facing positioning for Junior Data Engineer / Data Analyst opportunities.

✅ Recently Completed

  • 🎮 eSports Analytics Dashboard LATAM: Completed an end-to-end analytical product: MySQL → Python ETL → validated JSON contracts → web dashboard, including ML player projections (2026), automated testing, and GitHub Pages deployment.
  • 📊 Customer Profile Analytics (Power BI): Delivered a reproducible analytics workflow: raw marketing data → Python preprocessing notebook → validated clean CSV → executive Power BI dashboard (desktop + mobile) with business-oriented storytelling.

🌎 Spoken Languages

      
Actively preparing for C1 certification

🏆 Certifications & Awards

| 🎖️ Certification / Award | 🏢 Issuer | 📅 Status / Date | 🔗 Link |
| --- | --- | --- | --- |
| 📗 Microsoft Office Specialist: Excel Associate (Microsoft 365 Apps) | Microsoft | Issued: Mar 2026 | 📄 Credential |
| 📊 Data Analyst Associate | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🛠️ ETL y ELT en Python | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🌍 Galactic Problem Solver (Global Nominee) | NASA Space Apps Challenge | Oct 2025 | 📄 View |
| 🤖 Desarrollo con IA: de 0 a Producción | BIG school | Issued: Mar 2026 | 📜 Credential |
| 📊 Data-Driven Decision Specialist (Bootcamp) | ESPOL & MINTEL | Completed (Graduation: Apr 2026) | ⭐ Top Project |

📚 Currently Preparing

| 🎯 Certification | 🏢 Issuer | 📅 Target | 🔗 Status |
| --- | --- | --- | --- |
| 📈 PL-300: Power BI Data Analyst | Microsoft | Apr 2026 | In progress |
| ☁️ AWS Cloud Practitioner / Data-related path | AWS | 2026 | In progress |
| 🧱 dbt Fundamentals / Analytics Engineering | dbt Labs | 2026 | In progress |

🚀 Featured Project — Highlight

End-to-End Multi-Source Data Engineering Platform

Tracking real-time developer technology trends by orchestrating data from GitHub, StackOverflow, and Reddit into a unified analytics engine.

  • 🌐 Multi-Source ETL: Consolidates developer signals from GitHub, StackOverflow, and Reddit into a canonical pipeline.
  • 🛡️ Data Quality Gates: Enforces schema and validation rules with Pandera data contracts.
  • ⚡ Modern Analytics Engine: Uses DuckDB for trend computation, ranking, and lightweight analytical workloads.
  • ✅ Production Discipline: 133+ passing tests with automated CI/CD workflows and scheduled refreshes.
  • 📱 Delivery Layer: Serves insights to a Flutter Web dashboard with stable bridge outputs for frontend consumption.
 

🏗️ Platform Architecture

[Diagram: Data Pipeline Architecture, Technology Trend Analysis Platform]

📁 Other Key Projects

End-to-End Data Engineering & Machine Learning Project

Simulating price optimization for ride-hailing apps using a data architecture with 1.2 Million records.

  • 🔧 ETL Architecture: Engineered an automated Python pipeline to ingest 1.2M+ raw records, using complex SQL JOINs to clean and consolidate a final dataset of ~600k verified trips in SQLite.
  • 🤖 Machine Learning: Trained a Random Forest Regressor to predict dynamic pricing (Baseline RMSE: $9.00).
  • 📊 Key Insight: Feature importance analysis ranked distance (importance > 0.6) and surge_multiplier as the dominant factors, suggesting that granular weather data contributed little beyond noise.
  • Tech Stack: Python, SQL, Pandas, Scikit-Learn, Plotly.
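The join-and-clean step described in the ETL bullet can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the project's pipeline: the table names (`raw_trips`, `raw_fares`, `clean_trips`), the columns, and the filter rules are all hypothetical, and the sample rows are invented for the demo.

```python
import sqlite3


def build_clean_trips(conn):
    """Join raw trips to fares and keep only rows with a usable price.

    Table and column names are hypothetical placeholders.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS clean_trips;
        CREATE TABLE clean_trips AS
        SELECT t.trip_id, t.distance_km, t.surge_multiplier, f.price
        FROM raw_trips AS t
        JOIN raw_fares AS f ON f.trip_id = t.trip_id  -- drops trips without a fare
        WHERE f.price IS NOT NULL
          AND f.price > 0
          AND t.distance_km > 0;
        """
    )
    return conn.execute("SELECT COUNT(*) FROM clean_trips").fetchone()[0]


def demo():
    """Load a few invented rows into an in-memory database and clean them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_trips (
            trip_id INTEGER PRIMARY KEY,
            distance_km REAL,
            surge_multiplier REAL
        );
        CREATE TABLE raw_fares (trip_id INTEGER, price REAL);
        """
    )
    conn.executemany(
        "INSERT INTO raw_trips VALUES (?, ?, ?)",
        [(1, 2.5, 1.0), (2, 4.0, 1.5), (3, 0.0, 1.0), (4, 3.1, 1.0)],
    )
    conn.executemany(
        "INSERT INTO raw_fares VALUES (?, ?)",
        [(1, 9.5), (2, None), (3, 7.0)],
    )
    kept = build_clean_trips(conn)
    rows = conn.execute("SELECT trip_id FROM clean_trips ORDER BY trip_id").fetchall()
    return kept, rows
```

In the demo, the inner join removes the trip with no fare record and the `WHERE` clause removes the rows with a missing price or zero distance, which mirrors how a raw record count can shrink to a smaller set of verified trips.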

Award: Galactic Problem Solver (Global Nominee)

  • Innovation: Built a full-stack web app analyzing 10 years of NASA satellite data across 195+ countries with <2s response time on interactive maps.
  • Impact: Developed MVP in a 48-hour hackathon, integrating real-time APIs to predict global extreme weather probabilities.
  • Tech: Python (Flask), React, TypeScript, Leaflet, Plotly.
 

End-to-end Data Engineering for Agriculture

  • Result: Engineered a Python ETL pipeline (covered by 14 unit tests) that modeled a strategic turnaround, projecting an ROI improvement from -5.58% to +15% (+20.6 pts) and a +75% boost in productivity.
  • Architecture: Built a robust MySQL → Python → JSON pipeline feeding a 5-page interactive dashboard for operational tracking.
  • Tech: MySQL, Python, Pandas, Pytest, JS/Bootstrap.
 

Business Intelligence

  • Insight: Analyzed sales distribution across 23 active sellers ($28.4K avg), uncovering a critical $16.66K performance gap between top and bottom performers.
  • Impact: Identified "Meat" as the top revenue driver ($80.05K) and Tulsa as the premier market (20 top clients), delivering actionable KPIs for data-driven decisions.
  • Tech: Power BI, DAX, Excel.

Scientific Research & Data Modeling

  • Validation: Built an automated R pipeline to validate a Negative Binomial Distribution model (k=3, p=0.3) on 309 observations, obtaining a p-value of 0.660, well above the usual 0.05 threshold, so the data give no significant evidence against the model.
  • Impact: Tracked a mean serve time of 1.945s (<2s threshold) and exported JSON/PNG assets into a dynamic JS web dashboard.
  • Tech: R (Tidyverse, ggplot2), HTML/CSS/JS.
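For readers unfamiliar with the model named above, here is a small sketch of the Negative Binomial probability mass function, written in Python rather than R. It assumes the parametrization used by R's `dnbinom(x, size = k, prob = p)`, where `x` counts failures before the `k`-th success; whether the project uses that convention is an assumption. The helper names are hypothetical.

```python
from math import comb


def nbinom_pmf(x, k, p):
    """P(X = x): probability of x failures before the k-th success."""
    return comb(x + k - 1, x) * p**k * (1 - p) ** x


def expected_counts(n, k, p, max_x):
    """Expected frequencies for x = 0..max_x among n observations."""
    return [n * nbinom_pmf(x, k, p) for x in range(max_x + 1)]


def nbinom_mean(k, p):
    """Theoretical mean k * (1 - p) / p under this parametrization."""
    return k * (1 - p) / p
```

Comparing `expected_counts(309, 3, 0.3, max_x)` against observed frequencies is the kind of input a goodness-of-fit test would take; the test itself is not shown here.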
 

🛠️ Technical Stack

| Category | Technologies |
| --- | --- |
| 💻 Languages | Python, R, SQL, TypeScript, Dart |
| ⚙️ Data Engineering & DBs | DuckDB, MySQL, SQLite, Pandas, Jupyter |
| 🤖 Machine Learning | Scikit-Learn |
| 🧪 Testing & Quality | Pytest, Pandera |
| 📊 Visualization & BI | Power BI, Tableau, Plotly, Excel |
| 🌐 Web & Mobile | React, Flutter, Flask, Tailwind CSS, Vite, Bootstrap, Leaflet |
| 🚀 DevOps & Cloud | GitHub Actions, Vercel, Git |
| 📚 Learning | AWS, dbt |

📊 GitHub Stats

⏱️ Weekly Coding Activity

Real-time stats powered by WakaTime, tracking every line of code I write.

[Image: WakaTime Stats]

📈 Contribution Trend

🐍 Contribution Snake

[Image: GitHub contribution grid snake animation]

🤝 Let's Connect!

I'm a 7th-semester Computer Engineering student at ESPOL actively looking for Junior Data Engineer or Data Analyst roles where I can contribute from day one.


Pinned

  1. Technology-trend-analysis-platform (Public)

    Data intelligence platform for technology trends across GitHub, StackOverflow, and Reddit using Python ETL, Pandera quality gates, DuckDB trend engine, and Flutter Web.

    Dart

  2. Analisis-Ping-Pong (Public)

    Automated statistical analysis pipeline using R to model ping pong serve precision with Negative Binomial distribution (309 observations). Includes interactive web dashboard.

    HTML 1

  3. Analisis-Cultivo-Arroz (Public)

    End-to-end data engineering platform for agricultural analytics. ETL pipeline (Python) + Interactive dashboard (Chart.js) with KPIs, financial analysis, and strategic insights.

    HTML

  4. easyparker-pwa (Public)

    EasyParker is a PWA for reserving parking in Guayaquil | Modes: Driver and Host | Real-time chat | Events with surge pricing | Ratings, etc. | React + TypeScript + Tailwind

    TypeScript

  5. eSports-Analytics-Dashboard (Public)

    End-to-end analytics dashboard for LATAM eSports with Python ETL, data validation, web visualization, and 2026 ML projections.

    Python

  6. RideFare-ETL-Pipeline (Public)

    Jupyter Notebook