saivarunkotha/README.md

Hi, I'm Saivarun Kotha 👋

MS Data Science @ UMBC | Data Analyst · Data Engineer · ML Engineer
📍 Bellevue, WA  |  📫 LinkedIn


About me

I'm a data scientist with hands-on experience building end-to-end ML pipelines, analyzing large-scale datasets, and deploying predictive models that solve real business problems. I'm currently completing my Master's in Data Science at UMBC, with a focus on classification, prediction, and big data processing.

I enjoy turning messy, raw data into clear decisions, whether that's through a well-tuned model, a clean SQL query, or an interactive dashboard.


🛠 Tech stack

Languages: Python, SQL, R

Machine Learning & AI: scikit-learn, TensorFlow, PyTorch, XGBoost

Data Engineering & Big Data: Apache Spark, Hadoop, MapReduce, ETL Pipelines

Visualization & BI: Tableau, Power BI, Matplotlib, Seaborn, Plotly

Tools & Deployment: Streamlit, Jupyter, Git, Pandas, NumPy


🚀 Featured projects

🔵 Fake News Detection Using SVM & NLP (Capstone)

UMBC Data Science Capstone: Built a classification model to predict outcomes from structured real-world data. Applied feature engineering and model comparison (Logistic Regression, Random Forest, XGBoost), and deployed an interactive Streamlit app for live predictions.
Python scikit-learn XGBoost Streamlit Pandas
→ View project


🟠 Big Data Engineering

Big Data Processing with Spark & Hadoop: Designed and implemented MapReduce and Spark pipelines to process and analyze large-scale datasets, demonstrating distributed computing fundamentals on real data workloads.
Apache Spark Hadoop MapReduce Python Jupyter
→ View project


🟢 Customer Churn Prediction

End-to-end ML project: Merged 6 data sources, engineered features, and trained Logistic Regression, Random Forest, and XGBoost models, achieving an AUC-ROC of 0.9903. Added SHAP explainability and deployed as a live Streamlit web app.
Python scikit-learn XGBoost SHAP Streamlit Pandas
→ View project  |  → Live Demo


📈 What I'm working on

  • Expanding SQL + Python analytics portfolio with business-focused EDA projects
  • Exploring FastAPI for ML model serving

📊 GitHub stats

[Image: Saivarun's GitHub stats]


📫 Let's connect

If you're hiring for data analyst, data engineer, or ML engineer roles, I'd love to connect.

LinkedIn GitHub

Popular repositories

  1. 603-Platforms-for-Bigdata-Processing Public

    MapReduce and Apache Spark pipelines for large-scale data processing. UMBC Big Data Platforms course.

    Jupyter Notebook

  2. UMBC-DATA606-Capstone Public

    NLP-based fake news detection using SVM and TF-IDF, deployed as a Streamlit web app. UMBC Data Science Capstone.

    Jupyter Notebook

  3. Excel_project Public

    Exploratory data analysis and business insights using Excel and Python. Includes data cleaning, pivot tables, and visualizations.

  4. saivarunkotha Public

  5. customer-churn-prediction Public

    End-to-end customer churn prediction using ML: EDA, feature engineering, XGBoost, and a Streamlit app.

    Jupyter Notebook