Sumit Singh

Data Engineer.

Building scalable data pipelines and ETL systems.

Working with Python, SQL, Spark, and cloud data platforms.

Technical Skills

Python · SQL · Spark · PySpark · Databricks · AWS · MongoDB · Airflow · Kafka · Snowflake · Data Modelling

About Me

I’m a data engineering enthusiast with a strong foundation in Python and SQL, passionate about building scalable data pipelines and modern data platforms.

I work with tools like Apache Spark, Apache Airflow, Delta Lake, and dbt to design reliable ETL workflows. I also have experience building cloud-based data solutions on Amazon Web Services using services like Amazon S3 and AWS Glue.

I enjoy transforming raw data into structured, analytics-ready datasets that drive insights.

Projects

A showcase of data platforms and pipelines built for scale and reliability.

Data Lakehouse Pipeline

Built a scalable data lakehouse pipeline using Apache Spark and Delta Lake to ingest and transform batch datasets stored in AWS S3.

Apache Spark · AWS S3 · Delta Lake · Python

Automated ETL Pipeline

Designed a scheduled ETL pipeline, orchestrated with Apache Airflow, that extracts data from APIs, transforms it with Spark, and loads curated datasets into a data warehouse for reporting.

Python · Apache Airflow · Spark · SQL

Spotify ETL Pipeline

Built an ETL pipeline that extracts data from the Spotify API, transforms it with Python, loads the curated datasets into an AWS S3 bucket, and visualizes them in Power BI.

Python · Power BI · Athena · AWS S3

Cloud Data Platform

Built a cloud-based analytics pipeline on AWS using S3 for storage, Spark for processing, and Delta Lake for transactional data management, enabling efficient batch analytics workflows.

Spark · AWS S3 · Delta Lake · Python

Let's Connect

I'm currently open to new opportunities. Whether you have a question or just want to chat about data engineering, I'll do my best to get back to you!