Sumit Singh
Data Engineer.
Building scalable data pipelines and ETL systems.
Working with Python, SQL, Spark, and Cloud Data Platforms.
Technical Skills
About Me
I’m a data engineering enthusiast with a strong foundation in Python and SQL, passionate about building scalable data pipelines and modern data platforms.
I work with tools like Apache Spark, Apache Airflow, Delta Lake, and dbt to design reliable ETL workflows. I also have experience building cloud-based data solutions on Amazon Web Services using services like Amazon S3 and AWS Glue.
I enjoy transforming raw data into structured, analytics-ready datasets that drive insights.
Projects
A showcase of data platforms and pipelines built for scale and reliability.
Data Lakehouse Pipeline
Built a scalable data lakehouse pipeline using Apache Spark and Delta Lake to ingest and transform batch datasets stored in Amazon S3.
Automated ETL Pipeline
Designed a scheduled ETL pipeline orchestrated with Apache Airflow to extract data from APIs, perform transformations with Spark, and load curated datasets into a data warehouse for reporting.
Spotify ETL Pipeline
Built an ETL pipeline that extracts data from the Spotify API, transforms it with Python, loads the curated datasets into an Amazon S3 bucket, and visualizes the results in Power BI.
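As a rough illustration of the transform step in a pipeline like this, here is a minimal sketch in plain Python. The nested record shape (a `played_at` timestamp plus a `track` object with `name`, `artists`, `album`, and `duration_ms`) and the function names `flatten_play` and `transform` are assumptions made for the example, not the project's actual code.

```python
from datetime import datetime


def flatten_play(item):
    """Turn one nested play record (assumed shape) into a flat row."""
    track = item["track"]
    return {
        # Normalize a trailing "Z" so datetime.fromisoformat can parse it.
        "played_at": datetime.fromisoformat(item["played_at"].replace("Z", "+00:00")),
        "track_name": track["name"].strip(),
        "artists": ", ".join(artist["name"] for artist in track["artists"]),
        "album": track["album"]["name"],
        "duration_s": round(track["duration_ms"] / 1000, 1),
    }


def transform(items):
    """Flatten records, drop duplicate plays, and sort by play time."""
    seen = set()
    rows = []
    for item in items:
        row = flatten_play(item)
        if row["played_at"] in seen:
            continue  # same timestamp already kept, so skip the repeat
        seen.add(row["played_at"])
        rows.append(row)
    return sorted(rows, key=lambda r: r["played_at"])
```

The resulting list of flat dictionaries could then be written out as CSV or Parquet before the load step; that part is left out here because it depends on the storage setup.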
Cloud Data Platform
Built a cloud-based analytics pipeline on AWS using S3 for storage, Spark for processing, and Delta Lake for transactional data management, enabling efficient batch analytics workflows.
Let's Connect
I'm currently open to new opportunities. Whether you have a question or just want to chat about data engineering, I'll do my best to get back to you!