→data_engineer.py
I build reliable ETL and streaming systems, optimize pipeline performance, and deliver analytics-ready datasets. Strong focus on orchestration, data quality, and production troubleshooting.
// projects
Production-style projects with clear architecture, tradeoffs, and measurable outcomes.
End-to-end ETL pipeline with orchestration, validation, Spark transformations, and warehouse loading, plus performance tuning and failure-recovery patterns.
AI-augmented pipeline that detects anomalies in transaction data and automatically posts plain-English root-cause analysis to Slack, with no manual log triage.
Streaming pipeline for transaction events with event-time handling, deduplication, and checkpoint-based recovery.
Unified customer model built from multiple source systems, with incremental processing and quality gates.
Framework to detect "pipeline succeeded but data is wrong" using anomaly signals and freshness metrics.
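The "succeeded but wrong" failure mode is easiest to see with a concrete check. Below is a minimal sketch, assuming two common signals: a freshness test on the newest loaded event and a z-score test on the run's row count against recent history. It is an illustration only, not the framework's actual code; the function names, the 2-hour freshness threshold, and the 3-sigma cutoff are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def check_freshness(latest_event_time, now, max_lag):
    """Return True if the newest loaded record is within the allowed lag (hypothetical helper)."""
    return now - latest_event_time <= max_lag


def is_row_count_anomaly(history, today, z_threshold=3.0):
    """Flag today's row count if it sits more than z_threshold sample std devs from the history mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # flat history: any change is suspicious
    return abs(today - mu) / sigma > z_threshold


def evaluate_run(latest_event_time, now, history, today):
    """Collect data-quality alerts for a run whose job status already reported success."""
    alerts = []
    # Assumed SLA: data older than 2 hours counts as stale.
    if not check_freshness(latest_event_time, now, timedelta(hours=2)):
        alerts.append("stale_data")
    if is_row_count_anomaly(history, today):
        alerts.append("row_count_anomaly")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
    history = [10_120, 9_980, 10_050, 10_210, 9_940]
    # The job "succeeded", but it loaded far fewer rows and the newest event is 5 hours old.
    print(evaluate_run(now - timedelta(hours=5), now, history, today=2_300))
    # → ['stale_data', 'row_count_anomaly']
```

In practice the history would come from stored run metadata rather than a hard-coded list, and alerts would be routed to a channel such as Slack instead of printed.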
// skills
Focused skill set aligned to modern data engineering roles.
// experience
Impact-focused work with production ownership.
2024 – present
Intuit
California, US
// current
Design and maintain scalable data pipelines supporting machine learning and analytics use cases. Work primarily with Python and Spark to ingest, clean, and process large volumes of structured, semi-structured, and unstructured data used in supervised learning workflows. Responsibilities include preparing training datasets, implementing data validation and quality checks, and keeping datasets consistent and reproducible across model iterations. Apply LLM-assisted analysis to pipeline debugging and data-issue triage, enabling faster human review and more reliable production workflows.
2023 – 2024
Ameriprise Financial
Minnesota, US
Built and scaled enterprise data pipelines supporting analytics and downstream ML use cases. Designed and implemented ETL workflows using Azure Data Factory and Python to ingest, transform, and standardize data from multiple sources. Focused on data cleansing, validation, and monitoring to ensure datasets were reliable and ready for analytical consumption. Optimized SQL transformations to improve pipeline performance and data availability.
2019 – 2022
Michael Page
Hyderabad, India
Developed and maintained Spark-based ETL pipelines to ingest and transform data from Oracle, SQL Server, and Teradata into HDFS, supporting large-scale analytics workloads. Eliminated 17+ hours per week of manual reporting by translating business requirements into analytics-ready datasets. Implemented reliable database ingestion using Sqoop and orchestrated workflows with Oozie. Tuned Spark jobs for large-scale transformations and prepared curated datasets for BI dashboards used by cross-functional teams.
// certifications
Verified credentials.
// contact
Actively looking for Data Engineer roles in the Bay Area. The best way to reach me is email; I respond quickly.