Open to Data Engineer opportunities • US

NITTURI BALASUBRAMANYAM Data Engineer

I build reliable ETL and streaming systems, optimize pipeline performance, and deliver analytics-ready datasets. Strong focus on orchestration, data quality, and production troubleshooting.

View Projects GitHub LinkedIn

nitturi.balasubramanyam@gmail.com • +1 657-532-0248 • Bay Area, CA

Airflow • Spark • Python • SQL • AWS • Kafka • Data Quality • Warehousing

35%

ETL runtime reduction

20%

compute cost reduction

25%

accuracy improvement

Featured Projects

Production-style projects with clear architecture, tradeoffs, and measurable outcomes.

Automated ETL Pipeline Optimization

Airflow • Spark • SQL • Redshift • Docker

End-to-end ETL with orchestration, validation, Spark transformations, and warehouse loading. Includes performance tuning and failure recovery patterns.

Reduced pipeline runtime by 35% via query + Spark tuning
Cut compute cost by 20% (partitioning, joins, caching)
Improved accuracy by 25% with automated validation checks

View details → GitHub →

Real-Time Financial Data Pipeline

Kafka • Spark Streaming • Lakehouse • Monitoring

Streaming pipeline for transaction events with event-time handling, deduplication, and checkpoint-based recovery.

Handled late/out-of-order events using event-time logic
Prevented duplicates via idempotent writes
Recovered reliably using checkpoints
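
A minimal sketch of how these three ideas can fit together, in plain Python rather than Spark. It is illustrative only and not the project code: the `process_batch` helper, the event fields, and the 60-second lateness window are all assumptions.

```python
def process_batch(sink, checkpoint, events, allowed_lateness=60):
    """Apply one micro-batch by event time and return the new checkpoint.

    sink        -- dict standing in for a keyed, durable store (hypothetical)
    checkpoint  -- dict holding the highest event time committed so far
    """
    max_event_time = checkpoint["max_event_time"]
    for event in events:
        # Watermark: anything older than (max event time - lateness) is dropped.
        if event["event_time"] < max_event_time - allowed_lateness:
            continue
        # Keyed upsert: writing the same event_id twice leaves one record.
        sink[event["event_id"]] = event
        max_event_time = max(max_event_time, event["event_time"])
    return {"max_event_time": max_event_time}


sink = {}
checkpoint = {"max_event_time": 0}
batch = [
    {"event_id": "t1", "event_time": 100, "amount": 25.0},
    {"event_id": "t3", "event_time": 200, "amount": 40.0},
    {"event_id": "t2", "event_time": 150, "amount": 10.0},  # out of order, inside the window
    {"event_id": "t0", "event_time": 90, "amount": 5.0},    # older than the watermark
]

# First attempt writes to the sink but "crashes" before the checkpoint is saved.
process_batch(sink, checkpoint, batch)
# Recovery replays the same batch from the last saved checkpoint.
new_checkpoint = process_batch(sink, checkpoint, batch)
```

Because writes are keyed by `event_id`, the replay after the simulated crash leaves the sink with one record per transaction.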

GitHub →

Customer 360 Data Platform

Ingestion • Incremental • Spark • Warehouse

Unified customer model built from multiple systems with incremental processing and quality gates.

Unified CRM + payments + support into one customer model
Improved runtime using incremental processing strategy
Added quality checks (null/uniqueness/freshness)
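
A rough sketch of what a null / uniqueness / freshness gate can look like on a batch of records. The function name, column names, and the 24-hour freshness threshold are assumptions for illustration, not the platform's actual rules.

```python
from datetime import datetime, timedelta, timezone


def run_quality_gates(rows, key, required, ts_field, max_age, now):
    """Return check name -> passed (bool) for a list of row dicts."""
    if not rows:
        # An empty batch has nothing null or duplicated, but it is not fresh.
        return {"null": True, "uniqueness": True, "freshness": False}
    # Null check: every required column is populated in every row.
    null_ok = all(row.get(col) is not None for row in rows for col in required)
    # Uniqueness check: the key column has no repeated values.
    keys = [row.get(key) for row in rows]
    unique_ok = len(keys) == len(set(keys))
    # Freshness check: the newest record is recent enough.
    newest = max(row[ts_field] for row in rows)
    fresh_ok = (now - newest) <= max_age
    return {"null": null_ok, "uniqueness": unique_ok, "freshness": fresh_ok}


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
rows = [
    {"customer_id": 1, "email": "a@example.com", "updated_at": now - timedelta(hours=2)},
    {"customer_id": 2, "email": None, "updated_at": now - timedelta(hours=30)},
]
results = run_quality_gates(
    rows,
    key="customer_id",
    required=["customer_id", "email"],
    ts_field="updated_at",
    max_age=timedelta(hours=24),
    now=now,
)
```

In a pipeline, a failed gate would typically stop the load before the batch reaches the warehouse.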

GitHub →

Data Observability Framework

DQ Checks • Metrics • Alerts • Dashboards

Framework to detect “pipeline succeeded but data is wrong” using anomaly signals and freshness metrics.

Row-count, freshness, and schema drift monitoring
Alerting workflow for fast RCA
Reduced time-to-detect for data issues
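
As an illustration of the row-count and schema-drift signals, here is a small sketch using only the standard library. The z-score threshold, helper names, and sample schema are assumptions, not the framework's actual configuration.

```python
from statistics import mean, stdev


def row_count_anomaly(history, today, z_threshold=3.0):
    """Flag today's row count if it is far from the recent history (z-score)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold


def schema_drift(expected, observed):
    """Compare column -> type mappings and report what changed."""
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "type_changed": sorted(c for c in shared if expected[c] != observed[c]),
    }


history = [1000, 1020, 980, 1010, 990]
drop_detected = row_count_anomaly(history, today=400)
normal_day = row_count_anomaly(history, today=1005)

drift = schema_drift(
    expected={"id": "int", "amount": "float", "ts": "timestamp"},
    observed={"id": "int", "amount": "string", "currency": "string"},
)
```

Both checks can fire even when the pipeline run itself reports success, which is the failure mode the framework targets.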

GitHub →

LLM-Powered Data Quality Monitor

Airflow • Claude API • AWS S3 • Python • Docker • Slack

AI-augmented pipeline that detects anomalies in transaction data and delivers plain-English root cause analysis to Slack — automatically, without manual log triage.

Integrated Anthropic Claude API to explain failures across 9 custom DQ rules — root cause, business impact, and fix recommendation in under 2 seconds
Orchestrated full pipeline with Airflow on Docker — S3 ingest → validation → LLM explanation → Slack alert, running daily with zero manual intervention
Reduced mean-time-to-understand data issues by replacing raw JSON alerts with plain-English AI-generated summaries delivered directly to #data-alerts
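
A sketch of the step between validation and the LLM call: turning failed rule results into a prompt. It deliberately stops before any API request. The helper name, rule names, and result fields are hypothetical and not taken from the project.

```python
def build_explanation_prompt(dataset, failures):
    """Format failed DQ rule results as a prompt asking for a plain-English summary."""
    lines = [
        f"Dataset: {dataset}",
        "The following data quality rules failed:",
    ]
    for failure in failures:
        lines.append(
            f"- {failure['rule']}: {failure['failed_rows']} of "
            f"{failure['total_rows']} rows ({failure['detail']})"
        )
    lines.append(
        "Explain the likely root cause, the business impact, "
        "and a recommended fix in plain English."
    )
    return "\n".join(lines)


# Hypothetical rule results, shaped the way a validation step might emit them.
failures = [
    {"rule": "negative_amount", "failed_rows": 42, "total_rows": 10000,
     "detail": "amount < 0"},
    {"rule": "missing_currency", "failed_rows": 7, "total_rows": 10000,
     "detail": "currency is null"},
]
prompt = build_explanation_prompt("transactions", failures)
print(prompt)
```

The resulting text would be sent to the model, and the model's reply posted to Slack in place of the raw JSON alert.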

GitHub →


Certifications

Verified credentials. Click to view.

Azure Data Engineer Associate

Microsoft

View →

Google Data Analytics

Google

View →

Power BI Data Analyst

Microsoft

View →

Skills

Focused skill set aligned to modern data engineering roles.

Core

Python, SQL, Spark/PySpark, Airflow, Kafka, ML workflows, Data Modeling, Claude API, AI, ETL/ELT, Data Quality

Cloud & Platforms

AWS (S3, Athena, EC2), Warehousing (Redshift), Azure Data Factory, Azure Synapse, Azure Data Lake, CI/CD, Docker, Monitoring & Alerting

Experience

Impact-focused work with production ownership.

Data Engineer — Intuit (2024 - present)

California, United States

Production pipelines

Designed and maintain scalable data pipelines that support machine learning and analytics use cases. I work primarily with Python and Spark to ingest, clean, and process large volumes of structured, semi-structured, and unstructured data used in supervised learning workflows. My responsibilities include preparing training datasets, implementing data validation and quality checks, and ensuring datasets remain consistent and reproducible across model iterations. I collaborate closely with ML and analytics teams to define data requirements and quality standards. I have also applied LLM-assisted analysis to improve pipeline debugging and data issue triage, enabling faster human review and more reliable production workflows.

Data Engineer — Ameriprise Financial (2023 - 2024)

Minnesota, United States

Production pipelines

Built and scaled enterprise data pipelines that supported analytics and downstream ML use cases. I designed and implemented ETL workflows using Azure Data Factory and Python to ingest, transform, and standardize data from multiple sources. My work focused on data cleansing, validation, and monitoring to ensure datasets were reliable and ready for analytical and model-driven consumption. I partnered with analytics and BI teams to translate business requirements into well-defined, reusable datasets and optimized SQL transformations to improve pipeline performance and data availability.

Data Engineer — Michael Page (2019 - 2022)

Hyderabad, India

Production pipelines

Developed and maintained Spark-based ETL pipelines to ingest and transform data from Oracle, SQL Server, and Teradata into HDFS, supporting large-scale analytics workloads. Worked closely with business stakeholders to translate requirements into analytics-ready datasets, eliminating 17+ hours per week of manual reporting. Implemented reliable database ingestion using Sqoop and orchestrated workflows with Oozie to ensure consistent and timely data delivery. Tuned Spark jobs for large-scale transformations to improve execution performance and job stability, and prepared curated datasets to support BI dashboards and operational reporting used by cross-functional teams.

Contact

Best way to reach me: email. I respond quickly.

Let’s connect

Data Engineering • ETL • Streaming • Airflow • Spark • AI

Email LinkedIn