Open to Data Engineer opportunities • US

NITTURI BALASUBRAMANYAM — Data Engineer

I build reliable ETL and streaming systems, optimize pipeline performance, and deliver analytics-ready datasets. Strong focus on orchestration, data quality, and production troubleshooting.

Airflow • Spark • Python • SQL • AWS • Kafka • Data Quality • Warehousing
35% — ETL runtime reduction
20% — compute cost reduction
25% — accuracy improvement

Featured Projects

Production-style projects with clear architecture, tradeoffs, and measurable outcomes.

Automated ETL Pipeline Optimization

Airflow • Spark • SQL • Redshift • Docker

End-to-end ETL with orchestration, validation, Spark transformations, and warehouse loading. Includes performance tuning and failure recovery patterns.

  • Reduced pipeline runtime by 35% via query + Spark tuning
  • Cut compute cost by 20% (partitioning, joins, caching)
  • Improved accuracy by 25% with automated validation checks
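A minimal sketch of the kind of automated validation check the pipeline runs before warehouse loading. The column names, rule names, and sample batch below are illustrative, not the project's actual configuration:

```python
# Hypothetical row-level validation sketch; rules and columns are illustrative.

def validate_batch(rows, required_cols=("order_id", "amount", "ts")):
    """Run simple row-level checks and return a summary of failures."""
    failures = {"missing_column": 0, "null_key": 0, "negative_amount": 0}
    for row in rows:
        # Structural check: every required column must be present.
        if any(col not in row for col in required_cols):
            failures["missing_column"] += 1
            continue
        # Key integrity: the join key must not be null.
        if row["order_id"] is None:
            failures["null_key"] += 1
        # Domain check: monetary amounts must be non-negative.
        if row["amount"] is not None and row["amount"] < 0:
            failures["negative_amount"] += 1
    failures["total_rows"] = len(rows)
    return failures

batch = [
    {"order_id": 1, "amount": 10.0, "ts": "2024-01-01"},
    {"order_id": None, "amount": 5.0, "ts": "2024-01-01"},
    {"order_id": 3, "amount": -2.0, "ts": "2024-01-01"},
]
summary = validate_batch(batch)
```

In the real pipeline a failing summary would block the load or route the batch to quarantine rather than silently publishing bad rows.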

Real-Time Financial Data Pipeline

Kafka • Spark Streaming • Lakehouse • Monitoring

Streaming pipeline for transaction events with event-time handling, deduplication, and checkpoint-based recovery.

  • Handled late/out-of-order events using event-time logic
  • Prevented duplicates via idempotent writes
  • Recovered reliably using checkpoints
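The idempotent-write pattern behind the second bullet can be sketched in a few lines: every event carries a unique key, and the sink applies writes as upserts on that key, so a replay after checkpoint recovery overwrites instead of duplicating. The dict-backed sink and field names here are a stand-in for the real lakehouse table:

```python
# Sketch of idempotent writes; the dict sink stands in for a keyed table.

def upsert_events(store, events):
    """Apply events keyed by event_id; replayed events overwrite, never duplicate."""
    for ev in events:
        store[ev["event_id"]] = ev
    return store

sink = {}
batch = [{"event_id": "e1", "amount": 100}, {"event_id": "e2", "amount": 50}]
upsert_events(sink, batch)
# Simulate a replay after checkpoint restart: the same batch arrives again.
upsert_events(sink, batch)
```

In Spark Structured Streaming the same guarantee typically comes from `foreachBatch` with a MERGE/upsert into the target table, combined with checkpointed offsets.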

Customer 360 Data Platform

Ingestion • Incremental • Spark • Warehouse

Unified customer model built from multiple systems with incremental processing and quality gates.

  • Unified CRM + payments + support into one customer model
  • Improved runtime using incremental processing strategy
  • Added quality checks (null/uniqueness/freshness)
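The incremental strategy is essentially watermark-based extraction: persist the high-water mark of the last run and pull only rows changed since then. A small sketch, with illustrative table and column names:

```python
# Watermark-based incremental extraction sketch; columns are illustrative.

def incremental_extract(rows, last_watermark):
    """Return rows updated after last_watermark, plus the new watermark."""
    fresh = [r for r in rows if r["updated_at"] > last_watermark]
    # Advance the watermark only if new rows arrived.
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

crm_rows = [
    {"customer_id": 1, "updated_at": "2024-01-01"},
    {"customer_id": 2, "updated_at": "2024-01-03"},
]
fresh, watermark = incremental_extract(crm_rows, last_watermark="2024-01-02")
```

Processing only the changed slice is what drives the runtime improvement versus full-table reloads.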

Data Observability Framework

DQ Checks • Metrics • Alerts • Dashboards

Framework to detect “pipeline succeeded but data is wrong” using anomaly signals and freshness metrics.

  • Row-count, freshness, and schema drift monitoring
  • Alerting workflow for fast RCA
  • Reduced time-to-detect for data issues
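The "pipeline succeeded but data is wrong" detector boils down to comparing today's signals against a baseline. A compact sketch with hypothetical thresholds (a 50% row-count drop, a 2-hour freshness budget):

```python
# Anomaly-signal sketch; thresholds and inputs are hypothetical.
from datetime import datetime, timedelta

def check_health(row_count, history, latest_ts, now,
                 max_lag_hours=2, drop_pct=0.5):
    """Compare today's load against a trailing baseline and freshness budget."""
    alerts = []
    baseline = sum(history) / len(history)
    # Row-count anomaly: load is far below the trailing average.
    if row_count < baseline * drop_pct:
        alerts.append("row_count_anomaly")
    # Freshness: newest record is older than the allowed lag.
    if now - latest_ts > timedelta(hours=max_lag_hours):
        alerts.append("stale_data")
    return alerts

alerts = check_health(
    row_count=100,
    history=[1000, 1100, 950],
    latest_ts=datetime(2024, 1, 2, 6, 0),
    now=datetime(2024, 1, 2, 12, 0),
)
```

Schema-drift monitoring follows the same shape: snapshot the column list each run and alert on any diff from the previous snapshot.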

LLM-Powered Data Quality Monitor

Airflow • Claude API • AWS S3 • Python • Docker • Slack

AI-augmented pipeline that detects anomalies in transaction data and delivers plain-English root cause analysis to Slack — automatically, without manual log triage.

  • Integrated Anthropic Claude API to explain failures across 9 custom DQ rules — root cause, business impact, and fix recommendation in under 2 seconds
  • Orchestrated full pipeline with Airflow on Docker — S3 ingest → validation → LLM explanation → Slack alert, running daily with zero manual intervention
  • Reduced mean-time-to-understand data issues by replacing raw JSON alerts with plain-English AI-generated summaries delivered directly to #data-alerts
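The LLM-explanation step can be sketched as two small formatting functions: one builds the prompt from a raw DQ failure record, the other wraps the model's reply into a Slack message. The rule name, failure schema, and message format below are illustrative, and the Claude API call itself is stubbed out:

```python
# Hypothetical alert-formatting sketch; the rule name, failure schema, and
# Slack format are illustrative, and the Claude API reply is stubbed.

def build_prompt(failure):
    """Turn a raw DQ failure record into a plain-English explanation request."""
    return (
        "Explain this data-quality failure in plain English. "
        "Include root cause, business impact, and a fix recommendation.\n"
        f"Rule: {failure['rule']}\nDetails: {failure['details']}"
    )

def format_slack_alert(failure, explanation):
    """Wrap the model's explanation into a Slack-ready message."""
    return f":rotating_light: DQ rule `{failure['rule']}` failed\n{explanation}"

failure = {
    "rule": "null_rate_amount",
    "details": "12% nulls in amount (baseline 0.5%)",
}
prompt = build_prompt(failure)
# In the real pipeline the prompt goes to the Claude API; stub the reply here.
explanation = "Root cause: upstream schema change dropped the amount mapping."
message = format_slack_alert(failure, explanation)
```

In the Airflow DAG these would sit between the validation task and the Slack-notify task, replacing raw JSON payloads with the readable summary.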

Certifications

Verified credentials. Click to view.

Skills

Focused skill set aligned to modern data engineering roles.

Core

Python, SQL, Spark/PySpark, Airflow, Kafka, ML workflows, Data Modeling, Claude API, AI, ETL/ELT, Data Quality

Cloud & Platforms

AWS (S3, Athena, EC2), Warehousing (Redshift), Azure Data Factory, Azure Synapse, Azure Data Lake, CI/CD, Docker, Monitoring & Alerting

Experience

Impact-focused work with production ownership.

Data Engineer — Intuit (2024 - present)

California, United States
Production pipelines

Data Engineer — Ameriprise Financial (2023 - 2024)

Minnesota, United States
Production pipelines

Data Engineer — Michael Page (2019 - 2022)

Hyderabad, India
Production pipelines

Contact

Best way to reach me: email. I respond quickly.

Let’s connect
Data Engineering • ETL • Streaming • Airflow • Spark • AI