Lead Data Engineer

Full-Time / Bangalore, Noida / Data and AI

Role Overview

We are seeking an experienced Lead Data Engineer to build and maintain scalable, high-performance data pipelines and infrastructure for our next-generation data platform. The platform ingests and processes real-time and historical data from diverse industrial sources such as airport systems, sensors, cameras, and APIs. You will work closely with AI/ML engineers, data scientists, and DevOps to enable reliable analytics, forecasting, and anomaly detection use cases.

Key Responsibilities

  • Design and implement real-time (Kafka, Spark/Flink) and batch (Airflow, Spark) pipelines for high-throughput data ingestion, processing, and transformation.
  • Develop data models and manage data lakes and warehouses (Delta Lake, Iceberg, etc.) to support both analytical and ML workloads.
  • Integrate data from diverse sources: IoT sensors, databases (SQL/NoSQL), REST APIs, and flat files.
  • Ensure pipeline scalability, observability, and data quality through monitoring, alerting, validation, and lineage tracking.
  • Collaborate with AI/ML teams to provide clean, ML-ready datasets for training and inference.
  • Deploy, optimize, and manage pipelines and data infrastructure across on-premises and hybrid environments.
  • Participate in architectural decisions to ensure resilient, cost-effective, and secure data flows.
  • Contribute to infrastructure-as-code and automation for data deployment using Terraform, Ansible, or similar tools.

Qualifications & Required Skills

  • Bachelor’s or Master’s in Computer Science, Engineering, or a related field.
  • 7+ years in data engineering roles, with at least 2 years handling real-time or streaming pipelines.
  • Strong programming skills in Python/Java and SQL.
  • Experience with Apache Kafka, Apache Spark, or Apache Flink for real-time and batch processing.
  • Hands-on experience with Airflow, dbt, or other orchestration and transformation tools.
  • Familiarity with data modeling (OLAP/OLTP), schema evolution, and format handling (Parquet, Avro, ORC).
  • Experience with deployments on hybrid/on-prem and cloud platforms (AWS/GCP/Azure).
  • Proficiency with data lakes/warehouses such as Snowflake, BigQuery, Redshift, or Delta Lake.
  • Knowledge of DevOps practices and tooling: Docker/Kubernetes, Terraform, or Ansible.
  • Exposure to data observability, data cataloging, and quality tools (e.g., Great Expectations, OpenMetadata).

Good-to-Have

  • Experience with time-series databases (e.g., InfluxDB, TimescaleDB) and sensor data.
  • Prior experience in domains such as aviation, manufacturing, or logistics.