Role Overview
We are seeking an experienced Data Engineer to build and maintain scalable, high-performance data pipelines and infrastructure for our next-generation data platform. The platform ingests and processes real-time and historical data from diverse industrial sources such as airport systems, sensors, cameras, and APIs. You will work closely with AI/ML engineers, data scientists, and DevOps engineers to enable reliable analytics, forecasting, and anomaly-detection use cases.
Major Skills: Spark, Flink, Iceberg
Key Responsibilities
- Design and implement real-time (Kafka, Spark/Flink) and batch (Airflow, Spark) pipelines for high-throughput data ingestion, processing, and transformation.
- Develop data models and manage data lakes and warehouses (e.g., Delta Lake, Apache Iceberg) to support both analytical and ML workloads.
- Integrate data from diverse sources: IoT sensors, databases (SQL/NoSQL), REST APIs, and flat files.
- Ensure pipeline scalability, observability, and data quality through monitoring, alerting, validation, and lineage tracking.
- Collaborate with AI/ML teams to provision clean and ML-ready datasets for training and inference.
- Deploy, optimize, and manage pipelines and data infrastructure across on-premises and hybrid environments.
- Participate in architectural decisions to ensure resilient, cost-effective, and secure data flows.
- Contribute to infrastructure-as-code and automation for data deployment using Terraform, Ansible, or similar tools.
Qualifications & Required Skills
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 4+ years in data engineering roles, with at least 2 years handling real-time or streaming pipelines.
- Strong programming skills in Python/Java and SQL.
- Experience with Apache Kafka for streaming ingestion, and with Apache Spark or Apache Flink for real-time and batch processing.
- Hands-on experience with Airflow, dbt, or similar orchestration and transformation tools.
- Familiarity with data modeling (OLAP/OLTP), schema evolution, and file formats such as Parquet, Avro, and ORC.
- Experience deploying to on-premises, hybrid, and cloud (AWS/GCP/Azure) environments.
- Proficiency with data lakes and warehouses such as Snowflake, BigQuery, Redshift, or Delta Lake.
- Knowledge of DevOps practices, including Docker/Kubernetes and Terraform or Ansible.
- Exposure to data observability, data cataloging, and quality tools (e.g., Great Expectations, OpenMetadata).
Good-to-Have
- Experience with time-series databases (e.g., InfluxDB, TimescaleDB) and sensor data.
- Prior experience in domains such as aviation, manufacturing, or logistics.
Location: Bangalore (Hybrid)