Data Engineer (Flink & Iceberg)

Full-Time / Bangalore / Data and AI

Role Overview 

We are seeking an experienced Data Engineer to build and maintain scalable, high-performance data pipelines and infrastructure for our next-generation data platform. The platform ingests and processes real-time and historical data from diverse industrial sources such as airport systems, sensors, cameras, and APIs. You will work closely with AI/ML engineers, data scientists, and DevOps to enable reliable analytics, forecasting, and anomaly-detection use cases.

Major Skills: Spark, Flink, Iceberg

Key Responsibilities 

  • Design and implement real-time (Kafka, Spark/Flink) and batch (Airflow, Spark) pipelines for high-throughput data ingestion, processing, and transformation (a minimal streaming sketch follows this list).
  • Develop data models and manage data lakes and warehouses (e.g., Delta Lake, Iceberg) to support both analytical and ML workloads.
  • Integrate data from diverse sources: IoT sensors, databases (SQL/NoSQL), REST APIs, and flat files. 
  • Ensure pipeline scalability, observability, and data quality through monitoring, alerting, validation, and lineage tracking. 
  • Collaborate with AI/ML teams to provision clean and ML-ready datasets for training and inference. 
  • Deploy, optimize, and manage pipelines and data infrastructure across on-premise and hybrid environments.
  • Participate in architectural decisions to ensure resilient, cost-effective, and secure data flows.
  • Contribute to infrastructure-as-code and automation for data deployment using Terraform, Ansible, or similar tools. 
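For illustration only (not a requirement of the role), here is a minimal sketch of the kind of Kafka-to-Iceberg streaming pipeline referenced above, using PyFlink's Table API. The topic, broker address, catalog name, schema, and warehouse path are all placeholders, and the Kafka and Iceberg connector JARs are assumed to be on the Flink classpath.

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Streaming TableEnvironment; Iceberg commits snapshots on
    # checkpoints, so checkpointing is enabled for the sink.
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.get_config().set("execution.checkpointing.interval", "60 s")

    # Placeholder Kafka source of JSON sensor readings.
    t_env.execute_sql("""
        CREATE TABLE sensor_readings (
            sensor_id  STRING,
            reading    DOUBLE,
            event_time TIMESTAMP(3),
            WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'sensor-readings',
            'properties.bootstrap.servers' = 'broker:9092',
            'scan.startup.mode' = 'latest-offset',
            'format' = 'json'
        )
    """)

    # Placeholder Iceberg catalog backed by a filesystem warehouse path.
    t_env.execute_sql("""
        CREATE CATALOG lake WITH (
            'type' = 'iceberg',
            'catalog-type' = 'hadoop',
            'warehouse' = 's3a://warehouse/path'
        )
    """)
    t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lake.sensors")
    t_env.execute_sql("""
        CREATE TABLE IF NOT EXISTS lake.sensors.readings (
            sensor_id  STRING,
            reading    DOUBLE,
            event_time TIMESTAMP(3)
        )
    """)

    # Launches a continuous job appending the stream to the Iceberg table.
    t_env.execute_sql("""
        INSERT INTO lake.sensors.readings
        SELECT sensor_id, reading, event_time FROM sensor_readings
    """)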

Qualifications & Required Skills 

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field. 
  • 4+ years in data engineering roles, with at least 2 years handling real-time or streaming pipelines. 
  • Strong programming skills in Python/Java and SQL.
  • Experience with Apache Kafka, Apache Spark, or Apache Flink for real-time and batch processing. 
  • Hands-on experience with Airflow, dbt, or other orchestration tools (see the minimal DAG sketch after this list). 
  • Familiarity with data modeling (OLAP/OLTP), schema evolution, and format handling (Parquet, Avro, ORC).
  • Experience deploying to hybrid/on-premise environments and cloud platforms (AWS/GCP/Azure). 
  • Proficient in working with data lakes/warehouses like Snowflake, BigQuery, Redshift, or Delta Lake. 
  • Knowledge of DevOps practices, Docker/Kubernetes, and Terraform or Ansible. 
  • Exposure to data observability, data cataloging, and quality tools (e.g., Great Expectations, OpenMetadata). 
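On the batch side, a minimal Airflow DAG that submits a nightly Spark job might look like the sketch below; the DAG id, schedule, paths, and spark-submit arguments are placeholders, and Airflow 2.4+ is assumed for the schedule parameter.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Placeholder nightly batch pipeline: aggregate yesterday's
    # readings with Spark, then run a validation step.
    with DAG(
        dag_id="nightly_sensor_batch",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # use schedule_interval on Airflow < 2.4
        catchup=False,
    ) as dag:
        aggregate = BashOperator(
            task_id="spark_aggregate_daily",
            bash_command=(
                "spark-submit --master yarn "
                "/opt/jobs/aggregate_daily.py --date {{ ds }}"
            ),
        )
        validate = BashOperator(
            task_id="validate_output",
            bash_command="python /opt/jobs/validate_daily.py --date {{ ds }}",
        )
        # Validation runs only after the Spark job succeeds.
        aggregate >> validate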

Good-to-Have 

  • Experience with time-series databases (e.g., InfluxDB, TimescaleDB) and sensor data. 
  • Prior experience in domains such as aviation, manufacturing, or logistics is a plus. 

Location: Bangalore (Hybrid)