Role Overview
We are seeking an experienced Data Engineer to build and maintain scalable, high-performance data pipelines and infrastructure for our next-generation data platform. The platform ingests and processes real-time and historical data from diverse industrial sources such as airport systems, sensors, cameras, and APIs. You will work closely with AI/ML engineers, data scientists, and DevOps engineers to enable reliable analytics, forecasting, and anomaly-detection use cases.
Major Skills: Spark, Flink, Iceberg
Key Responsibilities
- Design and implement real-time (Kafka, Spark/Flink) and batch (Airflow, Spark) pipelines for high-throughput data ingestion, processing, and transformation.
- Develop data models and manage data lakes and warehouses (e.g., Delta Lake, Apache Iceberg) to support both analytical and ML workloads.
- Integrate data from diverse sources: IoT sensors, databases (SQL/NoSQL), REST APIs, and flat files.
- Ensure pipeline scalability, observability, and data quality through monitoring, alerting, validation, and lineage tracking.
- Collaborate with AI/ML teams to provision clean and ML-ready datasets for training and inference.
- Deploy, optimize, and manage pipelines and data infrastructure across on-premises and hybrid environments.
- Participate in architectural decisions to ensure resilient, cost-effective, and secure data flows.
- Contribute to infrastructure-as-code and automation for data deployment using Terraform, Ansible, or similar tools.
Qualifications & Required Skills
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 4+ years in data engineering roles, with at least 2 years handling real-time or streaming pipelines.
- Strong programming skills in Python/Java and SQL.
- Experience with Apache Kafka for streaming ingestion, and with Apache Spark or Apache Flink for real-time and batch processing.
- Hands-on experience with Airflow, dbt, or similar orchestration and transformation tools.
- Familiarity with data modeling (OLAP/OLTP), schema evolution, and file formats such as Parquet, Avro, and ORC.
- Experience deploying to on-premises, hybrid, and cloud (AWS/GCP/Azure) environments.
- Proficiency with data lakes and warehouses such as Snowflake, BigQuery, Redshift, or Delta Lake.
- Knowledge of DevOps practices, including Docker/Kubernetes and Terraform or Ansible.
- Exposure to data observability, data cataloging, and quality tools (e.g., Great Expectations, OpenMetadata).
Good-to-Have
- Experience with time-series databases (e.g., InfluxDB, TimescaleDB) and sensor data.
- Prior experience in domains such as aviation, manufacturing, or logistics.
Location: Bangalore (Hybrid)