Main responsibilities:
Monitor data pipelines and platform health
Manage incidents and operational issues across the data platform
Perform root-cause analysis and implement corrective actions
Support reliability, performance, and cost optimization
Collaborate with engineering teams to improve platform robustness
Technical Skills & Technology Landscape:
Cloud data platform operations
Pipeline monitoring, alerting, and observability
Incident management and operational support
Performance and cost optimization (FinOps mindset)
1. Core Technical
Platform observability: Azure Monitor, Databricks monitoring, pipeline alerting (PagerDuty, OpsGenie)
Incident management: triage, escalation, RCA documentation, post-mortems
Infrastructure-as-Code: Terraform, Bicep, ARM templates for Azure data resources
Kubernetes-based orchestration: AKS, containerised data workloads
CI/CD pipeline operations: GitHub Actions, Azure DevOps release pipelines
FinOps and cloud cost governance: rightsizing, budget alerts, Databricks DBU tracking
Data platform reliability engineering: SLAs, SLOs, error budgets
2. Certifications
Microsoft Certified: Azure Administrator Associate — Preferred
Databricks Certified Associate Developer — Strongly Preferred
HashiCorp Terraform Associate — Strongly Preferred
Certified Kubernetes Administrator (CKA) — Preferred
Microsoft Certified: DevOps Engineer Expert — Preferred
3. Industry & Business Knowledge
Enterprise cloud platform operations in regulated manufacturing environments
Understanding of SAP integration dependencies and data pipeline criticality
Operational SLA frameworks for finance, supply chain, and operations reporting
Security baseline compliance: Azure Policy, Defender for Cloud, data platform hardening
Change management and release governance in enterprise settings
4. Behavioral & Leadership
Operational discipline: follows process, documents decisions, avoids hero culture
Continuous improvement orientation — turns incidents into systemic fixes
Clear and structured incident communication to business and technical stakeholders
Resilience under pressure — calm, systematic during platform outages
Proactive risk identification: monitors before things break
Hybrid Work Model: Flexibility to work from home and in the office, in line with company policy, helping you achieve a healthy work-life balance.
Ticket Restaurant: Enjoy a daily meal allowance to support your well-being.
Flexible Compensation: Kindergarten and transport.
30 Working Days of Holiday: Ample time off to relax and recharge.
Language Lessons: Access to language lessons to help you grow both personally and professionally.
Medical Insurance: 60% company-subsidized medical insurance for employees, with the option to extend coverage to family members at a highly competitive rate.
Open and Modern Office Environment: Work in a collaborative, innovative, and comfortable space designed for your success.