Senior Site Reliability Engineer

Posted:
8/24/2025, 12:26:34 PM

Location(s):
Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, Malaysia ⋅ Wilayah Persekutuan Kuala Lumpur, Malaysia

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering


Job Description

Location: Kuala Lumpur

About AirAsia MOVE
AirAsia MOVE is a leading ASEAN-focused budget travel OTA, part of the Capital A Group. We deliver customer-centric travel solutions by combining innovation with operational excellence. Our goal is to create seamless, reliable, and delightful journeys for travelers across the region.

About the Role

We’re looking for a Senior Site Reliability Engineer to help scale and stabilize our cloud infrastructure and reliability practices as we grow across multiple lines of business.

You’ll lead key initiatives around:

  • Cloud architecture modernization.

  • Multi-region reliability.

  • Observability and incident response.

  • Reducing toil through automation and self-service.

This is a hands-on technical role in which you’ll work across platform, SRE, and application teams to build scalable systems that are resilient, cost-aware, and developer-friendly.

What You’ll Do

  • Design and implement secure, scalable infrastructure on Google Cloud Platform (GCP).

  • Lead efforts to build and evolve MOVE’s GCP Landing Zone, including Shared VPC, org structure, IAM, and policy guardrails.

  • Build and improve multi-region architectures for high availability and disaster recovery.

  • Drive infrastructure automation using Terraform, CI/CD, and GitOps practices.

  • Improve observability across teams by standardizing monitoring, tracing, and alerting.

  • Collaborate on incident response and postmortems to reduce MTTR and build resilience.

  • Enforce tagging, FinOps controls, and security policies across GCP projects.

  • Contribute to platform engineering initiatives and developer self-service tools.

What We’re Looking For

  • 5+ years in SRE, DevOps, or cloud infrastructure roles.

  • Solid experience with GCP (or a similar cloud provider), Terraform, and Kubernetes (GKE).

  • Strong hands-on experience in automation and multi-region architecture design.

  • Experience in networking (VPCs, NAT, PSC), IAM, and cloud-native security.

  • Proven ability to debug and support production systems under pressure.

  • Familiarity with monitoring and tracing tools such as Cloud Monitoring, OpenTelemetry, and SigNoz.

  • Exposure to AI-based or anomaly-detection techniques for alert tuning or reliability insights.

  • Clear communicator who works well with developers, product, and other infra teams.