Staff Engineer - SRE

Posted:
9/19/2024, 10:33:18 PM

Location(s):
Bengaluru, Karnataka, India ⋅ Karnataka, India

Experience Level(s):
Expert or higher ⋅ Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Staff Engineer - Site Reliability Engineering (SRE)

Location - Bangalore

 

ABOUT US

Founded in 2014, Circles is a global technology company reimagining the telco industry with its SaaS platform - Circles X, helping telco operators launch and operate successful digital brands through its offerings.

Having pioneered a successful blueprint for disrupting the telco space in Singapore, Circles has since launched its own digital telco, Circles.Life, in Singapore, Taiwan and Australia. Circles has also partnered with other telco operators to launch digital services, enabling our partners to accelerate growth and capture market share within a short period of time.

Today, Circles is partnering with operators in 14 countries to deliver delightful digital experiences to millions of people through our businesses.

We are backed by global investors such as Sequoia, Warburg Pincus, EDBI and Founders Fund – renowned backers of industry-shaking innovators.

 

Role Description

Site Reliability Engineering (SRE) is a horizontal function spanning the entire company, meaning you'll work with multiple teams across various products and platforms to ensure their software features are reliable for the build, launch and operational teams.

As a Staff Software Engineer, you will work as an individual contributor (IC) with the hands-on experience to propose, design, implement and troubleshoot robust CI/CD pipelines for multi-tier applications deployed on virtual machines and containerized platforms running on top of public cloud infrastructure. You will work very closely with principal engineers who are part of different teams.

You will help the organization accelerate its SaaS journey by implementing robust CI/CD automation, strategic solutions and reusable, loosely coupled architecture patterns. Your contribution is required to increase delivery throughput, shorten lead time, increase stability and provide highly reliable solutions.

This is a fantastic opportunity to join a highly skilled and talented team where you'll be able to add real value to our organization as a central function. The role will help you deepen your SRE/DevOps knowledge and build your technical and business skill set.

You will be working on a high-tech DevOps CI/CD pipeline consisting of continuous integration, continuous deployment, ephemeral (dynamic) environments, automated QA pipelines, fan-out pipelines, blue/green deployments and many other cutting-edge features to deploy a SaaS product to multiple partner environments, from on-premises to different cloud providers.

 

Your Responsibilities

  • Design, test and roll out robust CI/CD solutions as a centralized solution across business domains
  • Design, test and roll out infrastructure as code solutions that can act as the central golden version and work for different combinations of input
  • Present solutions at review boards, conduct internal walkthrough sessions and conduct handover/training sessions on the target solution and implementation steps
  • Design robust CI/CD pipelines to work with cloud-based and on-premises Kubernetes clusters
  • Set up a continuous delivery and deployment pipeline integrated with the release workflow to support release orchestration
  • Troubleshoot multi-layer and containerized applications deployed in cloud infrastructure
  • Maintain, enhance and fine-tune dynamic environments
  • Apply automation and software to any manual or mechanical tasks, or parts of the system, that would benefit from it
  • Troubleshoot complicated, cross-platform issues spanning OS, networking and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices
  • Conduct system discovery, analysis, and develop improvements for system software performance, availability and reliability
  • Design, write and ship solutions, and motivate their implementation, to increase observability, product reliability and organizational efficiency
  • Propagate Site Reliability Engineering culture across the organization by sharing industry best practices, standards, approaches, documentation, and code with other engineering teams
  • Collaborate closely with software engineers and testers to ensure the system responds properly to non-functional requirements such as performance, security, and availability
  • Document system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
  • Maintain and monitor the deployment and orchestration of servers, Docker containers, Kubernetes, and general back-end infrastructure
  • Keep up to date with security and proactively identify, diagnose, and solve complex security issues
  • Participate in the on-call roster to provide weekend support when required

Required Technical Skills

  • Minimum 8 years of working experience with CI/CD platforms and Kubernetes, leveraging DevOps, SRE and Agile methodologies
  • Bachelor’s or master’s degree in Information Technology/Computer Science, or an equivalent combination of education and experience
  • Preferred certifications (nice to have):
    • Kubernetes CKA or CKAD certification
    • AWS or GCP DevOps-related certifications
    • GCP or AWS cloud architecture certification (associate/professional)
  • Very good experience in designing, setting up and maintaining Kubernetes clusters and containerized pipelines
  • Very good experience in designing, testing and implementing CI/CD pipelines to automate build, deployment and code promotion
  • Very good experience in writing automation scripts and CI/CD pipelines, and in automating routine tasks, using Groovy/Python to eliminate human dependencies
  • Very good experience in troubleshooting CI/CD pipeline issues for containerized and multi-layer applications deployed in GCP or AWS
  • Sound ability to dive deep into a problem statement and use structured troubleshooting to identify the root cause and apply strategic solutions
  • Experience with CI/CD in cloud environments and with container technology: Docker, Kubernetes, Docker Swarm, Helm, and DevOps tooling (Git + CI/CD pipelines)
  • Experience as a Linux systems administrator (e.g. Ubuntu, Red Hat) and with command-line system administration tools such as Bash, Vim and SSH
  • Experience in monitoring and analysing infrastructure performance using standard performance monitoring tools - Grafana/Prometheus, Datadog, Nagios, New Relic
  • Extensive expertise in core infrastructure components: storage, systems and/or networking
  • Strong understanding of TCP/IP networking, including familiarity with concepts such as the OSI model
  • Strong understanding of Internet protocols and applications such as SMTP, DNS, HTTP, SSH, SNMP etc.
  • Solid understanding of ELK, Redis, RabbitMQ, Kafka and etcd
  • Hands-on experience in writing infrastructure as code (IaC), configuration management as code (CMaC) and policy as code (PoaC) is a plus