We are looking for a Site Reliability Engineer (SRE) to support and manage our Infrastructure-as-a-Service (IaaS) platform built on the VMware Cloud Foundation (VCF) stack. The role involves maintaining, automating, and optimizing the VCF environment to ensure high availability, scalability, and operational efficiency.
Day-to-Day Responsibilities
- Manage, operate, and optimize VMware Cloud Foundation (VCF) environments.
- Configure and maintain vCenter, ESXi clusters, and vRealize Suite components (vROps, vRA, vRLI, vRNI).
- Develop and maintain automation scripts and playbooks using Ansible and PowerCLI to streamline operations and deployments.
- Monitor system health, capacity, and performance, and proactively troubleshoot infrastructure issues.
- Collaborate with Cloud, DevOps, Security, and Infrastructure teams to ensure platform reliability and compliance.
- Implement and maintain configuration management, system upgrades, and patching processes.
- Support Windows and Linux virtual machine environments, including lifecycle management and troubleshooting.
Must-Have Skills & Experience
The ideal candidate will have a strong foundation in IT infrastructure and virtualization, with proven hands-on experience managing enterprise-grade VMware environments.
- Overall IT Experience: 5–7 years of professional experience in IT infrastructure, systems, or virtualization.
- VMware Cloud Foundation (VCF): 2–3 years of experience managing and operating VCF environments, including SDDC Manager, vCenter, NSX, and vSAN integration.
- vCenter & ESXi: 3–5 years of experience in the installation, configuration, and management of vSphere environments, including HA, DRS, resource allocation, and troubleshooting.
- vRealize Suite (vROps, vRA, vRLI, vRNI): 2+ years of experience deploying vRealize tools and using them to monitor and automate infrastructure.
- Automation – Ansible: 2–3 years of experience building and maintaining automation playbooks for provisioning and configuration of infrastructure.
- Automation – PowerCLI / PowerShell: 2+ years of experience developing scripts for VMware automation and task orchestration.
- Operating Systems (Windows & Linux): 2+ years of experience administering, configuring, and troubleshooting virtual machines.
- Monitoring & Troubleshooting: 3+ years of experience identifying, analyzing, and resolving issues in virtualized environments to maintain high availability and performance.