Principal Cloud Engineer I

Posted:
9/17/2024, 7:35:21 AM

Location(s):
Ontario, Canada ⋅ Old Toronto, Ontario, Canada

Experience Level(s):
Expert or higher ⋅ Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Workplace Type:
Hybrid

Who we are

Founded in 2002, Zafin offers a SaaS product and pricing platform that simplifies core modernization for top banks worldwide. Our platform enables business users to work collaboratively to design and manage pricing, products, and packages, while technologists streamline core banking systems. 

With Zafin, banks accelerate time to market for new products and offers while lowering the cost of change and achieving tangible business and risk outcomes. The Zafin platform increases business agility while enabling personalized pricing and dynamic responses to evolving customer and market needs. 

Zafin is headquartered in Vancouver, Canada, with offices around the globe and customers including ING, CIBC, HSBC, Wells Fargo, PNC, and ANZ. Zafin is proud to be recognized as a top employer and a certified Great Place to Work® in Canada, India, and the UK.

What’s the opportunity?

Reporting to the Head of Platform Engineering, the Senior Cloud Engineer is responsible for driving the complete automation of our platform provisioning, configuration, and management. You will use tools such as Terraform, Azure, AKS, Kubernetes, Kustomize, Helm, Argo CD, and GitOps to lead the design, build, and automation of a highly available, scalable, secure, and reliable Cloud Infrastructure Platform based on Azure and Azure Kubernetes Service (AKS). Your role will support our mission to enable development teams to deliver high-quality software faster, more reliably, and more securely. You will be a force multiplier in our team’s mandate to continuously improve the platform’s capabilities, keeping it cutting-edge, highly efficient, and aligned with the evolving needs of the organization and its users. This role requires operational expertise, customer focus, and the ability to work collaboratively with cross-functional teams to drive customer satisfaction.

Mode of Work: Hybrid

What will you do?

  • Lead the design and implementation of a scalable, reliable, secure, and highly available Cloud Infrastructure Platform based on Azure and Azure Kubernetes Service (AKS).
  • Drive the complete automation of platform provisioning, configuration, and management using tools like Terraform and Argo CD.
  • Automate Infrastructure Provisioning with Terraform: Design and manage cloud infrastructure as code (IaC) with Terraform, using Terraform modules to keep configurations modular, reusable, and maintainable for consistent, repeatable deployments.
  • Develop automated CI/CD pipelines to streamline and accelerate software delivery, eliminating manual intervention.
  • Design and maintain GitOps workflows using tools such as Argo CD to automate the deployment and management of infrastructure and applications, ensuring seamless integration with Kubernetes clusters. Apply GitOps principles to automatically detect and correct configuration drift, so that the actual state of the system always matches the desired state.
  • Implement self-healing and auto-scaling mechanisms to enhance platform resilience and performance.
  • Drive Continuous Improvement and Enhancement: Lead initiatives to continuously improve the platform by identifying inefficiencies, implementing automation, eliminating manual toil, adopting new technologies, and optimizing existing processes to achieve higher reliability, performance, and scalability.
  • Collaborate closely with the Cloud Operations team to facilitate a seamless handover of support and maintenance tasks. Deliver documentation, conduct necessary knowledge transfer sessions, and provide ongoing mentorship to enable the cloud operations team to take over operational tasks successfully.
  • Implement Observability for Applications and Infrastructure: Utilize Azure's suite of observability tools, including Azure Monitor, Application Insights, Log Analytics, and Azure Network Watcher, to monitor and alert on the performance and health of applications and infrastructure. Ensure comprehensive visibility into application health and performance, enabling proactive detection and resolution of issues. Set up alerts and dashboards to provide real-time insights and proactive notifications for infrastructure anomalies and performance degradation.
  • Deliver self-service documentation so that development and operations teams can consume and support the platform independently. This reduces dependency on the platform engineering team and facilitates the handover of operational responsibilities to the cloud operations team, equipping them with the knowledge and tools to manage day-to-day operations effectively.
  • Provide L3 and L4 support to help resolve cloud platform-related incidents.

What do you need to succeed?

Must haves:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Azure and Kubernetes certifications preferred.
  • 8+ years of experience in DevOps, Cloud Infrastructure, and Platform Engineering.
  • Extensive experience and strong expertise with Terraform for infrastructure as code (IaC): Proficient in designing, writing, and maintaining Terraform configurations, utilizing modules for modular and reusable code.
  • Extensive experience and strong expertise in Azure services, especially Azure Kubernetes Service (AKS): Deep understanding of Azure cloud services and hands-on experience managing AKS clusters.
  • Strong expertise in Kubernetes: Comprehensive knowledge of Kubernetes architecture, cluster setup, management, and troubleshooting. CKA certification is preferred.
  • Proficient in using Helm charts and Kustomize for Kubernetes resource management.
  • In-depth knowledge of Argo CD and GitOps principles: Experience setting up and managing Argo CD for automated deployments and GitOps workflows.
  • Strong experience with CI/CD pipelines: Expert in developing, managing, and optimizing CI/CD pipelines using Azure Pipelines and other tools such as Jenkins and GitHub Actions.
  • Hands-on experience with Kubernetes observability tools such as Grafana and Prometheus, and with Azure's suite of observability tools, such as Azure Monitor, Application Insights, Container Insights, Log Analytics, and Azure Network Watcher, for monitoring and alerting on application and infrastructure health and performance.
  • Proficiency in scripting languages (Python, Bash): Capable of writing scripts to automate tasks and manage configurations.
  • Experience with configuration management tools (Ansible, Chef, Puppet): Knowledgeable in using these tools for automating system configurations.
  • Cloud Networking Architecture: Proficient in designing and managing cloud-native networking architectures, including Virtual Networks, Subnets, and Network Security Groups in Azure, AWS, or GCP. Hands-on experience with Azure Virtual Network, Private Link, and Service Endpoints for secure and scalable connectivity.
  • Cloud Load Balancing and Traffic Management: Expertise in cloud-native load balancing (e.g., Azure Load Balancer, Application Gateway) and traffic management (Azure Traffic Manager, AWS Route 53) to ensure high availability and optimized traffic routing.
  • Cloud-Native Security & VPN: Advanced knowledge of cloud-native security models, including Zero Trust Architecture, Security Groups, Network ACLs, and cloud-native firewall services (Azure Firewall, AWS Network Firewall). Proficient in implementing VPNs using IPsec and designing secure, high-performance VPN connections.
  • IP networking (subnetting, TCP/IP, dynamic routing) and good knowledge of cryptographic ciphers.
  • Kubernetes Networking: Strong understanding of Kubernetes networking concepts, including pod networking, services, ingress, and egress configurations. Experience with CNI (Container Network Interface) plugins such as Calico, Flannel, or Cilium for secure, scalable network policies within Kubernetes clusters.
  • Azure Virtual WAN Hub: Expertise in configuring and managing Azure Virtual WAN (VWAN) hubs, with hands-on experience integrating them with Azure Firewall for secure, scalable, and efficient connectivity across regions or hybrid cloud environments.
  • Strong understanding of Identity and Access Management (IAM) and experience with tools such as Okta, Azure AD, Ping Identity, or Auth0.

What’s in it for you

Joining our team means being part of a culture that values diversity, teamwork, and high-quality work. We offer competitive salaries, annual bonus potential, generous paid time off, paid volunteering days, wellness benefits, and robust opportunities for professional growth and career advancement. Want to learn more about what you can look forward to during your career with us? Visit our careers site and our openings: zafin.com/careers

Zafin welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process. 

Zafin is committed to protecting the privacy and security of the personal information collected from all applicants throughout the recruitment process. The methods by which Zafin collects, uses, stores, handles, retains, or discloses applicant information are described in Zafin’s privacy policy at https://zafin.com/privacy-notice/. By submitting a job application, you confirm that you agree to the processing of your personal data by Zafin as described in the candidate privacy notice.

Zafin

Website: https://zafin.com/

Headquarters Location: Toronto, Ontario, Canada

Employee Count: 251-500

Year Founded: 2002

IPO Status: Private

Last Funding Type: Series B

Industries: Banking ⋅ Financial Services ⋅ FinTech ⋅ Professional Services ⋅ Software