Production Support Engineer

Posted:
10/7/2025, 5:00:00 PM

Experience Level(s):
Junior ⋅ Mid Level ⋅ Senior

Field(s):
IT & Security

Company Overview

Join us on our mission to elevate customer experiences for people around the world. As a member of the Everise family, you will be part of a global experience company that believes in being people-first, celebrating diversity and incubating innovation. Our dedication to our purpose and people is being recognized by our employees and the industry. Our 4.6/5 rating on Glassdoor and our shiny, growing wall of Best Place to Work awards are a testament to our investment in our culture. Through the power of diversity, we celebrate all cultures for their uniqueness and strengths. With 13 centers around the world and a robust work-at-home program, we believe great things happen when we work with people who think differently from us. Find a job you’ll love today!

We are seeking an experienced SRE/Production Support Engineer to join our dynamic team and ensure the seamless operation of our EverAI Suite products. In this role, you will provide 24/7 production support, troubleshoot issues, monitor system performance, and collaborate with development teams to maintain high availability and reliability. This position is ideal for problem-solvers who thrive in fast-paced environments and are passionate about AI technologies.

Key Responsibilities

  • Monitor production environments for EverAI Suite products (EverAI Simulator, EverAI Recruiter, and EverAI Knowledgeminer) using tools like Splunk, Prometheus, Grafana, ELK Stack, or similar monitoring systems.
  • Respond to incidents, alerts, and user-reported issues in a timely manner, performing root cause analysis and implementing fixes or workarounds.
  • Collaborate with cross-functional teams (development, QA, and operations) to resolve complex production problems and prevent recurrence.
  • Maintain and update documentation for support processes, troubleshooting guides, and knowledge bases.
  • Perform routine maintenance tasks, such as patching, scaling resources, and optimizing performance in cloud-based infrastructure (e.g., AWS, Azure, or GCP).
  • Participate in on-call rotations to provide after-hours support and ensure SLAs are met.
  • Analyze logs, metrics, and traces to identify trends, potential bottlenecks, and areas for improvement.
  • Assist in deployment activities, including CI/CD pipeline support and rollback procedures.
  • Contribute to continuous improvement initiatives, such as automating support tasks and enhancing monitoring capabilities.

Required Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
  • 3+ years of experience in production support, DevOps, or site reliability engineering (SRE) roles.
  • Strong troubleshooting skills with experience in debugging distributed systems, APIs, and microservices architectures.
  • Proficiency in scripting languages such as Python, Bash, or PowerShell for automation.
  • Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization tools (Docker, Kubernetes).
  • Familiarity with monitoring and logging tools (e.g., Splunk, Datadog, New Relic).
  • Knowledge of databases (SQL/NoSQL) and networking concepts.
  • Excellent communication skills, with the ability to explain technical issues to non-technical stakeholders.
  • Ability to work in a shift-based or on-call environment.

Preferred Qualifications

  • Experience supporting AI/ML-based products or SaaS platforms.
  • Certifications such as AWS Certified DevOps Engineer, Google Cloud Professional SRE, or equivalent.
  • Familiarity with incident management frameworks (e.g., ITIL) and tools like PagerDuty or Jira.
  • Strong problem-solving mindset with a proactive approach to preventing issues.

If you’ve got the skills to succeed and the motivation to make it happen, we look forward to hearing from you.