Posted:
11/11/2024, 4:53:57 PM
Location(s):
Karnataka, India ⋅ Bengaluru, Karnataka, India
Experience Level(s):
Senior
Field(s):
DevOps & Infrastructure ⋅ Software Engineering
Job Title: Site Reliability Engineer I (SRE-I)
Who We Are:
Headquartered in New York City, Take-Two Interactive Software, Inc. is a leading developer, publisher, and marketer of interactive entertainment for consumers around the globe. The Company develops and publishes products principally through Rockstar Games, 2K, Private Division, and Zynga. Our products are currently designed for console gaming systems, PC, and Mobile, including smartphones and tablets, and are delivered through physical retail, digital download, online platforms, and cloud streaming services. The Company’s common stock is publicly traded on NASDAQ under the symbol TTWO.
While our offices (physical and virtual) are casual and inviting, we are deeply committed to our core tenets of creativity, innovation and efficiency, and individual and team development opportunities. Our industry and business are continually evolving and fast-paced, providing numerous opportunities to learn and hone your skills. We work hard, but we also like to have fun, and believe that we provide a great place to come to work each day to pursue your passions.
The Challenge
The SRE team serves as a centralised operations unit under the Technical Operation Centre (TOC), tasked with maintaining the health, availability, and reliability of our games and services. From a broader perspective, our primary mission is to ensure high uptime. As the first line of defence for all production issues, SREs take the lead in monitoring infrastructure and providing primary on-call support, ensuring a quick response to any incidents. We also play a critical role in emergency response, managing communication and coordination to resolve issues as efficiently as possible.
In addition to these primary responsibilities, the SREs work proactively with the NOC team to improve latency, performance, and efficiency across all services. Our work extends to capacity planning and optimization at both the system and cloud levels, ensuring that services scale efficiently to meet the demands of our games. We don’t just respond to incidents; we continuously look for ways to enhance the performance and reliability of the infrastructure.
Ultimately, SRE strives to achieve world-class uptime for all Take-Two products, working to reduce the frequency and impact of downtime while resolving issues promptly and comprehensively. With a focus on the entire production stack, we take a holistic approach to reliability engineering, ensuring that every layer, from the infrastructure to the application level, contributes to the best possible user experience.
What You’ll Take On
Windows Administration
Manage and maintain Windows servers, ensuring their stability, security, and performance.
Checkmk
Use Checkmk for comprehensive monitoring and alerting, ensuring all systems are functioning optimally.
Linux Administration
Diagnose and resolve issues on Linux systems, ensuring minimal downtime and maximum efficiency.
VMware
Manage virtual environments using VMware, ensuring resources are optimized and available.
vSAN Understanding
Demonstrate a solid understanding of vSAN for effective storage management and troubleshooting.
Cloud Administration
Administer and manage cloud services across AWS, Azure, Splunk, and GCP, ensuring seamless integration and operation.
Risk Assessment
Assess potential risks and impacts on game services and revenue, taking proactive measures to mitigate them.
Issue Identification
Identify issues, alerts, and critical service incidents using provided dashboards and monitoring tools.
Service Troubleshooting
Utilize studio playbooks to troubleshoot and diagnose basic issues across various services.
Communication
Relay accurate and timely information regarding service impacts to game studios, ensuring effective communication during incidents.
Incident Management
Spearhead outage management, including communication, triage, and escalation.
Daily On Call
Responsible for triaging and troubleshooting critical alerts from critical systems.
What You Bring
Experience:
Live Services Knowledge: Understanding of live services and their operational requirements.
Change/Crisis Management: Experience in managing change and crisis situations, ensuring minimal disruption to services.
Effective Communicator: Able to relay information accurately and promptly to the game studio and other stakeholders.
Team Player: Works well in a collaborative environment, sharing knowledge and supporting team members.
Proactive Problem-Solving:
A commitment to continuous improvement and proactive issue resolution.
Proven experience in troubleshooting production problems affecting live services.
Able to identify potential issues before they become critical and manage details effectively.
Background:
At least 1 year of experience in a similar role and/or 3 years of experience in a relevant role.
Great to Have:
Apply Advanced Knowledge:
Utilize your broad understanding of principles, theories, and concepts in IT, integrating advanced knowledge from related fields.
Solve Complex Problems: Address diverse and moderately complex problems, using sound judgment to select the best methods and techniques.
Network and Collaborate: Engage with senior internal and external personnel to maximize the application of functional expertise.
Problem Solving:
Innovate Solutions: Develop and recommend solutions to tactical business issues, proactively identifying and addressing potential problems.
Lead with Expertise: Use your advanced knowledge to guide your team and drive effective solutions.
Decision Making:
Exercise Autonomy: Make decisions with considerable latitude, consulting with senior engineers or managers on complex issues and recommending solutions as necessary.
What We Offer You:
Website: https://take2games.com/
Headquarter Location: New York, New York, United States
Employee Count: 10001+
Year Founded: 1993
IPO Status: Public
Last Funding Type: Post-IPO Debt
Industries: Online Games ⋅ PC Games ⋅ Publishing ⋅ Software ⋅ Video Games