Senior/Staff Cloud Infrastructure Engineer

Posted:
12/10/2024, 2:10:25 PM

Location(s):
San Jose, California, United States ⋅ California, United States

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Who We Are

At OKX, we believe that the future will be reshaped by crypto, and that crypto will ultimately contribute to every individual's freedom. OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves. Across our multiple offices globally, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er.

About the Team

Cloud Infrastructure Engineering is a critical engineering discipline and a job function in the company. Its charter is to build tools and infrastructure that promote early detection of production failures, leading to a stellar customer experience.
Our work is to drive the safety, health, and uptime of our platform, and the ability to remedy unforeseen problems. By removing some of the complex burdens of scaling and maintaining uptime in distributed systems, Cloud Infrastructure Engineering allows development teams to focus on feature development instead of the nuances of achieving and maintaining service level commitments.

About the Opportunity

We’re looking for a creative and driven individual who can spearhead our effort to push “outside the box” infrastructure implementations that will have a tremendous impact on our platform’s stability and scalability.

What You’ll Be Doing

  • Maintain and configure AWS products and services
  • Research, architect, and implement solutions based on AWS products
  • Perform daily maintenance of each AWS cloud environment
  • Automate the provisioning, scaling, and configuration of infrastructure resources using Terraform and CI/CD pipelines
  • Assist the team in troubleshooting and resolving production database issues during incidents
  • Monitor company services and handle alerts in a timely manner to ensure service stability and uptime
  • Collaborate with development teams to ensure seamless integration and deployment of new features

What We Look For In You

  • Bachelor's degree or above in Computer Science or a related field, with over 6 years of experience in DevOps, SRE, DBA, or related positions
  • Proficient in AWS distributed management, large-scale clustering, fault tolerance, backup, load balancing and other technologies
  • Deep understanding of high-availability architecture and capacity planning, and extensive experience in handling complex problems
  • Solid Linux operations, maintenance, and debugging skills, with proficiency in troubleshooting, configuration tuning, and performance analysis
  • Familiar with Kubernetes (k8s) for container orchestration and management
  • Familiar with the features of core AWS products, with extensive practical experience in deploying and tuning EC2, EKS, VPC, or big data products
  • Experience with microservices architecture, including deployment, scaling, and maintenance
  • Experience in monitoring, operations and maintenance (O&M), and management of large-scale AWS servers and containers
  • Familiarity with relational databases (e.g., MySQL, PostgreSQL) and basic operations such as querying, monitoring performance metrics, and reviewing logs
  • Familiar with the deployment, configuration, and maintenance of Nginx, Kong, and similar software
  • Proficient in using Python/shell for development
  • Strong engineering skills, proficient in at least one O&M or infrastructure sub-area, such as public cloud networking, SRE, DevOps, or cloud-native
  • Proficient in using Terraform for infrastructure as code (IaC) to automate cloud resource provisioning and management
  • Excellent business analysis, system architecture, and problem-solving abilities, and strong self-drive

Nice to Have

  • Bilingual in English and Mandarin
  • Familiar with the operation and maintenance of Alibaba Cloud, Google Cloud, Microsoft Cloud, and other cloud providers

Perks & Benefits

  • Competitive total compensation package
  • L&D programs and education subsidies for employees' growth and development
  • Various team building programs and company events

OKX Statement

The base salary range for this position is $198,000 to $280,000. The salary offered depends on a variety of factors, including job-related knowledge, skills, experience, and market location. In addition to the salary, a performance bonus and long-term incentives may be provided as part of the compensation package, as well as a full range of medical, financial, and/or other benefits, dependent on the position offered. Applicants should apply via the OKX internal or external careers site.

OKX is committed to equal employment opportunities regardless of race, color, genetic information, creed, religion, sex, sexual orientation, gender identity, lawful alien status, national origin, age, marital status, and non-job related physical or mental disability, or protected veteran status. Pursuant to the San Francisco Fair Chance Ordinance, we will consider employment-qualified applicants with arrest and conviction records.

OKX

Website: https://www.okx.com/

Headquarters Location: Victoria, Beau Vallon, Seychelles

Employee Count: 1001-5000

Year Founded: 2017

IPO Status: Private

Industries: Apps ⋅ Bitcoin ⋅ Blockchain ⋅ Cryptocurrency ⋅ Finance ⋅ Financial Services ⋅ FinTech ⋅ Information Technology ⋅ Internet ⋅ Web3