Java R&D Engineer/Expert - Stability Engineering Platform

Posted:
3/17/2025, 11:26:47 PM

Location(s):
Hong Kong, China

Experience Level(s):
Junior ⋅ Mid Level ⋅ Senior

Field(s):
Software Engineering

Who We Are

At OKX, we believe that the future will be reshaped by crypto, ultimately contributing to every individual's freedom. OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves. Across our multiple offices globally, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er. OKX is part of OKG, a group that brings the value of blockchain to users around the world through our leading products OKX, OKX Wallet, OKLink and more.
 

About the Team

The Service Stability Engineering Team envisions service stability as one of the core competitive strengths of the company's products. By building end-to-end, link-level risk management capabilities, the team aims to achieve sustainable automatic identification and analysis of stability risks, transforming from "reactive governance" to "proactive governance." This approach shifts more stability-related matters forward and addresses them early, preventing issues before they arise and enhancing user experience.
 

What You’ll Be Doing:

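  1. Research how to use technology to quickly detect issues, locate them, and recover from failures, meeting the 1-5-10 target;
  2. Own the definition and implementation of SLOs/SLAs, taking a goal-oriented approach to ensuring business stability;
  3. Continuously build the stability assurance tooling platform, including the inspection system, the root cause diagnosis system, and the risk library, making issue detection, localization, and analysis more accurate and efficient;
  4. Define stability standards and drive their adoption, ensuring that product design and code comply with stability principles;
  5. Keep track of cutting-edge industry technologies, organize team learning and development, and introduce and advance new technology upgrades and iterations when appropriate.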

What We Look For In You:

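  1. Bachelor's degree or above in Computer Science or a related field, with 7+ years of R&D and architecture experience; experience developing infrastructure or frameworks is a plus;
  2. Proficient in Java and in applying the Spring Cloud microservices stack, with good coding style and strong algorithm skills;
  3. Proficient with data computing and analysis tools such as Flink, Elasticsearch, ClickHouse, SkyWalking, Prometheus/VictoriaMetrics, and Python;
  4. Experience developing and tuning RAG/Agent systems is a plus;
  5. Skilled at identifying, analyzing, and solving problems, with clear analytical logic and a holistic architectural mindset;
  6. Product-oriented mindset; familiar with the R&D process and with incident analysis and incident handling processes; good at using tools to solve problems;
  7. Good communication and leadership skills; able to collaborate with cross-functional teams and drive stability-related work; ability to communicate in English is a plus;
  8. Hands-on experience with stability assurance, inspection systems, root cause diagnosis systems, or chaos engineering systems is a plus.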

Perks & Benefits 

  • Competitive total compensation
  • Comprehensive insurance coverage for employees and their dependants
  • More that we would love to tell you about along the way!

 

OKX

Website: https://www.okx.com/

Headquarters Location: Victoria, Beau Vallon, Seychelles

Employee Count: 1001-5000

Year Founded: 2017

IPO Status: Private

Industries: Apps ⋅ Bitcoin ⋅ Blockchain ⋅ Cryptocurrency ⋅ Finance ⋅ Financial Services ⋅ FinTech ⋅ Information Technology ⋅ Internet ⋅ Web3