Research Technology Engineer

Posted:
7/24/2024, 9:13:22 AM

Location(s):
London, England, United Kingdom ⋅ England, United Kingdom

Experience Level(s):
Junior ⋅ Mid Level ⋅ Senior

Field(s):
Data & Analytics ⋅ Software Engineering

Workplace Type:
On-site

The Firm

XTX Markets is a leading algorithmic trading company partnering with counterparties, exchanges and e-trading venues globally to provide liquidity in the Equity, FX, Fixed Income, Commodity & Options markets. We provide consistent liquidity, helping market participants throughout the world obtain the best prices in the various asset classes we cover, regardless of changing market conditions.

At XTX Markets, technology is our business. We are a diverse organisation that attracts outstanding talent from across all industry backgrounds. We are focused on teamwork, and our people collaborate on all aspects of the business, working openly and with respect for each other, our clients and the market. Our culture is non-hierarchical, and everyone is valued. We strive for excellence in everything we do.

 

The Role

XTX has one of the largest HPC clusters in the world, which the research technology team have built by writing software. We are not afraid to write our own filesystems, job distribution engines, orchestration tools and build systems if that is the optimal way to manage infrastructure at scale (which it often is). We are looking for someone to join in a senior capacity and work with our experienced team to help coordinate everything from designing our datacentre, choosing the right hardware, tuning operating systems, storage and networks, and managing orchestration and observability, to writing the software that manages fair and efficient distribution of work on our compute cluster. We are a full stack team that works side-by-side with our researchers to make the most performant, reliable and transparent system we can.

The infrastructure that we build and maintain allows XTX to trade globally, with daily volumes of over $300bn across a wide range of asset classes.

 

Responsibilities

The right candidate will:

  • Contribute to all components of our HPC infrastructure and code, and work on growing one of the biggest private compute clusters anywhere.
  • Write software that runs on a compute cluster that grows continually but currently has ~1PB of memory, ~110K CPU cores, >20,000 GPUs and 200+PB of mixed SSD and HDD storage, connected by a high-performance network.
  • Enter an environment where improvements can usually be made very quickly, and where the results of those changes are both immediately visible and can have a large impact on the quantitative research function at the heart of our business.

Examples of current projects being undertaken by the group are:

  • Implementing a filesystem written in-house
  • Changing our host monitoring infrastructure
  • Bare metal provisioning
  • Changing our observability system to a new lightweight, high-performance metrics visualisation system

 

Essential Attributes

  • Previous direct experience in a similar setting with large-scale compute, although previous experience in financial services is not necessary. We expect between 5 and 10 years of experience in a relevant setting.
  • Strong coding skills, preferably with recent exposure to Python and at least one statically typed language. You will likely have a good STEM degree and/or top-notch technical credentials, and a drive to achieve.
  • Clear working knowledge of and hands-on experience with computer networks: design, configuration, monitoring, automation, approaches to loss and congestion control, understanding of underlying hardware, IP/Ethernet and InfiniBand, and host tuning. Be prepared to back this up by knowing how these technologies work at a fundamental level.
  • Strong knowledge of large-scale infrastructure management, using code as a tool to facilitate hardware performance evaluation, automated builds, patching, application deployment, domain environment (DHCP/DNS/data locality etc.), observability (monitoring and alerting) and storage.
  • A desire to solve complex problems optimally from the ground up, rather than only reassembling components written by others.
  • The ability to use your knowledge of many layers of the technology stack (network/hardware/OS/software) to produce an optimal result.
  • A working knowledge of large-scale distributed systems.
  • Understanding of one or more machine learning frameworks and compute offload devices, like GPUs, is an advantage.


Benefits

  • Onsite gym, sauna, and fitness classes at no charge
  • Extensive medical benefits including an on-site doctor and therapist at no charge
  • Breakfast and lunch provided daily
  • Various forms of support for caregivers, including emergency dependent care
  • Beautiful Kings Cross office: https://vimeo.com/257888726
  • 25 days of paid holiday per year, plus statutory holidays and paid sick days
  • Generous pension contributions