Software Engineer, Data Transformation and Movement

Posted:
10/24/2024, 1:37:22 PM

Experience Level(s):
Mid Level ⋅ Senior

Field(s):
Software Engineering

Workplace Type:
Remote

Pay:
$268/hr or $557,440 total comp

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

The Data Transformation and Movement team operates the critical infrastructure that powers near-realtime and batch data processing at Stripe. The team supports a variety of use cases, including Payment, Ledger, ML, Fraud Detection, Product Analytics, Regulatory Reporting, Financial Data Reconciliation, and externally facing products like Radar and Sigma. As an example of the scale, the team's systems serve hundreds of teams and thousands of workflows, running 100,000+ task executions and O(billion) streaming transformations, and moving terabytes of data at over 1 GB/second every day. Our users inside Stripe include other engineering teams, Data Scientists, Sales & Operations, Finance, and more.

This role could be on any one of the following sub-teams:

  • Data Movement builds and operates a constellation of multi-region, high-scale ingestion systems that move data from all online sources into Iceberg with sub-minute latency. On the cusp of innovation, we're pushing the boundaries of open-source Iceberg and Spark for real-time ingestion (a rough sketch of this kind of ingestion job follows below).
  • Data Orchestration builds and operates the time-based and event-based orchestration infrastructure that powers and accelerates batch data pipelines.
  • Data Transformation builds and operates the transformation abstractions and infrastructure that support frictionless data development across the board, from sub-minute event data to enormous daily partitions, or even for-all-time snapshots.
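To make the Data Movement work concrete, here is a minimal, illustrative PySpark Structured Streaming job that ingests Kafka events into an Iceberg table. This is a sketch only, not Stripe's actual pipeline: the broker address, topic, checkpoint path, and table name are hypothetical, and it assumes a Spark cluster that already has the Iceberg runtime and a catalog configured.

    # Illustrative sketch only (not Stripe's code): stream Kafka events into an
    # Iceberg table with Spark Structured Streaming. All names are hypothetical,
    # and the cluster is assumed to have the Iceberg Spark runtime and a catalog
    # configured.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-iceberg-ingest").getOrCreate()

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "payment_events")              # hypothetical topic
        .load()
        .select(
            col("key").cast("string"),
            col("value").cast("string"),
            col("timestamp"),
        )
    )

    query = (
        events.writeStream
        .format("iceberg")
        .outputMode("append")
        .trigger(processingTime="30 seconds")  # sub-minute micro-batches
        .option("checkpointLocation", "s3://bucket/checkpoints/payment_events")  # hypothetical path
        .toTable("analytics.payment_events_raw")  # hypothetical Iceberg table
    )
    query.awaitTermination()

A production ingestion system would additionally handle many sources (including Change Data Capture and Event Bus streams), schema evolution, compaction, and delivery guarantees, but the overall shape is similar.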

Our team works across a wide range of technologies, including Kafka, Event Bus, Change Data Capture, Flink, Spark, Airflow, Hive Metastore, Trino, Pinot, SQL, Python, Java, Scala, S3, and Iceberg.
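On the batch side, the kind of time-based orchestration the Data Orchestration sub-team describes is commonly expressed as an Airflow DAG. The sketch below is illustrative only, not a Stripe pipeline; it assumes Airflow 2.4+, and the DAG and task names are hypothetical, with the commands stubbed out.

    # Illustrative sketch only: a daily, time-scheduled Airflow DAG chaining an
    # extract, a Spark transform, and a publish step. DAG and task names are
    # hypothetical; commands are stubbed with echo.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_ledger_rollup",          # hypothetical pipeline
        schedule="0 2 * * *",                  # time-based trigger: 02:00 UTC daily
        start_date=datetime(2024, 1, 1),
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        extract = BashOperator(task_id="extract_partitions", bash_command="echo extract")
        transform = BashOperator(task_id="run_spark_transform", bash_command="echo transform")
        publish = BashOperator(task_id="publish_to_iceberg", bash_command="echo publish")

        extract >> transform >> publish

Event-based orchestration would replace the cron-style schedule with triggers driven by upstream data availability rather than wall-clock time.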

What you’ll do

As a Software Engineer on our team, you will do the following:

  • Design, build, and maintain innovative next-generation or first-generation versions of key Data Platform products, with an emphasis on usability, reliability, security, and efficiency.
  • Design ergonomic APIs and abstractions that create a great experience for internal Stripes, which in turn enhances the experience of millions of Stripe users.
  • Ensure operational excellence and enable a highly available & reliable Data Transformation & Movement platform across streaming and batch workloads.
  • Collaborate nimbly with high-visibility teams and their stakeholders to support their key initiatives - while building a robust platform that benefits all of Stripe in the long term.
  • Plan for the growth of Stripe’s infrastructure by unblocking, supporting, and communicating proactively with internal partners to achieve results.
  • Connect your work with improvements in the usability and reliability of Open Source Software (OSS) like Apache Airflow, Iceberg, Spark and contribute back to the OSS community.

Who you are

We’re looking for someone who: 

Minimum requirements

  • BS or MS in Computer Science or an equivalent field, and an interest in data
  • 2-5 years of professional experience writing high-quality, production-level code or software programs
  • Has experience operating or enabling large-scale, high-availability data pipelines from design to execution and safe change management. Expertise in Spark, Flink, Airflow, Python, Java, SQL, and API design is a plus.
  • Has experience developing, maintaining, and debugging distributed systems built with open source tools
  • Has experience building infrastructure-as-a-product with a strong focus on users' needs
  • Has strong collaboration and communication skills, and can comfortably interact with both technical and non-technical participants.
  • Has the curiosity to continuously learn about new technologies and business processes.
  • Is energized by delivering effective, user-first solutions through creative problem-solving and collaboration.

Preferred qualifications

  • Has experience writing production-level code in Scala, Spark, Flink, Airflow, Python, Java, and SQL
  • Has experience packaging and deploying code into cloud-based environments (AWS, GCP, Azure) with tools such as Bazel and Docker containers
  • Has experience designing APIs or building developer platforms
  • Has experience optimizing the end-to-end performance of distributed systems
  • Has experience with scaling distributed systems in a rapidly moving environment
  • Has experience working with data pipelines
  • Genuine enjoyment of innovation and a deep interest in understanding how things work