Research Engineer, Data Curation

Posted:
10/22/2024, 1:24:31 AM

Location(s):
California, United States ⋅ San Francisco, California, United States

Experience Level(s):
Senior

Field(s):
AI & Machine Learning ⋅ Data & Analytics ⋅ Software Engineering

We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the boundaries of what's possible in video generation.

Role overview:

We are seeking a Senior Software Engineer to lead the data acquisition efforts that drive our research. In this critical role, you will design, implement, and maintain robust systems for acquiring, processing, and managing the vast datasets required to train our advanced AI models. Your work will directly shape the quality and capabilities of our generative AI technologies and, through them, the experience of every consumer and developer that Genmo serves.

Key responsibilities:

  • Work closely with researchers to understand model deficiencies and identify novel data sources to improve the model's quality

  • Design data acquisition pipelines to acquire, filter, deduplicate, and otherwise productionize pre-training data

  • Work with our data annotation operations team to design and implement new data filtering strategies

  • Integrate best-of-breed research in topics like self-supervised active learning to improve our data systems at scale

  • Lead cutting-edge research initiatives aimed at significantly improving the quality and effectiveness of data used in AI models, with a particular emphasis on video generation applications

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field

  • 5+ years of experience curating large-scale training datasets for algorithms in domains such as self-driving, robotics, or computer vision

  • Must have:

    • Ability to operate effectively in a dynamic research environment and to scope and deliver projects end-to-end in your domain

    • Strong Python experience and some past experience with deep learning frameworks such as PyTorch

    • Experience using SQL, Spark, or other tools for processing large amounts of data

    • Experience working with large distributed systems

Genmo is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Genmo, Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.