Sr. ML Platform Engineer

Posted:
7/7/2024, 5:00:00 PM

Location(s):
Toronto, Ontario, Canada ⋅ Ontario, Canada

Experience Level(s):
Senior

Field(s):
AI & Machine Learning ⋅ Software Engineering

Job Description:

Rakuten International oversees 7 businesses with over 4,000 employees globally. The brand is recognized for its leadership and innovation in e-commerce, digital content, advertising, entertainment, and communications, bringing the joy of discovery and access to more than 1 billion members across the world. Our teams deliver on the company’s mission to delight merchants and customers through innovation, optimism, and teamwork.

Rakuten Rewards is a leading e-commerce company that enhances the way people shop by offering Cash Back, deals and rewards from more than 3,500 merchants. Founded in 1999, Rakuten has grown to become the go-to shopping destination for consumers, having paid out nearly $2 billion in Cash Back to its 15 million members. The company also operates ShopStyle, a leading fashion discovery destination, and Cartera Commerce, a top rewards platform for airlines and banks. For more information, visit www.rakuten.com.

We are currently seeking a highly skilled ML Platform Engineer to lead the design, implementation, and maintenance of the infrastructure that powers our machine learning work. In this role, you will collaborate closely with data scientists and software engineers to build scalable AI/ML platforms, streamline workflows, and ensure system reliability. The ideal candidate has strong programming skills, a proven track record with distributed computing frameworks and cloud platforms, and a deep understanding of software engineering principles. If you are driven by the opportunity to pioneer robust AI/ML systems while staying at the forefront of technological advances, we warmly invite you to join our team.

KEY RESPONSIBILITIES

  • Develop and maintain scalable machine learning platforms, tools, and frameworks to support various AI/ML use cases, including data processing pipelines, model training workflows, and model serving infrastructure.
  • Work closely with cross-functional teams, including data scientists and software engineers, to understand requirements and translate them into scalable technical solutions.
  • Design and execute deployment strategies and architectures for integrating generative AI models into production environments.
  • Continuously optimize and improve data processing pipelines, model training workflows, and infrastructure to enhance efficiency, performance, and scalability.
  • Build and maintain tools for model versioning, experimentation, evaluation, and deployment to facilitate rapid iteration and development.
  • Collaborate with cross-functional teams to integrate AI/ML capabilities into existing products and services, ensuring seamless integration and alignment with business objectives.
  • Document development processes and the technical specifications of the AI/ML platform, communicating effectively with stakeholders, team members, and collaborators.

MINIMUM REQUIREMENTS

  • Proficiency in programming languages commonly used in machine learning, such as Python, Java, or Scala.
  • Familiarity with distributed computing frameworks such as Apache Spark, Ray, or TensorFlow.
  • Knowledge of cloud computing platforms such as AWS, GCP, or Azure.
  • Experience with designing and implementing data processing pipelines, model training workflows, and model serving infrastructure.
  • Experience with training and fine-tuning language models on large-scale datasets.
  • Proficiency in developing customer support chatbots using Large Language Models (LLMs) and agent frameworks.
  • Ability to work in a fast-paced, dynamic environment, adapting to changing priorities and requirements.
  • Strong communication skills, with the ability to effectively convey complex technical concepts to diverse audiences.

QUALIFICATION REQUIREMENTS

  • Bachelor’s degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience.
  • 5+ years of work experience in AI/ML engineering or data science.
  • 3+ years of work experience with AWS (EC2, S3, SageMaker).
  • Experience with container technologies such as Docker and Kubernetes.
  • Experience in productionizing Large Language Models (LLMs) and Agent frameworks.

Five Principles for Success
Our worldwide practices describe specific behaviors that make Rakuten unique and united across the world. We expect Rakuten employees to model these five Shugi Principles for Success.

Always Improve, Always Advance - Only be satisfied with complete success - Kaizen
Passionately Professional - Take an uncompromising approach to your work and be determined to be the best
Hypothesize - Practice - Validate - Shikumika - Use the Rakuten Cycle to succeed in unknown territory
Maximize Customer Satisfaction - The greatest satisfaction for our teams is seeing their customers smile
Speed!! Speed!! Speed!! - Always be conscious of time - take charge, set clear goals, and engage your team

Rakuten provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type. Rakuten considers applicants for employment without regard to race, color, religion, age, sex, national origin, disability status, genetic information, protected veteran status, sexual orientation, gender, gender identity or expression, or any other characteristic protected by federal, state, provincial or local laws.