- Jobright.ai (San Francisco, CA)
- Site Reliability Engineer - Inference role at Jobright.ai (posted 2 days ago) ... for building, testing, and deploying AI products at scale. The Site Reliability Engineer - Inference will work on developing a large-scale platform for… more
- Hamilton Barnes (San Francisco, CA)
- …to go for experimentation, full-scale model training, or inference. As a Platform Engineer/Senior Site Reliability Engineer, you'll own the ... and maintain large-scale GPU clusters (H100/H200/B200) for training and inference workloads. Build automation pipelines for provisioning, scaling, and monitoring… more
- Sierra Business Solution (San Francisco, CA)
- Software Engineer, Site Reliability (SRE) at Sierra Business Solution. About Us We are an in‑person ... and best practices. What You'll Bring 5+ years of hands‑on experience in Site Reliability or infrastructure engineering for complex SaaS or cloud‑based systems.… more
- Sierra (San Francisco, CA)
- …led the product and design teams for Google Workspace. What you'll do As a Software Engineer on our Site Reliability team at Sierra, you will be responsible ... the engineering org. What you'll bring 5+ years of hands‑on experience in Site Reliability or Infrastructure engineering roles for complex SaaS or cloud‑based… more
- Primer (San Francisco, CA)
- …, and incidents stay rare. That's where you come in. As our first dedicated Site Reliability Engineer, you'll be the force multiplier who designs, builds, ... whole team while keeping us four steps ahead of failure. YOUR MISSION Own reliability from design to customer. Define and uphold SLOs / SLIs, manage error budgets,… more
- xAI (San Francisco, CA)
- …able to concisely and accurately share knowledge with their teammates. About the role As a Site Reliability Storage Engineer, you will play a pivotal role in ... to manage our cutting-edge AI research data with unparalleled scalability and reliability across multiple regions. This role's core responsibility is to make sure… more
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …supports everything from rapid experimentation to full-scale model training and inference, with flexible orchestration via Slurm, Kubernetes, or direct SSH access. ... and maintain large-scale GPU clusters (H100/H200/B200) for training and inference workloads. Build automation pipelines for provisioning, scaling, and monitoring… more
- Baseten (San Francisco, CA)
- …we're scaling our team to meet accelerating customer demand. The Role As a Site Reliability Engineer, you'll envision and build robust systems and ... About Baseten Baseten powers inference for the world's most dynamic AI companies,...machine learning models. Establish standards and best practices for reliability and performance across the infrastructure. Automate processes when… more
- Chef Robotics (San Francisco, CA)
- …tech leaders from leading companies. About The Role As a Senior Software Engineer, Backend specializing in database architecture and AI systems, you will lead the ... and architecture for real-time robotics operations. As a senior engineer, you will mentor team members and drive technical...data storage and retrieval systems for training datasets and inference results Design and implement systems to collect and… more
- SproutsAI (San Francisco, CA)
- … Senior Software Engineer - Chennai (On-site) About the Role We're a fast-moving, top-tier ... We thrive in ambiguity, ship fast, and care deeply about reliability, security, and craftsmanship. You'll work end-to-end, from whiteboard to production, owning… more
- Amazon (San Francisco, CA)
- Machine Learning Engineer, Amazon General Intelligence (AGI) Job ID: 3003288 | Amazon.com Services LLC The Artificial General Intelligence (AGI) team is looking for ... a passionate, talented, and inventive Machine Learning Engineer (MLE) to play a pivotal role in the...We leverage our hyper‑scalable, general‑purpose large model training and inference systems to develop and deploy cutting‑edge sensory AI… more
- NLP PEOPLE (San Francisco, CA)
- Machine Learning Engineer, Amazon General Intelligence (AGI) The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive ... Machine Learning Engineer (MLE) to play a pivotal role in the...We leverage our hyper-scalable, general-purpose large model training and inference systems to develop and deploy cutting-edge sensory AI… more
- Cisco (San Francisco, CA)
- Machine Learning Engineer PhD (Full Time) - United States Please note this posting is to advertise ... distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine‑tuned,… more
- Cisco Systems (San Francisco, CA)
- …distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine‑tuned, ... academic or professional projects. Preferred Qualifications Experience working with inference engines (e.g., vLLM, Triton, TorchServe). Knowledge of GPU architecture… more
- Zyphra Technologies Inc. (Palo Alto, CA)
- …company based in Palo Alto, California. The Role: As a Machine Learning Data Engineer - Systems & Retrieval, you will build and optimize the data infrastructure ... role in architecting retrieval systems for LLMs and enabling scalable training and inference with clean, accessible, and secure data. You'll have an impact across… more
- Expedia, Inc. (San Jose, CA)
- …career journey. We're building a more open world. Join us. Machine Learning Engineer III Introduction to the Team: Expedia Technology teams partner with our Product ... that drive loyalty and traveler satisfaction. We are seeking a Machine Learning Engineer III to join our Universal Messaging Platform team. This role will focus… more
- Amazon (Palo Alto, CA)
- …the future of advertising. Key job responsibilities As a Software Development Engineer in Machine Learning, you will: * Enhance the scalability, automation, and ... efficiency of large-scale training and real-time inference systems. * Pioneer the development of LLM ...for action. We are looking for a talented Software Engineer with a strong background in machine learning engineering… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …world of technology. We are seeking a highly skilled and experienced AI Systems Engineer to join our team. This is a hands-on, senior individual contributor role ... clusters, storage solutions, and networking to ensure optimal performance, scalability, and reliability for all our AI workloads. + Cloud AI Service Integration:… more
- Amazon (San Francisco, CA)
- …breakthrough foundation models run at production scale. As a Software Development Engineer embedded in our science team, you'll be instrumental in transforming novel ... applications, leveraging your expertise in CUDA and TensorRT to achieve unprecedented inference efficiency at Amazon scale. In this role, you'll balance deep… more
- Amazon (San Francisco, CA)
- …foundation models run at production scale. As a Senior Machine Learning Engineer embedded in our science team, you'll be instrumental in transforming cutting-edge ... your expertise in CUDA and TensorRT to achieve unprecedented inference efficiency at Amazon scale. In this role, you'll...5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience… more