- Jobright.ai (San Francisco, CA)
- Site Reliability Engineer - Inference role at Jobright.ai. Posted 2 days ago. ... for building, testing, and deploying AI products at scale. The Site Reliability Engineer - Inference will work on developing a large-scale platform for… more
- Hamilton Barnes (San Francisco, CA)
- …to go for experimentation, full-scale model training, or inference. As a Platform Engineer/Senior Site Reliability Engineer, you'll own the ... and maintain large-scale GPU clusters (H100/H200/B200) for training and inference workloads. Build automation pipelines for provisioning, scaling, and monitoring… more
- Sierra Business Solution (San Francisco, CA)
- Software Engineer, Site Reliability (SRE) at Sierra Business Solution. About Us We are an in‑person ... and best practices. What You'll Bring 5+ years of hands‑on experience in Site Reliability or infrastructure engineering for complex SaaS or cloud‑based systems.… more
- Sierra (San Francisco, CA)
- …led the product and design teams for Google Workspace. What you'll do As a Software Engineer on our Site Reliability team at Sierra, you will be responsible ... the engineering org. What you'll bring 5+ years of hands‑on experience in Site Reliability or Infrastructure engineering roles for complex SaaS or cloud‑based… more
- Primer (San Francisco, CA)
- …, and incidents stay rare. That's where you come in. As our first dedicated Site Reliability Engineer, you'll be the force multiplier who designs, builds, ... whole team while keeping us four steps ahead of failure. YOUR MISSION Own reliability from design to customer. Define and uphold SLOs / SLIs, manage error budgets,… more
- xAI (San Francisco, CA)
- …able to concisely and accurately share knowledge with their teammates. About the role As a Site Reliability Storage Engineer, you will play a pivotal role in ... to manage our cutting-edge AI research data with unparalleled scalability and reliability across multiple regions. This role's core responsibility is to make sure… more
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …supports everything from rapid experimentation to full-scale model training and inference, with flexible orchestration via Slurm, Kubernetes, or direct SSH access. ... and maintain large-scale GPU clusters (H100/H200/B200) for training and inference workloads. Build automation pipelines for provisioning, scaling, and monitoring… more
- Baseten (San Francisco, CA)
- …we're scaling our team to meet accelerating customer demand. The Role As a Site Reliability Engineer, you'll envision and build robust systems and ... About Baseten Baseten powers inference for the world's most dynamic AI companies, ... machine learning models. Establish standards and best practices for reliability and performance across the infrastructure. Automate processes when… more
- Chef Robotics (San Francisco, CA)
- …tech leaders from leading companies. About The Role As a Senior Software Engineer, Backend specializing in database architecture and AI systems, you will lead the ... and architecture for real-time robotics operations. As a senior engineer, you will mentor team members and drive technical ... data storage and retrieval systems for training datasets and inference results Design and implement systems to collect and… more
- SproutsAI (San Francisco, CA)
- … Senior Software Engineer - Chennai (On-site) About the Role We're a fast-moving, top-tier ... We thrive in ambiguity, ship fast, and care deeply about reliability, security, and craftsmanship. You'll work end-to-end, from whiteboard to production, owning… more
- Amazon (San Francisco, CA)
- Machine Learning Engineer, Amazon General Intelligence (AGI) Job ID: 3003288 | Amazon.com Services LLC The Artificial General Intelligence (AGI) team is looking for ... a passionate, talented, and inventive Machine Learning Engineer (MLE) to play a pivotal role in the ... We leverage our hyper‑scalable, general‑purpose large model training and inference systems to develop and deploy cutting‑edge sensory AI… more
- NLP PEOPLE (San Francisco, CA)
- Machine Learning Engineer, Amazon General Intelligence (AGI) The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive ... Machine Learning Engineer (MLE) to play a pivotal role in the ... We leverage our hyper-scalable, general-purpose large model training and inference systems to develop and deploy cutting-edge sensory AI… more
- Cisco (San Francisco, CA)
- Machine Learning Engineer PhD (Full Time) - United States Please note this posting is to advertise ... distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine‑tuned,… more
- Expedia, Inc. (San Jose, CA)
- …career journey. We're building a more open world. Join us. Machine Learning Engineer III Introduction to the Team: Expedia Technology teams partner with our Product ... that drive loyalty and traveler satisfaction. We are seeking a Machine Learning Engineer III to join our Universal Messaging Platform team. This role will focus… more
- Cisco Systems (San Jose, CA)
- …distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine‑tuned, ... academic or professional projects. Preferred Qualifications Experience working with inference engines (e.g., vLLM, Triton, TorchServe). Knowledge of GPU architecture… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …world of technology. We are seeking a highly skilled and experienced AI Systems Engineer to join our team. This is a hands-on, senior individual contributor role ... clusters, storage solutions, and networking to ensure optimal performance, scalability, and reliability for all our AI workloads. + Cloud AI Service Integration:… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …innovative results in real-world settings. Core Expertise + Statistical inference: significance testing (p-values, confidence intervals), Bayesian statistics, design ... Carlo methods (random sampling, density estimation). + Rare-event and reliability analysis (a plus): importance sampling, subset simulation, cross-entropy methods,… more
- Cisco (Milpitas, CA)
- …distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine-tuned, ... academic or professional projects. **Preferred Qualifications** + Experience working with inference engines (e.g., vLLM, Triton, TorchServe). + Knowledge of GPU… more
- Cisco (San Jose, CA)
- …distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine-tuned, ... or academic/research project documentation. **Preferred Qualifications** + Experience with inference engines such as vLLM, Triton, or TorchServe. + Knowledge… more
- Amazon (Mountain View, CA)
- …experience - 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience - Bachelor's degree ... learning and/or machine learning methods (e.g., for training, fine-tuning, and inference) - Hands-on experience with generative AI technology Preferred Qualifications -… more