- Together AI (San Francisco, CA)
- About the Role: Together AI is seeking a Distributed ML Systems Engineer to design and build scalable machine learning systems that power our accelerated ... that handle high-load and high-performance requirements. If you are passionate about designing ML systems that operate at scale and eager to create impactful…
- GEICO (Palo Alto, CA)
- …Experience with Staff Engineer or Tech Lead roles in ML/AI organizations. Background in distributed systems and high-performance computing. Open-source ... ML Infrastructure team is seeking an exceptional Senior ML Platform Engineer to build and scale ... DevOps. Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions. Ensure ML platforms…
- Rivet Industries, Inc. (Palo Alto, CA)
- Machine Learning Researcher / ML-Ops Engineer. About Rivet: Rivet is an American company building integrated task systems, fusing hardened hardware with ... software, sensors, AI, and networking, for industrial workforces and defense personnel. We ... all else. Role Description: We are seeking a talented ML Research Engineer to advance our computer…
- Google (Mountain View, CA)
- Software Engineering Manager II, AI/ML, YouTube. Bachelor's degree or ... and responding to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field. 2 years of…
- Google Inc. (Mountain View, CA)
- Software Engineering Manager II, AI/ML, YouTube. Bachelor's degree or equivalent practical experience. 8 ... and responding to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field. 2 years of…
- Google (Mountain View, CA)
- …code to solve ambiguous problems. Lead the design and implementation of recommendation systems, optimize ML infrastructure, and guide the development of model ... Software Engineering Manager II, AI/ML Recommendations, Rankings, Predictions, YouTube ... tuning). 2 years of experience building and deploying recommendation system models (retrieval, prediction, ranking, embedding) in production and…
- Google Inc. (Mountain View, CA)
- …code to solve ambiguous problems. Lead the design and implementation of recommendation systems, optimize ML infrastructure, and guide the development of model ... Software Engineering Manager II, AI/ML Recommendations, Rankings, Predictions, YouTube. Experience owning outcomes and…
- Atlassian (San Francisco, CA)
- …to leverage the power of AI & ML without any complications. As a Machine Learning Systems Engineer on the AI & ML Platform team, you will build and scale ... products with AI products outside Atlassian. About AI & ML Platform Team: Our team is building the foundations ... models and pipelines. Along with that, you will build systems for product teams like Jira & Confluence to…
- Mundanelabs (Palo Alto, CA)
- Principal Software Engineer, Embodied Systems. Mundane is a venture-backed, seed-stage robot learning startup founded by a team of Stanford researchers and ... and dreamers. About the Role: As a Principal Software Engineer on the Embodied Systems team, ... every layer of embodied intelligence: hardware, controls, perception, ML, teleop, and fleet systems. A chance…
- NVIDIA Corporation (Santa Clara, CA)
- …enables efficient, resilient deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to define the vision and roadmap for memory ... Engineer - Large-Scale LLM Memory and Storage Systems. Principal Software Engineer ... equivalent experience. 15+ years of experience building large-scale distributed systems, high-performance storage, or ML systems…
- Gauss Labs (Palo Alto, CA)
- …processing, model training, evaluation, and deployment. Own the end-to-end implementation of ML systems from research prototypes to production-grade code. Optimize ... (e.g., batch, real-time, or edge deployments). Experience in distributed/parallel systems, information retrieval, networking, and systems…
- NVIDIA (Santa Clara, CA)
- …efficient, resilient deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to define the vision and roadmap for memory management ... or PhD or equivalent experience. 15+ years of experience building large-scale distributed systems, high-performance storage, or ML systems infrastructure in…
- E2B (San Francisco, CA)
- …in Go, Terraform. ✅ Skills: Go, building and managing large clusters, Linux, networking, Kubernetes, virtualization. Who we are: E2B is a fast-growing Series A startup ... at the kernel level of virtual machines. We're looking for an infrastructure engineer passionate about making things run fast and efficiently, and running A LOT…
- TrustIn (San Francisco, CA)
- Staff Distributed Systems Engineer - AI Infrastructure (Open Source). We're an early-stage, well-funded AI infrastructure company operating in stealth. Our ... the AI developer ecosystem. We're hiring a Staff Distributed Systems Engineer to help design and scale ... with Go, Rust, or C++. Solid understanding of concurrency, networking, and distributed algorithms. Experience with large-scale systems…
- Slab Inc. (Palo Alto, CA)
- …model training, evaluation, and deployment. Own the end-to-end implementation of ML systems from research prototypes to production-grade code. Optimize ... environments (e.g., batch, real-time, or edge deployments). Experience in distributed/parallel systems, information retrieval, networking, and systems software…
- krea.ai (San Francisco, CA)
- …research, real-time user experiences, and large-scale model deployments. As a Distributed Systems Engineer, you will design, build, and maintain large-scale ... clusters, ensuring efficient and fault-tolerant operations. You will collaborate closely with ML engineers and researchers to architect systems that enable rapid…
- Jobright.ai (San Francisco, CA)
- Site Reliability Engineer - Inference ... Job Summary: Lambda is the #1 GPU Cloud for ML/AI teams, providing tools for building, testing, and deploying…
- CompScience (San Francisco, CA)
- …AWS, with a heavy emphasis on scalable data engineering and reliable, cost-efficient ML systems. Implement and manage high-throughput, event-driven ML ... are looking for an experienced and self-motivated Sr. MLOps Engineer to join our growing team and take ownership ... systems that automate the entire lifecycle of our ML models, from data pipelines and training to deployment and…
- Hamilton Barnes (San Francisco, CA)
- …and auto-healing systems for high-availability GPU workloads. Collaborate with ML, networking, and platform teams to optimise resource scheduling, GPU ... for cluster orchestration and workload management. Deep knowledge of Linux systems, networking, and GPU infrastructure (NVIDIA H100/H200/B200 preferred). …
- ngrok Inc. (San Francisco, CA)
- …software in Go or languages like Rust, C, Java, or C++. Understand networking and distributed systems fundamentals (protocols, TLS/mTLS, proxies, load balancers, ... Software Engineer III/Senior, AI Gateway. About ngrok Inc. At ngrok, we believe that doing networking the right way should also be the easy way. Over the last 10…