- Apple Inc. (Cupertino, CA)
- …California seeks a senior/principal engineer to architect and build distributed ML infrastructure. This role involves optimizing GPU compute systems and ... to enhance AI capabilities. Candidates should have substantial experience in GPU programming and distributed systems, alongside a technical degree. The compensation…
- Genesis Therapeutics Inc. (Burlingame, CA)
- A biotechnology company in Burlingame is seeking experienced ML infrastructure engineers to lead engineering efforts on their AI platform focused on generative ... building distributed training pipelines. Candidates should have experience with PyTorch and GPU clusters, alongside a passion for AI-driven drug discovery. The role…
- Pathway Genomics Corporation (Palo Alto, CA)
- A leading AI startup is seeking a Senior ML Infrastructure / DevOps Engineer in Palo Alto. This role involves designing and maintaining GPU and CPU clusters ... for ML training, automating infrastructure provisioning, and managing robust ML pipelines. The ideal candidate has 5+ years in DevOps/SRE roles, deep familiarity…
- GEICO (Palo Alto, CA)
- …and Great Careers. GEICO AI ML Infrastructure team is seeking an exceptional Senior ML Platform Engineer to build and scale our machine learning ... LLMs (Llama, Mistral, Gemma, etc.). Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization…
- Lavendo (San Francisco, CA)
- …Computing, Infrastructure-as-Code. Candidate Location: Remote US. The Opportunity: We are seeking a Senior AI/ML Specialist Solutions Architect to join our ... Proven expertise in scaling and optimizing AI workloads across multi-node and multi-GPU environments. Demonstrated success delivering ML products, scaling from…
- NVIDIA Corporation (Santa Clara, CA)
- …with teams with varied strengths including GPU Compute, Distributed Systems, Networking, ML Infra, AI Platform, and Cloud Services to ensure engineers have ... Senior Software Engineer, Observability ... cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure). Embedding security…
- Voxel (San Francisco, CA)
- …workplace safety. The role involves managing data labeling pipelines, building multi-GPU training frameworks, and leading the model lifecycle. This position offers ... a competitive salary, extensive health benefits, and a dynamic work environment focused on revolutionary workplace safety technology.
- Google Inc. (Sunnyvale, CA)
- Senior Software Engineer, ML Infrastructure, Cloud AI ... compiler and runtimes interact at a high level. Close infra gaps to help with end‑to‑end ML stack ... the Research to Production pipeline for both Training and Serving use cases for TPU, GPU and CPU accelerators. In this role, you will work on projects that improve…
- jobr.pro (Sunnyvale, CA)
- …feedback. Understand how accelerator compiler and runtimes interact at a high level. Close infra gaps to help with end-to-end ML stack maturation (e.g., reduce ... the Research to Production pipeline for both Training and Serving use cases for TPU, GPU and CPU accelerators. In this role, you will work on projects that improve…
- Epoch Biodesign (San Francisco, CA)
- …this role, you will work closely with strategic enterprise customers deploying AI/ML workloads on high-performance GPU infrastructure. Candidates should have ... deep expertise in Kubernetes and MLOps, along with strong communication skills to effectively engage with stakeholders and optimize customer deployments across cloud platforms. This position offers a hybrid work schedule and competitive salary.
- Lavendo (San Francisco, CA)
- About the Company: Our client is an AI infrastructure company providing NVIDIA GPU cloud platforms (Blackwell, Hopper) and managed services to AI labs, software ISVs, ... Company type: Publicly traded, 245% stock growth. Industry: Cloud Computing, AI/ML, Infrastructure-as-Code. Candidate location: Remote US. The Mission: Our client is…
- NVIDIA Corporation (Santa Clara, CA)
- …Physics, Mathematics, or a related technical field. 8+ years of hands-on validated ML/DL Infra experience in Autonomous Vehicles with focus on improving compute ... AI models to ensure the best performance on current- and next-generation GPU architectures. Build collateral (notebooks, GitHub repos, demos, etc.) applied to…
- Crusoe Energy Systems LLC (San Francisco, CA)
- …is seeking a Senior Solutions Engineer to lead deployments of AI/ML workloads on high-performance GPU infrastructure. The role demands deep expertise ... in Kubernetes and customer-facing technical confidence. Ideal candidates will drive innovation and must have strong Linux skills. This opportunity offers competitive compensation and rich benefits including stock options, health insurance, and paid parental…
- Voxel (San Francisco, CA)
- …(YOLO, DETR, Faster RCNN) or video understanding at scale. Contributions to open-source ML infra projects or published talks/blogs on MLOps. Exposure to ... Provide technical leadership, mentorship, and lightweight project management to a small infra + research squad. Establish DevOps-for-ML best practices (IaC,…
- Menlo Ventures (San Francisco, CA)
- …reliability, latency, and efficiency of distributed AI workloads. Collaborate with platform, infra, and ML teams to deliver seamless end-to-end experiences. Shape ... not just any solution. Bonus points for: Experience with real-time serving, ML infrastructure, or GPU orchestration. Exposure to platforms like SageMaker,…
- Lambda Inc. (San Francisco, CA)
- …as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU. If you'd like to build the world's best AI cloud, join us. *Note: ... IT. About the Role: We are seeking an experienced Senior Business Systems Architect to lead and innovate across ... inventory, GL, and physical deployment. Partner with Data Center Infra Ops, Inventory Control, Accounting, and FP&A teams to…
- Decagon AI, Inc. (San Francisco, CA)
- …powering analytics/BI and customer‑facing telemetry, including for customer‑managed and on‑prem environments. ML Infra: GPU and model‑serving platforms for ... team builds and operates the foundations that power Decagon: networking, data, ML serving, developer platform, and real‑time voice. We partner closely with product,…
- Fluidstack (San Francisco, CA)
- About Fluidstack: Fluidstack is building GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, ... and network overlays (VXLAN, Geneve). Fluency in Python, Go or Rust; solid Infra-as-Code & CI/CD chops. Familiarity with DPDK, XDP, eBPF and InfiniBand/RoCE. Proven…
- Oracle (Santa Clara, CA)
- …is responsible for designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These ... are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like RoCE and InfiniBand. Why Join Us? Innovative…
- Google (Sunnyvale, CA)
- Senior Software Engineer, ML Infrastructure, Cloud AI ... and runtimes interact at a high level. Close infra gaps to help with end-to-end ML stack ... the Research to Production pipeline for both Training and Serving use cases for TPU, GPU and CPU accelerators. In this role, you will work on projects that improve…