  • Advanced Micro Devices (San Jose, CA)
    A leading technology company is seeking a Principal / Senior GPU Software Performance Engineer in San Jose, CA. This role involves optimizing GPU ... systems. The ideal candidate will have strong skills in GPU performance engineering and experience with deep learning frameworks, particularly PyTorch.…
    job goal (01/12/26)
  • Advanced Micro Devices (San Jose, CA)
    …Join us as we shape the future of AI and beyond. Principal / Senior GPU Software Performance Engineer - Post‑Training. THE ROLE: Drive the performance ... ideal candidate is passionate about software engineering and the craft of training performance. You lead sophisticated cross‑stack issues-spanning data loaders,…
    job goal (01/12/26)
  • BlackLine (Pleasanton, CA)
    …Management Design and manage training infrastructure including distributed training orchestration, GPU/TPU resource allocation, and automatic scaling. ... set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs). Mentor senior engineers and drive design reviews for ML pipelines, model registries,…
    job goal (01/12/26)
  • San Francisco Compute Co. (San Francisco, CA)
    …provider located in San Francisco is looking for a dedicated professional to manage GPU training clusters. You will ensure the smooth operation of high- ... performance computing systems, engage in customer interactions, and utilize...Applicants must have experience with Linux, networking fundamentals, and GPU clusters. The role offers a salary ranging from…
    job goal (01/13/26)
  • Databricks Inc. (San Francisco, CA)
    …looking for a research engineer to enhance deep learning techniques and optimize performance on NVIDIA architectures. Ideal candidates will have a PhD in Computer ... Science and experience with CUDA and distributed training frameworks. Join our diverse team and take part in groundbreaking AI developments within an inclusive and…
    job goal (01/12/26)
  • CareerArc (San Jose, CA)
    …the future of AI and beyond. Together, we advance your career. THE ROLE: The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering is a ... to system triage, at‑scale debug, and infrastructure optimization to ensure robust performance and efficient transitions from GPU production qualification to…
    job goal (01/13/26)
  • Troveo Ai (San Francisco, CA)
    …for dissecting performance in large distributed systems; familiarity with multi‑GPU training and vector databases preferred. Excellent communication to ... Infrastructure & Optimization - Scale ML infrastructure on AWS, leveraging multi‑GPU clusters and distributed training to accelerate embedding computation…
    job goal (01/13/26)
  • General Motors (San Francisco, CA)
    A leading automotive company is seeking a Senior AI/ML Tooling Engineer in San Francisco, California. In this role, you will develop internal ML tooling to optimize ... model training and inference, working closely with cross-functional teams. The...in frameworks like PyTorch and TensorFlow. Competitive edge in GPU programming and ML hardware architecture is a plus.…
    job goal (01/13/26)
  • Hobbsnews (San Jose, CA)
    …in LLM model training, evaluation, inference optimization and parallelization in a GPU cluster. At least 3 years of experience working with AWS or equivalent ... Senior Manager, Data Science - GenAI Digital Assistant...for business‑specific applications and features. Drive innovation by designing, training, evaluating, and deploying state‑of‑the‑art NLP models, partnering with…
    job goal (01/13/26)
  • Hamilton Barnes Associates Limited (San Francisco, CA)
    …for experimentation, full-scale model training, or inference. Our client operates high-performance GPU clusters powering some of the most advanced AI ... workloads. Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts. Integrate, tune, and operate inference…
    job goal (01/13/26)
  • Crusoe Energy Systems LLC (San Francisco, CA)
    …infrastructure to handle millions of API requests per second. Optimize AI inference performance on GPU-based systems. Implement robust monitoring and alerting to ... About This Role: As a Senior Staff Software Engineer on the Managed AI...open-source AI projects such as vLLM or similar frameworks. Performance optimizations on GPU systems and inference…
    job goal (01/13/26)
  • Lavendo (San Francisco, CA)
    …artificial intelligence. The company provides cutting-edge infrastructure, including large-scale GPU clusters, cloud platforms, tools, and services for developers to ... Location: Remote US The Opportunity We are seeking a Senior AI / ML Specialist Solutions Architect to join...client's team. What You'll Do Architect and optimize distributed training and inference systems for large-scale AI models Design…
    job goal (01/13/26)
  • Menlo Ventures (San Francisco, CA)
    …customer or business value. Experience building architecture for large-scale, performance-sensitive CPU/GPU inference systems. Strong communication skills and ... to operationalize models at scale with strong SLAs and cost efficiency. As a Senior Engineer, you'll play a critical role in shaping both the product experience and…
    job goal (01/12/26)
  • Menlo Ventures (San Francisco, CA)
    …and tensorization for training‑specific patterns Design, implement, and optimize high‑performance GPU kernels for training workloads (e.g., attention ... gradient synchronization and collective operations Profile, debug, and optimize end‑to‑end training workflows to identify and resolve performance bottlenecks,…
    job goal (01/13/26)
  • Crusoe Energy Systems LLC (San Francisco, CA)
    …focused on Operational Excellence, you will help ensure the stability, resilience, and performance of Crusoe's GPU cloud. This role is ideal for engineers ... improvement across a large-scale distributed platform. You'll partner closely with senior SREs, infrastructure engineers, and platform teams to improve reliability,…
    job goal (01/13/26)
  • King River Capital Group (San Francisco, CA)
    …out before, during, and after playing games. Discord is looking for a Senior Software Engineer to build high-performance, cross-platform client software that ... platform-specific APIs to provide optimal experiences wherever people use Discord. As a Senior Software Engineer on our A/V Client team, you'll tackle challenging…
    job goal (01/13/26)
  • Crusoe Energy Systems LLC (San Francisco, CA)
    …to open-source AI projects such as vLLM or similar frameworks. (Preferred) Performance optimizations on GPU systems and inference frameworks. (Preferred) Soft ... is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads...Cloud Managed AI team seeks an ambitious and experienced Senior Software Engineer to join their team. You'll have…
    job goal (01/13/26)
  • Datacrunch (San Francisco, CA)
    …and infrastructure‑as‑code tools (Terraform, Ansible). Experience managing Slurm-based HPC GPU clusters, diagnosing performance issues, and designing efficient ... & CEO - to align on vision and expectations. About the role We're seeking a Senior or Principal Site Reliability Engineer (SRE) to become our first US hire, based in…
    job goal (01/13/26)
  • Genentech (San Francisco, CA)
    …statistics, machine learning theory, and algorithms. Strong knowledge of ML performance optimization, GPU best practices. Experience with Kubernetes, relational ... optimise workflows. We also work on scaling up model training and inference, evaluating the quality of AI/ML models...and backend systems. Build tools to evaluate AI/ML model performance and establish new ways to understand AI quality.…
    job goal (01/13/26)