- Advanced Micro Devices (San Jose, CA)
- A leading technology company is seeking a Principal / Senior GPU Software Performance Engineer in San Jose, CA. This role involves optimizing GPU ... systems. The ideal candidate will have strong skills in GPU performance engineering and experience with deep learning frameworks, particularly PyTorch.…
- Advanced Micro Devices (San Jose, CA)
- …Join us as we shape the future of AI and beyond. Principal / Senior GPU Software Performance Engineer - Post‑Training THE ROLE Drive the performance ... ideal candidate is passionate about software engineering and the craft of training performance. You lead sophisticated cross‑stack issues spanning data loaders,…
- BlackLine (Pleasanton, CA)
- …Management Design and manage training infrastructure including distributed training orchestration, GPU/TPU resource allocation, and automatic scaling. ... set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs). Mentor senior engineers and drive design reviews for ML pipelines, model registries,…
- San Francisco Compute Co. (San Francisco, CA)
- …provider located in San Francisco is looking for a dedicated professional to manage GPU training clusters. You will ensure the smooth operation of high- ... performance computing systems, engage in customer interactions, and utilize...Applicants must have experience with Linux, networking fundamentals, and GPU clusters. The role offers a salary ranging from…
- Databricks Inc. (San Francisco, CA)
- …looking for a research engineer to enhance deep learning techniques and optimize performance on NVIDIA architectures. Ideal candidates will have a PhD in Computer ... Science and experience with CUDA and distributed training frameworks. Join our diverse team and take part in groundbreaking AI developments within an inclusive and…
- CareerArc (San Jose, CA)
- …the future of AI and beyond. Together, we advance your career. THE ROLE: The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering is a ... to system triage, at‑scale debug, and infrastructure optimization to ensure robust performance and efficient transitions from GPU production qualification to…
- Troveo Ai (San Francisco, CA)
- …for dissecting performance in large distributed systems; familiarity with multi‑GPU training and vector databases preferred. Excellent communication to ... Infrastructure & Optimization - Scale ML infrastructure on AWS, leveraging multi‑GPU clusters and distributed training to accelerate embedding computation…
- General Motors (San Francisco, CA)
- A leading automotive company is seeking a Senior AI/ML Tooling Engineer in San Francisco, California. In this role, you will develop internal ML tooling to optimize ... model training and inference, working closely with cross-functional teams. The...in frameworks like PyTorch and TensorFlow. Competitive edge in GPU programming and ML hardware architecture is a plus.…
- Hobbsnews (San Jose, CA)
- …in LLM model training, evaluation, inference optimization and parallelization in a GPU cluster. At least 3 years of experience working with AWS or equivalent ... Senior Manager, Data Science - GenAI Digital Assistant...for business‑specific applications and features. Drive innovation by designing, training, evaluating, and deploying state‑of‑the‑art NLP models, partnering with…
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …for experimentation, full-scale model training, or inference. Our client operates high-performance GPU clusters powering some of the most advanced AI ... workloads. Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts. Integrate, tune, and operate inference…
- Crusoe Energy Systems LLC (San Francisco, CA)
- …infrastructure to handle millions of API requests per second. Optimize AI inference performance on GPU-based systems. Implement robust monitoring and alerting to ... About This Role: As a Senior Staff Software Engineer on the Managed AI...open-source AI projects such as vLLM or similar frameworks. Performance optimizations on GPU systems and inference…
- Lavendo (San Francisco, CA)
- …artificial intelligence. The company provides cutting-edge infrastructure, including large-scale GPU clusters, cloud platforms, tools, and services for developers to ... Location: Remote US The Opportunity We are seeking a Senior AI / ML Specialist Solutions Architect to join...client's team. What You'll Do Architect and optimize distributed training and inference systems for large-scale AI models Design…
- Menlo Ventures (San Francisco, CA)
- …customer or business value. Experience building architecture for large-scale, performance-sensitive CPU/GPU inference systems. Strong communication skills and ... to operationalize models at scale with strong SLAs and cost efficiency. As a Senior Engineer, you'll play a critical role in shaping both the product experience and…
- Menlo Ventures (San Francisco, CA)
- …and tensorization for training‑specific patterns Design, implement, and optimize high‑performance GPU kernels for training workloads (e.g., attention ... gradient synchronization and collective operations Profile, debug, and optimize end‑to‑end training workflows to identify and resolve performance bottlenecks,…
- Crusoe Energy Systems LLC (San Francisco, CA)
- …focused on Operational Excellence, you will help ensure the stability, resilience, and performance of Crusoe's GPU cloud. This role is ideal for engineers ... improvement across a large-scale distributed platform. You'll partner closely with senior SREs, infrastructure engineers, and platform teams to improve reliability,…
- King River Capital Group (San Francisco, CA)
- …out before, during, and after playing games. Discord is looking for a Senior Software Engineer to build high-performance, cross-platform client software that ... platform-specific APIs to provide optimal experiences wherever people use Discord. As a Senior Software Engineer on our A/V Client team, you'll tackle challenging…
- Crusoe Energy Systems LLC (San Francisco, CA)
- …to open-source AI projects such as vLLM or similar frameworks. (Preferred) Performance optimizations on GPU systems and inference frameworks. (Preferred) Soft ... is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads...Cloud Managed AI team seeks an ambitious and experienced Senior Software Engineer to join their team. You'll have…
- Datacrunch (San Francisco, CA)
- …and infrastructure‑as‑code tools (Terraform, Ansible). Experience managing Slurm-based HPC GPU clusters, diagnosing performance issues, and designing efficient ... & CEO - to align on vision and expectations. About the role We're seeking a Senior or Principal Site Reliability Engineer (SRE) to become our first US hire, based in…
- Genentech (San Francisco, CA)
- …statistics, machine learning theory, and algorithms. Strong knowledge of ML performance optimization, GPU best practices. Experience with Kubernetes, relational ... optimise workflows. We also work on scaling up model training and inference, evaluating the quality of AI/ML models...and backend systems. Build tools to evaluate AI/ML model performance and establish new ways to understand AI quality.…