- General Motors (Sunnyvale, CA)
- …the business. **This job is eligible for relocation assistance.** **About the Team:** The ML Inference Platform is part of the AI Compute Platforms organization… efficiency. **About the Role:** We are seeking a Staff ML Infrastructure Engineer to help build and… shaping the architecture, roadmap, and user experience of a robust ML inference service supporting real-time, batch, and…
- Amazon (Seattle, WA)
- …integrates with popular ML frameworks like PyTorch and JAX, enabling unparalleled ML inference and training performance. The Inference Enablement and… learning and GenAI workloads on Amazon's Inferentia and Trainium ML accelerators. This comprehensive toolkit includes an ML… models like the Llama family, DeepSeek and beyond. The Inference Enablement and Acceleration team works side by side…
- Amazon (Cupertino, CA)
- …applications. Key job responsibilities * Architect and lead the design of distributed ML serving systems optimized for generative AI workloads * Drive technical… the boundaries of what's possible in large-scale ML serving. Recent shares: https://github.com/aws-neuron/upstreaming-to-vllm/releases/tag/2.25.0 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference…
- Amazon (Seattle, WA)
- …the Trn2 and future Trn3 servers that use them. This role is for a software engineer in the Machine Learning Applications (ML Apps) team for AWS Neuron. This… enables and performance-tunes building blocks for all key ML model families, including Llama3, GPT OSS, Qwen3, DeepSeek and beyond. The Neuron Inference Technology team works side by side with the…
- Amazon (Seattle, WA)
- …Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer in the Machine Learning Inference Applications team. This role is… for development and performance optimization of core building blocks of LLM Inference: Attention, MLP, Quantization, Speculative Decoding, Mixture of Experts, etc.…
- NVIDIA (Santa Clara, CA)
- …highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You'll architect… that pushes the Pareto frontier for the field of ML Systems; survey recent publications and find… theories. + Knowledgeable and passionate about performance engineering in ML frameworks (e.g., PyTorch) and inference engines…
- MongoDB (Palo Alto, CA)
- …in multi-tenant environments + 1+ years of experience serving as TL for a large-scale ML inference or training platform SW project **Nice to Have** + Prior… We're looking for a Lead Engineer, Inference Platform to join our… of experience in managing a technical team focused on ML inference or training infrastructure **Why Join…
- MongoDB (Palo Alto, CA)
- …for developer-first experiences. As a Senior Engineer, you'll focus on building core systems and services that power model inference at scale. You'll own key… **About the Role** We're looking for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for semantic…
- NVIDIA (CA)
- …benchmarking, automation, and documentation processes to ensure low-latency, robust, and production-ready inference systems on GPU clusters. What we need to see:… systems, including Rust-based runtime components, for large-scale AI inference workloads. + Implement inference scheduling and deployment solutions…
- Red Hat (Boston, MA)
- …collaborate with our team to tackle the most pressing challenges in scalable inference systems and Kubernetes-native deployments. Your work with distributed… bring the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI…
- Capital One (San Francisco, CA)
- Lead AI Engineer (FM Hosting, LLM Inference) **Overview** At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For… cost, latency, throughput of large-scale production AI systems. + Contribute to the technical vision and the… and supporting AI services + Experience developing AI and ML algorithms or technologies (e.g. LLM Inference,…
- NVIDIA (Santa Clara, CA)
- …open-sourced inference frameworks. Seeking a Senior Deep Learning Algorithms Engineer to improve innovative generative AI models like LLMs, VLMs, multimodal and… as large language models (LLMs) and diffusion models for maximal inference efficiency using techniques ranging from quantization, speculative decoding, sparsity,…
- Amazon (Cupertino, CA)
- …and the Trn1 and Inf1 servers that use them. This role is for a software engineer in the Machine Learning Applications (ML Apps) team for AWS Neuron. This role… for development, enablement and performance tuning of a wide variety of ML model families, including massive scale large language models like Llama2, GPT2,…
- NVIDIA (CA)
- …including debugging, performance analysis, and test design. + Experience with high-scale distributed systems and ML systems… We are now looking for a Senior System Software Engineer to work on Dynamo & Triton Inference… Ways to stand out from the crowd: + Prior work experience improving performance of AI inference systems. + Background with deep learning…
- Red Hat (Raleigh, NC)
- …the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI… optimize, and scale LLM deployments. As a Machine Learning Engineer focused on vLLM, you will be at the… you. Join us in shaping the future of AI Inference! **What You Will Do** + Write robust Python…
- Bank of America (Addison, TX)
- Senior Engineer - AI Inference Addison, Texas; Plano, Texas; Newark, Delaware; Charlotte, North Carolina; Kennesaw, Georgia **To proceed with your application, you must be at least 18 years of age.** Acknowledge (https://ghr.wd1.myworkdayjobs.com/Lateral-US/job/Addison/Senior-Engineer-AI-Inference_25029879) **Job Description:** At Bank of America,…
- Red Hat (Boston, MA)
- …bring the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI… (https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/#the-top-open-source-projects-by-contributors) on GitHub. As a Machine Learning Engineer focused on vLLM, you will be…
- Guidehouse (Huntsville, AL)
- …to 10% **Clearance Required:** Active Top Secret (TS) Guidehouse is seeking a Lead AI/ML Engineer to join our Technology / AI and Data team, supporting… AI solutions that leverage large language models (LLMs), retrieval systems, and secure, scalable inference pipelines to… You Will Do:** + Serves as the lead AI/ML engineer responsible for developing, optimizing, and…
- Amazon (Seattle, WA)
- …in Python and ML framework internals * Strong understanding of distributed systems and ML optimization * Passion for performance tuning and system… AI accelerators Inferentia and Trainium. As a Senior Software Engineer in our Machine Learning Applications team, you'll be… architects to influence the future of AI hardware. Our systems handle millions of inference calls daily,…