- NVIDIA Corporation (Santa Clara, CA)
- We are now looking for a TensorRT-LLM Software Development Engineer! NVIDIA is hiring software engineers for its TensorRT-LLM team. Academic and ... high-quality (C++/Python) code for our core backend software for LLM inference. Improve the usability of the TensorRT-LLM library and build systems (CMake). What…
- NVIDIA Corporation (Santa Clara, CA)
- Principal Software Engineer - Large-Scale LLM Memory and Storage Systems ... of any single GPU, this platform enables efficient, resilient deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to define the…
- NVIDIA Corporation (Santa Clara, CA)
- Senior AI Engineer, NeMo Retriever - Model Optimization and MLOps ... inference engines from NVIDIA and the community, including NVIDIA TensorRT and TensorRT-LLM, NIM microservices... The NeMo Retriever team is looking for an AI Engineer to join our team, focusing on the intersection…
- GEICO (Palo Alto, CA)
- …Great Rewards and Great Careers. GEICO's AI/ML Infrastructure team is seeking an exceptional Senior ML Platform Engineer to build and scale our machine learning ... maintain feature stores for ML model training and inference pipelines. Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and…
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types. Develop APIs, orchestration layers, and ... orchestration. Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments. Knowledge of performance…
- Apple Inc. (San Francisco, CA)
- Senior Software Engineer, Model Inference. San Francisco Bay Area, California, United States. Software and Services. Join Apple Maps to help build the best map in ... measurable results at global scale. Description: As a Software Engineer on the Apple Maps team, you will lead... and Speculative Decoding. Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks. Skilled…
- Amazon (San Francisco, CA)
- Senior Software Development Engineer, AI/ML, AWS Neuron, Model Inference. Job ID: 3067759 | Amazon.com Services LLC. The Annapurna Labs team at Amazon Web Services ... responsible for development, enablement, and performance tuning of a wide variety of LLM model families, including massive-scale large language models like the Llama…
- NVIDIA Corporation (Santa Clara, CA)
- Senior Technical Marketing Engineer - AI Inference at Scale. Locations: US, ... power AI at scale. We are looking for a Senior Technical Marketing Engineer to join our... JAX), and inference-specific frameworks & optimizations (Triton Inference Server, TensorRT-LLM, vLLM, SGLang). Market Awareness - Experience…
- Amazon (San Francisco, CA)
- Software Development Engineer, AI/ML, AWS Neuron, Model Inference. The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development ... responsible for development, enablement, and performance tuning of a wide variety of LLM model families, including massive-scale large language models like the Llama…
- NVIDIA Corporation (Santa Clara, CA)
- …more deep neural network (DNN) training and inference frameworks, such as PyTorch, TensorRT-LLM, vLLM, SGLang. Strong programming skills in C++ and Python. ... our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.
- NVIDIA Corporation (Santa Clara, CA)
- …MS) or equivalent experience. Ways to stand out from the crowd: Hands-on experience with LLM inference stacks (Triton Inference Server, TensorRT-LLM, vLLM). ... GPUs. You will shape our strategy for emerging challenges like disaggregated LLM inference and safeguard the long-term technical health of the platform. What you'll…
- NVIDIA (Santa Clara, CA)
- We are now looking for a TensorRT-LLM Software Development Engineer! NVIDIA is hiring software engineers for its TensorRT-LLM team. Academic and ... core backend software for LLM inference. Improve the usability of the TensorRT-LLM library and build systems (CMake). What we need to see: Master's or…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior Deep Learning Software Engineer, LLM Performance! NVIDIA is seeking an experienced Deep Learning Engineer passionate ... learning community to implement the latest algorithms for public release in TensorRT-LLM, vLLM, SGLang, and LLM benchmarks. Identify performance opportunities…
- NVIDIA (Santa Clara, CA)
- …deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to define the vision and roadmap for memory management of large-scale ... large-scale LLM inference. Architect and implement deep integrations with leading LLM serving engines (such as vLLM, SGLang, TensorRT-LLM), with a…
- NVIDIA (Santa Clara, CA)
- …and streamlined deployment strategies with open-sourced inference frameworks. Seeking a Senior Deep Learning Algorithms Engineer to improve innovative LLMs, ... (TensorRT Model Optimizer, Megatron-LM, Megatron-Bridge, Nvidia-NeMo, NeMo-AutoModel, TensorRT-LLM) and open-source frameworks (PyTorch, Hugging Face, vLLM,…
- NVIDIA (Santa Clara, CA)
- …on pre-optimized inference engines from NVIDIA and the community, including NVIDIA TensorRT and TensorRT-LLM, NIM microservices optimize response latency ... The NeMo Retriever team is looking for an AI Engineer to join our team, focusing on the intersection... deployments, etc. Familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM. Excellent…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior DL Algorithms Engineer! We are seeking a highly skilled Deep Learning Algorithms Engineer with hands-on experience optimizing ... deploy, and optimize models for efficient inference using frameworks such as TensorRT, TensorRT-LLM, vLLM, and SGLang. Understand, analyze, profile, and…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior DL Algorithms Engineer! We are seeking a highly skilled Deep Learning Algorithms Engineer with hands-on experience optimizing ... inference. Convert and deploy models using frameworks such as TensorRT and TensorRT-LLM. Understand, analyze, profile, and optimize performance of…
- NVIDIA (Santa Clara, CA)
- …Today, we are increasingly known as "the AI computing company." We are seeking a Senior Staff Machine Learning Engineer to join our Enterprise AI team and build ... frameworks such as PyTorch or TensorFlow; familiarity with CUDA-accelerated libraries (e.g., TensorRT-LLM) is a plus. Proven track record to take a significant…
- Red Hat (Boston, MA)
- **About the Job** The Red Hat Performance and Scale Engineering team is seeking a Senior Performance Engineer to join our PSAP (Performance and Scale for AI ... for example. This is a dynamic role for a seasoned engineer with a growth mindset who handles and adapts... PyTorch Profiler, among others. Hands-on experience with modern LLM inference server stacks (e.g., vLLM, TensorRT-…