• Capital One (San Francisco, CA)
    …Design, develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, ... Senior Lead AI Engineer (AI Foundations,...developing AI and ML algorithms or technologies (e.g., LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory) using Python,… more
    job goal (01/13/26)
  • TwelveLabs (San Francisco, CA)
    …features like video search, generation, and embedding, integrated with model inference pipelines. Architect high‑throughput, service‑oriented backend systems ... Join to apply for the Senior Software Engineer, Backend role at TwelveLabs. Base Pay Range $145,000.00/yr - $182,000.00/yr At TwelveLabs, we are pioneering the… more
    job goal (01/13/26)
  • HP IQ (San Francisco, CA)
    …while seamlessly integrating with cloud infrastructure. We are looking for a Senior Software Engineer to design and develop high‑performance, scalable services ... edge devices. Optimize data pipelines and storage solutions for real‑time AI inference and processing. Implement security and privacy best practices for distributed… more
    job goal (01/13/26)
  • WekaIO (San Francisco, CA)
    …into data pipelines that dramatically increase GPU utilization and make AI model training and inference, machine learning, and other compute‑intensive workloads ... on this exciting journey. The Bay Area regional Sales Engineer will join our rapidly growing sales organization. Being...our rapidly growing sales organization. Being a WEKA Sales Engineer will require you to partner with customers, understanding… more
    job goal (01/13/26)
  • Bland (San Francisco, CA)
    …YC, the founders of Twilio, Affirm, ElevenLabs, and many more. About The Role As a Senior ML Engineer at Bland, you'll own the intelligence behind our voice AI ... and reduce latency. Optimize for enterprise scale: Handle complex inference optimization challenges- model quantization, efficient serving architectures, and… more
    job goal (01/13/26)
  • Assembled (San Francisco, CA)
    Machine Learning Engineer - Forecasting & Scheduling at Assembled About Assembled Great customer support requires human agents and AI in perfect balance, and ... Forecasting & Scheduling Contact‑volume forecasting: data pipelines, statistical/ML models and inference services that predict ticket volumes, agent demand and time… more
    job goal (01/13/26)
  • Workshop Labs (San Francisco, CA)
    …data. Our core ML systems challenge: how do we serve the world's best personal model, at low cost and high speed, with bulletproof privacy? What you'll do Build the ... finetuned models for our customers Monitor & optimize in-the-wild model serving performance to hit low latency & cost...us-and integrate the privacy architecture with the finetuning & inference code You have A deep understanding of the… more
    job goal (01/13/26)
  • Faire (San Francisco, CA)
    …and willingness to learn new tools and techniques Demonstrated ability to mentor Senior Data Scientists, develop team strategy, and independently lead model ... in supervised fine tuning of multi-modal LLMs Experience deploying and optimizing LLM inference systems at scale (10B+ tokens), with focus on cost efficiency and… more
    job goal (01/13/26)
  • Senior GenAI Algorithms Engineer

    NVIDIA (Santa Clara, CA)
    …streamlined deployment strategies with open-sourced inference frameworks. Seeking a Senior Deep Learning Algorithms Engineer to improve innovative generative ... and diffusion models. In this role, you will design, implement, and productionize model optimization algorithms for inference and deployment on NVIDIA's latest… more
    NVIDIA (01/10/26)
  • Senior Software Development Engineer

    Amazon (Seattle, WA)
    …with popular ML frameworks like PyTorch and JAX enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team ... culture. The team works closely with customers on their model enablement, providing direct support and optimization expertise to...models like the Llama family, DeepSeek and beyond. The Inference Enablement and Acceleration team works side by side… more
    Amazon (01/06/26)
  • Senior Software Development Engineer

    Amazon (Cupertino, CA)
    …with popular ML frameworks like PyTorch and JAX enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team ... culture. The team works closely with customers on their model enablement, providing direct support and optimization expertise to...models like the Llama family, DeepSeek and beyond. The Inference Enablement and Acceleration team works side by side… more
    Amazon (11/05/25)
  • Senior Software Engineer

    MongoDB (Palo Alto, CA)
    **About the Role** We're looking for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for semantic ... with Atlas and designed for developer-first experiences. As a Senior Engineer, you'll focus on building core...focus on building core systems and services that power model inference at scale. You'll own key… more
    MongoDB (01/08/26)
  • Senior Software Engineer , AI…

    NVIDIA (CA)
    …how you can make a lasting impact on the world. We are now looking for a Senior System Software Engineer to work on user facing tools for Dynamo Inference ... data scientists. What you'll be doing: + Build and maintain distributed model management systems, including Rust-based runtime components, for large-scale AI … more
    NVIDIA (11/29/25)
  • Senior Principal Machine Learning…

    Red Hat (Boston, MA)
    …bring the power of open-source LLMs and vLLM to every enterprise. Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI ... the vLLM and LLM-D projects, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for… more
    Red Hat (01/08/26)
  • Senior DL Algorithms Engineer

    NVIDIA (Santa Clara, CA)
    …leads the AI revolution. What you will be doing: + Implement language and multimodal model inference as part of NVIDIA Inference Microservices (NIMs). + ... We are now looking for a Senior DL Algorithms Engineer! NVIDIA is...bugs and deliver production code to TRT-LLM, NVIDIA's open-source inference serving library. + Profile and analyze bottlenecks across… more
    NVIDIA (01/08/26)
  • Senior Deep Learning Software…

    NVIDIA (Santa Clara, CA)
    NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and ... frameworks, which are at the forefront of efficient large-scale model serving and inference. You will play...are growing fast. If you're a creative and autonomous engineer with a genuine passion for technology, we want… more
    NVIDIA (12/07/25)
  • Senior Deep Learning Software…

    NVIDIA (Santa Clara, CA)
    NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and ... SGLang and vLLM, which are at the forefront of efficient large-scale model serving and inference. You will play a central role in improving these platforms,… more
    NVIDIA (12/05/25)
  • Senior Engineer -AI Inference

    Bank of America (Addison, TX)
    Senior Engineer -AI Inference Addison, Texas;Plano, Texas; Newark, Delaware; Charlotte, North Carolina; Kennesaw, Georgia **To proceed with your application, ... must be at least 18 years of age.** Acknowledge (https://ghr.wd1.myworkdayjobs.com/Lateral-US/job/Addison/ Senior - Engineer -AI- Inference \_25029879) **Job Description:** At Bank… more
    Bank of America (12/22/25)
  • Senior Software Engineer - vLLM…

    Red Hat (Boston, MA)
    …for enterprises to build, optimize, and scale LLM deployments. We are seeking an experienced Senior ML Ops engineer to work closely with our product and research ... open-source LLMs and vLLM to every enterprise. Red Hat Inference team accelerates AI for the enterprise and brings...deep learning products and software. As an ML Ops engineer, you will work closely with our technical and… more
    Red Hat (12/06/25)
  • Senior Principal Machine Learning…

    Red Hat (Boston, MA)
    …bring the power of open-source LLMs and vLLM to every enterprise. Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI ... maintainers of the vLLM project, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for… more
    Red Hat (01/08/26)