  • Red Hat (Boston, MA)
    A leading software company seeks a Machine Learning Engineer focused on vLLM Inference in Boston, MA. This role involves designing high-performance machine ... Ideal candidates are experienced in Python and Pydantic, understand LLM Inference Core Concepts, and possess strong communication skills. Enjoy comprehensive…
    job goal (01/14/26)
  • Red Hat (Boston, MA)
    Machine Learning Engineer, vLLM Inference - Tool Calling and Structured Output ... mission to bring the power of open‑source LLMs and vLLM to every enterprise. The Red Hat Inference...optimize, and scale LLM deployments. As a Machine Learning Engineer focused on vLLM, you will be…
  • Red Hat (Boston, MA)
    Principal Machine Learning Engineer, Distributed vLLM Inference ... mission to bring the power of open‑source LLMs and vLLM to every enterprise. Red Hat Inference ...or distributed tracing libraries/techniques like OpenTelemetry. Ph.D. in an ML‑related domain is a significant advantage. The salary range…
  • Red Hat, Inc. (Boston, MA)
    …open, and we are on a mission to bring the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise ... optimize, and scale LLM deployments. As a Machine Learning Engineer focused on vLLM, you will be...you. Join us in shaping the future of AI Inference! What You Will Do: Write robust Python and…
  • Acceler8 Talent (San Francisco, CA)
    A dynamic technology company in San Francisco is seeking a Machine Learning Engineer (Inference) to enhance AI inference efficiency. This mid-senior level ... role focuses on building and optimizing inference stacks, requiring experience with tools such as vLLM and TensorRT. The compensation includes a competitive salary,…
  • Apple Inc. (San Francisco, CA)
    …with Maps infrastructure. Responsibilities: Own the technical architecture of large-scale ML inference platforms, defining long-term design direction for serving ... Senior Software Engineer, Model Inference - San Francisco Bay...equivalent experience). 5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems.…
  • Red Hat (Boston, MA)
    Principal Machine Learning Engineer, AI Inference. Remote type: Hybrid. Locations: ... mission to bring the power of open-source LLMs and vLLM to every enterprise. Red Hat Inference ...optimize, and scale LLM deployments. As a Principal Machine Learning Engineer focused on vLLM, you will be…
  • MongoDB (Palo Alto, CA)
    We're looking for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for semantic search, retrieval, and ... Atlas and designed for developer-first experiences. As a Senior Engineer, you'll focus on building core systems and services...in a cloud-native environment. Work across product, infrastructure, and ML teams to ensure the inference platform…
  • GEICO (Palo Alto, CA)
    …Great Careers. GEICO AI ML Infrastructure team is seeking an exceptional Senior ML Platform Engineer to build and scale our machine learning infrastructure ... resource optimization; design, implement, and maintain feature stores for ML model training and inference pipelines; build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM,…
  • d-Matrix (Santa Clara, CA)
    …Hybrid, working onsite at our Santa Clara, CA, headquarters 3 days per week. Principal AI/ML System Software Engineer. What you will do: The role requires you to ... in Computer Science, Electrical Engineering, or related fields. Experience with inference servers/model serving frameworks (such as TensorRT-LLM, vLLM, SGLang,…
  • Menlo Ventures (Berkeley, CA)
    …agents, all via an OpenAI‑compatible interface. ScalarLM builds on top of the vLLM inference engine, the Megatron‑LM training framework, and the HuggingFace ... distributed training frameworks (Megatron‑LM, DeepSpeed, FairScale, etc.). Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.). Experience…
  • Compu-Vision Consulting, Inc. (San Jose, CA)
    …generation; constrained decoding enforcing HDL syntax; does-it-synthesize? checks. Privacy-First AWS ML Pipelines: design secure training & inference environments ... LLM serving: Bedrock model invocation where applicable; low-latency self-hosted inference (vLLM/TensorRT-LLM); autoscaling and canary/blue-green rollouts; evaluation…
  • Scouto AI (San Francisco, CA)
    …PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues. Collaborate with the engineering team to bring new features and ... Overview: We are building a distributed LLM inference network that combines idle GPU capacity from...inference optimization techniques. Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred. Familiarity…
  • SRS Consulting Inc (San Jose, CA)
    …serving: Bedrock model invocation where it fits, and/or low‑latency self‑hosted inference (vLLM/TensorRT-LLM), autoscaling, and canary/blue‑green rollouts. Build ... Role: Staff Machine Learning Engineer. Location: San Jose, CA (Onsite). Locals. Duration:...enforce syntax, and "does it synthesize" checks. Design privacy‑first ML pipelines on AWS: training/customization and hosting using Amazon…
  • Aldea Inc (San Francisco, CA)
    …high reliability. Track record of delivering significant performance optimizations in ML training or inference systems. Preferred Qualifications: Experience with ... with large-scale pretraining (100B+ tokens, ideally trillion+ scale). Experience optimizing inference for production: quantization, vLLM, TensorRT, or custom…
  • Crusoe Energy Systems LLC (San Francisco, CA)
    …The Crusoe Cloud Managed AI team seeks an ambitious and experienced Senior Software Engineer. You'll play a pivotal role in shaping the architecture and scalability of our next-generation AI inference platform. You will lead the design and implementation of core systems for our AI services, including…
  • Together AI (San Francisco, CA)
    Together AI is looking for an ML Engineer who will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs. Relevant ... experience includes implementing runtime systems that perform inference at scale using AI/ML models from...computer science or equivalent industry experience. Familiar with the LLM inference ecosystem, including frameworks and engines (e.g. vLLM…
  • Abridge (San Francisco, CA)
    …What You'll Do: Design, deploy and maintain scalable Kubernetes clusters for AI model inference and training. Develop, optimize, and maintain ML model serving and ... Machine Learning Infrastructure Engineer ...with model serving frameworks such as NVIDIA Triton Server, vLLM, TRT‑LLM and so on. Expertise with ML…
  • NVIDIA Corporation (Santa Clara, CA)
    Principal Software Engineer - Large-Scale LLM Memory and Storage Systems. Posted today. Job requisition id: JR2010271. NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across…
  • NVIDIA (Santa Clara, CA)
    NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built ... resilient deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to define the vision and roadmap for memory management of large-scale…