- Apple Inc. (San Francisco, CA)
- Senior Software Engineer, Model Inference (Software and Services). Join Apple Maps to help build the ... take end-to-end ownership, and deliver measurable results at global scale. As a Software Engineer on the Apple Maps team, you will lead the design…
- OpenAI (San Francisco, CA)
- …things that they've never been able to before. We focus on performant and efficient model inference, as well as accelerating research progression via model ... About the Role: We are looking for an engineer who wants to take the world's largest and ... improve the performance, latency, throughput, and efficiency of our model inference stack. Build tools to give…
- Amazon (San Francisco, CA)
- Senior Software Development Engineer, AI/ML, AWS Neuron, Model Inference. Job ID: 3067759 | Amazon.com Services LLC. The Annapurna Labs team at Amazon Web ... ML frameworks like PyTorch and JAX, enabling unparalleled ML inference and training performance. The Inference Enablement ... with work experience on optimizations for improving model execution. - Software development experience in…
- Amazon (San Francisco, CA)
- Software Development Engineer, AI/ML, AWS Neuron, Model Inference. The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software ... lifecycles, along with work experience on optimizations for improving model execution. Software development experience in C++, Python (experience in at…
- Databricks Inc. (San Francisco, CA)
- Staff Software Engineer - GenAI inference (P-1285). About This Role: As a staff software engineer for GenAI inference, you will lead the ... architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. You'll bridge research advances and production…
- Menlo Ventures (San Francisco, CA)
- About This Role: As a software engineer for GenAI inference, you will help design, develop, and optimize the inference engine that powers Databricks' ... What You Will Do: Contribute to the design and implementation of the inference engine, and collaborate on a model-serving stack optimized for large-scale LLMs…
- Virtue AI (San Francisco, CA)
- …we're looking for passionate builders to join our core team. What You'll Do: As an Inference Engineer, you will own how models are served in production. Your job ... You will: Serve and optimize inference for LLM, embedding, and other ML models across multiple model families. Design and operate inference APIs with clear…
- OpenAI (San Francisco, CA)
- …things that they've never been able to before. We focus on performant and efficient model inference, as well as accelerating research progression via model ... About the Team: Our Inference team brings OpenAI's most capable research and ... connection management. Have 5+ years of experience as a software engineer and systems architect working on…
- OpenAI (San Francisco, CA)
- …tighter coordination with product and research. About the Role: We're looking for a software engineer to help us serve OpenAI's multimodal models at scale. You'll ... networking, distributed compute, and high-throughput data handling. Have familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel…
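For readers unfamiliar with the inference tooling this posting names, here is a minimal sketch of offline batch generation with vLLM. The model name and sampling settings are illustrative assumptions, not details from the listing.

```python
# Minimal vLLM offline-generation sketch; the model and sampling
# settings are placeholder assumptions, not taken from the posting.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small example model
params = SamplingParams(temperature=0.8, max_tokens=64)

# vLLM batches prompts internally (continuous batching) to keep the
# GPU busy across many concurrent requests.
outputs = llm.generate(["What is model inference?"], params)
for out in outputs:
    print(out.outputs[0].text)
```

TensorRT-LLM covers similar ground with ahead-of-time engine compilation; the choice the posting hints at is between a general-purpose runtime like these and custom model-parallel serving code.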
- Menlo Ventures (San Francisco, CA)
- …and contribute to our innovative projects. Position Overview: We are looking for a Software Engineer to work at the forefront of deploying our cutting-edge AI ... machine learning architectures. Experience with machine learning compilers. Experience optimizing model inference for robotic systems deployment. Base Salary…
- Pulse (San Francisco, CA)
- …experience is a plus. About the Role: Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own profiling, batching, and autoscaling ... across single-tenant and multi-tenant environments. Responsibilities: Build inference services with smart batching and caching. Optimize kernels, tokenization, and…
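Since the posting above centers on "smart batching," here is a minimal sketch of the idea under stated assumptions: requests queue up, and a worker drains them in groups bounded by a size cap and a wait deadline. All names (MAX_BATCH, run_model, and so on) are hypothetical.

```python
# Dynamic ("smart") batching sketch: group requests into one model call.
# MAX_BATCH, MAX_WAIT_S, and run_model are illustrative assumptions.
import queue
import threading
import time

MAX_BATCH = 8       # assumed cap on batch size
MAX_WAIT_S = 0.01   # assumed max wait before a partial batch runs

request_q: "queue.Queue[str]" = queue.Queue()

def run_model(batch):
    # Stand-in for one forward pass over the whole batch.
    return [f"result for {item}" for item in batch]

def batching_loop():
    while True:
        batch = [request_q.get()]  # block until the first request arrives
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_q.get(timeout=remaining))
            except queue.Empty:
                break
        for result in run_model(batch):
            print(result)  # in a real service, returned to each caller

threading.Thread(target=batching_loop, daemon=True).start()
for i in range(3):
    request_q.put(f"req-{i}")
time.sleep(0.1)  # give the demo loop time to drain the queue
```

The latency/throughput knob is the pair (MAX_BATCH, MAX_WAIT_S): larger values raise utilization at the cost of per-request latency, which is exactly the low-latency, high-throughput tension the role describes.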
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …thousands of H100s, H200s, and B200s, ready to go for experimentation, full-scale model training, or inference. Our client operates high-performance GPU clusters ... beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to…
- Menlo Ventures (San Francisco, CA)
- …to new inference features (e.g., structured sampling, prompt caching). Supporting inference for new model architectures. Analyzing observability data to tune ... leaders working together to build beneficial AI systems. About the Role: Our Inference team is responsible for building and maintaining the critical systems that…
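One of the features named above, prompt caching, amounts to reusing precomputed state for a shared prompt prefix instead of recomputing it per request. A minimal sketch follows, with compute_state standing in for the expensive prefill pass; all identifiers here are hypothetical.

```python
# Prompt-prefix caching sketch: pay the prefill cost for a shared
# prefix once, then reuse it. compute_state is a hypothetical stand-in.
import hashlib

_cache: dict[str, object] = {}

def _key(prefix: str) -> str:
    return hashlib.sha256(prefix.encode()).hexdigest()

def compute_state(prefix: str) -> object:
    # Stand-in for prefilling the model's KV cache over the prefix tokens.
    return f"kv-state for {len(prefix)} chars"

def get_prefix_state(prefix: str) -> object:
    key = _key(prefix)
    if key not in _cache:
        _cache[key] = compute_state(prefix)  # computed once per prefix
    return _cache[key]                       # later requests hit the cache

system_prompt = "You are a helpful assistant."
first = get_prefix_state(system_prompt)   # miss: prefill runs
second = get_prefix_state(system_prompt)  # hit: state is reused
assert first is second
```

In production serving stacks the cached object is typically the attention KV cache for the prefix tokens, so the saving scales with prefix length.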
- OpenAI (San Francisco, CA)
- A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate ... has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience with distributed systems. This role involves…
- Databricks Inc. (San Francisco, CA)
- A leading AI-focused technology company in San Francisco is seeking a Software Engineer for GenAI inference. In this role, you'll design, develop, and ... optimize the inference engine powering the Foundation Model API ... focusing on large-scale LLM applications. A strong background in software engineering, distributed systems, and machine learning techniques is…
- Arcade (San Francisco, CA)
- A pioneering technology company in San Francisco seeks a Software Engineer to build scalable backend systems and enhance generative AI workflows. The ideal ... candidate will design efficient architecture for model execution and collaborate closely with product and AI teams. This role offers the opportunity to innovate in a…
- Databricks Inc. (San Francisco, CA)
- A leading data and AI company in San Francisco seeks a Staff Software Engineer for GenAI inference to lead its architecture and optimization efforts. ... 6 years of experience and an understanding of ML inference internals. Key tasks include collaborating on model features, optimizing the inference engine for…
- Databricks Inc. (San Francisco, CA)
- …of experience building and operating large-scale distributed systems. Experience in model serving, inference systems, or related infrastructure (e.g., routing, ... customers can use deep data insights to improve their business. Databricks' Model Serving product provides enterprises with a unified, scalable, and governed…
- Databricks Inc. (San Francisco, CA)
- …experience building and operating large-scale distributed systems. Deep expertise in model serving, inference systems, and related infrastructure (e.g., routing, ... customers can use deep data insights to improve their business. Databricks' Model Serving product provides enterprises with a unified, scalable, and governed…
- Menlo Ventures (San Francisco, CA)
- … Model Serving is the API product for hosting and serving frontier AI model inference for open-source models like Llama, Qwen, and GPT-OSS, as well as ... sensitive systems such as customer-facing APIs, Edge Gateways, ML inference, or similar services, and have an interest in ... LLM APIs and runtimes at scale. As a Staff Engineer, you'll play a critical role in shaping both…