- Protogon Research (Redwood City, CA)
- …other real-time decision environments. Background designing large-scale AI platforms, distributed training infrastructure, or research frameworks that ... AI Team: Build and lead a world-class AI team across research, engineering, and infrastructure ... Data Pipelines & Model Quality Improvements: Own the end-to-end AI platform, including data pipelines, training systems,…
- Advanced Micro Devices (San Jose, CA)
- …is a strategic technical leader with a strong foundation in distributed training and AI infrastructure, coupled with experience building or guiding ... you will define and execute the technical vision for distributed training of large-scale generative AI ... distributed training, LLMs, recommendation systems, and AI infrastructure - and translate them into…
- HeyGen (San Francisco, CA)
- …distributed training libraries) into production‑ready, scalable training and inference pipelines. Infrastructure Management: Champion the adoption ... the foundational compute infrastructure that powers our state‑of‑the‑art AI models, from multimodal training data pipelines to high‑throughput, low‑latency…
- Oracle (Redwood City, CA)
- Senior Principal Software Engineer - AI Infrastructure Innovation ... AI chips and systems and software to drive next‑gen Cloud AI Infrastructure platforms. Evaluation of system architecture and proposed implementation path…
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …with high‑performance computing (HPC) or AI/ML training infrastructure at scale. Background in reliability engineering, distributed systems, or ... SSH access. This is a rare opportunity to work at the intersection of hyperscale infrastructure and AI, shaping the operational backbone of one of the largest…
- Oracle (Seattle, WA)
- Sr Principal Software Engineer, Networking - AI Infrastructure Innovation. The OCI (Oracle Cloud) AI Infrastructure Innovation team is pioneering the creation ... backend fabrics - that enables customers to achieve high performance for AI training and inference. You will define ... access. If you thrive at the intersection of large‑scale distributed systems, high‑speed networking, and AI workloads,…
- Alignerr (San Francisco, CA)
- …quality, or evaluation systems. Familiarity with AI/ML workflows, model training, or benchmarking pipelines. Experience with distributed systems or developer ... Principal Systems Engineer (C++) - AI Infrastructure. Alignerr connects top technical ... with data, research, and engineering teams to support model training and evaluation workflows. Identify bottlenecks and edge cases…
- Labelbox (San Francisco, CA)
- …quality, or evaluation systems. Familiarity with AI/ML workflows, model training, or benchmarking pipelines. Experience with distributed systems or developer ... work on real production systems and high-impact research workflows across data, tooling, and infrastructure. Position: Senior C++ Full-Stack Engineer - AI Data &…
- LinkedIn (Mountain View, CA)
- …Infra team, you will play a crucial role in leading and building the next‑gen training infrastructure to power AI use cases. You will design and implement ... as the tech‑lead for several concurrent key initiatives for the Training Infrastructure and defining the future of AI training platforms. Basic…
- Labelbox (San Francisco, CA)
- …technical expertise with a passion for innovation, working at the intersection of AI infrastructure, data systems, and user experience. We believe in pushing ... AI At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at ... Labelbox customers to efficiently manage and stream data for training next-generation AI models. Owning critical components…
- Abridge (San Francisco, CA)
- …and training. Develop, optimize, and maintain ML model serving and training infrastructure, ensuring high performance and low latency. Collaborate with ML and ... and so on. Expertise with ML toolchains such as PyTorch, TensorFlow, or distributed training and inference libraries. Familiarity with GPU cluster management and…
- Menlo Ventures (San Francisco, CA)
- …medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to ... a high-agency, high-visibility team operating at the frontier of AI infrastructure - with deep ties to ... state a few. Improve reliability, latency, and efficiency of distributed AI workloads. Collaborate with platform, infra,…
- GEICO (Palo Alto, CA)
- …The GEICO Pledge: Great Company, Great Culture, Great Rewards and Great Careers. GEICO AI ML Infrastructure team is seeking an exceptional Senior ML Platform ... Platform & Infrastructure: Design and implement scalable infrastructure for training, fine-tuning, and serving open ... Experience with Staff Engineer or Tech Lead roles in ML/AI organizations. Background in distributed systems and…
- Sentara Health (Virginia Beach, VA)
- …We are seeking a highly skilled and experienced Data Science ML Operations and Gen AI Engineer (or Senior) to join us and help advance our current and future work ... address those gaps. You will also collaborate with the AI team's ML Scientists and our partner data engineering ... frameworks, and guardrails in production. Strength in APIs, distributed systems, and ML Ops (K8s, CI/CD, monitoring).…
- P-1 AI (San Francisco, CA)
- …debugging NCCL deadlocks to optimizing FSDP configs, you'll be the go‑to person for training infrastructure and performance. What You'll Do: Own the training ... About P-1 AI: We are building an engineering AGI. We ... training workflows. Configure, launch, monitor, and debug multi‑node distributed training jobs using FSDP, DeepSpeed, or…
- Crusoe Energy Systems LLC (San Francisco, CA)
- …in Golang or Python for large-scale, production-level services. (Preferred) Familiarity with AI infrastructure, including training, inference, and ETL ... -first Cloud infrastructure company. We're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to…
- GMI Cloud (Mountain View, CA)
- …/ML infrastructure, or data center engineering. Strong understanding of distributed training & inference architectures, Kubernetes, Slurm, or other ... We are seeking a Senior Solution Architect to design GPU‑cloud and AI infrastructure solutions, lead PoCs and benchmarks, guide customers through deployment, and…
- Klaviyo (San Francisco, CA)
- …delivers value to the business. Your scope will include everything from integrating AI into infrastructure and automation frameworks to building internal AI ... technology transformations or systems architecture teams with a focus on AI, automation, or infrastructure modernization. Demonstrated success embedding…
- The Rundown AI, Inc. (San Francisco, CA)
- …with AI model integration needs. Design, implement, and maintain our AI infrastructure supporting our machine learning life cycle, including data ingestion ... Knowledge of Python, C/C++, CUDA, and experience profiling GPU performance and distributed training runs. Experience with machine learning frameworks like…
- Oracle (Redwood City, CA)
- …AI chips and systems and software to drive next‑gen Cloud AI Infrastructure platforms. You will contribute to platform definition, platform development ... technical leadership to the broader organization. You should have experience developing AI infrastructure and operating high‑scale services, and an understanding…