- Advanced Micro Devices, Inc. (San Jose, CA)
- …is a strategic technical leader with a strong foundation in distributed training and AI infrastructure, coupled with experience building or guiding ... you will define and execute the technical vision for distributed training of large-scale generative AI...distributed training, LLMs, recommendation systems, and AI infrastructure - and translate them into…
- Sentara Health (Virginia Beach, VA)
- …We are seeking a highly skilled and experienced Data Science ML Operations and Gen AI Engineer (or Senior) to join us and help advance our current and future work ... address those gaps. You will also collaborate with the AI team's ML Scientists and our partner data engineering...frameworks, and guardrails in production. - Strength in APIs, distributed systems, and ML Ops (K8s, CI/CD, monitoring). …
- HeyGen (San Francisco, CA)
- …distributed training libraries) into production‑ready, scalable training and inference pipelines. Infrastructure Management: Champion the adoption ... the foundational compute infrastructure that powers our state‑of‑the‑art AI models, from multimodal training data pipelines to high‑throughput, low‑latency…
- Oracle (Redwood City, CA)
- Senior Principal Software Engineer - AI Infrastructure Innovation ... AI chips and systems and software to drive next‑gen Cloud AI Infrastructure platforms. Evaluation of system architecture and proposed implementation path…
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …with high‑performance computing (HPC) or AI/ML training infrastructure at scale. Background in reliability engineering, distributed systems, or ... SSH access. This is a rare opportunity to work at the intersection of hyperscale infrastructure and AI, shaping the operational backbone of one of the largest…
- Alignerr (San Francisco, CA)
- …quality, or evaluation systems. Familiarity with AI/ML workflows, model training, or benchmarking pipelines. Experience with distributed systems or developer ... Principal Systems Engineer (C++) - AI Infrastructure. Alignerr connects top technical...with data, research, and engineering teams to support model training and evaluation workflows. Identify bottlenecks and edge cases…
- Labelbox (San Francisco, CA)
- …quality, or evaluation systems. Familiarity with AI/ML workflows, model training, or benchmarking pipelines. Experience with distributed systems or developer ... work on real production systems and high-impact research workflows across data, tooling, and infrastructure. Position: Senior C++ Full-Stack Engineer - AI Data &…
- LinkedIn (Mountain View, CA)
- …Infra team, you will play a crucial role in leading and building the next‑gen training infrastructure to power AI use cases. You will design and implement ... as the tech‑lead for several concurrent key initiatives for the Training Infrastructure and defining the future of AI training platforms. Basic…
- Labelbox (San Francisco, CA)
- …technical expertise with a passion for innovation, working at the intersection of AI infrastructure, data systems, and user experience. We believe in pushing ... AI At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at...Labelbox customers to efficiently manage and stream data for training next-generation AI models. Owning critical components…
- Menlo Ventures (San Francisco, CA)
- …medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to ... a high-agency, high-visibility team operating at the frontier of AI infrastructure - with deep ties to...state a few. Improve reliability, latency, and efficiency of distributed AI workloads. Collaborate with platform, infra,…
- GEICO (Palo Alto, CA)
- …The GEICO Pledge: Great Company, Great Culture, Great Rewards and Great Careers. GEICO AI ML Infrastructure team is seeking an exceptional Senior ML Platform ... Platform & Infrastructure: Design and implement scalable infrastructure for training, fine-tuning, and serving open...Experience with Staff Engineer or Tech Lead roles in ML/AI organizations. Background in distributed systems and…
- Klaviyo (San Francisco, CA)
- …delivers value to the business. Your scope will include everything from integrating AI into infrastructure and automation frameworks to building internal AI ... technology transformations or systems architecture teams with a focus on AI, automation, or infrastructure modernization. Demonstrated success embedding…
- Hedra, Inc (San Francisco, CA)
- …computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training. Ensure that our infrastructure can handle the ... like Docker and Kubernetes required for deployments at scale. Understanding of distributed training techniques and how to scale models across multi-node…
- The Rundown AI, Inc. (San Francisco, CA)
- …with AI model integration needs. Design, implement, and maintain our AI infrastructure supporting our machine learning life cycle, including data ingestion ... Knowledge of Python, C/C++, CUDA, and experience profiling GPU performance and distributed training runs. Experience with machine learning frameworks like…
- Oracle (Redwood City, CA)
- …AI chips and systems and software to drive next‑gen Cloud AI Infrastructure platforms. You will contribute to platform definition, platform development ... technical leadership to the broader organization. You should have experience developing AI infrastructure and operating high‑scale services, and an understanding…
- Gauss Labs (Palo Alto, CA)
- …design scalable, efficient ML pipelines. Build and maintain reliable, performant infrastructure for data processing, model training, evaluation, and deployment ... Join to apply for the AI Engineer - Machine Learning (US) role at...for end-to-end model development. 3+ years building production-ready ML infrastructure, including data pipelines, training/inference workflows, and…
- salesforce.com, inc. (San Francisco, CA)
- …across Salesforce Enterprise. We integrate our IT network, public cloud infrastructure, and data centers, while increasingly leveraging AI/ML-driven insights ... Job Category: Software Engineering Job Details About Salesforce Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition…
- Salesforce (San Francisco, CA)
- …across Salesforce Enterprise. We integrate our IT network, public cloud infrastructure, and data centers, while increasingly leveraging AI/ML‑driven insights ... Enterprise Security Software Architect (AI/ML) Join to apply for the Enterprise Security...Enterprise Security team builds and operates highly scalable, fault‑tolerant, distributed systems to deliver cloud‑scale security services. Our key…
- Amazon (San Francisco, CA)
- …a Sr. Machine Learning Engineer to lead the development of next-generation ML training infrastructure. This high-impact role requires over 8 years of software ... development experience, expertise in distributed systems, and a deep understanding of AI frameworks. Join a dynamic team contributing to cutting-edge AI…
- Roche Holdings Inc. (Santa Clara, CA)
- …workloads using Kubernetes, including frameworks like Kubeflow or KubeRay to manage distributed training jobs. Familiarity with data versioning tools like DVC. ... to come. Join Roche, where every voice matters. Principal DevOps Engineer - ML/AI Algorithms. As Principal DevOps Engineer - ML/AI Algorithms, you will…