• Protogon Research (Redwood City, CA)
    …other real-time decision environments. Background designing large-scale AI platforms, distributed training infrastructure, or research frameworks that ... AI Team Build and lead a world-class AI team across research, engineering, and infrastructure. ... Data Pipelines & Model Quality Improvements Own the end-to-end AI platform, including data pipelines, training systems,… more
    job goal (01/14/26)
  • Advanced Micro Devices (San Jose, CA)
    …is a strategic technical leader with a strong foundation in distributed training and AI infrastructure, coupled with experience building or guiding ... you will define and execute the technical vision for distributed training of large-scale generative AI ... distributed training, LLMs, recommendation systems, and AI infrastructure - and translate them into… more
    job goal (01/14/26)
  • HeyGen (San Francisco, CA)
    distributed training libraries) into production‑ready, scalable training and inference pipelines. Infrastructure Management: Champion the adoption ... the foundational compute infrastructure that powers our state‑of‑the‑art AI models-from multimodal training data pipelines to high‑throughput, low‑latency… more
    job goal (01/14/26)
  • Oracle (Redwood City, CA)
    Senior Principal Software Engineer - AI Infrastructure Innovation Join to apply for the Senior Principal Software Engineer - AI Infrastructure Innovation ... AI chips and systems and software to drive next‑gen Cloud AI Infrastructure platforms. Evaluation of system architecture and proposed implementation path… more
    job goal (01/14/26)
  • Hamilton Barnes Associates Limited (San Francisco, CA)
    …with high‑performance computing (HPC) or AI/ML training infrastructure at scale. Background in reliability engineering, distributed systems, or ... SSH access. This is a rare opportunity to work at the intersection of hyperscale infrastructure and AI, shaping the operational backbone of one of the largest… more
    job goal (01/14/26)
  • Oracle (Seattle, WA)
    Sr Principal Software Engineer, Networking - AI Infrastructure Innovation OCI (Oracle Cloud) AI Infrastructure Innovation team is pioneering the creation ... backend fabrics-that enables customers to achieve high performance for AI training and inference. You will define ... access. If you thrive at the intersection of large‑scale distributed systems, high‑speed networking, and AI workloads,… more
    job goal (01/14/26)
  • Alignerr (San Francisco, CA)
    …quality, or evaluation systems Familiarity with AI/ML workflows, model training, or benchmarking pipelines Experience with distributed systems or developer ... Principal Systems Engineer (C++) - AI Infrastructure Alignerr connects top technical ... with data, research, and engineering teams to support model training and evaluation workflows Identify bottlenecks and edge cases… more
    job goal (01/14/26)
  • Labelbox (San Francisco, CA)
    …quality, or evaluation systems Familiarity with AI/ML workflows, model training, or benchmarking pipelines Experience with distributed systems or developer ... work on real production systems and high-impact research workflows across data, tooling, and infrastructure. Position Senior C++ Full-Stack Engineer - AI Data & … more
    job goal (01/14/26)
  • LinkedIn (Mountain View, CA)
    … Infra team, you will play a crucial role in leading and building the next‑gen training infrastructure to power AI use cases. You will design and implement ... as the tech‑lead for several concurrent key initiatives for the Training Infrastructure and defining the future of AI training platforms. Basic… more
    job goal (01/13/26)
  • Labelbox (San Francisco, CA)
    …technical expertise with a passion for innovation, working at the intersection of AI infrastructure, data systems, and user experience. We believe in pushing ... AI At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at ... Labelbox customers to efficiently manage and stream data for training next-generation AI models. Owning critical components… more
    job goal (01/14/26)
  • Abridge (San Francisco, CA)
    …and training Develop, optimize, and maintain ML model serving and training infrastructure, ensuring high‑performance and low‑latency. Collaborate with ML and ... and so on. Expertise with ML toolchains such as PyTorch, TensorFlow or distributed training and inference libraries. Familiarity with GPU cluster management and… more
    job goal (01/14/26)
  • Menlo Ventures (San Francisco, CA)
    …medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to ... a high-agency, high-visibility team operating at the frontier of AI infrastructure - with deep ties to ... state a few. Improve reliability, latency, and efficiency of distributed AI workloads Collaborate with platform, infra,… more
    job goal (01/14/26)
  • GEICO (Palo Alto, CA)
    …The GEICO Pledge: Great Company, Great Culture, Great Rewards and Great Careers. GEICO AI ML Infrastructure team is seeking an exceptional Senior ML Platform ... Platform & Infrastructure * Design and implement scalable infrastructure for training, fine-tuning, and serving open ... Experience with Staff Engineer or Tech Lead roles in ML/AI organizations * Background in distributed systems and… more
    job goal (01/14/26)
  • Sentara Health (Virginia Beach, VA)
    …We are seeking a highly skilled and experienced Data Science ML Operations and Gen AI Engineer (or Senior) to join us and help advance our current and future work ... address those gaps. You will also collaborate with the AI team's ML Scientists and our partner data engineering ... frameworks, and guardrails in production. - Strength in APIs, distributed systems, and ML Ops (K8s, CI/CD, monitoring). -… more
    JobLookup XML (01/13/26)
  • P-1 AI (San Francisco, CA)
    …debugging NCCL deadlocks to optimizing FSDP configs, you'll be the go‑to person for training infrastructure and performance. What You'll Do Own the training ... About P-1 AI We are building an engineering AGI. We ... training workflows Configure, launch, monitor, and debug multi‑node distributed training jobs using FSDP, DeepSpeed, or… more
    job goal (01/14/26)
  • Crusoe Energy Systems LLC (San Francisco, CA)
    …in Golang or Python for large-scale, production-level services. (Preferred) Familiarity with AI infrastructure, including training, inference, and ETL ... -first Cloud infrastructure company. We're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to… more
    job goal (01/14/26)
  • GMI Cloud (Mountain View, CA)
    …/ML infrastructure, or data center engineering. Strong understanding of distributed training & inference architectures, Kubernetes, Slurm, or other ... We are seeking a Senior Solution Architect to design GPU‑cloud and AI infrastructure solutions, lead PoCs and benchmarks, guide customers through deployment, and… more
    job goal (01/14/26)
  • Klaviyo (San Francisco, CA)
    …delivers value to the business. Your scope will include everything from integrating AI into infrastructure and automation frameworks to building internal AI ... technology transformations or systems architecture teams with a focus on AI, automation, or infrastructure modernization. Demonstrated success embedding … more
    job goal (01/14/26)
  • The Rundown AI, Inc. (San Francisco, CA)
    …with AI model integration needs Design, implement, and maintain our AI infrastructure supporting our machine learning life cycle, including data ingestion ... Knowledge of Python, C/C++, CUDA, and experience profiling GPU performance and distributed training runs Experience with machine learning frameworks like… more
    job goal (01/14/26)
  • Oracle (Redwood City, CA)
    AI chips and systems and software to drive next‑gen Cloud AI Infrastructure platforms. You will contribute to platform definition, platform development ... technical leadership to the broader organization. You should have experience developing AI infrastructure and operating high‑scale services, and an understanding… more
    job goal (01/14/26)