  • Principal AI and ML

    NVIDIA (Santa Clara, CA)
    We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our Hardware Infrastructure team. As an Engineer, you will ... closely with customers to pinpoint and address infrastructure deficiencies, facilitating groundbreaking AI and ML research on GPU Clusters. Together, we can…
    NVIDIA (08/27/25)
  • Principal Capacity Delivery Reliability…

    Amazon (Seattle, WA)
    …trends. Leading cross-functional strategic capacity modeling to support emerging workloads (such as AI / ML and HPC) and future region expansions will be a crucial ... ready to architect the digital backbone of tomorrow? We're seeking a Principal Supply Capacity Delivery Reliability Planner to lead end-to-end capacity and planning…
    Amazon (09/26/25)
  • Principal Staff Software Engineer,…

    LinkedIn (Mountain View, CA)
    …Agents work for Training infrastructure. As a Principal Staff Software Engineer on the AI Training Infra team, you will play a crucial role in leading and ... team needs to be together. As part of LinkedIn's AI Platform group, the AI Training team ... tensor libraries like PyTorch, TensorFlow, JAX/FLAX Suggested Skills + ML Algorithm Development + Machine Learning / Deep Learning…
    LinkedIn (09/25/25)
  • Principal Solutions Architect

    Amazon (San Francisco, CA)
    …technical domains, but the technical curiosity to become proficient in all: * ML / AI Infrastructure: * Architect training and inference systems at scale ... experience - Deep technical expertise in multiple domains including ML / AI systems, cloud-native architectures (Kubernetes, microservices, serverless), DevOps/SRE…
    Amazon (10/04/25)
  • Principal Software Engineer

    DataRobot (Boston, MA)
    … that makes sense for their business - today and in the future. As a Principal Software Engineer for Generative AI at DataRobot, you will be the technical anchor ... years of software engineering experience, including 3+ years in AI / ML systems (Generative AI preferred) ... and evaluation metrics. + Experience with MLOps tools and AI-specific infra (e.g., vector DBs, GPU optimization).…
    DataRobot (08/25/25)
  • Principal Applied Scientist, Delivery…

    Amazon (New York, NY)
    …scientists and engineers to pioneer the next frontier of logistics through advanced AI and foundation models. We are seeking an exceptional Principal Applied ... learning - Guide and support fellow engineers in building scalable and reusable infra to support model training, evaluation, and inference - Lead focused technical…
    Amazon (09/24/25)
  • Sr. Manager, Applied Science, Catalog AI

    Amazon (Seattle, WA)
    …fascinated by the power of Large Language Models (LLM) and applying Generative AI to solve complex challenges within one of Amazon's most significant businesses? ... from: developing tuning artifacts on top of foundational LLMs, training ML models, performing fact extraction, automatic detection of missing product information,…
    Amazon (07/18/25)
  • Sr. Machine Learning Engineer, Amazon General…

    Amazon (Bellevue, WA)
    Description Our Machine Learning training infrastructure (ML Infra) team is responsible for designing, implementing, and optimizing large-scale computing ... - Demonstrate significant innovation, creativity, and judgement when solving challenging AI / ML infrastructure problems. Identify future skills needed across your…
    Amazon (09/08/25)
  • VP, Engineering - Search & Discovery

    Realtor (Austin, TX)
    …organization across Search Engineering, Relevance/Ranking, Query Understanding, Personalization, and ML Platforms, partnering tightly with Product, Data, and Revenue ... culture: nDCG/ERR, CTR, lead quality, P50/P95 latency, experiment velocity, and infra cost per search. **Build & lead a world-class engineering organization**…
    Realtor (10/04/25)