• Senior Software Engineer, Distributed

    NVIDIA (Austin, TX)
    …from the crowd: + Technical competency in managing and automating large-scale distributed systems independent of cloud providers. Advanced hands-on experience ... part of a DGX Cloud team responsible for production systems that enable large scalable GPU clusters to be ... Bright Cluster Manager) + Proven operational excellence in maintaining reliable and performant AI infrastructure. NVIDIA is…
    NVIDIA (07/02/25)
  • Senior Distributed Software Engineer,…

    NVIDIA (Santa Clara, CA)
    …achieve this goal, we are looking for an engineer with a deep understanding of distributed systems, outstanding design skills, and a track record in building and ... the broader NVIDIA team to design and build a reliable, scalable, and efficient storage-as-a-service tailored to AI ... years of industry experience + Strong background in developing distributed systems involving Golang, Kubernetes, and Cloud…
    NVIDIA (08/08/25)
  • Engineering Manager - Rack Scale AI

    NVIDIA (Santa Clara, CA)
    …Lead IPP's (Infrastructure, Planning and Process) Cloud Platform Team focused on Rack Scale AI Systems. IPP is a global organization within NVIDIA. This group ... of cloud design in the areas of virtualization and global infrastructure, distributed systems, load balancing and security + Excellent thought process…
    NVIDIA (07/29/25)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …leverage our large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. ... learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance optimizations, or…
    Meta (08/01/25)
  • REMOTE: Chief Architect, AI Engineering

    Lenovo (Morrisville, NC)
    systems, perception, model runtimes, evaluation, and infrastructure. * Proven track record of shipping AI-driven or distributed software systems at scale. * ... OTA rollouts and preload governance at global scale. * Expertise in designing complex distributed systems and cloud-native platforms, with broad knowledge across…
    Lenovo (08/29/25)
  • Senior Software Engineer, AI Resiliency

    NVIDIA (Santa Clara, CA)
    …and inference more reliable, scalable, and efficient. If you're passionate about AI, distributed systems, and high-performance computing, we want to hear ... driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable ... detection. + Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and…
    NVIDIA (07/22/25)
  • Software Manager, AI Infrastructure System

    NVIDIA (Santa Clara, CA)
    …to encouraging an inclusive and diverse workplace. + Hands-on experience developing large-scale distributed systems Ways to stand out from the crowd: + Strong ... orgs to build products that use LLMs and agent systems to serve the needs of NVIDIA engineering teams. ... the product/team. + Develop and execute strategies for scalable, reliable, and secure AI infrastructure supporting both…
    NVIDIA (07/01/25)
  • (USA) Principal, Software Engineer - AI

    Walmart (Sunnyvale, CA)
    …build dynamic, context-aware systems. 2. **Architecture & Scalability:** + Architect scalable, distributed AI systems with a focus on performance, fault ... to lead the design, development, and deployment of advanced AI systems. This role involves architecting scalable ... Walmart GTP, you will be building highly scalable and reliable APIs, services and applications which will drive the…
    Walmart (08/28/25)
  • Senior Software Architect - Generative AI

    GE Vernova (Niskayuna, NY)
    …neural network architectures (e.g., CNNs, RNNs, Transformers). + Expertise in designing scalable, distributed architectures for AI systems. + Strong experience ... Azure, GCP) and containerization (Kubernetes, Docker). + Familiarity with large-scale distributed systems and database technologies. + Experience in creating…
    GE Vernova (09/11/25)
  • Principal, Software Engineer - Gen AI

    Walmart (Sunnyvale, CA)
    …build dynamic, context-aware systems. 2. **Architecture & Scalability:** + Architect scalable, distributed AI systems with a focus on performance, fault ... to lead the design, development, and deployment of advanced AI systems. This role involves architecting scalable ... Backend team, you will be building highly scalable and reliable APIs, services and applications which will drive the…
    Walmart (06/21/25)
  • Associate Director - Gen AI Solutions…

    S&P Global (New York, NY)
    …and executive-level decisions. + **System Development:** Build and scale production-ready AI systems that operate reliably at enterprise levels, ensuring ... + **AI Infrastructure Architecture:** Oversee the design of scalable and reliable infrastructure for AI, ML, and model training and deployment. +…
    S&P Global (09/03/25)
  • Sr Machine Learning Engineer - GenAI, LLM, Agentic…

    eightfold.ai (Santa Clara, CA)
    …with opportunities. Responsibilities: + Research, design, development, and deployment of advanced AI agents and agentic systems. + Architect and implement ... About Eightfold.ai: Eightfold.ai is revolutionizing HR technology ... is at the forefront of developing intelligent, autonomous systems that will redefine talent management. We are building…
    eightfold.ai (08/08/25)
  • Senior Engineering Manager - Enterprise AI

    NVIDIA (Santa Clara, CA)
    …years overall engineering experience in building high quality secure products and platforms with distributed systems, AI applications, ML models + 5+ overall ... Join the Enterprise AI & Automation team in NVIDIA IT to ... estimation, performance analysis, testing, and delivering world-class scalable and reliable software. + Collaborate with other stakeholders from employee…
    NVIDIA (09/03/25)
  • Principal Software Engineer - Azure AI

    Microsoft Corporation (Redmond, WA)
    …equivalent experience. + 4+ years' practical experience working on high scale, reliable online systems **Other Requirements:** Ability to meet Microsoft, ... Microsoft Azure AI Inference platform is the next generation cloud ... drive the design, optimization, and scaling of our inference systems. In this role, you'll lead engineering efforts to…
    Microsoft Corporation (09/04/25)
  • Director, AI and Machine Learning

    PennyMac (Westlake Village, CA)
    …product managers and technology teams to design and build secure, large-scale distributed event-based systems. + Build and maintain effective relationships with ... journey. A Typical Day We're looking for a highly experienced and motivated AI/ML Team Lead to spearhead the design, development, and deployment of cutting-edge…
    PennyMac (08/07/25)
  • Senior Staff AI Architect | Production…

    ServiceNow, Inc. (Addison, TX)
    …interact with our applications. The ideal candidate has deep expertise in AI/ML systems design, personalization, and experience working within Digital Technology ... **Shape the Future of Intelligent Experiences** + Design and implement scalable AI/ML systems that enhance personalization, intent recognition, semantic search,…
    ServiceNow, Inc. (08/28/25)
  • Staff AI Engineer

    Proofpoint (Draper, UT)
    …skills (Python preferred), with an emphasis on building maintainable, secure, and reliable systems. + Hands-on experience with Linux containers and orchestration ... and mentor others. **Nice to Have** + Familiarity with distributed training for AI/ML workloads. + Experience with data privacy and regulatory frameworks…
    Proofpoint (09/09/25)
  • Senior, Data Scientist - Agentic AI

    Walmart (San Bruno, CA)
    …roadmap that aligns with business objectives. + Experience building high-traffic distributed systems and ML infrastructure, ensuring scalability and resilience. ... skilled and motivated Senior Data Scientist to lead the development of our agentic AI platforms. This role is crucial for our E-commerce Analytics team, where you'll…
    Walmart (08/30/25)
  • Senior Delivery Consultant AI /ML, AWS…

    Amazon (Dallas, TX)
    …software solutions around large, complex Machine Learning (ML) and Artificial Intelligence (AI) systems? Want to help the largest global enterprises derive ... business value through the adoption and automation of Generative AI (GenAI)? Excited by using massive amounts of disparate data to develop AI/ML models? Eager to…
    Amazon (07/22/25)
  • Senior Principal Software Engineer, Generative…

    REI (Seattle, WA)
    AI solutions. + **Reliability:** Participate in 24/7 incident response for AI systems, ensuring reliability and performance. The "Principal" level is ... ethics, bias mitigation, fairness, and interpretability, with a commitment to building AI systems responsibly. + **Problem-Solving:** The ability to navigate…
    REI (08/13/25)