• Software Engineer - AI/ML, AWS Neuron…

    Amazon (Cupertino, CA)
    …- Preferred previous software engineer expertise with PyTorch/JAX/TensorFlow, Distributed libraries and Frameworks, End-to-end Model Training. The group ... Web Services (AWS) is looking for a Software Development Engineer II to build, deliver, and maintain complex products...stable diffusion, Vision Transformers and many more. The ML Distributed Training team works side by side… more
    Amazon (01/12/26)
  • Sr. Software Engineer - AI/ML, AWS Neuron…

    Amazon (Cupertino, CA)
    …as well as Stable Diffusion, Vision Transformers (ViT) and many more. The ML Distributed Training team works side by side with chip architects, compiler ... accelerators. This role is for a Senior Machine Learning Engineer in the Distributed Training team for...engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances. Experience… more
    Amazon (12/19/25)
  • Senior Software Engineer, Google…

    Google (Sunnyvale, CA)
    Senior Software Engineer, Google Distributed Cloud, Kubernetes | Google | Sunnyvale, CA, USA | **Mid** Experience driving progress, solving ... year of experience with software design and architecture for distributed systems. **Preferred qualifications:** + Master's degree or PhD...on and is growing every day. As a software engineer, you will work on a specific project critical… more
    Google (12/20/25)
  • Software Engineer, Atlas…

    Rubrik (Palo Alto, CA)
    …heart of this transformation is our Atlas platform. We are looking for an experienced distributed systems engineer to guide us through the next stage of the ... the edge, or in the cloud. It is a distributed, scale-out, fault tolerant, performant, deduplicated user-space filesystem that...evolution of our data platform. As an engineer in the team, you'll design, develop and deliver… more
    Rubrik (12/17/25)
  • Software Engineer III, Infrastructure,…

    Google (Sunnyvale, CA)
    Software Engineer III, Infrastructure, Google Distributed Cloud | Google | Sunnyvale, CA, USA | **Mid** Experience driving progress, solving ... + 2 years of experience with developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies,...on and is growing every day. As a software engineer, you will work on a specific project critical… more
    Google (01/13/26)
  • Site Reliability Engineer, Google…

    Google (Sunnyvale, CA)
    Site Reliability Engineer, Google Distributed Cloud, Connected SRE | Google | Sunnyvale, CA, USA | **Advanced** Experience owning outcomes and ... projects. + 3 years of experience designing, analyzing, and troubleshooting distributed systems. **Preferred qualifications:** + Master's degree in Computer Science… more
    Google (12/18/25)
  • Staff Systems Development Engineer, Google…

    Google (Sunnyvale, CA)
    Staff Systems Development Engineer, Google Distributed Cloud | Google | New York, NY, USA; Seattle, WA, USA; +2 more; +1 more | **Advanced** ... experience working with vendors or customers. + Experience as a Customer Solution Engineer. + Experience with physical servers, storage, and network devices, as well… more
    Google (01/13/26)
  • Senior Software Engineer, Google…

    Google (Sunnyvale, CA)
    Senior Software Engineer, Google Distributed Cloud Hosted | Google | Sunnyvale, CA, USA | **Mid** Experience driving progress, solving ... and architecture. + 3 years of experience developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage or… more
    Google (01/07/26)
  • Distinguished Software Engineer (Data…

    Palo Alto Networks (Santa Clara, CA)
    …Summary** At Palo Alto Networks, we are redefining cybersecurity. As a Distinguished Engineer on the Enterprise DLP team, you will be the foremost technical leader ... all network, cloud, and user vectors. **Key Responsibilities** As a Distinguished Engineer, you will own the long-term technical direction and execution for all… more
    Palo Alto Networks (12/23/25)
  • Software Engineer - Distributed

    Rubrik (Palo Alto, CA)
    …or Networking domain + Strong fundamentals in data structures, algorithms, and distributed systems design + Strong background in Systems Programming + Expertise in ... Proficient in Python, Go, and either C++, Java, or Scala + Large distributed systems design and development experience is preferred + Knowledge of Storage,… more
    Rubrik (12/30/25)
  • Principal Staff Software Engineer, AI…

    LinkedIn (Mountain View, CA)
    …problems. + Designing, implementing, and optimizing the performance of large-scale distributed training for personalized recommendation as well as large ... LLMs, GNNs, Incremental Learning, Online Learning, and advanced LLM Agents work for Training infrastructure. As a Principal Staff Software Engineer on the AI… more
    LinkedIn (12/25/25)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    …product roadmaps. What you will be doing: + Design and maintain large-scale distributed training systems to support multi-modal foundation models for robotics. + ... NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for...and AI infrastructure; + Proven experience designing and optimizing distributed training systems with frameworks like PyTorch,… more
    NVIDIA (12/05/25)
  • Senior Software Engineer, AI Platform

    LinkedIn (Mountain View, CA)
    …and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training over 100s of billions of parameter models, debug and ... Online Learning and Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the AI Training… more
    LinkedIn (12/05/25)
  • Software Engineer, AI Platform

    LinkedIn (Mountain View, CA)
    …and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training over 100s of billions of parameter models, debug and ... Online Learning and Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the AI Training… more
    LinkedIn (10/21/25)
  • Apprentice Engineer - Backend

    LinkedIn (Mountain View, CA)
    …fundamentally believe top talent can come from anywhere, regardless of educational training or professional experience. REACH Program REACH is a multi-year program ... set and gain the experience needed to become an Engineer at LinkedIn. The time each apprentice spends in...dramatic growth in membership and products. You will utilize distributed systems and algorithms, develop applications at scale, learn… more
    LinkedIn (01/13/26)
  • Sr. Staff Software Engineer, AI Infra

    LinkedIn (Mountain View, CA)
    …and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training over 100s of billions of parameter models, debug and ... Online Learning and Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the AI Training… more
    LinkedIn (12/27/25)
  • Software Engineer, Threat Infrastructure…

    Google (Sunnyvale, CA)
    Software Engineer, Threat Infrastructure and Detection, AI Security | Google | Sunnyvale, CA, USA | **Mid** Experience driving progress, solving ... + 2 years of experience with developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies,...or more of these programming languages as a back-end engineer: Java, Go. + Experience in one or more… more
    Google (01/09/26)
  • Software Engineer, SystemML - Scaling…

    Meta (Menlo Park, CA)
    …NCCL has been integrated into PyTorch and is on the critical path of multi-GPU distributed training. In other words, nearly every distributed GPU-based ML ... full-stack distributed ML reliability and performance (e.g., Large-Scale GenAI/LLM training) from the trainer down to the inter-GPU and network communication layer.… more
    Meta (12/20/25)
  • Senior Software Engineer, Machine…

    Google (Sunnyvale, CA)
    …Enhance model and system performance for both low-latency inference and large-scale distributed training workloads. + Develop post-training algorithms, such ... speed and reduce memory consumption on modern GPU and TPU architectures. + Engineer custom kernels to maximize training efficiency for memory-bound large models… more
    Google (12/27/25)
  • Staff Machine Learning Engineer, AI…

    General Motors (Sunnyvale, CA)
    …model training performance analysis and optimization solutions to scale distributed training workflows and maximize resource utilization across heterogeneous ... experience + 3+ years specialized experience in AI/ML infrastructure, e.g., enabling distributed training for scaling large ML models + Strong programming… more
    General Motors (10/22/25)