- Meta (Menlo Park, CA)
- …our network engineering teams is for you! **Required Skills:** Network Engineer, HPC Systems Network Strategy Responsibilities: 1. Design, ... you will be responsible for conceiving, developing, and deploying software, hardware and network systems and tools that improve reliability and efficiency in our… more
- Randstad US (Mountain View, CA)
- staff hpc engineer. + mountain view, ... repetitive tasks or tools to enhance support of the HPC systems System performance analysis and tuning ... details job summary: Randstad Federal is seeking a Staff HPC Engineer for a role supporting NASA ... A minimum of 5 years experience of administration of HPC systems and scheduling software (PBS, Slurm,… more
- NVIDIA (Santa Clara, CA)
- …UCX for Deep Learning and HPC. We are looking for a motivated Performance engineer to influence the roadmap of our communication libraries. The DL and HPC ... are even higher at huge scales! This is an outstanding opportunity for someone with HPC and performance background to advance the state of the art in this space. Are… more
- Amazon (Cupertino, CA)
- …to improve network speed and performance so that customers can scale their network intensive ML and HPC workloads over thousands of CPUs, GPUs and TPUs ... blocks for EC2 infrastructure. We specialize in designing software, systems and chips that optimize the AWS customer experience. ... experience. More and more customers run their ML and HPC workloads on AWS to reap the benefits of… more
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Lead ... deal with on a daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like GPUs together. In addition, we… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …impact on the world of technology. Cadence is looking for a Sr Staff Systems Engineer who accelerates strategic customer deployments and ensures on-time bring-up ... and deployment of HPC infrastructure and troubleshooting ... tasks such as shell scripting. Role: IT -Sr Staff Systems Engineer Location on-site (not remote): San… more
- Meta (Menlo Park, CA)
- **Summary:** Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are the ... and life cycle of servers in production. **Required Skills:** Production Systems Engineer, Sustaining Responsibilities: 1. Develop robust, industry leading… more
- NVIDIA (Santa Clara, CA)
- …InfiniBand with IBOP and RDMA as well as understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads + Familiarity with deep ... join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the ... usage of all datacenter resources including compute, storage, network and power. You will help build methodologies, tools… more
- Amazon (Cupertino, CA)
- …SRE (Site Reliability Engineering), or Resilience Engineering - 7+ years of SysDE (Systems Development Engineer) or equivalent experience - 7+ years of server ... that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release ... have tremendous interest in cloud scale and curious how systems and software decisions impact the user. You insist… more
- NVIDIA (Santa Clara, CA)
- …wave of artificial intelligence. We are looking for a highly motivated senior software engineer for an exciting role in our communication libraries and network ... crew that develops and maintains software for complex heterogeneous computing systems that power disruptive products in High Performance Computing and Deep… more
- Meta (Menlo Park, CA)
- **Summary:** In this role, you will be a member of the Network.AI Software team and part of the bigger DC networking organization. The team develops and owns the ... Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on… more
- Meta (Menlo Park, CA)
- …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software… more
- quadric.io, Inc (Burlingame, CA)
- …battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the ... co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of ... C++ DSP and control code. Role: The Field Application Engineer (FAE) will work closely with Business Development, Product,… more
- Microsoft Corporation (Mountain View, CA)
- …or related field AND 4+ years technical experience in software engineering, network engineering, service engineering, or systems engineering o OR equivalent ... and sustainability related to Microsoft cloud hardware. We are looking for an engineer who is customer-focused with the insight and industry knowledge to envision… more
- NVIDIA (Santa Clara, CA)
- …including distributed storage, scheduling, and orchestration among compute, storage and network + Configuring and troubleshooting hardware, operating systems, ... We are seeking a Sr System Software Engineer to help us build out our scientific ... post processing of data. + Optimize compute, storage and network architecture specific to physics & simulation driven applications.… more
- NVIDIA (Santa Clara, CA)
- We are looking for a software engineer for our Sparse Linear Algebra team which develops key technologies and libraries such as cuSOLVER, cuSPARSE, cuDSS, and AmgX, ... come and join our team! What you will be doing: + developing scalable HPC math library software for various numerical methods including but not limited to sparse… more
- NVIDIA (Santa Clara, CA)
- …+ Adapt and enhance communication libraries to seamlessly support innovative network topologies and system architectures. + Design or adapt optimized storage ... environments. + Strong background in parallel programming and distributed systems + Experience analyzing and optimizing large scale distributed applications.… more
- Lenovo (San Jose, CA)
- …HyperV, Windows Server, Nutanix, Microsoft Azure, Systems management products, and x86 systems + Knowledge of OpenStack, SAP, HPC, SQL, Azure, NetApp storage ... fuel the advancement of 'New IT' technologies (client, edge, cloud, network, and intelligence) including server, storage, mobile, software, solutions, and services.… more