- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more
- NVIDIA (Santa Clara, CA)
- …analyzing and tuning performance for a variety of AI/HPC workloads. Excellent problem-solving to analyze complex systems, identify bottlenecks, and ... and implement GPU compute clusters for deep learning and high-performance computing. What you'll be doing: + Provide leadership...storage systems like Lustre and GPFS for AI/HPC workload. Experience working with deep learning… more
- Cisco (San Jose, CA)
- …future of AI infrastructure - we'd love to meet you. **Impact** As **High-performance AI compute engineer**, you will be instrumental in defining and ... Principal Engineer - HPC, AI Infrastructure Apply (https://jobs.cisco.com/jobs/Login?projectId=1445895) + Location: San Jose, California, US + Area of… more
- Amazon (Austin, TX)
- …computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you enjoy tackling large analytical problems as ... - helping them envision and build the future of high-performance computing. Your technical solutions and insights will shape...solutions and insights will shape how partners transform their HPC approaches for the AI era. AWS… more
- NVIDIA (Santa Clara, CA)
- …fit for you, we'd love to hear from you! NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis ... In this exciting role, you will profile and analyze AI workloads on large GPUs and CPUs scale clusters...and platforms, such as HCAs, Switches, CPUs, GPUs, and Systems. You will develop performance analysis tools… more
- NVIDIA (Santa Clara, CA)
- …group at NVIDIA has openings for software architects in the field of AI and high-performance networking and system software. We research, develop, and ... and usable. + Creating proofs-of-concept to evaluate and motivate extensions in AI Frameworks (PyTorch/NEMO), HPC programming models (MPI, OpenSHMEM, PGAS), new… more
- Amazon (Santa Clara, CA)
- …computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you have a unique combination of deep technical ... C++, Python, CUDA, Bash - Deep GPU knowledge in HPC and/or AI/ML frameworks. Preferred Qualifications -...life sciences or related discipline. - Working knowledge of HPC schedulers and distributed/parallel file systems, underlying… more
- General Dynamics Information Technology (Vicksburg, MS)
- …expectations. Familiarity with commonly used HPC services (i.e., high performance file systems, modules for installing applications, compilers, MPI, OpenMP, ... career. At GDIT, people are our differentiator. As an HPC Computational Scientist supporting High Performance Computing...expert computational support for users of the supercomputing computing systems within the DoD. The team serves as a… more
- NVIDIA (Santa Clara, CA)
- …Be Doing: + Primary responsibilities will include building and enabling robust AI/HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale,...in working with customers + Expertise with parallel file systems (e.g., Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects… more
- NVIDIA (Santa Clara, CA)
- …analyzing and tuning performance for a variety of AI/HPC workloads. Excellent problem-solving to analyze complex systems, identify bottlenecks, and ... deploy, and operate GPU Compute Clusters for EDA and high-performance computing workloads used across multiple teams and projects.... systems such as Lustre and GPFS for AI/HPC workload. + Familiarity with metrics collection… more
- Texas A&M University System (College Station, TX)
- …patching, and performance tuning.* Oversee networking, security, and infrastructure for HPC systems.* Lead the development of specialized HPC computing ... research and super computing needs. As a Senior High Performance Computing Engineer (HPC), you will provide...expertise and consultation for the design and deployment of HPC systems. Get in on the ground… more
- NVIDIA (Santa Clara, CA)
- …management, and fabric scalability. + Experience working with benchmarking tools and performance analysis for large-scale HPC/AI networking deployments. + ... engine of modern Artificial Intelligence, Advanced Networking, and High Performance Computing (HPC) - the biggest technology...Published work, patents, or advanced certifications in networking or HPC systems. NVIDIA is widely considered to… more
- GliaCell Technologies (MD)
- …develops, tests, deploys, documents, maintains, and enhances complex and diverse software for HPC (high performance computing) systems based upon documented ... requirements. + The HPC systems might include, but are not limited to, processing-intensive analytics, novel algorithm development, manipulation of extremely… more
- NVIDIA (Santa Clara, CA)
- …vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems. + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next generation platforms as… more
- Amazon (Cupertino, CA)
- Description We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental ... operations that enable AI to scale across multiple accelerators & servers. Most...building networking solutions that for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. We… more
- NVIDIA (Santa Clara, CA)
- …Machine Learning ecosystems. You'll be called on to help architect and scale high-performance, distributed AI infrastructure on-prem or in the cloud built with ... with experience in validation and debugging of large-scale GPU clusters focused on performance. As part of the Solution Architecture organization, we work with the… more
- University of Maine System (Orono, ME)
- …research community and its collaborators by enabling and growing the use of high-performance computing in the research enterprise. ARCSIM is a component of the ... of the Graduate School. The position will have an active role in shaping what HPC resources are available, who can access those resources, how to get data to those… more
- NVIDIA (Santa Clara, CA)
- …long term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning… more
- NVIDIA (Santa Clara, CA)
- …improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialist to architect, develop ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...looking for an outstanding hands-on architect/engineer for a Senior HPC architect role to support deployment and bringup of… more