HPC Operations Engineer

Company: CoreWeave Europe

Job details

We are looking for people willing to work in two shifts from 7am to 9pm. This is fully remote within Spain. Successful candidates will be expected to attend onboarding training at our US Headquarters for up to 2 weeks within their first month of employment.
About the role:
The High Performance Computing Operations team is responsible for the day-to-day provisioning, management and uptime of CoreWeave's ever-expanding fleet of server nodes. Playing a central role in CoreWeave's growth strategy, this team is on the front line for configuration, updates and remote troubleshooting of our highest tier of supercomputing clusters and their networking, delivery platforms and tools dependencies. You will be in a daily battle with the forces of entropy to maximise the number of nodes CoreWeave can deliver to customers.
We are seeking curious, creative and persistent problem solvers to join our HPC Operations team to help us drive batches of server nodes through our provisioning and validation processes while efficiently and effectively troubleshooting node or cluster problems as they arise. This individual will join a team of committed engineers working to deploy nodes as fast as they can be racked and turned on.
Key Responsibilities:
- Install, configure, and maintain large-scale high-performance supercomputing clusters running state-of-the-art GPUs
- Troubleshoot hardware and software issues; escalate and coordinate as needed with data centre, network and platform teams to drive resolution
- Monitor and analyse system performance and take appropriate remediation actions for cloud health
- Approach your work with flexibility and optimism, anticipating shifting business and technical priorities
- Create and maintain documentation of team processes, knowledge and best practices for system management
- Think critically about your day-to-day work and work collaboratively to improve team processes and efficiency
Successful candidates typically share the following skills and experience:
- Experience troubleshooting or administering data centre or on-prem infrastructure (servers, storage, network or a mix)
- Strong understanding of Linux system administration and networking concepts
- Ability to troubleshoot hardware and software issues and perform system maintenance tasks consistently and reliably
Ideal candidates may also have experience in one or more of these:
- Software development or scripting languages (Bash, Python, PowerShell, etc.)
- Grafana, Prometheus, PromQL queries or similar observability platforms
- Data centre environments including server racks, HVAC systems, fiber trays
- Kubernetes administration
The salary for this position ranges from 34,000€ to 38,000€ plus competitive benefits. Pay is based on a number of factors including job-related knowledge, skills, and experience.



Source: Jobleads



