Remotery

Network Reliability Engineer

Posted 6 days ago

This is a fully remote position, open to applicants in Poland.

📋 Description

• Develop a comprehensive AI infrastructure with capabilities for monitoring, diagnosing, and resolving production incidents.

• Collaborate with other engineering teams to troubleshoot significant production challenges.

• Engage in an on-call rotation to manage incidents and guarantee service continuity.

• Design and uphold observability solutions to track the health of AI infrastructure and applications.

• Assist in the lifecycle management of AI infrastructure across various environments and countries.

• Advocate for and implement best practices concerning stability, resiliency, scalability, and security.

• Maintain clear technical documentation for tools and procedures.

• Contribute to the evolution of systems and tools informed by production feedback.

• Work closely with development teams to ensure the infrastructure is ready for deployment.

• Take part in team rituals and initiatives for knowledge-sharing.


⛳️ Requirements

• Proficient in Go or Python.

• Strong scripting abilities, particularly in Bash and Python.

• Practical experience with Linux operating systems (Ubuntu/Debian).

• Experience with GPU and HPC infrastructure is preferred.

• Understanding of networking concepts (VLAN/LAN, TCP/IP, DNS, BGP, load-balancing, IPv6, etc.).

• Familiarity with monitoring and logging tools such as Prometheus, Grafana, and Elastic.

• Comfortable working with Infrastructure-as-Code tools (Ansible, Salt, AWX, etc.).

• Experience in managing relational databases (MariaDB).

• Knowledge of CI/CD pipeline processes (GitLab).

• Proficient in English, both written and spoken.


🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Flexible working hours and remote work opportunities.

• Comprehensive health, dental, and vision insurance.

• Opportunities for professional development and training.

• Collaborative and inclusive work environment.
