This is a fully remote position, open to applicants in Chile.

📋 Description

• Proactive Monitoring: Continuous surveillance of dashboards and alerts (physical and virtual infrastructure, and services) to ensure 99.999% availability.

• Incident Management (Triage): Reception, categorization, and prioritization of alerts.

• Ticket Management: Rigorous opening and tracking of tickets in line with ITIL practices.

• Initial Technical Resolution: Diagnosis and resolution of low- to medium-complexity issues (e.g., service restarts, log cleanup, quota adjustments, basic connectivity checks).

• Structured Escalation: When an issue exceeds the scope of initial resolution, escalate it to L1/L2 with a comprehensive technical report (logs, network traces, reproduction steps, and client context).

• Case Documentation: Keep the event log and knowledge base (KB) up to date on recurring incidents.

• External Communication: Notify clients about health statuses, maintenance windows, and ongoing incidents clearly and promptly.

• Health Checks: Conduct periodic health validation routines on production platforms.

• Ensure SLA compliance for incident handling and for network and service availability.

• Generation and analysis of platform availability reports.

⛳️ Requirements

• 1 to 2 years of experience in monitoring centers (NOC), first-level technical support, or systems administration.

• Experience in ticket management and support processes (Jira, ServiceNow, or others), including clear documentation of diagnosis, evidence, and communication.

• Proficiency in monitoring/observability tools such as Prometheus, Grafana, Elasticsearch, OpenSearch, and OpenNMS.

• Ability to read and interpret metrics, events, logs, and alarms.

• Experience with production-critical systems, including incident management, coordination of production actions, escalation, and effective communication.

• Degree in Computer Engineering, Systems Engineering, Electronic Engineering, or a related field.

• Experience with Linux in production environments: troubleshooting services and the operating system (systemd, journalctl), permissions/users, processes, filesystem, and networks.

• Networking in Linux: configuration and diagnosis of interfaces, VLANs, routes, bonding, and MTU; troubleshooting with tools like tcpdump (sniffing), ip, ss, ethtool, ping/traceroute.

• Kubernetes: operation/administration and troubleshooting in production (Pods, Deployments/DaemonSets, Services, events/logs, readiness/liveness probes; basic knowledge of PV/PVC storage).

• Virtualization: experience operating and supporting virtualized environments (KVM/VMware/Hyper-V or others), including diagnosis of common computing, network, and storage failures.

• Automation: ability to automate repetitive tasks using Bash, plus Ansible and/or Python (information gathering, operational checks, basic remediation, and scripts that are safe to run in production).

• Intermediate English skills for reading/writing technical documentation, updating stakeholders, and interacting with vendors/manufacturers for support cases.

🏝️ Benefits

• Private medical insurance for you and your family.

• Language courses to ensure your growth knows no boundaries.

• Access to courses, books, materials, and reimbursement for certifications.

• A minimum of 15 vacation days, one day off for your birthday, and extra breaks before National Holidays, Christmas, and New Year.

• Performance bonuses and project success incentives.

• Budget for recreational activities and team-building.

• Cutting-edge technology: We renew your equipment every 3 years... and it’s yours at the end of the period!

Cloud NOC Engineer

People also viewed

NOC Engineer, Junior NOC Engineer

Senior Network Operations Technical Lead

NOC Engineer

NOC Engineer

Cloud NOC Engineer

Network Ops & Observability Architect