
Senior Platform Engineer
Posted 11 hours ago

This is a fully remote position, open to applicants in Colorado.
• Design, develop, and operate automated, resilient, highly available, self-healing, and secure platforms with native AI capabilities that meet IT requirements for both internal and customer business functions.
• Create and manage the Observability OpenTelemetry Central Backend Stack, including Grafana Enterprise, Mimir, Loki, Tempo, and Alertmanager, on Kubernetes/RKE2 using Helm and GitLab CI/CD.
• Build and maintain Infrastructure as Code (IaC) and CI/CD processes for automated provisioning and deployment, including Terraform modules for infrastructure, VM, and storage provisioning; Ansible AWX playbooks for OS/application bootstrap; and ArgoCD and Helm for Kubernetes configuration.
• Develop and maintain an OpenTelemetry Prometheus scrape profile library, which includes SNMP exporters, REST API exporters, and cloud provider exporters (CloudWatch, Azure Monitor, GCP) for various device classes.
• Establish AIOps capabilities on the platform for observability use cases, such as anomaly detection integrations, event correlation rules in Alertmanager, and synthetic monitoring patterns that reduce alert noise.
• Configure and sustain Zabbix auto-discovery, encompassing network range scanning, device classification, and Prometheus service discovery integration.
• Build and secure Edge Stack deployments (Prometheus + OTel Collector) for each data center site using GitOps templates.
• Integrate Alertmanager with ServiceNow, focusing on webhook routing, ticket enrichment, auto-close logic, and escalation policy configuration.
• Ensure platform security through Conjur/CyberArk secret injection at runtime, mTLS between stack components, and RBAC in Grafana Enterprise.
• Author and manage Grafana dashboards as JSON in GitLab, covering facility overview, network health, RED metrics, and application telemetry.
• Mentor mid-level engineers, lead code reviews, and establish engineering standards within the team.
• Represent platform engineering during cross-functional architecture reviews and executive-level program updates.
• Perform additional duties as necessary and assigned.
• 5+ years of experience in DevOps/Automation within a production environment.
• Proficiency in Kubernetes (RKE2/k3s), Helm chart deployment, system services, and Docker/container technology.
• 4+ years of experience in LGTM stack development and configuration, including configuration, tuning, dashboarding, and production operations of Grafana, Mimir, Loki, and Tempo; Prometheus experience is required.
• 5+ years of senior-level experience in Python/scripting frameworks, including automation scripts, exporter development, GitLab pipeline scripting, and REST API integrations.
• 5+ years of expertise in GitOps/CI/CD, including GitLab CI/CD pipeline authoring; Terraform and Ansible as primary IaC tools; ArgoCD or Flux is preferred.
• 2+ years of experience in AIOps/Observability Engineering, focusing on Alertmanager rule authoring, anomaly detection integrations, event correlation, and noise reduction techniques.
• 5+ years of working experience with infrastructure (Linux/VM) management, including Linux administration, VMware vCenter/VCF, NetApp storage management, and network fundamentals (SNMP, TCP/IP).
• 2+ years of experience in Secrets Management with CyberArk/Conjur, HashiCorp Vault, or equivalent, focusing on runtime secret injection patterns.
• Minimal travel may be required.
• Medical, Telehealth, Dental, and Vision coverage.
• 401(k) plan.
• Health Savings Accounts (HSA) and Flexible Spending Accounts (FSA).
• Life insurance and AD&D.
• Short-Term and Long-Term disability coverage.
• Flexible Paid Time Off (PTO).
• Leave of Absence options.
• Employee Assistance Program.
• Wellness Program.
• Rewards and Recognition Program.