
Observability Architect
Posted 7 hours ago

This is a fully remote position, open to applicants in Canada.
Responsibilities
• Define and own the enterprise-level observability architecture, establishing technical standards, reference architectures, and long-term roadmaps.
• Assess, select, and standardize observability tools (e.g., Grafana, Prometheus, VictoriaMetrics, Tempo, Loki, Elastic Stack, OpenTelemetry) to minimize tool sprawl and improve overall cost efficiency.
• Architect scalable data pipelines and storage solutions capable of ingesting and querying petabyte-scale telemetry data across metrics, traces, logs, and continuous profiling.
• Create Terraform modules and Helm charts for the declarative provisioning of observability infrastructure in multi-cloud environments.
• Set and uphold instrumentation standards based on OpenTelemetry, including SDK guidelines, collector deployment patterns, and semantic conventions.
• Define and advocate for SLO/SLI/error-budget frameworks across engineering teams, offering architectural guidance on the implementation of service-level objectives.
• Act as a senior escalation resource during critical incidents, applying deep observability expertise to speed up diagnosis and resolution.
• Provide architectural mentorship and technical guidance to Observability Engineers and SRE team members.
Qualifications
• 5-8 years of experience in Observability Architecture, Site Reliability Engineering (SRE), or Platform/Infrastructure Engineering.
• Post-secondary Diploma/Degree in Engineering, Computer Science, or a related discipline.
• Mastery of the OpenTelemetry ecosystem and extensive knowledge of Prometheus-compatible metrics systems (VictoriaMetrics, Thanos, etc.).
• Advanced proficiency with tracing systems (Grafana Tempo, Jaeger) and log aggregation platforms (Loki, Elasticsearch, Google BigQuery).
• Expert-level understanding of cloud infrastructure (GCP strongly preferred) and Kubernetes architecture.
• Strong software engineering capabilities in Go, Python, or similar languages for developing cloud-native tools.
• Exceptional communication skills with the ability to influence technical direction across organizational boundaries.
• Preferred certifications: Google Cloud Professional Cloud Architect or Certified Kubernetes Administrator (CKA).
Benefits
• Flexible working arrangements
• Home office reimbursement program
• Baby bonus & parental leave top-up program
• Online learning and networking opportunities
• Electric vehicle purchase incentive program
• Competitive medical and dental benefits
• Retirement savings program