
Senior/Staff/Principal Software Engineer - Observability Engineering
Posted 1 day ago

This is a fully remote position, open to applicants in New York.
• Take ownership of the complete design and execution of the AppGate observability fabric, which includes telemetry SDKs in our clients and gateways, the LogForwarder pipeline, and customer-side integrations.
• Make essential technical choices regarding transport protocols, sampling strategies, schema design, and correlation models that influence the scalability of our platform to handle hundreds of millions of events daily.
• Facilitate the development of advanced capabilities such as OpenTelemetry-Native Telemetry Fabric, High-Cardinality Data Pipeline, End-to-End Distributed Tracing, On-Demand Packet Capture, and more.
• Establish the telemetry schema, correlation model, transport, and sampling strategies that encompass client devices, controllers, and gateways.
• Validate at Customer Scale: Conduct tests in lab settings that replicate our largest deployments and proactively identify cardinality explosions and pipeline backpressure before they reach customers.
• Drive Integration Standards: Manage the OTLP, Prometheus, and JSON-log compatibility surface while ensuring successful ingestion into platforms like Datadog, Splunk, Nexthink, and Elastic.
• Collaborate Cross-Functionally: Engage directly with product teams, R&D, and key customers in defense and critical infrastructure to define requirements and achieve meaningful results.
• Over 8 years of engineering experience, with a minimum of 4 years focused on observability, telemetry, or large-scale data infrastructure (Datadog, Splunk, Elastic, Honeycomb, New Relic, Grafana Labs, or similar).
• Deep knowledge of OpenTelemetry: OTLP, the OTel Collector, semantic conventions, context propagation, and head/tail sampling. You can discuss the trade-offs in detail.
• Experience with distributed tracing in a production environment: You have designed or made significant contributions to a tracing system that handles real customer traffic, not just a side project.
• Proven experience with high-throughput pipelines: Hands-on involvement with systems processing over 100M events daily, including backpressure management, batching, and storage considerations.
• Strong skills in systems programming: Production experience with Go and/or Rust is preferred. Familiarity across the stack, from agent code to backend services, is essential.
• Fluency in networking and security: Comfortable with TLS, DNS, TCP, and identity protocols. Prior experience in ZTNA, SASE, or SD-WAN is a significant advantage.
• Mindset: Pragmatic, opinionated, and focused on impact. You know when to prototype and when to deliver.
• Competitive salary and performance-based bonuses.
• Comprehensive health, dental, and vision insurance.
• Generous paid time off and holiday schedule.
• Opportunities for professional development and continuous learning.
• Flexible work arrangements and a supportive team culture.