
Staff Software Engineer – Platform Architecture, Execution Model
Posted Jun 20

This is a fully remote position, open to applicants in Virginia, +1 more state.
• Develop the fundamental execution model, including state machine, lifecycle, resource model, and failure semantics.
• Create platform APIs/SDKs that connect workflows, agents, tools, and product surfaces, while managing versioning and compatibility.
• Ensure correctness by implementing idempotency, deterministic replays, compensating actions, and maintaining data integrity.
• Engineer reliability at scale, focusing on concurrency controls, rate limits, backpressure, sharding/partitioning, and workload isolation.
• Integrate security and governance at the core, including RBAC/ABAC, policy enforcement, and fine-grained audit and lineage.
• Deliver observability through distributed tracing, structured logs, metrics, and evaluation hooks, creating an “explainable trail” of agent actions.
• Take ownership of quality through design reviews, test strategies (unit, property, chaos), performance baselines, SLOs, incident response, and postmortems.
• Mentor and assist senior engineers; collaborate with Product, Security, and Customer teams to convert requirements into robust primitives.
• Make practical decisions regarding storage, queueing, and compute, establishing paved roads that facilitate the work of other teams.
• Define system boundaries and minimize cross-service coupling through well-defined architectural patterns.
• Advocate for platform-wide standards concerning correctness, reliability, and API design across different teams.
• Balance immediate delivery needs with long-term architectural integrity, ensuring the platform evolves without accruing systemic risk.
• Over 10 years of experience in building distributed/platform systems, with substantial experience in defining architecture across teams or domains.
• Experience developing mission-critical runtimes or workflow/orchestration systems.
• In-depth knowledge of durable execution, including state machines, event sourcing, saga/compensation, idempotency, and exactly-once and at-least-once semantics.
• Proven experience with security and governance in production systems (authentication, RBAC, audit, policy management).
• Hands-on experience with observability tools (such as Grafana or similar), including trace correlation across asynchronous boundaries.
• Strong systems design skills across storage, queues, schedulers, and event-driven architectures, with performance tuning experience under load.
• Proficiency in a modern programming language (such as Go, Rust, Java, or TypeScript) and familiarity with cloud-native stacks (containers, CI/CD, IaC).
• Comfortable working in regulated or high-assurance environments, with a strong emphasis on correctness, clarity, and documentation.
• Demonstrated ability to influence technical direction across an organization and foster the adoption of architectural standards.
• Ability to integrate advanced LLM features into system design and platform architecture decisions when appropriate.
• Career advancement opportunities with the potential for rapid growth based on strong performance as the company expands.
• Comprehensive health care, including medical, dental, and vision coverage for you and your family, fully paid by the employer.
• Paid maternity and paternity leave for 14 weeks at the employee's regular pay.
• Unlimited paid time off (PTO), subject to management approval.
• Opportunities for professional development and ongoing education.
• Optional 401(k), FSA, and equity incentives available.
• Mental health support is accessible through Tara Mind.
• Cost-effective GLP-1 solutions offered through Crux.
Tether.to
Instrumental Group