
Lead Engineer – Platform Performance, Reliability
Posted May 20

This is a fully remote position, open to applicants in India.
• Enhance performance and decrease latency:
• Develop architectural plans for the integration of new features related to custom objects.
• Identify and eliminate bottlenecks in backend services, APIs, and message flows.
• Profile Node.js services (CPU, heap, event loop) and optimize critical paths for better efficiency.
• Fortify platform reliability:
• Increase resilience through batching, caching, pooling, concurrency controls, and backpressure mechanisms.
• Strengthen services against cascading failures and slow dependencies.
• Implement rate-limiting, queuing, and circuit-breaker patterns that can scale under load.
• Optimize database queries, indexing strategies, denormalization, and read/write paths.
• Collaborate on infrastructure and operations:
• Partner with SRE and Infrastructure teams on autoscaling, capacity planning, quotas, and workload efficiency.
• Contribute to runtime configuration enhancements (GKE, Node.js, Redis, Pub/Sub, Firestore, ClickHouse).
• Take part in performance incident reviews and drive actionable root-cause fixes.
• Develop enduring performance frameworks:
• Establish best practices for high-performance microservices, distributed patterns, and observability.
• Create reusable performance tools, dashboards, and profiling workflows.
• Mentor engineers in writing scalable code, interpreting metrics, and designing reliable services.
• Over 7 years of backend engineering experience with a focus on large-scale systems.
• Extensive knowledge of Node.js internals (event loop, memory model, async behavior).
• Expertise in diagnosing performance issues using CPU/heap profilers, tracing, and metrics.
• Strong understanding of microservices, distributed systems, high-throughput APIs, caching strategies, queuing/backpressure patterns, and rate limiting/load balancing.
• Experience with databases such as MongoDB, Postgres/MySQL, Firestore, Redis, ClickHouse, or similar.
• Familiarity with Kubernetes, GCP/AWS, and observability tools (Grafana, Prometheus, OpenTelemetry).
• Excellent communication skills and the ability to explain bottlenecks clearly without assigning blame.
• A mindset that treats every millisecond, query, and allocation as an opportunity for thoughtful optimization.
• EEO Statement: The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government record-keeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
• #LI-Remote #LI-HB1