
Staff Software Engineer – Cloud Platform, Kafka
Posted Jun 19

This is a fully remote position, open to applicants in California.
• Design, provision, and oversee Apache Kafka clusters (self-managed on GCP/AWS or utilizing Confluent Platform / MSK).
• Configure and optimize brokers, ZooKeeper/KRaft, topics, partitions, replication factors, and retention policies to achieve high throughput and minimal latency.
• Conduct cluster upgrades, rolling restarts, and broker replacements without any downtime.
• Implement and manage Kafka Connect pipelines for data ingestion and egress across diverse systems.
• Administer Kafka Streams and ksqlDB deployments for real-time stream processing tasks.
• Maintain the Schema Registry and uphold schema governance standards across teams.
• Define and monitor SLIs/SLOs for consumer lag, throughput, end-to-end latency, and broker health.
• Design and build cloud infrastructure using Infrastructure as Code (IaC) with Terraform.
• Develop automated deployment pipelines for Kafka configuration changes utilizing GitOps workflows (ArgoCD, Flux).
• Create self-service tools and runbooks to reduce manual work for development teams.
• Automate topic provisioning, ACL management, and schema registration through APIs and CLI tools.
• Integrate tools such as GitLab CI/CD or Cloud Build for automated testing and deployment.
• Ensure seamless integration of data pipelines with other GCP services like BigQuery and Cloud Storage.
• Monitor and enhance the performance, reliability, and cost-effectiveness of Kafka and streaming pipelines.
• Adopt security best practices for GCP resources, including IAM policies, encryption, and network security.
• Ensure observability is a fundamental aspect of the infrastructure platforms, providing adequate visibility into their health, utilization, and costs.
• Collaborate closely with cross-functional teams to understand their requirements, educate them through documentation and training, and drive adoption of platforms and tools.
• 10+ years of overall experience in DevOps, cloud engineering, or data engineering.
• 5+ years of experience working with Kafka at production scale.
• Deep knowledge of Kafka internals: replication protocol, log compaction, consumer group coordination, partition leadership, and KRaft mode.
• Proficiency in container orchestration (Kubernetes / Helm) and deploying Kafka using Strimzi, Confluent Operator, or equivalent.
• Strong understanding of networking (VPC, peering, private endpoints, DNS, load balancing) within cloud environments.
• Hands-on experience with Kafka Connect, Schema Registry, and at least one stream processing framework (Kafka Streams, Flink, Spark Structured Streaming).
• Proficiency in Google Cloud Platform (GCP) services, including Dataflow, Pub/Sub, Kafka, Dataproc, BigQuery, and Cloud Storage.
• Expertise in Infrastructure as Code (IaC) tools such as Terraform or Cloud Deployment Manager.
• Familiarity with data orchestration tools like Apache Airflow or Cloud Composer.
• Experience with CI/CD tools like Jenkins, GitLab CI/CD, or Cloud Build.
• Knowledge of containerization and orchestration tools such as Docker and Kubernetes.
• Strong scripting abilities for automation (e.g., Bash, Python).
• Experience with monitoring tools like Cloud Monitoring, Prometheus, and Grafana.
• Familiarity with logging tools such as Cloud Logging or the ELK Stack.
• Strong problem-solving and analytical capabilities.
• Excellent communication and collaboration skills.
• Ability to thrive in a fast-paced, agile environment.
• As part of the total compensation package, this role may be eligible for a bonus.