
AI Platform Engineer
Posted May 20

This is a fully remote position, open to applicants in Portugal.
Responsibilities
• Design, build, and maintain Go microservices that power AI model inference, data processing pipelines, and real-time streaming workflows.
• Create scalable APIs (gRPC/REST) that connect AI models to production applications.
• Manage the Kubernetes infrastructure (EKS), including deployments, autoscaling policies, service mesh, and monitoring cluster health.
• Implement service-to-service communication using gRPC and message queues (RabbitMQ/SQS) for asynchronous processing.
• Integrate with cloud AI services (AWS Bedrock, OpenAI, Anthropic) and oversee the model serving infrastructure.
• Develop multi-tenant features including authentication (JWT/JWKS), rate limiting, usage tracking, and tenant isolation.
• Collaborate with the Data & AI team to productionize machine learning models by wrapping them in production-ready services with appropriate health checks, circuit breakers, and graceful degradation.
• Establish comprehensive observability: structured logging, metrics (Prometheus), distributed tracing (Jaeger/Tempo), and alerting.
• Implement CI/CD pipelines and infrastructure-as-code (Terraform) for automated deployments and disaster recovery.
• Ensure high availability through effective monitoring, incident response, and post-mortem analysis.
• Optimize resource usage for GPU workloads and develop cost-efficient scaling strategies.
Requirements
• Go Expertise: Minimum 3 years of professional Go development experience with a solid grasp of concurrency patterns, interfaces, channels, and error handling.
• Kubernetes Production Experience: At least 3 years of managing production Kubernetes clusters, including deployments, services, ingress controllers, resource management, and troubleshooting.
• Distributed Systems Knowledge: In-depth understanding of the CAP theorem, eventual consistency, idempotency, circuit breakers, and fault-tolerant design.
• gRPC & Async Messaging: Practical experience with gRPC/Protocol Buffers and message queues (RabbitMQ, SQS, Kafka) in production environments.
• Cloud Platform Experience: Strong experience with AWS services (EKS, S3, DynamoDB, Lambda) or equivalent cloud providers.
• DevOps Mindset: Familiarity with Docker, CI/CD pipelines, infrastructure-as-code, and GitOps workflows.
• Spoken language: Proficient communication in English (C1 level); German language skills are an advantage.
What we offer
• Meaningful work: We build software that digitizes the social care sector, giving our clients more time to focus on providing better care and support.
• A flexible remote work model that fits around your everyday life.
• Engaging and challenging tasks in a dynamic, forward-thinking environment.
• A culture of appreciation and a collaborative working atmosphere within a growing international company, with real opportunities to get involved.
• A creative work setting with flat hierarchies and quick decision-making processes.
• An attractive compensation package and a permanent employment contract.
Attio
TechBiz Global