
Senior AI Data Engineer
Posted 23 hours ago

This is a fully remote position, open to applicants in Cyprus.
• Design, build, and scale robust ETL/ELT pipelines for AI workloads, including RAG, fine-tuning, and batch inference.
• Convert unstructured data sources such as PDFs, logs, and transcripts into structured and vectorized formats that are suitable for LLM consumption.
• Oversee and automate the data-to-model lifecycle, ensuring that AI knowledge bases stay updated with evolving business data.
• Build and maintain real-time feature pipelines that support low-latency AI and machine learning applications.
• Integrate data platforms with Kafka and other event-driven systems to support real-time processing and AI-driven responses.
• Manage and enhance Feature Stores to maintain consistency between model training and production environments.
• Implement automated data quality controls and validation processes to guarantee the reliability and accuracy of data used for AI training and inference.
• Establish and uphold data lineage frameworks to ensure traceability, auditability, and regulatory compliance across data workflows.
• Uphold data security, privacy, and governance standards, including the protection of PII and compliance with industry regulations.
• Manage data movement and synchronization across on-premises systems, cloud platforms, and data warehouses.
• Optimize data storage and retrieval strategies for Vector Databases to support high-performance RAG and AI search workloads.
• Collaborate with Data Scientists, ML Engineers, Software Engineers, and business stakeholders to deliver scalable AI data solutions.
• Over 10 years of experience in Data Engineering or Backend Engineering with a strong emphasis on data platforms and pipelines.
• More than 2 years of hands-on experience supporting AI/ML data pipelines, including data preparation for machine learning and generative AI applications.
• Expert-level proficiency in Python and SQL; familiarity with Java or Scala is a plus.
• Extensive experience in building and maintaining real-time data streaming solutions using Apache Kafka, Flink, or Spark Streaming.
• Practical experience with modern data orchestration and transformation tools such as Airflow, dbt, and Prefect.
• Experience with Vector Databases and Feature Stores for supporting AI and machine learning workloads.
• Strong knowledge of cloud-based data services on AWS, Azure, or GCP, including services like Glue, Kinesis, Data Factory, or Dataflow.
• Experience in deploying and managing data workloads within Kubernetes (K8s) environments.
• Proven experience managing sensitive data within regulated industries such as Fintech, Healthcare, or other compliance-focused environments.
• Strong understanding of best practices related to data quality, governance, security, and privacy.
• A Bachelor's degree in Computer Science, Software Engineering, Information Systems, or a related technical field. Equivalent practical experience will also be taken into account.
• Exceptional problem-solving abilities and the capacity to collaborate effectively with cross-functional engineering, data, and AI teams.
• Health insurance
• Professional development opportunities