This is a fully remote position, open to applicants in Cyprus.

📋 Description

• Design, build, and scale robust ETL/ELT pipelines for AI workloads, including RAG, fine-tuning, and batch inference.

• Convert unstructured data sources such as PDFs, logs, and transcripts into structured, vectorized formats suitable for LLM consumption.

• Oversee and automate the data-to-model lifecycle, ensuring that AI knowledge bases stay updated with evolving business data.

• Build and maintain real-time feature pipelines that support low-latency AI and machine learning applications.

• Integrate data platforms with Kafka and other event-driven systems to support real-time processing and AI-driven responses.

• Manage and enhance Feature Stores to maintain consistency between model training and production environments.

• Implement automated data quality controls and validation processes to guarantee the reliability and accuracy of data used for AI training and inference.

• Establish and uphold data lineage frameworks to ensure traceability, auditability, and regulatory compliance across data workflows.

• Uphold data security, privacy, and governance standards, including the protection of PII and compliance with industry regulations.

• Manage data movement and synchronization across on-premises systems, cloud platforms, and data warehouses.

• Optimize data storage and retrieval strategies for Vector Databases to support high-performance RAG and AI search workloads.

• Collaborate with Data Scientists, ML Engineers, Software Engineers, and business stakeholders to deliver scalable AI data solutions.

⛳️ Requirements

• Over 10 years of experience in Data Engineering or Backend Engineering with a strong emphasis on data platforms and pipelines.

• More than 2 years of hands-on experience supporting AI/ML data pipelines, including data preparation for machine learning and generative AI applications.

• Expert-level proficiency in Python and SQL; familiarity with Java or Scala is a plus.

• Extensive experience in building and maintaining real-time data streaming solutions using Apache Kafka, Flink, or Spark Streaming.

• Practical experience with modern data orchestration and transformation tools such as Airflow, dbt, and Prefect.

• Experience with Vector Databases and Feature Stores for supporting AI and machine learning workloads.

• Strong knowledge of cloud-based data services on AWS, Azure, or GCP, such as Glue, Kinesis, Data Factory, or Dataflow.

• Experience in deploying and managing data workloads within Kubernetes (K8s) environments.

• Proven experience managing sensitive data within regulated industries such as Fintech, Healthcare, or other compliance-focused environments.

• Strong understanding of best practices related to data quality, governance, security, and privacy.

• A Bachelor's degree in Computer Science, Software Engineering, Information Systems, or a related technical field. Equivalent practical experience will also be considered.

• Exceptional problem-solving abilities and the capacity to collaborate effectively with cross-functional engineering, data, and AI teams.

🏝️ Benefits

• Health insurance

• Professional development opportunities

Senior AI Data Engineer
