
Data Scientist – RecSys
Posted May 20

This is a fully remote position, open to applicants in Portugal.
• Design, implement, and enhance comprehensive recommendation pipelines, encompassing everything from data ingestion to model inference.
• Build and maintain scalable ETL pipelines to ensure reliable and efficient data flows.
• Create, evaluate, and continually refine machine learning models for recommendation systems.
• Investigate, prototype, and apply cutting-edge approaches to enhance recommendation quality and influence key business metrics.
• Scale and optimize data and model pipelines to accommodate substantial data volumes and meet real-time or batch processing requirements.
• Integrate multi-modal data (e.g., behavioral, transactional, and contextual signals) from diverse systems into recommendation models.
• Ensure the robustness and stability of pipelines by implementing unit and integration tests across data, modeling, and deployment workflows.
• Monitor and maintain end-to-end system performance, including data pipelines, model quality, and downstream effects.
• Design and evaluate A/B tests to assess model performance and facilitate data-driven product decisions.
• Create dashboards and observability tools to monitor model metrics, system health, and business KPIs.
• Collaborate closely with Data Engineers, Software Engineers, and stakeholders to deliver scalable, production-ready solutions.
• Proficiency in Python with recent production experience, including practical knowledge of data science and machine learning libraries and frameworks (e.g., Pandas, Polars, NumPy, scikit-learn, PyTorch, TensorFlow, JAX, Hugging Face, …).
• Experience in building and deploying end-to-end machine learning systems on cloud AI platforms (Azure, GCP, or AWS), covering ETL pipelines to deployment and monitoring, including model versioning and experiment tracking, supporting both batch and real-time workflows.
• Strong grasp of deep learning-based recommender systems for next-item prediction, along with similar NLP architectures that model sequential patterns and context.
• Proven experience in constructing efficient data transformation pipelines for both transactional (OLTP) and analytical (OLAP) workloads, with extensive knowledge of SQL and NoSQL databases (e.g., PostgreSQL, MySQL, Redshift, Snowflake, BigQuery, MongoDB, Cassandra).
• Familiarity with unit and integration testing (e.g., Pytest), CI/CD pipelines, and Docker-based containerization.
• Experience in developing large-scale recommender systems (e.g., candidate generation, ranking, retrieval, personalization).
• A record of deep learning publications at relevant conferences or in relevant journals.
• Experience with Azure Data Factory / AWS Glue / Google Cloud Dataflow.
• Background in designing and analyzing A/B tests, with a solid understanding of relevant evaluation metrics.
• Experience in designing and implementing metadata-driven pipelines to scale automated A/B testing systems.
• Expertise in developing multi-modal models that unify various data types (e.g., text, images, audio).
• Experience applying transformer-based models or large language models (LLMs) to recommendation or personalization tasks.
• Knowledge of distributed training, including data parallelism and model parallelism.
• Familiarity with distributed data processing and big data technologies (e.g., Spark, Hadoop, Flink, Kafka, Hive, Presto, Databricks).
• Competitive compensation based on your experience and contributions.
• Opportunities for both professional and personal development.
• Work with state-of-the-art machine learning infrastructure and systems at scale.
• Opportunities to contribute to open-source projects and remain active in the ML community.
• Chance to make a measurable and visible impact within a large-scale organization.
• Flexible working hours and a remote-friendly setup.
Arch Global Services (Philippines) Inc.
AVENCORE