This is a fully remote position, open to applicants in Singapore.

📋 Description

• Take charge of data exploration and analysis on extensive financial crime datasets, including sanctions, Politically Exposed Persons (PEP), and adverse media data, to reveal patterns, pinpoint false positives and false negatives, and improve feature development.

• Create and assess agents and rule-based models by conducting experiments, validating hypotheses, and optimizing thresholds to boost alert efficiency.

• Design and implement production-ready API integrations — collaborating with software engineers and product teams to ensure seamless integration, testing, and merging of components.

• Execute customer-centric data studies across various enterprise clients (e.g., financial institutions) to evaluate model performance, analyze data quality, and propose data-driven solutions to minimize investigation workloads.

• Rapidly prototype and iterate — employing PySpark, Jupyter notebooks, and Python to analyze data, construct reproducible pipelines, and generate insights that guide product decisions.

• Investigate and resolve product-related issues in partnership with engineering and product teams.

• Contribute to R&D on innovative techniques — including graph-based methods (GNNs, graph embeddings) for transaction monitoring, LLM-based feature exploration, and RAG-based models.

• Clearly articulate findings through well-structured Jupyter notebooks, internal documentation, and presentations for stakeholders, transforming complex analytical results into actionable business insights.

⛳️ Requirements

• Bachelor's or Master's degree in Data Science, Computer Science, Statistics, or a related discipline.

• At least 3 years of practical experience in delivering data science projects, preferably in financial crime compliance, name screening, or AML/KYC sectors.

• Strong expertise in Python (data manipulation, modeling, and pipeline development) and SQL/Spark SQL for large-scale data querying and transformation.

• Practical experience with PySpark or comparable distributed data platforms.

• Understanding of NLP techniques and entity resolution concepts.

• Familiarity with LLMs or RAG-based models for tasks related to information extraction or classification is advantageous.

• Robust knowledge of data quality assessment, including profiling, anomaly detection, and merging logic across complex multi-source datasets.

• Proficiency in Git, Docker, Linux, and collaborative development environments (including code reviews and pull requests).

• Strong analytical and problem-solving abilities — capable of investigating ambiguous data issues, formulating hypotheses, and rigorously validating findings.

• Effective communication skills — adept at documenting findings in a structured and reproducible way (Jupyter notebooks, Confluence) and presenting results clearly to both technical and non-technical audiences.

• A proactive mindset of ownership and curiosity: you take initiative, ask pertinent questions, and follow through to ensure delivery.

🏝️ Benefits

• 100% remote work forever.

• Paid Development Days.

• In-house training.

• Lunch-and-learns.

Data Scientist
