
Data Scientist
Posted May 23

This is a fully remote position, open to applicants in Singapore.
• Lead data exploration and analysis of large financial crime datasets, covering sanctions, Politically Exposed Persons (PEP), and adverse media data, to uncover patterns, identify false positives and false negatives, and inform feature development.
• Build and evaluate agents and rule-based models by running experiments, validating hypotheses, and tuning thresholds to improve alert efficiency.
• Design and implement production-ready API integrations, working with software engineers and product teams to ensure smooth integration, testing, and merging of components.
• Conduct customer-focused data studies for enterprise clients (e.g., financial institutions) to evaluate model performance, assess data quality, and propose data-driven solutions that reduce investigation workloads.
• Prototype and iterate quickly, using Python, PySpark, and Jupyter notebooks to analyze data, build reproducible pipelines, and generate insights that guide product decisions.
• Investigate and resolve product-related issues in partnership with engineering and product teams.
• Contribute to R&D on new techniques, including graph-based methods (GNNs, graph embeddings) for transaction monitoring, LLM-based feature exploration, and RAG-based models.
• Communicate findings clearly through well-structured Jupyter notebooks, internal documentation, and stakeholder presentations, turning complex analytical results into actionable business insights.
• Bachelor's or Master's degree in Data Science, Computer Science, Statistics, or a related discipline.
• At least 3 years of hands-on experience delivering data science projects, preferably in financial crime compliance, name screening, or AML/KYC.
• Strong expertise in Python (data manipulation, modeling, and pipeline development) and SQL/Spark SQL for large-scale data querying and transformation.
• Practical experience with PySpark or comparable distributed data platforms.
• Understanding of NLP techniques and entity resolution concepts.
• Familiarity with LLMs or RAG-based models for information extraction or classification tasks is a plus.
• Solid knowledge of data quality assessment, including profiling, anomaly detection, and merge logic across complex multi-source datasets.
• Proficiency with Git, Docker, Linux, and collaborative development workflows, including code reviews and pull requests.
• Strong analytical and problem-solving skills: able to investigate ambiguous data issues, formulate hypotheses, and rigorously validate findings.
• Effective communication skills: able to document findings in a structured, reproducible way (Jupyter notebooks, Confluence) and present results clearly to both technical and non-technical audiences.
• A proactive sense of ownership and curiosity: you take initiative, ask the right questions, and follow through to delivery.
• 100% remote work forever.
• Paid Development Days.
• In-house training.
• Lunch-and-learns.