This is a fully remote position, open to applicants in California.

📋 Description

• Lead the design, development, and evaluation of machine learning, statistical, and graph-based algorithms for entity resolution, identity trust scoring, and anomaly detection on large datasets.

• Design and optimize graph-based identity representations (identity graph structure, linkage rules, clustering) to improve match rates, reduce false positives and false negatives, and support downstream fraud and KYC models.

• Build and maintain scalable data pipelines and feature stores using Spark/PySpark (or Scala), including data normalization, deduplication, and feature computation across large PII datasets in AWS/Databricks environments.

• Drive A/B tests and offline/online experiments for new models, features, and data sources: define success metrics, design experiments, and ensure thorough validation before deployment.

• Assess new internal and external data sources: investigate signal quality, design backtests, quantify incremental value, and provide clear recommendations regarding vendor selection and integration.

• Collaborate closely with product managers and engineers to convert ambiguous business and regulatory requirements (e.g., KYC coverage, watchlist matching) into specific modeling and data roadmaps.

• Deliver in-depth analytical support to Socure’s compliance and regulatory product suite, including investigative analyses, root-cause analysis for anomalies, and clear narratives for internal and external stakeholders.

• Contribute to model governance and documentation: articulate model logic, data dependencies, limitations, and monitoring strategies to internal risk and compliance stakeholders.

• Guide junior data scientists and engineers on best practices in data exploration, feature engineering, experimentation, and code quality.

• Effectively communicate complex technical concepts and trade-offs in a concise, structured manner to both technical and non-technical audiences (e.g., product reviews, customer meetings, internal briefings).

⛳️ Requirements

• A Master’s degree with 3+ years, or a Ph.D. with 1+ years, of industry experience in applied ML/data science roles; a background in Computer Science, Statistics, Mathematics, or a related quantitative field is preferred.

• Strong expertise in Python (preferred) or Scala, including experience with ML libraries such as scikit-learn, XGBoost, TensorFlow, or PyTorch.

• Extensive experience with Spark or PySpark and distributed data systems (e.g., AWS EMR, Databricks), working with very large, complex datasets.

• Deep understanding of supervised and unsupervised learning, feature engineering, model evaluation, and experimental design (A/B testing, holdout strategies, stratification).

• Experience building production-quality data pipelines and automated workflows using Airflow or similar orchestration tools.

• Hands-on experience with graph databases and/or graph frameworks (Neo4j, AWS Neptune, GraphFrames, DGL, PyTorch Geometric) and with graph algorithms for clustering, link prediction, and community detection is highly preferred.

• Strong SQL skills and experience with large-scale analytical data stores.

• Experience in at least one of the following: identity verification, fraud detection, credit risk, or related high-stakes domains is a plus.

• Proven ability to lead medium-to-large projects from start to finish, make sound trade-off decisions in ambiguous situations, and influence cross-functional stakeholders using data and clear reasoning.

🏝️ Benefits

• Offers Equity

• Offers Bonus

Senior Data Scientist – Big Data R&D, Identity Graph, KYC


People also viewed

Data Collaboration Lead

Lead Data Scientist

Senior Data Scientist, Product Analytics

Head of Data and Insights

Product Manager – MS, Data

Staff Data Scientist – Digital Assets