This is a fully remote position, open to applicants in California.

📋 Description

• Assist in the design and execution of machine learning, data mining, statistical, and graph-based algorithms to analyze extensive datasets for identity verification and anomaly detection.

• Examine large datasets to aid in the development and enhancement of entity-resolution and identity-matching algorithms that power Socure’s KYC and compliance solutions.

• Construct and sustain components of data-processing pipelines (ETL, feature generation, normalization) utilizing tools like Spark/PySpark and AWS (e.g., EMR, S3).

• Provide support to senior data scientists with feature engineering, data exploration, error analysis, and A/B test configuration for new models and signals.

• Assist in assessing new third-party and internal data sources: evaluate data quality, design offline experiments, and summarize impacts on coverage and model performance.

• Develop and maintain SQL and Python/R scripts for data extraction, transformation, and validation; participate in code reviews and basic testing.

• Offer analytical support to compliance and regulatory product teams, including ad hoc investigations, simple dashboards, and in-depth data analyses.

• Present findings in a clear, structured manner to colleagues and cross-functional partners (Product, Engineering, Client Analysis), emphasizing key insights and trade-offs.

• Thrive in a dynamic, cross-functional setting; take ownership of well-defined tasks and see them through to completion.

⛳️ Requirements

• Master’s degree with 2+ years of experience, or Ph.D. with 1+ years of experience in a data science or analytics role, or equivalent practical experience.

• Proficient in at least one general-purpose programming language used in data science (Python or Scala).

• Strong experience in writing and optimizing SQL for large datasets; comfortable working in data lake/warehouse environments.

• Hands-on experience with Spark or PySpark and common ML libraries (e.g., scikit-learn, XGBoost); experience with TensorFlow/PyTorch is an advantage.

• Familiar with UNIX environments and the AWS ecosystem (e.g., EMR, S3); experience with Databricks is a plus.

• Working knowledge of supervised/unsupervised machine learning and basic statistics (similarity measures, clustering, evaluation metrics).

• Exposure to graph techniques or graph databases (Neo4j, AWS Neptune, GraphFrames) is a significant advantage.

• Bonus: experience with Elasticsearch or DynamoDB; workflow tools such as Airflow for automating data pipelines.

• Capable of breaking down loosely defined problems, asking insightful clarifying questions, and iterating swiftly with feedback.

🏝️ Benefits

• Offers Equity

• Offers Bonus

Data Scientist II – Big Data R&D, Identity Graph, KYC

