Remotery

Data Scientist II – Big Data R&D, Identity Graph, KYC

at Socure · California · Full-time · Data Scientist · Junior, Mid-level · $140k – $170k/year

Posted Jun 21

This is a fully remote position, open to applicants in California.

📋 Description

• Assist in the design and execution of machine learning, data mining, statistical, and graph-based algorithms to analyze extensive datasets for identity verification and anomaly detection.

• Examine large datasets to aid in the development and enhancement of entity-resolution and identity-matching algorithms that power Socure’s KYC and compliance solutions.

• Build and maintain components of data-processing pipelines (ETL, feature generation, normalization) using tools like Spark/PySpark and AWS (e.g., EMR, S3).

• Provide support to senior data scientists with feature engineering, data exploration, error analysis, and A/B test configuration for new models and signals.

• Assist in assessing new third-party and internal data sources: evaluate data quality, design offline experiments, and summarize impacts on coverage and model performance.

• Develop and maintain SQL and Python/R scripts for data extraction, transformation, and validation; participate in code reviews and basic testing.

• Offer analytical support to compliance and regulatory product teams, including ad hoc investigations, simple dashboards, and in-depth data analyses.

• Present findings in a clear, structured way to colleagues and cross-functional partners (Product, Engineering, Client Analysis), emphasizing key insights and trade-offs.

• Thrive in a dynamic, cross-functional setting; take ownership of well-defined tasks and see them through to completion.


⛳️ Requirements

• Master’s degree with 2+ years of experience, or Ph.D. with 1+ years of experience in a data science or analytics role, or equivalent practical experience.

• Proficient in at least one general-purpose programming language utilized in data science (Python or Scala).

• Strong experience in writing and optimizing SQL for large datasets; comfortable working in data lake/warehouse environments.

• Hands-on experience with Spark or PySpark and common ML libraries (e.g., scikit-learn, XGBoost); experience with TensorFlow/PyTorch is an advantage.

• Familiar with UNIX environments and the AWS ecosystem (e.g., EMR, S3); experience with Databricks is a plus.

• Working knowledge of supervised/unsupervised machine learning and basic statistics (similarity measures, clustering, evaluation metrics).

• Exposure to graph techniques or graph databases (Neo4j, AWS Neptune, GraphFrames) is a significant advantage.

• Bonus: experience with Elasticsearch or DynamoDB; workflow tools such as Airflow for automating data pipelines.

• Capable of breaking down loosely defined problems, asking insightful clarifying questions, and iterating swiftly with feedback.


🏝️ Benefits

• Offers Equity

• Offers Bonus
