
Data Scientist
Posted 4 hours ago

This is a fully remote position, open to applicants in Brazil.
• Development of Predictive Models: Create propensity-to-default models using traditional ML techniques (binary classification, ensemble models such as Random Forest, Gradient Boosting, and equivalents), focusing on precision, recall, and production stability.
• Feature Selection and Engineering: Conduct exploratory data analysis (EDA), identify and select relevant variables from structured contractual data, and progressively incorporate external variables (weather, harvest, macroeconomic scenarios, bankruptcy recovery data, and news sources).
• Experimentation in Databricks: Develop and version modeling experiments using Databricks and MLflow, ensuring traceability of runs, parameters, metrics, and model artifacts throughout iterative cycles.
• Model Validation and Evaluation: Design and execute robust validation strategies (cross-validation, temporal backtesting, score stability analysis) to ensure that models perform reliably over increasing projection windows (3, 6, and 12 months).
• Iterative Improvement Cycles: Actively participate in iterative model refinement cycles. Each sprint, incorporate new variables, reassess performance, and document learnings in the program's Knowledge Base.
• Collaboration with Data Engineering: Work closely with Data Engineers to ensure that data pipelines correctly feed the models and that model outputs (scores, projections, alerts) are made available at the appropriate platform layers.
• Communication of Results: Translate technical model results into business language, supporting the Data Strategist in communicating with client stakeholders (superintendencies, credit, leadership).
• Technical Documentation: Document methodologies, modeling decisions, and outcomes in structured formats that contribute to the Knowledge Base and can be leveraged by AI agents in subsequent phases.
• Production Monitoring: Monitor model performance in production, identify distribution deviations (data drift, concept drift), and propose corrective actions or re-training.
• Solid experience in data science with a focus on predictive modeling for production business problems.
• Proven experience with classification and ensemble models (Random Forest, Gradient Boosting, XGBoost, or equivalents) in credit, risk, or anomaly detection contexts.
• Experience with Databricks for model development, experimentation, and versioning (Delta Lake, MLflow, Spark MLlib, or equivalent libraries in a distributed environment).
• Strong expertise in feature selection, handling imbalanced data, and temporal validation strategies for risk models.
• Experience in data analysis and modeling within the AWS ecosystem (S3, Athena, SageMaker, or equivalent managed ML cloud services).
• Ability to communicate model results and limitations to non-technical audiences clearly and with a focus on business decision-making.
• Experience in the financial services sector (credit, risk, delinquency, or similar).
• Health and dental insurance;
• Meal and food vouchers;
• Childcare assistance;
• Extended parental leave;
• Partnerships with gyms and health and wellness professionals via Wellhub (Gympass) and TotalPass;
• Profit-sharing (PLR);
• Life insurance;
• Continuous learning platform (CI&T University);
• Discount club;
• Free online platform dedicated to promoting physical and mental health and well-being;
• Responsible parenting and pregnancy course;
• Partnerships with online course platforms;
• Language learning platform;
• And many more.