This is a fully remote position, open to applicants in Brazil.

📋 Description

• Create comprehensive NLP pipelines that include entity extraction, terminological normalization, semantic matching, and clustering;

• Perform exploratory data analysis (EDA) and assess the data quality, completeness, and analytical feasibility of both structured and unstructured datasets;

• Establish modeling strategies that balance fine-tuning Transformer models (such as BERTimbau) against applying LLMs for data extraction and structuring, with explicit criteria for reproducibility, cost-effectiveness, and auditability;

• Develop embedding pipelines, RAG, and semantic search features using vector databases (Qdrant, Milvus, ChromaDB);

• Collaborate with domain experts to calibrate prioritization scores and detect anomalies using methods such as Isolation Forest, autoencoders, and HDBSCAN;

• Manage version control for experiments and models to ensure traceability and governance;

• Generate comprehensive technical and scientific documentation, including reports and, when appropriate, research papers;

• Serve as the technical liaison with domain experts to confirm criteria, thresholds, and metrics.

⛳️ Requirements

• Bachelor's degree in Data Science, Statistics, Computer Science, or a related discipline;

• More than 5 years of experience on NLP projects in production environments, preferably with Portuguese-language text;

• Strong command of Python, along with libraries such as pandas, scikit-learn, and PyTorch (Transformers);

• Practical experience with Transformer models (including BERTimbau and multilingual BERT);

• Experience with applied generative AI, including prompt engineering, RAG, structured outputs, embeddings, and tool use;

• Familiarity with Hugging Face transformers, spaCy, and sentence-transformers;

• Proficiency with vector databases (Qdrant, Milvus, or ChromaDB) and similarity search;

• Solid understanding of the CRISP-DM methodology and of MLOps principles and tooling (MLflow);

• Ability to effectively communicate technical findings to both technical and non-technical stakeholders;

• Experience deploying open-source LLMs (vLLM, Ollama, TGI, llama.cpp) in on-premise GPU settings;

• Knowledge of GPU orchestration in Kubernetes (including GPU pass-through, MIG, and NVIDIA GPU Operator);

• Publications in scientific forums related to NLP, ML, or applied data science;

• Experience working with free-text datasets that exhibit low standardization and typical natural language data quality issues.

🏝️ Benefits

• Opportunity to work on cutting-edge projects in the field of NLP;

• Collaborative work environment with industry experts;

• Professional development and training opportunities;

• Competitive salary and benefits package.

Senior AI Specialist / Data Scientist
