
Senior AI Specialist / Data Scientist
Posted May 22

This is a fully remote position, open to applicants in Brazil.
• Create comprehensive NLP pipelines that include entity extraction, terminological normalization, semantic matching, and clustering;
• Perform exploratory data analysis (EDA) and assess data quality, completeness, and analytical feasibility on both structured and unstructured datasets;
• Establish modeling strategies that balance the fine-tuning of Transformer models (such as BERTimbau) and the application of LLMs for data extraction and structuring, with explicit criteria for reproducibility, cost-effectiveness, and auditability;
• Develop embedding pipelines, RAG, and semantic search features using vector databases (Qdrant, Milvus, ChromaDB);
• Collaborate with domain experts to fine-tune prioritization scores and detect anomalies using methods like Isolation Forest, Autoencoders, and HDBSCAN;
• Manage version control for experiments and models to ensure traceability and governance;
• Generate comprehensive technical and scientific documentation, including reports and, when appropriate, research papers;
• Serve as the technical liaison with domain experts to confirm criteria, thresholds, and metrics.
• Bachelor's degree in Data Science, Statistics, Computer Science, or a related discipline;
• Over 5 years of experience working on NLP projects in a production environment, preferably in Portuguese;
• Strong command of Python, along with libraries such as pandas, scikit-learn, and PyTorch (Transformers);
• Practical experience with Transformer models (including BERTimbau and multilingual BERT);
• Experience with applied generative AI, including prompt engineering, RAG, structured outputs, embeddings, and tool use;
• Familiarity with Hugging Face transformers, spaCy, and sentence-transformers;
• Proficiency with vector databases (Qdrant, Milvus, or ChromaDB) and similarity search;
• Solid understanding of the CRISP-DM methodology and principles of MLOps (MLflow);
• Ability to effectively communicate technical findings to both technical and non-technical stakeholders;
• Experience deploying open-source LLMs (vLLM, Ollama, TGI, llama.cpp) in on-premise GPU settings;
• Knowledge of GPU orchestration in Kubernetes (including GPU pass-through, MIG, and NVIDIA GPU Operator);
• Publications in scientific forums related to NLP, ML, or applied data science;
• Experience working with free-text datasets that exhibit low standardization and typical natural language data quality issues.
• Opportunity to work on cutting-edge projects in the field of NLP;
• Collaborative work environment with industry experts;
• Professional development and training opportunities;
• Competitive salary and benefits package.