
Master's-Level Scholarship Holder – Data Scientist, NLP, Semantic Search Systems
Posted May 30

This is a fully remote position, open to applicants in Brazil.
• Text preprocessing and enrichment: Cleaning, tokenization, lemmatization, and noise removal in textual documents
• Generation and management of embeddings: Creating document and query embeddings using models such as Sentence-BERT, OpenAI ada, or similar
• Development of semantic search pipelines, information retrieval, and applications with RAG: Building pipelines that chain query → embedding → vector search → optional re-ranking
• Assessing the quality of retrieval and generated responses in NLP and RAG scenarios, using metrics like recall@k and MRR
• Adaptation and experimentation with models in PyTorch or TensorFlow
• Documentation and versioning: Documenting pipelines, technical decisions, and experimental results
• Ongoing or completed Master's degree
• Background in Computer Science, Computer/Software Engineering, Information Systems, Statistics, Applied Mathematics, Electrical Engineering, Data Science, or related fields in Exact Sciences and Engineering
• Proficiency in Python programming with a strong understanding of typing, data manipulation (pandas/polars), and use of virtual environments
• Practical experience with NLP, including: tokenization, stemming/lemmatization, stopword removal, and vectorization (TF-IDF, word2vec, or embeddings)
• Applied knowledge of embeddings, semantic search, and RAG techniques (e.g., Sentence-BERT, current embedding models, or similar), as well as familiarity with vector databases (e.g., FAISS, ChromaDB, Pinecone, Qdrant)
• Experience with at least one deep learning framework (PyTorch or TensorFlow)
• Understanding of analytical pipelines and experimentation in data science applied to NLP (e.g., extraction → preprocessing → embedding → search/classification → generation with context) and versioning with Git
• Scholarship: R$ 9,000.00