
Master's-Level Research Fellow – Data Scientist, NLP, Semantic Search Systems
Posted 6 days ago

This is a fully remote position, open to applicants in Brazil.
• Pre-processing and enrichment of texts: Cleaning, tokenization, lemmatization, and removal of noise in textual documents.
• Generation and management of embeddings: Create document and query embeddings using models such as Sentence-BERT, OpenAI Ada, or similar.
• Development of semantic search, information retrieval, and RAG pipelines: Build pipelines that chain query → embedding → vector search → re-ranking (optional) → use of the retrieved context in RAG applications.
• Adaptation and experimentation with models in PyTorch or TensorFlow: Employ pre-trained models and tailor them for specific tasks.
• Documentation and version control: Document pipelines, technical decisions, and experiment results.
• Master’s degree in progress or completed.
• Fields of study include Computer Science, Computer/Software Engineering, Information Systems, Statistics, Applied Mathematics, Electrical Engineering, Data Science, or related areas within Exact Sciences and Engineering.
• Proficiency in Python, including type annotations, data manipulation (pandas/polars), and the use of virtual environments.
• Practical experience with NLP, encompassing: tokenization, stemming/lemmatization, stopword removal, and vectorization (TF-IDF, word2vec, or embeddings).
• Applied knowledge of embeddings, semantic search, and RAG techniques (e.g., Sentence-BERT or comparable contemporary embedding models), as well as familiarity with vector databases (e.g., FAISS, ChromaDB, Pinecone, Qdrant).
• Experience with at least one deep learning framework (PyTorch or TensorFlow).
• Understanding of analytical pipelines and experimentation in data science applied to NLP (e.g., extraction → pre-processing → embedding → search/classification → context generation) and version control using Git.
• Experience with generative models (LLMs), prompt engineering, and response evaluation will be a significant advantage.
• Knowledge of relational databases (PostgreSQL) and NoSQL databases (MongoDB, Redis).
• Experience in real-world projects (academic or professional) with documentation and testing.
• Comprehensive benefits package including health insurance, retirement plans, and professional development opportunities.
• Flexible working hours and remote work options.
• Collaborative and innovative work environment fostering creativity and growth.