01 // AI Team
Senior Data Scientist - Publicis Groupe
Working on Spirax Sarco's MiM platform, an enterprise AI assistant for internal knowledge retrieval, support, and document-based question answering across large technical document corpora.
- Led LLM observability for the MiM platform, implementing Langfuse trace instrumentation and helping the engineering team use traces for debugging, QA, latency analysis, and root-cause investigation.
- Built an end-to-end evaluation system for MiM, including golden datasets, LLM-as-judge scoring, question-level observability reports, KPI dashboards, and benchmark exports for Power BI/Azure SQL.
- Designed MiM vs Microsoft 365 Copilot benchmarks using isolated per-question conversations, comparing answer quality, grounding, latency, failures, and user-facing behavior across STS and WM business units.
- Generated a traceable synthetic QA dataset from ~5,800 PDFs and ~60,000 pages, producing ~30,000 QA pairs with lineage from business unit -> category -> document -> page -> evidence block.
- Developed corpus processing and OCR pipelines with resumable page-level registries, Azure-hosted Mistral Document AI extraction, text-quality profiling, and recovery workflows for failed or oversized PDFs.
- Cut full-corpus ingestion time from roughly one week to ~4 hours, enabling faster iteration on retrieval, evaluation, and product-quality experiments.
- Ran retrieval benchmark sweeps across chunking strategies, embedding models, FAISS dense search, BM25, hybrid RRF, MMR, and hierarchical retrieval; reported retrieval quality, context precision, context waste, and token-cost trade-offs.
- Built internal QA and benchmark inspection UIs so product owners, SMEs, and engineers could review generated questions, source evidence, MiM/Copilot answers, judge scores, and failure rationales.
- Investigated bounded agentic retrieval for MiM workflows, showing stronger source coverage and answer quality than single-pass retrieval baselines for multi-document sales and technical support scenarios.