Job Description
The engineer in this role makes the entire team faster without proportionally increasing headcount, and enables systematic accuracy improvement as a repeatable engineering capability rather than an ad-hoc effort.
Job Duties and Responsibilities :
– Design and build agentic evaluation pipelines: error detection → root cause analysis → hypothesis generation → prompt variant testing → A/B measurement → production promotion, with minimal human intervention.
– Own the accuracy measurement infrastructure: automate error analysis, data quality pipelines, and batch evaluation frameworks across document types and customer configurations.
– Build and evolve internal accuracy tooling from manual utilities into automated improvement platforms: classification and extraction correction loops, NTP rule generation, performance reporting.
– Take prototype methodologies and productionize them into reliable, scalable systems the team can operate independently.
– Build LLM-based extraction and classification pipelines using few-shot and RAG strategies for complex, real-world document types.
– Design and maintain A/B testing infrastructure for prompt and model changes; no untested changes go to production.
– Create live dashboards tracking extraction accuracy, NTP rates, and false positive rates across document types and customer configurations.
– Optimize LLM costs while maintaining quality: prompt compression, output token minimization, model selection and migration strategies.
– Write production-grade data pipelines with error handling, retries, logging, and monitoring.
– Collaborate with platform engineering and applied research functions on architecture and methodology translation.
– Mentor 1-2 junior engineers; build tooling and documentation they can operate independently.
Required Qualifications :
– BE / MTech in Computer Science, AI/ML, Computational Data Science (CDS), Computer Science & Automation (CSA), or related discipline.
Experience Range :
– 8-10 years total; minimum 4-6 years building production LLM or AI systems; minimum 4-6 years in evaluation, quality measurement, or accuracy improvement work.
“Must-have” Skills :
– Production-grade Python – clean, tested, maintainable systems; not just scripts (pytest, FastAPI or Flask)
– Hands-on LLM API experience (OpenAI, Anthropic, Gemini, AWS Bedrock or equivalent) with a systematic, measurement-driven prompt engineering methodology rather than instinct
– Agentic pipeline design: multi-step reasoning, tool use, orchestration frameworks (LangChain, LlamaIndex or equivalent), automated evaluation and feedback loops
– Evaluation framework design for LLM systems: precision/recall/F1, confusion matrices, A/B testing, per-class error analysis
– Analytical depth sufficient to design meaningful accuracy metrics and interpret why a model fails on a specific document or field type
– MongoDB or equivalent NoSQL (queries, aggregations, indexing); pandas / numpy for data processing and batch analysis
– Git, code reviews, CI/CD basics (GitHub Actions or Jenkins)
– Clear written communication: able to explain model behaviour and accuracy findings to non-technical stakeholders
“Would-be-nice” Skills :
– Document AI: PDF parsing, layout-aware extraction, OCR, structured form extraction
– RAG pipeline design and vector search (Pinecone, Weaviate, or similar)
– Classification systems with large label spaces (50+ classes)
– Async Python (asyncio, aiohttp) for pipeline throughput
– Embedding models and semantic similarity for document matching
– Prior experience working alongside a Research or Applied Science team as the engineering counterpart