Senior AI System Engineer

April 24, 2026
Application ends: July 23, 2026

Job Description

The engineer in this role makes the entire team faster without a proportional increase in headcount, and turns accuracy improvement into a systematic, repeatable engineering capability rather than an ad hoc effort.

Job Duties and Responsibilities:

– Design and build agentic evaluation pipelines that run from error detection through root-cause analysis, hypothesis generation, prompt variant testing, and A/B measurement to production promotion, with minimal human intervention.

– Own the accuracy measurement infrastructure: automate error analysis, data quality pipelines, and batch evaluation frameworks across document types and customer configurations.

– Build and evolve internal accuracy tooling from manual utilities into automated improvement platforms: classification and extraction correction loops, NTP rule generation, and performance reporting.

– Take prototype methodologies and productionize them into reliable, scalable systems the team can operate independently.

– Build LLM-based extraction and classification pipelines using few-shot and RAG strategies for complex, real-world document types.

– Design and maintain A/B testing infrastructure for prompt and model changes; no untested change goes to production.

– Create live dashboards tracking extraction accuracy, NTP rates, and false positive rates across document types and customer configurations.

– Optimize LLM costs while maintaining quality: prompt compression, output token minimization, model selection and migration strategies.

– Write production-grade data pipelines with error handling, retries, logging, and monitoring.

– Collaborate with platform engineering and applied research functions on architecture and methodology translation.

– Mentor 1-2 junior engineers; build tooling and documentation they can operate independently.
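For candidates gauging what "error handling, retries, logging" in the duties above might look like in practice, here is a minimal Python sketch. It is illustrative only and not taken from the team's codebase: the helper name `run_with_retries`, its parameters, and the exponential backoff policy are all assumptions.

```python
import logging
import time

logger = logging.getLogger("pipeline")


def run_with_retries(step, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call step() up to `attempts` times, backing off exponentially.

    Hypothetical helper for illustration. `sleep` is injectable so the
    backoff schedule can be checked in tests without real waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            # Log every failure so monitoring can see transient errors.
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Out of retries: surface the original error to the caller.
                raise
            # Delay doubles each time: base_delay, 2*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A pipeline step such as a single LLM API call could be wrapped as `run_with_retries(lambda: call_model(doc))`, where `call_model` is a placeholder for whatever the step does.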

Required Qualifications:

– BE / MTech in Computer Science, AI/ML, Computational Data Science (CDS), Computer Science & Automation (CSA), or related discipline.

Experience Range:

– 8-10 years total; minimum 4-6 years building production LLM or AI systems; minimum 4-6 years in evaluation, quality measurement, or accuracy improvement work.

“Must-have” Skills:

– Production-grade Python – clean, tested, maintainable systems; not just scripts (pytest, FastAPI or Flask)

– Hands-on LLM API experience (OpenAI, Anthropic, Gemini, AWS Bedrock or equivalent) and a systematic, measurement-driven prompt engineering methodology rather than instinct

– Agentic pipeline design: multi-step reasoning, tool use, orchestration frameworks (LangChain, LlamaIndex or equivalent), automated evaluation and feedback loops

– Evaluation framework design for LLM systems: precision/recall/F1, confusion matrices, A/B testing, per-class error analysis

– Analytical depth sufficient to design meaningful accuracy metrics and interpret why a model fails on a specific document or field type

– MongoDB or equivalent NoSQL: queries, aggregations, indexing; pandas / numpy for data processing and batch analysis

– Git, code reviews, CI/CD basics (GitHub Actions or Jenkins)

– Clear written communication: able to explain model behaviour and accuracy findings to non-technical stakeholders
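To make the "precision/recall/F1 ... per-class error analysis" skill above concrete, here is a small Python sketch of per-class metrics computed from paired labels. It is an illustration under assumptions, not the team's evaluation framework: the function name `per_class_metrics` and the document-type labels in the usage note are hypothetical.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for single-label predictions.

    Hypothetical helper for illustration; a metric is reported as 0.0
    when its denominator is zero.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

For example, `per_class_metrics(["invoice", "invoice", "receipt"], ["invoice", "receipt", "receipt"])` reports each class separately, which is the kind of breakdown that shows which document type a classifier confuses.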

“Would-be-nice” Skills:

– Document AI: PDF parsing, layout-aware extraction, OCR, structured form extraction

– RAG pipeline design and vector search (Pinecone, Weaviate, or similar)

– Classification systems with large label spaces (50+ classes)

– Async Python (asyncio, aiohttp) for pipeline throughput

– Embedding models and semantic similarity for document matching

– Prior experience working alongside a Research or Applied Science team as the engineering counterpart

