Job Description
Role Overview
Help design and evaluate autonomous AI agents across multiple LLMs in real-world domains such as health, education, and daily life. All work is coding-based. You will shape the future of agentic AI systems by providing expert human feedback to leading AI organisations, and help train Large Language Models (LLMs) on complex, multi-step architectural workflows.
Key Responsibilities
AI Agent Evaluation
- Write evaluation rubrics with objective pass/fail criteria
- Debug agent traces to identify failure patterns
- Stress test agents against edge cases, prompt injection, and tool misuse
Technical Assessment
- Assess production-grade modular software architecture
- Analyse multi-turn system interactions and behaviours
- Provide high-density technical feedback for LLM training
Project Workflow
- Create an account and upload a resume/ID
- Complete the onboarding assessment
- Start earning through flexible task assignments
Qualifications
- Experience in backend engineering, AI automation, or complex systems integration
- Proven ability to build and maintain production-grade software with modular separation (e.g., distinct services for data parsing, logic processing, and reporting)
- Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java) and experience working with SQL databases
- Practical experience building for live, non-mocked environments and handling multi-turn system interactions
Preferred (Nice to Have)
- Experience integrating agents with live tools such as Supabase, Gmail, and other APIs
- Familiarity with persistent state and session-tracking patterns
- Experience identifying privacy leaks, authority escalation, or indirect prompt injection vulnerabilities
Are you interested in this position?
Apply by clicking on the “Apply Now” button below!