How Mercor scores evaluator quality, explained

The signals Mercor uses to rate evaluator quality and what to do with each one to climb the rate ladder.

Mercor pays evaluators at different rates based on a quality score that combines several signals. Understanding each one lets you optimise where it matters and ignore the noise.

## Signal 1: inter-rater agreement

On calibration sets where ground truth exists, or where multiple evaluators rate the same artefact, your agreement rate is measured. Above the threshold (typically high 80s to low 90s percent on most tracks) you advance; below it, your rate is capped.

## Signal 2: justification depth

A rating without an explanation gets the minimum credit even if it is correct. A rating with specific, falsifiable reasoning ("the second response misses the off-by-one error on line 14") gets full credit and unlocks higher-paying batches.

## Signal 3: response speed within reasonable limits

Faster acceptance and completion of task invitations is rewarded. A sustained pace faster than the roughly 24-hours-per-task average can be a red flag for rushing; the system optimises for quality, not pure speed.

## Signal 4: task completion rate

This is the percentage of accepted tasks you actually complete. Dropping a task after accepting it damages the rate; not accepting it in the first place is neutral.

## Signal 5: review feedback from human reviewers

Periodically, a human reviews your evaluations and notes patterns. Applying their constructive criticism in subsequent work raises your score.

## What to do

Read every rubric carefully before the first batch in a new track. Write justifications as if a human reviewer will read them (because one will). Accept only tasks you can complete cleanly within the deadline. Treat the first 10 to 20 tasks in any new track as the calibration window.

## What gets you removed

Three patterns lead to removal: sustained inter-rater agreement below threshold across multiple tracks, justifications that read as LLM-generated, and repeatedly accepting and then abandoning tasks.
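To make Signals 1 and 4 concrete, here is a minimal Python sketch of how you might track your own agreement and completion rates. It is not Mercor's scoring code: the function names, the exact-match comparison and the 0.88 threshold are assumptions chosen for illustration, based only on the "high 80s to low 90s percent" range described above.

```python
# Hypothetical threshold; the article only gives a range ("high 80s to low 90s percent").
AGREEMENT_THRESHOLD = 0.88


def agreement_rate(your_ratings, reference_ratings):
    """Share of calibration items where your rating matches the reference rating."""
    if len(your_ratings) != len(reference_ratings):
        raise ValueError("rating lists must have the same length")
    if not your_ratings:
        raise ValueError("need at least one rated item")
    matches = sum(1 for mine, ref in zip(your_ratings, reference_ratings) if mine == ref)
    return matches / len(your_ratings)


def completion_rate(accepted, completed):
    """Completed tasks as a share of accepted tasks.

    Invitations you never accepted are not part of the input, which mirrors
    the point that not accepting a task is neutral.
    """
    if accepted == 0:
        return None  # nothing accepted yet, so there is no rate to report
    return completed / accepted


# Example: a 20-item calibration window where 18 ratings match the reference.
mine = ["A"] * 18 + ["B"] * 2
reference = ["A"] * 20
rate = agreement_rate(mine, reference)
print(f"agreement: {rate:.0%}, above threshold: {rate >= AGREEMENT_THRESHOLD}")
print(f"completion: {completion_rate(accepted=25, completed=24):.0%}")
```

Keeping a tally like this during the first 10 to 20 tasks in a new track shows whether you are near the threshold before the calibration window closes.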

Frequently asked questions

How does Mercor weight evaluator quality?

The score is a composite of inter-rater agreement, justification depth, response speed, completion rate and human review feedback. Inter-rater agreement and justification depth are the two heaviest signals.

Can I improve my Mercor rate without doing more hours?

Yes. Deeper justifications and consistent rubric adherence raise the rate without any change in volume. Quality compounds; quantity alone does not.