Gender-neutral job evaluation: how to build a system that complies with the EU Pay Transparency Directive

DIRECTIVE 2023/970

UPDATED FEBRUARY 2026

A practitioner's guide to building a job evaluation system that satisfies Article 4 of the EU Pay Transparency Directive: the four mandatory factors, the bias traps that disqualify most existing schemes, and a step-by-step build process.


TL;DR

  • Article 4(1) of the EU Pay Transparency Directive requires employers to use gender-neutral job evaluation and classification systems built on objective criteria.
  • The four mandatory factors are skills, effort, responsibility, and working conditions. Soft skills cannot be undervalued, per Article 4(4).
  • Most existing grading systems will not pass a directive-aligned review. They were built before gender-neutral evaluation was a legal requirement.
  • Gender-neutral evaluation is the upstream dependency for every other directive obligation: reporting categories under Article 9(1)(g), joint pay assessments under Article 10, and the burden-of-proof shift under Article 18(2).
  • The European Commission and EIGE published a step-by-step toolkit in March 2026. It is a useful methodology starter, but it stops short of integration and maintenance.
  • Watch for four bias traps: the lifting asymmetry, undervalued communication skills, unrepresentative benchmark roles, and soft skills treated as personality traits.
  • Build the system before the audit. Categories defined after the analysis will not hold up under regulatory review.

Article 4 of the EU Pay Transparency Directive looks simple on paper. Use gender-neutral criteria, apply them objectively, and make sure soft skills are not undervalued.

However, most companies discover the problem only when the pay equity audit produces results no one can defend. The grading scheme was built years before anyone asked whether "communication skills" were coded the same way for sales reps and customer service teams. The benchmark roles were drawn from engineering, where most employees were men. The job descriptions used different language for similar work, depending on who wrote them.

This article is for HR directors, benefits & compensation leads, and job architecture owners building toward EU Pay Transparency Directive compliance. It covers what Article 4 actually requires, why most existing schemes fail it, the four bias traps to watch for, and a step-by-step path to a system that holds up. For the full scope of directive obligations, the requirements guide provides more context.

What gender-neutral job evaluation requires under the directive

The directive's foundation is in Article 4(1): Member States must ensure employers have pay structures that guarantee equal pay for equal work or work of equal value. The structures themselves are not optional. They have to enable comparison.

Article 4(2) tells Member States to make analytical tools and methodologies available so employers can build gender-neutral job evaluation and classification systems. Article 4(3) authorizes the European Commission, in consultation with EIGE, to publish Union-wide guidelines. The March 2026 EIGE toolkit is the current version of those guidelines.

The core requirement sits in Article 4(4). Pay structures must enable assessment on the basis of objective, gender-neutral criteria agreed upon with workers' representatives where such representatives exist. The criteria must include skills, effort, responsibility, working conditions, and any other factor relevant to the specific job or position. They must be applied in an objective, gender-neutral manner. The directive then adds a sentence that does a lot of work: "In particular, relevant soft skills shall not be undervalued."

That sentence rules out many existing grading schemes. If your job evaluation does not formally measure soft skills and apply them across roles, the system is not gender-neutral. It is gender-coded.

Why this matters for everything else

Article 4 is upstream of the rest of the directive. Worker categories under Article 9(1)(g), the categories you report pay gaps against, must be built using the criteria from Article 4(4).

The pay equity audit runs on these categories. The joint pay assessment under Article 10 re-examines them when a 5% gap goes unjustified. Article 18(2) shifts the burden of proof to employers who have failed to meet their transparency obligations. A defensible evaluation is the documented evidence that prevents that shift from biting.

Skip the foundation, and every downstream obligation produces results that cannot be explained.

Why most existing grading systems fail (the bias traps)

Most pay structures in use today were not designed to be gender-neutral. They were designed to quickly slot people into bands. That history shows up in four specific patterns.

The lifting asymmetry

Warehouse roles often score points for physical effort. Childcare and elderly care roles often do not, even though lifting children, supporting adults during transfers, and standing for full shifts place the same physical demands on the worker. The directive's effort and working conditions factors are supposed to capture both. In practice, schemes built around traditionally male-coded work tend to ignore physical effort in caring professions.

If your scheme awards effort points for warehouse work but not for nursery or elder care of similar physical intensity, the evaluation is not gender-neutral.

Communication coded differently in different roles

Communication skills are the most common factor employers say they evaluate. They are also one of the most inconsistently applied.

In sales and management roles, communication is graded as a high-level skill: persuasion, negotiation, and influence. In customer service or reception roles, the same skill is often coded as a baseline requirement, worth no points beyond the entry-level grade.

The work is comparable. The grading is not. Article 4(4) requires criteria to be applied in an objective, gender-neutral manner. A factor cannot be a senior skill in one job family and a junior skill in another.

Benchmark roles drawn from the wrong sample

Job evaluation schemes are calibrated on benchmark roles: a representative selection used to set the factor weights and grade boundaries. If the sample is drawn mostly from male-dominated functions, the resulting weights tilt toward the demands of those roles. Female-dominated functions then get evaluated on a scale that was never designed to measure their work.

The EIGE guidance is explicit on this point. Benchmark selection has to reflect the actual mix of work across the organization, including a representative sample of female-coded roles, even when those roles are a minority of headcount.

Soft skills treated as personality traits

This is the biggest trap. Emotional labor, conflict de-escalation, customer empathy, and holding boundaries with difficult clients show up in performance reviews as personality traits ("She's so patient with the team") rather than as evaluation factors with weights and grades.

The directive does not allow this. Article 4(4) states explicitly that "relevant soft skills shall not be undervalued." If soft skills are decorative comments rather than scored factors, they are by definition undervalued.

The four mandatory factors

Skills

Skills cover what the role requires the person to know and do. The EIGE toolkit breaks it into knowledge (formal qualifications and technical expertise), interpersonal skills (communication, negotiation, and conflict management), problem-solving (analytical reasoning and judgment), planning and organizational skills, and physical skills, as applicable to the job. Skills are not only technical. A scheme that scores qualifications and tools but ignores interpersonal demands fails the soft-skills test in Article 4(4).

Effort

Effort covers mental, emotional, and physical demands. Mental effort is concentration, judgment under pressure, or multitasking. Emotional effort is the demand of managing difficult interactions, holding professional composure, and absorbing the emotional state of others. Physical effort is the body's contribution: standing, lifting, repetitive motion, and manual dexterity.

The most common failure here is treating emotional effort as unmeasurable. It can be measured, and roles with high emotional demand (care work, complaint handling, intensive client management) should earn effort points for it.

Responsibility

Responsibility covers accountability across four domains: people (line management, mentoring, and safeguarding), resources (equipment, materials, and budgets), information (handling confidential or sensitive data), and financial impact (decisions that move money, even indirectly). Responsibility scales with the magnitude of what the role is accountable for, not with the role's title.

Working conditions

Working conditions refer to the physical and psychological environment of the workplace. Physical conditions include exposure to noise, temperature, weather, and hazardous substances. Psychological conditions include exposure to stress, conflict, isolation, or unpredictability in workload.

The directive lets you add "any other factor relevant to the specific job or position" beyond the four. Performance is the most common addition. The guide to the role of performance data in pay justification covers how it integrates. Additional factors must be applied in the same objective, gender-neutral way as the mandatory four.

How to build a gender-neutral job evaluation system

A defensible system takes the eight steps below. None can be skipped. None can be done quickly without paying for it later.

Step 1: Set up an evaluation committee

The committee designs the scheme, selects benchmark roles, and signs off on grade decisions. Article 4(4) requires criteria to be agreed with workers' representatives where they exist. A mixed committee that includes worker representation is also the simplest defense against the bias traps above. A committee composed solely of HR and senior managers tends to reproduce the assumptions that made the existing scheme uneven in the first place.

Step 2: Choose a methodology

Three families of methodologies exist. Point-factor analytical schemes score each role on weighted factors and subfactors, producing a numeric total for each role. Classification schemes write grade descriptions and slot roles into the best-fit grade. Ranking schemes order roles relative to each other based on overall value.

Point-factor is the most defensible under the directive because it produces transparent, factor-by-factor scores that survive an audit. Classification schemes can work but require very precise grade descriptions. Ranking schemes rarely meet the objectivity test in Article 4(4).

Step 3: Define and weight factors

The four mandatory factors are broken down into sub-factors. Each sub-factor needs a definition, a scale (commonly five to seven levels), and a weight. The weights set how much each factor contributes to the total score.

Weights are where bias enters the system. If physical effort carries a high weight and emotional effort a low one, female-dominated care roles will score lower than male-dominated manual roles even when the work is of equivalent demand. Weighting decisions should be made transparently, justified in writing, and reviewed by the full committee.
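To make the weighting point concrete, here is a minimal sketch of a point-factor calculation. Everything in it is an assumption for illustration: the sub-factor names, the five-level scale, the 1,000-point total, both sets of weights, and the two role profiles. None of it comes from the directive or the EIGE toolkit.

```python
# All sub-factor names, weights, and levels below are illustrative assumptions.
MAX_LEVEL = 5       # assumed five-level scale per sub-factor
MAX_POINTS = 1000   # assumed points for a role at the top level on every sub-factor

BALANCED = {
    "knowledge": 0.20,
    "interpersonal_skills": 0.15,
    "physical_effort": 0.10,
    "emotional_effort": 0.10,
    "mental_effort": 0.10,
    "responsibility_people": 0.15,
    "responsibility_resources": 0.10,
    "working_conditions": 0.10,
}
# Same scheme, with weight moved from emotional effort to physical effort.
SKEWED = {**BALANCED, "physical_effort": 0.18, "emotional_effort": 0.02}


def total_points(levels: dict[str, int], weights: dict[str, float]) -> float:
    """Convert per-sub-factor levels (1..MAX_LEVEL) into a weighted point total."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    if set(levels) != set(weights):
        raise ValueError("every sub-factor must be scored, with no extras")
    for name, level in levels.items():
        if not 1 <= level <= MAX_LEVEL:
            raise ValueError(f"{name}: level {level} is outside 1..{MAX_LEVEL}")
    return sum(weights[n] * levels[n] / MAX_LEVEL * MAX_POINTS for n in weights)


warehouse_operative = {
    "knowledge": 2, "interpersonal_skills": 2, "physical_effort": 4,
    "emotional_effort": 1, "mental_effort": 2, "responsibility_people": 1,
    "responsibility_resources": 3, "working_conditions": 4,
}
care_assistant = {
    "knowledge": 2, "interpersonal_skills": 4, "physical_effort": 4,
    "emotional_effort": 5, "mental_effort": 3, "responsibility_people": 4,
    "responsibility_resources": 2, "working_conditions": 3,
}

for label, weights in (("balanced", BALANCED), ("skewed", SKEWED)):
    gap = total_points(care_assistant, weights) - total_points(warehouse_operative, weights)
    print(f"{label} weights: care assistant leads by {gap:.0f} points")
```

Neither role profile changes between the two runs. Only the weights do, which is why each weighting decision needs a written justification the committee can stand behind.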

Step 4: Select benchmark roles

Pick fifteen to thirty roles that represent the full range of work in the organization. The selection has to cover every job family, every level, and a representative mix of male- and female-dominated work. If your organization has a few highly female-coded roles, include them in the benchmarks even when they are a minority. Otherwise, the scheme will not be calibrated to evaluate them fairly.
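A quick coverage check can catch a lopsided benchmark sample before calibration starts. The sketch below is one possible approach: the `Role` fields, the role data, and the 60% cut-off for calling a role male- or female-dominated are all assumptions chosen for illustration, not values set by the directive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    title: str
    family: str
    level: int
    share_women: float  # fraction of incumbents who are women, 0.0 to 1.0


def dominance(role: Role, threshold: float = 0.6) -> str:
    """Label a role by incumbent mix; the 60% threshold is an assumed convention."""
    if role.share_women >= threshold:
        return "female-dominated"
    if role.share_women <= 1 - threshold:
        return "male-dominated"
    return "mixed"


def coverage_gaps(all_roles: list[Role], benchmarks: list[Role]) -> dict[str, list]:
    """List families, levels, and dominance groups the benchmark sample misses."""
    def profile(roles):
        return (
            {r.family for r in roles},
            {r.level for r in roles},
            {dominance(r) for r in roles},
        )

    org, bench = profile(all_roles), profile(benchmarks)
    return {
        "families": sorted(org[0] - bench[0]),
        "levels": sorted(org[1] - bench[1]),
        "dominance": sorted(org[2] - bench[2]),
    }


organization = [
    Role("Warehouse operative", "logistics", 1, 0.15),
    Role("Logistics team lead", "logistics", 3, 0.30),
    Role("Care assistant", "care", 1, 0.90),
    Role("HR advisor", "hr", 2, 0.55),
]
sample = [organization[0], organization[1]]  # a sample drawn only from logistics

print(coverage_gaps(organization, sample))
```

An empty list under every key means the sample touches each job family, level, and dominance group at least once. It does not prove the sample is proportionate, which remains a committee judgment.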

Step 5: Write gender-neutral job descriptions

Job descriptions feed the evaluation. If they are written in gendered or inconsistent language, the evaluation inherits the bias.

Two practices help. First, use a standard template that prompts the writer to explicitly describe each of the four factors. Second, audit existing descriptions side by side: read a male-coded role and a female-coded role of similar level. If the language varies in tone, formality, or coverage of factors, rewrite both to the same standard.
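Both practices can be supported by a simple completeness check. This is a sketch that assumes job descriptions are stored as dictionaries keyed by factor name; the key names and sample text are hypothetical.

```python
# Assumed template keys, one per mandatory factor.
FACTORS = ("skills", "effort", "responsibility", "working_conditions")


def missing_factors(description: dict[str, str]) -> list[str]:
    """Return the mandatory factors a job description leaves blank or omits."""
    return [f for f in FACTORS if not description.get(f, "").strip()]


def coverage_mismatch(desc_a: dict[str, str], desc_b: dict[str, str]) -> list[str]:
    """Return factors that one description covers and the other does not."""
    return sorted(set(missing_factors(desc_a)) ^ set(missing_factors(desc_b)))


warehouse = {
    "skills": "Operates forklift; reads pick lists.",
    "effort": "Lifts loads up to 20 kg throughout the shift.",
    "responsibility": "Accountable for stock accuracy in assigned aisle.",
    "working_conditions": "Unheated warehouse; noise exposure.",
}
nursery = {
    "skills": "Plans age-appropriate activities; communicates with parents.",
    "effort": "",  # physical and emotional effort left undescribed
    "responsibility": "Safeguarding of up to eight children.",
    "working_conditions": "",
}

print(missing_factors(nursery))
print(coverage_mismatch(warehouse, nursery))
```

A check like this only confirms that each factor is addressed. Differences in tone and formality still need a human side-by-side read.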

Step 6: Evaluate roles and grade them

Run every benchmark role through the scheme. Score each factor independently. Discuss disagreements on the committee. Document the final score and the reasoning. Once benchmarks are graded, slot the remaining roles based on comparisons with benchmarks of similar scores.

Step 7: Test for bias

A finished scheme has to be tested before it is rolled out. The test: compare scores across male-coded and female-coded roles that, by external benchmarking or expert judgment, are equivalent work. Where the scores diverge, ask why. Sometimes the divergence is real, because one role demands more responsibility than the other. Often, it reveals a bias trap that has to be rebalanced.

Document the test, the divergences found, and the corrections made. That record is the contemporaneous evidence the burden-of-proof shift in Article 18(2) makes valuable.
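The comparison in this step can be run as a simple paired check. The sketch below assumes the committee has already listed pairs of roles it considers equivalent work, and it uses a 5% tolerance on total points as the trigger for a closer look. The pairs, scores, and tolerance are illustrative assumptions, not figures from the directive.

```python
def divergent_pairs(
    pairs: list[tuple[str, str]],
    scores: dict[str, float],
    tolerance: float = 0.05,
) -> list[tuple[str, str, float]]:
    """Flag (male-coded, female-coded) pairs whose scores differ by more than
    `tolerance`, measured relative to the higher of the two scores.
    A positive value means the male-coded role scored higher."""
    flagged = []
    for male_coded, female_coded in pairs:
        a, b = scores[male_coded], scores[female_coded]
        relative = (a - b) / max(a, b)
        if abs(relative) > tolerance:
            flagged.append((male_coded, female_coded, round(relative, 3)))
    return flagged


# Hypothetical evaluation totals and equivalence judgments.
scores = {
    "Warehouse operative": 520,
    "Care assistant": 455,
    "Sales representative": 600,
    "Customer service advisor": 590,
}
equivalent = [
    ("Warehouse operative", "Care assistant"),
    ("Sales representative", "Customer service advisor"),
]

for male_coded, female_coded, diff in divergent_pairs(equivalent, scores):
    print(f"Review: {male_coded} vs {female_coded} ({diff:+.1%})")
```

A flagged pair is a prompt for the committee to review the factor-level scores, not a verdict. The written explanation of each flag, whether it led to a correction or not, belongs in the test record.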

Step 8: Link grades to pay structure

The final step is connecting grades to salary bands. Two roles in the same grade should be in the same pay band. Movement between bands has to be explainable in objective, gender-neutral terms: most often, performance, experience, or specialized skills. Without this link, the evaluation is theoretical. With it, every pay decision becomes traceable to a documented grade.
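As a rough illustration of that link, the sketch below maps evaluation points to a grade, looks up the grade's salary band, and lists anyone paid outside it. The grade floors, band amounts, and employee records are invented for the example.

```python
from bisect import bisect_right

# Hypothetical minimum points for grades 1 to 5, in ascending order.
GRADE_FLOORS = [0, 300, 450, 600, 750]
# Hypothetical annual salary bands (min, max) per grade.
BANDS = {
    1: (24_000, 30_000),
    2: (28_000, 36_000),
    3: (34_000, 44_000),
    4: (42_000, 56_000),
    5: (54_000, 72_000),
}


def grade_for(points: float) -> int:
    """Return the grade whose floor is the highest one not exceeding `points`."""
    if points < GRADE_FLOORS[0]:
        raise ValueError("points cannot be below the lowest grade floor")
    return bisect_right(GRADE_FLOORS, points)


def outside_band(employees: list[tuple[str, float, float]]) -> list[tuple[str, int, float]]:
    """Given (name, role points, salary) records, return those paid outside
    the band for their grade. Each needs a documented, objective reason."""
    flagged = []
    for name, points, salary in employees:
        grade = grade_for(points)
        low, high = BANDS[grade]
        if not low <= salary <= high:
            flagged.append((name, grade, salary))
    return flagged


staff = [
    ("Employee A", 450, 38_000),   # grade 3, inside band
    ("Employee B", 660, 40_000),   # grade 4, below band
    ("Employee C", 660, 50_000),   # grade 4, inside band
]
print(outside_band(staff))
```

Because the grade is derived from the documented score rather than entered by hand, any salary the check flags can be traced back to a specific evaluation.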

The EIGE toolkit: useful, but only a starting point

The European Commission and EIGE published an updated step-by-step toolkit in March 2026, authorized by Article 4(3) of the directive. The toolkit offers three calibrated methodologies, one each for micro-organizations (fewer than 10 employees), small and medium-sized employers, and large employers.

We recommend using it because it is the most authoritative reference on methodology available, written by the people advising the Commission.

However, be aware of its limits. The toolkit covers methodology design and committee setup. It does not cover how evaluation outputs feed compensation bands, how they connect to performance records, or how the system is maintained as the organization changes. It assumes you will work all of that out separately.

For organizations using disconnected systems for pay, performance, and HR records, working it out separately is the whole problem.

Linking job evaluation to pay reporting and equal-value comparisons

The reporting obligations in Article 9 require employers to publish gender pay gaps "by categories of workers," not by job title. Two people with different titles may be doing equivalent work and must be compared. Article 9(1)(g) makes this explicit: the gap is reported by categories built on the gender-neutral criteria from Article 4(4).

That means worker categories cannot be invented during the audit. They have to be built before the audit, based on a documented job evaluation. Categories assigned during analysis, based on whatever groupings are most convenient at the time, will not hold up under regulatory review. The pay equity audit guide covers what happens when categories are wrong.

The same point applies to the joint pay assessment. Under Article 10(4), implementing the measures that arise from the assessment must include an analysis of the existing gender-neutral job evaluation and classification systems, or the establishment of such systems where none exist. Building one under the time pressure of an active joint assessment is much harder than building it in advance.

Keeping the system defensible over time

A job evaluation built once and filed away will be out of date within 18 months. New roles appear. Existing roles change shape. Promotions move people between grades. The directive does not include a separate clause on maintenance, but Article 4(4)'s requirement that criteria be applied consistently is a continuous obligation, not a one-time exercise.

The operational question is straightforward. How do you make sure every new role is evaluated against the same factors, every promotion is justified on the same scale, and every salary band still reflects the underlying grades, without rebuilding the system every year?

For most companies, the honest answer today is they do not. The evaluation lives in a spreadsheet. The compensation system lives somewhere else. The performance records live in a third system. By the time a regulator or works council asks for evidence, reconstructing the linkages can take weeks.

The realistic alternative is an evaluation system that lives inside the same platform as compensation, performance, and role records.

How Mirro supports gender-neutral job architecture

Job architecture is one of Mirro's core capabilities. Pay structures, grading, and salary bands live in the same system as performance reviews, objectives, and engagement data. That matters for gender-neutral evaluation because the four-factor scoring is only useful when it remains connected to the work it is supposed to justify.

Several features of Mirro's self-service performance management map directly onto the directive's evaluation requirements.

The Question Library

The Question Library assigns a unique item ID to each evaluation factor and sub-factor. The same "communication skills" item, with the same definition and scale, applies to every role in every department. Rephrasing the question for a specific form is allowed (sales communication may sound different from customer service communication), but the underlying item ID stays stable. That is the technical answer to the bias trap where the same skill is graded differently across job families. The directive's "objective gender-neutral manner" requirement under Article 4(4) survives because the scoring scale does not change between roles.
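The idea of a stable item with form-specific wording can be pictured with a small data model. This is a conceptual sketch only. It is not Mirro's data model or API, and every class, field, and ID in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorItem:
    """One evaluation factor with a single definition and scale for all roles."""
    item_id: str
    definition: str
    scale: tuple[str, ...]  # level descriptors, lowest level first


@dataclass(frozen=True)
class FormQuestion:
    """A form-specific phrasing that points at a shared FactorItem."""
    item: FactorItem
    wording: str


def record_score(question: FormQuestion, level: int) -> dict:
    """Validate a level against the shared scale and key it by item ID."""
    if not 1 <= level <= len(question.item.scale):
        raise ValueError(f"level must be between 1 and {len(question.item.scale)}")
    return {
        "item_id": question.item.item_id,
        "level": level,
        "descriptor": question.item.scale[level - 1],
    }


communication = FactorItem(
    item_id="SKILL-COMM-001",  # hypothetical ID
    definition="Conveys information and handles resistance or conflict.",
    scale=("Routine exchange", "Explains and advises", "Persuades and de-escalates"),
)
sales_q = FormQuestion(communication, "How demanding is client negotiation in this role?")
service_q = FormQuestion(communication, "How demanding is complaint handling in this role?")

print(record_score(sales_q, 3))
print(record_score(service_q, 3))
```

Both questions resolve to the same item ID and the same descriptor at level 3, so a reviewer can compare the two job families on one scale despite the different wording.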

The Audience Builder

The Audience Builder lets administrators define worker categories using metadata: role, level, location, department, and tenure. These intersections become the categories of workers used for pay reporting under Article 9(1)(g). The same categories that drive evaluation also drive reporting, so the upstream and downstream sides of the directive stay aligned.

Perspective Selection

Perspective Selection lets evaluation pull input from the employee, the manager, or both. Multi-perspective evaluation is a practical defense against single-rater bias, which can otherwise reinforce the bias traps described earlier.

Continuous Performance automation

Continuous performance automation answers the maintenance question. A check-in can be configured to trigger automatically a set number of months after an employee's hire date, then recur on a defined cycle. Promotions, role changes, and anniversaries become re-evaluation events without HR having to schedule them by hand. The evaluation stays current with the organization rather than becoming outdated.
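The scheduling logic behind a hire-date trigger is simple date arithmetic. The sketch below is a generic illustration, not Mirro's implementation; the function names and parameters are hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of short months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def check_in_dates(
    hire_date: date, first_after_months: int, every_months: int, until: date
) -> list[date]:
    """Return check-in dates: the first one `first_after_months` after hire,
    then one every `every_months`, up to and including `until`."""
    if every_months <= 0:
        raise ValueError("every_months must be positive")
    dates = []
    offset = first_after_months
    # Each date is computed from the hire date so month-end clamping never drifts.
    while (due := add_months(hire_date, offset)) <= until:
        dates.append(due)
        offset += every_months
    return dates


schedule = check_in_dates(
    hire_date=date(2025, 1, 31),
    first_after_months=3,
    every_months=6,
    until=date(2026, 12, 31),
)
for due in schedule:
    print(due.isoformat())
```

Computing every date from the original hire date, rather than from the previous check-in, keeps an employee hired on the 31st from sliding to the 28th or 30th for the rest of the cycle.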

Visibility and reporting controls

Visibility and reporting controls, plus real-time dashboards, make the audit trail instantly retrievable. When a labor inspectorate or works council asks for the basis of a pay decision, the answer is in the system, not in a spreadsheet someone built two years ago.

The operational impact

For small employers, the operational impact is measurable. Mirro estimates two to twenty hours saved per evaluation cycle compared with manual configuration. That is the practical version of "compliance without building an HR department." For larger employers, the same mechanics produce consistency at scale: the same evaluation logic applied across hundreds of roles, documented in a form a regulator can follow.

See how Mirro's job architecture supports gender-neutral evaluation under the directive.

Frequently asked questions

  • Does my company need to adopt gender-neutral job evaluation if we are under 100 employees?
  • Yes. Article 4(1) applies to all employers, regardless of size. The reporting obligations in Article 9 only apply to companies with 100 or more employees, but the underlying requirement to have gender-neutral pay structures has no headcount threshold. Smaller employers can use the EIGE toolkit's lighter methodology, but they cannot skip the obligation.

  • Can we keep our existing grading scheme if we audit it for bias?
  • Sometimes. If the scheme already measures skills, effort, responsibility, and working conditions on documented, consistent scales, a bias audit may be sufficient to bring it into compliance. If it does not measure all four factors, or if soft skills are absent or treated as personality traits, an audit alone will not close the gap. A rebuild is the more honest path.

  • Is the EIGE toolkit mandatory?
  • No. Article 4(3) authorizes the European Commission to publish guidelines; it does not require employers to use them. The toolkit is the most authoritative methodology available, but any methodology that meets the criteria in Article 4(4) is acceptable. Most employers find the toolkit a useful starting point rather than a complete answer.

  • What if our country has not transposed the directive yet?
  • Article 4(1) becomes legally enforceable in your country when transposition occurs. Member States had until 7 June 2026 to transpose the directive. Where transposition is delayed, the directive's principles remain the standard for regulators, and courts will apply them once national law catches up. The compliance timeline and checklist cover the deadlines and country-by-country status. Companies that wait for full transposition typically discover that the work they postponed takes longer to do under deadline pressure.

  • How often should we re-evaluate roles?
  • There is no fixed cadence in the directive. The practical standard is to re-evaluate whenever a role changes shape, on every promotion or material restructure, and on a periodic cycle for the wider scheme, typically a full review every two to three years. Continuous re-evaluation as part of the normal role lifecycle is more defensible than batch re-evaluation every few years.

Conclusion

Gender-neutral job evaluation is the foundation on which the rest of the directive sits. The pay equity audit, the reporting categories, the joint pay assessment, and the burden-of-proof shield all depend on a defensible answer to a single question. Did you build the categories you are reporting on using objective, gender-neutral criteria applied consistently?

The companies that take this seriously now will have an audit trail when regulators or works councils start asking. The companies that wait will find themselves rebuilding the foundation while the audit is already running.

Book a demo to see how Mirro helps companies become fully compliant with the EU Pay Transparency Directive.