Versions
no_ci_v1
- Description: Assessment version without any Confidence Intervals (CIs).
ci_v2
- Description: Assessment version that introduces Confidence Intervals (CIs) at the skill, question, and interview levels.
pf_domain_v3
- Description: Assessment version that adds a Pass/Fail outcome and domain grouping, with Confidence Intervals (CIs) at the skill, question, and interview levels.
- Features:
  - Structured by assessment domains.
  - Role-based Pass/Fail scoring for improved candidate filtering.
  - Domain-level scoring with weighted importance.
  - Core domain coverage requirement for passing.
pf_domain_3p_v4
- Description: Most recent version, with a Pass/Fail outcome, domain grouping, and Confidence Intervals (CIs) at the skill, question, and interview levels.
- Features:
  - Assessment is grouped by domains.
  - Role-based Pass/Fail outcome for each candidate.
  - Adds a new tweak for better candidate evaluation.
  - Domain-level scoring with weighted importance.
  - Core domain coverage requirement for passing.
Role-Based Pass/Fail Scoring Approach
The Pass/Fail versions (pf_domain_v3 and pf_domain_3p_v4) use a role-based scoring approach that helps determine whether a candidate passes or fails an interview based on the skills relevant to the role. This methodology consists of three main steps:
Step 1: Map Skills to Role Domains
- Define the role – For example, Backend Engineer.
- Create a mapping between skills and domains:
  - Group related skills into domains (e.g., Data & DB – SQL, NoSQL, Database Design)
  - Assign a domain type:
    - Core – Essential skills for the role
    - Secondary – Important but not critical
    - Optional – Nice-to-have skills
  - Assign a domain weight to reflect importance (determined by subject-matter experts):
    - Core → higher weight
    - Secondary → medium weight
    - Optional → lower weight
Note: When the interview score is calculated, the weights are rescaled so that the weights of all included domains sum to 1.
For reference, see the domain and skills mapping spreadsheet.
Example domain configuration:
| Domain | Category | Weight |
|---|---|---|
| API Design | core | 0.30 |
| Data & DB | core | 0.26 |
| System Design | secondary | 0.19 |
| Languages & Frameworks | secondary | 0.15 |
| Performance | optional | 0.07 |
| Cloud & DevOps | optional | 0.04 |
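The configuration above could be represented in code roughly as follows. This is a minimal sketch, not the actual implementation: the `Domain` class, the `BACKEND_ENGINEER` list, and the `rescale_weights` helper are hypothetical names introduced for illustration. The helper applies the rescaling described in the note, so that the weights of the included domains sum to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical container for one row of the domain configuration."""
    name: str
    category: str  # "core", "secondary", or "optional"
    weight: float


# Example configuration for a Backend Engineer role (values from the table).
BACKEND_ENGINEER = [
    Domain("API Design", "core", 0.30),
    Domain("Data & DB", "core", 0.26),
    Domain("System Design", "secondary", 0.19),
    Domain("Languages & Frameworks", "secondary", 0.15),
    Domain("Performance", "optional", 0.07),
    Domain("Cloud & DevOps", "optional", 0.04),
]


def rescale_weights(domains, included):
    """Return {domain name: weight}, rescaled so the included weights sum to 1."""
    selected = [d for d in domains if d.name in included]
    total = sum(d.weight for d in selected)
    if total <= 0:
        raise ValueError("no included domain has a positive weight")
    return {d.name: d.weight / total for d in selected}
```

For example, `rescale_weights(BACKEND_ENGINEER, {"API Design", "Data & DB"})` keeps only the two core domains and divides each weight by their combined weight of 0.56.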
Step 2: Map Candidate Skills to Domains
For each interview:
- Collect the candidate’s demonstrated skills from the interview.
- Map each skill to its corresponding domain based on the mapping table.
- Calculate the domain score:
  - Domain Score = average(skill scores within that domain)
  - If the candidate did not demonstrate any skills in a domain, mark the domain as “Not Mentioned” instead of penalizing the candidate
  - In the PDF, leave the entries of an undemonstrated domain blank
  - Domain-level confidence intervals use the same approach as the other levels, pooling the skills within the domain
- Calculate Core Domain Coverage:
  - Core Domain Coverage = (Number of Core Domains with Demonstrated Skills) ÷ (Total Number of Core Domains)
  - To be considered PASSED, a candidate must demonstrate skills in 100% of Core Domains
  - If any Core Domain is not demonstrated, the candidate is automatically marked FAILED, regardless of the overall interview score
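The domain score and Core Domain Coverage calculations can be sketched in Python as follows. This is an illustration under stated assumptions rather than the production logic: the function names, the `SKILL_TO_DOMAIN` mapping, and the sample scores are hypothetical, and confidence intervals are left out.

```python
from statistics import mean

# Hypothetical skill-to-domain mapping for illustration only.
SKILL_TO_DOMAIN = {
    "SQL": "Data & DB",
    "NoSQL": "Data & DB",
    "REST": "API Design",
    "Java": "Languages & Frameworks",
}
CORE_DOMAINS = {"API Design", "Data & DB"}


def domain_scores(skill_scores, skill_to_domain):
    """Average the skill scores within each domain.

    Domains without any demonstrated skill are absent from the result,
    which corresponds to "Not Mentioned" (no penalty is applied).
    """
    grouped = {}
    for skill, score in skill_scores.items():
        domain = skill_to_domain.get(skill)
        if domain is None:
            continue  # assumption: skills outside the mapping are ignored
        grouped.setdefault(domain, []).append(score)
    return {domain: mean(scores) for domain, scores in grouped.items()}


def core_domain_coverage(scores, core_domains):
    """Fraction of core domains with at least one demonstrated skill."""
    if not core_domains:
        raise ValueError("at least one core domain is required")
    demonstrated = sum(1 for domain in core_domains if domain in scores)
    return demonstrated / len(core_domains)


# A candidate who demonstrated no API Design skill.
sample = domain_scores({"SQL": 82, "NoSQL": 78, "Java": 70}, SKILL_TO_DOMAIN)
coverage = core_domain_coverage(sample, CORE_DOMAINS)
```

Here only one of the two core domains is covered, so `coverage` is below 1 and the candidate would be marked FAILED whatever the interview score.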
Example domain scoring:
| Domain | Skill names | Observations | Mean score across skills | Confidence intervals (95%) | Stability |
|---|---|---|---|---|---|
| Data & DB | SQL, NoSQL | 9 | 80 | 75-83 | Reliable |
| API Design | REST | 7 | 75 | 73-80 | Reliable |
| Languages & Frameworks | Git, Docker, Java | 6 | 70 | 65-77 | Moderately Uncertain |
| Performance | | Not Mentioned | | | |
Step 3: Calculate Interview Score and Determine Pass/Fail
- Combine the domain scores, weighted by the domain weights, into a new domain-based interview score.
- Keep confidence intervals to reflect uncertainty in the scoring.
- Only include domains with demonstrated skills in the score calculation.
- Determine Pass/Fail:
  - Initial baseline:
    - PASS → top 50% of scores
    - FAIL → bottom 50%
  - Absolute threshold: After the first evaluation batch, an absolute threshold for Pass/Fail is introduced (calibrated using real interview data). This threshold helps ensure domain-level reliability and is iteratively refined over time as more data becomes available.
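A minimal Python sketch of this step is shown below. All function names are hypothetical. It also rests on two assumptions that the description above does not spell out: the "top 50%" baseline is taken as the batch median, and a score equal to the threshold counts as a pass.

```python
from statistics import median

# Domain weights from the example configuration.
WEIGHTS = {
    "API Design": 0.30,
    "Data & DB": 0.26,
    "System Design": 0.19,
    "Languages & Frameworks": 0.15,
    "Performance": 0.07,
    "Cloud & DevOps": 0.04,
}


def interview_score(domain_scores, weights):
    """Weighted mean of domain scores over demonstrated domains only.

    Dividing by the sum of the included weights rescales them to sum to 1.
    """
    included = {d: w for d, w in weights.items() if d in domain_scores}
    total = sum(included.values())
    if total <= 0:
        raise ValueError("no weighted domain was demonstrated")
    return sum(domain_scores[d] * w for d, w in included.items()) / total


def baseline_threshold(batch_scores):
    """Assumed initial baseline: the median score of the evaluation batch."""
    return median(batch_scores)


def decide(score, core_coverage, threshold):
    """Return "PASS" or "FAIL"; incomplete core coverage always fails."""
    if core_coverage < 1.0:
        return "FAIL"
    return "PASS" if score >= threshold else "FAIL"


# Mean scores from the example domain scoring table.
scores = {"Data & DB": 80, "API Design": 75, "Languages & Frameworks": 70}
overall = interview_score(scores, WEIGHTS)
```

With these inputs, the three demonstrated domains carry a combined weight of 0.71, so each weight is divided by 0.71 before the scores are combined. Once an absolute threshold has been calibrated, it would replace the value returned by `baseline_threshold` in the call to `decide`.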
Limitations
If a candidate demonstrates only one skill within a domain, the domain is still considered assessed. However, the resulting domain score may be biased by limited evidence from a single skill.