Datasetpaper · face perception

Preferred-gender congruence and facial attractiveness: a crossed mixed-model reanalysis of the Face Research Lab London Set

Analysis generated by Claude Opus 4.8 (claude-opus-4-8), on a dataset by Lisa DeBruine + 1 more, in response to a question from Mark Hahnel.

Paper Object

Version: ark:/99999/dp-face-preference-congruence.v1
Concept: ark:/99999/dp-face-preference-congruence
Source dataset: Face Research Lab London Set

A compiled view of a research object (RO-Crate). Switch between the paper and its parts; the narrative is rendered from the object, not hand-edited.

Summary

This is a pre-registered secondary analysis of a single, fixed question: do raters give higher attractiveness ratings to faces of their preferred gender ("preference congruence"), and does that effect survive once the repeated-measures structure — each rater rates many faces, each face is rated by many raters — is properly accounted for? Using the Face Research Lab London Set (2,513 raters × 102 faces; 256,326 attractiveness ratings on a 1–7 scale), we mapped each rating to congruent or incongruent with the rater's stated sexual preference and fit crossed linear mixed models with random intercepts for both rater and face.

The headline result is a small but statistically clear positive congruence effect: rating a face of one's preferred gender raises the attractiveness score by 0.037 points (95% CI 0.028 to 0.047; Holm-corrected p = 2.6e-13; standardized effect 0.024). Two features make this a cautionary tale for secondary analysts. First, the naive per-rater comparison points the opposite way (mean congruent − incongruent difference = -0.139 points; Wilcoxon p = 9.3e-21): a Simpson-type reversal driven by which faces fall into each category, which the crossed model removes. Second, the effect is entirely carried by male raters (+0.091 points) and is null for female raters (-0.014 points, CI spans zero), it grows with rater age, and it reverses sign if congruence is imputed from rater sex under a heterosexual assumption rather than taken from stated preference. The congruence effect is therefore real, specific to stated preference, and far too small to matter for any individual rating.

Provenance and methods

Source data. Face Research Lab London Set, DeBruine & Jones, figshare [10.6084/m9.figshare.5047666](https://doi.org/10.6084/m9.figshare.5047666), CC-BY-4.0. Two files were used, and their MD5 checksums were verified against the manifest at load time and asserted in analysis.py:

london_faces_ratings.csv — MD5 fe3743a1bb6b2b414e5170b8723aec1b; one row per rater (2,513

raters), columns rater_sex, rater_sexpref, rater_age, then 102 face-rating columns X001–X173 (the integer in X0NN is the face_id), each a 1–7 attractiveness rating.

london_faces_info.csv — MD5 52e9d142812130f0b19af168260fb160; one row per face (102 faces),

columns face_id, face_age, face_gender (53 male, 49 female), face_eth.

Reshaping. The rating matrix was melted to long form (one row per rater × face rating), missing cells dropped, and joined to face metadata on face_id, giving 256,326 observations.

Congruence mapping (stated explicitly). Each rating was labelled from rater_sexpref: men → prefers male faces; women → prefers female faces; either → prefers both (every face congruent); neither → prefers neither (every face incongruent); missing preference → undefined and excluded. A rating is congruent (1) if the face's gender is in the rater's preferred set, else incongruent (0). For the robustness test T3b, a separate sex-based congruence was defined under a heterosexual assumption (male rater → female faces congruent; female rater → male faces congruent; intersex/missing sex excluded).

Models. The primary estimator is a crossed linear mixed model, rating ~ congruent + (1|rater) + (1|face_id), fit by REML in statsmodels MixedLM (crossed random intercepts encoded as two variance components under a single dummy group). Because the crossed optimisation is high-dimensional, every reported fixed-effect estimate (T1 congruence, the T2 sex interaction and both per-sex slopes, the T2 age interaction, T3a, and T3b) was independently cross-checked against a two-way (rater + face) within fixed-effects slope of the same term — an optimiser-independent benchmark computed in analysis.py; all seven agreed to within 0.0020 points (largest discrepancy on the sex interaction: within-FE 0.1029 vs mixed 0.1049; the primary T1 estimate agreed to 0.0003, within-FE 0.0376 vs mixed 0.037). Effect sizes are reported as the coefficient divided by the SD of ratings (1.576). Seeds were set (numpy seed 42). The full pipeline is deterministic and re-running analysis.py reproduces every number.

Pre-registered tests (exactly these, nothing else).

T1 — preference congruence. The congruence coefficient in the crossed model above. An unadjusted

per-rater paired comparison (Wilcoxon signed-rank on mean congruent vs incongruent) is reported alongside as context.

T2 — moderation. congruent × rater_sex interaction (with stratified per-sex slopes) and

congruent × rater_age (age mean-centred) interaction, each in a crossed model.

T3 — robustness. T1 (a) restricted to raters with a single-gender preference (men/women only),

and (b) with congruence defined by the rater-sex-based heterosexual assumption.

Multiplicity. The five inferential tests (T1 congruence, T2 sex interaction, T2 age interaction, T3a, T3b) form the pre-registered family and were corrected together with the Holm–Bonferroni method. All five survive correction.

Data records

The project folder contains:

analysis.py — self-contained, seeded analysis that reads the two CSVs (asserting their MD5s),

reshapes and joins, runs T1–T3, applies the Holm correction, and writes every figure, table, and results.json.

london_faces_ratings.csv, london_faces_info.csv — the source data (unmodified copies).
figures/ — one PNG per pre-registered test (≥200 DPI, colourblind-safe Okabe–Ito palette):

fig-1-preference-congruence.png (T1), fig-2-moderation-sex-age.png (T2), fig-3-robustness.png (T3).

tables/ — tbl-1-t1-congruence.csv, tbl-2-t2-moderation.csv, tbl-3-t3-robustness.csv,

tbl-4-composition.csv, plus tables/datapackage.json (Frictionless tabular-data-package describing each table's schema).

results.json — every statistic reported here (estimate, 95% CI, raw and Holm-corrected p, effect

size, n).

claims.json, components.json — machine-readable claim list and component index.
environment.txt — Python version and pip freeze.

Technical validation

Three independent checks support the numbers. (1) Estimator agreement. Each crossed-model fixed-effect estimate was matched against a two-way (rater + face) within fixed-effects benchmark for the same term, computed in analysis.py; all seven agree to within 0.0020 rating points (T1: within-FE 0.0376 vs mixed 0.037; T2 sex interaction 0.1029 vs 0.1049 — the largest discrepancy; T2 age interaction 0.0037 vs 0.0037; T3a 0.0360 vs 0.0358; T3b −0.0308 vs −0.0310; full per-term table in results.json under meta.estimator_validation). This rules out an optimiser artefact and confirms the shrinkage-based mixed estimates recover the same within-subject, within-face signal. (2) Convergence. All crossed fits converged (REML). The rater and face variance components are substantial (rater 0.697, face 0.605, residual 1.188), confirming both grouping factors carry real variance and must be modelled — which is exactly why the naive per-rater analysis is misleading. (3) Direction sanity. The reversal between the unadjusted per-rater difference (-0.139) and the adjusted estimate (+0.037) is the expected consequence of face-level confounding (the "either"/"neither" categories and the differing attractiveness of male vs female faces load onto the congruence label); it is stable across specifications and is presented openly rather than hidden.

Usage notes

Interpret the congruence effect as statistically robust but negligible in magnitude: at 0.037 points on a 1–7 scale (about 0.024 SD), it is far below what would shift any individual judgement, and it is not uniform across raters. Specifically:

The effect is carried by male raters (+0.091 points, 95% CI 0.074 to 0.108;

n = 844) and is null for female raters (-0.014 points, 95% CI -0.031 to 0.003; n = 1362). The sex interaction is 0.105 points (Holm p = 4.1e-13).

It increases with rater age by 0.0037 points/year (95% CI 0.0028 to

0.0047; Holm p = 3.8e-14).

It is specific to stated preference: excluding bisexual/"either" and asexual/"neither" raters

leaves it unchanged (0.036 points; Holm p = 1.3e-12), but defining congruence from rater sex under a heterosexual assumption reverses the sign (-0.031 points; Holm p = 2.7e-12). Analysts should not substitute sex-based imputation for stated preference.

These estimates are associational and specific to this convenience sample of online raters and this particular face set; they are not evidence about causal mechanisms of attraction.

Code availability

All code is in analysis.py in this project folder. It has no hidden state: it reads the two MD5-verified CSVs, sets a fixed seed, and regenerates results.json, all four tables, and all three figures on a single run. Package versions are pinned in environment.txt. The crossed mixed models are computationally heavy (each fit takes minutes and up to ~40 GB RAM; a full run is on the order of 1–2 hours on a multi-core machine).

Claims

1. Preferred-gender congruence raises attractiveness ratings by 0.037 points (95% CI 0.028–0.047; Holm p = 2.6e-13; standardized 0.024) in the crossed model — statistically robust but very small. [fig-1, tbl-1] 2. The naive per-rater difference is negative (-0.139 points; Wilcoxon p = 9.3e-21), the opposite sign to the adjusted estimate — a Simpson-type reversal from face-composition confounding. [fig-1, tbl-1] 3. The congruence effect is moderated by rater sex: +0.091 for male raters, null (-0.014) for female raters (interaction 0.105; Holm p = 4.1e-13). [fig-2, tbl-2] 4. The congruence effect increases with rater age by 0.0037 points/year (95% CI 0.0028–0.0047; Holm p = 3.8e-14). [fig-2, tbl-2] 5. The effect is robust to excluding either/neither raters (0.036 points; Holm p = 1.3e-12). [fig-3, tbl-3] 6. The effect reverses sign under a rater-sex-based heterosexual-assumption definition (-0.031 points; Holm p = 2.7e-12), showing it is specific to stated preference. [fig-3, tbl-3]

Parts

Summary

Provenance and methods

london_faces_ratings.csv — MD5 fe3743a1bb6b2b414e5170b8723aec1b; one row per rater (2,513

raters), columns rater_sex, rater_sexpref, rater_age, then 102 face-rating columns X001–X173 (the integer in X0NN is the face_id), each a 1–7 attractiveness rating.

london_faces_info.csv — MD5 52e9d142812130f0b19af168260fb160; one row per face (102 faces),

columns face_id, face_age, face_gender (53 male, 49 female), face_eth.

Reshaping. The rating matrix was melted to long form (one row per rater × face rating), missing cells dropped, and joined to face metadata on face_id, giving 256,326 observations.

Pre-registered tests (exactly these, nothing else).

T1 — preference congruence. The congruence coefficient in the crossed model above. An unadjusted

per-rater paired comparison (Wilcoxon signed-rank on mean congruent vs incongruent) is reported alongside as context.

T2 — moderation. congruent × rater_sex interaction (with stratified per-sex slopes) and

congruent × rater_age (age mean-centred) interaction, each in a crossed model.

T3 — robustness. T1 (a) restricted to raters with a single-gender preference (men/women only),

and (b) with congruence defined by the rater-sex-based heterosexual assumption.

Data records

The project folder contains:

analysis.py — self-contained, seeded analysis that reads the two CSVs (asserting their MD5s),

reshapes and joins, runs T1–T3, applies the Holm correction, and writes every figure, table, and results.json.

london_faces_ratings.csv, london_faces_info.csv — the source data (unmodified copies).
figures/ — one PNG per pre-registered test (≥200 DPI, colourblind-safe Okabe–Ito palette):

fig-1-preference-congruence.png (T1), fig-2-moderation-sex-age.png (T2), fig-3-robustness.png (T3).

tables/ — tbl-1-t1-congruence.csv, tbl-2-t2-moderation.csv, tbl-3-t3-robustness.csv,

tbl-4-composition.csv, plus tables/datapackage.json (Frictionless tabular-data-package describing each table's schema).

results.json — every statistic reported here (estimate, 95% CI, raw and Holm-corrected p, effect

size, n).

claims.json, components.json — machine-readable claim list and component index.
environment.txt — Python version and pip freeze.

Technical validation

Usage notes

The effect is carried by male raters (+0.091 points, 95% CI 0.074 to 0.108;

n = 844) and is null for female raters (-0.014 points, 95% CI -0.031 to 0.003; n = 1362). The sex interaction is 0.105 points (Holm p = 4.1e-13).

It increases with rater age by 0.0037 points/year (95% CI 0.0028 to

0.0047; Holm p = 3.8e-14).

It is specific to stated preference: excluding bisexual/"either" and asexual/"neither" raters

These estimates are associational and specific to this convenience sample of online raters and this particular face set; they are not evidence about causal mechanisms of attraction.

Code availability

Claims

Component inventory

Name	Type	Path	Produced by	ARK
`analysis`	code	`analysis.py` download	—	`ark:/99999/dp-face-preference-congruence.v1/analysis`
`fig-1`	figure	`figures/fig-1-preference-congruence.png` download	`analysis`	`ark:/99999/dp-face-preference-congruence.v1/fig-1`
`fig-2`	figure	`figures/fig-2-moderation-sex-age.png` download	`analysis`	`ark:/99999/dp-face-preference-congruence.v1/fig-2`
`fig-3`	figure	`figures/fig-3-robustness.png` download	`analysis`	`ark:/99999/dp-face-preference-congruence.v1/fig-3`
`tbl-1`	table	`tables/tbl-1-t1-congruence.csv` download	`analysis`	`ark:/99999/dp-face-preference-congruence.v1/tbl-1`
`tbl-2`	table	`tables/tbl-2-t2-moderation.csv` download	`analysis`	`ark:/99999/dp-face-preference-congruence.v1/tbl-2`
`tbl-3`	table	`tables/tbl-3-t3-robustness.csv` download	`analysis`	`ark:/99999/dp-face-preference-congruence.v1/tbl-3`
`tbl-4`	table	`tables/tbl-4-composition.csv` download	`analysis`	`ark:/99999/dp-face-preference-congruence.v1/tbl-4`
`narrative`	narrative	`narrative.md`	—	`ark:/99999/dp-face-preference-congruence.v1/narrative`

Provenance

this version wasDerivedFrom Face Research Lab London Set (doi:10.6084/m9.figshare.5047666)
this version wasAttributedTo Claude Opus 4.8 (claude-opus-4-8)
this version wasRequestedBy Mark Hahnel
fig-1 wasGeneratedBy the analysis (analysis)
fig-2 wasGeneratedBy the analysis (analysis)
fig-3 wasGeneratedBy the analysis (analysis)
tbl-1 wasGeneratedBy the analysis (analysis)
tbl-2 wasGeneratedBy the analysis (analysis)
tbl-3 wasGeneratedBy the analysis (analysis)
tbl-4 wasGeneratedBy the analysis (analysis)

Figures

Figure 1 (fig-1) from Preferred-gender congruence and facial attractiveness: a crossed mixed-model reanalysis of the Face Research Lab London Set — Figure 1 — supports claims 1, 2. code → figure

Figure 2 (fig-2) from Preferred-gender congruence and facial attractiveness: a crossed mixed-model reanalysis of the Face Research Lab London Set — Figure 2 — supports claims 3, 4. code → figure

Figure 3 (fig-3) from Preferred-gender congruence and facial attractiveness: a crossed mixed-model reanalysis of the Face Research Lab London Set — Figure 3 — supports claims 5, 6. code → figure

Tables

Table 1 — tbl-1

quantity	estimate	ci_low	ci_high	p_raw	p_holm	effect_size_std	n_obs	n_raters
Adjusted congruence effect (crossed LMM)	0.03731567672020113	0.027559753053533278	0.04707160038686897	6.543688331210333e-14	2.6174753324841334e-13	0.023681999459721437	225420	2210
Unadjusted per-rater mean difference (congruent - incongruent)	-0.1386592183094913			9.26511618718288e-21			225420	1990

Download CSV.

Table 2 — tbl-2

term	estimate	ci_low	ci_high	p_raw	p_holm	n_raters
Congruence × rater sex (male vs female), interaction	0.10492271422027083	0.07713449839892321	0.13271093004161844	1.3574439504542895e-13	4.0723318513628686e-13	2206
Congruence slope, male raters	0.09078054169610214	0.07362693798742156	0.10793414540478272	3.3058730589160625e-25		844
Congruence slope, female raters	-0.014142170951795596	-0.0309446946906447	0.002660352787053507	0.09901621758838511		1362
Congruence × rater age, interaction (per year)	0.0037230713432045434	0.0027844151923147467	0.00466172749409434	7.606312894044302e-15	3.8031564470221506e-14	2206

Download CSV.

Table 3 — tbl-3

specification	estimate	ci_low	ci_high	p_raw	p_holm	n_raters
Primary (stated preference, all raters)	0.03731567672020113	0.027559753053533278	0.04707160038686897	6.543688331210333e-14	2.6174753324841334e-13	2210
T3a: single-gender preference (men/women only)	0.03578064037629097	0.026028110137824613	0.04553317061475733	6.43996185725113e-13	1.287992371450226e-12	1990
T3b: sex-based expected preference (heterosexual assumption)	-0.03098054522685466	-0.03966558914644843	-0.022295501307260886	2.7212602173071034e-12	2.7212602173071034e-12	2507

Download CSV.

Table 4 — tbl-4

variable	category	n_raters
rater_sex	female	1552.0
rater_sex	male	955.0
rater_sex	intersex	3.0
rater_sex	nan	3.0
rater_sexpref	men	1242.0
rater_sexpref	women	748.0
rater_sexpref	nan	303.0
rater_sexpref	either	213.0
rater_sexpref	neither	7.0
rater_age	mean	26.7
face_gender	male	53.0
face_gender	female	49.0

Download CSV.

Claims

Each claim is individually addressable and carries its verification status, the figures or tables that support it, and its distance from the raw data.

#

Once crossed random intercepts for rater and face are included, rating a face of a rater's preferred gender raises the attractiveness rating by 0.037 points on the 1-7 scale (95% CI 0.028 to 0.047; standardized effect 0.024; raw p=6.54e-14, Holm p=2.62e-13; n=225420 ratings from 2210 raters). The effect is statistically robust but very small in magnitude.

unverified confirmatory novelty C confidence 0.9 supported by fig-1, tbl-1 ark:/99999/dp-face-preference-congruence.v1/claim-1
#

The unadjusted per-rater mean difference (congruent minus incongruent) is negative (-0.139 points; Wilcoxon signed-rank p=9.27e-21, n=1990 raters), the opposite sign to the +0.037-point crossed-model estimate — a Simpson-type reversal produced by face-composition confounding that the crossed model removes.

unverified exploratory novelty B confidence 0.85 supported by fig-1, tbl-1 ark:/99999/dp-face-preference-congruence.v1/claim-2
#

The congruence effect depends strongly on rater sex (congruence x sex interaction 0.105 points, 95% CI 0.077 to 0.133; raw p=1.36e-13, Holm p=4.07e-13). It is carried by male raters (+0.091 points, 95% CI 0.074 to 0.108) and is null for female raters (-0.014 points, 95% CI -0.031 to 0.003, CI spans zero).

unverified confirmatory novelty B confidence 0.9 supported by fig-2, tbl-2 ark:/99999/dp-face-preference-congruence.v1/claim-3
#

The congruence effect increases with rater age by 0.0037 points per year (95% CI 0.0028 to 0.0047; raw p=7.61e-15, Holm p=3.80e-14; at the mean age of 26.6 years the effect is 0.035 points).

unverified confirmatory novelty C confidence 0.8 supported by fig-2, tbl-2 ark:/99999/dp-face-preference-congruence.v1/claim-4
#

Restricting to raters with an unambiguous single-gender preference (men or women only) leaves the estimate essentially unchanged: 0.036 points (95% CI 0.026 to 0.046; Holm p=1.29e-12; n=1990 raters), versus +0.037 in the primary model.

unverified confirmatory novelty C confidence 0.9 supported by fig-3, tbl-3 ark:/99999/dp-face-preference-congruence.v1/claim-5
#

When congruence is defined from rater sex under a heterosexual assumption instead of stated preference, the effect reverses to -0.031 points (95% CI -0.040 to -0.022; Holm p=2.72e-12; n=2507 raters): faces of the assumed-preferred gender are rated slightly lower, showing the positive congruence effect is specific to stated preference.

unverified confirmatory (null result) novelty B confidence 0.85 supported by fig-3, tbl-3 ark:/99999/dp-face-preference-congruence.v1/claim-6

Cite

BibTeX

@misc{face-perception-attractiveness-preference,
  title        = {Preferred-gender congruence and facial attractiveness: a crossed mixed-model reanalysis of the Face Research Lab London Set},
  author       = {Claude Opus 4.8},
  howpublished = {datasetpapers},
  note         = {datasetpaper ark:/99999/dp-face-preference-congruence.v1; based on Face Research Lab London Set (doi:10.6084/m9.figshare.5047666), data by Lisa DeBruine et al.},
  url          = {https://datasetpapers.com/papers/face-perception-attractiveness-preference/}
}

Text

Claude Opus 4.8. Preferred-gender congruence and facial attractiveness: a crossed mixed-model reanalysis of the Face Research Lab London Set. datasetpapers. ark:/99999/dp-face-preference-congruence.v1. https://datasetpapers.com/papers/face-perception-attractiveness-preference/

Preferred-gender congruence and facial attractiveness: a crossed mixed-model reanalysis of the Face Research Lab London Set

Summary

Provenance and methods

Data records

Technical validation

Usage notes

Code availability

Claims

Parts

Summary

Provenance and methods

Data records

Technical validation

Usage notes

Code availability

Claims

Component inventory

Provenance

Figures

Tables

Claims

Cite

Data, code & machine surfaces