Datasetpaper · microbial genomics
Do single-cell Bodo genomes form a functional group distinct from the reference? A PFAM-domain divergence analysis
- Version: ark:/99999/dp-bodo-divergence.v1
- Concept: ark:/99999/dp-bodo-divergence
A compiled view of a research object (RO-Crate); the narrative is rendered from the object, not hand-edited.
Arm B, round 2 (open question), of the engine A/B test. Generated by the automated executor (Claude Opus 4.8 via the Code path) under protocol secondary-analysis 0.1, on Figshare 10.6084/m9.figshare.31362613 (CC-BY-4.0), Supplementary File 9. Trust distance: 0.
Summary
The source paper reports unexpected genetic diversity among single-cell Bodo genomes. We tested this at the level of protein-domain (PFAM) content: are the seven single-cell genomes functionally distinct from the Bodo saltans reference, and do they cohere as a group? Using PFAM-domain counts for eight genomes, normalised per genome to control for genome size, we find that the single cells are significantly more distant from the reference than from one another (Bray-Curtis; one-sided Mann-Whitney p = 0.0024), and that the reference is the most divergent genome. The signal is not fully robust: it does not survive a switch to cosine distance, and the single cells show internal substructure.
The question and why it matters
Apparent genome-level diversity can be driven by assembly completeness rather than biology. Posing the question at the level of PFAM-domain composition, normalised to relative frequencies, tests whether the diversity is functional and structured; that is what would make it biologically interesting rather than a technical artifact.
Methods
PFAM-domain counts for the eight genomes (the B. saltans reference plus seven single cells) were taken from Supplementary File 9, sheet "PFAMs counts". Each genome's counts were normalised to relative frequencies to control for genome size and completeness. Pairwise Bray-Curtis distances were computed, and the genomes were clustered (average linkage) and ordinated by classical multidimensional scaling (principal coordinates analysis, PCoA). The pre-registered tests were: whether single-cell-to-reference distances exceed within-single-cell distances (one-sided Mann-Whitney); which genome is most divergent; and whether the result is robust to an alternative (cosine) distance.
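A minimal sketch of the normalisation and the two distance measures, assuming each genome's PFAM counts are a list aligned to a shared domain order. The function names and the three-domain toy counts are hypothetical and are not taken from analysis2.py, which may use a library implementation instead.

```python
import math

def normalise(counts):
    """Convert raw PFAM-domain counts to relative frequencies (summing to 1)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("genome has no PFAM counts")
    return [c / total for c in counts]

def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: sum|p_i - q_i| / sum(p_i + q_i)."""
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(a + b for a, b in zip(p, q))
    return num / den

def cosine_distance(p, q):
    """1 minus the cosine of the angle between the two profiles."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

# Hypothetical counts for three PFAM domains in two genomes.
g1 = normalise([2, 2, 0])
g2 = normalise([1, 1, 2])
print(bray_curtis(g1, g2))      # 0.5
print(cosine_distance(g1, g2))
```

Because each normalised profile sums to 1, Bray-Curtis here equals half the L1 distance between profiles and is directly sensitive to the normalisation step; cosine distance ignores vector length, so it gives the same value on raw or normalised counts.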
Results and technical validation
The mean pairwise Bray-Curtis distance across the eight genomes is 0.144. Single-cell-to-reference distances (mean 0.179) are significantly greater than within-single-cell distances (mean 0.132; one-sided Mann-Whitney p = 0.0024), so the single cells form a group distinct from the reference. B. saltans has the largest mean distance to the other genomes (0.179, versus 0.124 to 0.167 for the single cells), i.e. it is the outlier. Robustness is limited: under cosine distance the same comparison is not significant (p = 0.16), and the ordination shows two single cells (B7, F10) separating from the tight cluster formed by the other five (A8, A10, B2, G10, H10). The grouping is therefore internally structured and metric-dependent.
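The Mann-Whitney comparison can be re-derived from the rounded tbl-1 values alone. The sketch below, in standard-library Python, computes U for the seven reference distances against the 21 within-single-cell distances and an exact one-sided p-value from the tie-free null distribution of U. Function names are hypothetical and this is not claimed to be the implementation in analysis2.py. The one repeated value (0.099) occurs twice among within-single-cell pairs, so it does not change U; like any rank test applied to pairwise distances, the test also treats the 28 distances as independent even though they share genomes.

```python
from functools import lru_cache
from math import ceil, comb

# tbl-1 distances from B. saltans (Bsal) to each of the seven single cells.
ref_to_sc = [0.1743, 0.1707, 0.2003, 0.2118, 0.1588, 0.1739, 0.1637]

# The 21 single-cell-to-single-cell distances (upper triangle of tbl-1).
within_sc = [
    0.0940, 0.1661, 0.1770, 0.1063, 0.1198, 0.1125,  # A8 vs A10..H10
    0.1536, 0.1666, 0.1033, 0.1085, 0.0990,          # A10 vs B7..H10
    0.0990, 0.1625, 0.1647, 0.1566,                  # B7 vs F10..H10
    0.1713, 0.1757, 0.1657,                          # F10 vs B2..H10
    0.0878, 0.0808,                                  # B2 vs G10, H10
    0.0927,                                          # G10 vs H10
]

def mann_whitney_u(x, y):
    """U for sample x: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

@lru_cache(maxsize=None)
def n_arrangements(m, n, u):
    """Number of orderings of m x-values and n y-values (no ties) with U == u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest pooled value is either an x (beating all n y's) or a y.
    return n_arrangements(m - 1, n, u - n) + n_arrangements(m, n - 1, u)

def exact_p_greater(x, y):
    """One-sided exact p-value for H1: values in x tend to exceed those in y."""
    m, n = len(x), len(y)
    u_obs = mann_whitney_u(x, y)
    tail = sum(n_arrangements(m, n, k) for k in range(ceil(u_obs), m * n + 1))
    return tail / comb(m + n, m)

u_obs = mann_whitney_u(ref_to_sc, within_sc)
p_value = exact_p_greater(ref_to_sc, within_sc)
print(f"U = {u_obs:.0f}, one-sided exact p = {p_value:.4f}")
# prints: U = 125, one-sided exact p = 0.0024
```

The recurrence counts orderings by whether the largest pooled value is a reference or a within-single-cell distance; dividing the upper tail by C(28, 7) gives the exact p.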
Limitations
PFAM counts depend on annotation and on assembly completeness; normalisation reduces but does not remove that dependence. Eight genomes is a small set for ordination, so PCoA axes are indicative, not definitive. The metric sensitivity (Bray-Curtis significant, cosine not) means the separation should be reported as a tendency, not a settled fact.
Code availability
analysis2.py is self-contained: it downloads Supplementary File 9 by its Figshare ID, verifies the file's MD5 checksum, and deterministically reproduces every number and figure.
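The checksum step can be sketched with the standard library's hashlib, reading the file in chunks so it never has to sit in memory whole. This assumes the expected MD5 is obtained from the Figshare file record; the helper names are hypothetical and the download itself is left out.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_md5(path, expected_md5):
    """Return the digest if it matches expected_md5 (case-insensitive), else raise."""
    actual = md5_of(path)
    if actual != expected_md5.lower():
        raise ValueError(
            f"MD5 mismatch for {Path(path).name}: {actual} != {expected_md5}"
        )
    return actual
```

For example, verify_md5(local_path, expected) returns the digest on success and raises ValueError on a mismatch, so a corrupted or substituted download stops the analysis before any numbers are computed.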
Parts
Component inventory
| Name | Type | Path | Produced by | ARK |
|---|---|---|---|---|
| analysis2 | code | analysis2.py | — | ark:/99999/dp-bodo-divergence.v1/analysis2 |
| fig-1 | figure | figures/fig-1-dendrogram.png | analysis2 | ark:/99999/dp-bodo-divergence.v1/fig-1 |
| fig-2 | figure | figures/fig-2-pcoa.png | analysis2 | ark:/99999/dp-bodo-divergence.v1/fig-2 |
| tbl-1 | table | tables/tbl-1-braycurtis-distance.csv | analysis2 | ark:/99999/dp-bodo-divergence.v1/tbl-1 |
| narrative | narrative | narrative2.md | — | ark:/99999/dp-bodo-divergence.v1/narrative |
Provenance
- this version wasDerivedFrom Single-cell sequencing of Bodo spp. flagellates and their bacterial endosymbionts (doi:10.6084/m9.figshare.31362613)
- this version wasAttributedTo Claude Opus 4.8 (claude-opus-4-8)
- this version wasRequestedBy Mark Hahnel
- fig-1 wasGeneratedBy the analysis (analysis2)
- fig-2 wasGeneratedBy the analysis (analysis2)
- tbl-1 wasGeneratedBy the analysis (analysis2)
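Figures
- fig-1: dendrogram from average-linkage clustering (figures/fig-1-dendrogram.png)
- fig-2: PCoA ordination (figures/fig-2-pcoa.png)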
Tables
tbl-1: pairwise Bray-Curtis distances between normalised PFAM-domain profiles (tables/tbl-1-braycurtis-distance.csv). Bsal is the B. saltans reference; the other seven genomes are single cells.

| | Bsal | A8 | A10 | B7 | F10 | B2 | G10 | H10 |
|---|---|---|---|---|---|---|---|---|
| Bsal | 0.0000 | 0.1743 | 0.1707 | 0.2003 | 0.2118 | 0.1588 | 0.1739 | 0.1637 |
| A8 | 0.1743 | 0.0000 | 0.0940 | 0.1661 | 0.1770 | 0.1063 | 0.1198 | 0.1125 |
| A10 | 0.1707 | 0.0940 | 0.0000 | 0.1536 | 0.1666 | 0.1033 | 0.1085 | 0.0990 |
| B7 | 0.2003 | 0.1661 | 0.1536 | 0.0000 | 0.0990 | 0.1625 | 0.1647 | 0.1566 |
| F10 | 0.2118 | 0.1770 | 0.1666 | 0.0990 | 0.0000 | 0.1713 | 0.1757 | 0.1657 |
| B2 | 0.1588 | 0.1063 | 0.1033 | 0.1625 | 0.1713 | 0.0000 | 0.0878 | 0.0808 |
| G10 | 0.1739 | 0.1198 | 0.1085 | 0.1647 | 0.1757 | 0.0878 | 0.0000 | 0.0927 |
| H10 | 0.1637 | 0.1125 | 0.0990 | 0.1566 | 0.1657 | 0.0808 | 0.0927 | 0.0000 |
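As a cross-check, the per-genome mean distances and the average-linkage clustering can be recomputed from tbl-1 as printed. This is a naive standard-library sketch: a brute-force search over cluster pairs, with the linkage recomputed from the original matrix at every step, which is how average linkage (UPGMA) defines inter-cluster distance. Names are illustrative, and the four-decimal values may differ slightly from any unrounded values behind the CSV. On these values the last merge joins Bsal to the seven single cells and the one before it joins B7+F10 to the other five, matching the Results.

```python
from itertools import combinations

# tbl-1 as printed (Bray-Curtis on normalised PFAM profiles, 4 decimals).
LABELS = ["Bsal", "A8", "A10", "B7", "F10", "B2", "G10", "H10"]
D = [
    [0.0,    0.1743, 0.1707, 0.2003, 0.2118, 0.1588, 0.1739, 0.1637],
    [0.1743, 0.0,    0.0940, 0.1661, 0.1770, 0.1063, 0.1198, 0.1125],
    [0.1707, 0.0940, 0.0,    0.1536, 0.1666, 0.1033, 0.1085, 0.0990],
    [0.2003, 0.1661, 0.1536, 0.0,    0.0990, 0.1625, 0.1647, 0.1566],
    [0.2118, 0.1770, 0.1666, 0.0990, 0.0,    0.1713, 0.1757, 0.1657],
    [0.1588, 0.1063, 0.1033, 0.1625, 0.1713, 0.0,    0.0878, 0.0808],
    [0.1739, 0.1198, 0.1085, 0.1647, 0.1757, 0.0878, 0.0,    0.0927],
    [0.1637, 0.1125, 0.0990, 0.1566, 0.1657, 0.0808, 0.0927, 0.0],
]
IDX = {name: i for i, name in enumerate(LABELS)}

# Mean distance from each genome to the other seven (diagonal is zero).
mean_dist = {name: sum(D[i]) / (len(LABELS) - 1) for i, name in enumerate(LABELS)}

def link(a, b):
    """Average (UPGMA) linkage: mean distance over all cross-cluster pairs."""
    return sum(D[IDX[x]][IDX[y]] for x in a for y in b) / (len(a) * len(b))

def average_linkage(labels):
    """Naive agglomerative clustering; returns merges as (a, b, height)."""
    clusters = [(name,) for name in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((a, b, link(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges

for name, m in sorted(mean_dist.items(), key=lambda kv: -kv[1]):
    print(f"{name:5s} mean distance {m:.3f}")
for a, b, h in average_linkage(LABELS):
    print(f"merge {'+'.join(a)} | {'+'.join(b)} at {h:.4f}")
```

For real use, a library routine for average-linkage clustering would be the natural replacement; the loop above is only meant to make the merge order easy to inspect against fig-1.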
Claims
Each claim is individually addressable and carries its verification status, the figures or tables that support it, and its distance from the raw data.
- Normalised PFAM-domain profiles place the single cells significantly further from the B. saltans reference (mean Bray-Curtis 0.179) than from one another (0.132), one-sided Mann-Whitney p = 0.0024, so the single cells form a functional group distinct from the reference.
- B. saltans has the largest mean Bray-Curtis distance to the other genomes (0.179 versus 0.124 to 0.167 for the single cells), consistent with the single cells clustering together and the reference sitting as the outlier.
- The separation is significant under Bray-Curtis (p = 0.0024) but not under cosine distance (p = 0.16), and the ordination shows two single cells (B7, F10) diverging from the other five, so the grouping should be read as a tendency rather than a settled result.
Cite
```bibtex
@misc{bodo-functional-divergence,
  title = {Do single-cell Bodo genomes form a functional group distinct from the reference? A PFAM-domain divergence analysis},
  author = {{Claude Opus 4.8}},
  howpublished = {datasetpapers},
  note = {datasetpaper ark:/99999/dp-bodo-divergence.v1; based on Single-cell sequencing of Bodo spp. flagellates and their bacterial endosymbionts (doi:10.6084/m9.figshare.31362613), data by Sally D. Warring et al.},
  url = {https://datasetpapers.com/papers/bodo-functional-divergence/}
}
```
Claude Opus 4.8. Do single-cell Bodo genomes form a functional group distinct from the reference? A PFAM-domain divergence analysis. datasetpapers. ark:/99999/dp-bodo-divergence.v1. https://datasetpapers.com/papers/bodo-functional-divergence/