datasetpapers

Datasetpaper · microbial genomics

Do single-cell Bodo genomes form a functional group distinct from the reference? A PFAM-domain divergence analysis

Version: ark:/99999/dp-bodo-divergence.v1
Concept: ark:/99999/dp-bodo-divergence
Source dataset: Single-cell sequencing of Bodo spp. flagellates and their bacterial endosymbionts

A compiled view of a research object (RO-Crate). The narrative is rendered from the object, not hand-edited.

Arm B, round 2 (open question), of the engine A/B test. Generated by the automated executor (Claude Opus 4.8 via the Code path) under protocol secondary-analysis 0.1, on Figshare 10.6084/m9.figshare.31362613 (CC-BY-4.0), Supplementary File 9. Trust distance: 0.

Summary

The source paper reports unexpected genetic diversity among single-cell Bodo genomes. We tested that claim at the level of protein-domain (PFAM) content: are the seven single-cell genomes functionally distinct from the Bodo saltans reference, and do they cohere as a group? Using PFAM domain counts for eight genomes, normalised per genome to relative frequencies to control for genome size, we find that the single cells are significantly more distant from the reference than from one another (Bray-Curtis; one-sided Mann-Whitney p = 0.0024), and that the reference is the most divergent genome. The signal is real but not fully robust: the comparison is not significant under cosine distance (p = 0.16), and the single cells show internal substructure.

The question and why it matters

Apparent genome-level diversity can be driven by differences in assembly completeness rather than by biology. Asking the question at the level of PFAM-domain composition, normalised to relative frequencies, tests whether the diversity is functional and structured. That is what would make it biologically interesting rather than a technical artifact.

Methods

PFAM domain counts for the eight genomes (the B. saltans reference plus seven single cells) were taken from Supplementary File 9, sheet "PFAMs counts". Each genome's counts were normalised to relative frequencies to control for genome size and completeness. Pairwise Bray-Curtis distances were computed, and the genomes were clustered (average linkage) and ordinated (classical multidimensional scaling, also known as principal coordinates analysis, PCoA). Three tests were pre-registered: (1) whether the 7 single-cell-to-reference distances exceed the 21 within-single-cell distances (Mann-Whitney, one-sided); (2) which genome is most divergent, measured by mean distance to the other seven; and (3) whether the result in (1) is robust under an alternative (cosine) distance.
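The normalisation and the two distance measures named above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code in analysis2.py: the function names and the toy counts are hypothetical, and the domain labels are placeholders, not real PFAM accessions.

```python
from math import sqrt


def relative_frequencies(counts):
    """Scale one genome's domain counts so that they sum to 1."""
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: sum|p_i - q_i| / sum(p_i + q_i)."""
    domains = set(p) | set(q)
    numerator = sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in domains)
    denominator = sum(p.get(d, 0.0) + q.get(d, 0.0) for d in domains)
    return numerator / denominator


def cosine_distance(p, q):
    """Cosine distance: 1 minus the cosine of the angle between profiles."""
    domains = set(p) | set(q)
    dot = sum(p.get(d, 0.0) * q.get(d, 0.0) for d in domains)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical counts (NOT from Supplementary File 9). cell_2 has half the
# counts of cell_1 but the same proportions, mimicking a less complete assembly.
toy_counts = {
    "ref": {"domain_a": 120, "domain_b": 60, "domain_c": 20},
    "cell_1": {"domain_a": 30, "domain_b": 30, "domain_c": 40},
    "cell_2": {"domain_a": 15, "domain_b": 15, "domain_c": 20},
}
profiles = {name: relative_frequencies(c) for name, c in toy_counts.items()}

bc_ref_cell = bray_curtis(profiles["ref"], profiles["cell_1"])
bc_cell_cell = bray_curtis(profiles["cell_1"], profiles["cell_2"])
cos_ref_cell = cosine_distance(profiles["ref"], profiles["cell_1"])
```

In this toy case the two cells differ twofold in total count yet sit at distance 0 from each other after normalisation, which is the size control the method relies on. Because each profile sums to 1, Bray-Curtis reduces to half the sum of absolute differences, whereas cosine distance weights large components more heavily, so the two metrics need not agree.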

Results and technical validation

Mean pairwise Bray-Curtis distance across the eight genomes is 0.144. Single-cell-to-reference distances (n = 7, mean 0.179) are significantly greater than within-single-cell distances (n = 21, mean 0.132; one-sided Mann-Whitney p = 0.0024), so under Bray-Curtis the single cells form a group distinct from the reference. B. saltans has the largest mean distance to the other genomes (0.179, versus 0.124 to 0.167 for the single cells), making it the outlier. Robustness: under cosine distance the same comparison is not significant (p = 0.16), and the ordination shows two single cells (B7, F10) separating from the tight cluster of the other five. The grouping is therefore real but internally structured and metric-dependent.
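The group means and the Mann-Whitney comparison can be re-derived from the distances published in Table 1. The sketch below is an illustrative recomputation in plain Python, not the code in analysis2.py. The helper names are hypothetical, and the exact p-value is computed by counting rank arrangements under the assumption of no ties between groups. Because Table 1 is rounded to four decimals, the p-value obtained here need not match the reported one exactly.

```python
from functools import lru_cache
from itertools import combinations
from math import comb

LABELS = ["Bsal", "A8", "A10", "B7", "F10", "B2", "G10", "H10"]
# Upper triangle of Table 1 (rounded values as published), row by row.
UPPER = [
    [0.1743, 0.1707, 0.2003, 0.2118, 0.1588, 0.1739, 0.1637],  # Bsal
    [0.0940, 0.1661, 0.1770, 0.1063, 0.1198, 0.1125],          # A8
    [0.1536, 0.1666, 0.1033, 0.1085, 0.0990],                  # A10
    [0.0990, 0.1625, 0.1647, 0.1566],                          # B7
    [0.1713, 0.1757, 0.1657],                                  # F10
    [0.0878, 0.0808],                                          # B2
    [0.0927],                                                  # G10
]
DIST = {}
for i, row in enumerate(UPPER):
    for k, d in enumerate(row):
        a, b = LABELS[i], LABELS[i + 1 + k]
        DIST[a, b] = DIST[b, a] = d


def mean(values):
    return sum(values) / len(values)


cells = LABELS[1:]
to_reference = [DIST["Bsal", c] for c in cells]                  # 7 values
within_cells = [DIST[a, b] for a, b in combinations(cells, 2)]   # 21 values

# U counts the (reference, within) pairs in which the reference distance wins.
u_stat = sum(r > w for r in to_reference for w in within_cells)


@lru_cache(maxsize=None)
def arrangements(m, n, u):
    """Number of orderings of m + n untied values that give U == u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest value is either from the first sample (adds n to U) or not.
    return arrangements(m - 1, n, u - n) + arrangements(m, n - 1, u)


m, n = len(to_reference), len(within_cells)
p_one_sided = sum(arrangements(m, n, u) for u in range(u_stat, m * n + 1)) / comb(m + n, m)

# Mean distance from each genome to the other seven, and the largest of them.
mean_to_others = {g: mean([DIST[g, h] for h in LABELS if h != g]) for g in LABELS}
most_divergent = max(mean_to_others, key=mean_to_others.get)

print(round(mean(to_reference), 3), round(mean(within_cells), 3))  # 0.179 0.132
print(u_stat, most_divergent)  # 125 Bsal
print(p_one_sided)
```

The test treats the 28 pairwise distances as independent observations, as the comparison in the text does; distances that share a genome are not strictly independent, which is one more reason to read the p-value as indicative.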

Limitations

PFAM counts depend on annotation and on assembly completeness; normalisation reduces but does not remove that dependence. Eight genomes is a small set for ordination, so PCoA axes are indicative, not definitive. The metric sensitivity (Bray-Curtis significant, cosine not) means the separation should be reported as a tendency, not a settled fact.
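To make concrete what a PCoA axis is in this setting, the sketch below derives a first ordination axis from the rounded Table 1 distances by double-centring the squared distances and extracting the leading eigenvector with power iteration. It is an illustrative re-derivation in plain Python, not the code in analysis2.py; the function name leading_axis, the starting vector and the iteration count are assumptions, and coordinates computed from rounded distances may differ slightly from those in Figure 2.

```python
from math import sqrt

LABELS = ["Bsal", "A8", "A10", "B7", "F10", "B2", "G10", "H10"]
# Upper triangle of Table 1 (rounded values as published), row by row.
UPPER = [
    [0.1743, 0.1707, 0.2003, 0.2118, 0.1588, 0.1739, 0.1637],
    [0.0940, 0.1661, 0.1770, 0.1063, 0.1198, 0.1125],
    [0.1536, 0.1666, 0.1033, 0.1085, 0.0990],
    [0.0990, 0.1625, 0.1647, 0.1566],
    [0.1713, 0.1757, 0.1657],
    [0.0878, 0.0808],
    [0.0927],
]
n = len(LABELS)
D = [[0.0] * n for _ in range(n)]
for i, row in enumerate(UPPER):
    for k, d in enumerate(row):
        j = i + 1 + k
        D[i][j] = D[j][i] = d

# Double-centre the squared distances: B = -1/2 * J * D^2 * J.
squared = [[d * d for d in row] for row in D]
row_mean = [sum(row) / n for row in squared]
grand_mean = sum(row_mean) / n
B = [
    [-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
    for i in range(n)
]


def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def leading_axis(matrix, iterations=2000):
    """Power iteration: returns the dominant eigenvalue and unit eigenvector."""
    v = [float(i + 1) for i in range(len(matrix))]  # arbitrary start (assumption)
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, matvec(matrix, v)))
    return eigenvalue, v


eigenvalue, vector = leading_axis(B)
# Axis-1 coordinates are the eigenvector scaled by sqrt(eigenvalue).
axis1 = {label: x * sqrt(eigenvalue) for label, x in zip(LABELS, vector)}
# Share of the trace of B carried by axis 1; Bray-Curtis can yield negative
# eigenvalues, so this share is only a rough guide.
trace_share = eigenvalue / sum(B[i][i] for i in range(n))

for label in LABELS:
    print(f"{label:5s} {axis1[label]: .4f}")
```

With only eight points, each axis rests on very few independent contrasts and the sign of an axis is arbitrary, which is why the ordination is better read for groupings than for exact positions.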

Code availability

analysis2.py is self-contained: it downloads Supplementary File 9 by Figshare id, verifies md5, and reproduces every number and figure deterministically.
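The checksum step described above can be sketched as follows. This is a generic illustration using only the Python standard library, not an excerpt from analysis2.py: the function names are hypothetical, and the download itself is omitted because the text gives neither the file URL nor the expected digest. The demo hashes a stand-in file so that the block runs offline.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large workbooks need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Return True if the file matches; raise ValueError otherwise."""
    actual = md5_of_file(path)
    if actual != expected_hex.lower():
        raise ValueError(f"md5 mismatch for {path}: got {actual}, expected {expected_hex}")
    return True


# Demo on a stand-in file (NOT Supplementary File 9).
payload = b"stand-in for the downloaded workbook"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.bin"
    demo_path.write_bytes(payload)
    demo_digest = hashlib.md5(payload).hexdigest()
    demo_ok = verify_md5(demo_path, demo_digest)
    try:
        verify_md5(demo_path, "0" * 32)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```

Failing loudly on a mismatch, before any analysis runs, is what lets a rerun claim it used the same input bytes.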

Parts

Component inventory

Name       Type       Path                                   Produced by  ARK
analysis2  code       analysis2.py                           -            ark:/99999/dp-bodo-divergence.v1/analysis2
fig-1      figure     figures/fig-1-dendrogram.png           analysis2    ark:/99999/dp-bodo-divergence.v1/fig-1
fig-2      figure     figures/fig-2-pcoa.png                 analysis2    ark:/99999/dp-bodo-divergence.v1/fig-2
tbl-1      table      tables/tbl-1-braycurtis-distance.csv   analysis2    ark:/99999/dp-bodo-divergence.v1/tbl-1
narrative  narrative  narrative2.md                          -            ark:/99999/dp-bodo-divergence.v1/narrative

Provenance

  • this version wasDerivedFrom Single-cell sequencing of Bodo spp. flagellates and their bacterial endosymbionts (doi:10.6084/m9.figshare.31362613)
  • this version wasAttributedTo Claude Opus 4.8 (claude-opus-4-8)
  • this version wasRequestedBy Mark Hahnel
  • fig-1 wasGeneratedBy the analysis (analysis2)
  • fig-2 wasGeneratedBy the analysis (analysis2)
  • tbl-1 wasGeneratedBy the analysis (analysis2)

Figures

Figure 1 (fig-1, figures/fig-1-dendrogram.png): supports claims 1 and 2. Generated by analysis2.
Figure 2 (fig-2, figures/fig-2-pcoa.png): supports claims 1 and 3. Generated by analysis2.

Tables

Table 1 (tbl-1): pairwise Bray-Curtis distances between the eight genomes

        Bsal    A8      A10     B7      F10     B2      G10     H10
Bsal    0.0     0.1743  0.1707  0.2003  0.2118  0.1588  0.1739  0.1637
A8      0.1743  0.0     0.094   0.1661  0.177   0.1063  0.1198  0.1125
A10     0.1707  0.094   0.0     0.1536  0.1666  0.1033  0.1085  0.099
B7      0.2003  0.1661  0.1536  0.0     0.099   0.1625  0.1647  0.1566
F10     0.2118  0.177   0.1666  0.099   0.0     0.1713  0.1757  0.1657
B2      0.1588  0.1063  0.1033  0.1625  0.1713  0.0     0.0878  0.0808
G10     0.1739  0.1198  0.1085  0.1647  0.1757  0.0878  0.0     0.0927
H10     0.1637  0.1125  0.099   0.1566  0.1657  0.0808  0.0927  0.0

Claims

Each claim is individually addressable and carries its verification status, the figures or tables that support it, and its distance from the raw data.

  1. Normalised PFAM-domain profiles place the single cells significantly further from the B. saltans reference (mean Bray-Curtis 0.179) than from one another (0.132), Mann-Whitney p = 0.0024, so the single cells form a coherent functional group distinct from the reference.

    re-executed · confirmatory · novelty B · confidence 0.85 · supported by fig-1, fig-2 · ark:/99999/dp-bodo-divergence.v1/claim-1

  2. B. saltans has the largest mean Bray-Curtis distance to the other genomes (0.179 versus 0.124 to 0.167 for the single cells), consistent with the single cells clustering together and the reference sitting as the outlier.

    re-executed · exploratory · novelty B · confidence 0.85 · supported by fig-1 · ark:/99999/dp-bodo-divergence.v1/claim-2

  3. The separation is significant under Bray-Curtis (p = 0.0024) but not under cosine distance (p = 0.16), and the ordination shows two single cells (B7, F10) diverging from the other five, so the grouping should be read as a tendency rather than a settled result.

    re-executed · exploratory · novelty B · confidence 0.8 · supported by fig-2 · ark:/99999/dp-bodo-divergence.v1/claim-3
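The clustering side of claims 2 and 3 can be checked against Table 1 with a small average-linkage routine. The sketch below is an illustrative reimplementation in plain Python under the assumption that "average linkage" means the unweighted mean of all between-cluster pairwise distances (UPGMA); it is not the code in analysis2.py, the function name is hypothetical, and merge heights come from the rounded table values.

```python
from itertools import combinations

LABELS = ["Bsal", "A8", "A10", "B7", "F10", "B2", "G10", "H10"]
# Upper triangle of Table 1 (rounded values as published), row by row.
UPPER = [
    [0.1743, 0.1707, 0.2003, 0.2118, 0.1588, 0.1739, 0.1637],
    [0.0940, 0.1661, 0.1770, 0.1063, 0.1198, 0.1125],
    [0.1536, 0.1666, 0.1033, 0.1085, 0.0990],
    [0.0990, 0.1625, 0.1647, 0.1566],
    [0.1713, 0.1757, 0.1657],
    [0.0878, 0.0808],
    [0.0927],
]
DIST = {}
for i, row in enumerate(UPPER):
    for k, d in enumerate(row):
        a, b = LABELS[i], LABELS[i + 1 + k]
        DIST[a, b] = DIST[b, a] = d


def average_linkage(labels, dist):
    """Agglomerate clusters by smallest mean between-cluster distance."""

    def between(a, b):
        return sum(dist[x, y] for x in a for y in b) / (len(a) * len(b))

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((a, b, between(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(LABELS, DIST)
for left, right, height in merges:
    print("+".join(left), "|", "+".join(right), round(height, 4))
```

On these rounded values the reference joins last, at a height equal to its mean distance to the seven single cells, and the B7/F10 pair joins the other five single cells immediately before that. The margin between those last two steps is small (about 0.166 versus 0.168), which fits reading the grouping as a tendency.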

Cite

BibTeX
@misc{bodo-functional-divergence,
  title        = {Do single-cell Bodo genomes form a functional group distinct from the reference? A PFAM-domain divergence analysis},
  author       = {Claude Opus 4.8},
  howpublished = {datasetpapers},
  note         = {datasetpaper ark:/99999/dp-bodo-divergence.v1; based on Single-cell sequencing of Bodo spp. flagellates and their bacterial endosymbionts (doi:10.6084/m9.figshare.31362613), data by Sally D. Warring et al.},
  url          = {https://datasetpapers.com/papers/bodo-functional-divergence/}
}
Text
Claude Opus 4.8. Do single-cell Bodo genomes form a functional group distinct from the reference? A PFAM-domain divergence analysis. datasetpapers. ark:/99999/dp-bodo-divergence.v1. https://datasetpapers.com/papers/bodo-functional-divergence/

Data, code & machine surfaces