EXP-0002 — Complexity Beats Volume (D0 vs D0+D1)

experiment_id: EXP-0002 · status: draft

Created: 2025-12-25 · Last Updated: 2025-12-25

Hypothesis

If we mix slightly more complex narrative structure (D1) into the baseline (D0) at fixed, recorded weights, then late-stage learning will improve faster than it does when simply training longer on D0 alone under the same total budget, because broader structure reduces saturation and encourages richer internal representations.

Setup / Test Plan

What stays fixed:

  • Model config and training loop (same hyperparameters).
  • Total training budget held constant across conditions.
  • Prompt suite fixed:
    • File: organism/prompts/v1.json
    • prompt_set_id: month1_v1
  • Eval cadence and deterministic settings fixed.
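One way to enforce the "what stays fixed" list is to diff the two run configs and fail if they differ anywhere except the data mix. This is a minimal sketch. The config keys (`lr`, `token_budget`, `prompt_set_id`, `data_mix`), their values, and the function names are hypothetical and illustrative, not taken from the project's code.

```python
from typing import Any, Dict, FrozenSet, Set

_MISSING = object()  # sentinel so a missing key never compares equal to a stored None


def differing_keys(a: Dict[str, Any], b: Dict[str, Any]) -> Set[str]:
    """Return the top-level keys whose values differ between two run configs."""
    return {k for k in a.keys() | b.keys() if a.get(k, _MISSING) != b.get(k, _MISSING)}


def check_controlled(
    baseline: Dict[str, Any],
    variant: Dict[str, Any],
    allowed: FrozenSet[str] = frozenset({"data_mix"}),  # hypothetical key name
) -> None:
    """Raise ValueError if the conditions differ in anything other than `allowed` keys."""
    extra = differing_keys(baseline, variant) - allowed
    if extra:
        raise ValueError(f"conditions differ outside the data mix: {sorted(extra)}")


# Illustrative values only; not the experiment's real hyperparameters or weights.
baseline = {"lr": 3e-4, "token_budget": 10_000_000, "prompt_set_id": "month1_v1",
            "data_mix": {"D0": 1.0}}
variant = {**baseline, "data_mix": {"D0": 0.9, "D1": 0.1}}
check_controlled(baseline, variant)  # passes: only data_mix differs
```

Running the check before launching both runs turns the "same hyperparameters" and "same budget" rules into a hard failure instead of a convention.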

Conditions:

  • Baseline: D0 only (fairy tales).
  • Variant: D0 + D1 mix (weights fixed and recorded).
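Both conditions can share one sampler that draws from D0 and D1 according to the recorded weights until the shared token budget is used up, so the baseline is simply the variant with a D1 weight of zero. This is a minimal sketch under stated assumptions: documents are plain strings, whitespace splitting stands in for the real tokenizer, and `mixed_stream` is a hypothetical helper, not the project's data loader.

```python
import random
from typing import Dict, Iterator, List, Tuple


def mixed_stream(
    sources: Dict[str, List[str]],
    weights: Dict[str, float],
    token_budget: int,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, document) pairs until at least token_budget tokens are drawn.

    Each draw picks a source with probability proportional to its weight, then
    picks one document from that source uniformly at random. Token counts use a
    whitespace split as a stand-in for the real tokenizer (assumption).
    """
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    names = sorted(sources)  # deterministic source order
    probs = [weights.get(name, 0.0) for name in names]
    seen = 0
    while seen < token_budget:
        name = rng.choices(names, weights=probs, k=1)[0]
        doc = rng.choice(sources[name])
        seen += len(doc.split())
        yield name, doc
```

Passing `{"D0": 1.0, "D1": 0.0}` reproduces the D0-only baseline with the same budget and seed, which keeps the comparison to a single variable: the mix weights.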

Data:

  • D0: data/staging/phases/phase0a_early-childhood/fairy_tales.jsonl
  • D1: data/staging/phases/phase0a_early-childhood/gutenberg_childrens_literature.jsonl
  • Manifest (authoritative): data/manifests/month1_manifest_v1.yaml

Runs (example IDs):

  • EXP-0002-D0-only
  • EXP-0002-D0-D1-mix

Measurements (Pass/Fail)

Primary:

  • Late slope improvement:
    • Variant's late slope (loss vs tokens_seen over 70–90% of the training budget) should be steeper than the D0-only baseline's over the same tokens_seen window.
  • Plateau coefficient:
    • Variant should have a lower plateau coefficient (i.e., less flattening) than D0-only at equal budget.
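A minimal sketch of both primary metrics. It assumes each run logs `(tokens_seen, loss)` pairs and that both runs share the same token budget, so a window expressed as a fraction of the budget covers the same tokens_seen range in each run. The source does not define the plateau coefficient. Here it is assumed to be `1 - late_slope / early_slope`, with an assumed early window of 10–30% of the budget, so 0 means no flattening and 1 means the late curve is flat. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def window_slope(
    tokens_seen: Sequence[float],
    loss: Sequence[float],
    budget: float,
    window: Tuple[float, float],
) -> float:
    """Least-squares slope of loss vs tokens_seen inside [lo, hi] * budget."""
    lo, hi = window[0] * budget, window[1] * budget
    pts = [(x, y) for x, y in zip(tokens_seen, loss) if lo <= x <= hi]
    if len(pts) < 2:
        raise ValueError("need at least two logged points inside the window")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("all points in the window share one tokens_seen value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


def late_slope(tokens_seen, loss, budget):
    """Slope over 70-90% of the budget; more negative means faster late learning."""
    return window_slope(tokens_seen, loss, budget, (0.7, 0.9))


def plateau_coefficient(tokens_seen, loss, budget, early=(0.1, 0.3)):
    """Assumed definition: 1 - late/early slope (0 = no flattening, 1 = flat)."""
    early_slope = window_slope(tokens_seen, loss, budget, early)
    return 1.0 - late_slope(tokens_seen, loss, budget) / early_slope
```

Under these assumptions, EXP-0002-D0-D1-mix would pass the primary criteria if its late slope is more negative than EXP-0002-D0-only's and its plateau coefficient is lower.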

Secondary:

  • Eval robustness:
    • Lower repetition rates on “play probes” prompts without loss of intelligibility.
  • Qualitative behavior:
    • More consistent tense/POV in the “memory + consistency” prompts.
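The source does not fix a repetition metric. One common proxy, sketched here as an assumption, is the share of word n-grams in a completion that repeat an earlier n-gram (one minus the distinct-n ratio), averaged over the "play probes" completions. The function names and the default n = 3 are hypothetical choices.

```python
from typing import Iterable


def repetition_rate(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram (0 = none repeated)."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0  # too short to contain a single n-gram
    return 1.0 - len(set(grams)) / len(grams)


def mean_repetition(completions: Iterable[str], n: int = 3) -> float:
    """Average repetition_rate over a group of completions (e.g., one prompt category)."""
    rates = [repetition_rate(c, n) for c in completions]
    return sum(rates) / len(rates) if rates else 0.0
```

Intelligibility still needs its own check, since a model can lower this rate by producing varied but incoherent text.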

Confounders:

  • D1 data quality (cleanliness) can dominate the results; keep the D1 share of the mix small at first (“baby steps”).

Results

Runs executed:

  • (fill)

Observed:

  • (fill)

Interpretation

  • (fill)

Decision

  • Adopt / Reject / Iterate
  • Next actions:
    • (fill)

Linked runs:

  run_id | loss_best | plateau | tokens_seen | prompt_set
  No runs linked to this experiment.