
Gutenberg Curriculum Baseline

experiment_id: exp_002 · status: draft


Hypothesis (Falsifiable)

Training on the curriculum-aligned Gutenberg narrative text will reduce loss steadily without early collapse into repetition, and will improve short-prompt continuation coherence relative to the minimal smoke dataset.
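The "no early collapse into repetition" clause can be made operational with a distinct-n check on sampled continuations: a ratio near 0 means the model is looping on the same n-grams. A minimal sketch follows; the function names and the 0.2 threshold are illustrative assumptions, not part of the repo.

```python
# Hypothetical repetition-collapse check: distinct-n is the fraction of
# n-grams in a token sequence that are unique (1.0 = no repetition at all).
# The threshold below is an illustrative assumption to be tuned per run.

def distinct_n(tokens, n=3):
    """Fraction of n-grams in `tokens` that are unique."""
    if len(tokens) < n:
        return 1.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)

def looks_collapsed(tokens, n=3, threshold=0.2):
    """Flag a continuation whose distinct-n ratio falls below `threshold`."""
    return distinct_n(tokens, n=n) < threshold
```

Applied to the fixed prompt suite after each checkpoint, this gives a cheap pass/fail signal alongside the loss curve.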

Variables

Independent (Changed)

  • Dataset: Gutenberg curriculum subset (vs. the prior minimal sample set).

Controlled (Fixed)

  • Model architecture (locked via organism/configs/models/*).
  • Prompt suite (organism/prompts/*).
  • Training loop implementation (organism/training/train.py).

Explicitly Not Tested

  • Tokenizer changes beyond the current codec/tokenization module.
  • Any RLHF/preference shaping.

Dataset Specification

  • Manifest: data/manifests/phase0_curriculum_training.json
  • Local staging: data/staging/phases/phase0_curriculum_training/
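Before a run, it is worth confirming the staging directory actually matches the manifest. A minimal sketch, assuming the manifest is a JSON object with a top-level `"files"` list of relative paths (that schema is an assumption; adapt it to the real layout of `phase0_curriculum_training.json`):

```python
# Illustrative pre-run check: report manifest entries with no staged file.
# The "files" key and relative-path layout are assumptions about the
# manifest schema, not taken from the repo.
import json
from pathlib import Path

def missing_staged_files(manifest_path, staging_dir):
    """Return the manifest entries that have no corresponding staged file."""
    entries = json.loads(Path(manifest_path).read_text()).get("files", [])
    staging = Path(staging_dir)
    return [rel for rel in entries if not (staging / rel).exists()]
```

An empty return value means staging is complete; anything else should block the run.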

Runs

  • (Add run IDs here after execution.)

Results Summary (1 page max)

(Write after runs complete.)

Interpretation

(Write after results summary.)

Next Actions

  • Decide whether the curriculum baseline is good enough to proceed to the next curriculum stage.
run_id · loss_best · plateau · tokens_seen · prompt_set

(No runs linked to this experiment yet.)