Gutenberg Curriculum Baseline
experiment_id: exp_002 · status: draft
Experiment — Gutenberg Curriculum Baseline
Hypothesis (Falsifiable)
Training on curriculum-aligned Gutenberg narrative text will reduce loss steadily without early collapse into repetition, and will improve the coherence of short-prompt continuations relative to the prior minimal smoke dataset.
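How "collapse into repetition" gets measured is not pinned down above; a minimal sketch of one way to operationalize it, assuming sampled continuations are available as plain strings (the n-gram size and threshold below are illustrative choices, not part of this spec):

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-gram occurrences that repeat an earlier n-gram.

    Near 0.0 means mostly novel text; values near 1.0 indicate the
    degenerate looping the hypothesis says should not appear early.
    """
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


def collapsed(continuations: list[str], threshold: float = 0.5) -> bool:
    """Flag a checkpoint as collapsed if continuations are highly repetitive."""
    ratios = [repetition_ratio(c) for c in continuations]
    return sum(ratios) / len(ratios) >= threshold
```

Tracking the same ratio across checkpoints for the smoke-dataset baseline and this run gives a concrete pass/fail signal for the no-collapse half of the hypothesis.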
Variables
Independent (Changed)
- Dataset: Gutenberg curriculum subset (vs prior minimal samples).
Controlled (Fixed)
- Model architecture (locked via organism/configs/models/*).
- Prompt suite (organism/prompts/*).
- Training loop implementation (organism/training/train.py).
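To make "locked" auditable, the fixed files can be fingerprinted and the digest recorded alongside each run. The sketch below is an assumed workflow, not code from organism/training/train.py; it only reuses the paths listed above:

```python
import hashlib
from pathlib import Path

# Paths controlled (fixed) for this experiment, per the list above.
CONTROLLED = [
    "organism/configs/models/*",   # model architecture configs
    "organism/prompts/*",          # prompt suite
    "organism/training/train.py",  # training loop implementation
]


def fingerprint(patterns: list[str]) -> str:
    """SHA-256 digest over every controlled file, in a stable order,
    so any accidental edit between runs shows up as a new fingerprint."""
    digest = hashlib.sha256()
    for pattern in patterns:
        for path in sorted(Path(".").glob(pattern)):
            if path.is_file():
                digest.update(path.read_bytes())
    return digest.hexdigest()


if __name__ == "__main__":
    print("controlled-files fingerprint:", fingerprint(CONTROLLED))
```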
Explicitly Not Tested
- Tokenizer changes beyond the current codec/tokenization module.
- Any RLHF/preference shaping.
Dataset Specification
- Manifest: data/manifests/phase0_curriculum_training.json
- Local staging: data/staging/phases/phase0_curriculum_training/
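The manifest schema is not documented here; assuming it is a JSON array of entries that each carry a filename field (a guess, adjust to the real schema), a pre-run staging check could look like:

```python
import json
from pathlib import Path

MANIFEST = Path("data/manifests/phase0_curriculum_training.json")
STAGING = Path("data/staging/phases/phase0_curriculum_training")


def missing_files(manifest_path: Path, staging_dir: Path) -> list[str]:
    """Return manifest entries with no matching file in the staging dir.

    Assumes each entry is an object with a "filename" key; adapt if the
    real manifest uses a different schema.
    """
    entries = json.loads(manifest_path.read_text())
    return [
        entry["filename"]
        for entry in entries
        if not (staging_dir / entry["filename"]).exists()
    ]


if __name__ == "__main__":
    missing = missing_files(MANIFEST, STAGING)
    print(f"{len(missing)} manifest entries missing from staging")
```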
Runs
- (Add run IDs here after execution; record run_id, loss_best, plateau, tokens_seen, and prompt_set for each run.)
Results Summary (1 page max)
(Write after runs complete.)
Interpretation
(Write after results summary.)
Next Actions
- Decide whether the curriculum baseline is good enough to proceed to the next curriculum stage.