Research Atlas

experiment_id: exp_001 · status: draft

Experiment — Curriculum Ordering Baseline (Dataset Plan Locked)

Hypothesis (Falsifiable)

A curriculum-ordered training schedule will measurably change training stability, downstream behavior, or both, relative to a fully mixed (shuffled) control, even when total token count, per-slice token budget, and model architecture are held constant.

Variables

Independent (Changed)

  • Dataset ordering: curriculum (D0 then D1 then D2, with blended transition phases; see Runs) vs fully mixed baseline.

Controlled (Fixed)

  • Source corpus: Project Gutenberg plain-text (.txt) files.
  • Language filter: English only.
  • Preprocessing rules (see Dataset Specification).
  • Token budget distribution: D0 25% / D1 50% / D2 25%.
  • Model architecture and training loop.
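
The 25% / 50% / 25% split translates into per-slice token budgets once a total is fixed. A minimal sketch, assuming a hypothetical total of 1,000,000,000 tokens (the plan does not state one); `slice_budgets` is a hypothetical helper name, not part of any project tooling.

```python
from fractions import Fraction

# Target share of the total token budget per slice (from the plan: 25% / 50% / 25%).
SLICE_SHARES = {"D0": Fraction(1, 4), "D1": Fraction(1, 2), "D2": Fraction(1, 4)}


def slice_budgets(total_tokens, shares=SLICE_SHARES):
    """Split a total token budget across slices (hypothetical helper).

    Any remainder from integer rounding goes to the last slice, so the
    per-slice budgets always sum exactly to total_tokens.
    """
    if sum(shares.values()) != 1:
        raise ValueError("slice shares must sum to 1")
    names = list(shares)
    budgets = {name: int(total_tokens * shares[name]) for name in names[:-1]}
    budgets[names[-1]] = total_tokens - sum(budgets.values())
    return budgets


# Illustrative total only; the actual budget is not fixed in this plan.
print(slice_budgets(1_000_000_000))
# {'D0': 250000000, 'D1': 500000000, 'D2': 250000000}
```

Exact fractions plus a remainder-to-last-slice rule keep the budgets summing to the total, so both runs can be handed identical per-slice budgets.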

Explicitly Not Tested

  • Licensing constraints (not a project constraint here).
  • Semantic purity of categories (segmentation is coarse and heuristic).
  • RLHF or preference shaping.
  • Biometric or experiential data.

Dataset Specification

  • Dataset: dataset_gutenberg_exp0001_v1
  • Spec: docs/projects/project_001/datasets/dataset_gutenberg_exp0001_v1.md

Runs

  • Control (C): D0 + D1 + D2 fully mixed from step 0.
  • Curriculum (K): D0 -> D0+D1 -> D1 -> D1+D2 -> D2.
  • Run logs:
    • EXP-0001-C-runlog.md
    • EXP-0001-K-runlog.md
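
One way to make the two schedules concrete is as a list of phases, each with a share of the total token budget and per-slice mixing weights. A minimal sketch under stated assumptions: the phase lengths (15% / 20% / 30% / 20% / 15%) and the 50/50 blends in the D0+D1 and D1+D2 phases are hypothetical, since the plan does not specify them; they are chosen here so the curriculum run consumes the same D0 25% / D1 50% / D2 25% budget as the control. The names `CONTROL`, `CURRICULUM`, `total_shares`, and `weights_at` are illustrative.

```python
from fractions import Fraction as F

# Each phase: (share of the total token budget, {slice: mixing weight within the phase}).
# All phase shares and blend weights below are hypothetical placeholders.
CONTROL = [
    (F(1), {"D0": F(1, 4), "D1": F(1, 2), "D2": F(1, 4)}),  # C: fully mixed from step 0
]

CURRICULUM = [
    (F(15, 100), {"D0": F(1)}),                    # K phase 1: D0
    (F(20, 100), {"D0": F(1, 2), "D1": F(1, 2)}),  # K phase 2: D0+D1
    (F(30, 100), {"D1": F(1)}),                    # K phase 3: D1
    (F(20, 100), {"D1": F(1, 2), "D2": F(1, 2)}),  # K phase 4: D1+D2
    (F(15, 100), {"D2": F(1)}),                    # K phase 5: D2
]


def total_shares(schedule):
    """Return each slice's overall share of the token budget under a schedule."""
    if sum(share for share, _ in schedule) != 1:
        raise ValueError("phase shares must sum to 1")
    totals = {}
    for phase_share, weights in schedule:
        if sum(weights.values()) != 1:
            raise ValueError("mixing weights within a phase must sum to 1")
        for name, weight in weights.items():
            totals[name] = totals.get(name, F(0)) + phase_share * weight
    return totals


def weights_at(schedule, progress):
    """Return the mixing weights in effect at a fraction of training progress in [0, 1]."""
    start = F(0)
    for phase_share, weights in schedule:
        if progress < start + phase_share:
            return weights
        start += phase_share
    return schedule[-1][1]  # progress == 1 falls into the final phase


print({name: str(share) for name, share in total_shares(CURRICULUM).items()})
# {'D0': '1/4', 'D1': '1/2', 'D2': '1/4'}
```

Checking `total_shares` for both schedules before launching is a cheap guard that the runs differ only in ordering, not in how many tokens each slice contributes.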

Results Summary (1 page max)

(Write after runs complete.)

Interpretation

(Write after results summary.)

Next Actions

  • Build dataset slices for D0/D1/D2 and validate token budgets.
  • Execute control vs curriculum runs with fixed seeds.
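
For the two next actions, a minimal sketch of checking built slices against the 25/50/25 target and of making a data shuffle reproducible from a fixed seed. Assumptions: per-slice token counts are already measured (the tokenizer is not specified here), the 1% relative tolerance is hypothetical, and `validate_budgets` and `seeded_order` are illustrative names. Seeding the model framework itself (initialization, dropout) is framework-specific and not covered.

```python
import random

# Target share per slice from the plan (D0 25% / D1 50% / D2 25%).
TARGET_SHARES = {"D0": 0.25, "D1": 0.50, "D2": 0.25}


def validate_budgets(measured_tokens, target_shares=TARGET_SHARES, rel_tol=0.01):
    """Compare measured per-slice token counts with target shares.

    Returns a list of human-readable problems; an empty list means every slice
    is within rel_tol (relative to its target share). rel_tol=0.01 is a
    hypothetical default, not a project setting.
    """
    total = sum(measured_tokens.values())
    if total == 0:
        raise ValueError("no tokens measured")
    problems = []
    for name, target in target_shares.items():
        actual = measured_tokens.get(name, 0) / total
        if abs(actual - target) > rel_tol * target:
            problems.append(f"{name}: {actual:.2%} of tokens, target {target:.0%}")
    return problems


def seeded_order(items, seed):
    """Shuffle with a dedicated, seeded RNG so the same seed always yields the same order."""
    order = list(items)
    random.Random(seed).shuffle(order)
    return order


print(validate_budgets({"D0": 251_000_000, "D1": 498_000_000, "D2": 251_000_000}))
# []
print(validate_budgets({"D0": 300, "D1": 500, "D2": 200}))
# ['D0: 30.00% of tokens, target 25%', 'D2: 20.00% of tokens, target 25%']
```

Using a dedicated `random.Random(seed)` rather than the global `random` state keeps the data order independent of any other code that draws random numbers.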

run_id | loss_best | plateau | tokens_seen | prompt_set
No runs linked to this experiment.