EXP-0004 — Guided Choice / Preference Signal (Adaptive Sampling)
experiment_id: EXP-0004 · status: draft
Status: Draft
Created: 2025-12-25 · Last Updated: 2025-12-25
Hypothesis
If we allow the organism to bias its own dataset sampling with a simple adaptive rule, then its training curves and behavior will diverge measurably from a fixed-weight control, because the organism’s learning dynamics will express a stable preference signal (measured by per-dataset loss improvement and by drift in outputs).
Setup / Test Plan
What stays fixed:
- Dataset pool (D0, D1, optionally D2 when ready).
- Total training budget.
- Prompt suite: organism/prompts/v1.json (prompt_set_id: month1_v1).
What changes:
- Sampling policy:
  - Control: fixed weights.
  - Variant: adaptive weights using one explicit rule (choose only one):
    - Upsample the dataset with the best recent loss delta, or
    - Target a mid-entropy / non-collapse band (requires entropy proxy support).
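The first variant rule (upsample the dataset with the best recent loss delta) could be sketched roughly as below. This is a minimal illustration, not the experiment's actual implementation: the function names, the `step_size` and `floor` parameters, and the convention that a positive delta means the loss went down are all assumptions made for the example.

```python
import random


def update_weights(weights, loss_deltas, step_size=0.1, floor=0.05):
    """Shift sampling weight toward the dataset with the best recent loss delta.

    weights:     dict of dataset name -> current sampling probability (sums to 1).
    loss_deltas: dict of dataset name -> recent decrease in loss
                 (assumed convention: positive means the loss improved).
    step_size, floor: hypothetical tuning knobs, not taken from the experiment.
    """
    best = max(loss_deltas, key=loss_deltas.get)
    # Move every weight a fraction `step_size` toward a one-hot on `best`.
    updated = {
        name: (1.0 - step_size) * w + (step_size if name == best else 0.0)
        for name, w in weights.items()
    }
    # Apply a floor before renormalizing so no dataset drops out of the mix.
    floored = {name: max(w, floor) for name, w in updated.items()}
    total = sum(floored.values())
    return {name: w / total for name, w in floored.items()}


def sample_dataset(weights, rng):
    """Draw one dataset name according to the current weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Control keeps these weights fixed; the variant calls update_weights periodically.
weights = {"D0": 0.5, "D1": 0.5}
weights = update_weights(weights, {"D0": 0.01, "D1": 0.04})
picked = sample_dataset(weights, random.Random(0))
```

A small step size and a floor are one way to keep the rule from collapsing onto a single dataset after one noisy measurement; whether that matches the intended "simple adaptive rule" is an open design choice.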
Measurements (Pass/Fail)
Primary:
- Dataset learning curves:
  - Compare per-dataset dataset_loss_ma improvement rate and resulting weights over time.
- Behavioral drift:
  - Tag changes in style/content in eval prompts.
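One possible way to compute the per-dataset improvement rate is sketched below. It assumes dataset_loss_ma is a trailing moving average of each dataset's loss, which this document does not define; the helper names, the `window` and `span` parameters, and the loss values are hypothetical and purely illustrative.

```python
def moving_average(values, window):
    """Trailing mean over up to `window` most recent values at each step."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def improvement_rate(loss_ma, span):
    """Average per-step decrease of the smoothed loss over the last `span` steps."""
    if len(loss_ma) <= span:
        raise ValueError("need more than `span` smoothed points")
    return (loss_ma[-1 - span] - loss_ma[-1]) / span


# Made-up per-dataset losses, only to show the call shape.
losses = {
    "D0": [4.0, 3.5, 3.2, 3.0, 2.9, 2.85],
    "D1": [4.0, 3.9, 3.7, 3.4, 3.0, 2.5],
}
rates = {
    name: improvement_rate(moving_average(series, window=2), span=3)
    for name, series in losses.items()
}
```

Logging `rates` next to the sampling weights at each update would make it possible to check whether weight shifts actually follow the loss signal.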
Secondary:
- Overall late-slope and plateau coefficient vs control.
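For the late-slope comparison, a rough sketch follows. It assumes "late-slope" means the least-squares slope of loss against step over the final portion of a run; that definition, the `fraction` parameter, and the loss values are assumptions. The plateau coefficient is not defined in this document, so it is left out.

```python
def late_slope(losses, fraction=0.25):
    """Least-squares slope of loss vs. step index over the final `fraction` of a run."""
    if len(losses) < 2:
        raise ValueError("need at least two loss values")
    n = max(2, int(len(losses) * fraction))
    tail = losses[-n:]
    xs = range(len(tail))
    mean_x = sum(xs) / len(tail)
    mean_y = sum(tail) / len(tail)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, tail))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Made-up loss curves, only to show the comparison.
control = [3.0, 2.5, 2.2, 2.1, 2.05, 2.0, 1.98, 1.97]
variant = [3.0, 2.6, 2.3, 2.1, 2.0, 1.9, 1.8, 1.7]

# Negative difference: the variant is still improving faster late in training.
slope_gap = late_slope(variant, fraction=0.5) - late_slope(control, fraction=0.5)
```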
Results
Runs executed:
- (fill)
Observed:
- (fill)
Interpretation
- (fill)
Decision
- Adopt / Reject / Iterate
- Next actions:
Runs
| run_id | loss_best | plateau | tokens_seen | prompt_set |
|---|---|---|---|---|
| No runs linked to this experiment. | | | | |