Methods
Framing, evaluation philosophy, and scope boundaries.
Training philosophy
- Curriculum-first design with explicit control conditions.
- Iteration focuses on measurable deltas, not subjective impressions.
- Experiments are scoped, incremental, and repeatable (a minimal config sketch follows this list).
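To make the control-condition discipline concrete, here is a minimal, hypothetical experiment spec. The field names (`run_id`, `condition`, `eval_suite`) are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical experiment spec illustrating curriculum-first design with an
# explicit control condition. All field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentSpec:
    run_id: str      # stable identifier, so a run can be repeated exactly
    seed: int        # fixed seed shared across the pair
    condition: str   # "control" (shuffled data) or "curriculum" (ordered data)
    eval_suite: str  # identical behavioral suite for both conditions
    max_steps: int   # scoped: small, bounded runs

# Paired runs differ only in the condition, so any measurable delta can be
# attributed to the curriculum rather than to incidental setup changes.
control = ExperimentSpec("exp01-control", seed=7, condition="control",
                         eval_suite="behavioral-v1", max_steps=10_000)
curriculum = ExperimentSpec("exp01-curriculum", seed=7, condition="curriculum",
                            eval_suite="behavioral-v1", max_steps=10_000)
```

Holding the seed, eval suite, and step budget fixed across the pair is what makes a measured delta attributable to the curriculum alone.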
What was intentionally excluded
- No product demo fine-tuning.
- No synthetic evaluation for vanity metrics.
- No prompt-only "success" claims without supporting runs.
Known limitations
- Small-scale runs can mask long-horizon behaviors.
- Loss curves alone are insufficient to evaluate capability shifts.
- Data-sourcing constraints can bias early results.
Evaluation philosophy
Why loss is insufficient
- Loss can improve while behaviors regress or narrow.
- Curriculum shifts can cause behavioral divergence without visible spikes in the loss curve.
- Behavioral evaluation, not loss alone, is required before a result can be called reproducible (a minimal sketch follows this list).
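A minimal sketch of the first point, assuming hypothetical per-checkpoint series of losses and behavioral scores: loss falls monotonically while the behavioral metric silently regresses at one checkpoint.

```python
# Minimal sketch: loss can improve while a behavioral metric regresses.
# `loss_history` and `behavior_history` are hypothetical per-checkpoint series.

def flag_silent_regressions(loss_history, behavior_history, tol=0.0):
    """Yield checkpoint indices where loss improved but behavior got worse."""
    for i in range(1, len(loss_history)):
        loss_improved = loss_history[i] < loss_history[i - 1]
        behavior_regressed = behavior_history[i] < behavior_history[i - 1] - tol
        if loss_improved and behavior_regressed:
            yield i

# Example: loss falls at every checkpoint, yet the behavioral score
# (e.g. pass rate on an eval suite) dips at checkpoint 2.
losses = [2.1, 1.8, 1.5, 1.3]
behavior = [0.60, 0.63, 0.55, 0.58]
print(list(flag_silent_regressions(losses, behavior)))  # -> [2]
```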
Behavioral analysis approach
- Compare control vs. curriculum runs on identical eval suites (sketched after this list).
- Track preference drift and instability markers.
- Publish failure cases alongside successful runs.
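A minimal sketch of the paired comparison, assuming hypothetical score dictionaries produced by an identical eval suite; the task names and the `compare_runs` helper are illustrative, not the project's actual harness.

```python
# Minimal sketch: per-task deltas between a control run and a curriculum run
# on the same eval suite. Task names and scores are made up for illustration.

def compare_runs(control_scores, curriculum_scores):
    """Return {task: curriculum - control} for an identical task set."""
    assert control_scores.keys() == curriculum_scores.keys(), "suites must match"
    return {t: curriculum_scores[t] - control_scores[t] for t in control_scores}

deltas = compare_runs(
    {"instruction_following": 0.71, "refusal_consistency": 0.64},
    {"instruction_following": 0.78, "refusal_consistency": 0.59},
)
# Negative deltas (here, refusal_consistency) are the failure cases that get
# published alongside the wins rather than filtered out.
print(deltas)
```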
Ethics & scope
- This project does not claim sentience or AGI.
- Safety boundaries and misuse vectors are documented explicitly.
- Research is scoped to understanding training dynamics, not deploying agents.
Glossary
- Curriculum: Ordered training data designed to shape learning stages (see the sketch after this glossary).
- Bias: Statistical skew in data (not claims about intent).
- Preference signal: Any input that shifts a model toward desired behaviors.
- Novelty pressure: The push to generalize beyond the training distribution.
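To ground the "Curriculum" entry above, a minimal sketch of ordered versus shuffled training data, assuming sequence length as the difficulty proxy (an illustrative choice, not the project's actual ordering criterion).

```python
# Minimal sketch of a curriculum: the same examples, ordered by a difficulty
# proxy (length, as an illustrative assumption), versus a shuffled control.
import random

examples = ["a cat", "the cat sat", "the cat sat on the mat because it was warm"]

curriculum_order = sorted(examples, key=len)              # staged: easy -> hard
control_order = random.sample(examples, k=len(examples))  # shuffled baseline
```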