NorOLMo is a fully open Norwegian language model, continually trained from OLMo 2 (13B, stage 2) by the Language Technology Group at the University of Oslo. This view tracks NorOLMo's performance on the NorEval benchmark (Mikhailov et al., 2025) across 33 training checkpoints (steps 1k–33k), plus several late-stage ablation runs.
The dashboard includes several ablations of NorOLMo's late training stages:
Ablation lines start at different training steps, reflecting where each experiment diverges from the main run.
Shaded bands show combined uncertainty from two independent sources, added in quadrature:
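Adding two independent uncertainties in quadrature means taking the square root of the sum of their squares, because independent errors add in variance rather than in standard deviation. The sketch below shows only that arithmetic. The two sources are represented by the placeholder arguments `sigma_a` and `sigma_b`, and the assumption that a band spans `score ± combined` is ours for illustration, not a documented detail of the dashboard.

```python
import math


def combined_uncertainty(sigma_a: float, sigma_b: float) -> float:
    """Combine two independent uncertainties in quadrature.

    sigma_a and sigma_b are placeholders for the two (unspecified here)
    independent uncertainty sources.
    """
    # Independent errors add in variance, so sum the squares and take the root.
    return math.sqrt(sigma_a ** 2 + sigma_b ** 2)


def band(score: float, sigma_a: float, sigma_b: float) -> tuple[float, float]:
    """Hypothetical shaded band: score plus/minus the combined uncertainty."""
    half_width = combined_uncertainty(sigma_a, sigma_b)
    return score - half_width, score + half_width


if __name__ == "__main__":
    print(combined_uncertainty(3.0, 4.0))  # 5.0
    print(band(60.0, 3.0, 4.0))            # (55.0, 65.0)
```

Note that the combined value is always smaller than the plain sum `sigma_a + sigma_b` (here 5.0 rather than 7.0), which is why quadrature is the appropriate rule only when the two sources are genuinely independent.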
Aggregate views normalize scores before averaging. The default normalization, random-baseline normalization, maps each raw score to:
normalized = (raw − random_baseline) / (max_performance − random_baseline) × 100
where 0 corresponds to random chance and 100 to perfect performance; scores below the random baseline become negative.
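The formula above can be sketched in a few lines. This is a minimal illustration, not the dashboard's code: the function names `normalize` and `aggregate` are hypothetical, `max_performance` is assumed to default to 100 (scores reported as percentages), and the baselines in the example (25 for a four-option multiple-choice task, 50 for a binary task) are illustrative values rather than NorEval's actual per-task baselines.

```python
def normalize(raw: float, random_baseline: float, max_performance: float = 100.0) -> float:
    """Random-baseline normalization: random chance -> 0, perfect -> 100."""
    if max_performance == random_baseline:
        raise ValueError("max_performance must differ from random_baseline")
    return (raw - random_baseline) / (max_performance - random_baseline) * 100


def aggregate(results: list[tuple[float, float, float]]) -> float:
    """Average normalized scores; each entry is (raw, random_baseline, max_performance)."""
    normalized = [normalize(raw, base, top) for raw, base, top in results]
    return sum(normalized) / len(normalized)


if __name__ == "__main__":
    # Illustrative tasks: a 4-option multiple-choice task and a binary task.
    print(normalize(62.5, 25.0))  # 50.0
    print(normalize(75.0, 50.0))  # 50.0
    print(aggregate([(62.5, 25.0, 100.0), (75.0, 50.0, 100.0)]))  # 50.0
```

The example shows why normalization matters before averaging: a raw 62.5 on the four-option task and a raw 75 on the binary task look different, yet both sit exactly halfway between chance and a perfect score.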