MultiSynt dashboard

Multilingual training-progress evaluation

About MultiSynt

MultiSynt tracks the training trajectories of several pretraining runs, evaluated at multiple checkpoints, across four languages: Spanish, French, Finnish, and Norwegian. Each language has its own set of benchmarks, and each benchmark is run with multiple prompt variants.

The "Signal-filtered tasks" option in the task dropdown applies HPLT-E-style quality criteria (monotonicity, signal-to-noise ratio, prompt sensitivity, cross-model ranking consistency, etc.) to identify benchmarks that provide a reliable training signal.

Error bands

Shaded bands show the combined uncertainty from two sources: sampling error (derived from each metric's variance) and prompt-template deviation (the standard deviation across prompt variants divided by √k, where k is the number of prompt variants). The two components are added in quadrature. Toggle the prompt uncertainty checkbox to exclude the prompt-deviation component.
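As a rough illustration of how the two components combine, the sketch below assumes the sampling error is already available as a standard error for the checkpoint, and that the prompt term uses the sample standard deviation of the per-prompt scores. The function name `combined_error` and its parameters are hypothetical and are not taken from the dashboard's code.

```python
import math
import statistics


def combined_error(sampling_se, prompt_scores, include_prompt=True):
    """Return the band half-width for one checkpoint.

    sampling_se    -- standard error from the metric's own variance (assumed given)
    prompt_scores  -- one score per prompt variant (k values)
    include_prompt -- mirrors the prompt uncertainty checkbox (assumed semantics)
    """
    k = len(prompt_scores)
    # With fewer than two variants there is no spread to estimate.
    if not include_prompt or k < 2:
        return sampling_se
    # Assumption: sample SD (n - 1 denominator) divided by sqrt(k).
    prompt_se = statistics.stdev(prompt_scores) / math.sqrt(k)
    # Independent error sources are added in quadrature.
    return math.sqrt(sampling_se ** 2 + prompt_se ** 2)
```

With a sampling error of 3.0 and two prompt variants scoring 10.0 and 18.0, the prompt term is 4.0 and the combined half-width is 5.0. Passing `include_prompt=False` returns the sampling error alone.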

Normalization

The default random-baseline normalization subtracts each task's random baseline from its score (clamped at 0) and rescales the result to a 0–100 range, so that different tasks can be averaged on a common scale.
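A minimal sketch of this kind of normalization follows. It assumes raw scores lie between 0 and a known maximum (1.0 by default), that the clamp applies to the baseline-subtracted score, and that 100 corresponds to the maximum score. The names `normalize`, `random_baseline`, and `max_score` are hypothetical, and the dashboard's exact formula may differ.

```python
def normalize(score, random_baseline, max_score=1.0):
    """Map a raw score to 0-100, with 0 at chance level and 100 at max_score."""
    if max_score <= random_baseline:
        raise ValueError("max_score must exceed random_baseline")
    # Assumption: scores below chance are clamped to 0 rather than going negative.
    above_chance = max(score - random_baseline, 0.0)
    # Rescale so that a perfect score maps to 100.
    return 100.0 * above_chance / (max_score - random_baseline)


def average_normalized(results):
    """Average normalized scores over (score, random_baseline) pairs."""
    values = [normalize(score, baseline) for score, baseline in results]
    return sum(values) / len(values)
```

Under these assumptions, a four-way multiple-choice task (baseline 0.25) with accuracy 0.625 normalizes to 50, and a binary task (baseline 0.5) with accuracy 0.5 normalizes to 0, so the two average to 25 on the common scale.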

 

Tasks included in aggregation