Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder
Published in IEEE SDS 2026 (Full Paper, Zürich) & ICLR 2026 Workshop on Data-FM (Rio de Janeiro, Brazil), 2026
Shows that dataset scaling laws hold even at small scales: in a tiny attention-only decoder, training on only 30% of the data achieves about 90% of full-data performance.
Recommended citation: Wiegand, G.-H., Raichle, L., Staedeli, R., Handschuh, S., Hrycej, T., & Bermeitinger, B. (2026). "Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder." IEEE International Conference on Data Science (SDS 2026), Zürich, Switzerland.
