Plain language
What this result means
Winograd convolution trades multiplications for linear transforms of the input, the filter, and the output. Those transforms are arithmetic circuits built from additions and subtractions, and their add/sub counts had mostly been reduced with common-subexpression heuristics. This result shows that long-established, widely used Winograd tiles still contained removable transform work under the standard addition-count metric.
- The most practical headline is F(4x4,3x3), the tile family used in CNN inference libraries: the transform count goes from 328 additions to 289, a reduction of 39 additions (about 12%).
- The proof strength differs by row. Some individual transforms are proven optimal by CP-SAT; the larger tile totals are verified achievable upper bounds below the prior counts.
- This does not automatically mean a faster kernel. It is an arithmetic-circuit result. The separate kernel report explains where op-count improvements do and do not turn into wall-clock speed.
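To make the multiplication-for-transform trade concrete, here is a minimal sketch of the smallest textbook case, 1-D F(2,3), written in Python with exact rationals. It is an illustration only: the function names are hypothetical, and the formulas are the standard F(2,3) ones, not the optimized circuits from this report.

```python
from fractions import Fraction

def direct_f23(d, g):
    """Direct 1-D correlation: 2 outputs, 3 taps, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

def winograd_f23(d, g):
    """F(2,3): 4 general multiplications, the rest is add/sub transform work."""
    half = Fraction(1, 2)
    # Filter transform: 3 add/sub; the 1/2 scaling is a constant multiply.
    s = g[0] + g[2]
    u = [g[0], half * (s + g[1]), half * (s - g[1]), g[2]]
    # Input transform: 4 add/sub.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # The only general multiplications: one per transformed element.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform: 4 add/sub.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

d = [Fraction(x) for x in (3, -1, 4, 2)]
g = [Fraction(x) for x in (2, 5, -7)]
assert winograd_f23(d, g) == direct_f23(d, g)
```

The add/sub lines in the three transforms are the quantity this result reduces for larger tiles; the four products are left unchanged.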
Visual notes
How to read the result
Result table
Exact straight-line program (SLP) minimization cuts the add/sub work in Winograd transform circuits.
Method
How it was found
The campaign rebuilt the Winograd matrices, normalized the gauges so addition counts are comparable, then minimized each transform as an add/sub straight-line program.
- Re-derived the Toom-Cook / Winograd matrices and checked the convolution identity.
- Gauge-normalized the matrices, pushing constant factors into the scalings that Winograd addition counts treat as free.
- Ran a cancellation-aware greedy SLP search for achievable circuits and CP-SAT for optimality proofs where the instance was small enough.
- Dropped point-set searches and DSP-transform transfer attempts when they only matched or did worse than the circuits already found.
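The idea of counting a transform as an add/sub straight-line program can be sketched on a tiny example. The snippet below assumes a simple gate encoding (`SLP_GATES`, `SLP_OUTPUTS`, `expand`, and `naive_adds` are hypothetical names, not the campaign's data format) and uses the gauge-normalized F(2,3) filter transform, with the 1/2 factors assumed to be absorbed into the free scaling. It compares the row-by-row count with a circuit that shares one subexpression.

```python
# Gauge-normalized F(2,3) filter transform: the 1/2 factors of rows 1 and 2
# are assumed to be pushed into the free per-row scaling.
G = [
    [1, 0, 0],
    [1, 1, 1],
    [1, -1, 1],
    [0, 0, 1],
]

def naive_adds(matrix):
    """Row-by-row cost: a row with k nonzeros needs k - 1 add/sub gates."""
    return sum(max(sum(1 for c in row if c != 0) - 1, 0) for row in matrix)

# A straight-line program: each gate is (sign_a, src_a, sign_b, src_b).
# Wires 0..n_inputs-1 are the inputs; each gate appends one new wire.
SLP_GATES = [
    (1, 0, 1, 2),    # wire 3 = g0 + g2  (shared subexpression)
    (1, 3, 1, 1),    # wire 4 = wire 3 + g1
    (1, 3, -1, 1),   # wire 5 = wire 3 - g1
]
SLP_OUTPUTS = [0, 4, 5, 2]  # which wire feeds each output row

def expand(gates, outputs, n_inputs):
    """Rebuild the matrix an SLP computes by propagating coefficient vectors."""
    wires = [[int(i == j) for j in range(n_inputs)] for i in range(n_inputs)]
    for sa, a, sb, b in gates:
        wires.append([sa * x + sb * y for x, y in zip(wires[a], wires[b])])
    return [wires[o] for o in outputs]

assert expand(SLP_GATES, SLP_OUTPUTS, 3) == G
print(naive_adds(G), len(SLP_GATES))  # 4 3
```

Here sharing `g0 + g2` saves one gate. A greedy or exact search looks for such savings systematically on much larger matrices, where they are far less obvious.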
Verification
How it was checked
The verifier reconstructs every transform from its stored circuit, recounts the add/sub gates, and checks the Winograd identities against direct convolution. It also verifies the 2-D nesting formula.
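A minimal sketch of the identity check and the 2-D nesting count, under stated assumptions: it uses the standard F(2,3) matrices as a stand-in for the stored circuits, assumes the nesting formula is the usual row-column count for `T X T^T`, and all function names are hypothetical rather than taken from the actual verifier.

```python
from fractions import Fraction as Fr

# Standard F(2,3) matrices, used here only as a small stand-in.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [Fr(1, 2), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 2), Fr(-1, 2), Fr(1, 2)], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def sandwich(t, x):
    """Nested 2-D transform T X T^T."""
    return matmul(matmul(t, x), transpose(t))

def winograd_2d(d, g):
    u = sandwich(G, g)      # 4x4 transformed filter
    v = sandwich(BT, d)     # 4x4 transformed input tile
    m = [[a * b for a, b in zip(ru, rv)] for ru, rv in zip(u, v)]
    return sandwich(AT, m)  # 2x2 output tile

def direct_2d(d, g):
    """Direct 2-D correlation of a 4x4 tile with a 3x3 filter."""
    return [[sum(d[i + p][j + q] * g[p][q] for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]

def nested_adds(rows, cols, adds_1d):
    """Add/sub count of T X T^T for a rows x cols matrix T whose 1-D circuit
    costs adds_1d: `cols` column passes, then `rows` row passes."""
    return (rows + cols) * adds_1d

d = [[1, 2, 3, 4], [5, -6, 7, 8], [9, 10, -11, 12], [13, 14, 15, -16]]
g = [[1, 0, -2], [3, 4, 5], [-6, 7, 8]]
assert winograd_2d(d, g) == direct_2d(d, g)
print(nested_adds(4, 4, 4), nested_adds(4, 3, 3), nested_adds(2, 4, 4))  # 32 21 24
```

Exact rationals are used so that the comparison with direct convolution is an equality test rather than a floating-point tolerance check.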
Scope
What is not being claimed
The metric is transform additions with multiply-by-constant treated as free. Only specific transforms are proven optimal; the larger totals are verified improvements, not global lower bounds. This is not a wall-clock GPU claim.
References
Baseline sources
Citation
How to cite
Numaro Autoresearch Team. "Winograd convolution with fewer transform additions." Numaro Research Report NUMARO-2026-001, 2026.
@techreport{numaro2026WinogradTransformAdditions,
title = {Winograd convolution with fewer transform additions},
author = {Numaro Autoresearch Team},
institution = {Numaro},
number = {NUMARO-2026-001},
year = {2026},
url = {https://numaro.tech/research/winograd-transform-additions-2026/}
}