Winograd convolution with fewer transform additions

Numaro Autoresearch Team

Plain language

What this result means

Winograd convolution trades multiplications for linear transforms. Those transforms are just arithmetic circuits, and their add/sub count had mostly been reduced with common-subexpression heuristics. This result shows that old, widely used Winograd tiles still had removable transform work under the standard addition-count metric.
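
For orientation, this is the standard Lavin-Gray form of the identity. The notation is assumed here, and the report's own symbols may differ. $B^{\top}$, $G$ and $A^{\top}$ are the three linear maps whose add/sub circuits are being counted. $\odot$ is the elementwise product that carries the $\alpha = m + r - 1$ general multiplications per 1-D tile ($\alpha^2$ in 2-D). For a 1-D tile, $d \in \mathbb{R}^{\alpha}$, $g \in \mathbb{R}^{r}$, $y \in \mathbb{R}^{m}$, $B^{\top}$ is $\alpha \times \alpha$, $G$ is $\alpha \times r$ and $A^{\top}$ is $m \times \alpha$.

```latex
\begin{aligned}
y &= A^{\top}\big[(G g)\odot(B^{\top} d)\big], && \text{1-D } F(m,r),\\
Y &= A^{\top}\big[(G g G^{\top})\odot(B^{\top} d B)\big]A, && \text{2-D } F(m\times m,\ r\times r).
\end{aligned}
```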

The most practical headline is F(4x4,3x3), the tile size widely used in CNN inference libraries. Its total transform addition count drops from 328 to 289.
The proof strength differs by row. Some individual transforms are proven optimal by CP-SAT; the larger tile totals are verified achievable upper bounds below the prior counts.
This does not automatically mean a faster kernel. It is an arithmetic-circuit result. The separate kernel report explains where op-count improvements do and do not turn into wall-clock speed.

Visual notes

How to read the result

Addition margins: bar chart of Winograd transform addition-count reductions for one-dimensional and two-dimensional tiles. The largest margins appear on the larger nested tiles, and the chart keeps arithmetic-count improvement separate from runtime claims.

F(4x4,3x3) breakdown: bar chart comparing Lavin-Gray and Numaro additions for the data, filter, and inverse transforms of the F(4x4,3x3) Winograd tile. The 328-to-289 tile total is built from these three counts after nesting the 1-D circuits.

Result table

Exact SLP minimization cuts the add/sub work in Winograd transform circuits.

Tile	Baseline adds	Numaro adds	Delta	Note
F(2,3)	11	11	0	all three transforms proven optimal
F(4,3)	31	27	-4	Lavin-Gray baseline; G transform proven optimal
F(6,3)	59	52	-7	addition metric only
F(4x4,3x3)	328	289	-39	2-D CNN inference tile
F(6x6,3x3)	839	752	-87	2-D nested transform count
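
The F(2,3) row can be checked by hand. The sketch below writes the standard Lavin-Gray F(2,3) transforms as explicit add/sub straight-line programs and counts their gates, treating the 1/2 scalings as free. It then checks the result against direct correlation. It reproduces the 11-addition baseline only. It is not Numaro's stored circuit format, and the function names are hypothetical.

```python
from fractions import Fraction
import random

def data_transform(d):
    """B^T d for F(2,3): 4 add/sub gates."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3], 4

def filter_transform(g):
    """G g for F(2,3): 3 add/sub gates; the 1/2 scalings are free under the metric."""
    g0, g1, g2 = g
    t = g0 + g2                       # shared subexpression, 1 add
    half = Fraction(1, 2)
    return [g0, half * (t + g1), half * (t - g1), g2], 3

def inverse_transform(m):
    """A^T m for F(2,3): 4 add/sub gates."""
    m0, m1, m2, m3 = m
    return [m0 + m1 + m2, m1 - m2 - m3], 4

def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation; returns (outputs, transform add/sub count)."""
    u, a_b = data_transform(d)
    v, a_g = filter_transform(g)
    m = [ui * vi for ui, vi in zip(u, v)]   # the 4 general multiplications
    y, a_a = inverse_transform(m)
    return y, a_b + a_g + a_a

def direct_correlation(d, g):
    """CNN-style 'convolution' is a correlation: y_i = sum_k d[i + k] * g[k]."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

random.seed(0)
for _ in range(100):
    d = [random.randint(-9, 9) for _ in range(4)]
    g = [random.randint(-9, 9) for _ in range(3)]
    y, adds = winograd_f23(d, g)
    assert y == direct_correlation(d, g)
print("F(2,3) identity holds; transform add/sub gates:", adds)  # 11
```

The count splits as 4 (data) + 3 (filter) + 4 (inverse). The filter transform saves one addition by sharing g0 + g2 between two outputs.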

Method

How it was found

The campaign rebuilt the Winograd matrices, normalized the gauges so addition counts are comparable, then minimized each transform as an add/sub straight-line program.

Re-derived the Toom-Cook / Winograd matrices and checked the convolution identity.
Gauge-normalized the matrices, pushing constant factors into the diagonal scalings that the Winograd counting convention treats as free.
Ran a cancellation-aware greedy SLP search for achievable circuits and CP-SAT for optimality proofs where the instance was small enough.
Dropped point-set searches and DSP-transform transfer attempts when they only matched the baseline counts or did worse.
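
To make the metric concrete, here is a minimal Paar-style greedy common-subexpression sketch. It represents the kind of heuristic behind the prior counts. It is not the cancellation-aware search or the CP-SAT model described above, because plain pairing never lets a shared term cancel later. The function names are hypothetical. Constant multiplies are free, so any pair c_i·x_i + c_j·x_j that appears in several rows with the same ratio c_j/c_i can be computed once, for one addition.

```python
from collections import Counter
from fractions import Fraction

def naive_adds(matrix):
    """Row-by-row cost with no sharing: (nonzeros - 1) adds per output row."""
    return sum(max(sum(1 for c in row if c != 0) - 1, 0) for row in matrix)

def greedy_slp_adds(matrix):
    """Hypothetical greedy pair extraction; multiplication by constants is free.

    Each row is a linear form {input index: coefficient}. Since
    c_i*x_i + c_j*x_j = c_i * (x_i + (c_j/c_i)*x_j), a pair keyed by
    (i, j, c_j/c_i) can be shared by every row with that same ratio.
    """
    rows = [{j: Fraction(c) for j, c in enumerate(row) if c != 0} for row in matrix]
    next_var = len(matrix[0]) if matrix else 0
    shared_adds = 0
    while True:
        pair_counts = Counter()
        for row in rows:
            keys = sorted(row)
            for a, i in enumerate(keys):
                for j in keys[a + 1:]:
                    pair_counts[(i, j, row[j] / row[i])] += 1
        if not pair_counts:
            break
        (i, j, ratio), freq = pair_counts.most_common(1)[0]
        if freq < 2:                 # no pair is reused, so extraction cannot help
            break
        t = next_var                 # new temporary t = x_i + ratio * x_j (1 add)
        next_var += 1
        shared_adds += 1
        for row in rows:
            if i in row and j in row and row[j] / row[i] == ratio:
                row[t] = row.pop(i)  # row now uses c_i * t in place of the pair
                del row[j]
    # A remaining row with k terms still needs k - 1 adds.
    return shared_adds + sum(max(len(row) - 1, 0) for row in rows)

# Lavin-Gray F(2,3) transforms; the two 1/2-rows of G are scaled by 2 (a gauge choice).
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [1, 1, 1], [1, -1, 1], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]
print(naive_adds(G), greedy_slp_adds(G))              # 4 3
print(sum(greedy_slp_adds(M) for M in (BT, G, AT)))   # 11
```

Because the pair key uses a coefficient ratio, rescaling a whole row (the gauge freedom) does not change the count. That is why gauge normalization is needed only to make different published matrices comparable, not to change what the greedy step can find.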

Verification

How it was checked

The verifier reconstructs every transform from its stored circuit, recounts the add/sub gates, and checks the Winograd identities against direct convolution. It also verifies the 2-D nesting formula.
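
A small sketch of the two 2-D checks follows. It assumes row-column nesting: each 1-D circuit runs once per column and then once per row of the intermediate tile. Under that assumption it computes the nested addition count from the 1-D counts, then numerically confirms the nested F(2x2,3x3) identity against direct 2-D correlation. The function names and the exact form of the nesting formula are assumptions, not the report's verifier. F(2x2,3x3) is used only because its matrices are small, and it is not a row of the table.

```python
from fractions import Fraction
import random

def nested_adds(m, r, a_b, a_g, a_a):
    """Assumed row-column nesting count for F(m x m, r x r) from 1-D counts."""
    alpha = m + r - 1
    data = 2 * alpha * a_b        # B^T d B: alpha columns, then alpha rows
    filt = (r + alpha) * a_g      # G g G^T: r columns, then alpha rows
    inv = (alpha + m) * a_a       # A^T M A: alpha columns, then m rows
    return data + filt + inv

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

# Lavin-Gray F(2,3) matrices, nested to F(2x2,3x3).
h = Fraction(1, 2)
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [h, h, h], [h, -h, h], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_2x2_3x3(d, g):
    V = matmul(matmul(BT, d), transpose(BT))    # B^T d B, 4x4
    U = matmul(matmul(G, g), transpose(G))      # G g G^T, 4x4
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]  # 16 multiplies
    return matmul(matmul(AT, M), transpose(AT))  # A^T M A, 2x2

def direct_2d(d, g):
    return [[sum(d[i + k][j + q] * g[k][q] for k in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]

random.seed(1)
for _ in range(50):
    d = [[random.randint(-9, 9) for _ in range(4)] for _ in range(4)]
    g = [[random.randint(-9, 9) for _ in range(3)] for _ in range(3)]
    assert winograd_2x2_3x3(d, g) == direct_2d(d, g)

print(nested_adds(2, 3, a_b=4, a_g=3, a_a=4))  # 77
```

For F(4x4,3x3) (α = 6), the same assumed formula weights the data, filter, and inverse 1-D counts by 12, 9 and 10.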

Scope

What is not being claimed

The metric is transform additions with multiply-by-constant treated as free. Only specific transforms are proven optimal; the larger totals are verified improvements, not global lower bounds. This is not a wall-clock GPU claim.

References

Baseline sources

Citation

How to cite

Numaro Autoresearch Team. "Winograd convolution with fewer transform additions." Numaro Research Report NUMARO-2026-001, 2026.

@techreport{numaro2026WinogradTransformAdditions,
  title = {Winograd convolution with fewer transform additions},
  author = {Numaro Autoresearch Team},
  institution = {Numaro},
  number = {NUMARO-2026-001},
  year = {2026},
  url = {https://numaro.tech/research/winograd-transform-additions-2026/}
}