Plain language

What this result means

Winograd convolution trades multiplications for linear transforms. Those transforms are just arithmetic circuits, and until now their add/sub count had mostly been reduced with common-subexpression heuristics. This result shows that long-established, widely used Winograd tiles still had removable transform work under the standard addition-count metric.

  • The most practical headline is F(4x4,3x3), the tile family used in CNN inference libraries: the transform count goes from 328 additions to 289, a saving of 39 (about 12%).
  • The proof strength differs by row. Some individual transforms are proven optimal by CP-SAT; the larger tile totals are verified achievable upper bounds below the prior counts.
  • This does not automatically mean a faster kernel. It is an arithmetic-circuit result. The separate kernel report explains where op-count improvements do and do not turn into wall-clock speed.
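To make the multiplication-for-transform trade concrete, here is a minimal Python sketch of the smallest tile, F(2,3), which computes two outputs of a 3-tap filter with 4 multiplications instead of 6. It uses the textbook F(2,3) transforms written out as add/sub steps. The function names and the 4 + 3 + 4 split of the additions are illustrative assumptions, not the report's stored circuits; the split does sum to the 11 additions listed for F(2,3).

```python
from fractions import Fraction

def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]

def winograd_f23(d, g):
    """Same two outputs with 4 multiplications (textbook F(2,3) matrices)."""
    half = Fraction(1, 2)  # multiply-by-constant, treated as free in the metric

    # Data transform: 4 add/sub gates.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]

    # Filter transform: 3 add/sub gates, because g0 + g2 is shared.
    s = g[0] + g[2]
    u = [g[0], half * (s + g[1]), half * (s - g[1]), g[2]]

    # Element-wise product: the only 4 general multiplications.
    m = [ui * vi for ui, vi in zip(u, v)]

    # Inverse transform: 4 add/sub gates.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [5, 6, 7]
    print(direct_f23(d, g), winograd_f23(d, g))
```

Exact rationals are used so the comparison with direct convolution is an identity check rather than a floating-point tolerance test.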

Visual notes

How to read the result

[Figure: bar chart of Winograd transform addition-count reductions for one-dimensional and two-dimensional tiles.]
Addition margins: The largest margins appear on the bigger nested tiles. The chart separates arithmetic-count improvement from runtime claims.

[Figure: bar chart comparing Lavin-Gray and Numaro data, filter, and inverse transform additions for the F(4x4,3x3) Winograd tile.]
F(4x4,3x3) breakdown: The 328 to 289 tile result comes from the data, filter, and inverse transform counts after nesting the 1-D circuits.

Result table

Exact SLP minimization cuts the add/sub work in Winograd transform circuits.

Tile       | Baseline | Numaro | Delta | Note
F(2,3)     |       11 |     11 |     0 | all three transforms proven optimal
F(4,3)     |       31 |     27 |    -4 | Lavin-Gray baseline; G transform proven optimal
F(6,3)     |       59 |     52 |    -7 | addition metric only
F(4x4,3x3) |      328 |    289 |   -39 | 2-D CNN inference tile
F(6x6,3x3) |      839 |    752 |   -87 | 2-D nested transform count
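The 2-D rows count a nested 1-D circuit. A minimal sketch of that arithmetic, assuming the usual separable nesting (apply the 1-D transform to every column of the tile, then to every row of the intermediate result): a transform matrix with `rows_out` rows and `cols_in` columns whose 1-D circuit costs `adds_1d` additions then costs `(cols_in + rows_out) * adds_1d` in 2-D. The helper name and the F(2,3) split of 4, 3, and 4 additions are assumptions for illustration; the report does not list per-transform counts here, and F(2x2,3x3) is not a row of the table.

```python
def nested_adds(rows_out, cols_in, adds_1d):
    """Additions for M X M^T, where M is rows_out x cols_in and X is cols_in x cols_in.

    Pass 1 runs the 1-D circuit on each of the cols_in columns of X.
    Pass 2 runs it on each of the rows_out rows of the intermediate result.
    """
    return (cols_in + rows_out) * adds_1d

# Textbook F(2,3) shapes: B^T is 4x4, G is 4x3, A^T is 2x4.
# Assumed 1-D costs: 4 (data), 3 (filter), 4 (inverse), summing to 11.
data_2d = nested_adds(4, 4, 4)     # 8 passes of the data circuit
filter_2d = nested_adds(4, 3, 3)   # 7 passes of the filter circuit
inverse_2d = nested_adds(2, 4, 4)  # 6 passes of the inverse circuit

print(data_2d, filter_2d, inverse_2d, data_2d + filter_2d + inverse_2d)
```

Because each 1-D saving is multiplied by the number of passes, a few additions removed from a 1-D circuit become a much larger margin on the nested tile.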

Method

How it was found

The campaign rebuilt the Winograd matrices, normalized the gauges so addition counts are comparable, then minimized each transform as an add/sub straight-line program.

  • Re-derived the Toom-Cook / Winograd matrices and checked the convolution identity.
  • Gauge-normalized the matrices so that constant factors are absorbed into the scalings that Winograd addition counts treat as free.
  • Ran a cancellation-aware greedy SLP search for achievable circuits and CP-SAT for optimality proofs where the instance was small enough.
  • Dropped point-set searches and DSP-transform transfer attempts when they only matched or fell behind the counts already found.
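The search step in the list above can be illustrated with a much simpler stand-in. The sketch below is a plain greedy pair-sharing heuristic, not the cancellation-aware search or the CP-SAT model used in the campaign: it repeatedly extracts the signed pair of wires that occurs in the most rows, then counts the remaining additions. It assumes the normalized matrix has entries in {-1, 0, 1}, which holds for F(2,3) but not for the larger tiles; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def greedy_slp_adds(matrix):
    """Upper bound on add/sub gates for a {-1, 0, 1} matrix via greedy pair sharing."""
    assert all(c in (-1, 0, 1) for row in matrix for c in row)
    rows = [{j: c for j, c in enumerate(r) if c} for r in matrix]
    next_wire = len(matrix[0])
    gates = 0
    while True:
        # Count each signed pair x_i + s*x_j; a row and its negation share a pair.
        pairs = Counter()
        for row in rows:
            for i, j in combinations(sorted(row), 2):
                pairs[(i, j, row[i] * row[j])] += 1
        if not pairs:
            break
        (i, j, s), hits = pairs.most_common(1)[0]
        if hits < 2:
            break  # nothing left to share
        # New wire t = x_i + s*x_j costs one gate; substitute it everywhere it fits.
        t, next_wire, gates = next_wire, next_wire + 1, gates + 1
        for row in rows:
            if i in row and j in row and row[i] * row[j] == s:
                a = row.pop(i)
                row.pop(j)
                row[t] = a
    # Each remaining row with k terms needs k - 1 more gates.
    return gates + sum(len(row) - 1 for row in rows if row)

# Textbook F(2,3) matrices, with the 1/2 factors of G scaled out.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [1, 1, 1], [1, -1, 1], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]
print(greedy_slp_adds(BT), greedy_slp_adds(G), greedy_slp_adds(AT))
```

A heuristic like this only yields achievable upper bounds. Proving that a count cannot be beaten needs an exact method, which is the role CP-SAT plays on the instances small enough to solve.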

Verification

How it was checked

The verifier reconstructs every transform from its stored circuit, recounts the add/sub gates, and checks the Winograd identities against direct convolution. It also verifies the 2-D nesting formula.
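A minimal sketch of the reconstruct-and-recount step, assuming a hypothetical storage format: each gate is a tuple `(op, a, b)` over wire indices, and each output is a `(wire, scale)` pair whose scale is a free constant multiply. The real verifier's circuit format is not described in this summary, so the names and layout here are illustrative only.

```python
from fractions import Fraction

def reconstruct(n_inputs, gates, outputs):
    """Replay an add/sub circuit symbolically; return (matrix, gate_count).

    Every wire is tracked as its coefficient vector over the inputs, so the
    output rows are exactly the rows of the matrix the circuit computes.
    """
    wires = [[Fraction(int(i == k)) for k in range(n_inputs)]
             for i in range(n_inputs)]
    adds = 0
    for op, a, b in gates:
        if op not in ("+", "-"):
            raise ValueError(f"unknown gate {op!r}")
        sign = 1 if op == "+" else -1
        wires.append([x + sign * y for x, y in zip(wires[a], wires[b])])
        adds += 1  # every gate is one add or one sub
    matrix = [[scale * c for c in wires[w]] for w, scale in outputs]
    return matrix, adds

# Textbook F(2,3) filter transform G as a 3-gate circuit (inputs are wires 0..2).
half = Fraction(1, 2)
gates = [("+", 0, 2),   # wire 3 = g0 + g2
         ("+", 3, 1),   # wire 4 = g0 + g1 + g2
         ("-", 3, 1)]   # wire 5 = g0 - g1 + g2
outputs = [(0, Fraction(1)), (4, half), (5, half), (2, Fraction(1))]

matrix, adds = reconstruct(3, gates, outputs)
expected = [[1, 0, 0], [half, half, half], [half, -half, half], [0, 0, 1]]
print(adds, matrix == expected)
```

Comparing the reconstructed matrix with the target using exact rationals means a circuit cannot pass with a miscounted gate or a sign error hidden by rounding.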

Scope

What is not being claimed

The metric is transform additions with multiply-by-constant treated as free. Only specific transforms are proven optimal; the larger totals are verified improvements, not global lower bounds. This is not a wall-clock GPU claim.

References

Baseline sources

Citation

How to cite

Numaro Autoresearch Team. "Winograd convolution with fewer transform additions." Numaro Research Report NUMARO-2026-001, 2026.

@techreport{numaro2026WinogradTransformAdditions,
  title = {Winograd convolution with fewer transform additions},
  author = {Numaro Autoresearch Team},
  institution = {Numaro},
  number = {NUMARO-2026-001},
  year = {2026},
  url = {https://numaro.tech/research/winograd-transform-additions-2026/}
}