New best-known additive complexities for fast matrix-multiplication schemes

Numaro Autoresearch Team

Plain language

What this result means

Fast matrix multiplication is not only about the number of multiplications. A rank-r scheme still has to form input sums and output combinations around those multiplications. This result lowers that surrounding add/sub work for specific Brent-valid schemes without changing the rank.

The largest representative margin is <2,5,9>: 264 additions down to 254 at rank 73.
Most of the gain comes from input-side matrices that are perfectly peelable: the exact lower bound equals the constructed circuit count.
Several candidate wins were dropped after rerunning Perminov's own reducer at higher effort. Only strict beats over that stronger baseline are kept.

Visual notes

How to read the result

Horizontal bar chart of additive-complexity margins for fast matrix multiplication schemes. — **Record margins**The wins are small but repeated: each row is a strict add/sub reduction at the same rank after rerunning strong reducer baselines.

Donut chart showing the U, V, and W transpose side costs for the largest-margin matrix multiplication scheme. — **Side costs**For the largest-margin scheme, the total separates into U, V, and W-transpose programs. The input side is where exact peeling opens the gap.

Result table

Thirty-one Brent-certified schemes use fewer additions at the same rank.

Cell	Baseline	Numaro	Delta	Note
<2,5,9>	264	254	-10	largest representative margin
<2,8,8>	424	418	-6	large peelable input side
<2,7,7>	320	314	-6	peeling plus reducer hybrid
<2,4,9>	207	202	-5	Tellegen output map
<4,4,4>	159	157	-2	rank 49

Method

How it was found

For each fixed scheme, the total addition count separates into the costs of U, V, and W-transpose. The campaign minimized those sides independently and kept the best reconstructable circuit per side.

Reproduced the FMM/Perminov and plinopt baselines before claiming any improvement.
Used exact peeling SLPs where F(M) equals the number of distinct non-trivial rows up to sign.
Applied a Tellegen output map when computing W first and transposing the program was cheaper.
Extracted and reused Perminov's own best per-side reductions for dense sides, instead of forcing our method everywhere.

Verification

How it was checked

verify.py checks the Brent tensor identity over the integers, runs randomized matrix products, reconstructs every side from the stored SLP, recounts gates, and asserts the new total is below the old total.

Scope

What is not being claimed

These are best-known additions for specific schemes at fixed rank. They are not proofs of globally minimal additive complexity for each matrix format, and XOR-only GF(2) counts are a different metric.

References

Baseline sources

Citation

How to cite

Numaro Autoresearch Team. "New best-known additive complexities for fast matrix-multiplication schemes." Numaro Research Report NUMARO-2026-002, 2026.

@techreport{numaro2026MatrixMultiplicationAdditive,
  title = {New best-known additive complexities for fast matrix-multiplication schemes},
  author = {Numaro Autoresearch Team},
  institution = {Numaro},
  number = {NUMARO-2026-002},
  year = {2026},
  url = {https://numaro.tech/research/matrix-multiplication-additive-complexity-2026/}
}