# Numaro full LLM context


Numaro is an AI autoresearch engine. The system uses multi-agent research, hypothesis generation, tool-making agents, automatic experiments, scalable compute, and shared memory to run technical research campaigns.


Primary site: https://numaro.tech/

Contact: contact@numaro.tech


This file expands the public research index into a machine-readable plain-text summary. Use the canonical report URLs for citation and the HTML pages for tables, figures, metadata, and references.


## Winograd convolution with fewer transform additions

URL: https://numaro.tech/research/winograd-transform-additions-2026/
Report: NUMARO-2026-001
Date: 2026-06-29
Authors: Numaro Autoresearch Team
Domain: Fast convolution
Hero metric: 289 adds for F(4x4,3x3)

Plain-language summary: Winograd convolution trades multiplications for linear transforms. Those transforms are just arithmetic circuits, and their add/sub count had mostly been reduced with common-subexpression heuristics. This result shows that old, widely used Winograd tiles still had removable transform work under the standard addition-count metric.

Key details:
- The most practical headline is F(4x4,3x3), the tile family used in CNN inference libraries: the transform count goes from 328 additions to 289.
- The proof strength differs by row. Some individual transforms are proven optimal by CP-SAT; the larger tile totals are verified achievable upper bounds below the prior counts.
- This does not automatically mean a faster kernel. It is an arithmetic-circuit result. The separate kernel report explains where op-count improvements do and do not turn into wall-clock speed.

Hard part: The matrices are small, but the search is over programs, not entries. For each transform, the system has to find a shorter sequence of add/sub gates that reconstructs the exact same matrix. Even a one-gate improvement is real only if the entire circuit still reproduces the convolution identity.

Technical claim: A Winograd convolution computes fewer multiplications by paying for three linear transforms. We reduced the number of add/sub gates in those transforms using exact and cancellation-aware straight-line-program minimization. The 2-D F(4x4,3x3) tile drops from the Lavin-Gray count of 328 transform additions to 289, and several 1-D transforms get the first optimality proofs.

Result rows:
- F(2,3): baseline 11 (https://arxiv.org/abs/1509.09308); Numaro 11; delta 0; note: all three transforms proven optimal
- F(4,3): baseline 31 (https://arxiv.org/abs/1509.09308); Numaro 27; delta -4; note: Lavin-Gray baseline; G transform proven optimal
- F(6,3): baseline 59 (https://arxiv.org/abs/1509.09308); Numaro 52; delta -7; note: addition metric only
- F(4x4,3x3): baseline 328 (https://arxiv.org/abs/1509.09308); Numaro 289; delta -39; note: 2-D CNN inference tile
- F(6x6,3x3): baseline 839 (https://arxiv.org/abs/1509.09308); Numaro 752; delta -87; note: 2-D nested transform count

Visual notes:
- Addition margins: The largest margins appear on the bigger nested tiles. The chart separates arithmetic-count improvement from runtime claims. (https://numaro.tech/assets/research/winograd/addition-margins.svg)
- F(4x4,3x3) breakdown: The 328 to 289 tile result comes from the data, filter, and inverse transform counts after nesting the 1-D circuits. (https://numaro.tech/assets/research/winograd/tile-breakdown.svg)
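
A rough sketch of how a 2-D tile count can be assembled from 1-D transform counts, assuming the usual row-then-column nesting. The function name `nested_2d_adds` and the 4/3/4 split used in the example (the textbook F(2,3) circuit, which totals 11) are illustrative assumptions; the report's own nesting formula and per-transform splits are on the HTML page and may be counted differently.

```python
def nested_2d_adds(m, r, data_adds, filter_adds, inverse_adds):
    """Add/sub count of F(m x m, r x r) built by nesting 1-D F(m, r) transforms.

    Assumes each 1-D transform is applied along one axis and then the other,
    with multiply-by-constant treated as free.
    """
    n = m + r - 1                        # 1-D input tile length
    data = (n + n) * data_adds           # n columns, then n rows of the n x n tile
    filt = (r + n) * filter_adds         # r columns of the filter, then n rows
    inverse = (n + m) * inverse_adds     # n columns, then m rows
    return data + filt + inverse


# Example with the textbook F(2,3) circuit: 4 data, 3 filter, 4 inverse adds.
f2x2_total = nested_2d_adds(2, 3, data_adds=4, filter_adds=3, inverse_adds=4)
```

Under this formula the data transform is weighted most heavily for larger tiles, so a saved gate there moves the 2-D total more than one saved in the filter transform.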

Method: The campaign rebuilt the Winograd matrices, normalized the gauges so addition counts are comparable, then minimized each transform as an add/sub straight-line program.

Method steps:
- Re-derived the Toom-Cook / Winograd matrices and checked the convolution identity.
- Gauge-normalized the matrices so constants are pushed into the free scaling convention used by Winograd counts.
- Ran a cancellation-aware greedy SLP search for achievable circuits and CP-SAT for optimality proofs where the instance was small enough.
- Dropped point-set searches and DSP-transform transfer attempts when they only matched or lost.

Verification: The verifier reconstructs every transform from its stored circuit, recounts the add/sub gates, and checks the Winograd identities against direct convolution. It also verifies the 2-D nesting formula.
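
As an illustration of this kind of check, here is a minimal sketch for the smallest row, F(2,3), using the textbook 11-add circuit. The program encoding, the names `run_slp`, `f23`, and `check_f23`, and the gate labels are assumptions made for this example, not the report's verifier. Because the identity is bilinear, checking every pair of unit vectors is enough.

```python
from fractions import Fraction

def run_slp(program, inputs):
    """Evaluate a straight-line program; return (values, add/sub gate count)."""
    env = dict(inputs)
    gates = 0
    for dst, op, a, b in program:
        if op == "add":
            env[dst] = env[a] + env[b]
            gates += 1
        elif op == "sub":
            env[dst] = env[a] - env[b]
            gates += 1
        elif op == "scale":              # multiply by a constant: free under the metric
            env[dst] = env[a] * b
        else:
            raise ValueError(f"unknown op {op!r}")
    return env, gates

HALF = Fraction(1, 2)
DATA = [("t0", "sub", "d0", "d2"), ("t1", "add", "d1", "d2"),
        ("t2", "sub", "d2", "d1"), ("t3", "sub", "d1", "d3")]
FILTER = [("s", "add", "g0", "g2"), ("p", "add", "s", "g1"), ("q", "sub", "s", "g1"),
          ("u1", "scale", "p", HALF), ("u2", "scale", "q", HALF)]
INVERSE = [("a", "add", "m0", "m1"), ("y0", "add", "a", "m2"),
           ("b", "sub", "m1", "m2"), ("y1", "sub", "b", "m3")]

def f23(d, g):
    """Compute two outputs of a 3-tap filter over 4 inputs with 4 multiplications."""
    dv, n_data = run_slp(DATA, {f"d{i}": x for i, x in enumerate(d)})
    gv, n_filter = run_slp(FILTER, {f"g{i}": x for i, x in enumerate(g)})
    t = [dv["t0"], dv["t1"], dv["t2"], dv["t3"]]
    u = [gv["g0"], gv["u1"], gv["u2"], gv["g2"]]
    out, n_inverse = run_slp(INVERSE, {f"m{i}": t[i] * u[i] for i in range(4)})
    return [out["y0"], out["y1"]], n_data + n_filter + n_inverse

def direct(d, g):
    """Reference: direct sliding-window correlation."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

def check_f23():
    """Check the identity on all unit-vector pairs; return the add/sub count."""
    count = None
    for i in range(4):
        for j in range(3):
            d = [int(k == i) for k in range(4)]
            g = [int(k == j) for k in range(3)]
            y, count = f23(d, g)
            assert y == direct(d, g)
    return count
```

Exact rational arithmetic is used so that a passing check cannot be a rounding accident.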

Scope: The metric is transform additions with multiply-by-constant treated as free. Only specific transforms are proven optimal; the larger totals are verified improvements, not global lower bounds. This is not a wall-clock GPU claim.

Baseline sources:
- Lavin and Gray, Fast Algorithms for Convolutional Neural Networks. (https://arxiv.org/abs/1509.09308)
- Winograd, Arithmetic Complexity of Computations. (https://doi.org/10.1137/1.9781611970364)
- Boyar, Matthews, and Peralta, Logic Minimization Techniques with Applications to Cryptology. (https://doi.org/10.1007/s00145-012-9124-7)

## New best-known additive complexities for fast matrix-multiplication schemes

URL: https://numaro.tech/research/matrix-multiplication-additive-complexity-2026/
Report: NUMARO-2026-002
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Algebraic complexity
Hero metric: 31 additive-complexity records

Plain-language summary: Fast matrix multiplication is not only about the number of multiplications. A rank-r scheme still has to form input sums and output combinations around those multiplications. This result lowers that surrounding add/sub work for specific Brent-valid schemes without changing the rank.

Key details:
- The largest representative margin is <2,5,9>: 264 additions down to 254 at rank 73.
- Most of the gain comes from input-side matrices that are perfectly peelable: the exact lower bound equals the constructed circuit count.
- Several candidate wins were dropped after rerunning Perminov's own reducer at higher effort. Only strict beats over that stronger baseline are kept.

Hard part: A fast-multiplication scheme is a tensor identity, so a smaller circuit must not change the scheme. The checker rebuilds U, V, and W-transpose from the supplied straight-line programs and then rechecks the Brent identity over the integers. Margins are thin because the field's reducers are already strong.

Technical claim: Fast matrix-multiplication schemes reduce multiplication count, but they also need additions to form the linear combinations around those multiplications. We found 31 ternary, Brent-valid schemes where the total add/sub count is strictly lower than the strongest prior achievable count at the same rank. Each record includes explicit straight-line programs and a standalone verifier.

Result rows:
- <2,5,9>: baseline 264 (https://github.com/dronperminov/FastMatrixMultiplication); Numaro 254; delta -10; note: largest representative margin
- <2,8,8>: baseline 424 (https://github.com/dronperminov/FastMatrixMultiplication); Numaro 418; delta -6; note: large peelable input side
- <2,7,7>: baseline 320 (https://github.com/dronperminov/FastMatrixMultiplication); Numaro 314; delta -6; note: peeling plus reducer hybrid
- <2,4,9>: baseline 207 (https://github.com/dronperminov/FastMatrixMultiplication); Numaro 202; delta -5; note: Tellegen output map
- <4,4,4>: baseline 159 (https://github.com/dronperminov/FastMatrixMultiplication); Numaro 157; delta -2; note: rank 49

Visual notes:
- Record margins: The wins are small but repeated: each row is a strict add/sub reduction at the same rank after rerunning strong reducer baselines. (https://numaro.tech/assets/research/matrix-additive/record-margins.svg)
- Side costs: For the largest-margin scheme, the total separates into U, V, and W-transpose programs. The input side is where exact peeling opens the gap. (https://numaro.tech/assets/research/matrix-additive/side-breakdown.svg)

Method: For each fixed scheme, the total addition count separates into the costs of U, V, and W-transpose. The campaign minimized those sides independently and kept the best reconstructable circuit per side.

Method steps:
- Reproduced the FMM/Perminov and plinopt baselines before claiming any improvement.
- Used exact peeling SLPs where F(M) equals the number of distinct non-trivial rows up to sign.
- Applied a Tellegen output map when computing W first and transposing the program was cheaper.
- Extracted and reused Perminov's own best per-side reductions for dense sides, instead of forcing our method everywhere.

Verification: verify.py checks the Brent tensor identity over the integers, runs randomized matrix products, reconstructs every side from the stored SLP, recounts gates, and asserts the new total is below the old total.
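
For readers unfamiliar with the Brent identity, the following is a small sketch on Strassen's classical rank-7 scheme for <2,2,2>, which is not one of the report's records. The matrix layout (one row per product, W stored as rank by outputs), and the names `brent_holds` and `naive_additions` are assumptions for this example and do not describe the report's verify.py.

```python
from itertools import product

# Strassen's <2,2,2> scheme. Columns are (11, 12, 21, 22) entries of A, B, C.
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
W = [[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
     [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

def brent_holds(U, V, W, n=2):
    """Check sum_r U[r][a] * V[r][b] * W[r][c] against the n x n matmul tensor."""
    index_pairs = list(product(range(n), repeat=2))
    for (i, j), (k, l), (p, q) in product(index_pairs, repeat=3):
        # C[p][q] receives A[i][j] * B[k][l] exactly when j == k, i == p, l == q.
        want = int(j == k and i == p and l == q)
        a, b, c = n * i + j, n * k + l, n * p + q
        got = sum(u[a] * v[b] * w[c] for u, v, w in zip(U, V, W))
        if got != want:
            return False
    return True

def naive_additions(U, V, W):
    """Add/sub count with no sharing: (nonzeros - 1) per linear form."""
    def side(forms):
        return sum(max(sum(1 for x in form if x) - 1, 0) for form in forms)
    # Inputs are formed per product (rows); outputs are formed per C entry (columns of W).
    return side(U) + side(V) + side(list(zip(*W)))
```

The naive count is the starting point a reducer improves on: sharing common subexpressions lowers the addition total while `brent_holds` must stay true.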

Scope: These are best-known addition counts for specific schemes at fixed rank. They are not proofs of globally minimal additive complexity for each matrix format, and XOR-only GF(2) counts are a different metric.

Baseline sources:
- Perminov / FastMatrixMultiplication project. (https://github.com/dronperminov/FastMatrixMultiplication)
- Dumas, Pernet, Sedoglavic and collaborators, plinopt. (https://arxiv.org/abs/1712.05656)
- Martensson, Stankovski, and Wagner, additive complexity of matrix multiplication schemes. (https://eprint.iacr.org/2026/849)

## Bit-exact GPU kernels in regimes vendor libraries leave open

URL: https://numaro.tech/research/faster-ml-kernels-2026/
Report: NUMARO-2026-003
Date: 2026-06-30
Authors: Numaro Autoresearch Team
Domain: GPU kernels
Hero metric: 2-4.5x small batched matmul

Plain-language summary: This is the most directly practical result in the ledger. It connects research output to actual runtime: when a model is bottlenecked by small batched matrices, fine-grained MoE, depthwise convolution, or skinny int8 decode, a kernel that fills the GPU or removes a memory pass can matter immediately.

Key details:
- The same report also records losses. Large square GEMM remains cuBLAS territory, and a generic W4A16 kernel is much slower than tinygemm.
- The kernel speedups are measured as ratios under paired, interleaved timing, because the GPU was shared and absolute timings were noisy.
- The op-count records and the kernel records are separate. Fewer arithmetic operations did not automatically make a faster GPU kernel.

Hard part: Vendor libraries are strongest in their core regimes, so a broad 'faster GEMM' claim would be wrong. The useful work is finding narrow shape regimes where the production path leaves occupancy, launch overhead, or memory traffic on the table. Each speed number is paired with a correctness check first.

Technical claim: On a single NVIDIA L40S, we built Triton kernels that match the stock PyTorch/cuBLAS/cuDNN operators exactly or within the stated tolerance, then beat them in specific under-served regimes. The wins are not from lower arithmetic count; they come from occupancy, fusion, and structural gaps in the library path.

Result rows:
- Batched matmul: baseline torch.bmm (https://pytorch.org/docs/stable/generated/torch.bmm.html); Numaro 2-4.5x; delta faster; note: small/medium matrices, batch >= 512
- Fused depthwise 3x3: baseline cuDNN + activation (https://docs.nvidia.com/deeplearning/cudnn/); Numaro 1.55x fp16 / 1.31x fp32; delta faster; note: bit-exact
- int8 W8A8: baseline torch._int_mm (https://pytorch.org/docs/stable/torch.html); Numaro up to 3.7x; delta faster; note: skinny-M decode
- MoE grouped GEMM: baseline per-expert loop (https://github.com/pytorch/pytorch); Numaro 2.5-3.6x end-to-end; delta faster; note: fine-grained MoE
- W4A16: baseline tinygemm (https://pytorch.org/docs/stable/torch.html); Numaro ~12x slower; delta loss; note: kept as a boundary condition

Visual notes:
- Regime map: The big numbers live in specific gaps. The chart also includes the W4A16 loss to show where the method does not apply. (https://numaro.tech/assets/research/faster-kernels/speedup-regimes.svg)
- Amdahl check: End-to-end speedup tracks how much of the layer the swapped operator occupies. Fine-grained MoE benefits most because the grouped GEMM is almost the whole layer. (https://numaro.tech/assets/research/faster-kernels/amdahl-fractions.svg)
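
The Amdahl relation behind that chart can be written down directly. This is a generic sketch, with `fraction` meaning the share of layer time spent in the swapped operator and `op_speedup` the operator-level speedup; both names and the example inputs are placeholders, not measurements from the report.

```python
def end_to_end_speedup(fraction, op_speedup):
    """Amdahl's law: overall speedup when only `fraction` of the time is accelerated."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if op_speedup <= 0.0:
        raise ValueError("op_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / op_speedup)


# Placeholder inputs: the same operator speedup helps far more when it is most of the layer.
mostly_operator = end_to_end_speedup(0.9, 3.0)
small_share = end_to_end_speedup(0.2, 3.0)
```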

Method: Each kernel targets a specific production gap: not enough occupancy, an avoidable memory pass, a missing primitive, or a launch-heavy loop.

Method steps:
- Mapped where cuBLAS/cuDNN/PyTorch under-filled the GPU or forced extra memory traffic.
- Wrote narrow Triton kernels for those exact regimes instead of trying to beat vendor libraries everywhere.
- Timed stock and custom kernels back-to-back under a GPU timing lock.
- Kept losses in the report so the boundary of the method is visible.
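
One way to turn back-to-back timing into a ratio is sketched below. It is a CPU-side illustration only: the function name `paired_speedup`, the alternating order, and the median-of-ratios summary are assumptions, and it does not implement the GPU timing lock or device synchronization that a real kernel benchmark needs.

```python
import statistics
import time

def paired_speedup(baseline, candidate, rounds=31, timer=time.perf_counter):
    """Median of per-round baseline/candidate time ratios, alternating run order."""
    ratios = []
    for i in range(rounds):
        pair = [("base", baseline), ("cand", candidate)]
        if i % 2:
            pair.reverse()               # alternate order so drift favors neither side
        elapsed = {}
        for name, fn in pair:
            start = timer()
            fn()                         # a GPU kernel would also need to be waited on here
            elapsed[name] = timer() - start
        ratios.append(elapsed["base"] / elapsed["cand"])
    return statistics.median(ratios)


# Usage with two stand-in workloads (not the report's kernels).
ratio = paired_speedup(lambda: sum(range(200_000)), lambda: sum(range(50_000)), rounds=5)
```

Taking the ratio inside each round means a slow period on a shared machine affects both sides of that round, which is the reason ratios are more stable than absolute timings.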

Verification: Each subdirectory has a verifier that checks correctness before speed. Some kernels are bit-identical; others are compared against a higher-precision or stock reference with an explicit tolerance where reduction order differs.

Scope: All numbers are on one L40S under the stated environment. They are not claims for every GPU, every shape, or cross-library SOTA. Large square GEMM and 4-bit weight-only decode are explicitly not beaten.

Baseline sources:
- NVIDIA cuBLAS documentation. (https://docs.nvidia.com/cuda/cublas/)
- NVIDIA cuDNN documentation. (https://docs.nvidia.com/deeplearning/cudnn/)
- Triton language documentation. (https://triton-lang.org/)
- FlashAttention repository. (https://github.com/Dao-AILab/flash-attention)

## Circle packing in the unit square: new sum-of-radii layouts

URL: https://numaro.tech/research/circle-packing-unit-square-2026/
Report: NUMARO-2026-004
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Geometry optimization
Hero metric: 9 unit-square records beaten

Plain-language summary: Circle packing is easy to check and hard to find. Circles either fit or they do not, but moving one circle changes the feasible radii of many others. Record layouts live in narrow local basins, and some margins are only around 1e-4, so a candidate has to survive exact overlap and wall checks rather than just look better.

Key details:
- For n=26, the result clears AlphaEvolve and ShinkaEvolve, and is effectively tied with ThetaEvolve at the same optimum-level value while using a stricter feasibility tolerance.
- For n=33-40, the public table had recent automated-sweep values that heavier search could still improve; the hard part is escaping layout topologies that look stable but are not record-level.
- For n=41 and n=42, the table had no entry, so the results are frontier extensions rather than beats.

Hard part: The verifier is binary, but the search space is not a list; it is a continuous landscape of circle centers plus a changing contact graph. For n=36, that is already 72 center coordinates before the radii are recomputed. The useful analogy is closer to finding one valid atom in a universe-sized cloud of arrangements: most near-record layouts fail because one tiny overlap, wall violation, or wrong contact pattern destroys the value once the radii are solved exactly.

Technical claim: We pack n non-overlapping circles in the unit square to maximize the sum of radii. The campaign improves the public n=26 AlphaEvolve benchmark and the n=33-40 values on Erich Friedman's table, and gives first-known values for n=41 and n=42. These are best-known values, not optimality proofs.

Result rows:
- n=26: baseline 2.63586276 (https://erich-friedman.github.io/packing/cirRsqu/); Numaro 2.6359830853; delta +0.000120; note: AlphaEvolve benchmark
- n=33: baseline 2.9789 (https://erich-friedman.github.io/packing/cirRsqu/); Numaro 2.9872850086; delta +0.008385; note: Viquerat June 2026
- n=36: baseline 3.11023 (https://erich-friedman.github.io/packing/cirRsqu/); Numaro 3.1210039084; delta +0.010774; note: largest margin in the set
- n=40: baseline 3.28632 (https://erich-friedman.github.io/packing/cirRsqu/); Numaro 3.2923915726; delta +0.006072; note: Viquerat June 2026
- n=41: baseline none listed (https://erich-friedman.github.io/packing/cirRsqu/); Numaro 3.3346864337; delta first-known; note: frontier extension
- n=42: baseline none listed (https://erich-friedman.github.io/packing/cirRsqu/); Numaro 3.3769116039; delta first-known; note: frontier extension

Visual notes:
- The n=36 layout: The actual coordinates for the largest-margin record in the set. Thin internal lines mark near-contact pairs; the square boundary is the constraint. (https://numaro.tech/assets/research/circle-packing/layout-n36.svg?v=2)
- Record margins: The margin chart keeps the scale honest: n=36 is the clear outlier, while n=34 is a small but strict table beat. (https://numaro.tech/assets/research/circle-packing/record-margins.svg?v=2)
- Frontier shape: Solid points are the new layouts. The dashed line is the prior public table where it exists; the open points are first-known n greater than 40. (https://numaro.tech/assets/research/circle-packing/frontier.svg?v=2)

Method: The system searches circle centers, then uses a linear-program referee to assign the best feasible radii for those centers.

Method steps:
- Reproduced the public baselines and fixed them as dated targets.
- Ran GPU multistart, island genetic search, adaptive basin hopping, and analytic-Jacobian SLSQP.
- Used cross-n seeding: a good n layout seeds n+1 by adding a circle, and n-1 by removing one.
- Rejected higher-looking values when the LP or overlap checker showed they were tolerance artifacts.

Verification: Each layout is checked from its raw coordinates. Every pair of circles must be far enough apart, every circle must stay inside the square, and the final score is the sum of all radii. The reported beats are then compared against the dated public baseline values.
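
A minimal sketch of such a raw-coordinate check, assuming a layout stored as `(x, y, r)` triples in the unit square. The function name `verify_square_packing` and the `tol` parameter are illustrative; the report's actual tolerance and file format are not specified here.

```python
import math
from itertools import combinations

def verify_square_packing(circles, tol=0.0):
    """Return the sum of radii if the layout is feasible, else raise ValueError."""
    for index, (x, y, r) in enumerate(circles):
        if r <= 0:
            raise ValueError(f"circle {index}: non-positive radius")
        # Distance to the nearest wall of [0, 1] x [0, 1] must be at least r.
        if min(x, y, 1.0 - x, 1.0 - y) < r - tol:
            raise ValueError(f"circle {index}: crosses the square boundary")
    for (i, a), (j, b) in combinations(enumerate(circles), 2):
        # Centers must be at least r_i + r_j apart.
        if math.hypot(a[0] - b[0], a[1] - b[1]) < a[2] + b[2] - tol:
            raise ValueError(f"circles {i} and {j} overlap")
    return math.fsum(r for _, _, r in circles)


# Tiny example layout (not a record): four equal circles in a 2 x 2 grid.
grid = [(0.25, 0.25, 0.25), (0.75, 0.25, 0.25), (0.25, 0.75, 0.25), (0.75, 0.75, 0.25)]
total = verify_square_packing(grid)
```

With `tol=0.0` the check accepts exact tangency but rejects any overlap, so a score that only appears under a loose tolerance is flagged rather than counted.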

Scope: The records are best-known feasible packings, not proofs of optimality. Values are pinned to the dated public baselines named in the report.

Baseline sources:
- Erich Friedman, Circles in Squares / Packing Copies to Maximize Total Perimeter. (https://erich-friedman.github.io/packing/cirRsqu/)
- DeepMind AlphaEvolve. (https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/)
- Sakana AI comparison systems. (https://sakana.ai/)

## Maximizing total radius in triangle and pentagon containers

URL: https://numaro.tech/research/friedman-packing-center-2026/
Report: NUMARO-2026-005
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Geometry optimization
Hero metric: 17 verified entries

Plain-language summary: This is the same test as circle packing in the unit square, but with harder walls. The square is replaced by a triangle or a pentagon, so the contact patterns change and the system has to rebuild the search around new constraints instead of reusing one fixed geometry.

Key details:
- The three clear beats are pentagon n=15, triangle n=27, and triangle n=30.
- Friedman's page prints values with a trailing plus, so hidden digits may exist. A beat only counts when the new layout clears the whole hidden-digit window.
- Pentagon n=21-30 and triangle n=31-34 had no listed value, so those rows are frontier extensions rather than beats.

Hard part: As in the square-packing page, the search is a continuous layout problem, not a finite checklist. Here there is a second trap: the published baselines are rounded down and marked with '+', so a tiny numerical improvement may still be inside the hidden digits. A record only counts when it clears the whole printed window.

Technical claim: We improve three entries in Erich Friedman's Packing Center for unequal circles in a triangle or pentagon, and report first-known values for higher-n pentagon and triangle instances. A beat is claimed only when the new value clears the whole display-truncation window.

Result rows:
- cirRpen n=15: baseline 2.61251 (https://erich-friedman.github.io/packing/cirRpen/); Numaro 2.612714; delta +0.00020; note: Viquerat June 2026
- cirRtri n=27: baseline 1.76300 (https://erich-friedman.github.io/packing/cirRtri/); Numaro 1.764217; delta +0.00122; note: Cantrell July 2011
- cirRtri n=30: baseline 1.86315 (https://erich-friedman.github.io/packing/cirRtri/); Numaro 1.865806; delta +0.00266; note: Viquerat June 2026
- cirRpen n=21: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.088531; delta first-known; note: frontier extension
- cirRpen n=22: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.165663; delta first-known; note: frontier extension
- cirRpen n=23: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.247349; delta first-known; note: frontier extension
- cirRpen n=24: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.327581; delta first-known; note: frontier extension
- cirRpen n=25: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.391645; delta first-known; note: frontier extension
- cirRpen n=26: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.476635; delta first-known; note: frontier extension
- cirRpen n=27: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.539869; delta first-known; note: frontier extension
- cirRpen n=28: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.610265; delta first-known; note: frontier extension
- cirRpen n=29: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.681347; delta first-known; note: frontier extension
- cirRpen n=30: baseline none listed (https://erich-friedman.github.io/packing/cirRpen/); Numaro 3.745880; delta first-known; note: frontier extension
- cirRtri n=31: baseline none listed (https://erich-friedman.github.io/packing/cirRtri/); Numaro 1.894825; delta first-known; note: frontier extension
- cirRtri n=32: baseline none listed (https://erich-friedman.github.io/packing/cirRtri/); Numaro 1.923736; delta first-known; note: frontier extension
- cirRtri n=33: baseline none listed (https://erich-friedman.github.io/packing/cirRtri/); Numaro 1.956110; delta first-known; note: frontier extension
- cirRtri n=34: baseline none listed (https://erich-friedman.github.io/packing/cirRtri/); Numaro 1.988375; delta first-known; note: frontier extension

Visual notes:
- Triangle n=30 layout: The highest-margin triangle beat in the report. The container changes the search: wall contacts are now angled edge constraints, not square boundaries. (https://numaro.tech/assets/research/friedman-packing/triangle-n30.svg?v=1)
- Pentagon n=15 layout: The pentagon record uses the same raw-coordinate check as the square-packing page, but with five edge half-planes instead of four square walls. (https://numaro.tech/assets/research/friedman-packing/pentagon-n15.svg?v=1)
- The hidden-digit gate: Friedman prints a value with '+', so the visible number is only a floor. The chart shows each claimed beat clearing the whole hidden-digit window. (https://numaro.tech/assets/research/friedman-packing/clear-beat-gate.svg?v=1)

Method: The system searches circle centers inside each container, then recomputes the best radii for those centers. It keeps only layouts whose score survives the final geometry check and the printed-baseline gate.

Method steps:
- Reused the square-packing search loop, but replaced square walls with triangle and pentagon edge constraints.
- Tried many center layouts using multistart search and topology perturbations.
- Recomputed the best feasible radii for promising center layouts instead of trusting the optimizer's radii.
- Counted a beat only when the final value cleared Friedman's full printed window.
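
The printed-window gate can be sketched with exact decimal arithmetic. It assumes, as described above, that a printed value followed by '+' is a floor, so the true baseline lies in the half-open interval from the printed value up to one unit in the last printed place. The function name `clears_printed_window` is hypothetical.

```python
from decimal import Decimal

def clears_printed_window(printed, value):
    """True if `value` is at or above the top of the hidden-digit window.

    `printed` is the table value as a string, without the trailing '+'.
    """
    base = Decimal(printed)
    # One unit in the last printed decimal place, e.g. 0.00001 for "2.61251".
    unit = Decimal(1).scaleb(base.as_tuple().exponent)
    return Decimal(value) >= base + unit


# The three record rows listed in this section.
rows = [("2.61251", "2.612714"), ("1.76300", "1.764217"), ("1.86315", "1.865806")]
all_clear = all(clears_printed_window(printed, value) for printed, value in rows)
```

Passing the values as strings keeps trailing zeros such as "1.76300" significant, which matters because the window width depends on how many digits were printed.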

Verification: Each layout is checked from raw coordinates. Every circle must stay inside the triangle or pentagon, every pair of circles must stay disjoint, and the final score is the sum of all radii. For the three record claims, the score also has to clear the hidden digits allowed by Friedman's printed plus sign.
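
The containment part of that check differs from the square case only in the wall test. Below is a sketch for any convex container given as counterclockwise vertices; the names `circle_in_convex_polygon` and `regular_polygon`, and the unit-circumradius normalization, are assumptions for illustration, since the container scaling used on Friedman's pages is not restated here.

```python
import math

def circle_in_convex_polygon(x, y, r, vertices, tol=0.0):
    """True if the circle lies inside the convex polygon (counterclockwise vertices)."""
    count = len(vertices)
    for i in range(count):
        px, py = vertices[i]
        qx, qy = vertices[(i + 1) % count]
        dx, dy = qx - px, qy - py
        # Signed distance from the center to the edge line, positive on the interior side.
        dist = ((y - py) * dx - (x - px) * dy) / math.hypot(dx, dy)
        if dist < r - tol:
            return False
    return True

def regular_polygon(sides, circumradius=1.0):
    """Counterclockwise vertices of a regular polygon centered at the origin."""
    return [(circumradius * math.cos(2 * math.pi * i / sides),
             circumradius * math.sin(2 * math.pi * i / sides)) for i in range(sides)]


pentagon = regular_polygon(5)
fits = circle_in_convex_polygon(0.0, 0.0, 0.8, pentagon)
```

Each edge contributes one half-plane constraint, so a triangle needs three such tests per circle and a pentagon five.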

Scope: Only the three clear beats are record claims. Values that merely match within Friedman's display window are not counted, and the frontier rows are first-known feasible layouts rather than optimality proofs.

Baseline sources:
- Erich Friedman, Packing Copies to Maximize Total Perimeter. (https://erich-friedman.github.io/packing/)
- Friedman, Circles in Triangles. (https://erich-friedman.github.io/packing/cirRtri/)
- Friedman, Circles in Pentagons. (https://erich-friedman.github.io/packing/cirRpen/)

## Covering-design records improving the La Jolla Covering Repository

URL: https://numaro.tech/research/covering-designs-la-jolla-2026/
Report: NUMARO-2026-006
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Extremal combinatorics
Hero metric: 6,440 blocks saved

Plain-language summary: This result matters because the system treated the repository as a dependency graph, not just a list of numbers. A better source covering can imply a better derived covering. The record only counts after the derived object is actually built and checked.

Key details:
- A (v,k,t)-covering is a set of k-point blocks. Every t-point subset must appear inside at least one block; fewer blocks is better.
- The four records come from the standard induction relation C(v,k,t) <= C(v-1,k-1,t-1) + C(v-1,k,t).
- This is not a claim that the source coverings are new. The win is detecting the unpropagated derived cells, materializing the witnesses, and checking them.

Hard part: The verifier asks a binary question, but the checklist is enormous. C(92,7,4) means 131,122 explicit seven-point blocks and 2,794,155 four-subsets to cover, which is still small enough to enumerate. For the three t=7/8 rows, the checklist jumps to 2.9 x 10^9 to 2.1 x 10^10 subsets, so the proof has to move through the R4 induction theorem and the two LJCR source coverings. The trap is simple: a table inequality is not yet a witness.
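
The checklist sizes quoted here are binomial coefficients: a (v,k,t)-covering has to cover all v-choose-t subsets. A short sketch that recomputes them for the four record parameters; the dictionary name `checklist` is just for this example.

```python
from math import comb

# (v, k, t) for the four record rows; the checklist size depends only on v and t.
records = [(92, 7, 4), (79, 21, 7), (86, 23, 7), (77, 25, 8)]
checklist = {(v, k, t): comb(v, t) for v, k, t in records}

for (v, k, t), size in sorted(checklist.items(), key=lambda item: item[1]):
    print(f"C({v},{k},{t}): {size:,} {t}-subsets to cover")
```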

Technical claim: A covering design is a compression problem: cover every t-subset with as few k-blocks as possible. We lower four best-known values in the La Jolla Covering Repository, saving 6,440 blocks total. The headline C(92,7,4) row is checked by enumerating all 2,794,155 four-subsets; the larger t=7/8 rows are certified by the standard induction theorem applied to LJCR source coverings.

Result rows:
- C(77,25,8): baseline 179,223 (https://ljcr.dmgordon.org/show_cover.php?v=77&k=25&t=8); Numaro 175,063; delta -4,160; note: R4 theorem
- C(92,7,4): baseline 132,580 (https://ljcr.dmgordon.org/show_cover.php?v=92&k=7&t=4); Numaro 131,122; delta -1,458; note: exhaustive: 2,794,155 subsets
- C(86,23,7): baseline 153,683 (https://ljcr.dmgordon.org/show_cover.php?v=86&k=23&t=7); Numaro 153,047; delta -636; note: R4 theorem
- C(79,21,7): baseline 165,213 (https://ljcr.dmgordon.org/show_cover.php?v=79&k=21&t=7); Numaro 165,027; delta -186; note: R4 theorem

Visual notes:
- Blocks saved: The largest row saves 4,160 blocks; all four rows are strictly below the listed La Jolla value. (https://numaro.tech/assets/research/covering-designs/record-savings.svg)
- The induction witness: Each derived covering is built from two source coverings through the standard R4 construction. (https://numaro.tech/assets/research/covering-designs/r4-source-split.svg)
- Checklist scale: The t=4 row can be enumerated directly. The t=7 and t=8 rows are too large for brute-force enumeration here, so the proof goes through the theorem and the source coverings. (https://numaro.tech/assets/research/covering-designs/coverage-scale.svg)

Method: The campaign parsed the LJCR table, propagated standard covering relations to a fixpoint, and kept the four cells where the current table lagged the induction construction.

Method steps:
- Fetched and parsed 9,482 LJCR entries.
- Applied delete-point, enlarge-block, reduce-t, and R4 induction relations.
- Built each derived covering from the two linked LJCR source coverings.
- Checked nearby mature cells with CP-SAT and simulated annealing; those attempts matched records but did not beat them.

Verification: Each candidate has to have the claimed number of blocks, the right block size, and points in the correct range. For C(92,7,4), every one of the 2,794,155 four-point subsets is checked directly. For the larger rows, the check rebuilds the two source counts and uses the standard induction theorem: subsets containing the new point are covered by one source; subsets not containing it are covered by the other.

Scope: These are best-known improvements, not optimality proofs. LJCR is a strong mature repository; the point here is a narrow propagation gap, not that the table is broadly weak. The underlying source coverings are not claimed as new.

Baseline sources:
- D. Gordon, La Jolla Covering Repository. (https://ljcr.dmgordon.org/)
- La Jolla individual covering lookup. (https://ljcr.dmgordon.org/show_cover.php)

## Covering-array records improving Colbourn's best-known CAN tables

URL: https://numaro.tech/research/covering-arrays-colbourn-2026/
Report: NUMARO-2026-007
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Combinatorial testing
Hero metric: 40 array table improvements

Plain-language summary: Covering arrays are compact test suites: every t-way interaction still appears, but fewer rows means fewer tests. This result matters because it separates a promised size from a built object. The array has to exist before it can be a record.

Key details:
- CA(N;t,k,v) means N rows, k columns, and v symbols. For every choice of t columns, all v^t tuples must appear somewhere.
- The largest margins come from ordered-design constructions; most of the individual cells come from Torres-Jimenez binary arrays.
- Some attractive size predictions were rejected because they did not build at the claimed size. The kept rows are actual arrays.

Hard part: A smaller size prediction is not a record until it is an array. Each candidate had to be materialized as an N x k table, then checked against every C(k,t) column set and every v^t tuple. One missing tuple turns the beat into a non-result.

Technical claim: A covering array CA(N;t,k,v) is an N-row table that covers every t-column symbol tuple. We found 40 explicit arrays with fewer rows than the listed Colbourn CAN values, saving 1,042 rows total. The largest single improvement is CA(14928;3,8,24), down from 15,180 to 14,928.

Result rows:
- CA(14928;3,8,24): baseline 15,180 (https://www.data2intelligence.de/ColbournTables/t3v24.html); Numaro 14,928; delta -252 rows; note: ordered design
- CA(8760;3,8,20): baseline 8,930 (https://www.data2intelligence.de/ColbournTables/t3v20.html); Numaro 8,760; delta -170 rows; note: ordered design
- CA(9330;3,12,20): baseline 9,500 (https://www.data2intelligence.de/ColbournTables/t3v20.html); Numaro 9,330; delta -170 rows; note: ordered design
- CA(6444;3,8,18): baseline 6,579 (https://www.data2intelligence.de/ColbournTables/t3v18.html); Numaro 6,444; delta -135 rows; note: ordered design
- CA(143;4,199,2): baseline 154 (https://www.data2intelligence.de/ColbournTables/t4v2.html); Numaro 143; delta -11 rows; note: Torres-Jimenez
- CA(2215;3,16,10): baseline 2,223 (https://www.data2intelligence.de/ColbournTables/t3v10.html); Numaro 2,215; delta -8 rows; note: CK doubling
- CA(421;6,29,2): baseline 426 (https://www.data2intelligence.de/ColbournTables/t6v2.html); Numaro 421; delta -5 rows; note: Torres-Jimenez
- CA(2615;3,41,10): baseline 2,618 (https://www.data2intelligence.de/ColbournTables/t3v10.html); Numaro 2,615; delta -3 rows; note: Dwyer database

Visual notes:
- Rows saved: The ordered-design rows create the largest margins; the full run contains 40 table improvements. (https://numaro.tech/assets/research/covering-arrays/record-margins.svg)
- Where the savings came from: Ordered design accounts for most rows saved, while Torres-Jimenez accounts for most individual cells. (https://numaro.tech/assets/research/covering-arrays/construction-breakdown.svg)
- One actual witness: This is the stored CA(143;4,199,2) binary array rendered as a barcode, not a schematic. (https://numaro.tech/assets/research/covering-arrays/array-slice-t4-k199-v2.svg)

Method: The campaign compared listed CAN values against arrays obtainable from standard construction catalogues, then built the candidates and kept only arrays that actually checked out.

Method steps:
- Triple-confirmed the Colbourn baseline values from the mirror and CAs package data.
- Used construction catalogues to propose smaller N values.
- Materialized each candidate as an explicit array; predictions that failed to build at the claimed size were dropped.
- Checked every kept array against every column-subset and tuple requirement.

Verification: For each array, the checker reads the N x k table, checks that every entry is in the allowed symbol range, then tries every choice of t columns and confirms that all v^t tuples appear. All 40 kept arrays pass with zero missing requirements.
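
Sketch: the check described above can be illustrated in a few lines of Python. This is a minimal sketch, not the campaign's checker; the function name `is_covering_array` and the list-of-lists input format are assumptions made for illustration, and symbols are assumed to be the integers 0..v-1.

```python
from itertools import combinations

def is_covering_array(rows, t, v):
    """Return True if `rows` (equal-length lists of symbols) is a CA(N;t,k,v).

    Hypothetical helper for illustration; symbols are assumed to be 0..v-1.
    """
    if not rows:
        return False
    k = len(rows[0])
    if t > k:
        return False
    # Every row must have k entries, all inside the allowed symbol range.
    for row in rows:
        if len(row) != k or any(not (0 <= s < v) for s in row):
            return False
    needed = v ** t
    # Try every choice of t columns and collect the distinct tuples seen there.
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        if len(seen) != needed:
            return False
    return True

# A small textbook example: a strength-2 binary array on 3 columns with 4 rows.
example = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
```

Here `is_covering_array(example, 2, 2)` holds, while dropping any row leaves some column pair with a missing tuple. The loop over column subsets is exhaustive, so it is only practical as written for modest k and t.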

Scope: These are best-known improvements, not optimality proofs. The underlying arrays come from published constructions and catalogues. The contribution is detecting the table-beating cells, building the objects, and checking them from scratch.

Baseline sources:
- Colbourn, Covering Array Tables, November 2024 snapshot. (https://www.data2intelligence.de/ColbournTables/)
- Groemping, CAs R package. (https://github.com/ugroempi/CAs)
- Dwyer covering-array database. (https://github.com/aadwyer/CA_Database)

## New lower bounds for Zarankiewicz numbers

URL: https://numaro.tech/research/zarankiewicz-2026/
Report: NUMARO-2026-008
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Extremal combinatorics
Hero metric: 31 lower bounds raised

Plain-language summary: This result matters because every claimed number has to be carried by an explicit matrix. A larger lower bound is not a guess: it is a concrete 0/1 object whose forbidden 3 by 3 pattern can be checked directly.

Key details:
- The largest gain is z(12,21), where the lower bound moves from 116 to 126.
- The headline small witness is z(13,17;3,3) >= 110.
- Two values are pinned exactly because the new lower bound reaches the known upper bound: z(10,20)=102 and z(11,18)=101.

Hard part: The rule is local, but the search is global. A candidate can have many 1s and still fail because three rows share three columns somewhere in the matrix. For z(16,23), that means checking C(16,3) x C(23,3) possible 3 by 3 submatrices. One forbidden block destroys the whole witness.
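
Sketch: the size of the z(16,23) check can be worked out directly. The helper name `submatrix_count` is an assumption made for illustration.

```python
from math import comb

def submatrix_count(m, n, s=3, t=3):
    """Number of s-by-t submatrices of an m-by-n matrix: choose rows, then columns."""
    return comb(m, s) * comb(n, t)

# C(16,3) = 560 row triples and C(23,3) = 1771 column triples.
count_16_23 = submatrix_count(16, 23)
```

For a 16 by 23 matrix this gives 560 x 1,771 = 991,760 candidate blocks, which is small for a verifier but says nothing about the size of the search over matrices.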

Technical claim: The Zarankiewicz number z(m,n;3,3) is the maximum number of 1s in an m by n matrix with no 3 by 3 all-ones submatrix. We give 31 stronger lower bounds, adding 150 total 1s across the checked grid. Two cells reach the known upper bound and are therefore exact: z(10,20;3,3)=102 and z(11,18;3,3)=101.

Result rows:
- z(12,21;3,3): baseline 116 (https://arxiv.org/abs/2605.01120); Numaro 126; delta +10; note: largest gain
- z(12,23;3,3): baseline 125 (https://arxiv.org/abs/2605.01120); Numaro 134; delta +9; note: also follows past a monotone anchor
- z(16,19;3,3): baseline 132 (https://arxiv.org/abs/2605.01120); Numaro 141; delta +9; note: explicit witness
- z(14,23;3,3): baseline 138 (https://arxiv.org/abs/2605.01120); Numaro 146; delta +8; note: explicit witness
- z(10,20;3,3): baseline 99 (https://arxiv.org/abs/2605.01120); Numaro 102; delta +3; note: exact; reaches known upper bound
- z(11,18;3,3): baseline 97 (https://arxiv.org/abs/2605.01120); Numaro 101; delta +4; note: exact; reaches known upper bound

Visual notes:
- Frontier heatmap: Top number is the known upper bound; bottom number is the Numaro lower bound. White cells are exact values. (https://numaro.tech/assets/research/zarankiewicz/frontier-heatmap.svg)
- Actual witness matrices: Three stored 0/1 matrices: the two exact cells and the headline z(13,17) witness. (https://numaro.tech/assets/research/zarankiewicz/witness-matrices.svg)
- Largest moves: The biggest frontier move is z(12,21), from 116 to 126 ones. (https://numaro.tech/assets/research/zarankiewicz/gain-bars.svg)

Method: The campaign used two levers: monotonicity of the table and exact search for K3,3-free matrices. Each new witness was then propagated through the grid where adding a zero row or column preserves validity.

Method steps:
- Read the known lower and upper bounds cell by cell.
- Applied monotonicity: increasing m or n cannot make the best lower bound smaller.
- Solved selected cells with an OR-Tools CP-SAT model for K3,3-free matrices.
- Verified every saved matrix and excluded z(12,17), where the saved witness did not beat the baseline.
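
Sketch: the monotonicity step can be pictured as a single sweep over a table of lower bounds. This is a minimal sketch under stated assumptions: the table is a plain dict keyed by (m, n), the function name `propagate_lower_bounds` is hypothetical, and the only seed value used is the z(12,21) >= 126 bound reported here.

```python
def propagate_lower_bounds(lower, max_m, max_n):
    """Fill a lower-bound table using z(m,n) >= z(m-1,n) and z(m,n) >= z(m,n-1).

    `lower` maps (m, n) to a known lower bound. Missing cells are treated as
    unknown and are only filled when a smaller cell supplies a bound.
    """
    out = dict(lower)
    # Row-major order guarantees (m-1, n) and (m, n-1) are final before (m, n).
    for m in range(1, max_m + 1):
        for n in range(1, max_n + 1):
            best = max(out.get((m, n), 0),
                       out.get((m - 1, n), 0),
                       out.get((m, n - 1), 0))
            if best:
                out[(m, n)] = best
    return out

seed = {(12, 21): 126}
table = propagate_lower_bounds(seed, 13, 22)
```

After the sweep, the cells (12,22), (13,21), and (13,22) all carry the bound 126, because a witness stays valid when a zero row or column is appended. Whether a propagated value beats the published bound for that cell still has to be checked against the baseline table.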

Verification: The checker counts 1s and tests every triple of rows. If any three rows share three columns, the matrix contains a forbidden 3 by 3 all-ones block and the witness fails. The 31 reported matrices pass this direct check.
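
Sketch: a direct version of this check fits in a few lines. This is an illustrative sketch, not the campaign's checker; the function name `check_k33_free` and the list-of-lists matrix format are assumptions.

```python
from itertools import combinations

def check_k33_free(matrix):
    """Return (ones, ok): the number of 1s, and whether no 3 rows share 3 columns."""
    # Encode each row as a bitmask so that shared columns become a bitwise AND.
    masks = [sum(1 << j for j, x in enumerate(row) if x) for row in matrix]
    ones = sum(bin(m).count("1") for m in masks)
    for a, b, c in combinations(masks, 3):
        if bin(a & b & c).count("1") >= 3:
            return ones, False  # a forbidden 3 by 3 all-ones block exists
    return ones, True

full = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
punctured = [[1, 1, 1], [1, 1, 1], [1, 1, 0]]
```

The all-ones 3 by 3 matrix fails, and clearing one entry makes it pass with 8 ones. Appending an all-zero row to a passing matrix keeps it passing with the same count, which is the fact the monotonicity step relies on.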

Scope: Except for z(10,20)=102 and z(11,18)=101, these are improved lower bounds, not exact values. The baseline numbers are the previously published lower and upper bounds used for comparison.

Baseline sources:
- New Bounds for Zarankiewicz Numbers via Reinforced LLM Evolutionary Search, arXiv:2605.01120. (https://arxiv.org/abs/2605.01120)
- OR-Tools CP-SAT. (https://developers.google.com/optimization/cp/cp_solver)

## First-known no-five-on-sphere values for [n]^3

URL: https://numaro.tech/research/no-5-on-sphere-2026/
Report: NUMARO-2026-009
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Discrete geometry
Hero metric: 13 first-known values

Plain-language summary: This result matters because the object is concrete. For each n, the claim is just a list of integer grid points. The set counts only if every group of five points avoids both failure modes: lying on one plane or lying on one sphere.

Key details:
- AlphaEvolve published anchors through n=12. The campaign reproduced those anchors, but did not improve them.
- n=13 reached only the monotone floor from C(12), so it is not claimed.
- The first claimed row is C(14) >= 34, and the largest saved witness is C(26) >= 56.

Hard part: For n=26, the cube has 17,576 possible grid points, and the search is choosing 56 of them. That choice space is astronomically large. The checker is binary, but each candidate has to survive C(56,5) = 3,819,816 five-point tests. One coplanar or cospherical group destroys the whole set.

Technical claim: For grid sizes n=14 through n=26, we give explicit point sets in [n]^3 with no five points on one sphere or plane. AlphaEvolve's problem portfolio published anchors through C(12) >= 33; the n >= 13 range was open. These entries extend the public frontier and are checked from raw integer coordinates.

Result rows:
- C(14): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=34; delta first-known; note: first claimed row
- C(15): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=35; delta first-known; note: integer-coordinate witness
- C(16): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=39; delta first-known; note: integer-coordinate witness
- C(17): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=40; delta first-known; note: integer-coordinate witness
- C(18): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=40; delta first-known; note: integer-coordinate witness
- C(19): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=43; delta first-known; note: integer-coordinate witness
- C(20): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=45; delta first-known; note: integer-coordinate witness
- C(21): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=46; delta first-known; note: integer-coordinate witness
- C(22): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=50; delta first-known; note: integer-coordinate witness
- C(23): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=51; delta first-known; note: integer-coordinate witness
- C(24): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=53; delta first-known; note: integer-coordinate witness
- C(25): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=55; delta first-known; note: integer-coordinate witness
- C(26): baseline none listed (https://arxiv.org/abs/2511.02864); Numaro >=56; delta first-known; note: largest listed n

Visual notes:
- Open frontier: Muted points are the published AlphaEvolve anchors through n=12. White points are Numaro first-known rows for n=14 through n=26. (https://numaro.tech/assets/research/no-5-on-sphere/frontier-sequence.svg)
- Actual C(26) witness: The 56 points are rendered from the saved n=26 coordinate file. The projection is visual only; the check uses exact 3D integer coordinates. (https://numaro.tech/assets/research/no-5-on-sphere/n26-witness.svg)
- Check load: For the n=26 row, the exact check has to reject 3,819,816 possible five-point groups. (https://numaro.tech/assets/research/no-5-on-sphere/subset-scale.svg)

Method: The campaign treated the count of bad five-point groups as the search signal. A candidate is valid only when that count reaches zero.

Method steps:
- Reproduced the published AlphaEvolve anchors C(11) >= 31 and C(12) >= 33 for calibration.
- For each open n, searched random integer point sets of increasing size.
- Used the number of coplanar or cospherical five-point subsets as feedback during search.
- Recorded only coordinate sets whose bad-five count reached zero.

Verification: The check looks at every group of five saved points. It uses an exact integer determinant: if the determinant is zero, those five points lie on one plane or one sphere. The reported sets have zero such groups; no floating-point geometry is used.
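
Sketch: one standard way to write such a test is the 5 by 5 determinant with rows (x^2+y^2+z^2, x, y, z, 1), which vanishes exactly when the five points satisfy a common equation a(x^2+y^2+z^2) + bx + cy + dz + e = 0, that is, when they lie on one sphere or one plane. The report does not state which determinant its checker uses, so this formulation, the cofactor-expansion routine, and the function names are all assumptions made for illustration.

```python
from itertools import combinations

def det(m):
    """Exact integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def is_bad_five(points):
    """True if five integer points are cospherical or coplanar."""
    rows = [[x * x + y * y + z * z, x, y, z, 1] for x, y, z in points]
    return det(rows) == 0

def count_bad_fives(points):
    """Number of five-point subsets lying on one sphere or one plane."""
    return sum(is_bad_five(group) for group in combinations(points, 5))

coplanar = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0)]
cospherical = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
generic = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 3, 5)]
```

Python integers are arbitrary precision, so the zero test is exact. Cofactor expansion is chosen here for clarity; a checker running millions of tests would likely use a faster exact elimination.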

Scope: These are first-known lower bounds for open n values, not optimality proofs. They do not beat AlphaEvolve's published C(11) or C(12) values, and n=13 is not claimed because the search did not move past the monotone floor.

Baseline sources:
- Georgiev, Gomez-Serrano, Tao, Wagner, AlphaEvolve problem portfolio, arXiv:2511.02864, Problem 6.60. (https://arxiv.org/abs/2511.02864)
- DeepMind AlphaEvolve overview. (https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/)

## First-known no-isosceles grid sets for open n values

URL: https://numaro.tech/research/no-isosceles-grid-2026/
Report: NUMARO-2026-010
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Discrete geometry
Hero metric: 54 first-known grid values

Plain-language summary: This result matters because the rule is local and exact. A set is valid only if no chosen point sees two other chosen points at the same squared distance. That one repeated distance would immediately create an isosceles triangle.

Key details:
- AlphaEvolve published strong sets at n=64 and n=100, but not the open n values reported here.
- The 54 claimed rows are the evolve_sweep records in RECORDS.jsonl.
- The largest reported sets are n=77 and n=79, each with 116 selected points.

Hard part: For n=77, the grid has 5,929 possible points and the set keeps 116. The verifier is fast, but the search is brittle: around every selected point, all 115 squared distances to the other selected points must be different. One repeated distance around any anchor creates an isosceles triangle.

Technical claim: We report no-isosceles-triangle subsets of the n by n grid for 54 open n values from 22 through 79. These are first-known lower bounds for untabulated sizes, with sizes up to 116. The published AlphaEvolve n=64 and n=100 results are not beaten.

Result rows:
- n=22: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 36; delta first-known; note: first claimed row
- n=23: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 40; delta first-known; note: integer-coordinate set
- n=24: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 40; delta first-known; note: integer-coordinate set
- n=25: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 40; delta first-known; note: integer-coordinate set
- n=26: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 42; delta first-known; note: integer-coordinate set
- n=27: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 44; delta first-known; note: integer-coordinate set
- n=29: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 48; delta first-known; note: integer-coordinate set
- n=31: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 52; delta first-known; note: integer-coordinate set
- n=32: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 52; delta first-known; note: integer-coordinate set
- n=33: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 52; delta first-known; note: integer-coordinate set
- n=35: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 56; delta first-known; note: integer-coordinate set
- n=36: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 56; delta first-known; note: integer-coordinate set
- n=37: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 64; delta first-known; note: integer-coordinate set
- n=38: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 60; delta first-known; note: integer-coordinate set
- n=39: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 60; delta first-known; note: integer-coordinate set
- n=40: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 64; delta first-known; note: integer-coordinate set
- n=41: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 64; delta first-known; note: integer-coordinate set
- n=42: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 64; delta first-known; note: integer-coordinate set
- n=43: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 68; delta first-known; note: integer-coordinate set
- n=44: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 68; delta first-known; note: integer-coordinate set
- n=45: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 72; delta first-known; note: integer-coordinate set
- n=46: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 70; delta first-known; note: integer-coordinate set
- n=47: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 72; delta first-known; note: integer-coordinate set
- n=48: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 74; delta first-known; note: integer-coordinate set
- n=49: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 76; delta first-known; note: integer-coordinate set
- n=50: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 78; delta first-known; note: integer-coordinate set
- n=51: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 80; delta first-known; note: integer-coordinate set
- n=52: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 80; delta first-known; note: integer-coordinate set
- n=53: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 84; delta first-known; note: integer-coordinate set
- n=54: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 80; delta first-known; note: integer-coordinate set
- n=55: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 84; delta first-known; note: integer-coordinate set
- n=56: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 84; delta first-known; note: integer-coordinate set
- n=57: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 88; delta first-known; note: integer-coordinate set
- n=58: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 92; delta first-known; note: integer-coordinate set
- n=59: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 92; delta first-known; note: integer-coordinate set
- n=60: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 92; delta first-known; note: integer-coordinate set
- n=61: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 92; delta first-known; note: integer-coordinate set
- n=62: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 92; delta first-known; note: integer-coordinate set
- n=63: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 96; delta first-known; note: neighbor of published n=64
- n=65: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 96; delta first-known; note: neighbor of published n=64
- n=66: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 100; delta first-known; note: integer-coordinate set
- n=67: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 104; delta first-known; note: integer-coordinate set
- n=68: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 104; delta first-known; note: integer-coordinate set
- n=69: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 104; delta first-known; note: integer-coordinate set
- n=70: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 104; delta first-known; note: integer-coordinate set
- n=71: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 112; delta first-known; note: integer-coordinate set
- n=72: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 112; delta first-known; note: integer-coordinate set
- n=73: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 112; delta first-known; note: integer-coordinate set
- n=74: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 112; delta first-known; note: integer-coordinate set
- n=75: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 112; delta first-known; note: integer-coordinate set
- n=76: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 112; delta first-known; note: integer-coordinate set
- n=77: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 116; delta first-known; note: largest value in table
- n=78: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 112; delta first-known; note: integer-coordinate set
- n=79: baseline none listed (https://arxiv.org/abs/2511.02864); Numaro 116; delta first-known; note: largest value in table

Visual notes:
- Open grid sizes: White points are Numaro first-known rows. Hollow points are AlphaEvolve's published n=64 and n=100 rows, which are not beaten. (https://numaro.tech/assets/research/no-isosceles-grid/open-frontier.svg)
- Actual n=77 set: The 116 points are rendered from RECORDS.jsonl. The dashed midlines show the symmetry-preserving search space. (https://numaro.tech/assets/research/no-isosceles-grid/n77-witness.svg)
- Distance test: For one anchor, each squared distance is unique. The checker repeats the same exact test around every selected point. (https://numaro.tech/assets/research/no-isosceles-grid/distance-rings.svg)

Method: The campaign improved the generator of point sets, not just one fixed set. The best rows came from an evolved construction that keeps symmetry while adding and removing whole orbits of points.

Method steps:
- Loaded and checked the published AlphaEvolve n=64 and n=100 reference sets for calibration.
- Observed that direct point-level search plateaued at lower density.
- Evolved construction parameters for symmetry group, axis, ruin fraction, local search, and candidate count.
- Kept the 54 evolve_sweep rows whose saved point sets pass the exact no-isosceles check.

Verification: For each selected point P, the checker computes the squared distance from P to every other selected point. If a squared distance appears twice, P and those two points form an isosceles triangle, so the set fails. All 54 reported evolve_sweep rows pass this exact integer check.
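
Sketch: the check as described translates almost word for word into code. This is an illustrative sketch, not the campaign's checker; the function name `has_isosceles` and the list-of-pairs input are assumptions. Following the description above, a point that is the midpoint of two others also counts as a failure.

```python
def has_isosceles(points):
    """True if some point has two other points at the same squared distance."""
    for i, (px, py) in enumerate(points):
        seen = set()
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            d2 = (px - qx) ** 2 + (py - qy) ** 2  # exact integer arithmetic
            if d2 in seen:
                return True
            seen.add(d2)
    return False

right_angle = [(0, 0), (1, 0), (0, 1)]   # apex (0,0) sees distance 1 twice
spread_out = [(0, 0), (1, 0), (3, 1)]    # all three apex tests pass
```

The cost is quadratic in the number of selected points, so even a 116-point set needs only 116 x 115 distance computations.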

Scope: These are first-known lower bounds for open n values, not optimality proofs. They do not beat AlphaEvolve's published n=64 and n=100 values. The open sizes missing from this sweep are n=28, n=30, and n=34; n=64 already had a published AlphaEvolve row.

Baseline sources:
- Georgiev, Gomez-Serrano, Tao, Wagner, AlphaEvolve problem portfolio, arXiv:2511.02864, Problem 6.59. (https://arxiv.org/abs/2511.02864)
- DeepMind AlphaEvolve overview. (https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/)

## DNA storage codes with larger codebooks

URL: https://numaro.tech/research/dna-storage-codes-2026/
Report: NUMARO-2026-011
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Coding theory
Hero metric: 16 larger DNA codebooks

Plain-language summary: A DNA storage code is useful only if many words can be packed into the same constraint set. More words means more possible stored symbols at the same word length and error tolerance. The hard part is that the literature mixes two GC rules: strict GC uses exactly n/2 G/C bases, while GC bands allow a wider set of words. Those are different problems, so this page separates them instead of treating every larger number as comparable.

Key details:
- The largest margin is GCNL (10,4): 581 to 657 under the strict-GC no-runlength/Hamming convention.
- The largest all-convention row is King's reverse-complement (8,3): 390 to 426.
- Eight rows beat every published value under any GC convention. The remaining eight are strict-GC improvements only, and the table says so.

Hard part: The search object is not one DNA word; it is a whole set of words that must all remain compatible at once. For the saved GCNL (10,4) code, the checker compares C(657,2)=215,496 word pairs, and one bad pair would invalidate the code. The subtle part is comparison: a strict-GC improvement can be real and still not be comparable to a wider-GC-band table.

Technical claim: DNA storage codes are sets of A/C/G/T words that must stay far apart under Hamming distance while also obeying biochemical constraints such as fixed GC content, no repeated adjacent bases, or reverse-complement separation. We improve 16 published best-known code sizes: 3 in King's reverse-complement family and 13 in the constant-GC no-runlength Hamming family. Eight rows beat every published value under any GC convention; the other eight improve the strict GC = n/2 table. Two strict-GC cells are proven optimal.

Result rows:
- GCNL (10,4): baseline 581 (https://doi.org/10.1109/LCOMM.2018.2865308); Numaro 657; delta +76; note: strict GC; largest margin
- GCRC (8,3): baseline 390 (https://doi.org/10.37236/1726); Numaro 426; delta +36; note: all conventions; King's reverse-complement table
- GCNL (10,5): baseline 168 (https://pmc.ncbi.nlm.nih.gov/articles/PMC8129200/); Numaro 190; delta +22; note: strict GC
- GCNL (12,9): baseline 9 (https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1158337/full); Numaro 20; delta +11; note: all conventions
- GCNL (10,6): baseline 57 (https://doi.org/10.1109/LCOMM.2018.2865308); Numaro 67; delta +10; note: strict GC
- GCNL (8,3): baseline 373 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9671426/); Numaro 383; delta +10; note: strict GC
- GCNL (8,4): baseline 119 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9671426/); Numaro 128; delta +9; note: strict GC
- GCNL (10,8): baseline 10 (https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1158337/full); Numaro 16; delta +6; note: all conventions
- GCNL (10,7): baseline 23 (https://pmc.ncbi.nlm.nih.gov/articles/PMC8129200/); Numaro 28; delta +5; note: all conventions
- GCNL (12,10): baseline 6 (https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1158337/full); Numaro 9; delta +3; note: all conventions
- GCNL (6,3): baseline 60 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9671426/); Numaro 62; delta +2; note: strict GC; proven optimal
- GCNL (8,5): baseline 40 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9671426/); Numaro 42; delta +2; note: strict GC; ties widest band
- GCRC (6,3): baseline 41 (https://doi.org/10.37236/1726); Numaro 43; delta +2; note: all conventions; King's reverse-complement table
- GCRC (8,5): baseline 26 (https://doi.org/10.37236/1726); Numaro 28; delta +2; note: all conventions; King's reverse-complement table
- GCNL (6,4): baseline 27 (https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1158337/full); Numaro 28; delta +1; note: all conventions; proven optimal strict
- GCNL (8,6): baseline 16 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9671426/); Numaro 17; delta +1; note: strict GC

Visual notes:
- Record margins: White bars are rows that beat every published GC convention. Gray bars are strict-GC improvements, where wider GC-band tables are a different comparison. (https://numaro.tech/assets/research/dna-codes/record-margins.svg)
- The GC convention trap: For length 10, strict GC uses only GC=5 words. A GC band also includes GC=4 and GC=6 words, so it is a larger search space. (https://numaro.tech/assets/research/dna-codes/gc-convention.svg)
- Actual codewords: The largest saved code in the set has 657 length-10 DNA words. Each row is one word, drawn base by base. (https://numaro.tech/assets/research/dna-codes/codeword-barcode.svg)

Method: Each cell becomes a graph problem. A vertex is an admissible DNA word. Two vertices are connected only if the two words can live in the same code. A large code is a large clique in that graph.

Method steps:
- Compiled the prior values while recording the exact GC rule used by each source.
- Generated every admissible word for the requested length, distance, GC rule, and no-runlength or reverse-complement rule.
- Searched the compatibility graph with a bitmask clique solver; small cells were also closed with exact CP-SAT.
- Dropped apparent wins when the baseline used a wider GC band and the numbers were not comparable.
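
Sketch: the first two modelling steps can be illustrated on a tiny cell. The code below enumerates admissible strict-GC, no-runlength words and then grows a code greedily. The greedy pass is a deliberately simple stand-in, not the bitmask clique solver or the CP-SAT model used in the campaign, and the function names are assumptions made for illustration.

```python
from itertools import combinations, product

def hamming(a, b):
    """Hamming distance between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def admissible_words(n):
    """All length-n words with exactly n/2 G/C bases and no repeated adjacent base."""
    words = []
    for letters in product("ACGT", repeat=n):
        w = "".join(letters)
        gc = w.count("G") + w.count("C")
        if 2 * gc == n and all(a != b for a, b in zip(w, w[1:])):
            words.append(w)
    return words

def greedy_code(words, d):
    """Greedy clique in the compatibility graph: keep a word if it fits all kept words."""
    code = []
    for w in words:
        if all(hamming(w, c) >= d for c in code):
            code.append(w)
    return code

vertices = admissible_words(4)
code = greedy_code(vertices, 3)
```

Every pair in `code` is an edge of the compatibility graph by construction, so the result is a valid code, but a greedy pass gives no guarantee of reaching the maximum clique.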

Verification: The checker rereads each saved code and tests the rules directly: every word has the right length and alphabet, the GC count is correct, no adjacent bases repeat where that rule applies, all words are distinct, every pair is far enough apart, and reverse-complement distances are checked for the reverse-complement family.
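
Sketch: a checker of this kind can be written as one function with a flag per rule. This is a minimal sketch under stated assumptions: the function name `check_code` and its parameters are hypothetical, strict GC is taken to mean exactly n/2 G/C bases, and the reverse-complement rule is assumed to require distance at least d between every word and the reverse complement of every word, including its own. The report does not spell out that last convention.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hamming(a, b):
    """Hamming distance between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def reverse_complement(word):
    """Complement each base (A<->T, C<->G), then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]

def check_code(words, n, d, no_runs=False, rc=False):
    """Check length, alphabet, strict GC, distinctness, distance, and optional rules."""
    if len(set(words)) != len(words):
        return False
    for w in words:
        if len(w) != n or set(w) - set("ACGT"):
            return False
        if 2 * (w.count("G") + w.count("C")) != n:
            return False
        if no_runs and any(a == b for a, b in zip(w, w[1:])):
            return False
    if any(hamming(a, b) < d for a, b in combinations(words, 2)):
        return False
    # Assumed reverse-complement rule: every ordered pair, including a word with itself.
    if rc and any(hamming(a, reverse_complement(b)) < d for a in words for b in words):
        return False
    return True
```

For example, the pair ACAC and GTGT passes the strict-GC no-runlength check at distance 3, but fails the reverse-complement check because GTGT is the reverse complement of ACAC.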

Scope: Only rows compared under the same convention are claimed. Most rows are best-known improvements, not optimality proofs. The proven-optimal statements are only for the strict-GC n=6 cells. Wider GC bands are a different, easier problem because they allow more candidate words.

Baseline sources:
- King, Bounds for DNA codes with constant GC-content, 2003. (https://doi.org/10.37236/1726)
- Limbachiya, Dey, Gupta, DNA codes with constant GC-content and no runlength. (https://doi.org/10.1109/LCOMM.2018.2865308)
- CLGBO, DNA storage coding through graph-based optimization. (https://pmc.ncbi.nlm.nih.gov/articles/PMC8129200/)
- LEO, DNA coding with constant GC-content and no-runlength constraints. (https://pmc.ncbi.nlm.nih.gov/articles/PMC9671426/)
- MFOS, Frontiers in Genetics 2023, for banded comparisons. (https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1158337/full)

## Constant-weight codes: one propagation beat and exact DCW fills

URL: https://numaro.tech/research/constant-weight-codes-2026/
Report: NUMARO-2026-012
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Coding theory
Hero metric: 442 exact DCW values

Plain-language summary: A constant-weight code is a set of binary words with the same number of ones, kept far apart by Hamming distance. The main table is strong: direct search did not beat it. The one improvement came from a simple rule the table had not propagated, while the related doubly-constant-weight object had upper bounds but no comparable construction table.

Key details:
- The record beat is A(29,8,6) >= 131, compared with the listed 130.
- The construction is simple: append a zero coordinate to Rosin's A(28,8,6)=131 code.
- The 442 doubly-constant-weight values are first-recorded exact values, not beats of a lower-bound table.

Hard part: The verifier is simple but the search space is combinatorial. A code of size 131 requires all C(131,2)=8,515 word pairs to stay at distance at least 8 while every word has exactly six ones. The honest result is that the main table was mostly too strong to out-search.

Technical claim: We improve one listed constant-weight code value, A(29,8,6), by propagating a fresh A(28,8,6)=131 result that had not flowed into a frozen sub-table. We also materialize and prove 442 exact doubly-constant-weight values for an object that had upper bounds but no construction table.

Result rows:
- A(29,8,6): baseline 130 (https://www.win.tue.nl/~aeb/codes/Andw.html); Numaro >=131; delta +1; note: append-0 from A(28,8,6)=131
- T(2,10,2,10,6): baseline upper bound 35 (https://codes.se/bounds/dcw.html); Numaro 25 exact; delta first-recorded; note: proven below the published upper bound
- T(2,6,2,15,4): baseline upper bound 120 (https://codes.se/bounds/dcw.html); Numaro 105 exact; delta first-recorded; note: proven below the published upper bound
- T exact fills: baseline upper bounds only (https://codes.se/bounds/dcw.html); Numaro 442 values; delta new table; note: 404 meet upper bound; 38 below it
- Main A(n,d,w) sweep: baseline Brouwer (https://www.win.tue.nl/~aeb/codes/Andw.html); Numaro 0 search beats; delta match-hard; note: record is propagation only

Visual notes:
- Propagation lag: The record beat is not a mysterious search jump: A(28,8,6)=131 lifts to A(29,8,6) by appending a zero coordinate. (https://numaro.tech/assets/research/constant-weight/propagation-lag.svg)
- DCW gaps: The doubly-constant-weight rows are fills, not beats of a lower-bound table. These examples are exact values below the published upper bound. (https://numaro.tech/assets/research/constant-weight/dcw-gaps.svg)

Method: The campaign first closed simple monotonicity relations in the constant-weight table, then used exact max-clique on the doubly-constant-weight object.

Method steps:
- Checked A(n,d,w) propagation by appending constant 0 or 1 coordinates.
- Found the single unpropagated A(29,8,6) cell.
- Modeled doubly-constant-weight cells as maximum cliques and solved them with CP-SAT.
- Ran a broader exact/heuristic sweep on the main table and recorded that it did not beat.

Verification: verify.py brute-force checks binary length, weights, distinctness, and all pairwise Hamming distances. Exactness of the DCW values rests on CP-SAT OPTIMAL status.
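
Sketch: the brute-force check and the append-0 propagation are both short. The code below is an illustration, not the contents of verify.py; the function names and the use of 0/1 strings for codewords are assumptions.

```python
from itertools import combinations

def is_constant_weight_code(words, n, d, w):
    """Check binary length n, weight w, distinctness, and pairwise distance >= d."""
    if len(set(words)) != len(words):
        return False
    for word in words:
        if len(word) != n or set(word) - {"0", "1"} or word.count("1") != w:
            return False
    return all(
        sum(a != b for a, b in zip(x, y)) >= d
        for x, y in combinations(words, 2)
    )

def append_zero(words):
    """Lift a length-n code to length n+1 by appending a constant 0 coordinate."""
    return [word + "0" for word in words]

toy = ["1100", "0011"]      # a toy code with n=4, d=4, w=2
lifted = append_zero(toy)   # same size, same weight, same distances, n=5
```

Appending a constant coordinate changes neither the weights nor any pairwise distance, which is why a code of size 131 at length 28 yields a code of the same size at length 29.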

Scope: A(29,8,6)>=131 is best-known, not optimal. The 442 DCW values are fills for an object without prior lower-bound tables, not record beats.

Baseline sources:
- Brouwer, Table of constant-weight binary codes A(n,d,w). (https://www.win.tue.nl/~aeb/codes/Andw.html)
- Rosin, improved constant-weight code lower bounds, 2026. (https://arxiv.org/abs/2603.00174)
- Agrell et al., doubly-constant-weight code upper bounds. (https://codes.se/bounds/dcw.html)

## qLDPC distances: exact certificates, zero corrections

URL: https://numaro.tech/research/qldpc-distance-certificates-2026/
Report: NUMARO-2026-013
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Quantum error correction
Hero metric: 19 exact certificates

Plain-language summary: Quantum LDPC distance estimates are often produced by heuristic decoders. Exact certification is useful because it separates a proved distance from a decoder estimate. Here the risk was that a heuristic estimate overstates the true distance, but the accessible published coprime BB/GB rows held.

Key details:
- The campaign found no overstated distances in this target set.
- For 19 codes, CP-SAT proves the exact distance equals the published distance.
- For nine harder codes, the report gives only the proven lower bound and the best logical operator found.

Hard part: Minimum distance asks for the lightest nontrivial logical operator. A heuristic decoder can miss a low-weight logical, so an exact certificate must prove that no smaller one exists. Nine larger codes timed out; those rows are reported as bounds, not guessed exact values.

Technical claim: We certify exact minimum distances for 19 published coprime bivariate/generalized-bicycle quantum LDPC codes whose distances were previously BP-OSD estimates. Every certified value matched the published value. Nine larger codes remain bounded but not fully certified.

Result rows:
- [[18,4,4]]: baseline 4 BP-OSD (https://arxiv.org/abs/2408.10001); Numaro 4 exact; delta confirmed; note: BB family
- [[48,4,8]]: baseline 8 BP-OSD (https://arxiv.org/abs/2408.10001); Numaro 8 exact; delta confirmed; note: GB family
- [[66,4,10]]: baseline 10 BP-OSD (https://arxiv.org/abs/2408.10001); Numaro 10 exact; delta confirmed; note: GB family
- [[196,18,8]]: baseline 8 BP-OSD (https://arxiv.org/abs/2408.10001); Numaro 8 exact; delta confirmed; note: BB family
- 9 harder codes: baseline published (https://arxiv.org/abs/2408.10001); Numaro bounds only; delta uncertified; note: no overstatement evidence

Visual notes:
- Certificate status: The exact solver confirmed 19 published distances, timed out on 9 larger cases, and found zero lower-distance corrections. (https://numaro.tech/assets/research/qldpc/certificate-status.svg)
- Distance ladder: Each point is a published coprime BB/GB code whose BP-OSD distance estimate was proved exact. (https://numaro.tech/assets/research/qldpc/distance-ladder.svg)

Method: The campaign rebuilt each CSS code from published construction polynomials and solved the minimum logical-operator problem exactly where feasible.

Method steps:
- Implemented GF(2) linear algebra for CSS validity, ranks, and logical spaces.
- Calibrated on textbook codes and known qLDPC anchors.
- Modeled minimum-weight logical search with CP-SAT.
- Separated exact certificates from timed-out bound-only cases.
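
The first method step can be sketched as follows, assuming parity-check rows are stored as integer bitmasks. This is an illustrative reconstruction, not the campaign's code: it checks the CSS commutation condition and derives the number of logical qubits k = n - rank(H_X) - rank(H_Z).

```python
def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]  # cancel the leading bit and keep reducing
    return len(pivots)

def is_css_valid(hx, hz):
    """CSS condition: every X check overlaps every Z check on an even set."""
    return all(bin(a & b).count("1") % 2 == 0 for a in hx for b in hz)

def num_logical_qubits(hx, hz, n):
    """k = n - rank(H_X) - rank(H_Z) for a valid CSS pair."""
    if not is_css_valid(hx, hz):
        raise ValueError("H_X and H_Z do not commute")
    return n - gf2_rank(hx) - gf2_rank(hz)

# Textbook calibration examples (not campaign targets).
steane = [sum(1 << i for i in range(7) if ((i + 1) >> j) & 1) for j in range(3)]
print(num_logical_qubits(steane, steane, 7))      # 1, the [[7,1,3]] Steane code
print(num_logical_qubits([0b1111], [0b1111], 4))  # 2, the [[4,2,2]] code
```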

Verification: verify.py rebuilds H_X and H_Z, checks CSS validity and parameters, reruns the distance proof, and re-exhibits each witnessing logical operator.

Scope: This is not a new-code result and not a record beat. It confirms 19 distances and reports bounds for unresolved larger cases.

Baseline sources:
- Wang and Mueller, Coprime Bivariate Bicycle Codes. (https://arxiv.org/abs/2408.10001)
- IBM qLDPC / LLM-evolution calibration reference. (https://arxiv.org/abs/2606.02418)
- QDistSAT exact qLDPC distance certification benchmark. (https://arxiv.org/abs/2606.12445)

## Learned mRNA coding-sequence designers are Pareto-suboptimal on their own objective

URL: https://numaro.tech/research/mrna-codon-structure-design-2026/
Report: NUMARO-2026-014
Date: 2026-07-03
Authors: Numaro Autoresearch Team
Domain: Sequence design
Hero metric: 90/90 dominations reverified

Plain-language summary: Learned sequence designers are often evaluated on deterministic proxy objectives before wet-lab validation. If a simple deterministic baseline dominates a learned method on its own proxy, the learned method is leaving objective value on the table. The result stays inside that proxy boundary.

Key details:
- The comparison uses translation correctness, MFE/EFE, CAI, uridine fraction, and forbidden motifs as deterministic scores.
- The strongest warning is also in the result: better proxy values do not imply better expression in cells.
- LinearDesign is treated as an honest wall. It is exact on its own MFE/CAI objective and is not dominated.

Hard part: A protein has many synonymous mRNA encodings, but every candidate must translate back to the same protein. A win only counts if the sequence is no worse on every scored axis and strictly better on at least one. The biological caveat is load-bearing: this is a proxy-objective result, not an expression claim.
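
The domination rule above can be written as a short predicate. This is a sketch: the axis names, the optimization direction assigned to each axis, and the numeric values are illustrative assumptions, not figures from the report.

```python
def dominates(candidate, baseline, directions):
    """True if candidate is no worse on every axis and strictly better on
    at least one. `directions` maps each axis to 'min' or 'max'."""
    strictly_better = False
    for axis, direction in directions.items():
        c, b = candidate[axis], baseline[axis]
        if direction == "max":
            c, b = -c, -b  # flip so that smaller is always better
        if c > b:
            return False   # worse on one axis: no domination
        if c < b:
            strictly_better = True
    return strictly_better

# Assumed directions for the proxy axes (hypothetical labels).
DIRECTIONS = {"mfe": "min", "cai": "max", "uridine": "min", "motifs": "min"}

baseline = {"mfe": -30.0, "cai": 0.80, "uridine": 0.25, "motifs": 1}
better   = {"mfe": -32.5, "cai": 0.80, "uridine": 0.24, "motifs": 0}
tradeoff = {"mfe": -35.0, "cai": 0.75, "uridine": 0.24, "motifs": 0}

print(dominates(better, baseline, DIRECTIONS))    # True
print(dominates(tradeoff, baseline, DIRECTIONS))  # False: CAI got worse
print(dominates(baseline, baseline, DIRECTIONS))  # False: a tie is not a win
```

The comparison is exact; a real verifier working with floating-point folding energies would need to decide how to treat ties within numerical tolerance.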

Technical claim: On the public 55-human-protein LinearDesign/CodonRL benchmark, deterministic sequence design plus verified polish produces translation-correct mRNA coding sequences that Pareto-dominate CodonRL on 54/54 usable proteins and GEMORNA on 35/54. LinearDesign itself is not dominated.

Result rows:
- CodonRL: baseline released soup55 (https://github.com/MinkaiXu/CodonRL); Numaro 54/54 dominated; delta all usable proteins; note: MFE, CAI, uridine
- GEMORNA: baseline repo sequences (https://github.com/RainaBio/GEMORNA); Numaro 35/54 dominated; delta proxy improvement; note: 4-axis objective
- EnsembleDesign: baseline EFE specialist (https://github.com/LinearFold/EnsembleDesign); Numaro 1 dominated; delta mostly ties; note: honest wall
- LinearDesign: baseline exact MFE/CAI (https://www.nature.com/articles/s41586-023-06127-z); Numaro 0 dominated; delta not beaten; note: exact wall
- Verifier: baseline recorded dominations; Numaro 90/90 pass; delta confirmed; note: translation and scores rechecked

Visual notes:
- Domination counts: CodonRL and GEMORNA are dominated on their stated proxy axes; LinearDesign is not, because it is exact on its own MFE/CAI objective. (https://numaro.tech/assets/research/mrna/domination-summary.svg)
- Average movement: On dominated GEMORNA rows, the proxy movement is not one-dimensional: MFE, CAI, uridine, and motifs all move in the allowed direction on average. (https://numaro.tech/assets/research/mrna/gemorna-average-gains.svg)

Method: The campaign combined deterministic seeds from exact or simple objectives with a ViennaRNA-in-the-loop local polish, accepting only candidates that re-scored as true Pareto improvements.

Method steps:
- Reproduced the scoring conventions used by the compared tools.
- Generated seeds from maximum-CAI, minimum-uridine, and LinearDesign-guided sequences.
- Polished candidates while rechecking translation and all objective values.
- Separated proxy-objective statements from biological-expression claims.

Verification: verify.py checks translation to the target protein, recomputes all deterministic scores, and rechecks each Pareto-domination claim from raw FASTA sequences.
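
A hedged sketch of the translation check and one deterministic score, uridine fraction, using the standard genetic code. The function names and the rule of allowing one trailing stop codon are assumptions for illustration. MFE and CAI are omitted because they need a folding engine and a reference codon-usage table.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds):
    """Translate a coding sequence (DNA or RNA letters) codon by codon."""
    rna = cds.upper().replace("T", "U")
    if len(rna) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))

def encodes(cds, protein):
    """True if cds translates to protein, ignoring one trailing stop codon."""
    aa = translate(cds)
    if aa.endswith("*"):
        aa = aa[:-1]
    return aa == protein  # an internal stop leaves a '*' and fails the match

def uridine_fraction(cds):
    """Fraction of positions that are U (or T in DNA notation)."""
    rna = cds.upper().replace("T", "U")
    return rna.count("U") / len(rna)

# Two synonymous encodings of the toy peptide "MA".
print(encodes("AUGGCUUAA", "MA"), encodes("AUGGCCUAA", "MA"))  # True True
print(uridine_fraction("AUGGCUUAA") > uridine_fraction("AUGGCCUAA"))  # True
```

Both toy sequences translate to the same peptide, but the second uses fewer uridines. That is the kind of synonymous move a polish step can accept when every other axis is no worse.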

Scope: This is a computational proxy result only. It is not an in-cell expression claim, not a safety claim, and not a claim that lower MFE is always biologically better.

Baseline sources:
- LinearDesign, Nature 2023. (https://www.nature.com/articles/s41586-023-06127-z)
- CodonRL benchmark and released checkpoint. (https://github.com/MinkaiXu/CodonRL)
- GEMORNA repository. (https://github.com/RainaBio/GEMORNA)
- EnsembleDesign repository. (https://github.com/LinearFold/EnsembleDesign)

## Walshaw graph partitioning: no record beat, useful calibration

URL: https://numaro.tech/research/walshaw-graph-partitioning-2026/
Report: NUMARO-CAL-2026-001
Date: 2026-06-30
Authors: Numaro Autoresearch Team
Domain: Graph partitioning
Hero metric: 0 record beats claimed

Plain-language summary: Negative results matter when they explain the boundary of the system. Here the benchmark is not soft: it is maintained by strong search methods, and the same operators that set the records find stable local optima. The useful output is a calibrated verifier and a map of why the table did not move.

Key details:
- The verifier reproduces downloaded archive partitions exactly, including cuts and balance.
- The archive is closed under the simple monotonicity checks that produced free wins in other domains.
- The report points to future soft targets but does not package any cell as a win.
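
One plausible form of a monotonicity check, sketched under an assumption about what the report means by the term: for a fixed graph and k, any partition legal at a tighter imbalance is also legal at a looser one, so the best cut should never increase as eps grows. The table below is hypothetical toy data, not archive values.

```python
def monotonicity_violations(records):
    """records maps (graph, k, eps_percent) -> best known cut.
    Returns cells whose cut exceeds a cut recorded at a tighter eps."""
    groups = {}
    for (graph, k, eps), cut in records.items():
        groups.setdefault((graph, k), []).append((eps, cut))

    violations = []
    for (graph, k), cells in groups.items():
        best = None  # (cut, eps) of the best cell seen at a tighter eps
        for eps, cut in sorted(cells):
            if best is not None and cut > best[0]:
                # The tighter-eps partition could be copied here for a free win.
                violations.append((graph, k, eps, cut, best[1], best[0]))
            if best is None or cut < best[0]:
                best = (cut, eps)
    return violations

# Hypothetical table: the eps=3% cell is worse than the eps=1% cell.
toy = {
    ("toy", 2, 0): 100,
    ("toy", 2, 1): 98,
    ("toy", 2, 3): 99,
    ("toy", 2, 5): 95,
}
print(monotonicity_violations(toy))  # [('toy', 2, 3, 99, 1, 98)]
```

An archive that is closed under this check returns an empty list, which matches the report's finding that no free wins were available.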

Hard part: A balanced graph partition has two constraints at once: small cut and legal part sizes. The archive records are search-dominated, so reproducing them is easy evidence of calibration but not evidence of a soft target. The honest output is zero claimed beats.

Technical claim: We attacked the Walshaw Graph-Partitioning Archive with KaHIP-family search and exact boundary ILP refinement. No record was beaten. The campaign reproduced more than 35 records exactly, calibrated a standalone verifier, and mapped why the benchmark is hard.

Result rows:
- 3elt k=2 eps=5%: baseline 87 (https://chriswalshaw.co.uk/partition/); Numaro 87; delta matched; note: archive partition verified
- 4elt k=8 eps=0%: baseline 545 (https://chriswalshaw.co.uk/partition/); Numaro 545; delta matched; note: perfect balance
- whitaker3 k=32 eps=0%: baseline 1668 (https://chriswalshaw.co.uk/partition/); Numaro 1668; delta matched; note: archive partition verified
- large Jet cells: baseline archive (https://chriswalshaw.co.uk/partition/); Numaro +1-4.5%; delta not beaten; note: frontier remained out of reach

Visual notes:
- Exact reproductions: The verifier reproduces archive cuts on known partitions. That calibrates the checker before any attack claim. (https://numaro.tech/assets/research/walshaw/archive-reproductions.svg)
- No-beat routes: Monotonicity checks, record-seeded refinement, and exact local search found no soft cell. The remaining frontier gaps stayed positive. (https://numaro.tech/assets/research/walshaw/negative-routes.svg)

Method: The pipeline combined archive parsing, KaHIP/KaFFPaE memetic partitioning, seeded reruns, monotonicity checks, and exact boundary-region CP-SAT refinement.

Method steps:
- Rebuilt the exact Walshaw balance rule from archive tables.
- Reproduced downloaded best partitions with an independent verifier.
- Ran strong partitioners and exact local ILP refinement.
- Recorded why each route matched or fell short of the archive record rather than beating it.

Verification: verify.py recomputes edge cuts and epsilon-balance directly from .graph and partition files, calibrated against archive downloads.
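
A minimal sketch of the two quantities being recomputed, assuming unit vertex weights and an edge list already parsed from the .graph file. The balance bound used here, floor((1 + eps) * ceil(n / k)), is an assumption; the report rebuilt the exact rule from archive tables, and that rule is not reproduced in this summary.

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])

def is_balanced(part, k, eps_percent):
    """True if no part exceeds floor((1 + eps) * ceil(n / k)) vertices.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    n = len(part)
    ceil_avg = -(-n // k)                        # ceil(n / k)
    limit = (100 + eps_percent) * ceil_avg // 100
    sizes = [0] * k
    for p in part:
        sizes[p] += 1
    return max(sizes) <= limit

# Toy graph: a 6-cycle, with part[v] giving the part index of vertex v.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
even = [0, 0, 0, 1, 1, 1]
skew = [0, 0, 0, 0, 1, 1]

print(edge_cut(edges, even), is_balanced(even, k=2, eps_percent=0))  # 2 True
print(edge_cut(edges, skew), is_balanced(skew, k=2, eps_percent=5))  # 2 False
```

Both toy partitions have the same cut, but only the first is legal at tight balance. That is why a verifier has to check cut and balance together before comparing against an archive cell.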

Scope: This is a calibration and negative-result report. It is intentionally not presented as a record win.

Baseline sources:
- Walshaw Graph-Partitioning Archive. (https://chriswalshaw.co.uk/partition/)
- KaHIP / KaFFPaE graph partitioning methods. (https://github.com/KaHIP/KaHIP)
- Henzinger, Noe, and Schulz ILP local search approach. (https://arxiv.org/search/?query=Henzinger+Noe+Schulz+ILP+graph+partitioning&searchtype=all)