Plain language
What this result means
A DNA storage code is more useful the more words it packs into a given constraint set: a larger codebook stores more distinct symbols at the same word length and error tolerance. The difficulty is that the literature mixes two GC rules. Strict GC requires exactly n/2 G or C bases in a word of length n, while a GC band accepts any GC count within a range and therefore admits more words. These are different problems, so this page separates them instead of treating every larger number as comparable.
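To make the difference between the two GC rules concrete, the sketch below counts candidate words under each rule for a small length. It is an illustration only: the function names and the band of n/2 ± 1 are assumptions chosen for the example, not the bands used by any cited source.

```python
from itertools import product
from math import comb

def gc_count(word: str) -> int:
    """Number of G or C bases in a DNA word."""
    return sum(base in "GC" for base in word)

def count_words(n: int, gc_min: int, gc_max: int) -> int:
    """Count length-n words over ACGT whose GC count lies in [gc_min, gc_max]."""
    return sum(
        gc_min <= gc_count("".join(w)) <= gc_max
        for w in product("ACGT", repeat=n)
    )

n = 6
strict = count_words(n, n // 2, n // 2)          # exactly n/2 G/C bases
banded = count_words(n, n // 2 - 1, n // 2 + 1)  # hypothetical band of +/- 1

# Closed form for the strict case: choose the GC positions,
# then two base choices at every position.
assert strict == comb(n, n // 2) * 2 ** n

print(strict, banded)  # prints "1280 3200"
```

The banded rule starts from a strictly larger pool of words, which is why a banded baseline cannot be compared directly with a strict-GC result.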
- The largest margin is GCNL (10,4): 581 to 657 under the strict-GC no-runlength/Hamming convention.
- The largest all-convention row is King's reverse-complement (8,3): 390 to 426.
- Eight rows beat every published value under any GC convention. The remaining eight are strict-GC improvements only, as marked in the Note column of the table.
Visual notes
How to read the result
Result table
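Sixteen DNA code-size improvements, split by the GC convention they beat. Each cell is labeled by family and (n, d), where n is the word length and d is the minimum distance. GCNL is the constant-GC, no-runlength family; GCRC is the constant-GC, reverse-complement family.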
| Cell | Baseline | Numaro | Delta | Note |
|---|---|---|---|---|
| GCNL (10,4) | 581 | 657 | +76 | strict GC; largest margin |
| GCRC (8,3) | 390 | 426 | +36 | all conventions; King's reverse-complement table |
| GCNL (10,5) | 168 | 190 | +22 | strict GC |
| GCNL (12,9) | 9 | 20 | +11 | all conventions |
| GCNL (10,6) | 57 | 67 | +10 | strict GC |
| GCNL (8,3) | 373 | 383 | +10 | strict GC |
| GCNL (8,4) | 119 | 128 | +9 | strict GC |
| GCNL (10,8) | 10 | 16 | +6 | all conventions |
| GCNL (10,7) | 23 | 28 | +5 | all conventions |
| GCNL (12,10) | 6 | 9 | +3 | all conventions |
| GCNL (6,3) | 60 | 62 | +2 | strict GC; proven optimal |
| GCNL (8,5) | 40 | 42 | +2 | strict GC; ties widest band |
| GCRC (6,3) | 41 | 43 | +2 | all conventions; King's reverse-complement table |
| GCRC (8,5) | 26 | 28 | +2 | all conventions; King's reverse-complement table |
| GCNL (6,4) | 27 | 28 | +1 | all conventions; proven optimal under strict GC |
| GCNL (8,6) | 16 | 17 | +1 | strict GC |
Method
How it was found
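Each cell becomes a graph problem. A vertex is an admissible DNA word. Two vertices are joined by an edge exactly when the two words can coexist in one code, that is, when the pair meets the cell's distance requirement. A code is then a clique in that graph, and a large code is a large clique.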
- Compiled the prior values while recording the exact GC rule used by each source.
- Generated every admissible word for the requested length, distance, GC rule, and no-runlength or reverse-complement rule.
- Searched the compatibility graph with a bitmask clique solver; small cells were also closed with exact CP-SAT.
- Dropped apparent wins when the baseline used a wider GC band and the numbers were not comparable.
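The steps above can be sketched on a toy cell. This is a minimal illustration, not the report's solver: it assumes strict GC with the no-runlength rule, stores each vertex's neighbors as an integer bitmask, and uses a simple greedy heuristic where the report used a dedicated clique solver and CP-SAT. All function names are hypothetical, and the reverse-complement rule is omitted.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def admissible_words(n: int, no_runlength: bool = True) -> list[str]:
    """Strict-GC words: exactly n/2 G/C bases, optionally no equal adjacent bases."""
    words = []
    for tup in product("ACGT", repeat=n):
        w = "".join(tup)
        if sum(c in "GC" for c in w) != n // 2:
            continue
        if no_runlength and any(x == y for x, y in zip(w, w[1:])):
            continue
        words.append(w)
    return words

def compatibility_masks(words: list[str], d: int) -> list[int]:
    """masks[i] has bit j set when words i and j are at Hamming distance >= d."""
    masks = [0] * len(words)
    for i, a in enumerate(words):
        for j in range(i + 1, len(words)):
            if hamming(a, words[j]) >= d:
                masks[i] |= 1 << j
                masks[j] |= 1 << i
    return masks

def greedy_clique(masks: list[int]) -> list[int]:
    """Greedy heuristic: keep picking the candidate with the most compatible candidates left."""
    candidates = (1 << len(masks)) - 1
    clique = []
    while candidates:
        best = max(
            (i for i in range(len(masks)) if candidates >> i & 1),
            key=lambda i: bin(masks[i] & candidates).count("1"),
        )
        clique.append(best)
        candidates &= masks[best]  # keep only words compatible with every pick so far
    return clique

words = admissible_words(4)
masks = compatibility_masks(words, d=3)
code = [words[i] for i in greedy_clique(masks)]
print(len(words), len(code), code)
```

The greedy pass only yields a valid code, that is, a lower bound on the cell's size. Closing a cell requires an exact search, which is what the report did for its small cells.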
Verification
How it was checked
The checker rereads each saved code and tests the rules directly: every word has the right length and alphabet, the GC count is correct, no adjacent bases repeat where that rule applies, all words are distinct, every pair is far enough apart, and reverse-complement distances are checked for the reverse-complement family.
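A checker of this kind can be sketched in a few lines. The version below is an assumption-laden illustration rather than the report's actual checker: the function and parameter names are hypothetical, it assumes strict GC, and it assumes the reverse-complement rule requires the distance between every word and the reverse complement of every word (including itself) to be at least d.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(w: str) -> str:
    """Complement each base, then reverse the word."""
    return w.translate(COMPLEMENT)[::-1]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def check_code(words, n, d, no_runlength=False, reverse_complement_rule=False):
    """Return a list of rule violations; an empty list means the code passes."""
    problems = []
    if len(set(words)) != len(words):
        problems.append("duplicate words")
    for w in words:
        if len(w) != n or set(w) - set("ACGT"):
            problems.append(f"{w}: bad length or alphabet")
        if sum(c in "GC" for c in w) * 2 != n:
            problems.append(f"{w}: GC count is not n/2")
        if no_runlength and any(x == y for x, y in zip(w, w[1:])):
            problems.append(f"{w}: adjacent repeat")
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if hamming(a, b) < d:
                problems.append(f"{a}/{b}: Hamming distance below {d}")
    if reverse_complement_rule:
        # Assumed convention: every ordered pair, including a word with itself.
        for a in words:
            for b in words:
                if hamming(a, reverse_complement(b)) < d:
                    problems.append(f"{a}/{b}: reverse-complement distance below {d}")
    return problems

print(check_code(["ACAC", "CACA"], n=4, d=4, no_runlength=True))  # prints []
```

Returning the full list of violations, instead of stopping at the first one, makes it easier to see whether a saved code fails on one rule or on several.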
Scope
What is not being claimed
Only rows compared under the same convention are claimed. Most rows are best-known improvements, not optimality proofs. The proven-optimal statements are only for the strict-GC n=6 cells. Wider GC bands are a different, easier problem because they allow more candidate words.
References
Baseline sources
- King, Bounds for DNA codes with constant GC-content, 2003.
- Limbachiya, Dey, Gupta, DNA codes with constant GC-content and no runlength.
- CLGBO, DNA storage coding through graph-based optimization.
- LEO, DNA coding with constant GC-content and no-runlength constraints.
- MFOS, Frontiers in Genetics 2023, for banded comparisons.
Citation
How to cite
Numaro Autoresearch Team. "DNA storage codes with larger codebooks." Numaro Research Report NUMARO-2026-011, 2026.
@techreport{numaro2026DnaStorageCodes,
title = {DNA storage codes with larger codebooks},
author = {Numaro Autoresearch Team},
institution = {Numaro},
number = {NUMARO-2026-011},
year = {2026},
url = {https://numaro.tech/research/dna-storage-codes-2026/}
}