Plain language

What this result means

A DNA storage code is only as useful as the number of words that satisfy its constraints at once: more words mean more storable symbols at the same word length and error tolerance. The difficulty is that the literature mixes two GC rules. Strict GC requires exactly n/2 of a word's n bases to be G or C, while a GC band accepts a range of GC counts and therefore admits more words. These are different problems, so this page separates them instead of treating every larger number as comparable.

  • The largest margin is GCNL (10,4): 581 to 657 under the strict-GC no-runlength/Hamming convention.
  • The largest all-convention row is King's reverse-complement (8,3): 390 to 426.
  • Eight rows beat every published value under any GC convention. The remaining eight are strict-GC improvements only, and the table says so.
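The gap between the two GC rules can be checked by brute force. The sketch below is illustrative and not part of the report's tooling. It assumes that "no-runlength" means no two adjacent bases are equal, and the function names are hypothetical. It counts the length-10 no-runlength words at each GC count, then compares strict GC (GC=5) with a GC=4 to GC=6 band.

```python
def no_runlength_words(n):
    """All length-n words over ACGT with no two equal adjacent bases."""
    words = [""]
    for _ in range(n):
        words = [w + b for w in words for b in "ACGT" if not w or w[-1] != b]
    return words


def counts_by_gc(n):
    """counts[k] = number of no-runlength words of length n with k G/C bases."""
    counts = [0] * (n + 1)
    for word in no_runlength_words(n):
        counts[sum(base in "GC" for base in word)] += 1
    return counts


n = 10
counts = counts_by_gc(n)
strict = counts[n // 2]                      # strict GC: exactly n/2 G/C bases
band = sum(counts[n // 2 - 1 : n // 2 + 2])  # band: GC = 4, 5, or 6

print("words per GC count:", counts)
print("strict GC candidates:", strict)
print("GC-band candidates:  ", band)
```

Because the band contains every strict-GC word plus the GC=4 and GC=6 words, any code size reported under the band is an upper limit on nothing in the strict problem, and the two cannot be compared directly.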

Visual notes

How to read the result

Horizontal bar chart of sixteen DNA storage code improvements, with white bars for all-convention rows and gray bars for strict-GC rows.
Record margins: White bars are rows that beat every published GC convention. Gray bars are strict-GC improvements, where wider GC-band tables are a different comparison.
Bar chart showing how many length ten no-runlength words exist at each GC count, with GC equals five highlighted for strict GC and GC equals four through six highlighted for the band.
The GC convention trap: For length 10, strict GC uses only GC=5 words. A GC band also includes GC=4 and GC=6 words, so it is a larger search space.
Dense barcode-style visualization of the 657 codewords in the GCNL length ten distance four DNA code.
Actual codewords: The largest saved code in the set has 657 length-10 DNA words. Each row is one word, drawn base by base.

Result table

Sixteen DNA code-size improvements, split by the GC convention they beat.

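| Cell         | Baseline | Numaro | Delta | Note                                             |
|--------------|----------|--------|-------|--------------------------------------------------|
| GCNL (10,4)  | 581      | 657    | +76   | strict GC; largest margin                        |
| GCRC (8,3)   | 390      | 426    | +36   | all conventions; King's reverse-complement table |
| GCNL (10,5)  | 168      | 190    | +22   | strict GC                                        |
| GCNL (12,9)  | 9        | 20     | +11   | all conventions                                  |
| GCNL (10,6)  | 57       | 67     | +10   | strict GC                                        |
| GCNL (8,3)   | 373      | 383    | +10   | strict GC                                        |
| GCNL (8,4)   | 119      | 128    | +9    | strict GC                                        |
| GCNL (10,8)  | 10       | 16     | +6    | all conventions                                  |
| GCNL (10,7)  | 23       | 28     | +5    | all conventions                                  |
| GCNL (12,10) | 6        | 9      | +3    | all conventions                                  |
| GCNL (6,3)   | 60       | 62     | +2    | strict GC; proven optimal                        |
| GCNL (8,5)   | 40       | 42     | +2    | strict GC; ties widest band                      |
| GCRC (6,3)   | 41       | 43     | +2    | all conventions; King's reverse-complement table |
| GCRC (8,5)   | 26       | 28     | +2    | all conventions; King's reverse-complement table |
| GCNL (6,4)   | 27       | 28     | +1    | all conventions; proven optimal strict           |
| GCNL (8,6)   | 16       | 17     | +1    | strict GC                                        |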

Method

How it was found

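Each cell becomes a graph problem. A vertex is an admissible DNA word. Two vertices are joined by an edge exactly when the two words can coexist in the same code, that is, when the pair meets the cell's distance requirement. A code is then a clique in that graph, and a large code is a large clique.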

  • Compiled the prior values while recording the exact GC rule used by each source.
  • Generated every admissible word for the requested length, distance, GC rule, and no-runlength or reverse-complement rule.
  • Searched the compatibility graph with a bitmask clique solver; small cells were also closed with exact CP-SAT.
  • Dropped apparent wins when the baseline used a wider GC band and the numbers were not comparable.
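The steps above can be sketched on a toy cell. This is a minimal illustration under stated assumptions, not the report's solver: it uses the strict-GC, no-adjacent-repeat rule with plain Hamming distance, a hypothetical (4,4) cell that is small enough to solve exactly, and a basic branch-and-bound search whose only pruning rule is the candidate count. All function names are invented for this example.

```python
from itertools import combinations


def admissible_words(n):
    """Strict-GC words of even length n with no two equal adjacent bases."""
    words = [""]
    for _ in range(n):
        words = [w + b for w in words for b in "ACGT" if not w or w[-1] != b]
    return [w for w in words if sum(c in "GC" for c in w) == n // 2]


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def compatibility_masks(words, d):
    """adj[i] is an int whose bit j is set when words i and j may share a code."""
    adj = [0] * len(words)
    for i, j in combinations(range(len(words)), 2):
        if hamming(words[i], words[j]) >= d:
            adj[i] |= 1 << j
            adj[j] |= 1 << i
    return adj


def max_clique(adj):
    """Exact maximum clique by branch and bound over bitmask candidate sets."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique[:]
        while candidates:
            # Bound: even using every remaining candidate cannot beat best.
            if len(clique) + bin(candidates).count("1") <= len(best):
                return
            v = candidates.bit_length() - 1  # take the highest-indexed candidate
            candidates &= ~(1 << v)          # later branches exclude v
            expand(clique + [v], candidates & adj[v])

    expand([], (1 << len(adj)) - 1)
    return best


words = admissible_words(4)
adj = compatibility_masks(words, d=4)
code = [words[i] for i in max_clique(adj)]
print(len(words), "admissible words; best code has", len(code), "words:", code)
```

Storing each neighborhood as one integer makes the "keep only candidates compatible with v" step a single bitwise AND. A search this simple is only practical for tiny cells; the cells in the table have far larger graphs and would need stronger bounds or heuristics.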

Verification

How it was checked

The checker rereads each saved code and tests the rules directly: every word has the right length and alphabet, the GC count is correct, no adjacent bases repeat where that rule applies, all words are distinct, every pair is far enough apart, and reverse-complement distances are checked for the reverse-complement family.
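A checker of this kind can be sketched in a few lines. The version below is a hypothetical illustration, not the report's checker: the function name, parameters, and messages are invented, and the reverse-complement rule is an assumption (every word, including itself, must be at Hamming distance at least d from the reverse complement of every word), since the exact convention is not spelled out here.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    return word.translate(COMPLEMENT)[::-1]


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def check_code(words, n, d, gc, no_runlength=False, reverse_complement_rule=False):
    """Return a list of violations; an empty list means the code passes."""
    problems = []
    for w in words:
        if len(w) != n or set(w) - set("ACGT"):
            problems.append(f"{w}: wrong length or alphabet")
        if sum(c in "GC" for c in w) != gc:
            problems.append(f"{w}: GC count is not {gc}")
        if no_runlength and any(a == b for a, b in zip(w, w[1:])):
            problems.append(f"{w}: adjacent bases repeat")
    if len(set(words)) != len(words):
        problems.append("duplicate words")
    for u, v in combinations(words, 2):
        if hamming(u, v) < d:
            problems.append(f"{u}/{v}: Hamming distance below {d}")
    if reverse_complement_rule:
        # Assumed convention: compare every word with the reverse
        # complement of every word, itself included.
        for u in words:
            for v in words:
                if hamming(u, reverse_complement(v)) < d:
                    problems.append(f"{u}/rc({v}): distance below {d}")
    return problems


example = ["ACGT", "CATG", "GTAC", "TGCA"]
print(check_code(example, n=4, d=4, gc=2, no_runlength=True))
```

Returning a list of violations rather than a single boolean makes it easier to see which rule a saved code breaks, and keeping the checker independent of the search code means a bug in the solver cannot hide in the verification.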

Scope

What is not being claimed

Only rows compared under the same convention are claimed. Most rows are best-known improvements, not optimality proofs. The proven-optimal statements are only for the strict-GC n=6 cells. Wider GC bands are a different, easier problem because they allow more candidate words.

References

Baseline sources

Citation

How to cite

Numaro Autoresearch Team. "DNA storage codes with larger codebooks." Numaro Research Report NUMARO-2026-011, 2026.

@techreport{numaro2026DnaStorageCodes,
  title = {DNA storage codes with larger codebooks},
  author = {Numaro Autoresearch Team},
  institution = {Numaro},
  number = {NUMARO-2026-011},
  year = {2026},
  url = {https://numaro.tech/research/dna-storage-codes-2026/}
}