The Codon Stamp

Three substrate observables from one set of per-base inputs

Anatomy of the Codon’s Seal. Three stacked aromatic vortices write a word in the substrate’s five-letter alphabet, and every word knows its mirror among the sixty-four. Left: the per-base profiles \phi_B — pyrimidines (C, U, T) carry one toroidal vortex; purines (A, G) carry two on the fused 6-5 ring. Center: a codon stamp \Phi_C(\vec{r}) — three bases stacked along \hat{z}, each turned \Delta\theta \approx 34.3° from the last, rise h \approx 3.4 Å, total 6.8 Å. Right: the 64×64 codon-anticodon binding matrix, reordered so the cognate diagonal runs corner to corner — and lights up red against a blue off-diagonal field. Cognate ranks #1 for 64 of 64 codons; the discrimination gap is structural, not subtle. The pitch that holds the seal (10.5 bp/turn) descends from \alpha_\text{mf} in the [bridge equation](bridge-equation.qmd) — eight domains, the Weinberg angle, \hbar, c — through one number.

A codon is three stacked aromatic bases: between three and six toroidal vortices (one per pyrimidine, two per purine) on a 6.8 Å column twisted 34.3° per step. The channel-with-memory and nested-modon sections set up the substrate framework’s reading of this column as a coherent 3D stamp on the substrate’s flow geometry, not just three written letters. If that picture is right, the 64 codons are 64 distinct stamps with a measurable similarity relation among them, and synonymous codons (codons that translate to the same amino acid) carry different stamps despite sharing an amino-acid label.

This section builds the stamp explicitly. The per-base aromatic-vortex profiles from Aromatic Rings as Toroidal Vortices define the alphabet; convolving three of them on the polar axis gives the codon’s 3D substrate fingerprint; an L2 metric on that space gives the 64×64 distance matrix. The same per-base inputs feed two other observables — pair-level Watson-Crick binding (built up in the base-pairing section) and the 64×64 codon-anticodon cognate-recognition matrix. One set of inputs; one overall scaling factor; three independent classes of substrate prediction.

The per-base vortex profile

Aromatic-rings physics treats each π-aromatic ring as a closed-loop substrate raceway above and below the molecular plane. For each of the five nucleic-acid bases (A, G, C, T, U), this gives one (pyrimidines) or two (purines, with fused 6-5 rings) toroidal vortices with a definite spatial profile of substrate displacement:

\phi_B(\vec{r}) \;\equiv\; \text{substrate-displacement field of base } B,\;\text{centered at ring centroid, in the molecular frame.}

What goes into \phi_B in practice:

Input	Source	Status
Ring-current strength	NICS(0), NICS(1), NICS_zz scans (DFT)	Published for all five bases; methods vary, ~0.5 ppm scatter
π-electron density profile	Computed electron density (B3LYP or higher)	Published
Heteroatom positions	Crystal structure	Standard
Static dipole moment	Spectroscopy or DFT	Published
Aromatic-vortex coupling strength	Framework parameter	One overall scaling factor

The pyrimidines (C, T, U) each carry one toroidal vortex over their 6-electron ring. The purines (A, G) carry a more complex profile: a 10-electron ring current distributed over the fused 6-5 system, with the dominant contribution from the larger ring and a secondary lobe over the 5-membered ring. Standard NICS work (originated by Schleyer and refined by many groups since) places purines as more aromatic than pyrimidines, with within-class orderings that depend on basis set and probe location — orderings that require primary-source verification before they enter a quantitative calculation.

Stacking three bases into a codon stamp

A codon is three stacked bases. The stacking geometry depends on context:

In B-DNA storage: rise h \approx 3.4 Å, twist \Delta\theta \approx 34.3°/\text{bp} (the 10.5 bp/turn pitch developed in B-DNA’s Pitch from the Packing Fraction).
At the ribosomal A-site: the codon-anticodon mini-duplex is in A-form RNA geometry, with h \approx 2.8 Å and \Delta\theta \approx 32.7°/\text{bp}.

Both geometries appear in the cellular lifecycle of any codon. The calculations below use B-DNA values for the codon-distance matrix (the storage form where the stamp persists between expression events) and A-form values for the codon-anticodon binding matrix (the reading form at the ribosome).
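The geometry figures quoted above follow from simple arithmetic. A minimal sketch, with function names that are illustrative and not taken from the framework's scripts:

```python
def twist_per_step_deg(bp_per_turn: float) -> float:
    """Helical twist per base-pair step, in degrees."""
    return 360.0 / bp_per_turn


def column_height(rise: float, n_bases: int = 3) -> float:
    """Distance from the first to the last base plane in a stack."""
    return rise * (n_bases - 1)


b_twist = twist_per_step_deg(10.5)   # B-DNA pitch quoted in the text
b_height = column_height(3.4)        # B-DNA rise, in angstroms
a_height = column_height(2.8)        # A-form rise quoted for the A-site
a_bp_per_turn = 360.0 / 32.7         # pitch implied by the A-form twist

print(f"B-DNA twist {b_twist:.1f} deg/step, codon column {b_height:.1f} angstrom")
print(f"A-form column {a_height:.1f} angstrom, implied pitch {a_bp_per_turn:.1f} bp/turn")
```

The A-form codon column is shorter than the B-form one, which is one reason the two matrices are computed with separate geometries.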

The codon stamp is

\Phi_C(\vec{r}) \;=\; \sum_{n=0}^{2}\,R_{n\Delta\theta}\;\phi_{B_n}\!\bigl(\vec{r}-n\,h\,\hat{z}\bigr)

where C = B_0 B_1 B_2 is the codon read 5' \to 3', \hat{z} is the polar axis, and R_{n\Delta\theta} is rotation about \hat{z}. Off-axis, \Phi_C(\vec{r}) is a 3D function with three peaks at z = 0,\,h,\,2h, each rotated relative to the last, each carrying the in-plane fingerprint of its base. A codon’s stamp is therefore a function on \mathbb{R}^3, not a number — and the substrate’s claim is that this function (not the amino-acid label) is what the next residue’s local field sees as the peptide bond forms.
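The stamp formula can be sketched numerically under strong simplifying assumptions: each vortex is replaced by an isotropic scalar Gaussian blob, and the in-plane positions, amplitudes, and width below are illustrative placeholders rather than NICS-derived inputs. The names `TOY_PROFILE`, `codon_blobs`, and `stamp_value` are hypothetical.

```python
import math

TWIST = math.radians(34.3)  # B-DNA twist per step (from the text)
RISE = 3.4                  # B-DNA rise in angstroms (from the text)
SIGMA = 1.2                 # ASSUMPTION: placeholder blob width, angstroms

# ASSUMPTION: toy (x, y, amplitude) vortices per base in the molecular frame.
# Pyrimidines get one blob, purines two; the numbers are not fitted to data.
TOY_PROFILE = {
    "C": [(0.0, 0.0, 1.00)],
    "U": [(0.0, 0.0, 0.90)],
    "T": [(0.0, 0.0, 0.95)],
    "A": [(0.0, 0.0, 1.00), (2.1, 0.0, 0.60)],
    "G": [(0.0, 0.0, 1.05), (2.1, 0.0, 0.65)],
}


def codon_blobs(codon):
    """Place base n at height n*RISE, rotated by n*TWIST about the z axis."""
    blobs = []
    for n, base in enumerate(codon):
        c, s = math.cos(n * TWIST), math.sin(n * TWIST)
        for x, y, amp in TOY_PROFILE[base]:
            blobs.append((c * x - s * y, s * x + c * y, n * RISE, amp))
    return blobs


def stamp_value(codon, point):
    """Evaluate the toy stamp at (x, y, z) as a sum of Gaussian blobs."""
    px, py, pz = point
    total = 0.0
    for x, y, z, amp in codon_blobs(codon):
        r2 = (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2
        total += amp * math.exp(-r2 / (2.0 * SIGMA ** 2))
    return total


print(len(codon_blobs("CUU")), len(codon_blobs("GAG")))
print(stamp_value("CUU", (0.0, 0.0, 0.0)))
```

Rotating and shifting the blob centers is equivalent to applying R_{n\Delta\theta} and the n h \hat{z} translation to the per-base field, because rotation about \hat{z} commutes with translation along it.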

Distance on stamp space

The natural metric is L2 on the displacement-field space, normalized by the mean squared norm of the two stamps so that it is dimensionless and independent of the overall scaling factor:

d(C_1, C_2) \;=\; \left[\frac{\int|\Phi_{C_1}-\Phi_{C_2}|^2\,d^3r}{\tfrac{1}{2}\!\int(|\Phi_{C_1}|^2+|\Phi_{C_2}|^2)\,d^3r}\right]^{1/2}.

It is symmetric, scale-invariant, and identically zero on coincident stamps. The closely related cosine form

d_\text{cos}(C_1, C_2) \;=\; 1 - \frac{\langle \Phi_{C_1},\,\Phi_{C_2}\rangle}{\|\Phi_{C_1}\|\,\|\Phi_{C_2}\|}

is one minus a cosine similarity, and is easier to relate to existing codon-usage correlation tooling. The two agree up to a constant factor when the stamps have equal norm, where d^2 = 2\,d_\text{cos}.
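How the two forms relate can be made explicit by expanding the numerator of d. A short derivation, assuming real-valued stamps and the standard L2 inner product:

```latex
d(C_1,C_2)^2
  \;=\; \frac{\|\Phi_{C_1}\|^2+\|\Phi_{C_2}\|^2-2\,\langle\Phi_{C_1},\Phi_{C_2}\rangle}
             {\tfrac{1}{2}\bigl(\|\Phi_{C_1}\|^2+\|\Phi_{C_2}\|^2\bigr)}
  \;=\; 2-\frac{4\,\langle\Phi_{C_1},\Phi_{C_2}\rangle}
               {\|\Phi_{C_1}\|^2+\|\Phi_{C_2}\|^2},
\qquad
\|\Phi_{C_1}\|=\|\Phi_{C_2}\|
  \;\Longrightarrow\;
d(C_1,C_2)^2 \;=\; 2\,d_\text{cos}(C_1,C_2).
```

For stamps of unequal norm the two measures differ: d also penalizes a mismatch in overall amplitude, while d_\text{cos} sees only the angle between the fields.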

Two simplifications fall out of the convolution structure:

Single-position substitutions (e.g., CUU vs CUC, differing only at position 3): the integral over the unchanged positions cancels, and the distance reduces to a per-position contribution determined by the difference of two per-base profiles: d\bigl(B_0 B_1 B^{(1)},\,B_0 B_1 B^{(2)}\bigr)^{2} \;\propto\; \int\bigl|\phi_{B^{(1)}}-\phi_{B^{(2)}}\bigr|^2\,d^3 r. Per-base distances are the building blocks of the whole 64×64 matrix.
Cross-family substitutions (e.g., serine UCC vs serine AGC) differ at two positions, so two per-base distances stack. Because purine-vs-pyrimidine differences are the largest per-base distances in the alphabet (one vortex vs two on a fused ring), any pair with two purine/pyrimidine swaps sits near the top of the distance distribution.
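Both simplifications can be checked on a toy model. The sketch below assumes each vortex is an isotropic scalar Gaussian of common width, for which the L2 inner product has a closed form; the per-base positions and amplitudes are placeholders, not NICS-derived values, and all function names are hypothetical.

```python
import math

TWIST, RISE = math.radians(34.3), 3.4   # B-DNA values from the text
SIGMA = 1.2                             # ASSUMPTION: placeholder blob width

# ASSUMPTION: toy (x, y, amplitude) vortices per base; not fitted to data.
TOY_PROFILE = {
    "C": [(0.0, 0.0, 1.00)],
    "U": [(0.0, 0.0, 0.90)],
    "A": [(0.0, 0.0, 1.00), (2.1, 0.0, 0.60)],
    "G": [(0.0, 0.0, 1.05), (2.1, 0.0, 0.65)],
}


def blobs(bases):
    """Gaussian centers (x, y, z, amplitude) for a stack of bases."""
    out = []
    for n, base in enumerate(bases):
        c, s = math.cos(n * TWIST), math.sin(n * TWIST)
        for x, y, amp in TOY_PROFILE[base]:
            out.append((c * x - s * y, s * x + c * y, n * RISE, amp))
    return out


def inner(f, g):
    """Closed-form L2 inner product of two sums of equal-width Gaussians."""
    total = 0.0
    for x1, y1, z1, a1 in f:
        for x2, y2, z2, a2 in g:
            r2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
            total += a1 * a2 * math.exp(-r2 / (4.0 * SIGMA ** 2))
    return (math.pi * SIGMA ** 2) ** 1.5 * total


def diff_norm2(f, g):
    """Squared L2 norm of the difference field (numerator of d squared)."""
    return max(0.0, inner(f, f) - 2.0 * inner(f, g) + inner(g, g))


def distance(c1, c2):
    """Normalized L2 stamp distance between two codons."""
    f, g = blobs(c1), blobs(c2)
    return math.sqrt(diff_norm2(f, g) / (0.5 * (inner(f, f) + inner(g, g))))


# Simplification 1: the codon-level numerator equals the bare U-vs-C numerator.
codon_numerator = diff_norm2(blobs("CUU"), blobs("CUC"))
base_numerator = diff_norm2(blobs("U"), blobs("C"))
print(codon_numerator, base_numerator)

# Simplification 2: a two-position purine/pyrimidine swap sits much further away.
print(distance("CUU", "CUC"), distance("UCC", "AGC"))
```

The numerator reduces exactly to the per-base term because the L2 norm is invariant under the rotation and translation that place the third base. The denominator still depends on the whole codon, which is why the text writes a proportionality rather than an equality.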

Worked example: leucine CUU vs CUC

Both CUU and CUC code for leucine; they differ only at the third (wobble) position. By simplification 1,

d(\text{CUU}, \text{CUC})^2 \;\propto\; \int\bigl|\phi_U(\vec{r}) - \phi_C(\vec{r})\bigr|^2\,d^3 r.

The U-vs-C base distance has three contributions:

Ring core: both pyrimidines; both 6-electron rings; both have nitrogens at positions 1 and 3. The toroidal vortex over the ring is similar in shape; magnitudes differ by the NICS ratio between C and U, which DFT places at \mathcal{O}(1) rather than \mathcal{O}(\text{a few}).
Exocyclic substituent at C4: amino group (–NH₂) on C, carbonyl (=O) on U. This shifts the local dipole magnitude and orientation, and changes the electron density at the spatial location where the next-base stack sees it.
Position 5: both C and U carry an H at C5 (no methyl — that’s thymine). No additional contribution.

The qualitative prediction: the two leucine codons sit close in stamp space (same first two bases, same pyrimidine class at the wobble), and should be among the most freely interchangeable synonymous pairs across orthologs.

Cross-family example: the six serines

Serine has six codons: UCU, UCC, UCA, UCG (the UCx family) and AGU, AGC (the AGy family). Intra-family distances are all single-position wobble substitutions, smallest when the wobble base stays within its purine or pyrimidine class. Inter-family distances involve substitutions at both positions 1 and 2 (U→A and C→G, each a one-vortex → two-vortex transition), so they sit at the high end of the stamp-distance distribution.
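The combinatorics behind this claim need no physics input and can be enumerated directly. A small sketch (helper names are illustrative) that counts, for each of the 15 serine codon pairs, how many positions differ and how many of those are purine/pyrimidine swaps:

```python
from itertools import combinations

PURINES = set("AG")
SERINE = ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"]


def family(codon):
    """Label a serine codon by its family."""
    return "UCx" if codon.startswith("UC") else "AGy"


def compare(c1, c2):
    """Return (differing positions, purine/pyrimidine swaps among them)."""
    diffs = [(a, b) for a, b in zip(c1, c2) if a != b]
    swaps = sum((a in PURINES) != (b in PURINES) for a, b in diffs)
    return len(diffs), swaps


rows = []
for c1, c2 in combinations(SERINE, 2):
    kind = "intra" if family(c1) == family(c2) else "inter"
    rows.append((c1, c2, kind) + compare(c1, c2))

for row in rows:
    print(*row)
```

Every intra-family pair differs at exactly one position, and every inter-family pair carries at least two purine/pyrimidine swaps. Some intra-family pairs (UCU vs UCA, for example) still include one such swap at the wobble position, so the intra-family distances are not all equally small.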

The framework’s prediction is sharp: UCx and AGy serines are biologically interchangeable at the amino-acid level but substrate-distinguishable. They should not be evolutionarily interchanged freely in conserved structural contexts; codon usage between the two families should track tissue-specific or structure-specific demands rather than be uniform. Mammalian and bacterial codon-usage tables already show strong context-specific biases between the UCx and AGy serines; the question the metric answers is whether the direction and magnitude of those biases track the predicted stamp distance once tRNA abundance is controlled for.

Cognate anticodon recognition: 64 of 64

The per-base profiles \phi_B also define the cross-bridge pair-interaction energy the base-pairing section builds on. A codon-anticodon mini-duplex at the ribosomal A-site is three stacked Watson-Crick pairs in A-RNA geometry. Its binding energy is modeled here as the sum of three per-pair WC energies, each computable from the same Gaussian-overlap machinery the pair test uses. This is an additive per-pair approximation, simpler than the nearest-neighbor convention biophysics uses for duplex stability, which assigns energies to stacked pair steps rather than to single pairs. The framework’s “opposites attract” claim (channel-with-memory) makes a concrete prediction: for every codon, the position-wise complementary anticodon should be the strongest binder among all 64 possible triplets.

The script compute_codon_anticodon.py builds the 64×64 codon-anticodon binding matrix from exactly the per-base profiles that pass the pair-level tests and the codon-stamp matrix sanity checks. No parameter tuning beyond the inherited pair model:

Test	Result
Cognate anticodon ranks #1 by binding	64 / 64 codons
Mean selectivity gap (E of 2nd-best minus E of cognate)	+3.33 (arb. units)
Wobble (G:U with +2 Å lateral shift at position 2, 0-indexed: the third codon base) remains attractive	16 / 16 codons ending in C
Mean codon-anticodon binding, canonical (G:C at the third codon base)	−10.77
Mean codon-anticodon binding, wobble (G:U with shift)	−7.84
Mean wobble softening	+2.93 (about 27% weakening)
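The ranking test in the first two rows can be sketched as follows. This is not the content of compute_codon_anticodon.py: the A:U energy is inferred from the 0-G/C diagonal band quoted in the figure caption, and every non-Watson-Crick pair is given a single placeholder energy, so the 64 / 64 outcome here is automatic and only the procedure is illustrated.

```python
from itertools import product

BASES = "ACGU"
WC_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}

E_GC = -4.15       # from the text, arb. units
E_AU = -7.4 / 3    # ASSUMPTION: inferred from the 0-G/C diagonal band
E_MISMATCH = 1.0   # ASSUMPTION: one placeholder value for every non-WC pair


def pair_energy(codon_base, partner_base):
    """Per-pair energy: Watson-Crick pairs attract, everything else repels."""
    if WC_PARTNER[codon_base] != partner_base:
        return E_MISMATCH
    return E_GC if codon_base in "GC" else E_AU


def binding(codon, partner):
    """Additive binding energy; partner is written position-aligned to the codon."""
    return sum(pair_energy(c, p) for c, p in zip(codon, partner))


triplets = ["".join(t) for t in product(BASES, repeat=3)]


def rank_cognate(codon):
    """Return (rank of the cognate partner, energy gap to the best competitor)."""
    cognate = "".join(WC_PARTNER[b] for b in codon)
    ranked = sorted(triplets, key=lambda p: binding(codon, p))
    rank = ranked.index(cognate) + 1
    competitor = ranked[1] if rank == 1 else ranked[0]
    return rank, binding(codon, competitor) - binding(codon, cognate)


results = {c: rank_cognate(c) for c in triplets}
n_top = sum(1 for rank, _ in results.values() if rank == 1)
mean_gap = sum(gap for _, gap in results.values()) / len(results)
print(f"cognate ranks #1 for {n_top} / {len(triplets)} codons, mean gap {mean_gap:+.2f}")
```

Replacing `pair_energy` with energies computed from the per-base profiles is what turns this from a bookkeeping exercise into the actual test; the mean gap printed here depends entirely on the placeholder and should not be compared with the table.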

The codon-anticodon binding matrix in A-RNA geometry (64×64 heatmap of binding energies). Rows index codons; columns are reordered so column j is the cognate (Watson-Crick complementary) anticodon of row j’s codon. The cognate diagonal therefore runs from top-left to bottom-right, and the framework’s “opposites attract” claim is the prediction that this diagonal lights up as the most attractive cells of the matrix. It does: every one of the 64 diagonal cells sits in the attractive (red) range, with discrete banding tracking GC content (3 G/C pairs → −12.5; 0 → −7.4 arb. units). Black boxes along the diagonal group synonymous codons by amino acid. Off-diagonal cells are mostly light blue (mildly repulsive); the deep-blue cross-bands are the framework-predicted purine-purine clashes between purine-rich codons and purine-rich anticodons (Pu:Pu shell repulsion, Proposition 1). The color scale is clamped at ±12.5, so the diagonal saturates the attractive end and outlier clashes (up to +280) clip to deepest blue. The mean attraction gap between cognate diagonal (−9.9) and off-diagonal mean (+32.5) is +42.5: structural discrimination, not subtlety.

Every codon has a uniquely best partner. The selectivity gap is structural and consistent rather than thermally dominant: a sequence-recognition prediction at the pair level, not an absolute-binding-strength prediction. The wobble extension (G:U pair shifted laterally by ~2 Å, the geometry observed in real wobble pairs) stays attractive but is much weaker than canonical G:C at the pair level, dropping from −4.15 (arb.) at G:C to −1.22 at G:U wobble, about 29% of the G:C value. Summed over the three pairs of a codon-anticodon duplex, the binding weakens from −10.77 to −7.84: the 27% loss biology tolerates, and the loss the genetic code routes onto the position-3 degeneracy of synonymous codons.
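The quoted means can be cross-checked from the per-pair energies under the additive model. In this sketch the G:C and G:U values come from the text, while the A:U value is an inference from the 0-G/C diagonal band (−7.4 for three pairs) and should be treated as an assumption.

```python
from itertools import product

E_GC = -4.15         # canonical G:C pair, arb. units (from the text)
E_GU_WOBBLE = -1.22  # laterally shifted G:U pair (from the text)
E_AU = -7.4 / 3      # ASSUMPTION: inferred from the 0-G/C diagonal band


def cognate_energy(codon):
    """Additive model: one Watson-Crick pair energy per codon base."""
    return sum(E_GC if base in "GC" else E_AU for base in codon)


codons = ["".join(t) for t in product("ACGU", repeat=3)]
diagonal_mean = sum(cognate_energy(c) for c in codons) / len(codons)

# Codons ending in C carry a canonical G:C pair at the third position.
ending_c = [c for c in codons if c.endswith("C")]
canonical_mean = sum(cognate_energy(c) for c in ending_c) / len(ending_c)

# Swap that one G:C pair for the shifted G:U wobble pair.
wobble_mean = canonical_mean - E_GC + E_GU_WOBBLE
softening = wobble_mean - canonical_mean

print(f"diagonal mean       {diagonal_mean:.1f}")
print(f"canonical mean      {canonical_mean:.2f}")
print(f"wobble mean         {wobble_mean:.2f}")
print(f"codon-level loss    {softening / abs(canonical_mean):.0%}")
print(f"pair-level retained {E_GU_WOBBLE / E_GC:.0%}")
```

Under these assumptions the additive model reproduces the −9.9 diagonal mean, the −10.77 and −7.84 codon-level means, and the +2.93 softening, which suggests the reported figures are internally consistent with a pure sum of pair energies.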

A subtler consequence of binding being a sum of three independent pair energies is that the model is position-blind: a substitution at the 5′ end of the codon and one at the 3′ end give identical mean energy penalties when averaged across all sequences. The substrate gives the same binding strength to any pair at any stack position. The genetic code’s position-3 degeneracy is therefore not, in this framework, a property of substrate softness at the third base. It is a property of the wobble pair’s own geometric freedom (G:U is allowed to slide laterally; G:C, A:T, A:U are not), and of the genetic code’s choice to route synonymous-codon redundancy onto the position where that freedom is realized. A direct corollary: engineered tRNAs with G:U at the 1st or 2nd codon position should suffer the same binding-energy penalty as at the 3rd position; they are disfavored in biology not because the substrate forbids them but because mistranslating at the first two positions usually changes the amino-acid identity, while at the third position it usually does not.
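Position-blindness is a direct property of the additive model and can be illustrated in a few lines. The sketch below, with an inferred A:U energy and hypothetical helper names, replaces a canonical G:C pair by the shifted G:U pair at each codon position in turn and averages the penalty over all codons that carry C there.

```python
from itertools import product

E_GC = -4.15         # from the text, arb. units
E_GU_WOBBLE = -1.22  # from the text
E_AU = -7.4 / 3      # ASSUMPTION: inferred from the 0-G/C diagonal band

codons = ["".join(t) for t in product("ACGU", repeat=3)]


def codon_energy(codon, wobble_at=None):
    """Sum of pair energies, with an optional G:U wobble pair at one position."""
    total = 0.0
    for k, base in enumerate(codon):
        if k == wobble_at:
            total += E_GU_WOBBLE
        else:
            total += E_GC if base in "GC" else E_AU
    return total


def mean_wobble_penalty(position):
    """Mean cost of G:C -> G:U at one position, over codons with C there."""
    subset = [c for c in codons if c[position] == "C"]
    costs = [codon_energy(c, wobble_at=position) - codon_energy(c) for c in subset]
    return sum(costs) / len(costs)


penalties = [mean_wobble_penalty(k) for k in range(3)]
print([f"{p:+.2f}" for p in penalties])
```

The three penalties coincide because nothing in the sum depends on the stack index. Any position dependence would have to enter through a term the additive model does not contain, such as stacking between neighboring pairs.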

This sharpens the claim the base-pairing section made on its own. There, the framework noted that “the position in the codon where the substrate’s grip is weakest — the third position, where wobble is allowed — is exactly the position where the genetic code is most degenerate.” The codon-anticodon calculation refines that: the substrate’s grip strength is in fact position-independent at the pair level; what makes position 3 special is the geometric admissibility of G:U, not anything intrinsic about position 3 itself.

Three observables, one set of inputs

The codon stamp and the codon-anticodon binding share a chain of observables, all from one set of per-base inputs with one overall scaling factor:

Observable	Validation	Source
Pair-level WC binding, 4 framework propositions	4 / 4 passing in arb. units	`validate_pairs.py`
64×64 codon distance matrix, 4 sanity checks	4 / 4 passing; syn / non-syn ratio = 1.75	`compute_codon_matrix.py`
64×64 codon-anticodon binding, 3 predictions	3 / 3 passing	`compute_codon_anticodon.py`

Three independent classes of substrate observable (pair energies, codon-stamp similarities, codon-anticodon binding) are all derived from one set of per-base profiles with no per-observable tuning. The G:C / A:T ratio (1.69, above the additive H-bond-counting prediction of 1.5 and approaching, though still below, the measured 1.8–2.0 range), the synonymous codon-stamp distance ratio (1.75), and the codon-anticodon cognate selectivity (64 / 64 rank #1) all emerge from the same inputs.

The pair-level wobble and G:C non-additivity results feed back into the base-pairing section’s framework predictions: the G:U wobble pair recovers from a repulsive +3.50 at canonical placement to an attractive −1.22 with the +2 Å lateral shift — softer than G:C at −4.15, in the predicted direction with no parameter tuning. What remains is absolute calibration (one overall scaling factor against measured pair free energies) before the arb-unit numbers can be compared to experiment in kcal/mol.
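One way the remaining calibration step could be carried out is a one-parameter least-squares fit through the origin. The sketch below is an assumption about method, not the framework's stated procedure, and the target values are synthetic numbers generated from the model itself purely to exercise the function; they are not measurements.

```python
def fit_scale(model_arb, measured):
    """Scale s minimizing sum((s * e - g) ** 2) over paired values."""
    numerator = sum(e * g for e, g in zip(model_arb, measured))
    denominator = sum(e * e for e in model_arb)
    return numerator / denominator


# G:C and shifted G:U pair energies from the text, in arbitrary units.
model_arb = [-4.15, -1.22]

# ASSUMPTION: synthetic targets built as 0.5 * model, only to test the fit.
synthetic_targets = [0.5 * e for e in model_arb]

scale = fit_scale(model_arb, synthetic_targets)
residuals = [scale * e - g for e, g in zip(model_arb, synthetic_targets)]
print(scale, residuals)
```

With real pair free energies in place of the synthetic targets, the size of the residuals after fitting the single factor would show how much of the measured ordering the arb-unit model captures.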

What this still predicts (and what would falsify it)

The three observables above are computed; their qualitative directions and orderings land where the framework expects. Three further predictions extend the picture into evolutionary and biomedical data:

Synonymous codon usage bias correlates with stamp distance. Codons separated by larger stamp distances should show stronger context-specific usage; codons close in stamp space should be more freely interchanged. The correlation should survive after controlling for tRNA-abundance effects.
Synonymous SNPs disrupting protein function track stamp distance. “Silent” mutations cause disease at a rate that is not zero. The framework predicts that the disease rate per synonymous substitution should correlate with the stamp distance between original and substituted codon, not just with codon-usage frequency.
The genetic code’s non-randomness reflects stamp similarity, not only error-minimization. The codon table groups physicochemically similar amino acids near each other in codon space. In this framework, that pattern partly reduces to stamp similarity — codons close in stamp space code for amino acids whose substrate environments are also close. This is a falsifiable competitor to the standard error-minimization explanation.

The picture is falsified if, with chemistry-only inputs and one scaling factor, the 64×64 distance matrix shows no correlation with synonymous-codon usage variation across orthologs once tRNA abundance is controlled. It is supported if the correlation appears and matches the predicted direction. The calculation is concrete; the data is public; the answer is binary.
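One possible statistic for this test is a partial rank correlation between stamp distance and usage variation with tRNA-abundance differences held fixed. The sketch below is an assumed choice of statistic, not one the framework specifies, and the three input lists are hypothetical toy values with no empirical meaning.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y after removing the rank-linear effect of z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# ASSUMPTION: toy values for six hypothetical codon pairs.
stamp_distance = [0.1, 0.4, 0.2, 0.9, 0.6, 0.3]
usage_variation = [1.0, 4.0, 2.0, 9.0, 6.0, 3.0]
trna_difference = [3.0, 1.0, 2.0, 5.0, 4.0, 6.0]

print(spearman(stamp_distance, usage_variation))
print(partial_spearman(stamp_distance, usage_variation, trna_difference))
```

The partial correlation is undefined when either variable is perfectly rank-correlated with the control, so a real analysis would need to guard that case and attach a significance test.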

Honest assessment

What is solid at this stage: the framework, the per-base inputs (all five bases have published NICS, electron density, and dipole data), the convolution geometry, the metric definition, and three independent classes of substrate observable — pair-level WC binding, the 64×64 codon-stamp distance matrix, and the 64×64 codon-anticodon binding matrix — each landing in the framework-predicted direction without per-observable tuning. The substrate prediction holds at the level of qualitative orderings and directional energy gaps; absolute magnitudes still ride on one unset scaling factor.

What is not yet solid:

The absolute magnitudes of the distances. One scaling factor must be set from data; nothing in the framework currently pins it.
The precise pairwise ordering when per-base differences are small (within-pyrimidine and within-purine distances).
Whether the small magnitude of synonymous-codon stamp differences is enough to bias evolution against the much larger noise from tRNA abundance, mRNA secondary structure, and translation context. This is the substantive open question.
The geometry choice (B-form storage vs A-form ribosomal A-site). The qualitative metric should be robust across both, but a quantitative result requires committing to a geometry; both passes are worth running.

The codon stamp is offered here as a falsifiable proposal whose three computed observables already land in the predicted direction. The remaining follow-up is mechanical: run the 64×64 matrix against published codon-usage data across orthologs, and against the AlphaFold-style structural datasets that look for residual codon-correlated variance at each amino-acid position. The pre-substrate ribosomal-RNA and tRNA-folding observations — patterns suggesting structural codon-stamp matching on rRNA stems and tRNA L-shapes — would, if the stamp works, finally have the physical model they were always missing.