Overview
UQA is built on four interconnected theoretical papers that establish a mathematical foundation spanning database theory, information retrieval, probabilistic inference, and neural computation. Each paper extends the previous one, forming a chain of derivations from first principles.
```mermaid
graph LR
    P1["Paper I<br/>Unified Framework<br/>(Dec 2023)"] --> P2["Paper II<br/>Graph Extension<br/>(Dec 2024)"]
    P1 --> P3["Paper III<br/>Bayesian BM25<br/>(Jan 2026)"]
    P3 --> P4["Paper IV<br/>Neural Emergence<br/>(Feb 2026)"]
    P2 --> P4
```
I. Unified Framework
Posting lists as universal abstraction. Complete Boolean algebra over relational, text, and vector paradigms. Category-theoretic soundness.
II. Graph Extension
Graph-posting list isomorphism. Traversal, pattern matching, and regular path queries integrated into the Boolean algebra.
III. Bayesian BM25
BM25 scores transformed to calibrated probabilities via Bayesian inference. Sigmoid likelihood, composite prior, base rate calibration.
IV. Neural Emergence
Neural network structure emerges analytically from multi-signal Bayesian inference. Activation functions as probabilistic answers.
I. A Unified Mathematical Framework for Query Algebras
The foundational paper establishes that posting lists — sorted sequences of $(id, payload)$ pairs — can serve as a universal abstraction for representing result sets across relational, text retrieval, and vector search paradigms. This observation enables a single algebraic structure that preserves the expressivity of each paradigm while supporting cross-paradigm composition.
Posting Lists as Universal Abstraction
A posting list is an ordered sequence of $(id, payload)$ pairs whose document identifiers are sorted in ascending order:

$$L = [(id_1, p_1), (id_2, p_2), \ldots, (id_n, p_n)], \quad id_1 < id_2 < \cdots < id_n$$
A bijective mapping $PL: 2^{\mathcal{D}} \rightarrow \mathcal{L}$ converts document sets to posting lists and back. This bijection is the key insight: any operation on document sets has a corresponding operation on posting lists, and vice versa. The sorted-order invariant enables efficient two-pointer merge algorithms for all Boolean operations.
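The sorted-order invariant and its two-pointer merges can be sketched in a few lines of Python; the function names and the payload convention are illustrative, not the paper's:

```python
# Boolean operations on posting lists via two-pointer merges.
# A posting list is an ascending list of (id, payload) pairs.

def intersect(a, b):
    """A ∩ B: advance the pointer behind the smaller id."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append(a[i]); i += 1; j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """A ∪ B: merge both lists, deduplicating on id."""
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j >= len(b) or (i < len(a) and a[i][0] < b[j][0]):
            out.append(a[i]); i += 1
        elif i >= len(a) or b[j][0] < a[i][0]:
            out.append(b[j]); j += 1
        else:  # equal ids: keep the left payload
            out.append(a[i]); i += 1; j += 1
    return out
```

Both operations run in $O(|A| + |B|)$ time precisely because the sorted-order invariant holds for every operand and is preserved by every result.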
Complete Boolean Algebra
The structure $(\mathcal{L},\ \cup,\ \cap,\ \overline{\cdot},\ \emptyset,\ \mathcal{D})$ forms a complete Boolean algebra, satisfying:
- Commutativity: $A \cup B = B \cup A$ and $A \cap B = B \cap A$
- Associativity: $A \cup (B \cup C) = (A \cup B) \cup C$
- Distributivity: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
- Identity: $A \cup \emptyset = A$ and $A \cap PL(\mathcal{D}) = A$
- Complement: $A \cup \overline{A} = PL(\mathcal{D})$ and $A \cap \overline{A} = \emptyset$
Completeness means that for any family of posting lists $\{L_i\}_{i \in I}$, both $\bigvee_{i \in I} L_i$ and $\bigwedge_{i \in I} L_i$ exist. This is critical: it guarantees that arbitrary Boolean combinations of queries across different paradigms are algebraically well-defined, and that optimization can exploit lattice-theoretic rewrite rules (e.g., De Morgan's laws, absorption, distributivity) to restructure query plans.
Primitive Operators
Each data paradigm maps into the posting list space through primitive operators. Because every operator produces a posting list, they compose freely:
| Operator | Definition | Paradigm |
|---|---|---|
| $T(t)$ | $PL(\lbrace d \in \mathcal{D} \mid t \in term(d, f) \rbrace)$ | Text retrieval |
| $V_\theta(q)$ | $PL(\lbrace d \in \mathcal{D} \mid sim(vec(d, f), q) \geq \theta \rbrace)$ | Vector search |
| $KNN_k(q)$ | $PL(D_k)$ where $\vert D_k \vert = k$, ranked by similarity | Vector search |
| $Filter_{f,v}(L)$ | $L \cap PL(\lbrace d \in \mathcal{D} \mid d.f = v \rbrace)$ | Relational |
| $Score_q(L)$ | $(L, [s_1, \ldots, s_{|L|}])$ | Scoring |
A hybrid text + vector search is simply a Boolean intersection: $Hybrid_{t,q,\theta} = T(t) \cap V_\theta(q)$. More complex compositions — "documents matching query terms AND near a vector AND in category X" — are equally natural because all operators share the posting list type. The paper proves compositional completeness (Theorem 3.3.5): any query expressible as a combination of relational, text, and vector operations has a representation in the unified algebra.
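As an illustration of cross-paradigm composition, the hybrid query above can be sketched over a toy corpus. The `docs` dictionary, `cos`, and the operator bodies are stand-ins, not UQA's actual API:

```python
# Sketch: Hybrid_{t,q,θ} with a relational filter, as plain intersection
# of document-id sets. All data and helper functions are toy stand-ins.
import math

docs = {
    1: {"text": {"neural", "search"}, "vec": (1.0, 0.0), "category": "ml"},
    2: {"text": {"neural", "nets"},   "vec": (0.9, 0.1), "category": "ml"},
    3: {"text": {"search", "trees"},  "vec": (0.0, 1.0), "category": "db"},
}

def cos(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def T(t):                 # text operator: docs containing term t
    return sorted(d for d, doc in docs.items() if t in doc["text"])

def V(q, theta):          # vector operator: similarity >= threshold
    return sorted(d for d, doc in docs.items() if cos(doc["vec"], q) >= theta)

def Filter(f, v, ids):    # relational operator: attribute equality
    return [d for d in ids if docs[d][f] == v]

# "documents matching 'neural' AND near (1,0) AND in category 'ml'":
hybrid = Filter("category", "ml", sorted(set(T("neural")) & set(V((1.0, 0.0), 0.8))))
```

Because every operator returns the same type (a sorted id list standing in for a posting list), the composition needs no glue code beyond set intersection.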
Cross-Paradigm Join Operations
The framework supports join operations by extending posting lists to generalized posting lists, whose identifiers are tuples of identifiers drawn from the joined operands, i.e. entries of the form $((id^A, id^B), payload)$.
Inner, left outer, right outer, full outer, cross, and similarity joins are all defined within this framework. The paper proves commutativity, associativity, and distributivity of join operations over Boolean operations — properties essential for join order optimization.
Category-Theoretic Foundation
The paper establishes a category $\mathcal{C}$ where objects are posting lists and morphisms are query operators. A functor $F: \mathcal{C} \rightarrow \mathbf{Set}$ maps posting lists to their document sets and operators to functions on sets. A query monad $(T, \eta, \mu)$ provides the formal foundation for operator composition and optimization. This machinery guarantees that optimization rewrites (e.g., filter pushdown, intersection reordering) are semantics-preserving transformations, not ad-hoc heuristics.
II. Extending the Framework to Graph Data Structures
The second paper incorporates graph data structures into the unified algebra. The key challenge is showing that graph operations — traversal, pattern matching, regular path queries — can be represented within the existing Boolean algebra without breaking its completeness properties.
Graph-Posting List Isomorphism
A graph posting list pairs identifiers with graph structures: $L_G = [(id_1, G_1), (id_2, G_2), \ldots, (id_n, G_n)]$. The isomorphism $\Phi: \mathcal{L}_G \rightarrow \mathcal{L}$ maps graph posting lists to standard posting lists via a bijective mapping $\phi_{G \rightarrow D}: \mathcal{G} \rightarrow 2^{\mathcal{D}}$ between graphs and document sets.
This isomorphism preserves Boolean operations: $\Phi(L_G^1 \cup_G L_G^2) = \Phi(L_G^1) \cup \Phi(L_G^2)$ and $\Phi(L_G^1 \cap_G L_G^2) = \Phi(L_G^1) \cap \Phi(L_G^2)$. The extended structure $(\mathcal{L}_G,\ \cup_G,\ \cap_G,\ \overline{\cdot}_G,\ \emptyset_G,\ \mathcal{G})$ forms a Boolean algebra isomorphic to the original. Graph operations thus integrate seamlessly with relational, text, and vector operations without requiring a separate query engine.
Graph-Specific Operations
Three operators extend the algebra to handle graph-specific queries:
- Graph Traversal: $Traverse_{v,l,k}(L_G)$ — returns all subgraphs reachable from vertex $v$ via edges with label $l$ up to $k$ hops. Cost: $O(\sum_{i=1}^{k} d^i)$ where $d$ is the average degree.
- Pattern Matching: $GMatch_P(L_G)$ — finds all subgraph isomorphisms matching pattern $P$. NP-complete in general; $O(|V_G|^{|V_P|})$ worst case.
- Vertex Aggregation: $VertexAgg_{p,agg}(L_G)$ — applies aggregation function $agg$ to property $p$ across all vertices.
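A minimal sketch of the traversal operator, assuming an adjacency-list graph encoding; the function signature and the example graph are illustrative, not the paper's:

```python
# Traverse_{v,l,k} sketch: BFS from vertex v over edges labeled l,
# up to k hops. Graph encoding: {vertex: [(label, neighbor), ...]}.
from collections import deque

def traverse(graph, v, label, k):
    """Return vertices reachable from v via label-l edges in <= k hops."""
    seen, frontier = {v}, deque([(v, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == k:          # hop budget exhausted on this branch
            continue
        for lbl, w in graph.get(u, []):
            if lbl == label and w not in seen:
                seen.add(w)
                frontier.append((w, depth + 1))
    return sorted(seen - {v})

g = {"a": [("knows", "b"), ("likes", "x")],
     "b": [("knows", "c")],
     "c": [("knows", "d")]}
# traverse(g, "a", "knows", 2) -> ["b", "c"]
```

Each BFS level expands at most $d$ edges per frontier vertex, matching the stated $O(\sum_{i=1}^{k} d^i)$ cost.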
Regular Path Queries
Regular path expressions extend graph traversal with the full power of regular languages. Given an expression $R$ over edge labels (supporting concatenation $R_1 \cdot R_2$, alternation $R_1 | R_2$, and Kleene star $R_1^*$), the operator $RPQ_R(L_G)$ returns all vertex pairs connected by paths matching $R$. The evaluation complexity is $O(|V_G|^2 \cdot |R|)$ using dynamic programming on the NFA structure.
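The dynamic-programming evaluation can be sketched as a search over the product of the graph with an NFA for $R$. The NFA below is hand-built for $R = \text{knows} \cdot \text{knows}^*$ ("one or more knows edges"); a full implementation would first compile $R$ to an NFA:

```python
# RPQ_R sketch: BFS over (graph vertex, NFA state) pairs.
# nfa_delta maps (state, label) -> next state; accept_states marks matches.
from collections import deque

def rpq(edges, nfa_delta, start_state, accept_states, vertices):
    """Return all (u, v) pairs connected by a path whose labels match R."""
    pairs = set()
    for src in vertices:
        seen = {(src, start_state)}
        frontier = deque(seen)
        while frontier:
            u, q = frontier.popleft()
            if q in accept_states:
                pairs.add((src, u))
            for lbl, w in edges.get(u, []):
                nxt = nfa_delta.get((q, lbl))
                if nxt is not None and (w, nxt) not in seen:
                    seen.add((w, nxt))
                    frontier.append((w, nxt))
    return sorted(pairs)

edges = {"a": [("knows", "b")], "b": [("knows", "c")]}
# NFA for knows·knows*: 0 --knows--> 1, 1 --knows--> 1; accept {1}
delta = {(0, "knows"): 1, (1, "knows"): 1}
```

The outer loop over source vertices and the bounded product-state space give the $O(|V_G|^2 \cdot |R|)$ flavor of the stated complexity.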
The paper establishes an expressivity hierarchy: $\text{Relational} \subsetneq \text{Relational + Graph} \subsetneq \text{Relational + Graph + RPQ}$. Path reachability queries cannot be expressed in standard relational algebra with a fixed number of joins, so graph operations genuinely extend the algebra's expressive power.
Cross-Paradigm Integration
The graph extension defines integration operators for every paradigm combination:
- Graph-Relational: $ToGraph_{v_f, e_f}$ constructs graphs from relational data; $FromGraph_{v_f, e_f}$ flattens graphs back to relations.
- Graph-Vector: $VectorMatch_{P,q,\theta}$ finds subgraphs that both match a structural pattern $P$ and have vector similarity above $\theta$.
- Graph-Text: $TextToGraph$ constructs co-occurrence graphs from text; $SemanticGraphSearch_{q,\theta}$ finds subgraphs semantically related to a query.
A category-theoretic functor $F_G: \mathcal{C}_G \rightarrow \mathcal{C}$ preserves the algebraic structure, and a graph monad $(T_G, \eta_G, \mu_G)$ provides the composition semantics. The adjunction $T_G \dashv F_G$ formally connects graph operations to the base query algebra.
III. Bayesian BM25: A Probabilistic Framework for Hybrid Search
The third paper addresses a fundamental problem in hybrid search: BM25 scores are unbounded real numbers that cannot be directly combined with bounded signals like cosine similarity. Bayesian BM25 transforms BM25 scores into calibrated probabilities in $[0, 1]$, enabling principled multi-signal fusion.
The BM25 Score Interpretation Problem
BM25 scores are defined by the standard Okapi formulation:

$$\text{BM25}(d, q) = \sum_{t \in q} \text{IDF}(t) \cdot \frac{f(t, d) \cdot (k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\text{avgdl}}\right)}$$

where $f(t, d)$ is the frequency of term $t$ in document $d$, $|d|$ is the document length, $\text{avgdl}$ is the average document length, and $k_1, b$ are free parameters.
These scores have range $[0, +\infty)$, vary with query length and term specificity, depend on corpus statistics, and cannot be meaningfully combined with bounded signals. The paper also demonstrates that Reciprocal Rank Fusion (RRF) — the most common alternative — suffers from information loss (discarding score magnitudes), rank sensitivity, non-commutativity with filtering, and an arbitrary constant $k=60$ with no theoretical justification.
Sigmoid Calibration via Bayesian Inference
The paper derives the sigmoid transformation from Bayes' theorem. Given a binary relevance variable $R \in \{0, 1\}$ and observed score $s$, the likelihood model $P(s \mid R = 1) = \sigma(\alpha(s - \beta))$ with the symmetric assumption $P(s \mid R = 0) = 1 - P(s \mid R = 1)$ yields the posterior:

$$P(R = 1 \mid s) = \frac{L \cdot p}{L \cdot p + (1 - L)(1 - p)}$$
where $L = \sigma(\alpha(s - \beta))$ is the likelihood and $p$ is the prior. The parameters $\alpha$ (steepness) and $\beta$ (midpoint) are estimated from the score distribution, requiring no relevance labels. A composite prior combines term frequency and document length signals to encode a document quality signal distinct from the BM25 relevance score — an empirical Bayes approach.
A key property is monotonicity preservation: for fixed prior $p$ and $\alpha > 0$, $s_1 > s_2 \implies P(R = 1 \mid s_1) > P(R = 1 \mid s_2)$. The Bayesian transformation preserves BM25's ranking while producing calibrated probability outputs.
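A sketch of the two-term posterior and its monotonicity property; the values of $\alpha$, $\beta$, and the prior below are made up for illustration:

```python
# Two-term Bayesian BM25 posterior: likelihood L = σ(α(s − β)) combined
# with a document prior p via Bayes' theorem. Parameter values are toy.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def posterior(score, alpha, beta, prior):
    """P(R=1 | s) = L·p / (L·p + (1−L)(1−p)) with L = σ(α(s−β))."""
    L = sigmoid(alpha * (score - beta))
    return (L * prior) / (L * prior + (1.0 - L) * (1.0 - prior))

# Monotonicity: for a fixed prior, a higher raw score gives a higher
# calibrated probability, so BM25's ranking is preserved.
p_low  = posterior(5.0,  alpha=0.5, beta=10.0, prior=0.3)
p_high = posterior(15.0, alpha=0.5, beta=10.0, prior=0.3)
```

The output always lies in $(0, 1)$, which is what makes fusion with bounded signals like cosine similarity well-defined.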
Three-Term Posterior with Base Rate Prior
A corpus-level base rate $b_r$ captures the global probability that a random document is relevant to a random query. The full posterior decomposes into three additive terms in log-odds space:

$$\text{logit}\big(P(R = 1 \mid s)\big) = \underbrace{\alpha(s - \beta)}_{\text{likelihood}} + \underbrace{\text{logit}(p)}_{\text{prior}} + \underbrace{\text{logit}(b_r)}_{\text{base rate}}$$
The additivity in log-odds space is a consequence of sequential Bayes updates — the same property that the fourth paper identifies as the mathematical structure underlying neural network hidden layers. The base rate reduces expected calibration error by 68–77% without requiring relevance labels, and when $b_r = 0.5$ the term vanishes ($\text{logit}(0.5) = 0$), recovering the two-term posterior.
Hybrid Search Fusion
With all signals calibrated to probabilities, the paper defines Boolean combination operations:
- Probabilistic AND: $P(\text{AND}) = \prod_{i=1}^{n} p_i$ (product rule under independence)
- Probabilistic OR: $P(\text{OR}) = 1 - \prod_{i=1}^{n} (1 - p_i)$ (inclusion-exclusion)
- Probabilistic NOT: $P(\text{NOT}) = 1 - p$ (complement)
All operations are performed in log-space for numerical stability. The fourth paper identifies a fundamental problem with the product rule (AND) — conjunction shrinkage — and derives the log-odds conjunction as the solution.
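A sketch of the three combinators, accumulated in log-space; the use of `log1p`/`expm1` for stability is our implementation choice, not necessarily the paper's:

```python
# Probabilistic Boolean combinators over calibrated probabilities,
# computed in log-space for numerical stability.
import math

def prob_and(ps):
    """P(AND) = Π p_i, accumulated as a sum of logs."""
    return math.exp(sum(math.log(p) for p in ps))

def prob_or(ps):
    """P(OR) = 1 − Π (1 − p_i), accumulated as Σ log(1 − p_i)."""
    return -math.expm1(sum(math.log1p(-p) for p in ps))

def prob_not(p):
    """P(NOT) = 1 − p."""
    return 1.0 - p
```

With many near-one inputs, `log1p(-p)` keeps the OR computation accurate where the naive product $\prod (1 - p_i)$ would underflow.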
WAND and Block-Max WAND Compatibility
The paper proves that the Bayesian transformation preserves safe document pruning in WAND (Weak AND) and BMW (Block-Max WAND) algorithms. The key insight is that the sigmoid is a monotonically increasing function, so an upper bound on the raw score implies an upper bound on the posterior probability. Modified upper bounds account for the prior's variability across documents, ensuring that no relevant document is incorrectly pruned.
IV. From Bayesian Inference to Neural Computation
The fourth paper demonstrates that feedforward neural network structure emerges analytically from multi-signal Bayesian inference over binary relevance judgments. Starting from the question "what is the probability that a document is relevant given multiple evidence signals?", the paper applies Bayes' theorem and arrives at a neural network — not by design, but as a mathematical consequence.
The Conjunction Shrinkage Problem
The product rule from Paper III has a fundamental deficiency when applied to evidence accumulation. For $n$ independent signals each reporting probability $p < 1$:

$$P(\text{AND}) = \prod_{i=1}^{n} p = p^{n} \longrightarrow 0 \quad \text{as } n \to \infty$$
The probability monotonically decreases with the number of confirming signals. This violates the intuition of evidence accumulation: agreement among independent sources should increase confidence, not decrease it. The product rule answers "what is the probability that all conditions are simultaneously satisfied?" but the question a search system poses is "how confident should we be given that multiple signals concur?" These are semantically distinct questions.
Log-Odds Conjunction
The resolution works in the log-odds domain — the natural parameter space of the Bernoulli exponential family. The log-odds mean aggregates evidence without conjunction shrinkage:

$$\bar{\ell} = \frac{1}{n} \sum_{i=1}^{n} \text{logit}(P_i), \qquad P_{\text{final}} = \sigma(\bar{\ell})$$

The log-odds mean is proven to be the exact normalized form of the Logarithmic Opinion Pool (Log-OP), also known as the Product of Experts (Hinton, 2002). It is scale-neutral: if $P_i = p$ for all $i$, then $\sigma(\bar{\ell}) = p$ regardless of $n$. A confidence scaling factor $n^{\alpha}$ (with $\alpha = 0.5$ encoding the classical $\sqrt{n}$ law) amplifies the magnitude while preserving sign:

$$P_{\text{final}} = \sigma\!\left(n^{\alpha} \, \bar{\ell}\right)$$
| $P_{\text{text}}$ | $P_{\text{vec}}$ | Product Rule | Log-Odds Conjunction | Interpretation |
|---|---|---|---|---|
| 0.9 | 0.9 | 0.81 | 0.96 | Agreement amplified |
| 0.7 | 0.7 | 0.49 | 0.77 | Moderate agreement preserved |
| 0.7 | 0.3 | 0.21 | 0.50 | Exact neutrality (logits cancel) |
| 0.3 | 0.3 | 0.09 | 0.23 | Irrelevance preserved |
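The table above can be reproduced numerically, assuming the $\sqrt{n}$ confidence scaling ($\alpha = 0.5$) applied to the log-odds mean:

```python
# Product rule vs. log-odds conjunction with √n confidence scaling:
# P_final = σ(n^0.5 · mean of logits).
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def product_rule(ps):
    out = 1.0
    for p in ps:
        out *= p
    return out

def log_odds_conjunction(ps, alpha=0.5):
    n = len(ps)
    mean = sum(logit(p) for p in ps) / n
    return sigmoid(n ** alpha * mean)

for a, b in [(0.9, 0.9), (0.7, 0.7), (0.7, 0.3), (0.3, 0.3)]:
    print(a, b, round(product_rule([a, b]), 2), round(log_odds_conjunction([a, b]), 2))
```

Note the 0.7/0.3 row: the logits cancel exactly, so the conjunction returns 0.5 (neutrality), where the product rule would report 0.21.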
Emergence of Neural Network Structure
The full computation for multi-signal relevance estimation has three stages:
- Calibration: Each signal produces a probability $P_i \in (0, 1)$ via sigmoid calibration (BM25), linear calibration (cosine similarity), or external calibration (neural rankers).
- Log-Odds Aggregation: Probabilities are mapped to log-odds and aggregated as a weighted sum (normalized Log-OP).
- Final Posterior: The result passes through a sigmoid to return to probability space.
```mermaid
graph LR
    S1["Signal 1<br/>BM25 score"] -->|"sigmoid"| P1["P1"]
    S2["Signal 2<br/>cosine sim"] -->|"linear"| P2["P2"]
    S3["Signal 3<br/>graph score"] -->|"sigmoid"| P3["P3"]
    P1 -->|"logit"| L1["l1"]
    P2 -->|"logit"| L2["l2"]
    P3 -->|"logit"| L3["l3"]
    L1 --> SUM["Weighted Sum<br/>1/n^(1-a)"]
    L2 --> SUM
    L3 --> SUM
    SUM -->|"sigmoid"| OUT["Pfinal"]
```
The paper identifies this as a feedforward neural network: inputs are calibrated to probabilities (first layer), pass through the logit nonlinearity (hidden layer), undergo linear aggregation, and pass through a sigmoid activation (output layer). The paper states this identification as a theorem.
When all signals share sigmoid calibration (e.g., multiple BM25 fields), the logit-sigmoid identity $\text{logit}(\sigma(x)) = x$ collapses the hidden layer, reducing the computation to logistic regression — the classical equivalence between Bayesian posterior computation and logistic regression, arrived at from a new direction. When signals have heterogeneous calibrations (the practically important case of hybrid search combining BM25 and cosine similarity), the logit performs a genuine nonlinear transformation, yielding a true two-layer network:
- Linear pathway for sigmoid-calibrated signals (logit-sigmoid cancellation)
- Nonlinear pathway for differently-calibrated signals (logit as genuine nonlinearity)
Activation Functions as Probabilistic Answers
The paper independently derives each major activation function from a complementary probabilistic question:
| Activation | Probabilistic Question | Derivation |
|---|---|---|
| Sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$ | "How probable?" — posterior for binary relevance | Canonical link of Bernoulli exponential family. Uniquely satisfies: maps $\mathbb{R} \to (0,1)$, inverse of logit, self-referential derivative, symmetric evidence, maximum entropy. |
| ReLU $\max(0, x)$ | "How much?" — MAP estimate under sparse non-negative prior | Given signal $z = x + \epsilon$ with exponential prior $p(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ and Gaussian noise, the MAP estimate is $\hat{x}_{MAP} = \max(0, z - \lambda\tau^2)$. With unit-absorbed parameters: $\text{ReLU}(z) = \max(0, z)$. |
| Swish $x \cdot \sigma(\beta x)$ | "What is the expected relevant amount?" — Bayesian posterior mean | Under the same sparse gating structure as ReLU, the posterior mean (instead of posterior mode) yields $\mathbb{E}[x \mid z] = z \cdot \sigma(\beta z)$. The MAP-to-Bayes duality of classical statistics manifests as the ReLU-to-Swish transition. |
| GELU $x \cdot \Phi(x)$ | Gaussian approximation of Swish | Replacing the logistic sigmoid with the probit function $\Phi(x)$ gives GELU. The well-known approximation $\Phi(x) \approx \sigma(1.702x)$ establishes GELU $\approx$ Swish$_{1.702}$. |
| Softmax | Multi-class posterior — categorical exponential family | Extension of the sigmoid to $K$-class classification via the categorical canonical link. |
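Two of the table's claims can be checked numerically: the GELU $\approx$ Swish$_{1.702}$ approximation, and the ReLU-as-MAP closed form $\max(0, z - \lambda\tau^2)$ recovered by brute-force search. The grid ranges and test values are arbitrary illustration choices:

```python
import math

def gelu(x):
    """GELU(x) = x · Φ(x), with Φ the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x, beta=1.702):
    """Swish_β(x) = x · σ(βx)."""
    return x / (1.0 + math.exp(-beta * x))

# Claim 1: GELU ≈ Swish_{1.702} across [-5, 5].
max_gap = max(abs(gelu(x) - swish(x)) for x in [i / 100.0 for i in range(-500, 501)])

# Claim 2: MAP estimate under exponential prior + Gaussian noise.
# Minimize (z − x)^2 / (2τ^2) + λx over x >= 0 by grid search.
def map_by_grid(z, lam, tau):
    xs = [i / 1000.0 for i in range(0, 10001)]  # x in [0, 10]
    return min(xs, key=lambda x: (z - x) ** 2 / (2 * tau ** 2) + lam * x)
```

With $\lambda = \tau = 1$ and $z = 2$, the grid search lands on $\max(0, 2 - 1) = 1$, and for negative $z$ the minimizer is pinned at zero, exactly the ReLU shape.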
The sigmoid's dual appearance — as individual signal calibrator and as final output activation — is a consequence of the exponential family structure of Bernoulli random variables. Any system processing binary evidence under the natural constraints of probability will arrive at the same function.
Attention as Logarithmic Opinion Pooling
Relaxing the assumption of uniform signal reliability — allowing weights to depend on the query-signal interaction — extends the derived feedforward structure to the attention mechanism. Attention is identified as a form of context-dependent Logarithmic Opinion Pooling, mathematically equivalent to Hinton's Product of Experts (PoE, 2002).
This provides a probabilistic answer to why attention computes a weighted sum: Product-of-Experts combination is multiplicative in probability space, which is additive in log-odds space. The weighted sum is not an arbitrary aggregation — it is the mathematically necessary operation for Log-OP combination.
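A toy sketch of context-dependent pooling: softmax weights over expert log-odds, which reduces to the plain log-odds mean when the weights are uniform. The function names and scoring inputs are illustrative, not the paper's formulation:

```python
# Context-dependent logarithmic opinion pooling: a weighted sum of
# expert logits, with weights produced by a softmax over relevance
# scores (the attention-like ingredient). Toy illustration only.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pooled_posterior(expert_probs, relevance_scores):
    """Weighted sum in log-odds space = multiplicative PoE in prob space."""
    w = softmax(relevance_scores)
    return sigmoid(sum(wi * logit(p) for wi, p in zip(w, expert_probs)))
```

Equal relevance scores give uniform weights, recovering the log-odds mean; skewed scores let one expert dominate the pooled posterior, which is the context dependence the attention analogy refers to.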
The paper further proves that WAND-style token-level pruning and BMW-style head-level pruning are exact for attention with sigmoid-bounded value vectors — the first theoretically grounded framework for exact sparse attention.
Depth as Recursive Bayesian Inference
The derived inference unit is the atomic building block of arbitrarily deep networks. Depth arises from iterated marginalization over latent variables: each layer constructs the evidence required by the next. A deep network is a chain of recursive Bayesian inference, each layer conditioning on the posterior produced by the layer below.
Each layer constructs intermediate evidence that does not exist in the raw data but is needed for higher-level judgments. The correspondence between activation functions and probabilistic questions reframes architecture design as question sequencing — choosing the order of probabilistic questions posed to the data.
Connections Across Papers
The four papers form a coherent theoretical arc. Here are the key connections between them.
From Algebra to Probability
Paper I establishes posting lists as the universal abstraction with Boolean algebra. Paper III asks: what happens when we attach calibrated probabilities to posting list entries? The answer is Bayesian BM25 — and the fusion operations from Paper I (union, intersection) acquire probabilistic semantics (OR = $1 - \prod(1-p_i)$, AND = $\prod p_i$).
From Probability to Neural Structure
Paper III's product-rule AND has a defect (conjunction shrinkage). Paper IV resolves it through log-odds conjunction and discovers that the end-to-end computation is a neural network. The fix for a practical information retrieval problem turns out to be the mathematical structure that a separate scientific tradition calls a "neuron."
Graph Operations in the Probabilistic Framework
Paper II integrates graph traversal and pattern matching into Paper I's Boolean algebra via the graph-posting list isomorphism. In UQA's implementation, graph traversal operators assign reachable vertices a fixed probability score of 0.9, allowing graph signals to participate in the log-odds fusion framework of Paper IV alongside BM25 and vector signals.
Coverage-Based Fusion Defaults
A practical consequence of the theory: when a fusion signal has zero coverage (e.g., BM25 cannot find a query term in its vocabulary), the default probability should be 0.5 — neutral in log-odds space ($\text{logit}(0.5) = 0$). A fixed default of 0.01 encodes strong negative evidence ($\text{logit}(0.01) = -4.6$), causing extreme score drops. UQA's implementation interpolates the default between these two regimes as a function of the signal's coverage.
At zero coverage, $p_{\text{default}} = 0.5$ (absence of evidence, not evidence of absence). At full coverage, $p_{\text{default}} = 0.01$ (a missing document is genuinely penalized).
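The coverage-based default can be sketched as an interpolation between the two endpoints; the linear form below is our assumption — only the endpoints are stated above:

```python
# Coverage-based default probability: neutral (0.5) at zero coverage,
# penalizing (0.01) at full coverage. Linear interpolation is assumed.
def default_probability(coverage, neutral=0.5, penalty=0.01):
    """coverage in [0, 1]: how fully the signal could evaluate the query."""
    return neutral * (1.0 - coverage) + penalty * coverage
```

At intermediate coverage the default slides smoothly from "absence of evidence" toward "evidence of absence", avoiding the cliff that a fixed 0.01 default creates.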
Robertson's Completed Circle
In 1977, Robertson introduced the Probability Ranking Principle, opening a door between probability theory and information retrieval. BM25 was derived from this probabilistic foundation, yet its scores are not probabilities. For nearly fifty years, this circle remained unclosed.
Paper III (Bayesian BM25) closes the circle by returning BM25 scores to probability space. Paper IV reveals what lies on the other side: the computational structure that a separate scientific tradition, developed over different decades and for different purposes, calls a neural network.
The mathematics does not care what we call things. Whether we say "Bayesian posterior" or "sigmoid neuron," "sparse feature detector" or "ReLU unit," "evidence accumulation" or "attention" — the same structures appear wherever information is processed under uncertainty.