Mesh Storage Layer

The mesh storage layer provides fault-tolerant, encrypted data persistence across the Zentalk network. This chapter describes the complete architecture: from client-side encryption through Reed-Solomon erasure coding, Kademlia DHT-based distribution, capacity management, and automatic self-healing.

Zentamesh Data Flow

User Data

AES-256-GCM Encryption

Encrypted Ciphertext

Reed-Solomon (15,10)

Shard 1

Shard 2

...

Shard 14

Shard 15

Kademlia DHT Lookup

Node A

Node B

...

Node N

Node O

15 Shards Distributed

User data is encrypted client-side, split into 15 erasure-coded shards, and distributed across mesh nodes via Kademlia DHT. Any 10 of 15 shards can reconstruct the original data.

Design Goals

The mesh storage layer must satisfy five requirements simultaneously:

Confidentiality: Stored data must be unreadable by mesh node operators, even if they have full physical access to the storage hardware.
Availability: Data must remain accessible even if multiple nodes fail simultaneously.
Durability: Data must not be lost even under adverse network conditions (node departures, hardware failures, network partitions).
Efficiency: Storage overhead must be minimized relative to the fault tolerance provided.
Scalability: The system must handle growth from hundreds to millions of users without architectural changes.

Client-Side Encryption

All data is encrypted on the user's device before transmission to the mesh network. The mesh nodes never receive plaintext data.

Encryption Design

Algorithm: AES-256-GCM (Authenticated Encryption with Associated Data)

The encryption key is derived from the user's identity key through PBKDF2-SHA256 with a cryptographically random salt and high iteration count, ensuring that the key derivation is computationally expensive to brute-force. A fresh random salt is generated per encryption operation, preventing key reuse across different data objects.

The output is a self-contained binary blob comprising a version identifier, the salt (required for key re-derivation during decryption), an initialization vector (GCM nonce), the ciphertext, and an authentication tag. This format ensures that any mesh node or network observer receives only an opaque encrypted payload -- the mesh is a blind courier that stores and forwards data whose content it cannot read and whose integrity it cannot compromise without detection.

AES-256-GCM provides both confidentiality (the ciphertext reveals no information about the plaintext) and integrity (any modification of the ciphertext is detected upon decryption via the authentication tag). These properties hold regardless of the mesh node operator's intentions or capabilities.

Reed-Solomon Erasure Coding

Erasure Coding Principle

Reed-Solomon codes are maximum distance separable (MDS) error-correcting codes that achieve the theoretical optimum for the trade-off between redundancy and fault tolerance [Reed and Solomon, 1960]. An $(n, k)$ code can recover the original $k$ data symbols from any $k$ of the $n$ encoded symbols, tolerating the loss of up to $n - k$ symbols.

Zentalk uses a $(15, 10)$ Reed-Solomon code over $\text{GF}(2^8)$ : encrypted data is split into 10 data shards, 5 parity shards are computed through linear combination over the finite field, and the resulting 15 shards are distributed across 15 independent validator nodes via Kademlia. Any 10 of the 15 shards are sufficient to reconstruct the original data. This is mathematically optimal — no code with the same parameters can tolerate more than 5 simultaneous failures.

The encoding and recovery processes rely on the algebraic properties of Vandermonde matrices over $\text{GF}(2^8)$ , which guarantee that any $10 \times 10$ submatrix of the generator matrix is invertible. Recovery is therefore a deterministic linear algebra operation: select any 10 available shards, construct the corresponding submatrix, invert it, and multiply to recover all original data shards. The mathematical details are standard [see Plank, 2013 for a comprehensive treatment].

Efficiency

Strategy	Overhead	Fault Tolerance
No redundancy	$1.0\times$	0 failures
$3\times$ replication	$3.0\times$	2 failures
$\text{RS}(10,5)$	$1.5\times$	5 failures

Reed-Solomon $(10,5)$ tolerates 2.5x more failures than triple replication while using half the storage. This makes it the optimal choice for a decentralized network where node failures are routine.

Kademlia DHT Distribution

XOR Distance Metric

The Kademlia Distributed Hash Table uses the XOR metric to define "distance" between keys and node identifiers:

d(A, B) = A \oplus B

Properties:

$d(A, A) = 0$ (identity)
$d(A, B) > 0$ for $A \neq B$ (positivity)
$d(A, B) = d(B, A)$ (symmetry)
$d(A, B) \oplus d(B, C) = d(A, C)$ (XOR transitivity, NOT triangle inequality)

Note that XOR distance does not satisfy the standard triangle inequality $d(A,C) \leq d(A,B) + d(B,C)$ in the additive sense. Instead, it satisfies the ultrametric property: $d(A,C) \leq \max(d(A,B),\, d(B,C))$ . This stronger property is what makes Kademlia routing efficient -- the distance to any target can only decrease or stay the same as lookup progresses through the routing table [Maymounkov and Mazieres 2002].

A unique structural property of XOR distance is that for any point $A$ and any distance $\delta > 0$ , there exists exactly one point $B$ such that $d(A, B) = \delta$ . This ensures that lookups converge deterministically and uniquely.

Kademlia DHT Iterative Lookup

Requesting Node

Target Key (SHA-256 hash)

FIND_NODE query

Node A (d=128)

Node B (d=97)

Node C (d=156)

Closer Nodes Returned

Node D (d=42)

Node E (d=31)

Node F (d=58)

Converging

Node G (d=8)

Node H (d=3)

Target Found (d=0)

Each round queries alpha=3 nodes in parallel; the closest responder (lowest XOR distance d) seeds the next round, converging in O(log N) steps.

Storage Key Generation

Each piece of data to be stored is assigned a deterministic storage key:

\text{storage\_key} = \text{SHA-256}(\text{normalized\_address} \| \texttt{":"} \| \text{chunk\_id})

The deterministic key ensures that any node can independently compute where a piece of data should be stored, enabling decentralized routing without a central directory. Because the key is derived from the owner's address and a data-type identifier, storage locations are reproducible -- any node in the network can locate any user's data by computing the same hash.

Shard Placement

Given a storage key and 15 encoded shards, the placement algorithm queries the DHT for the nearest nodes to the storage key (querying more candidates than needed to account for offline or full nodes). Candidates are filtered by capacity state using a multi-tier fallback strategy: the algorithm prefers nodes with healthy capacity, falls back to moderately loaded nodes, and as a last resort stores shards locally. Each of the 15 shards is placed on a distinct node to maximize independence of failure domains. Shard locations are recorded in metadata so that any future retrieval or repair operation can locate them.

Node Discovery and Routing Table

Each node maintains a Kademlia routing table organized into $k$ -buckets:

One bucket per bit position in the $256$ -bit address space
Each bucket holds up to $k$ peers (a configurable protocol parameter)
Bucket $i$ contains peers with distance $d$ where $2^i \leq d < 2^{i+1}$

Nodes discover peers through an initial bootstrap phase (connecting to well-known entry points), periodic refresh (querying random keys in each bucket to discover new peers), and continuous connection maintenance (health-checking connected peers and replacing stale entries).

Capacity Management

Each node monitors its storage utilization and announces its state to the network through a graduated capacity model. As utilization increases, the node's behavior becomes progressively more restrictive -- from accepting all traffic at low utilization, through deprioritizing new shard placement, to rejecting all new data when near capacity. This graduated response ensures that the network sheds load gracefully rather than experiencing abrupt failures.

Nodes periodically publish their capacity state to the DHT, allowing other nodes to make informed placement decisions. When a node approaches its capacity limits, a rebalancing process migrates its least-accessed shards to healthier nodes, verifying integrity at each step. The rebalancing is rate-limited to prevent oscillation and network congestion.

Anti-Entropy and Self-Healing

Health Monitoring

The anti-entropy service continuously monitors shard health by periodically scanning registered chunks in batches, prioritizing those that have not been checked recently. For each chunk, the service queries all shard locations and counts available shards. Health is classified on a graduated scale: all shards present (no action), minor degradation (monitor), moderate degradation (queue repair), minimum viable shards (urgent repair), or below reconstruction threshold (unrecoverable; alert and log).

Automatic Repair

When a chunk's health degrades below the monitoring threshold, the repair process is triggered. The algorithm retrieves the minimum required shards from surviving nodes in parallel, uses Reed-Solomon decoding to reconstruct all $15$ shards (any $10$ sufficient due to the MDS property), and places the missing shards on new nodes selected from the DHT -- preferring nodes that are geographically diverse and in Normal capacity state. Hash integrity is verified after each transfer and chunk metadata is updated with the new shard locations. Failed repairs are retried with exponential backoff up to a bounded maximum number of attempts.

Adaptive Media Chunking

Large media files (images, videos, documents) use an additional chunking layer before Reed-Solomon encoding. The chunk size adapts to the file size -- smaller files are transmitted as a single unit, while larger files are split into chunks scaled to balance transfer efficiency against per-chunk overhead. A protocol-defined maximum file size prevents abuse.

Integrity Verification

Each chunked media upload produces a signed manifest that records the hash of every chunk and a checksum of the original file. The manifest is signed with the sender's Ed25519 identity key, preventing malicious mesh nodes from modifying it. This creates a three-layer integrity guarantee:

Manifest signature: detects manifest modification by any mesh node
Per-chunk hashes: detects individual chunk replacement or corruption
File checksum: detects reordering or partial substitution of chunks

During download, the recipient verifies the manifest signature, checks each chunk's hash, decrypts and reassembles the file, and verifies the final checksum -- ensuring end-to-end integrity from sender to recipient with no trust placed in any intermediate node.

Scalability Analysis

Steady-State Property

Due to TTL-based expiration, the mesh reaches a bounded steady state where data ingestion rate equals data expiration rate. After the longest TTL period has elapsed, every byte entering the network is matched by a byte expiring:

S_{\max} = D \times T_{\max}

where $D$ is the daily data ingestion rate and $T_{\max}$ is the longest TTL in the system. This remains constant regardless of how long the network operates.

Storage does not grow unboundedly. It stabilizes at approximately $D \times T_{\max}$ and remains there as long as usage patterns are stable. The per-node storage burden decreases as more nodes join the network, since the aggregate data is distributed across a larger set of participants. This bounded growth model ensures that the hardware requirements for node operation do not increase over time, preserving the economic accessibility of participation.

Design Goals​

Client-Side Encryption​

Encryption Design​

Reed-Solomon Erasure Coding​

Erasure Coding Principle​

Efficiency​

Kademlia DHT Distribution​

XOR Distance Metric​

Storage Key Generation​

Shard Placement​

Node Discovery and Routing Table​

Capacity Management​

Anti-Entropy and Self-Healing​

Health Monitoring​

Automatic Repair​

Adaptive Media Chunking​

Integrity Verification​

Scalability Analysis​

Steady-State Property​