Relay Routing Network
The relay routing network provides real-time message delivery between Zentalk users. This chapter describes the routing architecture: direct delivery for online users, durable offline queuing, multi-hop relay routing for metadata privacy, geographic optimization, and federation between relay servers.
Architecture Overview
Every Full Node in the Zentalk network operates a Relay Module responsible for:
- Real-time message delivery to connected recipients
- Offline message queuing for disconnected recipients
- Multi-hop relay routing for sender anonymity (1-5 hops)
- Group/channel message fan-out to multiple recipients
- E2EE key bundle distribution for new session establishment
- Federation between relay servers for cross-relay message delivery
The relay never has access to plaintext message content. All messages arrive pre-encrypted with the Signal Protocol (Chapter 3). The relay processes opaque encrypted blobs, routing them based on recipient addresses without knowledge of the content, sender identity (when sealed sender is used), or conversation context.
Message Delivery Flow
Online Delivery
When both sender and recipient are connected to relays, the message follows the shortest available path. If the recipient is connected to the same relay as the sender, the message is forwarded directly. If the recipient is connected to a different relay, the sender's relay queries the DHT for the recipient's home relay and forwards the message through federation. The relay identifies recipients only by their hashed addresses -- it never sees the plaintext content and, when sealed sender is enabled, does not know the sender's identity either.
Offline Delivery
When the recipient is not connected, the relay durably stores the encrypted message in an offline queue with a bounded TTL. The relay does not acknowledge receipt to the sender until the message has been committed to persistent storage, ensuring that no acknowledged message can be lost even if the relay crashes immediately after acknowledgment. When the recipient reconnects to any relay in the network, queued messages are drained and delivered in chronological order.
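The commit-before-acknowledge rule can be sketched as follows. This is a minimal illustration, not the Zentalk implementation: the `OfflineQueue` class, the SQLite schema, and the 7-day TTL are all assumptions made for the example (the text above only says the TTL is bounded).

```python
import sqlite3
import time


class OfflineQueue:
    """Illustrative durable queue; class name, schema and TTL are assumptions."""

    def __init__(self, path=":memory:", ttl_seconds=7 * 24 * 3600):
        # ":memory:" keeps the example self-contained; a real relay would
        # pass a file path so that committed rows survive a crash.
        self.db = sqlite3.connect(path)
        self.ttl = ttl_seconds
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "recipient TEXT NOT NULL, payload BLOB NOT NULL, "
            "stored_at REAL NOT NULL, expires_at REAL NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, recipient, payload, now=None):
        """Persist the encrypted blob, then return the ack for the sender."""
        now = time.time() if now is None else now
        with self.db:  # commits on success, rolls back on exception
            self.db.execute(
                "INSERT INTO queue (recipient, payload, stored_at, expires_at) "
                "VALUES (?, ?, ?, ?)",
                (recipient, payload, now, now + self.ttl),
            )
        return "queued"  # only reached after the commit succeeded

    def drain(self, recipient, now=None):
        """Drop expired rows, then return pending rows in chronological order."""
        now = time.time() if now is None else now
        with self.db:
            self.db.execute("DELETE FROM queue WHERE expires_at <= ?", (now,))
        return self.db.execute(
            "SELECT id, payload FROM queue WHERE recipient = ? "
            "ORDER BY stored_at, id",
            (recipient,),
        ).fetchall()

    def confirm(self, message_id):
        """Delete a row only after the recipient confirmed receipt."""
        with self.db:
            self.db.execute("DELETE FROM queue WHERE id = ?", (message_id,))
```

Because a row is deleted only in `confirm`, a message that was drained but not confirmed is delivered again on the next reconnect, which is what at-least-once delivery implies.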
Durability Guarantees
The offline queue provides at-least-once delivery semantics through three properties:
Multi-Hop Relay Routing
Protocol Design
Multi-hop relay routing wraps each message in multiple encryption layers, one per relay hop. Each relay can decrypt only its own layer, revealing the next hop but not the ultimate destination (except for the final relay). The innermost layer contains the recipient's address; each outer layer contains only the address of the next relay in the circuit. This layered structure, analogous to Tor's onion routing, ensures that in a circuit of two or more hops no single relay can correlate the sender with the recipient. In a 1-hop circuit the first and last relay are the same node, so that relay sees both the sender's connection and the recipient's address.
Circuit Construction
The sender constructs the relay circuit by selecting 1-5 relay hops from the DHT peer list. Relay selection enforces diversity constraints: no two relays in the same circuit may share the same operator, and geographic diversity is required to span multiple jurisdictions. The circuit is built inside-out -- the innermost encryption layer (for the final relay) is constructed first, then each successive outer layer wraps the previous one. Each layer is encrypted with the corresponding relay's public key, ensuring that only that relay can decrypt its routing instructions.
A TTL field tracks the remaining hops, and a payload hash enables each relay to verify message integrity without decrypting the payload. Messages that fail integrity checks or exceed their hop limit are dropped immediately.
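The inside-out construction and the per-hop checks can be sketched as below. The packet layout, the function names, and the JSON routing header are assumptions for illustration. To keep the example dependency-free, `toy_seal` is an insecure XOR keystream standing in for the per-relay public-key encryption described above; it only demonstrates the layering order.

```python
import hashlib
import json


def toy_seal(key: bytes, data: bytes) -> bytes:
    """Stand-in for per-relay encryption (XOR keystream). NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


toy_open = toy_seal  # XOR with the same keystream is its own inverse


def build_onion(hops, recipient, payload: bytes):
    """hops: list of (relay_id, key), entry relay first. Built inside-out."""
    next_addr, layer = recipient, None
    for relay_id, key in reversed(hops):  # final relay's layer is made first
        routing = {"next": next_addr, "inner": layer.hex() if layer else None}
        layer = toy_seal(key, json.dumps(routing).encode())
        next_addr = relay_id
    packet = {
        "ttl": len(hops),  # remaining hops
        "hash": hashlib.sha256(payload).hexdigest(),
        "layer": layer,
        "payload": payload,  # already end-to-end encrypted, opaque here
    }
    return next_addr, packet  # next_addr is now the entry relay


def peel(packet, key: bytes):
    """One relay step. Returns (next_address, packet) or None if dropped."""
    if packet["ttl"] <= 0:
        return None  # hop limit exceeded
    if hashlib.sha256(packet["payload"]).hexdigest() != packet["hash"]:
        return None  # integrity check failed, payload never decrypted
    routing = json.loads(toy_open(key, packet["layer"]))
    inner = bytes.fromhex(routing["inner"]) if routing["inner"] else b""
    return routing["next"], dict(packet, ttl=packet["ttl"] - 1, layer=inner)
```

Each call to `peel` reveals only the `next` field of one layer, so a middle relay learns the following hop and nothing about the recipient address held in the innermost layer.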
Security Properties
What each relay sees:
| Relay Position | Knows Sender? | Knows Recipient? | Knows Content? |
|---|---|---|---|
| First relay | Yes (direct connection) | No | No |
| Middle relay(s) | No | No | No |
| Last relay | No | Yes (delivers to recipient) | No |
| Any relay | — | — | Never (E2EE) |
Threat Analysis
Privacy-Latency Trade-off
Each additional hop adds incremental latency due to the extra network round-trip and decryption step. For most use cases, 3-hop routing provides a good balance between privacy and latency -- the added delay remains well within the bounds of interactive communication. The default configuration uses 1 hop (fastest), with users able to opt into 3 or 5 hops for enhanced privacy.
Relay Selection
When a Zentalk client initiates a connection, it must determine which validator node(s) to route through. This decision is not arbitrary. The client evaluates all candidate relays against a composite scoring function that accounts for geographic proximity, measured latency, operator reputation, current load, and health check freshness. The result is a ranked list of candidates from which the client selects using weighted random sampling.
Selection Criteria
Each candidate relay is evaluated on the following dimensions:
GeoRouter Scoring
The GeoRouter combines these criteria into a single composite score using a weighted formula. Region proximity carries the highest weight, followed by measured latency, reputation, current load, and health check freshness. Same-region relays receive the strongest preference, while high latency and high load are penalized.
The final selection uses weighted random sampling rather than deterministic best-score selection. This is a deliberate design choice: deterministic selection would cause all clients in a geographic region to converge on the same relay, creating a load hotspot and a single point of failure. Weighted random sampling distributes connections across multiple relays in proportion to their suitability, achieving load distribution while still favoring higher-quality relays.
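A minimal sketch of composite scoring followed by weighted random sampling is shown below. The numeric weights, the normalization constants (500 ms, 60 s), and the `Candidate` fields are assumptions chosen only to respect the ordering stated above (region first, then latency, reputation, load, freshness); the actual GeoRouter formula is not given in this chapter.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    relay_id: str
    same_region: bool
    latency_ms: float
    reputation: float    # 0.0 .. 1.0
    load: float          # fraction of announced capacity in use
    health_age_s: float  # seconds since the last successful health check


# Assumed weights; only their ordering comes from the text.
WEIGHTS = {"region": 0.35, "latency": 0.25, "reputation": 0.20,
           "load": 0.15, "freshness": 0.05}


def score(c: Candidate) -> float:
    region = 1.0 if c.same_region else 0.0
    latency = max(0.0, 1.0 - c.latency_ms / 500.0)       # high latency penalized
    load = 1.0 - min(max(c.load, 0.0), 1.0)              # high load penalized
    freshness = max(0.0, 1.0 - c.health_age_s / 60.0)    # stale checks penalized
    return (WEIGHTS["region"] * region
            + WEIGHTS["latency"] * latency
            + WEIGHTS["reputation"] * c.reputation
            + WEIGHTS["load"] * load
            + WEIGHTS["freshness"] * freshness)


def pick_relay(candidates, rng=random):
    """Sample one relay with probability proportional to its score."""
    weights = [score(c) for c in candidates]
    if sum(weights) <= 0:
        raise ValueError("no candidate has a positive score")
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With these assumed weights, a nearby lightly loaded relay is chosen most of the time, but a weaker candidate still receives a share of connections, which is the load-spreading effect the paragraph above describes.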
Multi-Hop Diversity Constraints
When the client constructs a multi-hop relay circuit (as described in the preceding section), each hop is selected not only for individual quality but also for diversity relative to the other hops in the circuit. The circuit construction algorithm enforces three diversity constraints:
I. Operator diversity. No two relays in the same circuit may share the same validator, preventing a single operator from observing both entry and exit points.
II. Geographic diversity. No more than two relays may reside in the same region, ensuring the circuit spans multiple jurisdictions.
III. Network diversity. The circuit avoids relays sharing the same data center or upstream provider, ensuring infrastructure independence across hops.
These constraints may reduce the average score of the selected circuit relative to an unconstrained selection. This is an acceptable trade-off: the privacy properties of multi-hop routing depend on the independence of the relay operators, and diversity constraints are the mechanism that enforces that independence.
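The three constraints can be expressed as a filter applied while the circuit is assembled. In this sketch the `Hop` fields and the greedy walk over a pre-ranked candidate list are assumptions; the text describes weighted sampling, and a greedy pass is used here only to keep the example deterministic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    relay_id: str
    operator: str
    region: str
    datacenter: str
    upstream: str


def violates_diversity(circuit, candidate, max_per_region=2):
    """True if adding candidate would break one of the three constraints."""
    # I. Operator diversity: no shared operator.
    if any(h.operator == candidate.operator for h in circuit):
        return True
    # II. Geographic diversity: at most max_per_region relays per region.
    regions = Counter(h.region for h in circuit)
    if regions[candidate.region] >= max_per_region:
        return True
    # III. Network diversity: no shared data center or upstream provider.
    if any(h.datacenter == candidate.datacenter
           or h.upstream == candidate.upstream for h in circuit):
        return True
    return False


def build_circuit(ranked, hops):
    """Walk candidates in ranked order, keeping those that preserve diversity."""
    circuit = []
    for candidate in ranked:
        if len(circuit) == hops:
            break
        if not violates_diversity(circuit, candidate):
            circuit.append(candidate)
    if len(circuit) < hops:
        raise ValueError("not enough diverse relays for the requested circuit")
    return circuit
```

A higher-ranked relay is skipped whenever it shares an operator, region quota, or infrastructure with a hop already chosen, which is how the constraints can lower the circuit's average score.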
Cross-Region Routing
The GeoRouter targets low latency for same-region delivery, moderate latency for adjacent regions, and best-effort for intercontinental routes -- all within the bounds of perceptually responsive interactive communication.
Adaptive Routing
Relay selection is not a one-time decision. The network continuously monitors relay health and adjusts routing in response to changing conditions. This adaptive behavior ensures that transient failures, capacity saturation, and network topology changes are reflected in routing decisions within seconds, not minutes.
Health Monitoring
Every relay in the network participates in a heartbeat protocol. The health monitoring system tracks three metrics:
I. Availability. A relay that misses consecutive heartbeats is marked unhealthy and excluded from the candidate pool. Existing connections are migrated if the state persists.
II. Latency trend. Relays whose latency significantly exceeds their historical baseline are penalized, detecting degraded nodes suffering from overload or congestion.
III. Delivery success rate. The fraction of messages successfully delivered is tracked over a sliding window. Relays below a minimum threshold are temporarily excluded.
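The three metrics above could be tracked per relay roughly as follows. All thresholds (three missed heartbeats, a 2x latency factor, a 90% success floor, a 100-message window) and the `RelayHealth` name are assumptions; the chapter does not state concrete values.

```python
from collections import deque


class RelayHealth:
    """Per-relay health record; every threshold here is an assumed value."""

    def __init__(self, max_missed=3, latency_factor=2.0,
                 min_success=0.9, window=100):
        self.max_missed = max_missed
        self.latency_factor = latency_factor
        self.min_success = min_success
        self.missed = 0
        self.baseline_ms = None   # slow-moving historical baseline
        self.last_ms = None
        self.outcomes = deque(maxlen=window)  # sliding window of deliveries

    def heartbeat(self, rtt_ms=None):
        """Record a heartbeat reply, or a miss when rtt_ms is None."""
        if rtt_ms is None:
            self.missed += 1
            return
        self.missed = 0
        self.last_ms = rtt_ms
        if self.baseline_ms is None:
            self.baseline_ms = rtt_ms
        else:
            self.baseline_ms = 0.95 * self.baseline_ms + 0.05 * rtt_ms

    def record_delivery(self, ok):
        self.outcomes.append(bool(ok))

    @property
    def available(self):
        return self.missed < self.max_missed

    @property
    def latency_degraded(self):
        if self.baseline_ms is None:
            return False
        return self.last_ms > self.latency_factor * self.baseline_ms

    @property
    def success_rate(self):
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def eligible(self):
        """Unavailable or unreliable relays leave the candidate pool."""
        return self.available and self.success_rate >= self.min_success
```

In this sketch a degraded latency trend does not exclude a relay; it is exposed as a flag so that a scoring function can apply a penalty, matching the distinction the list draws between exclusion and penalization.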
Load Balancing
When a relay's connection count approaches its announced capacity (published via the DHT), the scoring function progressively reduces its score for new connections. The penalty is applied as a continuous function rather than a hard cutoff -- a soft threshold begins redirecting new connections before the relay reaches full capacity, ensuring that no relay is driven to saturation. Existing connections on a loaded relay are not disturbed; only new connection requests are directed elsewhere.
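One possible shape for such a continuous penalty is a ramp that is zero below a soft threshold and rises smoothly to its maximum at full capacity. The 70% threshold and the quadratic curve below are assumptions for illustration, not values from the Zentalk specification.

```python
def load_penalty(connections: int, capacity: int,
                 soft_threshold: float = 0.7) -> float:
    """Return a penalty in [0, 1] to subtract from a relay's score.

    Zero up to the soft threshold, then a quadratic ramp reaching 1.0
    at announced capacity. Threshold and curve are assumed values.
    """
    if capacity <= 0:
        return 1.0  # a relay announcing no capacity takes no new connections
    utilization = min(connections / capacity, 1.0)
    if utilization <= soft_threshold:
        return 0.0
    x = (utilization - soft_threshold) / (1.0 - soft_threshold)
    return x * x
```

Because the function has no jump at the threshold, two relays with nearly equal load receive nearly equal penalties, which avoids the oscillation a hard cutoff can cause.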
Failure Recovery
When a relay becomes unreachable during an active session, the client initiates automatic reconnection: it re-evaluates the candidate relay pool (excluding the failed relay), selects a new relay using the standard scoring algorithm, reconnects, and replays any unacknowledged messages from its local outbox.
For multi-hop circuits, the failure of any relay in the circuit invalidates the entire circuit. The client constructs a new circuit from scratch, selecting entirely new relays. This is a deliberate security decision: reusing partial circuits after a failure could leak information about the circuit structure to an adversary who caused the failure.
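The recovery steps above can be sketched as two small helpers. The function names, the `select` and `send` callbacks, and the use of message IDs as outbox keys are assumptions for this example; `select` stands in for the standard scoring algorithm.

```python
def recover(candidates, failed_relay, outbox, select, send):
    """Reconnect after a relay failure and replay unacknowledged messages.

    candidates: iterable of relay ids; outbox: dict of msg_id -> blob.
    select(pool) picks a relay; send(relay, msg_id, blob) returns True
    once the new relay acknowledges the message.
    """
    pool = [r for r in candidates if r != failed_relay]
    if not pool:
        raise ConnectionError("no alternative relay available")
    new_relay = select(pool)
    for msg_id in sorted(outbox):  # replay in original send order
        if send(new_relay, msg_id, outbox[msg_id]):
            del outbox[msg_id]     # remove only after acknowledgment
    return new_relay


def fresh_circuit_pool(candidates, failed_circuit):
    """Candidate pool for rebuilding a circuit: no relay is reused."""
    used = set(failed_circuit)
    return [r for r in candidates if r not in used]
```

Excluding every member of the failed circuit, not only the relay that stopped responding, reflects the rule that a partial circuit is never reused.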
Online versus Offline Routing: A Distinction
The relay selection and adaptive routing mechanisms described above apply to the online validator network — the internet-connected infrastructure through which Zentalk delivers messages in real time. It is important to distinguish these mechanisms from the routing approach used in the offline Zentanode mesh (described in Chapter 4, Section 7).
The two environments present fundamentally different routing problems, and they employ fundamentally different solutions.
Online Relay Routing
Online relay routing operates in an environment where the network topology is known. Every validator publishes its presence, capacity, and health metrics to the DHT. The client has a global view of available relays and can compute an optimal selection before sending a single packet. In this environment, deterministic scoring algorithms — the GeoRouter, health checks, load balancing formulas — are the appropriate tool. They are predictable, auditable, and computationally inexpensive. There is no need for learning or adaptation beyond straightforward metric tracking, because the information required for good decisions is directly observable.
Offline Mesh Routing
Offline mesh routing operates in an environment where no node has a global view. Zentanodes communicate over short-range radio links and can observe only their immediate neighbors. The network topology is dynamic: nodes move, lose power, encounter interference. A node that needs to forward a message to a distant destination cannot query a directory of available routes — it must make a forwarding decision based on incomplete local information. In this environment, reinforcement learning (Q-learning) is the appropriate tool. Each node maintains a Q-table that maps (destination, neighbor) pairs to quality scores learned from the outcomes of previous forwarding decisions. The node learns which neighbors are effective relays for which destinations through experience, not through broadcast topology data. Neural network augmentation provides predictive capabilities — anticipating node failures from beacon patterns, detecting congestion before it manifests — that further compensate for the absence of global state.
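For readers unfamiliar with the technique, a generic Q-table over (destination, neighbor) pairs might look like the sketch below. It uses the textbook Q-learning update; the reward definition, the learning rate, the discount factor, and the epsilon-greedy exploration are assumptions and may differ from the Zentanode design in Chapter 4.

```python
import random
from collections import defaultdict


class QRouter:
    """Generic Q-learning forwarder; hyperparameters are assumed values."""

    def __init__(self, neighbors, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.neighbors = list(neighbors)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount on the downstream estimate
        self.epsilon = epsilon  # probability of exploring a random neighbor
        self.rng = rng or random.Random()
        self.q = defaultdict(float)  # (destination, neighbor) -> quality

    def choose(self, destination):
        """Pick the next hop using only locally stored Q-values."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.neighbors)
        return max(self.neighbors, key=lambda n: self.q[(destination, n)])

    def update(self, destination, neighbor, reward, next_best=0.0):
        """Standard update: Q += alpha * (reward + gamma * next_best - Q).

        next_best is the neighbor's own best estimate for the destination,
        reported back with the acknowledgment (0.0 if unknown).
        """
        key = (destination, neighbor)
        target = reward + self.gamma * next_best
        self.q[key] += self.alpha * (target - self.q[key])
```

The router never consults a topology map: the only inputs are the outcome of its own forwarding attempt and, optionally, one number reported by the neighbor it used.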
The distinction is not a matter of preference or implementation stage. It reflects a fundamental difference in the information available to the routing agent:
| Property | Online Relay Network | Offline Zentanode Mesh |
|---|---|---|
| Topology knowledge | Global (via DHT) | Local only (neighbors) |
| Network state | Observable in real time | Inferred from experience |
| Routing algorithm | Deterministic scoring (GeoRouter) | Reinforcement learning (Q-learning) |
| Adaptation mechanism | Health checks, load metrics | Q-table updates, neural prediction |
| Bandwidth for routing overhead | Abundant (internet) | Scarce (LoRa radio) |
| Why this approach | Information is available; compute the answer | Information is unavailable; learn the answer |
This separation of concerns ensures that each network layer uses the routing strategy best suited to its operating constraints. The online network exploits the availability of global state to make fast, deterministic routing decisions. The offline mesh exploits the power of reinforcement learning to make intelligent routing decisions in the absence of global state. Neither approach would perform well if applied to the other's environment.
Federation
Hybrid Federation Design
Zentalk implements a hybrid federation design that separates persistent data from ephemeral routing state. Membership data (contacts, group memberships) is stored encrypted in the mesh DHT, where no server can read it. Online routing state -- which users are currently connected and to which relay -- is maintained ephemerally by relay servers and refreshed periodically. This separation means that relay servers know who is currently online (necessary for real-time routing) but cannot decrypt messages, view membership lists, or read stored data.
Server-to-server federation messages are authenticated with Ed25519 signatures, preventing relay impersonation and replay attacks. The federation protocol supports three delivery states: delivered (recipient online), queued (recipient offline, stored with bounded TTL), and error (routing failure).
User Routing
When a user connects, their home relay is recorded in the DHT and refreshed periodically. When Alice sends a message to Bob, her relay first checks whether Bob is connected locally (direct delivery). If not, it queries the DHT for Bob's home relay and forwards the message through federation. If Bob is offline, the message is queued with a bounded TTL at his home relay.
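The routing decision and the three delivery states can be sketched with in-memory stand-ins. Here the DHT and the federation links are plain dictionaries, and the `RelayNode` class and its method names are hypothetical; signature checks, TTL handling, and DHT record refresh are omitted.

```python
from enum import Enum


class DeliveryState(Enum):
    DELIVERED = "delivered"
    QUEUED = "queued"
    ERROR = "error"


class RelayNode:
    """Toy relay; dht and peers are shared dicts standing in for the network."""

    def __init__(self, relay_id, dht, peers):
        self.relay_id = relay_id
        self.dht = dht            # recipient address -> home relay id
        self.peers = peers        # relay id -> RelayNode (federation links)
        self.online = {}          # recipient address -> delivered blobs
        self.offline_queue = {}   # recipient address -> queued blobs
        peers[relay_id] = self

    def connect(self, user):
        self.online[user] = []
        self.dht[user] = self.relay_id  # home relay recorded in the DHT

    def disconnect(self, user):
        self.online.pop(user, None)     # DHT record is left in place

    def route(self, recipient, blob, federated=False):
        # 1. Direct delivery if the recipient is connected locally.
        if recipient in self.online:
            self.online[recipient].append(blob)
            return DeliveryState.DELIVERED
        home = self.dht.get(recipient)
        # 2. This relay is the home relay and the recipient is offline.
        if home == self.relay_id:
            self.offline_queue.setdefault(recipient, []).append(blob)
            return DeliveryState.QUEUED
        # 3. Unknown recipient, unknown relay, or a second forward attempt.
        if home is None or home not in self.peers or federated:
            return DeliveryState.ERROR
        # 4. Forward once through federation to the home relay.
        return self.peers[home].route(recipient, blob, federated=True)
```

The `federated` flag limits a message to a single server-to-server forward, an assumption made here to keep stale DHT records from causing forwarding loops.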
Connection Maintenance
Relay connections are maintained through a periodic heartbeat protocol. Each relay exchanges ping/pong messages with its peers, tracking round-trip times via an exponential moving average. Peers that fail consecutive health checks are marked as unhealthy and eventually disconnected.
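The round-trip tracking described above amounts to a few lines of state per peer. The smoothing factor of 0.2 and the limit of three consecutive failures are assumed values, and `PeerLink` is a hypothetical name.

```python
class PeerLink:
    """Per-peer heartbeat state; alpha and max_failures are assumed values."""

    def __init__(self, alpha=0.2, max_failures=3):
        self.alpha = alpha
        self.max_failures = max_failures
        self.rtt_ema_ms = None
        self.failures = 0

    def on_pong(self, rtt_ms):
        """Fold a new round-trip sample into the exponential moving average."""
        self.failures = 0
        if self.rtt_ema_ms is None:
            self.rtt_ema_ms = rtt_ms
        else:
            self.rtt_ema_ms = (self.alpha * rtt_ms
                               + (1 - self.alpha) * self.rtt_ema_ms)

    def on_timeout(self):
        """A ping went unanswered."""
        self.failures += 1

    @property
    def healthy(self):
        return self.failures < self.max_failures
```

A larger `alpha` makes the average react faster to a latency change at the cost of more noise; a smaller value smooths out isolated slow replies.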
New nodes progress through a discovery lifecycle: initial bootstrap (connecting to known entry points), DHT population (building the routing table), peer connection establishment, and full mesh formation. Once integrated, nodes participate in ongoing health check cycles and periodic capacity announcements to the DHT.
Performance Characteristics
The relay architecture is designed so that cryptographic operations do not constitute a throughput bottleneck. AES-256-GCM encryption completes in sub-millisecond time on modern hardware, while RSA-4096-OAEP decryption (the per-hop relay operation) is a private-key operation that typically takes on the order of a few milliseconds per message. The aggregate processing overhead for a multi-hop relay circuit therefore remains small relative to network latency, so the privacy benefits of layered encryption come at modest performance cost.