Relay Routing Network

The relay routing network provides real-time message delivery between Zentalk users. This chapter describes the routing architecture: direct delivery for online users, durable offline queuing, multi-hop relay routing for metadata privacy, geographic optimization, and federation between relay servers.

Architecture Overview

Every Full Node in the Zentalk network operates a Relay Module responsible for:

  1. Real-time message delivery to connected recipients
  2. Offline message queuing for disconnected recipients
  3. Multi-hop relay routing for sender anonymity (1-5 hops)
  4. Group/channel message fan-out to multiple recipients
  5. E2EE key bundle distribution for new session establishment
  6. Federation between relay servers for cross-relay message delivery

The relay never has access to plaintext message content. All messages arrive pre-encrypted with the Signal Protocol (Chapter 3). The relay processes opaque encrypted blobs, routing them based on recipient addresses without knowledge of the content, sender identity (when sealed sender is used), or conversation context.

Message Delivery Flow

Online Delivery

When both sender and recipient are connected to relays, the message follows the shortest available path. If the recipient is connected to the same relay as the sender, the message is forwarded directly. If the recipient is connected to a different relay, the sender's relay queries the DHT for the recipient's home relay and forwards the message through federation. The relay identifies recipients only by their hashed addresses -- it never sees the plaintext content and, when sealed sender is enabled, does not know the sender's identity either.

Offline Delivery

When the recipient is not connected, the relay durably stores the encrypted message in an offline queue with a bounded TTL. The relay does not acknowledge receipt to the sender until the message has been committed to persistent storage, ensuring that no acknowledged message can be lost even if the relay crashes immediately after acknowledgment. When the recipient reconnects to any relay in the network, queued messages are drained and delivered in chronological order.

Durability Guarantees

The offline queue provides at-least-once delivery semantics through three properties:

Atomicity
Each message is written to persistent storage as a single atomic unit with integrity verification. Partial writes are detected and discarded during recovery.
Durability
No message is acknowledged to the sender until it has been committed to durable storage. This guarantee holds even through process crashes and power failures.
Recovery
On startup, the relay replays any messages that were persisted but not yet confirmed as delivered.
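
The three properties above can be illustrated with a minimal append-only log. This is a sketch under stated assumptions, not the relay's actual storage engine: the record layout (length, CRC32, payload) and the function names `append_durably` and `recover` are hypothetical.

```python
import os
import struct
import zlib

# Hypothetical record layout: 4-byte length, 4-byte CRC32, then the encrypted blob.
HEADER = struct.Struct(">II")


def append_durably(path: str, blob: bytes) -> None:
    """Append one encrypted blob; return only once it is on stable storage."""
    record = HEADER.pack(len(blob), zlib.crc32(blob)) + blob
    with open(path, "ab") as log:
        log.write(record)       # one record, written as a single unit
        log.flush()
        os.fsync(log.fileno())  # the sender may be acknowledged only after this


def recover(path: str) -> list[bytes]:
    """Return every intact record; stop at the first torn or corrupt one."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as log:
        data = log.read()
    blobs, offset = [], 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # partial write detected: discard the tail
        blobs.append(payload)
        offset = start + length
    return blobs
```

On startup, a relay built this way would call `recover` and re-deliver whatever it returns, which is why the semantics are at-least-once rather than exactly-once: a message delivered just before a crash may be delivered again.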

Multi-Hop Relay Routing

Protocol Design

Multi-hop relay routing wraps each message in multiple encryption layers, one per relay hop. Each relay can decrypt only its own layer, revealing the next hop but not the ultimate destination (except for the final relay). The innermost layer contains the recipient's address; each outer layer contains only the address of the next relay in the circuit. This layered structure -- analogous to Tor's onion routing -- ensures that no single relay in the circuit can correlate the sender with the recipient.

Multi-Hop Relay Encryption Layers
In the 3-hop configuration, each message is wrapped in three layers of RSA-4096-OAEP encryption, one per relay. Each relay peels exactly one layer, learning only the next hop. No single relay can correlate sender and recipient.

Circuit Construction

The sender constructs the relay circuit by selecting 1-5 relay hops from the DHT peer list. Relay selection enforces diversity constraints: no two relays in the same circuit may share the same operator, and geographic diversity is required to span multiple jurisdictions. The circuit is built inside-out -- the innermost encryption layer (for the final relay) is constructed first, then each successive outer layer wraps the previous one. Each layer is encrypted with the corresponding relay's public key, ensuring that only that relay can decrypt its routing instructions.

A TTL field tracks the remaining hops, and a payload hash enables each relay to verify message integrity without decrypting the payload. Messages that fail integrity checks or exceed their hop limit are dropped immediately.
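
The inside-out construction can be sketched structurally as follows. This is an illustration only: `seal` and `unseal` are hypothetical placeholders that tag a layer with the intended relay instead of performing RSA-4096-OAEP encryption, and the field names (`next_hop`, `deliver_to`, `ttl`, `payload_hash`) are assumptions rather than the actual wire format.

```python
import hashlib
import json


def seal(relay_id: str, layer: dict) -> dict:
    """Placeholder for encrypting `layer` to relay_id's public key (no real crypto)."""
    return {"sealed_for": relay_id, "body": json.dumps(layer)}


def unseal(relay_id: str, sealed: dict) -> dict:
    """Placeholder for private-key decryption: only the addressed relay may open it."""
    if sealed["sealed_for"] != relay_id:
        raise PermissionError("layer is not addressed to this relay")
    return json.loads(sealed["body"])


def build_packet(circuit: list[str], recipient: str, payload: bytes) -> tuple[dict, bytes]:
    """Build the routing header inside-out; the E2EE payload travels alongside it."""
    digest = hashlib.sha256(payload).hexdigest()
    # Innermost layer first: only the final relay learns the recipient address.
    header = seal(circuit[-1], {"deliver_to": recipient, "ttl": 0, "payload_hash": digest})
    # Wrap outward: each outer layer names only the next relay in the circuit.
    for i in range(len(circuit) - 2, -1, -1):
        layer = {"next_hop": circuit[i + 1], "ttl": len(circuit) - 1 - i,
                 "payload_hash": digest, "inner": header}
        header = seal(circuit[i], layer)
    return header, payload


def process_at_relay(relay_id: str, header: dict, payload: bytes):
    """Peel one layer, check integrity and hop budget, and return the next step."""
    layer = unseal(relay_id, header)
    # The hash is computed over the opaque E2EE blob, so nothing is decrypted here.
    if hashlib.sha256(payload).hexdigest() != layer["payload_hash"]:
        raise ValueError("integrity check failed; message dropped")
    if "deliver_to" in layer:
        return "deliver", layer["deliver_to"], None
    if layer["ttl"] <= 0:
        raise ValueError("hop limit exceeded; message dropped")
    return "forward", layer["next_hop"], layer["inner"]
```

For a circuit `["r1", "r2", "r3"]`, calling `process_at_relay` at each relay in turn yields two `forward` steps followed by one `deliver` step; a relay that is handed a layer addressed to another relay cannot open it.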

Security Properties

What each relay sees:

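Relay Position     Knows Sender?             Knows Recipient?              Knows Content?
---------------    -----------------------   ---------------------------   --------------
First relay        Yes (direct connection)   No                            No
Middle relay(s)    No                        No                            No
Last relay         No                        Yes (delivers to recipient)   No

Any relay: never knows content (E2EE).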

Threat Analysis

Single Compromised Relay
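In a circuit of two or more hops, a single compromised relay cannot determine both sender and recipient; it knows at most one endpoint of the communication. In the 1-hop configuration the single relay is both the first and the last hop, so this separation does not apply.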
Two Non-Adjacent Compromised Relays
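Still cannot link sender and recipient from routing data alone, because the honest middle relay separates the two compromised endpoints. If the two relays are the entry and exit of the circuit, timing correlation of the kind described below remains possible.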
All Relays Compromised
Can correlate sender and recipient through timing analysis, but still cannot read message content. E2EE remains intact regardless of infrastructure compromise.
Global Passive Adversary
Can observe all network traffic and correlate timing. Mitigation: traffic padding and cover traffic reduce the statistical reliability of timing correlation.

Privacy-Latency Trade-off

Each additional hop adds incremental latency due to the extra network round-trip and decryption step. For most use cases, 3-hop routing provides a good balance between privacy and latency -- the added delay remains well within the bounds of interactive communication. The default configuration uses 1 hop (fastest), with users able to opt into 3 or 5 hops for enhanced privacy.

Relay Selection

When a Zentalk client initiates a connection, it must determine which validator node(s) to route through. This decision is not arbitrary. The client evaluates all candidate relays against a composite scoring function that accounts for four factors: measured latency, geographic proximity, operator reputation, and current load. The result is a ranked list of candidates from which the client selects using weighted random sampling.

Selection Criteria

Each candidate relay is evaluated on the following dimensions:

Latency
The client maintains a running exponential moving average of round-trip times to each known relay. Relays with consistently low latency score higher, as latency is the strongest predictor of user-perceived responsiveness.
Geographic proximity
The GeoRouter computes great-circle distance between the client and each candidate relay using the haversine formula. Closer relays score higher, though proximity is weighted below measured latency since geographic distance is only an approximate proxy for network distance.
Reputation score
Each validator accumulates a reputation score from its on-chain staking status, historical uptime, and peer-reported reliability. Operators who invest more in infrastructure and maintain better service quality receive proportionally more routing traffic.
Current load
Each relay periodically publishes its connection count and message throughput to the DHT. The scoring function penalizes relays approaching their capacity threshold, preventing load hotspots.
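
Two of these inputs have standard closed forms. The sketch below shows the haversine great-circle distance and an exponential moving average of round-trip time; the smoothing factor `alpha=0.2` and the function names are assumptions for illustration, not values taken from the Zentalk implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def update_rtt_ema(previous_ms, sample_ms: float, alpha: float = 0.2) -> float:
    """Fold a new RTT sample into the running average (alpha is a hypothetical default)."""
    if previous_ms is None:
        return sample_ms  # first sample seeds the average
    return alpha * sample_ms + (1 - alpha) * previous_ms
```

A smaller `alpha` makes the average slower to react to a single outlier, at the cost of slower detection of genuine latency shifts.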

GeoRouter Scoring

The GeoRouter combines these criteria, together with the freshness of each relay's most recent health check, into a single composite score using a weighted formula. Region proximity carries the highest weight, followed by measured latency, reputation, current load, and health check freshness. Same-region relays receive the strongest preference, while high latency and high load are penalized.

The final selection uses weighted random sampling rather than deterministic best-score selection. This is a deliberate design choice: deterministic selection would cause all clients in a geographic region to converge on the same relay, creating a load hotspot and a single point of failure. Weighted random sampling distributes connections across multiple relays in proportion to their suitability, achieving load distribution while still favoring higher-quality relays.
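
A minimal sketch of composite scoring followed by weighted random sampling is shown below. The weight values, the normalization of each term, and the `Candidate` fields are all hypothetical; only the ordering of the weights (region, latency, reputation, load, freshness) follows the text.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    relay_id: str
    same_region: bool
    rtt_ms: float        # smoothed round-trip time
    reputation: float    # assumed normalized to 0..1
    load: float          # fraction of announced capacity in use, 0..1
    health_age_s: float  # seconds since the last successful health check


# Hypothetical weights; only their relative order is taken from the text.
WEIGHTS = {"region": 0.35, "latency": 0.25, "reputation": 0.20, "load": 0.12, "freshness": 0.08}


def composite_score(c: Candidate) -> float:
    latency_term = 1.0 / (1.0 + c.rtt_ms / 100.0)         # high latency is penalized
    freshness_term = 1.0 / (1.0 + c.health_age_s / 30.0)  # stale health data is penalized
    return (WEIGHTS["region"] * (1.0 if c.same_region else 0.0)
            + WEIGHTS["latency"] * latency_term
            + WEIGHTS["reputation"] * c.reputation
            + WEIGHTS["load"] * (1.0 - c.load)             # high load is penalized
            + WEIGHTS["freshness"] * freshness_term)


def pick_relay(candidates: list[Candidate], rng=random) -> Candidate:
    """Sample one relay with probability proportional to its score."""
    scores = [composite_score(c) for c in candidates]
    return rng.choices(candidates, weights=scores, k=1)[0]
```

Because selection is proportional rather than winner-takes-all, a lower-scoring relay still receives a share of connections, which is the load-spreading behavior the paragraph above describes.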

Multi-Hop Diversity Constraints

When the client constructs a multi-hop relay circuit (as described in the preceding section), each hop is selected not only for individual quality but also for diversity relative to the other hops in the circuit. The circuit construction algorithm enforces three diversity constraints:

I. Operator diversity. No two relays in the same circuit may share the same validator, preventing a single operator from observing both entry and exit points.

II. Geographic diversity. No more than two relays may reside in the same region, ensuring the circuit spans multiple jurisdictions.

III. Network diversity. The circuit avoids relays sharing the same data center or upstream provider, ensuring infrastructure independence across hops.

These constraints may reduce the average score of the selected circuit relative to an unconstrained selection. This is an acceptable trade-off: the privacy properties of multi-hop routing depend on the independence of the relay operators, and diversity constraints are the mechanism that enforces that independence.
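
The three constraints can be expressed as a predicate over a candidate circuit. The sketch below assumes each relay advertises an operator, region, data center, and upstream provider (the `RelayInfo` fields are hypothetical), and it uses a simple greedy walk over a pre-ranked list in place of the weighted sampling described earlier.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayInfo:
    relay_id: str
    operator: str
    region: str
    datacenter: str
    upstream: str


def satisfies_diversity(circuit: list[RelayInfo]) -> bool:
    n = len(circuit)
    # I. Operator diversity: no operator appears twice.
    if len({r.operator for r in circuit}) != n:
        return False
    # II. Geographic diversity: at most two relays per region.
    if any(count > 2 for count in Counter(r.region for r in circuit).values()):
        return False
    # III. Network diversity: no shared data center or upstream provider.
    if len({r.datacenter for r in circuit}) != n or len({r.upstream for r in circuit}) != n:
        return False
    return True


def build_circuit(ranked: list[RelayInfo], hops: int):
    """Greedy simplification: keep each candidate that preserves diversity."""
    circuit: list[RelayInfo] = []
    for relay in ranked:
        if satisfies_diversity(circuit + [relay]):
            circuit.append(relay)
        if len(circuit) == hops:
            return circuit
    return None  # not enough independent relays for the requested hop count
```

Returning `None` rather than relaxing a constraint reflects the trade-off stated above: a lower-scoring but independent circuit is preferred to a higher-scoring one that shares an operator or provider.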

Cross-Region Routing

The GeoRouter targets low latency for same-region delivery, moderate latency for adjacent regions, and best-effort for intercontinental routes -- all within the bounds of perceptually responsive interactive communication.

Adaptive Routing

Relay selection is not a one-time decision. The network continuously monitors relay health and adjusts routing in response to changing conditions. This adaptive behavior ensures that transient failures, capacity saturation, and network topology changes are reflected in routing decisions within seconds, not minutes.

Health Monitoring

Every relay in the network participates in a heartbeat protocol. The health monitoring system tracks three metrics:

I. Availability. A relay that misses consecutive heartbeats is marked unhealthy and excluded from the candidate pool. Existing connections are migrated if the state persists.

II. Latency trend. Relays whose latency significantly exceeds their historical baseline are penalized, detecting degraded nodes suffering from overload or congestion.

III. Delivery success rate. The fraction of messages successfully delivered is tracked over a sliding window. Relays below a minimum threshold are temporarily excluded.
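
A per-relay tracker for the three metrics might look like the following sketch. Every threshold here (three missed heartbeats, a 100-message window, a 90% success floor, a 2x latency multiplier) is a hypothetical placeholder, since the text does not state the actual values.

```python
from collections import deque


class RelayHealth:
    """Tracks availability, latency trend, and delivery success for one relay."""

    def __init__(self, max_missed=3, window=100, min_success=0.9, latency_factor=2.0):
        self.max_missed = max_missed          # consecutive misses before exclusion
        self.min_success = min_success        # minimum acceptable success rate
        self.latency_factor = latency_factor  # multiplier for "significantly exceeds"
        self.missed = 0
        self.baseline_ms = None               # slow-moving historical baseline
        self.degraded = False
        self.outcomes = deque(maxlen=window)  # sliding window of delivery results

    def on_heartbeat(self, rtt_ms: float) -> None:
        self.missed = 0
        if self.baseline_ms is None:
            self.baseline_ms = rtt_ms
        # Compare against the baseline before folding the new sample into it.
        self.degraded = rtt_ms > self.latency_factor * self.baseline_ms
        self.baseline_ms = 0.05 * rtt_ms + 0.95 * self.baseline_ms

    def on_missed_heartbeat(self) -> None:
        self.missed += 1

    def on_delivery(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def success_rate(self) -> float:
        if not self.outcomes:
            return 1.0  # no evidence yet: do not penalize
        return sum(self.outcomes) / len(self.outcomes)

    def eligible(self) -> bool:
        """True if the relay may stay in the candidate pool."""
        return self.missed < self.max_missed and self.success_rate() >= self.min_success
```

In this sketch `degraded` is a signal for the scoring function to apply a penalty, whereas `eligible()` governs outright exclusion from the candidate pool.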

Load Balancing

When a relay's connection count approaches its announced capacity (published via the DHT), the scoring function progressively reduces its score for new connections. The penalty is applied as a continuous function rather than a hard cutoff -- a soft threshold begins redirecting new connections before the relay reaches full capacity, ensuring that no relay is driven to saturation. Existing connections on a loaded relay are not disturbed; only new connection requests are directed elsewhere.
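
One way to realize a continuous penalty with a soft threshold is a ramp that is zero below the threshold and rises smoothly to one at full capacity. The 70% threshold and the quadratic shape below are assumptions chosen for illustration; the text specifies only that the penalty is continuous.

```python
def load_penalty(connections: int, capacity: int, soft_threshold: float = 0.7) -> float:
    """Return 0.0 below the soft threshold, rising smoothly to 1.0 at full capacity."""
    if capacity <= 0:
        return 1.0  # a relay announcing no capacity accepts nothing new
    utilization = min(connections / capacity, 1.0)
    if utilization <= soft_threshold:
        return 0.0
    x = (utilization - soft_threshold) / (1.0 - soft_threshold)
    return x * x  # quadratic ramp: gentle at first, steep near saturation


def adjusted_score(base_score: float, connections: int, capacity: int) -> float:
    """Scale a relay's score for NEW connections only; existing ones are untouched."""
    return base_score * (1.0 - load_penalty(connections, capacity))
```

With these assumed parameters, a relay at 85% utilization keeps about three quarters of its score, while one at 100% scores zero for new connections.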

Failure Recovery

When a relay becomes unreachable during an active session, the client initiates automatic reconnection: it re-evaluates the candidate relay pool (excluding the failed relay), selects a new relay using the standard scoring algorithm, reconnects, and replays any unacknowledged messages from its local outbox.

For multi-hop circuits, the failure of any relay in the circuit invalidates the entire circuit. The client constructs a new circuit from scratch, selecting entirely new relays. This is a deliberate security decision: reusing partial circuits after a failure could leak information about the circuit structure to an adversary who caused the failure.
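
The reconnection sequence can be sketched as a single function. The names are hypothetical: `pick` stands in for the scoring-based selection, `send` for a transmission that returns `True` once the relay acknowledges the message, and `excluded` holds the failed relay (or, for a multi-hop circuit, every relay of the invalidated circuit).

```python
def recover_session(candidates, excluded, outbox, pick, send):
    """Reconnect through a fresh relay and replay unacknowledged messages.

    candidates: iterable of relay identifiers currently known to the client
    excluded:   set of relays that must not be reused
    outbox:     dict of message id -> encrypted blob, in send order
    """
    pool = [relay for relay in candidates if relay not in excluded]
    if not pool:
        raise RuntimeError("no eligible relay available")
    new_relay = pick(pool)
    for msg_id in list(outbox):        # copy the keys: the dict shrinks as we go
        if send(new_relay, outbox[msg_id]):
            del outbox[msg_id]         # acknowledged, so it leaves the outbox
    return new_relay
```

Messages that are not acknowledged stay in the outbox and are retried on the next recovery, which preserves the at-least-once behavior of the delivery path.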

Online versus Offline Routing: A Distinction

The relay selection and adaptive routing mechanisms described above apply to the online validator network -- the internet-connected infrastructure through which Zentalk delivers messages in real time. It is important to distinguish these mechanisms from the routing approach used in the offline Zentanode mesh (described in Chapter 4, Section 7).

The two environments present fundamentally different routing problems, and they employ fundamentally different solutions.

Online Relay Routing

Online relay routing operates in an environment where the network topology is known. Every validator publishes its presence, capacity, and health metrics to the DHT. The client has a global view of available relays and can compute an optimal selection before sending a single packet. In this environment, deterministic scoring algorithms -- the GeoRouter, health checks, load balancing formulas -- are the appropriate tool. They are predictable, auditable, and computationally inexpensive. There is no need for learning or adaptation beyond straightforward metric tracking, because the information required for good decisions is directly observable.

Offline Mesh Routing

Offline mesh routing operates in an environment where no node has a global view. Zentanodes communicate over short-range radio links and can observe only their immediate neighbors. The network topology is dynamic: nodes move, lose power, encounter interference. A node that needs to forward a message to a distant destination cannot query a directory of available routes -- it must make a forwarding decision based on incomplete local information. In this environment, reinforcement learning (Q-learning) is the appropriate tool. Each node maintains a Q-table that maps (destination, neighbor) pairs to quality scores learned from the outcomes of previous forwarding decisions. The node learns which neighbors are effective relays for which destinations through experience, not through broadcast topology data. Neural network augmentation provides predictive capabilities -- anticipating node failures from beacon patterns, detecting congestion before it manifests -- that further compensate for the absence of global state.
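
The Q-table described above can be sketched with the standard Q-learning update. This is a generic illustration, not the Zentanode implementation: the learning rate, discount factor, exploration rate, reward definition, and the `neighbor_best` feedback value are all assumptions.

```python
import random
from collections import defaultdict


class QRouter:
    """Per-node Q-table mapping (destination, neighbor) to a learned quality score."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=None):
        self.alpha = alpha      # learning rate (hypothetical value)
        self.gamma = gamma      # discount applied to the neighbor's own estimate
        self.epsilon = epsilon  # probability of exploring a random neighbor
        self.rng = random.Random(seed)
        self.q = defaultdict(float)

    def choose(self, destination, neighbors):
        """Epsilon-greedy choice among the currently visible neighbors."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(neighbors)
        return max(neighbors, key=lambda n: self.q[(destination, n)])

    def update(self, destination, neighbor, reward, neighbor_best=0.0):
        """Q <- Q + alpha * (reward + gamma * neighbor_best - Q)."""
        key = (destination, neighbor)
        target = reward + self.gamma * neighbor_best
        self.q[key] += self.alpha * (target - self.q[key])
```

Here `reward` might be positive for an acknowledged forward and negative for a timeout, and `neighbor_best` is the best score the chosen neighbor reports for the same destination; both are local observations, so no global topology data is required.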

The distinction is not a matter of preference or implementation stage. It reflects a fundamental difference in the information available to the routing agent:

Property                         Online Relay Network                           Offline Zentanode Mesh
------------------------------   --------------------------------------------   ---------------------------------------------
Topology knowledge               Global (via DHT)                               Local only (neighbors)
Network state                    Observable in real time                        Inferred from experience
Routing algorithm                Deterministic scoring (GeoRouter)              Reinforcement learning (Q-learning)
Adaptation mechanism             Health checks, load metrics                    Q-table updates, neural prediction
Bandwidth for routing overhead   Abundant (internet)                            Scarce (LoRa radio)
Why this approach                Information is available; compute the answer   Information is unavailable; learn the answer

This separation of concerns ensures that each network layer uses the routing strategy best suited to its operating constraints. The online network exploits the availability of global state to make fast, deterministic routing decisions. The offline mesh exploits the power of reinforcement learning to make intelligent routing decisions in the absence of global state. Neither approach would perform well if applied to the other's environment.

Federation

Hybrid Federation Design

Zentalk implements a hybrid federation design that separates persistent data from ephemeral routing state. Membership data (contacts, group memberships) is stored encrypted in the mesh DHT, where no server can read it. Online routing state -- which users are currently connected and to which relay -- is maintained ephemerally by relay servers and refreshed periodically. This separation means that relay servers know who is currently online (necessary for real-time routing) but cannot decrypt messages, view membership lists, or read stored data.

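Server-to-server federation messages are authenticated with Ed25519 signatures, which prevent relay impersonation. Resistance to replay attacks additionally depends on the signed message carrying freshness data such as a timestamp or nonce, since a signature alone does not stop a valid message from being resent. The federation protocol supports three delivery states: delivered (recipient online), queued (recipient offline, stored with bounded TTL), and error (routing failure).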

User Routing

When a user connects, their home relay is recorded in the DHT and refreshed periodically. When Alice sends a message to Bob, her relay first checks whether Bob is connected locally (direct delivery). If not, it queries the DHT for Bob's home relay and forwards the message through federation. If Bob is offline, the message is queued with a bounded TTL at his home relay.
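
The routing decision for Alice's message can be sketched with an in-memory model. A shared dictionary stands in for the DHT, direct object references stand in for federation links, and the class and attribute names are hypothetical; TTL expiry and signature checks are omitted.

```python
from enum import Enum


class DeliveryState(Enum):
    DELIVERED = "delivered"  # recipient online
    QUEUED = "queued"        # recipient offline, stored with bounded TTL
    ERROR = "error"          # routing failure


class RelayNode:
    def __init__(self, name: str, dht: dict):
        self.name = name
        self.dht = dht           # stand-in for the DHT: user hash -> home relay name
        self.peers = {}          # federation links: relay name -> RelayNode
        self.connected = set()   # hashed addresses of locally connected users
        self.offline_queue = {}  # user hash -> list of encrypted blobs

    def connect(self, user_hash: str) -> None:
        self.connected.add(user_hash)
        self.dht[user_hash] = self.name  # record (or refresh) the home relay

    def disconnect(self, user_hash: str) -> None:
        self.connected.discard(user_hash)  # DHT record is kept so messages can queue

    def route(self, recipient_hash: str, blob: bytes, federated: bool = False) -> DeliveryState:
        if recipient_hash in self.connected:
            return DeliveryState.DELIVERED  # direct delivery
        home = self.dht.get(recipient_hash)
        if home == self.name:
            self.offline_queue.setdefault(recipient_hash, []).append(blob)
            return DeliveryState.QUEUED     # the home relay queues for later
        if home is None or federated or home not in self.peers:
            return DeliveryState.ERROR      # unknown recipient or no usable route
        # Forward once through federation; the flag prevents forwarding loops.
        return self.peers[home].route(recipient_hash, blob, federated=True)
```

With Alice on relay `r1` and Bob on `r2`, `r1.route("bob", blob)` returns `DELIVERED` while Bob is connected and `QUEUED` after he disconnects, with the blob held in `r2.offline_queue`.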

Message Routing Paths
Three delivery modes: direct (same relay, fastest), federated (cross-relay via DHT), and multi-hop relay (3-hop shown; strongest metadata privacy of the three). The relay never sees message content in any mode.

Connection Maintenance

Relay connections are maintained through a periodic heartbeat protocol. Each relay exchanges ping/pong messages with its peers, tracking round-trip times via an exponential moving average. Peers that fail consecutive health checks are marked as unhealthy and eventually disconnected.

New nodes progress through a discovery lifecycle: initial bootstrap (connecting to known entry points), DHT population (building the routing table), peer connection establishment, and full mesh formation. Once integrated, nodes participate in ongoing health check cycles and periodic capacity announcements to the DHT.

Performance Characteristics

The relay architecture is designed so that cryptographic operations do not constitute a throughput bottleneck. AES-256-GCM encryption completes in sub-millisecond time on modern hardware for typical message sizes, while RSA-4096-OAEP decryption (the per-hop relay operation) typically takes on the order of a few milliseconds per operation. The aggregate processing overhead for a multi-hop relay circuit therefore remains small relative to network latency, so the privacy benefits of layered encryption come at modest performance cost.