Metadata Protection
End-to-end encryption protects message content, but without additional measures, the communication metadata -- who talks to whom, when, how often, and from where -- remains exposed to network infrastructure. This chapter describes Zentalk's multi-layered metadata protection architecture: address hashing, sealed sender encryption, encrypted presence indicators, traffic analysis resistance, and the threat model that defines what the system can and cannot protect against.
The Metadata Problem
Metadata is often more revealing than content. Research by MIT, Stanford, and the Electronic Frontier Foundation has demonstrated that communication metadata alone can reveal:
| Metadata Type | What It Reveals | Zentalk Protection |
|---|---|---|
| Sender address | Identity | Sealed sender protocol |
| Recipient address | Communication target | Address hashing |
| Timing | Activity patterns | No server-side logging |
| IP address | Physical location | Multi-hop relay routing (3-hop) |
| Message size | Content type hints | Padding |
Former NSA and CIA director Michael Hayden stated: "We kill people based on metadata." This is not an exaggeration -- military targeting decisions are regularly made based on communication pattern analysis rather than intercepted content.
Metadata in Centralized Messaging
Even platforms with end-to-end encryption collect extensive metadata:
WhatsApp (Meta): Collects and shares with Meta: phone numbers, contact lists, device identifiers, IP addresses, connection timestamps, usage frequency, message timestamps, group membership, profile photos, status updates, and commercial transaction data. Meta's privacy policy explicitly permits using this data for advertising and analytics.
Signal: Collects less metadata than WhatsApp but still knows: phone numbers (required for registration), connection timestamps, IP addresses (visible to Signal servers), and which users communicate with which (the server routes all messages). Signal's sealed sender feature hides the sender from the server for some messages, but the recipient is always visible.
Telegram: Default chats are not encrypted; Telegram has access to full message content, metadata, and user data. Even "Secret Chats" reveal metadata to Telegram's servers.
Address Hashing
Mechanism
Before any wallet address is transmitted to the mesh network, it is hashed using SHA-256 with a protocol-specific salt. The address is normalized, concatenated with the salt, hashed, and then truncated to produce a fixed-length identifier with a protocol prefix. The result is a deterministic but irreversible mapping from wallet address to mesh identity.
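The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not Zentalk's actual implementation: the salt value, the `zt1` prefix, the lower-casing used for normalization, and the order of concatenation are all assumptions made for the example.

```python
import hashlib

# Hypothetical protocol constants: the real salt and prefix are not given in this chapter.
PROTOCOL_SALT = b"example-protocol-salt-v1"
ID_PREFIX = "zt1"
TRUNCATED_BYTES = 16  # keep 128 bits of the 256-bit digest


def hash_address(wallet_address: str) -> str:
    """Map a wallet address to a fixed-length, prefixed mesh identifier."""
    # Normalize so checksum-cased and lower-cased forms hash identically (assumed rule).
    normalized = wallet_address.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8") + PROTOCOL_SALT).digest()
    return ID_PREFIX + digest[:TRUNCATED_BYTES].hex()


print(hash_address("0xAbC0000000000000000000000000000000000001"))
```

Because every step is deterministic, any party that knows the salt can recompute the identifier for a given address, which is what makes routing possible and also what the limitations discussed later in this section rely on.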
Properties
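One-way: Given a hashed address, an attacker cannot feasibly reverse the hash to recover the original wallet address. A generic preimage attack on full SHA-256 requires on the order of $2^{256}$ operations (about $2^{128}$ against a 128-bit truncated output).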
Deterministic: The same wallet address always produces the same hash. This is necessary for the mesh network to route messages and retrieve stored data.
Salted: A protocol-specific salt prevents rainbow table attacks and cross-protocol correlation. An attacker cannot use pre-computed hash tables from other systems.
Truncated: Only 128 bits of the 256-bit SHA-256 output are used. This provides ample collision resistance (the birthday bound is roughly $2^{64}$ identifiers, far more than any realistic number of registered addresses, although well below the $2^{160}$ possible Ethereum addresses) while reducing storage.
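To make the collision argument concrete, the standard birthday approximation $p \approx 1 - e^{-n(n-1)/2^{b+1}}$ can be evaluated for a $b$-bit identifier. The sketch below is only a back-of-the-envelope aid; the population sizes are illustrative and not Zentalk figures.

```python
import math


def collision_probability(n_identifiers: int, bits: int = 128) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_identifiers * (n_identifiers - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


# One billion identifiers: the collision probability is negligible.
print(collision_probability(10**9))
# At 2^64 identifiers the probability of at least one collision is roughly 39%.
print(collision_probability(2**64))
```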
What Address Hashing Protects Against
| Threat | Protected? | Explanation |
|---|---|---|
| Casual browsing of stored data | Yes | Mesh node operator sees only hashed identifiers, not wallet addresses |
| Correlation with blockchain identity | Yes | Cannot link mesh activity to on-chain transactions |
| Social graph reconstruction (unknown addresses) | Yes | Without knowing target addresses, cannot search for them |
| Rainbow table attack | Yes | Salt prevents pre-computation |
Honest Limitations
Address hashing is obfuscation, not perfect privacy. An adversary who knows a target's wallet address can:
- Compute the same hash (the salt is public and the algorithm is deterministic)
- Search the mesh for that hash
- Determine whether the target has stored data and observe access patterns
This is analogous to phone number hashing in contact discovery -- it prevents passive enumeration but not targeted surveillance by an adversary with specific knowledge. For protection against targeted adversaries, additional measures (sealed sender, multi-hop relay routing, Tor) are required.
Sealed Sender Protocol
Design Goal
The sealed sender protocol encrypts the sender's identity so that the mesh relay node cannot determine who sent a message. The relay sees only the recipient's (hashed) address for routing purposes.
Cryptographic Construction
The protocol uses ephemeral X25519 Elliptic Curve Diffie-Hellman (ECDH) combined with HKDF-SHA256 key derivation and AES-256-GCM authenticated encryption.
The sender generates a fresh ephemeral X25519 keypair and performs a Diffie-Hellman exchange with the recipient's public key. From the resulting shared secret, an AES-256 key is derived via HKDF. The sender's address is then encrypted under AES-256-GCM with the message identifier bound as additional authenticated data. The final sealed sender blob contains the ephemeral public key, the nonce, and the ciphertext -- everything the recipient needs to reverse the process, but nothing the relay can use.
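The key-agreement and key-derivation steps can be sketched with the Python standard library alone. The X25519 ladder below follows RFC 7748 and the HKDF follows RFC 5869, but this is an illustrative sketch under stated assumptions: it is not constant-time, the `HKDF_INFO` label and the all-zero salt are hypothetical, and the AES-256-GCM step is only indicated by a comment because the standard library has no AES-GCM. A real client would use a vetted cryptographic library for all three primitives.

```python
import hashlib
import hmac
import secrets

P = 2**255 - 19
A24 = 121665
BASE_U = (9).to_bytes(32, "little")
HKDF_INFO = b"sealed-sender-example"  # hypothetical context label


def x25519(scalar: bytes, u_coord: bytes) -> bytes:
    """RFC 7748 Montgomery ladder. Illustrative only: not constant-time."""
    k = bytearray(scalar)
    k[0] &= 248
    k[31] &= 127
    k[31] |= 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(u_coord, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = d * a % P, c * b % P
        x3 = pow(da + cb, 2, P)
        z3 = x1 * pow(da - cb, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def keypair():
    private = secrets.token_bytes(32)
    return private, x25519(private, BASE_U)


recipient_priv, recipient_pub = keypair()

# Sender side: fresh ephemeral keypair, ECDH with the recipient's public key, then HKDF.
eph_priv, eph_pub = keypair()
sender_key = hkdf_sha256(x25519(eph_priv, recipient_pub), b"\x00" * 32, HKDF_INFO)
# Here the sender would AES-256-GCM encrypt its address under sender_key with the
# message identifier as AAD, and emit: eph_pub || nonce || ciphertext.

# Recipient side: the same shared secret from its private key and the ephemeral public key.
recipient_key = hkdf_sha256(x25519(recipient_priv, eph_pub), b"\x00" * 32, HKDF_INFO)
print(sender_key == recipient_key)
```

The relay sees `eph_pub` but holds neither private key, so it cannot reproduce the derived AES key.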
Server-Side Handling
When the mesh node receives a message with a sealed sender field, it stores the encrypted sender blob as-is. The mesh node cannot decrypt it because it lacks the recipient's X25519 private key. The relay forwards the message based solely on the recipient's hashed address.
Recipient Decryption
The recipient detects the sealed sender, extracts the ephemeral public key, performs the reverse ECDH to recover the shared secret, derives the same AES key, and decrypts the sender's address. The sender's identity is revealed only to the intended recipient -- it was hidden from every relay and storage node along the path.
Security Properties
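Forward secrecy (sender side): The ephemeral keypair is generated fresh for each sealed sender operation and the ephemeral private key is never stored, so a later compromise of the sender's keys does not expose past sealed senders. This property does not extend to the recipient's key: an attacker who recorded a sealed sender blob and later obtains the recipient's long-term X25519 private key can combine it with the ephemeral public key in the blob to recompute the shared secret and decrypt the sender's address.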
Binding to message: The Additional Authenticated Data (AAD) includes the message identifier, preventing an attacker from detaching a sealed sender blob from one message and attaching it to another.
Replay prevention: Each sealed sender uses a unique random nonce and unique ephemeral key. Replaying the same sealed sender blob with a different message will fail AAD verification.
Stealth Addresses
The Recipient Identification Problem
Even with address hashing and sealed sender, a persistent observer monitoring the mesh network can correlate repeated communications to the same hashed address. If Alice sends messages to Bob's hashed address over multiple days, the observer learns that someone is repeatedly communicating with the entity behind that address -- even without knowing that the address belongs to Bob. Over time, this pattern constitutes a communication fingerprint.
Stealth addresses eliminate this correlation by generating a unique, one-time address for each message exchange.
Construction
The stealth address protocol operates as follows:
- Bob publishes a stealth meta-address -- a pair of public keys (viewing key $V$ and spending key $S$) derived from his identity key
- Alice generates an ephemeral keypair $(r, R)$ where $R = rG$
- Alice computes the stealth address: $P = H(rV) \cdot G + S$ where $H$ is SHA-256 and $G$ is the Curve25519 base point
- Alice sends the message to address $P$ with the ephemeral public key $R$ attached
- Bob scans incoming messages: For each message with ephemeral key $R$, Bob computes $P' = H(vR) \cdot G + S$, where $v$ is the private key behind $V$, and checks if $P' = P$
- Only Bob can detect and decrypt messages addressed to his stealth addresses, because only he possesses the viewing key $v$
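The derive-and-scan steps can be sketched as follows. The notation is assumed for the example: Bob's viewing and spending keypairs are written $(v, V)$ and $(s, S)$, Alice's ephemeral keypair is $(r, R)$, and the stealth address is $H(rV) \cdot G + S$. Because the construction needs point addition, the sketch works on the twisted Edwards form of Curve25519 (edwards25519) with slow, non-constant-time affine arithmetic; the point encoding fed to SHA-256 is also hypothetical. It shows the algebra only and is not production code.

```python
import hashlib
import secrets

P = 2**255 - 19
ORDER = 2**252 + 27742317777372353535851937790883648493
D = -121665 * pow(121666, P - 2, P) % P


def _base_point():
    """Recover the edwards25519 base point from y = 4/5, as in RFC 8032."""
    y = 4 * pow(5, P - 2, P) % P
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P
    if x & 1:
        x = P - x
    return (x, y)


def add(a, b):
    """Complete twisted Edwards addition in affine coordinates."""
    (x1, y1), (x2, y2) = a, b
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + x2 * y1) * pow(1 + t, P - 2, P) % P
    y3 = (y1 * y2 + x1 * x2) * pow(1 - t, P - 2, P) % P
    return (x3, y3)


def mul(k, point):
    """Double-and-add scalar multiplication (not constant-time)."""
    result = (0, 1)  # neutral element
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


G = _base_point()


def hash_to_scalar(point):
    data = point[0].to_bytes(32, "little") + point[1].to_bytes(32, "little")
    return int.from_bytes(hashlib.sha256(data).digest(), "little") % ORDER


def random_scalar():
    return secrets.randbelow(ORDER - 1) + 1


def derive_stealth_address(V, S):
    """Alice: pick r, publish R = rG, send to H(rV)*G + S."""
    r = random_scalar()
    R = mul(r, G)
    return add(mul(hash_to_scalar(mul(r, V)), G), S), R


def is_mine(v, S, R, address):
    """Bob: recompute H(vR)*G + S and compare with the destination address."""
    return add(mul(hash_to_scalar(mul(v, R)), G), S) == address


v, s = random_scalar(), random_scalar()
V, S = mul(v, G), mul(s, G)
address, R = derive_stealth_address(V, S)
print(is_mine(v, S, R, address))
```

The check succeeds for Bob because $rV = rvG = vR$, so both sides hash the same shared point; a scanner holding a different viewing key hashes a different point and the comparison fails.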
Privacy Properties
- Unlinkability: Each message uses a different address -- no two messages to Bob share the same destination
- Sender anonymity: Combined with sealed sender, neither the sender identity nor the recipient address is reusable
- Observer resistance: A passive observer cannot determine that two stealth addresses belong to the same recipient
- Forward privacy: Compromising one stealth address does not reveal other stealth addresses for the same recipient
Scanning Efficiency
The computational cost of stealth address scanning is one elliptic curve scalar multiplication per incoming message. For a user receiving $n$ messages per day, this requires $n$ point multiplications -- approximately 0.2 milliseconds each on modern hardware, making scanning practical for thousands of messages per day without noticeable latency.
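As a quick worked example of that estimate, the snippet below multiplies the per-message cost quoted above by a few illustrative daily volumes. The volumes are arbitrary; only the 0.2 ms figure comes from the text.

```python
MS_PER_MULTIPLICATION = 0.2  # per-message cost quoted in the text


def daily_scan_seconds(messages_per_day: int) -> float:
    """Total scanning time per day at one scalar multiplication per message."""
    return messages_per_day * MS_PER_MULTIPLICATION / 1000.0


for n in (100, 1_000, 10_000):
    print(f"{n:>6} messages/day -> {daily_scan_seconds(n):.2f} s of scanning")
```

Even at 10,000 incoming messages per day, the total comes to about two seconds of computation spread across the day.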
Encrypted Metadata Events
Encrypted Presence
Traditional messengers transmit presence information (online/offline/last seen) in plaintext, allowing the server to track user activity patterns. In Zentalk's mesh-only mode, presence updates are encrypted using the same Double Ratchet session used for messages. The relay processes the WebSocket frame without knowing its content -- it cannot distinguish a presence update from a typing indicator or a read receipt.
Encrypted Typing Indicators
Typing indicators are similarly encrypted end-to-end. The relay sees only an opaque ciphertext blob; it cannot determine who is typing to whom.
Encrypted Read Receipts
Read receipts follow the same pattern: encrypted under the Double Ratchet session, indistinguishable from other event types at the network level.
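One way the three event types could be made indistinguishable on the wire is to pack each into the same fixed-size plaintext frame before it enters the Double Ratchet session, so that every resulting ciphertext has the same length. The framing below is an assumption made for illustration: the type codes, the 64-byte frame size, and the header layout are hypothetical, and the encryption step itself is omitted.

```python
import struct

EVENT_TYPES = {"presence": 1, "typing": 2, "read_receipt": 3}  # hypothetical codes
FRAME_SIZE = 64  # hypothetical fixed plaintext size in bytes
HEADER_FORMAT = ">BH"  # 1-byte event type, 2-byte payload length


def encode_event(kind: str, payload: bytes) -> bytes:
    """Pack type and payload into a fixed-size plaintext frame (to be encrypted)."""
    header = struct.pack(HEADER_FORMAT, EVENT_TYPES[kind], len(payload))
    if len(header) + len(payload) > FRAME_SIZE:
        raise ValueError("payload too large for a single frame")
    return (header + payload).ljust(FRAME_SIZE, b"\x00")


def decode_event(frame: bytes):
    """Recover (kind, payload) after the recipient has decrypted the frame."""
    code, length = struct.unpack(HEADER_FORMAT, frame[:3])
    kind = next(name for name, value in EVENT_TYPES.items() if value == code)
    return kind, frame[3:3 + length]


frames = [
    encode_event("presence", b"online"),
    encode_event("typing", b""),
    encode_event("read_receipt", b"msg-123"),
]
print({len(f) for f in frames})
```

Since the event type lives inside the encrypted frame, a relay that sees only the ciphertext has no field to inspect and no length difference to measure.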
Enforcement
In mesh-only mode, the system enforces encrypted metadata through compile-time guards. Plaintext presence, typing, and receipt events are architecturally prohibited -- the build pipeline rejects any code path that would transmit these events in cleartext.
Traffic Analysis Resistance
Current Protections
Fixed-size relay cells. All data transmitted through the relay network is normalized into fixed-size cells, following established relay padding approaches [17]. Messages smaller than the cell size are padded with random bytes; messages larger are split into multiple cells. This prevents an observer from inferring message type (text vs. media, short vs. long) from packet sizes.
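A minimal sketch of the cell normalization described above, assuming a 512-byte cell with a 2-byte length header; both values are hypothetical, since the chapter does not state the actual cell size or layout. In a real relay the whole cell would be encrypted, so the length header would not be visible to an observer.

```python
import secrets
import struct

CELL_SIZE = 512  # hypothetical cell size in bytes
HEADER = struct.Struct(">H")  # number of real payload bytes in this cell
CAPACITY = CELL_SIZE - HEADER.size


def to_cells(message: bytes) -> list:
    """Split a message into fixed-size cells, padding the last one with random bytes."""
    cells = []
    for offset in range(0, max(len(message), 1), CAPACITY):
        chunk = message[offset:offset + CAPACITY]
        padding = secrets.token_bytes(CAPACITY - len(chunk))
        cells.append(HEADER.pack(len(chunk)) + chunk + padding)
    return cells


def from_cells(cells) -> bytes:
    """Reassemble the original message, discarding padding."""
    out = b""
    for cell in cells:
        (length,) = HEADER.unpack_from(cell)
        out += cell[HEADER.size:HEADER.size + length]
    return out


short, long_ = b"hi", b"x" * 2000
print(len(to_cells(short)), len(to_cells(long_)))
```

A two-byte message and a two-kilobyte message differ only in how many identical-looking cells they occupy, which is why cell counts still leak coarse size information unless padding traffic is also present.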
Constant-rate traffic padding. Relays generate dummy padding cells at a constant rate to maintain uniform traffic flow even when no real messages are being transmitted. This prevents an observer from determining when a user is actively communicating versus idle.
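The scheduling idea can be illustrated with a small simulation: at every tick the link emits exactly one cell, taking a queued real cell when one is available and a dummy otherwise. The tick model, the `run_link` function, and the `REAL`/`DUMMY` markers are hypothetical names for this sketch; a real relay would drive this from a timer and encrypt both kinds of cell so they look alike.

```python
from collections import deque

DUMMY, REAL = 0, 1


def run_link(ticks: int, arrivals: dict) -> list:
    """Emit one cell per tick: a queued real cell if available, else a dummy."""
    queue, sent = deque(), []
    for tick in range(ticks):
        queue.extend(arrivals.get(tick, []))
        sent.append((REAL, queue.popleft()) if queue else (DUMMY, None))
    return sent


idle = run_link(8, {})
busy = run_link(8, {2: [b"a", b"b"], 5: [b"c"]})
print(len(idle), len(busy))
```

Both runs put the same number of cells on the wire, so an observer counting cells cannot tell the idle link from the busy one.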
Multi-hop relay routing (Chapter 6): Three-hop routing (Guard -> Middle -> Exit) prevents any single relay from knowing both sender and recipient. The Guard relay knows the client's IP address but not the destination; the Exit relay knows the destination but not the client; the Middle relay knows neither.
Address hashing: Mesh nodes see hashed addresses, not wallet addresses, preventing casual traffic analysis.
Sealed sender: Relay nodes cannot identify the sender of DM messages when sealed sender is used.
Timing obfuscation: Random delays drawn from an exponential distribution are applied before relay forwarding to decorrelate message timing. The exponential distribution is chosen because it is memoryless: knowing that a message has already been held for $t$ milliseconds provides no information about how much longer it will wait before being forwarded.
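The sketch below samples forwarding delays from an exponential distribution and checks the memoryless property empirically: among messages already held longer than `s` milliseconds, the fraction held a further `t` milliseconds matches the unconditional fraction held longer than `t`. The 50 ms mean and the function name are assumptions for the example, and a seeded `random.Random` is used only to make the run repeatable; a relay would draw from a cryptographically secure source such as `random.SystemRandom`.

```python
import random

MEAN_DELAY_MS = 50.0  # hypothetical mean forwarding delay


def forwarding_delay_ms(rng: random.Random, mean_ms: float = MEAN_DELAY_MS) -> float:
    """Sample one forwarding delay from an exponential distribution."""
    return rng.expovariate(1.0 / mean_ms)


rng = random.Random(12345)  # seeded for a repeatable demonstration only
delays = [forwarding_delay_ms(rng) for _ in range(20_000)]

s, t = 30.0, 40.0
survivors = [d for d in delays if d > s]
conditional = sum(d > s + t for d in survivors) / len(survivors)
unconditional = sum(d > t for d in delays) / len(delays)

print(f"mean delay         : {sum(delays) / len(delays):.1f} ms")
print(f"P(X > s+t | X > s) : {conditional:.3f}")
print(f"P(X > t)           : {unconditional:.3f}")
```

The two probabilities agree up to sampling noise, which is the property that prevents an observer from refining a guess about the forwarding time as a message waits.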
Planned Enhancements
Message padding: All messages will be padded to fixed-size buckets, preventing an observer from distinguishing text messages from media, short messages from long ones, or emoji from paragraphs.
Fuzzy timing: Presence updates and other periodic events will be batched with random jitter, so that the relay sees activity patterns only at coarse intervals rather than real-time updates.
Cover traffic: The client will generate decoy messages that are cryptographically indistinguishable from real ones. Network observers cannot determine which transmissions carry actual content and which are noise.
Threat Model
Adversary Classification
Zentalk's threat model considers four adversary types:
Type 1: Passive Mesh Node Operator
- Capabilities: Can read all data stored on their node; can observe network traffic
- Cannot: Decrypt E2EE messages; reverse address hashes (without target address); unseal sealed senders
- Protection level: Full content protection; partial metadata protection
Type 2: Active Mesh Node Operator
- Capabilities: Everything Type 1 can do, plus: can modify stored data; can drop or delay messages; can inject fake messages
- Cannot: Forge E2EE messages (no session keys); break AES-256-GCM encryption; forge Ed25519 signatures
- Protection level: Tampering detected by authentication tags and signatures; data loss mitigated by Reed-Solomon redundancy
Type 3: Network-Level Adversary (ISP, Government)
- Capabilities: Can observe all network traffic between users and relays; can correlate connection timing; can perform traffic analysis
- Cannot: Decrypt E2EE content; read data on mesh nodes (encrypted at rest)
- Protection level: Content fully protected; metadata partially protected (multi-hop relay routing, sealed sender); IP addresses visible (mitigated by Tor/VPN)
Type 4: Global Passive Adversary (Nation-State)
- Capabilities: Can observe all internet traffic worldwide; can correlate timing patterns globally; can perform advanced traffic analysis with machine learning
- Cannot: Break AES-256 or X25519 (classical); may eventually break X25519 (quantum)
- Protection level: Content protected; metadata protection depends on multi-hop relay routing, padding, and cover traffic; post-quantum hybrid protects against future quantum attacks
Protection Matrix
| Data Type | Type 1 (Passive Node) | Type 2 (Active Node) | Type 3 (ISP) | Type 4 (Global) |
|---|---|---|---|---|
| Message content | Protected | Protected | Protected | Protected |
| Sender identity (sealed) | Protected | Protected | Protected | Protected |
| Recipient identity (hashed) | Partially | Partially | Partially | Partially |
| Communication timing | Visible | Visible | Visible | Visible |
| IP addresses | N/A | N/A | Visible | Visible |
| Connection patterns | Visible | Visible | Visible | Visible |
| Group membership | Visible (IDs) | Visible (IDs) | Visible | Visible |
| Message sizes | Visible | Visible | Visible | Visible |
Mitigations for "Visible" items:
- Communication timing: Fuzzy timing + batching (planned)
- IP addresses: Tor/VPN (user responsibility)
- Connection patterns: Cover traffic (planned)
- Group membership: Sealed group messages with ZK proofs (Chapter 7)
- Message sizes: Message padding (planned)
Accepted Limitations
Zentalk explicitly acknowledges these limitations:
- Timing correlation: If Alice goes offline the moment Bob comes online, an observer can infer they communicate. Mitigation: keep persistent connections alive even when "offline."
- Group routing metadata: Group IDs must be visible to relays for message routing. Mitigation: Group IDs are hashed and context-specific.
- IP address exposure: TCP/IP requires visible IP addresses. Zentalk recommends Tor or VPN for users with high privacy requirements.
- Deterministic address hashing: The same address always produces the same hash. An adversary who knows a target's address can compute the hash and search for it. Mitigation: per-contact pseudonymous identifiers (planned).
Privacy Compliance
GDPR by Design
Zentalk implements privacy-by-design as required by GDPR Article 25:
- Data minimization: Only data necessary for communication is collected. No tracking, analytics, or behavioral profiling.
- Purpose limitation: Data is used exclusively for message delivery and encrypted storage. No secondary uses.
- Storage limitation: All mesh data has bounded retention periods. Data is automatically deleted after expiration.
- Encryption: All personal data is encrypted with keys held exclusively by the user. This satisfies GDPR Article 32 (security of processing).
Right to Erasure (Article 17)
When a user exercises their right to erasure:
- All mesh-stored data is deleted from all nodes
- Group memberships are revoked (membership tokens invalidated)
- Message history on other users' devices remains (E2EE prevents server-side deletion of received messages)
Data Portability (Article 20)
Users can export their data in a structured, machine-readable format as required by GDPR Article 20. All exported data is limited to what the user's client has decrypted locally -- the system never has access to plaintext data on the server side.
Operational Privacy Modes
For maximum privacy deployments, Zentalk provides configurable privacy modes that control the system's interaction with external services:
Mesh-only mode (the default production configuration) forces all data to flow exclusively through the decentralized mesh network. The client makes zero connections to external services -- no CDN requests, no external URL fetches, no analytics, no centralized fallback. If the mesh is unavailable, the system fails closed rather than degrading to a less private mode.
Selective feature disabling allows operators to individually control features that require external network connections: media previews from third-party CDNs, URL-based link preview generation, and external font or emoji loading. Each feature defaults to the privacy-preserving configuration (disabled) and must be explicitly enabled.
Tor enforcement optionally requires all client connections to route through the Tor network, providing network-layer anonymity in addition to the application-layer protections described in this chapter.
These privacy modes are enforced at the application level through compile-time guards that prevent accidental privacy regression.
The cryptographic and privacy protections described in the preceding parts are designed so that no infrastructure participant can read message content, and so that reconstructing communication patterns is substantially harder, within the limits set out in the threat model above. However, these protections depend on the continued honest operation of the network's relay and storage infrastructure. The following part addresses the economic layer that sustains this infrastructure: how validators are incentivized through CHAIN token staking and reward distribution, why rational self-interest aligns with honest operation, and how the resulting economic equilibrium produces a self-sustaining network without any central authority directing it.