Chapter 3: System Requirements Derived From Barriers

December 5, 2025

Chapter 2 identified four structural barriers preventing biodata RWA markets: provenance (the trust gap), consent (the permission paradox), quality (the metadata gap), and liability (the concentration risk). This chapter derives system requirements directly from those barriers. Sections 3.1 through 3.4 each map to a specific barrier. Sections 3.5 and 3.6 address two further requirements: economic sustainability and defensibility.

3.1 Attestation at Capture

What it is

Every biosignal must be cryptographically signed at the moment of capture by the device or institution that generated it. The signature proves the data came from certified hardware or a real human, not from a simulator, fabricator, or unauthorized source.

For consumer wearables: The device generates a hardware-backed signature using its secure enclave or trusted platform module. The signature includes device serial number, firmware version, timestamp, and a hash of the raw signal data. This signature is unforgeable without access to the device's private key.

For user-uploaded data: If a user retrieves clinical results from their doctor and uploads them, the system verifies any existing institutional signatures on the documents. If no signature exists, the data is labeled as "user-attested" and assigned a lower trust tier.
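
The capture-and-verify flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: a real device would sign with an asymmetric private key held in its secure enclave, whereas this sketch uses HMAC-SHA256 with a shared key purely because it is available in the standard library. The names attest, trust_tier, and DEVICE_KEY, and the tier labels "device-attested" and "rejected", are hypothetical; only "user-attested" comes from the text.

```python
import hashlib
import hmac
import json
from typing import Optional

# ASSUMPTION: stand-in key. A real device would keep an asymmetric private key
# inside its secure enclave and never expose it to application code.
DEVICE_KEY = b"hypothetical-device-key"


def canonical(fields: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


def attest(raw_signal: bytes, serial: str, firmware: str,
           timestamp: str, key: bytes) -> dict:
    """Bind device identity, capture time, and a hash of the raw signal."""
    fields = {
        "device_serial": serial,
        "firmware_version": firmware,
        "timestamp": timestamp,
        "signal_sha256": hashlib.sha256(raw_signal).hexdigest(),
    }
    signature = hmac.new(key, canonical(fields), hashlib.sha256).hexdigest()
    return {"fields": fields, "signature": signature}


def trust_tier(raw_signal: bytes, attestation: Optional[dict], key: bytes) -> str:
    """Classify data as device-attested, user-attested (unsigned), or rejected."""
    if attestation is None:
        return "user-attested"
    fields = attestation["fields"]
    expected = hmac.new(key, canonical(fields), hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, attestation["signature"])
    hash_ok = fields["signal_sha256"] == hashlib.sha256(raw_signal).hexdigest()
    return "device-attested" if signature_ok and hash_ok else "rejected"
```

Because the signature covers a hash of the raw signal, changing either the signal or any attested field after capture causes verification to fail.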

Why it solves the provenance barrier

Attestation provides cryptographic proof. A researcher analyzing 10,000 sleep datasets can programmatically verify that each came from a certified Apple Watch, Oura Ring, or accredited sleep clinic. They do not need to trust individual users. They verify signatures instead.

This transforms biodata from "trust-based" to "verification-based." Trust does not scale. Verification does.

3.2 Metadata-First Design

What it is

Every biosignal must carry machine-readable metadata describing signal type, source, quality, completeness, and context. The metadata is not an afterthought added during export. It is captured alongside raw signals and becomes inseparable from them.

The metadata includes the following (subject to change):

1. Signal descriptors: Domain (sleep, cardiac, metabolic), specific measurements (REM duration, HRV, glucose levels), units, sampling rate

2. Source information: Device model and firmware version, lab facility and equipment used, capture timestamp

3. Quality metrics: Fidelity score (0-100), completeness percentage, noise level, artifact flags, missing data intervals

4. Temporal context: Single measurement, daily averages, weekly trends, longitudinal tracking duration

5. Demographic context (if consented): Age range, sex, location region, relevant health conditions

6. Usage permissions: Allowed purposes, allowed recipients, expiration dates, revocation status

Metadata is structured in standardized schemas so researchers can query programmatically: "Show me sleep datasets with fidelity >85, duration >90 days, age 30-50, no missing nights."
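
As an illustration of what such a programmatic query might look like, the sketch below models a small subset of the metadata fields and applies the example filter. The field names, the DatasetMetadata class, the matches function, and the sample catalog are all hypothetical; the actual schema is not defined in this chapter.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetMetadata:
    dataset_id: str
    domain: str                   # e.g. "sleep", "cardiac", "metabolic"
    fidelity: int                 # fidelity score, 0-100
    completeness_pct: float
    duration_days: int
    age_range: Optional[tuple]    # (min, max), or None if not consented
    missing_intervals: list = field(default_factory=list)


def matches(m: DatasetMetadata, *, domain: str, min_fidelity: int,
            min_days: int, age_min: int, age_max: int) -> bool:
    """True if the dataset satisfies every filter and has no missing intervals."""
    if m.domain != domain:
        return False
    if m.fidelity <= min_fidelity or m.duration_days <= min_days:
        return False
    if m.age_range is None:       # demographics withheld: cannot match an age filter
        return False
    low, high = m.age_range
    if low < age_min or high > age_max:
        return False
    return not m.missing_intervals


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    DatasetMetadata("A", "sleep", 92, 98.0, 180, (30, 39)),
    DatasetMetadata("B", "sleep", 68, 80.0, 45, (40, 49), ["2025-03-02"]),
    DatasetMetadata("C", "sleep", 90, 95.0, 120, None),
]

# "Sleep datasets with fidelity >85, duration >90 days, age 30-50, no missing nights."
hits = [m.dataset_id for m in CATALOG
        if matches(m, domain="sleep", min_fidelity=85, min_days=90,
                   age_min=30, age_max=50)]
```

In this sketch a dataset whose owner withheld demographic context never matches an age-restricted query. That is one possible policy, not one the text prescribes.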

Why it solves the quality barrier

Metadata enables pre-purchase quality assessment. A device manufacturer needing high-fidelity cardiac data for algorithm calibration can filter datasets automatically. They see: "Dataset A: fidelity 92, completeness 98%, 180 days continuous. Dataset B: fidelity 68, completeness 80%, 45 days with gaps." They purchase Dataset A and skip Dataset B.

This creates efficient markets. Buyers find suitable data. Sellers with high-quality data command premium prices. Sellers with lower-quality data are priced accordingly but still find buyers with lower requirements.

3.3 Fragmented Decentralised Storage

What it is

Raw biosignal data is encrypted client-side (on the user's device before uploading) and split into multiple fragments. These fragments are distributed across independent storage nodes operated by different entities in different jurisdictions. No single storage node holds complete datasets.

Reassembling data requires M-of-N fragments. For example, data might be split into 5 fragments with a rule that any 3 fragments can reconstruct the original data. This means:

  • One storage node can be completely compromised and attackers still cannot reconstruct data (they need 3 fragments but only have 1)

  • Two storage nodes can go offline and data remains accessible (user can reconstruct from the remaining 3)

  • The platform operator does not control any storage nodes directly (they cannot be compelled to hand over data)

The user holds the decryption keys. The platform does not hold keys. Storage nodes hold encrypted fragments but cannot decrypt them. Only the user can authorize decryption by providing keys when granting permission tokens.
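
The 3-of-5 rule can be illustrated with Shamir secret sharing, one well-known threshold scheme. The text does not specify which scheme the platform uses, and a production system storing large files might use erasure coding over ciphertext instead, so treat this as a sketch of the M-of-N property only. It splits a small integer rather than a full dataset, and the names split, reconstruct, and PRIME are hypothetical.

```python
import secrets

# A Mersenne prime; the secret must be an integer smaller than this.
PRIME = 2**127 - 1


def split(secret: int, n: int = 5, k: int = 3) -> list:
    """Split secret into n shares such that any k of them reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):    # Horner evaluation at x, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: list) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Calling split on a value and passing any three of the five resulting shares to reconstruct returns the original value. With only two shares, interpolation yields an unrelated number, which is why compromising a single node, or even two, does not reveal the data.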

Why it solves the liability barrier

Centralized storage creates catastrophic breach risk. If one database holding data from 1 million users is breached, all 1 million users' data is exposed simultaneously. The platform operator faces billion-dollar liability.

Fragmented decentralized storage distributes risk. If one storage node is breached, attackers obtain only encrypted fragments from a subset of users. Those fragments are useless without enough additional fragments from other nodes to meet the reconstruction threshold and without the decryption keys, which only users hold. The blast radius is limited.

More importantly, fragmented storage distributes legal liability. The platform operator is not the data controller in the traditional sense—they do not hold complete datasets. Storage node operators hold only encrypted fragments and are not liable for content they cannot decrypt. Users hold keys and are the ultimate controllers.

3.4 Portable Permissions

What it is

Permissions are not stored separately from data. They are cryptographically attached to the data itself through permission tokens. When a user shares data with a researcher, the researcher receives a token granting access under specific conditions:

  • Purpose-locked: Token specifies "circadian research" or "pharmaceutical validation" or "device calibration"—cannot be used for other purposes

  • Time-limited: Token expires automatically after 6 months, 1 year, or user-specified duration

  • Recipient-locked: Token is issued to specific researcher or institution, cannot be transferred

  • Revocable: User can invalidate token at any time, preventing future access

The token does not contain data. It contains cryptographic proof that the user authorized this specific recipient for this specific purpose during this specific timeframe. The recipient presents the token to access data from fragmented storage.
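
A permission token with these four properties might be checked along the following lines. This is a hedged sketch, not the platform's token format: the claim names, the issue_token and check_access functions, the in-memory REVOKED set, and the use of HMAC-SHA256 as a stand-in for the user's cryptographic proof are all assumptions made for illustration. A deployed system would more plausibly use asymmetric signatures and an on-chain or otherwise shared revocation registry.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

USER_KEY = b"hypothetical-user-signing-key"   # ASSUMPTION: stand-in for the user's key
REVOKED = set()                               # ASSUMPTION: in-memory revocation list


def _proof(claims: dict, key: bytes) -> str:
    body = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def issue_token(token_id: str, recipient: str, purpose: str, domain: str,
                valid_days: int, now: datetime, key: bytes = USER_KEY) -> dict:
    """Create a token that carries authorization claims but no data."""
    claims = {
        "token_id": token_id,
        "recipient": recipient,
        "purpose": purpose,
        "domain": domain,
        "expires_at": (now + timedelta(days=valid_days)).isoformat(),
    }
    return {"claims": claims, "proof": _proof(claims, key)}


def check_access(token: dict, recipient: str, purpose: str, domain: str,
                 now: datetime, key: bytes = USER_KEY) -> str:
    """Return 'granted' or the reason the request is refused."""
    claims = token["claims"]
    if not hmac.compare_digest(_proof(claims, key), token["proof"]):
        return "invalid proof"
    if claims["token_id"] in REVOKED:
        return "revoked"
    if now >= datetime.fromisoformat(claims["expires_at"]):
        return "expired"
    if claims["recipient"] != recipient:
        return "wrong recipient"
    if claims["purpose"] != purpose or claims["domain"] != domain:
        return "out of scope"
    return "granted"
```

Expiry is evaluated at access time, so a six-month token stops working without any action from the user. Revocation is the only check that depends on external state.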

Permissions travel with metadata. If a user grants access to their sleep data to Stanford Sleep Lab and then separately grants access to a pharmaceutical company, both permissions are tracked in the same system. The user sees a unified dashboard showing all active permissions.

Why it solves the consent barrier

Portable permissions enable granular control. A user can share sleep data with Stanford for circadian research while simultaneously denying access to insurance companies. The same user can share glucose data with a diabetes research lab for 6 months while withholding genetic data entirely.

This transforms consent from "trust me to only use data appropriately" to "cryptographic enforcement prevents inappropriate use." A researcher with a token for "sleep research" cannot access metabolic data. A token issued for 6 months automatically expires—no manual revocation needed.

3.5 Direct Compensation

What it is

When a researcher pays to access biodata, the majority of payment flows directly to the user who generated that data. The platform retains only what is necessary to sustain infrastructure, compliance, and operations.

Payment flows are transparent:

  • Researcher pays $50 to access one user's 90 days of sleep data

  • Platform fee (tentative): 15-20% ($7.50-$10.00) for infrastructure, attestation verification, permission management, storage, compliance, and support

  • User receives: 80-85% ($40.00-$42.50)

Users see exactly how much they earned from each transaction. No hidden fees. No intermediary capture where the platform takes 80% and users get 20%. Value flows to those who created the value.
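
The arithmetic behind the split is simple enough to state exactly. The sketch below uses decimal arithmetic so that amounts round to whole cents; the function name split_payment and the half-up rounding rule are assumptions, and the 15% and 20% rates are the tentative bounds given above.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_payment(price: Decimal, fee_rate: Decimal) -> dict:
    """Split one payment into the platform fee and the user's payout."""
    if not Decimal("0") <= fee_rate <= Decimal("1"):
        raise ValueError("fee_rate must be between 0 and 1")
    # Round the fee to cents; the user receives everything that remains,
    # so the two amounts always sum to the price paid.
    platform_fee = (price * fee_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return {"platform_fee": platform_fee, "user_payout": price - platform_fee}
```

For the $50 example, a 15% fee gives a $7.50 platform fee and a $42.50 payout, and a 20% fee gives $10.00 and $40.00, matching the ranges listed above.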

Why this solves the economic sustainability requirement

Direct compensation aligns incentives. Users have reason to maintain data quality (they earn more for high-quality data). Users have reason to share longitudinal data (they earn more over time). Users have reason to integrate multiple domains (multi-domain datasets command premium prices).

Platform sustainability comes from volume, not per-transaction extraction. A platform earning 15-20% from millions of transactions across hundreds of thousands of users generates substantial revenue while keeping users happy. A platform earning 80% from a few thousand transactions alienates users and never scales.

3.6 Neutral Interoperability

What it is

The platform does not favor specific researchers, institutions, or commercial entities. Any compliant buyer can participate. Any certified device can integrate. Any accredited lab can provide data.

The platform is infrastructure, not a gatekeeper. It provides:

  • Standards for device attestation (any device meeting standards can integrate)

  • Schemas for metadata (any institution following schemas can contribute data)

  • APIs for data access (any vetted researcher can build tools using APIs)

  • Smart contract templates (any compliant entity can transact)

The platform does not pick winners. It sets neutral rules and enforces them equally.

Why this solves the defensibility requirement

Neutral interoperability prevents platform power abuse. Researchers know they compete on quality of their work, not on their relationship with the platform operator. Device manufacturers know that integration is based on meeting standards, not on negotiating special deals. Users know they are not locked into one ecosystem.

This is how infrastructure achieves long-term value. SMTP (email) is neutral—anyone can run an email server following the protocol. HTTP (web) is neutral—anyone can host a website following standards. These protocols became foundational because they did not favor specific parties.

Biodata infrastructure must follow the same model. Matrix is not MySpace (walled garden that collapsed when users wanted to leave). Matrix is email (neutral protocol that became essential precisely because it locked no one in).
