Reading dfpn's threat model

dfpn team · May 26, 2026

threat-model · security · protocol

A threat model is the most useful document a protocol can publish and the least likely to be read. Threat models are dense, they are uncomfortable, and they tend to be more honest about a system’s limits than any marketing site is going to be. The dfpn repository ships one. It deserves a closer read than it usually gets: it says useful things about how the protocol expects to be attacked, and it is unusually direct about what it does not try to defend against.

This post walks through the document section by section, in the spirit of “here is what the design team actually thinks about.” It is not a substitute for reading the source; it is a guided tour of the parts most worth slowing down on.

What dfpn says it is protecting

The threat model lists five assets explicitly:

Integrity of analysis results — the verdicts produced by the network must reflect what the workers actually computed.
Authenticity of model evaluations — the benchmarks used to score models must be trustworthy.
Funds in the treasury and reward pools — the on-chain accounts holding stake and rewards must be safe from theft.
Privacy of submitted media — clients should not have their media leaked or substituted.
Availability of the marketplace and registries — the protocol must keep serving requests under adversarial load.

That ordering matters. The first asset on the list is verdict integrity. Everything else exists to protect it. If verdicts can be corrupted — by collusion, by copying, by overfitting — the rest of the protocol’s value collapses. The mitigations later in the doc are best read as defenses of that first asset, with the others as consequences.

Six adversaries, named and unflattering

The doc enumerates six adversary classes. They are worth quoting directly because they cover most of the practical attack surface.

Malicious workers submit incorrect results for profit. The simplest version: a worker pockets the fee without actually running inference and submits a plausible-looking but unrelated verdict. The mitigation is commit-reveal plus multi-worker redundancy plus reputation-weighted aggregation. Submitting an incorrect result against a consensus of honest workers is detectable; the slashing schedule makes it expensive.

Malicious model developers ship biased or overfit models to game benchmarks. This is the subtler version of the same attack. A developer trains a detector to score artificially well on a leaked benchmark; the model then underperforms in real traffic, but the reputation system has already credited it. The mitigation is rotating, hidden test sets curated under governance. Static benchmarks become attack surfaces; rotation is part of the security model.

Sybil attackers create many worker identities to skew outcomes. The mitigation is the stake floor, reputation weighting (new identities start with no reputation), and per-epoch caps. Sybil swarms cost real DFPN to spin up, earn little until they have reputation, and cannot dominate any epoch even if well-funded.
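As a rough illustration of how those three defenses could compose, here is a minimal Python sketch. The constants `STAKE_FLOOR` and `EPOCH_CAP`, the `Worker` type, and the weighting formula are all assumptions made for this example; the threat model names the mechanisms but the numbers and formula here are not taken from dfpn.

```python
from dataclasses import dataclass

STAKE_FLOOR = 1_000  # hypothetical minimum stake, in DFPN
EPOCH_CAP = 0.10     # hypothetical ceiling on one identity's share per epoch


@dataclass
class Worker:
    worker_id: str
    stake: int
    reputation: float  # 0.0 for a brand-new identity


def epoch_weights(workers, cap=EPOCH_CAP):
    """Return each eligible worker's share of influence for one epoch."""
    # Stake floor: identities below the floor do not participate at all.
    eligible = [w for w in workers if w.stake >= STAKE_FLOOR]
    total = sum(w.reputation for w in eligible)
    if total == 0:
        return {w.worker_id: 0.0 for w in eligible}
    # Reputation weighting plus a hard cap. Shares are capped, not
    # renormalized, so weight above the cap simply goes unused.
    return {w.worker_id: min(w.reputation / total, cap) for w in eligible}
```

Under these assumptions a freshly staked Sybil identity is eligible but carries zero weight, and even a high-reputation identity cannot exceed the cap in a single epoch.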

Colluding cartels coordinate workers and models to manipulate consensus. This is the most concerning class because it cannot be defeated by stake alone — the cartel could just stake more. The mitigations are random assignment of requests, diversity constraints that prevent the same operator’s workers from dominating any single request, a challenge window before slashing, and the commit-reveal flow that makes coordination harder to execute in real time.
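Random assignment with a diversity constraint can be sketched in a few lines of Python. This is an illustration only: the function name, the "at most one worker per operator" rule, and the string seed are assumptions for the example. A real deployment would need a seed that no participant can predict or bias, which this sketch does not address.

```python
import random


def assign_workers(request_id, workers, k, seed):
    """Pick k workers at random, at most one per operator.

    `workers` maps worker_id -> operator_id (hypothetical shape).
    """
    # Seeding per request makes the draw reproducible for anyone who
    # knows the seed, so the assignment can be checked after the fact.
    rng = random.Random(f"{seed}:{request_id}")
    candidates = sorted(workers)
    rng.shuffle(candidates)

    chosen, seen_operators = [], set()
    for worker_id in candidates:
        if len(chosen) == k:
            break
        operator = workers[worker_id]
        if operator in seen_operators:
            continue  # diversity constraint: skip a second worker from the same operator
        chosen.append(worker_id)
        seen_operators.add(operator)

    if len(chosen) < k:
        raise ValueError("not enough distinct operators for this request")
    return chosen
```

The point of the constraint is that an operator running many workers gains no extra seats on any single request; a cartel has to span several distinct operators to matter.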

Spammers flood the network with low-value requests. The mitigations are dynamic fees that rise with congestion, rate limits, and priority fees for legitimate clients. Note that the threat model treats spam as an availability attack on the marketplace, not as an attempt to corrupt verdicts — a useful distinction.
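To make "fees that rise with congestion" concrete, here is one possible fee curve in Python. The shape (flat up to half capacity, then a linear ramp to a maximum multiplier) and every parameter are assumptions chosen for illustration; the threat model does not specify a formula.

```python
def dynamic_fee(base_fee, pending, capacity, max_multiplier=8.0):
    """Scale the base fee with queue utilisation (hypothetical curve)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilisation = pending / capacity
    if utilisation <= 0.5:
        return float(base_fee)  # no surcharge while the queue is at most half full
    # Ramp linearly from 1x at 50% utilisation to max_multiplier at 100%,
    # and hold at max_multiplier beyond that.
    ramp = min((utilisation - 0.5) / 0.5, 1.0)
    return base_fee * (1.0 + ramp * (max_multiplier - 1.0))


print(dynamic_fee(100, 10, 100))   # 100.0
print(dynamic_fee(100, 75, 100))   # 450.0
print(dynamic_fee(100, 100, 100))  # 800.0
```

Whatever the real curve looks like, the property that matters is the same: sustained flooding becomes steadily more expensive for the spammer while ordinary traffic at low utilisation pays the base fee.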

External attackers go after off-chain storage, indexers, or APIs. These are the systems that are not themselves the source of truth but support clients in talking to the chain. The mitigation is to treat them as untrusted: indexers are convenience layers, never authority. The chain is the source of truth and is reachable independently.

Reading these six categories side by side, the design becomes legible. The protocol assumes adversaries are competent. It does not promise to make them go away; it promises to make them visible and expensive.

Trust assumptions, named explicitly

The threat model names three trust assumptions, which is more than most protocols bother to do.

Solana provides finality and liveness within expected parameters. dfpn is a Solana protocol. If Solana stops, dfpn stops. If Solana reorganizes, dfpn reorganizes. There is no claim of independence from the underlying L1; the dependency is acknowledged outright.

Off-chain storage links are available but not trusted for integrity. Workers fetch media from IPFS, Arweave, or S3 (or whatever the request specifies). The protocol does not trust the storage layer to return the right bytes. Every request includes a content hash; workers verify against it. Storage substitution attacks are explicitly enumerated in the abuse playbook section, with the mitigation being content hashes plus multi-source retrieval.
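The "content hash plus multi-source retrieval" pattern is simple enough to sketch. In this Python example the hash function (SHA-256), the function name, and the injected `fetch` callable are assumptions for illustration; the threat model says requests carry a content hash but this post does not say which algorithm or client API dfpn uses.

```python
import hashlib


def fetch_verified(sources, expected_sha256, fetch):
    """Return the first payload whose SHA-256 matches the request's hash.

    `fetch` is a caller-supplied function (hypothetical) that takes a
    source URL and returns bytes, raising on failure.
    """
    for url in sources:
        try:
            data = fetch(url)
        except Exception:
            continue  # source unavailable: try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Hash mismatch: the source returned substituted or corrupted
        # bytes, so ignore it and keep looking.
    raise ValueError("no source returned bytes matching the content hash")
```

Because integrity comes from the hash and not from the storage provider, a compromised mirror can at worst cause a retry or a failed request; it cannot make a worker analyze different media.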

Benchmarks are curated and updated via governance. This is the most interesting one because it makes governance a security primitive rather than just a coordination mechanism. The governance system has to actually rotate benchmarks faster than developers can overfit to them. If governance fails at that, the model-developer-overfitting attack succeeds. The threat model is upfront about this.

Attack vectors, with concrete mitigations

The middle of the document is a matrix of specific attack vectors and the protocol features that exist to counter them. A few stand out.

Result copying or front-running is exactly what commit-reveal exists to stop. Without the commit-reveal flow, a worker who saw another worker’s submission could just copy it. With commit-reveal, copying requires defeating the hash commitment, which a copier cannot feasibly do.
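For readers who have not met the pattern, a generic commit-reveal scheme looks roughly like the Python sketch below. It is not dfpn's on-chain implementation: the field layout, the use of SHA-256, and the choice to bind the worker's identity into the commitment are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def make_commitment(verdict: bytes, worker_id: bytes, salt: bytes) -> str:
    """Hash commitment over (worker_id, salt, verdict)."""
    h = hashlib.sha256()
    for field in (worker_id, salt, verdict):
        # Length-prefix each field so different field boundaries
        # cannot produce the same byte stream.
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.hexdigest()


def verify_reveal(commitment: str, verdict: bytes, worker_id: bytes, salt: bytes) -> bool:
    """Check a revealed (verdict, salt) against an earlier commitment."""
    expected = make_commitment(verdict, worker_id, salt)
    return hmac.compare_digest(commitment, expected)


# Commit phase: only the hash is published.
salt = secrets.token_bytes(32)  # random salt hides low-entropy verdicts
commitment = make_commitment(b"fake:0.93", b"worker-A", salt)

# Reveal phase: the worker discloses verdict and salt.
print(verify_reveal(commitment, b"fake:0.93", b"worker-A", salt))  # True
```

Two details carry the security argument. The random salt prevents guessing a short verdict by brute force, and binding `worker_id` means a second worker who copies the published hash cannot later reveal it as their own.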

Replay of old results is countered by per-request nonces and expiration windows. A worker cannot resubmit a stale verdict from an old request.
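A minimal Python sketch of that check, assuming each request carries its own nonce and an expiry timestamp and each submission echoes the nonce it was produced for. The `Request` and `Submission` types and their field names are hypothetical, not dfpn's account layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    nonce: str
    expires_at: int  # unix seconds


@dataclass(frozen=True)
class Submission:
    request_id: str
    nonce: str
    verdict: str


def accept_submission(request: Request, submission: Submission, now: int) -> bool:
    """Accept only fresh results produced for this exact request."""
    if submission.request_id != request.request_id:
        return False
    if submission.nonce != request.nonce:
        return False  # result was produced for a different (older) request
    return now <= request.expires_at  # reject anything past the window
```

The nonce ties a verdict to one request, and the expiry bounds how long even a correctly bound verdict remains usable.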

Censorship of requests is countered by the open worker pool. If one operator refuses to serve a particular request, the request is still visible to others. There is no gatekeeper.

Oracle or indexer compromise is countered by the rule that clients verify against on-chain state, not against indexer responses. Indexers can be useful for search and dashboards, but they are not authoritative. A client doing a verdict check should pull from the chain.

Token theft or treasury drain is countered by program audits, multisig governance over treasury operations, and time-locked upgrades. None of this is novel for Solana programs; the threat model treats it as the baseline rather than as a feature.

What is notable is that every mitigation in this section is a feature that already exists in the protocol, not a roadmap item. Either commit-reveal is implemented or it is not. Either the worker stake floor is enforced on-chain or it is not. The threat model is reporting on the current shape of the system, which is the only kind of threat model worth writing.

Residual risks, named honestly

Most threat models stop at “here are our defenses.” The dfpn document keeps going. It enumerates residual risks — things that are not solved and cannot be solved by the current design.

Adversarial examples may bypass detection in specific contexts. Adversarial robustness is an open problem in ML. dfpn does not claim to solve it. What it does instead is dilute the impact: an adversarial example that defeats one model in the pool is unlikely to defeat all of them, and reputation-weighted aggregation reduces the influence of any single bad call.
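The dilution argument is easy to see with a toy aggregator. The Python sketch below assumes a simple reputation-weighted vote over binary verdicts; the function and the reputation values are invented for the example, and dfpn's actual aggregation rule may differ.

```python
def weighted_fake_share(votes):
    """Return the reputation-weighted share of votes saying 'fake'.

    `votes` is a list of (is_fake, reputation) pairs (hypothetical shape).
    """
    total = sum(rep for _, rep in votes)
    if total <= 0:
        raise ValueError("no reputation-weighted votes to aggregate")
    return sum(rep for is_fake, rep in votes if is_fake) / total


# Three models catch the deepfake; the highest-reputation model is
# fooled by an adversarial example and votes 'real'.
votes = [(True, 2.0), (True, 2.0), (True, 1.0), (False, 3.0)]
print(weighted_fake_share(votes))  # 0.625
```

Even with the most reputable model fooled, the aggregate still leans toward "fake" in this example. The defense degrades gracefully, but it does degrade: an adversarial example that transfers across most of the pool would still flip the result.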

Deepfake generation improves faster than model updates. This is true on long time horizons and dfpn does not pretend otherwise. The mitigation is structural: rewarding new model registrations, slashing detectors that fall behind, and rotating benchmarks. The network is built to absorb new generators by absorbing new detectors. That is the best a detection system can offer; perfect defense is not on the table.

Privacy risks if media is shared insecurely off-chain. Storage providers and operators see the media being analyzed. If the storage layer leaks, the privacy of the media is compromised. dfpn’s job is the verdict, not the privacy of the artifact in transit. Clients with privacy-sensitive media need to think about how they store it.

This kind of honesty is the marker of a serious threat model. The point is not to claim victory; the point is to map the terrain.

What is out of scope, and why

The doc closes with a short list of things dfpn explicitly does not try to do.

Fully private on-chain inference is out of scope. dfpn coordinates inference; it does not run it. ZK-style private inference is its own research domain and not what this protocol is for.

Content takedown enforcement is out of scope. dfpn produces verdicts. Platforms decide what to do with them. The protocol does not adjudicate hosting, distribution, or removal.

Legal attribution or prosecution is out of scope. dfpn does not identify originators, attach legal liability, or hand evidence to authorities. It produces a technical verdict on a piece of media. Anything beyond that is a policy question, not a protocol question.

These exclusions are doing real work. A protocol that tried to be a detection layer, a moderation tool, a takedown service, and a court would do all four badly. Naming what is not on the table is a kind of design discipline. dfpn does the detection job and leaves the rest to the institutions whose job it is.

Why this matters when you integrate

If you are evaluating dfpn for production use, the threat model is the most important document in the repository. It tells you what the protocol will defend against, what assumptions it makes, what risks it acknowledges, and what it deliberately leaves for someone else to solve.

Read it before the marketing site. Read it before the architecture doc. Read it before integrating. A protocol that publishes a threat model this direct is signaling that it expects to be taken seriously; the right response is to take it that seriously.