fuzzytags-book/src/deploying-fuzzytags.md

## Deploying FuzzyTags Securely

The properties provided by this system are highly dependent on selecting a false positive rate _p_. In the following
sections we will cover a number of considerations you should take into account when integrating fuzzytags into a larger
privacy preserving application.

### How bad is it to let people select their own false-positive rates?

The short answer is "it depends". 

The longer answer:

When different parties have different false positive rates the server can calculate the skew between a party's ideal 
false positive rate and observed false positive rate. 

That skew leaks information, especially given certain message distributions. Specifically it leaks parties
 who receive a larger proportion of system messages than their ideal false positive rate.

i.e. for low false positive rates and high message volume for a specific receiver, the adversarial server
 can calculate a skew that leaks the recipient of individual messages - breaking privacy for that receiver.

It *also* removes those messages from the pool of messages that an adversarial server needs to consider for other receivers.
Effectively reducing the anonymity set for everyone else.

### Should Senders use an anonymous communication network?

If statistical & differential attacks are likely e.g. few parties download everything and 
multiple messages are expected to originate from a sender to a receiver or there
is other information that might otherwise link a set of messages to a sender or receiver then you may want to consider how
to remove that context.

One potential way of removing context is by having senders send their message to the server through some kind of anonymous
communication network e.g. a mixnet or tor.

Be warned: This may not eliminate all the context! 

### How bad is it to select a poor choice of _p_?

Consider a _pareto distribution_ where most users only receive a few messages, and small subset of users 
receive a large number of messages it seems that increasing the number of parties is
 generally more important to overall anonymity of the system than any individual selection of _p_. 
 
Under a certain threshold  of parties, trivial breaks (i.e. tags that only match to a single party) are a bigger concern. 

Assuming we have large number of parties (_N_), the following heuristic emerges:

* Parties who only expect to receive a small number of messages can safely choose smaller false positive rates, up
to a threshold _θ_, where _θ > 2^-N_. The lower the value of _θ_ the greater the possibility of random trivial breaks for
the party.
* Parties who expect a large number of messages should choose to receive **all** messages for 2 reasons:
    1) Even high false positive rates for power users result in information leaks to the server (due to the large
    skew) i.e. a server can trivially learn what users are power users.
    2) By choosing to receive all messages, power users don't sacrifice much in terms of bandwidth, but will provide
    cover for parties who receive a small number of messages and who want a lower false-positive rate.

(We consider a pareto distribution here because we expect many applications to have parties that can be
modelled as such - especially over short-time horizons)
Initial Commit 2021-02-16 23:28:15 +00:00			`## Deploying FuzzyTags Securely`

			`The properties provided by this system are highly dependent on selecting a false positive rate _p_. In the following`
			`sections we will cover a number of considerations you should take into account when integrating fuzzytags into a larger`
			`privacy preserving application.`

			`### How bad is it to let people select their own false-positive rates?`

			`The short answer is "it depends".`

			`The longer answer:`

			`When different parties have different false positive rates the server can calculate the skew between a party's ideal`
			`false positive rate and observed false positive rate.`

			`That skew leaks information, especially given certain message distributions. Specifically it leaks parties`
			`who receive a larger proportion of system messages than their ideal false positive rate.`

			`i.e. for low false positive rates and high message volume for a specific receiver, the adversarial server`
			`can calculate a skew that leaks the recipient of individual messages - breaking privacy for that receiver.`

			`It also removes those messages from the pool of messages that an adversarial server needs to consider for other receivers.`
			`Effectively reducing the anonymity set for everyone else.`

			`### Should Senders use an anonymous communication network?`

			`If statistical & differential attacks are likely e.g. few parties download everything and`
			`multiple messages are expected to originate from a sender to a receiver or there`
			`is other information that might otherwise link a set of messages to a sender or receiver then you may want to consider how`
			`to remove that context.`

			`One potential way of removing context is by having senders send their message to the server through some kind of anonymous`
			`communication network e.g. a mixnet or tor.`

			`Be warned: This may not eliminate all the context!`

			`### How bad is it to select a poor choice of _p_?`

			`Consider a _pareto distribution_ where most users only receive a few messages, and small subset of users`
			`receive a large number of messages it seems that increasing the number of parties is`
			`generally more important to overall anonymity of the system than any individual selection of _p_.`

			`Under a certain threshold of parties, trivial breaks (i.e. tags that only match to a single party) are a bigger concern.`

			`Assuming we have large number of parties (_N_), the following heuristic emerges:`

			`* Parties who only expect to receive a small number of messages can safely choose smaller false positive rates, up`
			`to a threshold _θ_, where _θ > 2^-N_. The lower the value of _θ_ the greater the possibility of random trivial breaks for`
			`the party.`
			`* Parties who expect a large number of messages should choose to receive all messages for 2 reasons:`
			`1) Even high false positive rates for power users result in information leaks to the server (due to the large`
			`skew) i.e. a server can trivially learn what users are power users.`
			`2) By choosing to receive all messages, power users don't sacrifice much in terms of bandwidth, but will provide`
			`cover for parties who receive a small number of messages and who want a lower false-positive rate.`

			`(We consider a pareto distribution here because we expect many applications to have parties that can be`
			`modelled as such - especially over short-time horizons)`