fuzzytags/ANONYMITY.md

## Integrating FuzzyTags

The properties provided by this system are highly dependent on selecting a false positive rate _p_. In the following
sections we will cover a number of considerations you should take into account when integrating fuzzytags into a larger
privacy-preserving application.

### How bad is it to let people select their own false-positive rates?

The short answer is "it depends". 

The longer answer:

When different parties have different false positive rates, the server can calculate the skew between a party's ideal
false positive rate and their observed false positive rate.

That skew leaks information, especially given certain message distributions. Specifically, it exposes parties
who receive a larger proportion of system messages than their ideal false positive rate would predict.

That is, for a receiver with a low false positive rate and a high message volume, the adversarial server
can calculate a skew that leaks the recipient of individual messages - breaking privacy for that receiver.

It *also* removes those messages from the pool of messages that an adversarial server needs to consider for other receivers,
effectively reducing the anonymity set for everyone else.
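
As a rough illustration of the skew described above, the following sketch uses a toy model that is not part of fuzzytags: it assumes a receiver's detection key matches every message genuinely addressed to them, and matches each other message independently with probability _p_. The function names and the `true_share` parameter (the fraction of all system messages actually addressed to the receiver) are hypothetical.

```python
def expected_match_rate(p: float, true_share: float) -> float:
    """Expected fraction of all system messages that match a receiver's key.

    Toy model (an assumption): genuine messages always match, and every other
    message matches independently with probability p.
    """
    return true_share + (1.0 - true_share) * p


def skew(p: float, true_share: float) -> float:
    """Gap between the expected observed match rate and the ideal rate p."""
    return expected_match_rate(p, true_share) - p


# A receiver with p = 1/64 who actually receives 10% of all traffic:
#   skew(1/64, 0.10) is about 0.098, far more than p alone would explain.
# A receiver with p = 1/2 who receives 0.1% of all traffic:
#   skew(0.5, 0.001) is 0.0005, which is much harder to pick out.
```

Under this model the skew simplifies to `true_share * (1 - p)`, so it grows with message volume and shrinks as _p_ approaches 1.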

Which brings us to:

### Differential Attacks

Any kind of differential attack breaks this scheme, even for a small number of messages. That is, if you learn (through
any means) that a specific set of messages are all likely for one party, you can diff them against all other parties' keys and
very quickly isolate the intended recipient. In simulations of 100-1000 parties it can take as few as 3 messages, even
with everyone selecting fairly high false positive rates.
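
The attack can be sketched with a small simulation. This is a toy model rather than the simulation referred to above: it assumes every party other than the true recipient matches each message independently with the same probability `p`, and the function name and parameters are made up for illustration.

```python
import random


def differential_attack(num_parties: int, p: float, num_messages: int, seed: int = 0) -> set[int]:
    """Return the parties whose keys match every one of the linked messages.

    Party 0 is the true recipient and always matches. Every other party
    matches each message independently with probability p (an assumption).
    """
    rng = random.Random(seed)
    candidates = set(range(num_parties))
    for _ in range(num_messages):
        matches = {0} | {party for party in range(1, num_parties) if rng.random() < p}
        candidates &= matches  # keep only parties that matched every message so far
    return candidates


# Example: len(differential_attack(1000, 0.125, 3)) is the number of parties
# the server still cannot rule out after three linked messages.
```

In this model the expected number of wrong candidates left after `k` linked messages is `(num_parties - 1) * p ** k`, which shrinks geometrically with each extra message.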

The corollary is that, under differential attacks, your anonymity set is basically the number of users
who download all messages, since you can't diff them. This has an interesting side effect: the more parties who
download everything, the more the system can safely tolerate parties with small false-positive rates.
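
A back-of-the-envelope version of this corollary, under the same hedged assumptions as before (independent matches, a single shared rate `p` for parties who filter, and a true recipient who is not one of the download-everything parties). The function and its parameters are hypothetical.

```python
def expected_anonymity_set(num_parties: int, num_download_all: int, p: float, num_messages: int) -> float:
    """Expected number of parties still consistent with a set of linked messages.

    Counts the true recipient, every party that downloads everything (they
    match all messages by definition), and the remaining parties, each of
    which survives with probability p ** num_messages.
    """
    others = num_parties - 1 - num_download_all
    return 1 + num_download_all + others * p ** num_messages


# With 1000 parties, p = 1/2 and 10 linked messages:
#   expected_anonymity_set(1000, 0, 0.5, 10)   is about 1.98
#   expected_anonymity_set(1000, 50, 0.5, 10)  is about 51.93
```

As the number of linked messages grows, the last term vanishes and the set is dominated by the parties who download everything.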

To what extent you can actually account for this in your application is an open question.

### Should Senders use an anonymous communication network?

If differential attacks are likely (e.g. few parties download everything and
multiple messages are expected to originate from a sender to a receiver), or if there
is other information that might otherwise link a set of messages to a receiver, then you may want to consider how
to remove that context.

One potential way of removing context is to have senders send their messages to the server through some kind of anonymous
communication network, e.g. a mixnet or Tor.

Be warned: This may not eliminate all the context! 

### How bad is it to select a poor choice of _p_?

Consider a _Pareto distribution_ where most users only receive a few messages and a small subset of users
receive a large number of messages. In that setting, it seems that increasing the number of parties is
generally more important to the overall anonymity of the system than any individual selection of _p_.
 
Under a certain threshold of parties, trivial breaks (i.e. tags that only match a single party) are a bigger concern.
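
To get a feel for why small systems are prone to trivial breaks, here is a minimal sketch that assumes every party uses the same false positive rate `p` and that matches are independent. Neither assumption comes from fuzzytags itself, and the function name is hypothetical.

```python
def trivial_break_probability(num_parties: int, p: float) -> float:
    """Probability that a tag matches no party other than its intended recipient.

    Assumes all other parties share the same false positive rate p and that
    their matches are independent.
    """
    return (1.0 - p) ** (num_parties - 1)


# With p = 1/8:
#   trivial_break_probability(10, 0.125)    is roughly 0.30
#   trivial_break_probability(1000, 0.125)  is vanishingly small
```

Under these assumptions, adding parties drives the probability down exponentially, which is consistent with the number of parties mattering more than any single choice of _p_.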

Assuming we have a large number of parties (_N_), the following heuristic emerges:

* Parties who only expect to receive a small number of messages can safely choose smaller false positive rates, down
to a threshold _θ_, where _θ > 2^-N_. The lower the value of _θ_, the greater the possibility of random trivial breaks for
the party.
* Parties who expect a large number of messages should choose to receive **all** messages for 2 reasons:
    1) Even high false positive rates for power users result in information leaks to the server (due to the large
    skew) i.e. a server can trivially learn which users are power users.
    2) By choosing to receive all messages, power users don't sacrifice much in terms of bandwidth, but will provide
    cover for parties who receive a small number of messages and who want a lower false-positive rate.
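
The bandwidth argument in the second point can be made concrete with a small, hedged calculation. It reuses a toy model in which a party's genuine messages always match and all other messages match independently with probability `p`. The function name and the `true_share` parameter (the fraction of all traffic genuinely addressed to the party) are illustrative assumptions.

```python
def extra_download_fraction(p: float, true_share: float) -> float:
    """Extra fraction of system traffic fetched when a party switches from
    filtering at rate p to downloading everything."""
    expected_at_p = true_share + (1.0 - true_share) * p
    return 1.0 - expected_at_p


# A power user receiving 30% of traffic with p = 1/2 already fetches 65%,
# so downloading everything costs an extra 35%:
#   extra_download_fraction(0.5, 0.3)      is about 0.35
# A low-volume user (0.1% of traffic, p = 1/64) would pay far more:
#   extra_download_fraction(1/64, 0.001)   is about 0.98
```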

(We consider a Pareto distribution here because we expect many applications to have parties that can be
modelled as such - especially over short time horizons.)