2021-01-31 21:21:44 +00:00
## Integrating FuzzyTags
The properties provided by this system depend heavily on the choice of false positive rate _p_. The following sections cover a number of considerations you should take into account when integrating fuzzytags into a larger privacy-preserving application.
### How bad is it to let people select their own false-positive rates?
The short answer is "it depends".

The longer answer:

When different parties have different false positive rates, the server can calculate the skew between a party's ideal false positive rate and its observed false positive rate.

That skew leaks information, especially given certain message distributions. Specifically, it exposes parties who receive a larger proportion of system messages than their ideal false positive rate would predict.

That is, when a specific receiver combines a low false positive rate with a high message volume, an adversarial server can calculate a skew that leaks the recipient of individual messages, breaking privacy for that receiver.

It *also* removes those messages from the pool of messages that an adversarial server needs to consider for other receivers, effectively reducing the anonymity set for everyone else.
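The skew described above can be sketched numerically. The following is a minimal illustration, not part of fuzzytags itself: the function names are hypothetical, and it assumes that every message not addressed to a party matches that party's detection key independently with probability _p_.

```python
def estimate_true_messages(total_messages: int, matched: int, p: float) -> float:
    """Estimate how many matched messages were genuinely addressed to a party.

    Assumption: each message NOT addressed to the party matches independently
    with probability p, so E[matched] = true + p * (total_messages - true).
    Solving for `true` gives the expression below.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1); at p = 1 nothing can be inferred")
    return (matched - p * total_messages) / (1.0 - p)


def skew(total_messages: int, matched: int, p: float) -> float:
    """Observed match rate minus the ideal false positive rate."""
    return matched / total_messages - p
```

Under these assumptions, a party with _p_ = 0.125 that matches 300 of 1000 messages shows a skew of about 0.175, from which a server could estimate that roughly 200 of those messages were genuinely addressed to it. A party that matches 125 of 1000 messages shows no skew at all.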
This shrinking of anonymity sets brings us onto a broader class of attack:
### Differential Attacks
Any kind of differential attack breaks this scheme, even for a small number of messages. That is, if you learn (through any means) that a specific set of messages are all likely intended for one party, you can diff them against every other party's keys and very quickly isolate the intended recipient. In simulations of 100-1000 parties it can take as few as 3 messages, even with everyone selecting fairly high false positive rates.

The corollary is that under differential attacks your anonymity set is basically the number of users who download all messages, since you can't diff them. This has an interesting side effect: the more parties who download everything, the more the system can safely tolerate parties with small false-positive rates.

To what extent you can actually account for this in your application is an open question.
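A differential attack of the kind described in this section can be approximated with a toy simulation. This is a hedged sketch rather than the simulation referred to above: it does not use the fuzzytags API, the function names are hypothetical, and it assumes every non-recipient matches a tag independently with one shared false positive rate `p`.

```python
import random


def matching_parties(target: int, n_parties: int, p: float, rng: random.Random) -> set[int]:
    """Parties whose detection keys match one tag addressed to `target`.

    The target always matches; every other party matches with probability p
    (simplifying assumption: a single shared false positive rate).
    """
    return {
        party
        for party in range(n_parties)
        if party == target or rng.random() < p
    }


def differential_attack(
    target: int, n_parties: int, p: float, n_messages: int, rng: random.Random
) -> set[int]:
    """Intersect the match sets of messages known to share one recipient."""
    candidates = set(range(n_parties))
    for _ in range(n_messages):
        candidates &= matching_parties(target, n_parties, p, rng)
    return candidates


if __name__ == "__main__":
    rng = random.Random()
    # 100 parties, p = 1/8, three linked messages addressed to party 7.
    print(differential_attack(7, 100, 0.125, 3, rng))
```

Under these assumptions the expected number of wrong candidates left after _k_ linked messages is (_N_ - 1) · _p_^_k_. For 100 parties and _p_ = 1/8 that is about 0.19 after three messages, so the intersection usually contains only the real recipient. A party with _p_ = 1 (one that downloads everything) never drops out of the intersection, which is why such parties form the effective anonymity set.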
### Should Senders use an anonymous communication network?
If differential attacks are likely (e.g. few parties download everything and multiple messages are expected to originate from a sender to a receiver), or there is other information that might otherwise link a set of messages to a receiver, then you may want to consider how to remove that context.

One potential way of removing context is to have senders submit their messages to the server through some kind of anonymous communication network, e.g. a mixnet or Tor.

Be warned: This may not eliminate all the context!
### How bad is it to select a poor choice of _p_?
Consider a _Pareto distribution_ where most users receive only a few messages and a small subset of users receive a large number of messages. Here it seems that increasing the number of parties is generally more important to the overall anonymity of the system than any individual selection of _p_.

Under a certain threshold of parties, trivial breaks (i.e. tags that only match a single party) are a bigger concern.

Assuming we have a large number of parties (_N_), the following heuristic emerges:

* Parties who only expect to receive a small number of messages can safely choose smaller false positive rates, down to a threshold _θ_, where _θ > 2^-N_. The lower the value of _θ_, the greater the possibility of random trivial breaks for the party.
* Parties who expect a large number of messages should choose to receive **all** messages, for 2 reasons:
    1) Even high false positive rates for power users result in information leaks to the server (due to the large skew), i.e. a server can trivially learn which users are power users.
    2) By choosing to receive all messages, power users don't sacrifice much in terms of bandwidth, but they provide cover for parties who receive a small number of messages and who want a lower false-positive rate.

(We consider a Pareto distribution here because we expect many applications to have parties that can be modelled as such, especially over short time horizons.)
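To get a feel for how the number of parties and the choice of _p_ interact in the heuristic above, the chance of a trivial break can be estimated directly. This is a rough sketch with hypothetical function names. It assumes every other party matches a given tag independently with the same false positive rate `p`; with mixed rates the single power becomes a product over the other parties.

```python
def trivial_break_probability(n_parties: int, p: float) -> float:
    """Probability that a tag matches no party other than its recipient.

    Assumption: each of the other (n_parties - 1) parties matches
    independently with the same false positive rate p.
    """
    return (1.0 - p) ** (n_parties - 1)


def expected_anonymity_set(n_parties: int, p: float) -> float:
    """Expected number of parties matching a tag, recipient included."""
    return 1.0 + p * (n_parties - 1)
```

Under this model, with 1000 parties a shared rate of 2^-7 gives a trivial break roughly 0.04% of the time, while a rate of 2^-12 gives one roughly 78% of the time. Parties that download everything contribute _p_ = 1 and remove the possibility of a trivial break entirely, which is the cover effect described in the second bullet.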