From c5a96de77d51cebbf5d8106a6072c07dc5aaa758 Mon Sep 17 00:00:00 2001
From: Sarah Jamie Lewis <sarah@openprivacy.ca>
Date: Fri, 19 Feb 2021 10:30:42 -0800
Subject: [PATCH] Updating Attacks

---
 src/SUMMARY.md              |  3 +++
 src/deploying-fuzzytags.md  | 29 -----------------------------
 src/intersection-attacks.md | 14 ++++++++++++++
 src/risk-model.md           | 11 +++++++++++
 src/statistical-attacks.md  | 22 ++++++++++++++++++++++
 5 files changed, 50 insertions(+), 29 deletions(-)
 create mode 100644 src/intersection-attacks.md
 create mode 100644 src/risk-model.md
 create mode 100644 src/statistical-attacks.md

diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index b42ba1e..eb61b0c 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -2,6 +2,9 @@
 
 - [Introduction](introduction.md)
 - [Terminology](./terminology.md)
+- [Risk Model](./risk-model.md)
+    - [Intersection Attacks](./intersection-attacks.md)
+    - [Statistical Attacks](./statistical-attacks.md)
 - [Deploying Fuzzytags Securely](./deploying-fuzzytags.md)
 - [Entangled Tags](./entangled.md)
     - [Entangling Strategies](./entangling-strategies.md)
diff --git a/src/deploying-fuzzytags.md b/src/deploying-fuzzytags.md
index a2e3299..c0a0158 100644
--- a/src/deploying-fuzzytags.md
+++ b/src/deploying-fuzzytags.md
@@ -22,35 +22,6 @@ i.e. for low false positive rates and high message volume for a specific receive
 It *also* removes those messages from the pool of messages that an adversarial server needs to consider for other receivers.
 Effectively reducing the anonymity set for everyone else.
 
-Which brings us onto:
-
-### Differential Attacks
-
-Any kind of differential attacks break this scheme, even for a small number of messages i.e. if you learn (through
-any means) that a specific set of messages are all likely for 1 party, you can diff them against all other parties keys and 
-very quickly isolate the intended recipient - in simulations of 100-1000 parties it can take as little as 3 messages  - even 
-with everyone selecting fairly high false positive rates. 
-
-The corollary of the above being that in differential attacks your anonymity set is basically the number of users 
-who download all messages - since you can't diff them. This has the interesting side effect: the more parties who 
-download everything, the more the system can safely tolerate parties with small false-positive rates.
-
-To what extent you can actually account for this in your application is an open question.
-
-### Statistical Attacks
-
-Using some basic binomial probability we can use the false positive rate of reach receiver tag to calculate
-the probability of matching on at least X tags given the false positive rate. Using this we can find statistically
-unlikely matches e.g. a low-false positive key matching many tags in a given period.
-
-This can be used to find receivers who likely received messages in a given period.
-
-If it is possible to group tags by sender then we can perform a slightly better attack and ultimately learn the
-underlying social graph with fairly low false positive rates (in simulations we can learn 5-10% of the underlying
-connections with between 5-12% false positive rates.)
-
-For more information on statistical attacks please check out our [fuzzytags simulator](https://git.openprivacy.ca/openprivacy/fuzzytags-sim).
-
 ### Should Senders use an anonymous communication network?
 
 If statistical & differential attacks are likely e.g. few parties download everything and 
diff --git a/src/intersection-attacks.md b/src/intersection-attacks.md
new file mode 100644
index 0000000..299f2cb
--- /dev/null
+++ b/src/intersection-attacks.md
@@ -0,0 +1,14 @@
+# Intersection Attacks
+
+One of the most basic attack primitives for any kind of false-positive-based scheme is an intersection attack, where
+the set of peers that match one tag is intersected against another set of peers that matches a different tag.
+
+Any kind of intersection attack breaks this scheme, even for a small number of messages, i.e. if you learn (through
+any means) that a specific set of messages are all likely for a single party, you can diff them against all other parties' keys and
+very quickly isolate the intended recipient. In simulations of 100-1000 parties it can take as few as 3 messages, even
+with everyone selecting fairly high false positive rates.
+
+The corollary is that under intersection attacks your anonymity set is the number of users
+who download all messages, since they cannot be diffed. This has an interesting side effect: the more parties who
+download everything, the more the system can safely tolerate parties with small false positive rates.
+
diff --git a/src/risk-model.md b/src/risk-model.md
new file mode 100644
index 0000000..5366b7b
--- /dev/null
+++ b/src/risk-model.md
@@ -0,0 +1,11 @@
+# Risk Model
+
+In this section we will document the risk model and attacks that will likely impact practical deployments 
+of fuzzytags.
+
+We assume a set of $n$ parties sending tags and messages through an untrusted server.
+
+For some of the analysis in this notebook we will assume that the server has knowledge of the sender of each tag. 
+
+While this is not desirable in real-world deployments, being able to match tags to senders does allow us to derive
+bounds on worst-case metadata leakage.
\ No newline at end of file
diff --git a/src/statistical-attacks.md b/src/statistical-attacks.md
new file mode 100644
index 0000000..e4bcd0a
--- /dev/null
+++ b/src/statistical-attacks.md
@@ -0,0 +1,22 @@
+# Statistical Attacks
+
+Using some basic binomial probability we can use the false positive rate of each receiver tag to calculate
+the probability of it matching at least X tags by chance. Using this we can find statistically
+unlikely matches, e.g. a key with a low false positive rate matching many tags in a given period.
+
+This can be used to find receivers who likely received messages in a given period. This attack works regardless
+of how many parties are downloading everything, and depends entirely on the receiver choosing a $p$ that is
+suboptimal for the number of messages they receive.
+
+## Deriving the Social Graph 
+
+If it is possible to group tags by sender then we can perform a slightly better attack and ultimately learn the
+underlying social graph with fairly low false positive rates (in [simulations](./simulations.md) we can learn 5-10% of
+the underlying connections with false positive rates of 5-12%).
+
+The method is the same as above, but we look at the probability that a party would have matched at least X tags _from a
+specific sender_ given their false positive rate.
+
+Notably, this latter attack reveals something important about choosing parameters for fuzzytags: $p$ must be chosen
+to take into account not just the total number of messages, but the total number of messages from a given source (or it
+must be assumed that the server will never be able to isolate tags from a given sender).
\ No newline at end of file