Initial Draft

2021-02-02 16:00:42 -08:00 · 2021-02-02 16:00:42 -08:00 · 8da3e31b5c
commit 8da3e31b5c
3 changed files with 156 additions and 0 deletions
--- a/fuzzy.bib
+++ b/fuzzy.bib
@@ -0,0 +1,7 @@
@misc{cryptoeprint:2021:089,
    author = {Gabrielle Beck and Julia Len and Ian Miers and Matthew Green},
    title = {Fuzzy Message Detection},
    howpublished = {Cryptology ePrint Archive, Report 2021/089},
    year = {2021},
    note = {\url{https://eprint.iacr.org/2021/089}},
 }
--- a/paper.pdf
+++ b/paper.pdf
--- a/paper.tex
+++ b/paper.tex
@@ -0,0 +1,149 @@
 \documentclass[10pt,a4paper,twocolumn]{article}
 \usepackage[utf8]{inputenc}
 \usepackage{url}
 \usepackage[n,
 advantage,
 operators,
 sets,
 adversary,
 landau,
 probability,
 notions,
 logic,
 ff,
 mm,
 primitives,
 events,
 complexity,
 keys,
 asymptotics]{cryptocode}
 \title{Entangled Fuzzy Tags and their Applications}
 \author{Sarah Jamie Lewis}
 \usepackage{draftwatermark}
 \SetWatermarkText{Draft}
 \SetWatermarkScale{5}
 \begin{document}
 \newcommand*{\Prob}{\mathsf{Pr}}
 \maketitle
 \section{Introduction}
 Fuzzy Message Detection~\cite{cryptoeprint:2021:089} presents a mechanism
 for constructing probabilistic, cryptographic tags for use in metadata-resistant
 systems.
 Crucially, unlike bucketing-based mechanisms,
 these tags allow parties to set their own false positive rates, creating a number of
 distinct classes of parties (ranging from those who match and download everything, to those with a very low false positive rate who match and download only messages intended for them).
 In this note we address several of the questions raised in the original paper, and outline a number of extensions to ``fuzzytags'' that allow the \texttt{FMD2} scheme to be used for multiparty broadcast and deniable sending through the construction of ``entangled'' tags: tags that are valid for more than one public key.
 \section{Choosing a false positive rate $p$}
 When different parties have different false positive rates, the server can calculate the skew, $\Delta$, between a party's ideal false positive rate $p$ and its observed false positive rate $\phi$.
 That skew leaks information, especially given certain message distributions. Specifically, it exposes parties who receive a larger proportion of system messages than their ideal false positive rate; e.g., for a receiver with a low false positive rate and a high message volume, the adversarial server can calculate a skew that reveals the recipient of individual messages, breaking privacy for that receiver.
 It also removes those messages from the pool of messages that an adversarial server needs to consider for other receivers, effectively reducing the anonymity set for everyone else.
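 As a rough illustration, consider a simplified model assumed here (it is not taken from \cite{cryptoeprint:2021:089}): a party with ideal false positive rate $p$ is the true recipient of a fraction $f$ of all messages, and each remaining message matches its detection key independently with probability $p$. Taking the skew to be the difference $\Delta = \phi - p$, the expected observed rate and skew are
 \[
 \phi \approx f + (1-f)\,p, \qquad \Delta \approx f\,(1-p).
 \]
 For example, with $p = 2^{-4}$ and $f = 0.1$ the server would observe $\phi \approx 0.156$ rather than $0.0625$, a skew of roughly $0.094$ that grows with the party's share of traffic.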
 \subsection{Intersection Attacks}
 Without a significant number of parties downloading everything, any kind of differential attack breaks the fuzzy tag scheme, even for a small number of messages. That is, if you learn (through any means) that a specific set of messages are all likely intended for one party, you can test them against the entire set of detection keys and very quickly isolate the intended recipient by intersecting the sets of keys that validate each message. In simulations\footnote{A playground simulator for fuzzytags can be found at \url{https://git.openprivacy.ca/openprivacy/fuzzytags-sim}} of 100--1000 parties, $\gamma=24$, and $p=[1..4]$, it can take as few as 3 messages to isolate a detection key, even with all parties selecting fairly high false positive rates.
 The corollary is that, under differential attacks, your anonymity set is essentially the set of users who download all messages. This has an interesting side effect: the more parties who download everything, the more the system can safely tolerate parties with small false positive rates.
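 A back-of-the-envelope model makes the intersection concrete. Assume, for illustration, $N$ parties of whom $D$ download everything, the rest share a false positive rate $p$, false matches are independent across messages, and the recipient is not among the $D$. If $k$ messages are known to share a recipient, each other detection key survives the intersection with probability $p^{k}$ unless it downloads everything, so the expected number of candidate keys is
 \[
 1 + D + (N - D - 1)\,p^{k}.
 \]
 With $N = 1000$, $D = 0$, $p = 2^{-4}$ and $k = 3$ the last term is $999 \cdot 2^{-12} \approx 0.24$, so the recipient is likely isolated already; as $k$ grows, the candidates shrink towards the recipient plus the $D$ parties who download everything.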
 \subsection{Parties that download everything}
 As such, parties who expect a large number of messages should choose to receive all messages, for two reasons:
 \begin{enumerate}
    \item Even high false positive rates for power users result in information leaks to the server (due to the large $\Delta$), i.e., a server can trivially learn which users are ``power'' users.
    \item By choosing to receive all messages, power users don't sacrifice much in terms of bandwidth, but will provide cover traffic for parties who receive a small number of messages and who want a lower false positive rate.
 \end{enumerate}
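 The bandwidth side of this trade-off can be sketched with a simplified model, assumed here for illustration: a party is the true recipient of a fraction $f$ of all $M$ messages and matches each other message independently with probability $p$. Switching from rate $p$ to downloading everything then multiplies its expected downloads by
 \[
 \frac{M}{M\bigl(f + (1-f)\,p\bigr)} = \frac{1}{f + (1-f)\,p}.
 \]
 For a power user with $f = 0.25$ and $p = 2^{-1}$ this factor is $1/0.625 = 1.6$, while with $p = 1$ the party matches every message, so its observed and ideal rates coincide and the server sees no skew at all.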
 \section{Entangled Fuzzy Tags}
 Because each position of an \texttt{FMD2} tag encodes a single hash bit, it is possible to forge tags that validate against multiple distinct detection keys: a sender resamples until the bits derived from every key agree.
 The probability of generating a tag that validates against $n$ distinct detection keys of maximum length $l$ is $(2^{-l})^{n-1}$, i.e., a sender should expect to find a suitable tag in $\frac{1}{(2^{-l})^{n-1}} = 2^{l(n-1)}$ attempts (see Figure~\ref{fig:code} for a method to find suitable tags given a set of public keys).
 Because each party determines their own false positive rate, it may not always be necessary for a sender to generate a tag that is guaranteed to match up to the system parameter $\gamma$. Instead, they could generate a partially entangled fuzzy tag that validates against \textit{any} detection key of one party, but only against detection keys with high false positive tolerances for the other parties.
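 The probability above can be sketched as follows, assuming (as we read \texttt{FMD2}) that $H$ outputs a single bit and behaves as a random oracle. For fixed $u$, $w$ and position $j$, the bits $k_{1j}, \ldots, k_{nj}$ derived from $n$ distinct keys are independent and uniform, so they all agree with probability $2 \cdot 2^{-n} = 2^{-(n-1)}$. Requiring agreement at all $l$ positions, and treating each attempt as an independent trial, gives
 \[
 \Prob[\text{success}] = \bigl(2^{-(n-1)}\bigr)^{l} = (2^{-l})^{n-1},
 \]
 \[
 \mathsf{E}[\text{attempts}] = 2^{l(n-1)}.
 \]
 For example, entangling $n = 2$ keys over $l = 24$ positions takes about $2^{24} \approx 1.7 \times 10^{7}$ attempts in expectation, while $n = 3$ keys over the same positions already takes about $2^{48}$.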
 \begin{figure}[h!]
 \begin{pchstack}[boxed, center, space=1em]
 \procedure[linenumbering] {\texttt{FlagEntangled}($\kappa = \set{\pk_1 \ldots \pk_n}$)} {
 	g, h_{i1} \ldots h_{i\gamma}  \leftarrow \pk_i  : \forall \pk_i \in \kappa \\
 	r \sample \ZZ_q \\
 	u \leftarrow g^r \\
 	z \sample \ZZ_q \\
 	w \leftarrow g^z \\
 	\pcfor \pk_i \in \kappa : \\
 	\t[1] \pcfor j \in [\gamma] :\\
 	\t[2] k_{ij} \leftarrow H(u \concat h_{ij}^r \concat w) \\
 	\t[1] \pcendfor \\
 	\pcendfor \\
 	\pcif \forall j \in [\gamma] : k_{1j} = k_{2j} = \cdots = k_{nj} \\
 	\pccomment{derive the tag as in the original protocol.}\\
 	\pccomment{using $k_{1j}$ as the key for part $j$.} \\
 	\pcelse \\
 	\pccomment{goto 4}  \\
 	\pcendif 
 }
 \end{pchstack}
 \caption{Pseudocode for deriving \texttt{FMD2}-compatible tags that can be verified by multiple detection keys, each related to a distinct public key. All functions are as defined in \cite{cryptoeprint:2021:089}. Several performance improvements are possible, e.g., caching the result of $h_{ij}^r$ across attempts to avoid duplicate group operations (only $z$ is resampled on failure), and testing bit equality earlier.}
 \label{fig:code}
 \end{figure}
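 For completeness, Figure~\ref{fig:derive} sketches the step that Figure~\ref{fig:code} delegates to the original protocol, together with the test a server runs against each detection key. This is our reading of \texttt{FMD2} in \cite{cryptoeprint:2021:089}, not a new construction: $G$ (a hash into $\ZZ_q$), the tag layout $(u, y, c_1 \ldots c_\gamma)$ and the secret keys $x_j$ with $h_j = g^{x_j}$ are assumed to follow that paper, and the name \texttt{DeriveTag} is only a label used in this note.
 \begin{figure}[h!]
 \begin{pchstack}[boxed, center, space=1em]
 \procedure {\texttt{DeriveTag}($u, z, r, k_{11} \ldots k_{1\gamma}$)} {
 	\pcfor j \in [\gamma] : \\
 	\t[1] c_j \leftarrow k_{1j} \oplus 1 \\
 	\pcendfor \\
 	m \leftarrow G(u \concat c_1 \ldots c_\gamma) \\
 	y \leftarrow (z - m) \cdot r^{-1} \\
 	\pcreturn (u, y, c_1 \ldots c_\gamma)
 }
 \end{pchstack}
 \medskip
 \begin{pchstack}[boxed, center, space=1em]
 \procedure {\texttt{Test}($x_1 \ldots x_l, (u, y, c_1 \ldots c_\gamma)$)} {
 	m \leftarrow G(u \concat c_1 \ldots c_\gamma) \\
 	w \leftarrow g^m u^y \\
 	\pcfor j \in [l] : \\
 	\t[1] k_j \leftarrow H(u \concat u^{x_j} \concat w) \\
 	\t[1] \pcif k_j \oplus c_j = 0 : \pcreturn 0 \\
 	\pcendfor \\
 	\pcreturn 1
 }
 \end{pchstack}
 \caption{A sketch of the \texttt{FMD2} tag derivation, using the bits $k_{1j}$ found by \texttt{FlagEntangled}, and of the per-key test. Since $g^m u^y = g^{m + (z - m)} = g^z = w$ and $u^{x_j} = h_j^r$, the bits recomputed by \texttt{Test} for the $i$-th key equal $k_{ij}$, so every detection key whose bits agreed in Figure~\ref{fig:code} passes.}
 \label{fig:derive}
 \end{figure}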
 \section{Applications}
 In this section we outline a number of applications enabled by entangled fuzzy tags.
 \subsection{Multiparty Broadcast}
 Alice wants to send a message to Bob and Carol. She constructs a single tag that will validate against detection keys generated by both of them.
 When an adversarial server matches the tag against all the keys it knows about, it will discover that the tag matches both Bob and Carol (in addition to some number of false positives, depending on the false positive rates of all the other parties using the server).
 To construct such a tag Alice runs \texttt{FlagEntangled(}$\set{\pk_{\text{bob}}, \pk_{\text{carol}}}$\texttt{)}.
 The server has no way of determining whether the match is a broadcast to both parties, a message intended for only one of Bob or Carol, or a false positive for both.
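 To give a sense of scale, assume for illustration that every other party $i$ uses a false positive rate $p_i$ and that false matches are independent across parties. The expected number of detection keys the server sees matching Alice's tag is then
 \[
 2 + \sum_{i \notin \{\text{bob}, \text{carol}\}} p_i .
 \]
 With $998$ other parties at $p_i = 2^{-4}$ this is $2 + 998/16 \approx 64.4$, so Bob and Carol are two of roughly 64 expected matches; each party that downloads everything ($p_i = 1$) adds one guaranteed match.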
 \subsection{Deniable Sending}
 Alice wants to send a message to Carol, but is concerned that Carol may have a detection key with too low a false positive rate. Alice knows of a set of parties (and their public keys) who also use the adversarial server to send private messages. Alice searches for a tag that will validate against detection keys generated not only by Carol but also by a randomly selected party, e.g., Eve.
 When an adversarial server matches the tag against all the keys it knows about, it will discover that the tag matches both Carol and Eve (in addition to some number of false positives, depending on the false positive rates of all the other parties using the server).
 Even if the server were to isolate this specific message as originating from Alice, it would not be able to derive the recipient through any kind of differential attack (as all such attacks would also implicate Eve).
 To construct such a tag Alice runs \texttt{FlagEntangled(}$\set{\pk_{\text{carol}}, \pk_{\text{eve}}}$\texttt{)}.
 Alice could choose to entangle all of her messages to Carol in this way, fully implicating Eve in her message sending regardless of Eve's false positive rate. If Eve attempted to decrypt such a message she would fail, and might assume that the tag was an unlikely false positive; as such, too many of these messages might make Eve suspicious. However, Eve might be a well-known service or bot integrated with the privacy-preserving application, giving Alice cover without the risk of triggering suspicion.
 Alice could also choose to entangle each message with a different random party.
 While this strategy is, by itself, vulnerable to intersection attacks, it increases the number of potential relationships any adversary needs to rule out in order to derive metadata from the communication.
 When combined with a large number of parties downloading all messages (or even downloading with high false positive rates) this strategy has the effect of increasing the anonymity set of the entire system.
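 The weakness to intersection attacks can be seen in a simplified model, assumed here for illustration: Alice sends $k$ messages to Carol, entangling message $t$ with a fresh decoy $E_t$, every other party has false positive rate $p < 1$, and $S_t$ denotes the set of detection keys matching tag $t$. Carol's key lies in every $S_t$, whereas $E_t$ lies in $S_t$ by construction and in each of the other $k-1$ sets only by chance:
 \[
 \Prob[E_t \in S_1 \cap \cdots \cap S_k] = p^{k-1}.
 \]
 Each message adds a candidate the adversary must rule out, but after a few messages the intersection still tends to converge on Carol unless many parties download everything.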
 It is worth noting at this point that strategies can be combined, and their effects compound. Given a tag and a set of matches, an adversarial server cannot distinguish between a true and a false positive, an entangled deniable send, a group broadcast, or a combination of these.
 \subsection{Forging False Positives}
 Alice wants to send a message to Carol, but also wants to implicate Eve to the server. However, Alice doesn't have enough time or computing power to generate a tag that will fully match Eve's $\gamma$-length key.
 Instead, Alice forges an entangled tag by running \texttt{FlagEntangled(}$\set{\pk_{\text{carol}}, \pk_{\text{eve}}}$\texttt{)}, but rather than checking all $\gamma$ parts of the key at line 11 of Figure~\ref{fig:code}, she only checks the first $l$ parts, for some $l$ she believes is at least the length of Eve's detection key (i.e., $2^{-l}$ is no greater than Eve's false positive rate).
 To the server the tag would match the detection keys of both Carol and Eve, but when Eve fetched it she would assume it was a false positive (a much more likely one than in our previous example).
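 The saving can be sketched under the assumption that $H$ behaves as a random oracle with one-bit output: Carol's key is satisfied by construction, so only Eve's bits need to agree, and only at the first $l$ positions, giving
 \[
 \mathsf{E}[\text{attempts}] = 2^{l} \quad \text{rather than} \quad 2^{\gamma}.
 \]
 For example, if Eve's detection key has length $4$ (false positive rate $2^{-4}$), taking $l = 4$ costs about $16$ attempts in expectation instead of $2^{24} \approx 1.7 \times 10^{7}$ for $\gamma = 24$, and from Eve's side the match looks like the ordinary false positives she expects on a $2^{-4}$ fraction of unrelated tags.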
 \bibliography{fuzzy}
 \bibliographystyle{plain}
 \end{document}