feed + labels

Sarah Jamie Lewis 2021-08-11 11:40:40 -07:00
parent bf95da4be1
commit 34770915e8
7 changed files with 94 additions and 34 deletions

20
feed.xml Normal file

@ -0,0 +1,20 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>pseudorandom</title>
<link href="https://pseudorandom.resistant.tech"/>
<link rel="self" href="https://pseudorandom.resistant.tech/feed.xml" />
<updated>2021-08-10T14:30:00Z</updated>
<author>
<name>Sarah Jamie Lewis</name>
</author>
<id>urn:uuid:699b0ba2-2fbf-4f9d-b5ac-4a7e044be3c6</id>
<entry>
<id>urn:uuid:fc37f259-004e-406b-addb-85cda6107e7b</id>
<title>Obfuscated Apples</title>
<link href="https://pseudorandom.resistant.tech/obfuscated_apples.html"/>
<updated>2021-08-10T14:30:00Z</updated>
<summary>Generating noise in a way which is indistinguishable from real signal is a ridiculously hard problem. Obfuscation does not hide signal, it only adds noise....</summary>
</entry>
</feed>


@ -13,6 +13,7 @@
<meta property="og:title" content="$TITLE" />
<meta name="twitter:image" content="https://pseudorandom.resistant.tech/$LINK.png">
<link rel="alternate" type="application/atom+xml" href="/feed.xml" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="styles.css">
@ -33,6 +34,7 @@
<a href="./index.html">home</a>
<a href="mailto:sarah@openprivacy.ca">email</a>
<a href="cwtch:icyt7rvdsdci42h6si2ibtwucdmjrlcb2ezkecuagtquiiflbkxf2cqd">cwtch</a>
<a href="/feed.xml">atom</a>
</nav>
</header>
<article>


@ -9,10 +9,11 @@
<meta name="twitter:site" content="@sarahjamielewis" />
<meta name="twitter:creator" content="@sarahjamielewis" />
<meta property="og:url" content="https://pseudorandom.resistant.tech/index.html" />
<meta property="og:description" content="A site by Sarah Jamie Lewis" />
<meta property="og:description" content="A site by Sarah Jamie L" />
<meta property="og:title" content="Welcome!" />
<meta name="twitter:image" content="https://pseudorandom.resistant.tech/index.png">
<link rel="alternate" type="application/atom+xml" href="/feed.xml" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="styles.css">
@ -33,6 +34,7 @@
<a href="./index.html">home</a>
<a href="mailto:sarah@openprivacy.ca">email</a>
<a href="cwtch:icyt7rvdsdci42h6si2ibtwucdmjrlcb2ezkecuagtquiiflbkxf2cqd">cwtch</a>
<a href="/feed.xml">atom</a>
</nav>
</header>
<article>
@ -44,7 +46,7 @@
<h2>
Recent Articles
</h2>
<p>2021-08-10 <a href="obfuscated_apples.html">Obfuscated Apples</a><br></p>
<p>2021-08-11 <a href="obfuscated_apples.html">Obfuscated Apples</a><br></p>
<footer>
Sarah Jamie Lewis
</footer>


@ -9,10 +9,11 @@
<meta name="twitter:site" content="@sarahjamielewis" />
<meta name="twitter:creator" content="@sarahjamielewis" />
<meta property="og:url" content="https://pseudorandom.resistant.tech/obfuscated_apples.html" />
<meta property="og:description" content="Generating noise in a way which is indistinguishable from real signal is a ridiculously hard problem. Obfuscation does not hide signal, it only adds noise. " />
<meta property="og:description" content="Generating noise in a way which is indistinguishable from real signal is a ridiculously hard problem. Obfuscation does not hide signal, it only adds noise" />
<meta property="og:title" content="Obfuscated Apples" />
<meta name="twitter:image" content="https://pseudorandom.resistant.tech/obfuscated_apples.png">
<link rel="alternate" type="application/atom+xml" href="/feed.xml" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="styles.css">
@ -33,31 +34,32 @@
<a href="./index.html">home</a>
<a href="mailto:sarah@openprivacy.ca">email</a>
<a href="cwtch:icyt7rvdsdci42h6si2ibtwucdmjrlcb2ezkecuagtquiiflbkxf2cqd">cwtch</a>
<a href="/feed.xml">atom</a>
</nav>
</header>
<article>
<h1 id="obfuscated-apples">Obfuscated Apples</h1>
<p>Generating noise in a way which is indistinguishable from real signal is a ridiculously hard problem. Obfuscation does not hide signal, it only adds noise.</p>
<p>Generating noise in a way which is indistinguishable from real signal is a ridiculously hard problem. Obfuscation does not hide signal, it only adds noise<em class="footnotelabel"></em>.</p>
<p class="sidenote">
if you take anything away from this article please let it be this fact.
</p>
<p>Sadly, most people operate under the assumption that adding noise to a system is all that it takes to make the signal unrecoverable. This logic is very clearly in operation in Apple's new proposal for on-device scanning <a class="sidenote" href="https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf">technical summary</a> which, among other things, proposes generating <em>synthetic</em> matches to hide the true number of <em>real</em> matches in the system.</p>
I want to take this opportunity to break down how this kind of obfuscation can be defeated even when not considering the fact that it is <strong>Apple themselves who are charged with generating and maintaining the safety parameters of the system</strong>.
<p>Sadly, most people operate under the assumption that adding noise to a system is all that it takes to make the signal unrecoverable. This logic is very clearly in operation in Apple's new proposal for on-device scanning<em class="footnotelabel"></em> <a class="sidenote" href="https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf">technical summary</a> which, among other things, proposes generating <em>synthetic</em> matches to hide the true number of <em>real</em> matches in the system.</p>
I want to take this opportunity to break down how this kind of obfuscation can be defeated even when not considering the fact that it is <strong>Apple themselves who are charged with generating and maintaining the safety parameters of the system</strong><em class="footnotelabel"></em>.
<p class="sidenote">
i.e. even if we treat the people who design and build this system as honest adversaries.
</p>
<h2 id="sketching-a-basic-scheme">Sketching a Basic Scheme</h2>
<p>For the sake of clarity I will omit the technical details of the private set intersection protocol, and the threshold scheme, and we will operate under the assumption that both are cryptographically secure. We will also assume that the database of images to compare is <em>incorruptible</em> <span class="sidenote">This is clearly not the case.</span>.</p>
<p>At the heart of the system is a (mostly) black box that contains a <strong>perceptual</strong> hash function that analyzes an image and spits out a hash; this hash is then compared against a database of known hashes, and if a match is found the system reports <code>true</code> and otherwise reports <code>false</code>. <span class="sidenote">As we will see later on, perceptual hashes are <strong>not</strong> cryptographic hashes.</span></p>
<p>For the sake of clarity I will omit the technical details of the private set intersection protocol, and the threshold scheme, and we will operate under the assumption that both are cryptographically secure. We will also assume that the database of images to compare is <em>incorruptible</em><em class="footnotelabel"></em> <span class="sidenote">This is clearly not the case.</span>.</p>
<p>At the heart of the system is a (mostly) black box that contains a <strong>perceptual</strong> hash function that analyzes an image and spits out a hash; this hash is then compared against a database of known hashes, and if a match is found the system reports <code>true</code> and otherwise reports <code>false</code><em class="footnotelabel"></em>. <span class="sidenote">As we will see later on, perceptual hashes are <strong>not</strong> cryptographic hashes.</span></p>
<p>Throughout this article I will use the term <strong>match</strong> when talking about both true and false positives, though I will mostly assume any matches are false positives.</p>
<p>According to documentation provided by Apple, the server learns whether any matches occurred on the phone during the PSI protocol.</p>
<p>According to documentation provided by Apple, the server learns whether any matches occurred on the phone during the PSI protocol.<em class="footnotelabel"></em></p>
<blockquote class="sidenote">
“The output of PSI protocol on the server reveals whether there is a match or not” - Apple Technical Summary
</blockquote>
<p>When a certain threshold of matches is reached, the server gains the ability to decrypt all associated data, a human reviews that data, and a determination is made.</p>
<p>As presented the system above has one major flaw <span class="sidenote">(besides the gross nature of co-opting a personal device as a surveillance system)</span>: the server learns how many matches the device has reported prior to being able to decrypt those matches.</p>
<p>This is obviously very important metadata in the context of the system and as such needs to be protected - if it is not then Apple, or someone who can compel Apple to release the data, can identify potential targets based on this metadata. <span class="sidenote">As we shall soon discuss, targeting people in this way would be highly irrational if your goal was to actually hunt people doing harm, but people are not rational actors.</span></p>
<p>As presented the system above has one major flaw<em class="footnotelabel"></em> <span class="sidenote">(besides the gross nature of co-opting a personal device as a surveillance system)</span>: the server learns how many matches the device has reported prior to being able to decrypt those matches.</p>
<p>This is obviously very important metadata in the context of the system and as such needs to be protected - if it is not then Apple, or someone who can compel Apple to release the data, can identify potential targets based on this metadata.<em class="footnotelabel"></em> <span class="sidenote">As we shall soon discuss, targeting people in this way would be highly irrational if your goal was to actually hunt people doing harm, but people are not rational actors.</span></p>
<p>To protect this data Apple relies on the invocation of so-called “Synthetic Vouchers” with the following property:</p>
<blockquote>
“The probability that a device uploads a synthetic voucher instead of a real voucher for an image is calibrated to ensure the total number of synthetics is of the same order of magnitude as the threshold”
@ -66,7 +68,7 @@ i.e. even if we treat the people who design and build this system as honest adv
<p>So, that is it, right? Problem averted? Let's not be too hasty…</p>
<p>From what we know so far there are a few interesting parameters in this system that Apple must determine values for.</p>
<p>There is the threshold <span class="math inline"><em>t</em></span> of matches necessary to decrypt the data, there is the probability of a device generating a synthetic match <span class="math inline"><em>P</em>(<code>synthetic</code>)</span> and there is the probability of a false positive match <span class="math inline"><em>P</em>(<code>falsepositive</code>)</span>.</p>
<p>We also know that Apple has constructed these parameters such that the probability of an account being flagged for human review (i.e. when the number of matches <span class="math inline"><em>M</em>&gt;<em>t</em></span>) is <span class="math inline"><em>P</em>(<em>f</em><em>l</em><em>a</em><em>g</em>)=1<em>e</em><sup>−12</sup></span> or one in one trillion.</p>
<p>We also know that Apple has constructed these parameters such that the probability of an account being flagged for human review (i.e. when the number of matches <span class="math inline"><em>M</em>&gt;<em>t</em></span>) is <span class="math inline"><em>P</em>(<em>f</em><em>l</em><em>a</em><em>g</em>)=1<em>e</em><sup>−12</sup></span> or one in one trillion.<em class="footnotelabel"></em></p>
<blockquote class="sidenote">
“The threshold is selected to provide an extremely low (1 in 1 trillion) probability of incorrectly flagging a given account.” - Apple Technical Summary
</blockquote>
@ -74,7 +76,7 @@ i.e. even if we treat the people who design and build this system as honest adv
<p><br /><span class="math display">$$P(\texttt{flag}) = \sum_{x = t}^{T} {T \choose x} \cdot P(\texttt{falsepositive})^x \cdot (1 - P(\texttt{falsepositive}))^{T - x} \approx 1\mathrm{e}^{-12}$$</span><br /></p>
<p>In order to finalize this we only need to make educated guesses about 2 parameters: the threshold value, <span class="math inline"><em>t</em></span>, and the total number of photos checked per year, <span class="math inline"><em>T</em></span>. Apple throws out the number <span class="math inline"><em>t</em>=10</span> in their technical summary, which seems like a good place to start.</p>
<p>Assuming an average account generates 3-4 pictures a day to be checked then <span class="math inline"><em>T</em> ≈ 1278</span> over a year. Plugging in those numbers, we get <span class="math inline"><em>P</em>(<code>falsepositive</code>) ≈ 0.00035</span> or <strong>1 in 2858</strong>.</p>
<p>Does that number have any relation to reality? There is evidence to suggest <span class="sidenote"><a href="https://arxiv.org/abs/2106.09820">Adversarial Detection Avoidance Attacks: Evaluating the robustness of perceptual hashing-based client-side scanning.</a> Shubham Jain, Ana-Maria Cretu, Yves-Alexandre de Montjoye</span> that the false acceptance rate for common perceptual hashing algorithms is between 0.001-0.01 for a database size of 500K.</p>
<p>Does that number have any relation to reality? There is evidence<em class="footnotelabel"></em> to suggest <span class="sidenote"><a href="https://arxiv.org/abs/2106.09820">Adversarial Detection Avoidance Attacks: Evaluating the robustness of perceptual hashing-based client-side scanning.</a> Shubham Jain, Ana-Maria Cretu, Yves-Alexandre de Montjoye</span> that the false acceptance rate for common perceptual hashing algorithms is between 0.001-0.01 for a database size of 500K.</p>
<p>That makes our guesstimate of 0.00035 an order of magnitude smaller than the most generous empirical estimate. We will be generous and assume Apple broke some new ground with NeuralHash and 0.00035 represents a major improvement in perceptual hashing false acceptance rates.</p>
<p>Given that, we can go back and calculate <span class="math inline"><em>P</em>(<code>match</code>)</span>, the probability of observing a match each day…</p>
<p><br /><span class="math display">$$P(\texttt{match}) = 1 - (( 1 - {0.00035})^{3.5}) \approx {0.001225} \approx \frac{1}{{816}}$$</span><br /></p>
@ -82,7 +84,7 @@ i.e. even if we treat the people who design and build this system as honest adv
<p>Not everybody is the average person though. If we apply the same <span class="math inline"><em>P</em>(<code>falsepositive</code>)</span> to a new parent who takes upwards of 50 photos per day, then their <span class="math inline"><em>P</em>(<code>match</code>)</span> is:</p>
<p><br /><span class="math display">$$P(\texttt{match}) = 1 - (( 1 - {0.00035})^{50}) \approx {0.01735} \approx \frac{1}{{57}}$$</span><br /></p>
<p>Or, a match on average every 57 days.</p>
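The two per-day figures above can be reproduced with a few lines of Python. This is only a sketch under the article's own assumptions (independent photos, a fixed per-photo false positive rate of 0.00035); the function name `daily_match_probability` is made up for illustration.

```python
def daily_match_probability(p_false_positive, photos_per_day):
    """Probability of at least one match in a day.

    Assumes every photo is an independent trial with the same
    false positive rate (an assumption carried over from the text).
    """
    return 1.0 - (1.0 - p_false_positive) ** photos_per_day


for photos in (3.5, 50):
    p = daily_match_probability(0.00035, photos)
    # 1 / p is the average number of days between matches.
    print(f"{photos} photos/day: P(match) = {p:.6f}, one match every {1 / p:.1f} days")
```

The reciprocal of the daily probability is the expected waiting time in days, which is where the "every 57 days" reading comes from.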
<p>At this point I feel compelled to point out that these are <strong>average</strong> match probabilities. For the prolific photo taking parent who takes 18250 photos a year, the probability that they actually exceed the threshold in false matches is 6% <span class="sidenote">assuming <span class="math inline"><em>t</em></span> is 10</span>.</p>
<p>At this point I feel compelled to point out that these are <strong>average</strong> match probabilities. For the prolific photo taking parent who takes 18250 photos a year, the probability that they actually exceed the threshold in false matches is 6%<em class="footnotelabel"></em> <span class="sidenote">assuming <span class="math inline"><em>t</em></span> is 10</span>.</p>
<p>It is also worth mentioning that even though we ballparked <span class="math inline"><em>t</em></span> and <span class="math inline"><em>T</em></span> there are explicit constraints on what their values can be. If Apple generates a single <span class="math inline"><em>t</em></span> for all accounts, then <span class="math inline"><em>T</em></span> needs to be an approximation on the average number of photos an account stores per year. If Apple generates a different <span class="math inline"><em>t</em></span> value for every account, then it has enough information already to derive <span class="math inline"><em>P</em>(<code>observation</code>)</span> and break its own obfuscation.</p>
<hr/>
<p>Using what we now know, we can assess the server-side operations and show how the observer can calculate the probability of a real match given the probability of any observation and the probability of a synthetic match.</p>
@ -112,14 +114,14 @@ i.e. even if we treat the people who design and build this system as honest adv
<p>But, what about our prolific “parent” account that stores 50 photos per day? We know that <span class="math inline"><em>P</em>(<code>match</code>) ≈ 0.01735</span>, allowing Apple, who defines <span class="math inline"><em>P</em>(<code>synthetic</code>)</span>, to calculate:</p>
<p><br /><span class="math display">$$P(\texttt{match}| \texttt{observation}) = \frac{(0.01735 \cdot 0.99)}{(0.01735 \cdot 0.99) + (0.98265 \cdot 0.01)} \approx 0.63 $$</span><br /></p>
<p>That is a 63% probability that any reported match is a real match and not a synthetic one!</p>
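The arithmetic in the formula above can be checked with a short Python sketch. The weights 0.99 and 0.01 are copied verbatim from that formula; this excerpt does not show how they were derived, so the parameter names `weight_match` and `weight_no_match` are placeholders rather than the article's terminology.

```python
def posterior_real_match(p_match, weight_match=0.99, weight_no_match=0.01):
    """Bayes-style ratio used in the formula above.

    p_match:          prior probability of a real match (0.01735 for the
                      50-photos-a-day account in the text)
    weight_match:     factor multiplying p_match (0.99 in the text)
    weight_no_match:  factor multiplying 1 - p_match (0.01 in the text)
    """
    numerator = p_match * weight_match
    denominator = numerator + (1.0 - p_match) * weight_no_match
    return numerator / denominator


print(posterior_real_match(0.01735))  # roughly 0.636 with these inputs
```

Because the prior enters both numerator and denominator, accounts with a higher photo volume push the ratio up quickly, which is the effect the text is pointing at.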
If Apple defines a global <span class="math inline"><em>P</em>(<code>synthetic</code>)</span> then different accounts will naturally have different server-side distributions of observations, and these can be used to tighten the estimates of true matches.
If Apple defines a global <span class="math inline"><em>P</em>(<code>synthetic</code>)</span> then different accounts will naturally have different server-side distributions of observations, and these can be used to tighten the estimates of true matches.<em class="footnotelabel"></em>
<p class="sidenote">
And, again, if Apple can define <span class="math inline"><em>P</em>(<code>synthetic</code>)</span> on a per-account basis then they have <strong>more</strong> information to use when tightening these estimates
</p>
<p>The secrecy of this metadata is then <strong>paradoxically dependent</strong> on both Apple never deriving <span class="math inline"><em>P</em>(<code>observation</code>)</span> for themselves <em>and</em> on Apple generating a distinct <span class="math inline"><em>P</em>(<code>synthetic</code>)</span> for each account. Or rather, the privacy of one of the most sensitive aspects of this system requires both Apple collecting no information on accounts and Apple knowing enough about accounts to derive the parameters necessary to keep the information private.</p>
<hr/>
<p>It is actually much worse than that though.</p>
<p>While priors may start as an unknown <span class="sidenote">(setting aside the fact that Apple already has enough data to derive this themselves)</span>, Apple quickly generates a large amount of data relating to when new observations are made. Since people are different in the ways they take and store photos and live in different parts of the world, the exact probability of them triggering a check is dependent on them.</p>
<p>While priors may start as an unknown<em class="footnotelabel"></em> <span class="sidenote">(setting aside the fact that Apple already has enough data to derive this themselves)</span>, Apple quickly generates a large amount of data relating to when new observations are made. Since people are different in the ways they take and store photos and live in different parts of the world, the exact probability of them triggering a check is dependent on them.</p>
<p>There are also additional discriminating events in the system itself.</p>
<h3 id="matches-over-threshold-without-decryption">Matches over Threshold without Decryption</h3>
<p>One of the most obvious sources of discriminating information is built explicitly into the design. The threshold scheme as proposed requires <span class="math inline"><em>t</em></span> real matches in order to decrypt the inner envelopes containing the matching images.</p>
@ -133,14 +135,14 @@ And, again, if Apple can define <span class="math inline"><em>P</em>(<code>synth
<p>In that kind of environment, no amount of server-defined obfuscation is enough to protect the metadata that the server holds.</p>
<p>In this case that metadata is a rather controversial number i.e. <strong>the number of possible matches to illegal material detected on the device</strong>.</p>
<p>That is interesting metadata to countless entities including the law enforcement and intelligence agencies of multiple jurisdictions and states.</p>
<p>Even if we strictly limit the type of material that Apple is searching for, the high likelihood of false positive events combined with the ease with which Apple can likely distinguish true matching events from synthetic events <span class="sidenote"> (as worked through above)</span> should concern any potential subject of the system.</p>
<p>Even if we strictly limit the type of material that Apple is searching for, the high likelihood of false positive events combined with the ease with which Apple can likely distinguish true matching events from synthetic events<em class="footnotelabel"></em> <span class="sidenote"> (as worked through above)</span> should concern any potential subject of the system.</p>
<p><strong>Innocence is no defense against judgements made using derived metadata</strong>.</p>
</article>
<hr/>
<h2>
Recent Articles
</h2>
<p>2021-08-10 <a href="obfuscated_apples.html">Obfuscated Apples</a><br></p>
<p>2021-08-11 <a href="obfuscated_apples.html">Obfuscated Apples</a><br></p>
<footer>
Sarah Jamie Lewis
</footer>


@ -1,47 +1,47 @@
# Obfuscated Apples
Generating noise in a way which is indistinguishable from real signal is a ridiculously hard problem. Obfuscation does not hide signal, it only adds noise.
Generating noise in a way which is indistinguishable from real signal is a ridiculously hard problem. Obfuscation does not hide signal, it only adds noise@@^.
<p class="sidenote">if you take anything away from this article please let
it be this fact.</p>
Sadly, most people operate under the assumption that adding noise to a system is all that it takes to make the
signal unrecoverable. This logic is very clearly in operation in Apple's new proposal for on-device scanning
signal unrecoverable. This logic is very clearly in operation in Apple's new proposal for on-device scanning@@^
<a class="sidenote" href="https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf">technical summary</a> which, among other things, proposes generating *synthetic* matches to hide the true number of *real* matches in the system.
I want to take this opportunity to break down how this kind of obfuscation can be defeated even when not considering
the fact that it is **Apple themselves who are charged with generating and maintaining the safety parameters of the system**.
the fact that it is **Apple themselves who are charged with generating and maintaining the safety parameters of the system**@@^.
<p class="sidenote">i.e. even if we treat the people who design and build this system as honest adversaries.</p>
## Sketching a Basic Scheme
For the sake of clarity I will omit the technical details of the private set intersection protocol, and the threshold
scheme, and we will operate under the assumption that both are cryptographically secure. We will also
assume that the database of images to compare is *incorruptible* <span class="sidenote">This is clearly not the
assume that the database of images to compare is *incorruptible*@@^ <span class="sidenote">This is clearly not the
case.</span>.
At the heart of the system is a (mostly) black box that contains a **perceptual** hash function that analyzes an image and
spits out a hash; this hash is then compared against a database of known hashes, and if a match is found the system
reports `true` and otherwise reports `false`. <span class="sidenote">As we will see later on, perceptual hashes
reports `true` and otherwise reports `false`@@^. <span class="sidenote">As we will see later on, perceptual hashes
are **not** cryptographic hashes.</span>
Throughout this article I will use the term **match** when talking about both true and false positives, though I will
mostly assume any matches are false positives.
According to documentation provided by Apple, the server learns whether any matches occurred on the phone during the PSI protocol.
According to documentation provided by Apple, the server learns whether any matches occurred on the phone during the PSI protocol.@@^
<blockquote class="sidenote">"The output of PSI protocol on the server reveals whether there is a match or not" - Apple Technical Summary</blockquote>
When a certain threshold of matches is reached, the server gains the ability to decrypt all associated data, a human
reviews that data, and a determination is made.
As presented the system above has one major flaw <span class="sidenote">(besides the gross nature of co-opting a personal device
As presented the system above has one major flaw@@^ <span class="sidenote">(besides the gross nature of co-opting a personal device
as a surveillance system)</span>: the server learns how many matches the device has reported prior to being able to
decrypt those matches.
This is obviously very important metadata in the context of the system and as such needs to be
protected - if it is not then Apple, or someone who can compel Apple to release the data, can identify potential
targets based on this metadata. <span class="sidenote">As we shall soon discuss, targeting people in this way would be highly
targets based on this metadata.@@^ <span class="sidenote">As we shall soon discuss, targeting people in this way would be highly
irrational if your goal was to actually hunt people doing harm, but people are not rational actors.</span>
To protect this data Apple relies on the invocation of so-called "Synthetic Vouchers" with the following property:
@ -58,7 +58,7 @@ values for.
There is the threshold $t$ of matches necessary to decrypt the data, there is the probability of a device generating
a synthetic match $P(\texttt{synthetic})$ and there is the probability of a false positive match $P(\texttt{falsepositive})$.
We also know that Apple has constructed these parameters such that the probability of an account being flagged for human review (i.e. when the number of matches $M > t$) is $P(flag) = 1\mathrm{e}^{-12}$ or one in one trillion.
We also know that Apple has constructed these parameters such that the probability of an account being flagged for human review (i.e. when the number of matches $M > t$) is $P(flag) = 1\mathrm{e}^{-12}$ or one in one trillion.@@^
<blockquote class="sidenote">"The threshold is selected to provide an extremely low (1 in 1 trillion) probability of incorrectly flagging a given account." - Apple Technical Summary</blockquote>
@ -73,7 +73,7 @@ like a good place to start.
Assuming an average account generates 3-4 pictures a day to be checked then $T \approx {1278}$ over a year. Plugging in those
numbers, we get $P(\texttt{falsepositive}) \approx 0.00035$ or **1 in 2858**.
Does that number have any relation to reality? There is evidence to suggest <span class="sidenote"><a href="https://arxiv.org/abs/2106.09820">Adversarial Detection Avoidance Attacks: Evaluating the robustness of perceptual hashing-based client-side scanning.</a> Shubham Jain, Ana-Maria Cretu, Yves-Alexandre de Montjoye</span> that the false acceptance rate for
Does that number have any relation to reality? There is evidence@@^ to suggest <span class="sidenote"><a href="https://arxiv.org/abs/2106.09820">Adversarial Detection Avoidance Attacks: Evaluating the robustness of perceptual hashing-based client-side scanning.</a> Shubham Jain, Ana-Maria Cretu, Yves-Alexandre de Montjoye</span> that the false acceptance rate for
common perceptual hashing algorithms is between 0.001-0.01 for a database size of 500K.
That makes our guesstimate of 0.00035 an order of magnitude smaller than the most generous empirical estimate. We will
@ -94,7 +94,7 @@ Or, a match on average every 57 days.
At this point I feel compelled to point out that these are **average** match probabilities. For the prolific photo
taking parent who takes 18250 photos a year, the probability that they actually exceed the threshold in false matches
is 6% <span class="sidenote">assuming $t$ is 10</span>.
is 6%@@^ <span class="sidenote">assuming $t$ is 10</span>.
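The flagging probabilities quoted in this article can be sanity-checked with a binomial tail. The following is a minimal Python sketch, assuming independent photos with a fixed false positive rate of 0.00035 and $t = 10$; `binomial_tail` is a hypothetical helper written for this check, not anything taken from Apple's documentation. It counts strictly more than $t$ matches (the $M > t$ reading), which is an assumption about how the threshold is applied.

```python
def binomial_tail(T, p, start):
    """P(X >= start) for X ~ Binomial(T, p).

    Builds each term from the previous one, so no huge binomial
    coefficients are ever formed. T must be a non-negative integer.
    """
    term = (1.0 - p) ** T               # P(X = 0)
    total = term if start <= 0 else 0.0
    ratio = p / (1.0 - p)
    for x in range(T):
        term *= (T - x) / (x + 1) * ratio   # term is now P(X = x + 1)
        if x + 1 >= start:
            total += term
    return total


t = 10
p_false_positive = 0.00035

# Average account: about 1278 photos a year.
print(binomial_tail(1278, p_false_positive, t + 1))

# Prolific account: 50 photos a day, 18250 a year.
print(binomial_tail(18250, p_false_positive, t + 1))
```

Under these assumptions the first value comes out on the order of 1e-12 and the second close to the 6% quoted above. Passing `t` instead of `t + 1` as `start` counts accounts that merely reach the threshold and gives noticeably larger values.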
It is also worth mentioning that even though we ballparked $t$ and $T$ there are explicit constraints on what their
values can be. If Apple generates a single $t$ for all accounts, then $T$ needs to be an approximation on the average
@ -171,7 +171,7 @@ That is a 63% probability that any reported match is a real match and not a synt
If Apple defines a global $P(\texttt{synthetic})$ then different accounts will
naturally have different server-side distributions of observations, and these can be used to tighten the estimates of
true matches. <p class="sidenote">And, again, if Apple can define $P(\texttt{synthetic})$ on a per-account basis then
true matches.@@^ <p class="sidenote">And, again, if Apple can define $P(\texttt{synthetic})$ on a per-account basis then
they have **more** information to use when tightening these estimates</p>
The secrecy of this metadata is then **paradoxically dependent** on both Apple never deriving $P(\texttt{observation})$ for themselves *and* on Apple generating a distinct $P(\texttt{synthetic})$ for each account. Or rather, the privacy of one of the most sensitive aspects of this system requires both Apple collecting no information on accounts and Apple knowing enough about accounts to derive the parameters necessary to keep the information private.
@ -180,7 +180,7 @@ The secrecy of this metadata is then is **paradoxically dependent** on both Appl
It is actually much worse than that though.
While priors may start as an unknown <span class="sidenote">(setting aside the fact that Apple already has enough data to derive this themselves)</span>, Apple quickly generates a large amount of data relating to when new observations
While priors may start as an unknown@@^ <span class="sidenote">(setting aside the fact that Apple already has enough data to derive this themselves)</span>, Apple quickly generates a large amount of data relating to when new observations
are made. Since people are different in the ways they take and store photos and live in different parts of the world,
the exact probability of them triggering a check is dependent on them.
@ -222,7 +222,7 @@ That is interesting metadata to countless entities including the law enforcement
jurisdictions and states.
Even if we strictly limit the type of material that Apple is searching for, the high likelihood of false positive events
combined with the ease with which Apple can likely distinguish true matching events from synthetic events <span class="sidenote">
combined with the ease with which Apple can likely distinguish true matching events from synthetic events@@^ <span class="sidenote">
(as worked through above)</span> should concern any potential subject of the system.
**Innocence is no defense against judgements made using derived metadata**.

4
ssb

@ -86,9 +86,9 @@ function make_html_files
file_base=`basename $md_file .md`
output_file="$OUTPUT_DIR/$file_base.html"
post_title=`grep -m 1 "^# .*" $md_file | cut -c 3-`
description=`head -3 $md_file | tail -1`
description=`head -3 $md_file | tail -1 | rev | cut -c 5- | rev`
append_posts_list $posts | cat $md_file - | $MD_RENDERER > $output_file
cat $HEADER_PATH $output_file $FOOTER_PATH | sed -e "s/\$TITLE/$post_title/g" | sed -e "s/\$LINK/$file_base/g" | sed -e "s/\$DESCRIPTION/$description/g" | tee $output_file
cat $HEADER_PATH $output_file $FOOTER_PATH | sed -e "s/\$TITLE/$post_title/g" | sed -e "s/\$LINK/$file_base/g" | sed -e "s/\$DESCRIPTION/$description/g" | sed -e "s/@@^/<em class=\"footnotelabel\"><\/em>/g" | tee $output_file
done
}
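The new `rev | cut -c 5- | rev` pipeline drops the last four characters of the description line whatever they are. That removes the trailing `@@^.` from the article description, but it also explains why the index page description in this same commit now ends in "Sarah Jamie L". The sketch below, written in Python purely for illustration (the build script itself is shell, and both function names are hypothetical), contrasts that blind truncation with removing only the marker.

```python
MARKER = "@@^"  # footnote marker used in the Markdown sources


def truncate_blindly(description):
    """Mirror of `rev | cut -c 5- | rev`: drop the last four characters."""
    return description[:-4]


def strip_marker(description):
    """Remove only the footnote marker and leave all other text alone."""
    return description.replace(MARKER, "")


article = "Obfuscation does not hide signal, it only adds noise@@^."
index = "A site by Sarah Jamie Lewis"

print(truncate_blindly(article))  # marker and final period removed
print(truncate_blindly(index))    # loses "ewis"
print(strip_marker(article))      # marker removed, period kept
print(strip_marker(index))        # unchanged
```

The marker-aware version keeps the sentence's final period, so its output differs slightly from what the commit currently generates for the article page.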

View File

@ -19,6 +19,40 @@ article {
margin-left: 0px;
margin-right: auto;
max-width: 60%;
counter-reset: footnotes footnotelabel; /* one declaration: a second counter-reset would override the first */
}
.sidenote {
counter-increment: footnotes; /* 1 */
}
.footnotelabel {
counter-increment: footnotelabel; /* 1 */
}
.footnotelabel::after {
content: '[' counter(footnotelabel) ']'; /* 1 */
vertical-align: super; /* 2 */
font-size: 0.5em; /* 3 */
margin-left: 2px; /* 4 */
color: #aaa; /* 5 */
}
.sidenote {
content: '[' counter(footnotes) ']'; /* 1 */
vertical-align: super; /* 2 */
font-size: 0.5em; /* 3 */
margin-left: 2px; /* 4 */
color: #aaa; /* 5 */
}
.sidenote::before {
content: '[' counter(footnotes) ']'; /* 1 */
vertical-align: super; /* 2 */
font-size: 0.5em; /* 3 */
margin-left: 2px; /* 4 */
color: #aaa; /* 5 */
}
.sidenote {