

This section documents simulations performed on the College Msg Core dataset (details below). In particular, we assess the worst-case scenario of a server with access to a sender oracle (i.e. one able to attribute tags to a
particular sender) to understand how much information fuzzytags leaks without [appropriate deployment mitigations](./deploying-fuzzytags.md).

# College IM Dataset Simulations

    Nodes 	1899
    Temporal Edges 	59835
    Time span 	193 days

Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. "Patterns and dynamics of users' behavior and interaction: Network analysis of an online community." Journal of the American Society for Information Science and Technology 60.5 (2009): 911-932.

![](./simulations/college-actual.jpeg)

## Scenario 1

Setup:  20k events (7330 links). False positive rates: [0.007812, 0.5]. No entangling.
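To give a feel for what this range of rates means at the scale of the simulation, the sketch below computes the expected number of tags that would falsely match a single party. It assumes that every one of the 20,000 events is tested independently against that party, which is a simplification made here for illustration and not something the simulation reports. The two rates are the endpoints quoted above (0.007812 is 2^-7 to six decimal places); the function name is illustrative.

```python
# Expected number of false matches for one party, assuming each of the
# n_events tags matches independently with probability fp_rate.
# (Independence is a simplifying assumption made for illustration.)

def expected_false_matches(fp_rate: float, n_events: int) -> float:
    return fp_rate * n_events

N_EVENTS = 20_000             # "20k events" from the setup
RATES = [2 ** -7, 2 ** -1]    # endpoints of the quoted range: 0.0078125 and 0.5

for rate in RATES:
    print(rate, expected_false_matches(rate, N_EVENTS))
```

Under this assumption, a party at the lowest rate attracts roughly 156 false matches over the run, and a party at the highest rate about 10,000.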

Result: Server can identify ~4.3% of the original graph (313 links) with a 12% false positive rate at a threshold of 0.0001.

![](./simulations/college-derived.jpeg)
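This section does not spell out how the threshold of 0.0001 is applied. One plausible reading, offered as an assumption rather than a description of the actual simulator, is a binomial tail test. For a given sender and receiver, the server counts how many of the sender's tags matched the receiver and asks how likely that count would be if every match were a false positive. If that probability falls below the threshold, the server records a link. All names below are hypothetical.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def infer_link(matches: int, sent: int, fp_rate: float, threshold: float = 1e-4) -> bool:
    """Assumed decision rule: record a link when chance alone is an
    unlikely explanation for the observed number of matches."""
    return binomial_tail(matches, sent, fp_rate) < threshold

# Receiver at the lowest rate (2^-7): two matches out of two tags.
print(infer_link(2, 2, 2 ** -7))
# Receiver at the highest rate (0.5): three matches out of three tags.
print(infer_link(3, 3, 0.5))
```

Under this assumed rule, a receiver at the lowest rate is linked to a sender after two matches out of two, whereas a receiver at rate 0.5 is only linked after 14 matches out of 14.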

## Scenario 2

Setup:  20k events (7330 links). False positive rates: [0.007812, 0.5]. Every tag entangled to one random node (as before).

Result: Server can identify ~3.95% of original graph (290 links) with a ~15% false positive rate.

![](./simulations/college-derived-entangled.jpeg)
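The headline figures of the two scenarios can be checked directly from the link counts. The sketch below recomputes the recovered fraction of the original graph for each scenario; the counts are those quoted above, and the helper name is illustrative.

```python
TOTAL_LINKS = 7330  # links in the 20k-event sample

def recovered_fraction(identified: int, total: int = TOTAL_LINKS) -> float:
    """Fraction of the original graph's links that the server identified."""
    return identified / total

# Identified link counts as reported for each scenario.
scenarios = {"no entangling": 313, "one random entangled node": 290}

for name, identified in scenarios.items():
    print(f"{name}: {recovered_fraction(identified):.2%}")
```

This gives about 4.27% and 3.96% respectively, in line with the approximate figures reported for the two scenarios.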

# Discussion

This result is very similar to our observations on the EU Core email dataset: entangled tags increase the false positive
rate, but it takes non-naive entangling strategies to push the false positive rate of the derived graph
to the point where the graph is no longer useful to an adversary.