Arch: File Sharing #110

Closed
opened 2021-07-13 18:12:31 +00:00 by sarah · 3 comments
Owner

Needed to support images / custom profiles / stickers / attachments.

High level goals (needs quantification):

  • Efficient / verifiable for groups - i.e. peers should not be able to disclose one file to some group members and another file to other group members.
  • Secure in the presence of attribution-style attacks e.g. https://github.com/ricochet-im/ricochet/issues/15 (link also includes some previous design and security consideration work)
  • We could also label this: Metadata resistant in an active setting (i.e. no hosting of publicly reachable endpoints, group members should not be able to prove someone is hosting a file).

Some rough ideas:

A mini-bittorrent-like protocol where the top hash is published by the sender and then each block is verifiable - for p2p this would collapse to a simple file transfer, for groups it would mean they could distribute the load of hosting the file over time / offline / online
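One way to read "top hash published by the sender, each block verifiable" is a Merkle tree over fixed-size blocks. The following is a minimal sketch under stated assumptions, not a Cwtch design: SHA-256, the `0x00`/`0x01` leaf/node prefixes, duplicating the last node on odd levels, and all function names are illustrative choices.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    # An empty file still gets one (empty) block so a top hash exists.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]


def merkle_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, leaves first, the single top hash last."""
    level = [h(b"\x00" + block) for block in blocks]  # prefix separates leaves from inner nodes
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def top_hash(levels: list[list[bytes]]) -> bytes:
    return levels[-1][0]


def proof_for(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes needed to recompute the top hash from block `index`."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify_block(block: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    node = h(b"\x00" + block)
    for sibling in proof:
        pair = node + sibling if index % 2 == 0 else sibling + node
        node = h(b"\x01" + pair)
        index //= 2
    return node == root
```

With this shape a receiver only needs the top hash from the sender; any peer can then serve a block plus its proof, and a block that does not hash up to the published top hash is rejected.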

For groups we don't want to do this all in-band because otherwise the server effectively hosts the file, as such this overlay should be strictly limited to p2p.

Another option is to have the peer host an HTTP server over Cwtch with content-addressable files - this requires some amount of authentication over who can fetch the file to prevent trivial hosting attribution attacks.

sarah added the arch and cwtch-beta-1.2 labels 2021-07-13 18:12:31 +00:00
sarah added this to the Cwtch Beta 1.2 project 2021-07-13 18:12:31 +00:00
Owner

some thoughts/notes on

file download support in cwtch

Must support:

  • resumption
    • bittorrent-esque chunking with hash manifests would support this and also allow us to set chunk size under the cwtch message size limits
    • maybe there's a good lib for this?
  • background DL on android via flwtchworker, with status notification
  • peer and group convos
  • progress updates in UI
    • including error states (full disk, manifest doesn't match ticket, stalled/no seeds, etc)
  • "internal" use cases eg we should be able to use this to support custom profile pics
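To illustrate how chunking with a hash manifest gives resumption almost for free, here is a minimal sketch. The chunk size, the use of SHA-256, and the function names are assumptions for illustration; the real chunk size would be picked to fit under the Cwtch message size limit.

```python
import hashlib

# Assumption: a chunk size small enough to fit in one Cwtch message.
CHUNK_SIZE = 4096


def build_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Hash every chunk of the file, in order."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def missing_chunks(manifest: list[str], have: dict[int, bytes]) -> list[int]:
    """Chunk indices that are absent locally or do not match the manifest."""
    return [
        index
        for index, digest in enumerate(manifest)
        if index not in have or hashlib.sha256(have[index]).hexdigest() != digest
    ]
```

After a restart, the client would reload whatever chunks it saved, call `missing_chunks`, and request only those indices; corrupt chunks are re-requested along with the ones never received.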

Nice to have:

  • remember old downloads & locations (potential privacy impact)
  • "all downloads" view per convo a la signal
  • images, audio clips (potentially a phase 2 thing)
  • file servers (ls) - point to dir(s), configure auth
  • re-hosting
  • multi-hosting (a la bittorrent)
  • trackers (ask peer for file, get list of rehosters, potentially report progress to allow rehosting incentives)
  • pre-dl thumbnails?
  • stats on progress updates, eg speed, peers, chunks, etc

Questions

  • anonymity and metadata impacts?
  • should DLs be authenticated, or use new/ephemeral cnxns?
  • may want to interact with blocking?
  • DoS risks?

Config

  • save to folder x or ask
  • auto-dl
  • image previews/thumbnails
  • auto-rehost in groups (file size limit?)

Misc

  • dl "ticket" overlays shouldn't be solely file hash-based, or adversaries will be able to make blind queries for known files (ticket=hash+nonce?)
  • chunk hashes should be verifiable against master hash to prevent malicious rehosters sending bad manifests or bad chunks
  • should validate hashes on completion
  • since we can't (yet) attach custom strings to message history, i suggest we have a "downloads" subdir that contains files and/or shortcuts to external file locations
    • can also have a ".part" version for manifests and unfinished DLs to support resumption
  • auto-rehost would make it easy to add a "rehost bot" to a convo
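The "ticket=hash+nonce" idea can be sketched as follows. This is an illustration under assumptions, not the actual overlay handling: the `Host` class, the 16-byte nonce, and SHA-256 are hypothetical choices. The point is that knowing a file's hash alone is not enough to learn whether a peer hosts it.

```python
import hashlib
import secrets


class Host:
    """Serves a file only to requests that present both its hash and the ticket nonce."""

    def __init__(self):
        self._offers = {}  # (file hash, nonce) -> file bytes

    def offer(self, data: bytes) -> tuple[str, str]:
        file_hash = hashlib.sha256(data).hexdigest()
        nonce = secrets.token_hex(16)  # fresh per offer, shared only via the ticket
        self._offers[(file_hash, nonce)] = data
        return file_hash, nonce

    def fetch(self, file_hash: str, nonce: str):
        # Unknown file and wrong nonce produce the same answer, so a blind
        # query with a known hash reveals nothing about what is hosted.
        return self._offers.get((file_hash, nonce))
```

Because each offer gets its own nonce, the same file shared into two conversations yields two unlinkable tickets.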

Design

Sending a file: should present a file picker, and then post a "download" overlay.

For rehosting, can repost download overlay from yourself (maybe with cumulative tracker/rehoster list?). Makes downloading easier for cases where original sender isn't always online.

If we support torrentiness, can rehost immediately (before dl is complete)

file "ticket" overlays

{
   o: ##,
   d: filename_suggestion.ext,
   h: master-hash,
   n: nonce,
   s: size_in_bytes,
   t?: optional tracker/rehoster info,
}
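A minimal sketch of how such a ticket could be serialized and parsed, assuming JSON encoding. The `FileTicket` class is hypothetical, and the overlay number stays a caller-supplied value because the notes leave it as `##`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FileTicket:
    o: int                    # overlay number; left as "##" in the notes, so supplied by the caller
    d: str                    # filename suggestion
    h: str                    # master hash
    n: str                    # nonce
    s: int                    # size in bytes
    t: Optional[list] = None  # optional tracker/rehoster info

    def to_json(self) -> str:
        # Omit the optional field entirely when it is unset.
        fields = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(fields, sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "FileTicket":
        ticket = cls(**json.loads(raw))  # unknown keys raise TypeError
        if ticket.s < 0:
            raise ValueError("size must not be negative")
        return ticket
```

The `s` field lets the receiver check free disk space and show progress before requesting the manifest.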

download protocol

  • connect?
  • send h||n, get file manifest of chunk hashes
  • optionally, get list of rehosters
  • send h||n||chunk-request, get chunks
    • for bittorrentiness, some chunks may not be available right away, so expect to need to make re-requests
    • could report "unavailable" chunks to display in UI

chunk request format

  • single chunks, eg: 71
  • ranges, eg: 0:7295
  • reverse ranges? eg: 7295:0 (for "meet in the middle")
  • concats, eg: 0:5,7,9:15,7295:20
  • order randomizer, eg: r0:5,7 (makes rehosting/torrentiness more efficient; only one "r" per chunk-request)
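A minimal parser sketch for this request format, under a few assumptions the notes do not settle: ranges are inclusive at both ends, the single "r" must be the first character, duplicate indices are served once, and indices are bounded by the manifest length so a request cannot expand into an unbounded list. The function name is hypothetical.

```python
import random


def parse_chunk_request(request: str, total_chunks: int, rng=random) -> list[int]:
    """Expand a chunk-request string into the ordered list of chunk indices."""
    shuffle = request.startswith("r")
    body = request[1:] if shuffle else request
    if "r" in body:
        raise ValueError("only one 'r' per chunk-request")
    order, seen = [], set()
    for part in body.split(","):
        if ":" in part:
            start_text, end_text = part.split(":")  # more than one ':' raises ValueError
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not (0 <= start < total_chunks and 0 <= end < total_chunks):
            raise ValueError(f"chunk index out of range in {part!r}")
        step = 1 if end >= start else -1  # reverse ranges count down
        for index in range(start, end + step, step):
            if index not in seen:
                seen.add(index)
                order.append(index)
    if shuffle:
        rng.shuffle(order)
    return order
```

For example, `parse_chunk_request("0:2,7,15:13", 100)` yields `[0, 1, 2, 7, 15, 14, 13]`, while `"r0:5,7"` yields the same seven indices as `"0:5,7"` in a random order.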
Owner

notes off the top of my head:

consent: partly related to DoS prevention, but in general, a file is for 1-n ppl to download, and only those approved (contact or group).

Q: how does that interact with rehosting wrt groups?

to some extent yeah once we send someone a file they can do with it whatever they want... (but at least the ui doesn't have to support that easily)


we probably want some logic if we're doing bittorrent-like downloading, like again, a) what's a sane default for chunk size, followed by b) up to what # of chunks or file size do we stick to downloading from the host, and at what point are we more likely to enquire with group members

also if the orig host is offline and it's a group context then yeah depending on size threshold just pick one at random or grab from multiple
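The source-selection logic described here could look roughly like the sketch below. The 1 MiB threshold, the `"host"` label, and the function name are made-up placeholders; the actual cut-off is exactly the open question above.

```python
import random

# Hypothetical threshold: at or below this size a single connection is assumed to be enough.
SINGLE_SOURCE_MAX_BYTES = 1 << 20


def pick_sources(size_bytes: int, host_online: bool, group_members: list[str], rng=random) -> list[str]:
    """Return the peers to request chunks from, in order of preference."""
    small = size_bytes <= SINGLE_SOURCE_MAX_BYTES
    if host_online and small:
        return ["host"]  # small file: stick to the original host
    if not group_members:
        return ["host"] if host_online else []
    if small:
        return [rng.choice(group_members)]  # host offline: pick one member at random
    sources = list(group_members)
    if host_online:
        sources.insert(0, "host")
    return sources  # large file: grab from multiple peers
```

In a peer (non-group) conversation `group_members` is empty, so the choice collapses to the original host or nothing.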


"file servers" under nice to have, again this may need some relational thought to group contracts and how "open" something is vs how restricted.


pre-dl thumbnails? - we can prolly get some basic file type images from marcia for like doc, image, audio etc to start with ^_^

Author
Owner

Thinking more about this today in regards to implementation (do not take anything here as design gospel this is mostly for my own reference):

It would be nice to be torrent-compatible, but most torrent libraries are heavily embedded into the http/ip dynamic and we have no hope of disentangling that enough (and I'm not comfortable embedding such a library in Cwtch)

Really there are 2 main use cases here and they are pretty distinct:

  1. inline relevant transfers (images, audio) designed to be conversation-specific
  2. externally relevant transfers (documents etc.) designed to be useful outside of the current conversation, likely to be larger (although not necessarily).

Everyone needs to be able to download (1) in order to participate in the conversation and so chunking and distributing makes very little sense in that case (in theory at least). The file is likely small enough to be downloaded via a single connection and that connection is likely to remain online during the conversation.

(2) lends itself more to the torrent approach, but kinda falls down when it comes to practicality i.e. one person in a group hosting a file server is almost certainly more efficient although not the most robust.

Makes me want to revisit a bot-oriented group design where we always assume a bot is online for management and authorization. Even if the bot only acted as a "tracker" to allow people to efficiently find chunks (rather than host the file itself) it would greatly improve the efficiency and discoverability and we could probably layer it onto the current design in a graceful way.
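As a rough illustration of a bot acting purely as a "tracker", the sketch below only records which peers announced which chunks and never touches file data. The `ChunkTracker` class and its methods are hypothetical, and authorization of who may announce or query is left out.

```python
from collections import defaultdict


class ChunkTracker:
    """Remembers which peers announced which chunks; stores no file contents."""

    def __init__(self):
        # ticket key -> chunk index -> set of peer identifiers
        self._holders = defaultdict(lambda: defaultdict(set))

    def announce(self, key: str, peer: str, chunk_indices: list[int]) -> None:
        for index in chunk_indices:
            self._holders[key][index].add(peer)

    def holders(self, key: str, index: int) -> list[str]:
        if key not in self._holders:
            return []
        return sorted(self._holders[key].get(index, ()))
```

A downloader would ask the bot for `holders(key, index)` and then fetch the chunk directly from one of the listed peers over p2p.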

sarah closed this issue 2021-08-31 18:13:45 +00:00
3 Participants
Reference: cwtch.im/cwtch-ui#110