Add 'Storage DRAFT'

Sarah Jamie Lewis 2021-10-21 20:19:56 +00:00
parent dedf8970d4
commit f32a678c38
1 changed files with 243 additions and 0 deletions

Storage-DRAFT.md Normal file

@@ -0,0 +1,243 @@
# Persistence
## Introduction
Cwtch currently makes limited use of custom encrypted storage to store profile structs and conversation timelines.
Every time a Profile is updated (new contacts, new messages, new attributes etc.) the entire structure must be
serialized and encrypted and re-written to disk. Because profiles tend to be limited in size, this performance is
currently acceptable.
For conversation timelines, each new message is appended to an encrypted log which is occasionally rotated, and the oldest
messages are deleted entirely. This implementation has several drawbacks:
- In order to be accessible, the entire conversation is loaded into memory at startup.
- Updating any attributes associated with a message requires re-encrypting the entire conversation.
Further, our current storage engine is built upon the Cwtch event bus, rather than being built into our core Cwtch Peer
object. This means that at any given time at least 2 versions of a profile are loaded into memory, including conversation
timelines. Keeping each store in sync requires non-trivial code, and the duplication is an inefficient use
of memory.
Finally, we have a desire to support more exotic conversation structures within Cwtch in the near future (e.g.
lists, bulletin boards, anonymous voting etc.) and the current storage engine is incompatible with initial
designs.
The purpose of this document is to explore requirements for a new storage engine, investigate potential implementations
and document the eventual decision.
## Requirements for a New Storage Engine
The following are core requirements that any solution **must** meet:
- **Cross Platform** - Cwtch currently runs on Windows, Mac, Linux and Android. Any solution needs to run on
those operating systems at a minimum. We also require that any bindings be available in Go and Rust.
- **Free Software** - We don't use proprietary software in Cwtch.
- **Daemon-less** - The database itself must be able to run as part of a larger application and must not require the
existence of a server, daemon or any external interface.
### Secondary Requirements
The following are required for the final solution, but may be obtained using application-level code and do not
necessarily need to be part of the core database:
- **Encrypted** - it is essential that any adversary with file system or disk read permissions be unable to access Cwtch data.
- **Incremental** - storing small updates to stored structures should not require large updates i.e. re-encryption of
previously stored data should be rare and limited to special events (e.g. password changes)
- **Memory Efficient** - it should be possible to load small subsets of stored structures into memory without
requiring the entire structure to be decrypted/deserialized.
Note: While physical adversaries are [not considered in-scope in the Cwtch risk model](https://docs.openprivacy.ca/cwtch-security-handbook/risk.html#a-note-on-physical-attacks) we do make it a goal that such attacks should not be trivial.
## Investigating Implementations
We can split our considerations into two categories: generic database implementations requiring a tailored solution,
and niche databases that we could mold our own solution around.
### Generic Database Solutions
### sqlite
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | Yes |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | Page-based, via extension |
| Incremental | Custom |
| Memory Efficient | Custom |
We already use sqlite in the Cwtch server implementation. It is cross-platform and memory efficient.
There exist extensions such as SQLCipher (Go bindings: https://github.com/xeodou/go-sqlcipher) which
provide 256-bit AES encryption of database files.
In SQLCipher, encryption is based on database pages, rather than rows or records. SQLCipher manages access and freeing of
plaintext pages. The actual encryption is provided by well-exercised libraries like libcrypto.
As with all generic solutions, basing the new storage engine on sqlite would require us to invest time
in database and structure design to ensure performance and future extensibility.
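
As a rough illustration of the "Incremental" and "Memory Efficient" rows above, here is a minimal sketch using Python's built-in `sqlite3` module. It is unencrypted (SQLCipher is not involved), and the `messages` table, its columns and the helper names are hypothetical, chosen only for this example. It shows that appending a message or changing one message's attributes touches a single row, and that a query can load just the most recent rows instead of a whole timeline.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database and create a hypothetical messages table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " body TEXT NOT NULL,"
        " attributes BLOB)"
    )
    return db


def append_message(db, body):
    """Insert one row; previously stored rows are not re-serialized."""
    with db:  # commits on success, rolls back on error
        cur = db.execute("INSERT INTO messages (body) VALUES (?)", (body,))
    return cur.lastrowid


def set_attributes(db, message_id, attributes):
    """Update the attributes of a single message in place."""
    with db:
        db.execute(
            "UPDATE messages SET attributes = ? WHERE id = ?",
            (attributes, message_id),
        )


def recent(db, n):
    """Load only the n most recent messages, newest first."""
    return db.execute(
        "SELECT id, body, attributes FROM messages ORDER BY id DESC LIMIT ?",
        (n,),
    ).fetchall()
```

With page-based encryption, an update like this would still re-encrypt the affected database pages, but not the whole conversation.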
#### A note on Room
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | No |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | No |
| Incremental | N/A |
| Memory Efficient | N/A |
Recently, many platforms have discouraged developers from engaging with sqlite directly e.g. Android
now [directs developers to use their sqlite wrapper Room](https://developer.android.com/training/data-storage/room) as it
provides compile time safety checks.
Room is neither **Cross-platform** nor **Encrypted**, and as such we will not consider it further.
### H2 (as used in Briar)
We include a brief analysis of the H2 DBMS, as it is the primary database engine for Briar.
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | See Notes |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | File based |
| Incremental | N/A |
| Memory Efficient | N/A |
Briar breaks down their app into several distinct tables and queries e.g. `messages`, `groups`, `statuses` linked
by secondary keys and indexes.
Note: Experimental bindings exist for Go, but not Rust. This would make H2 an unstable option for Cwtch.
### Niche Databases
### Chestnut
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | Yes |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | Record-based |
| Incremental | N/A |
| Memory Efficient | N/A |
Chestnut is an encrypted nosql store for Go which supports multiple backend databases and the
saving and loading of tagged structs. We would need to use Chestnut with a Bolt backend, which does support Android.
Drawbacks: nosql means no queries, and the project is very new with few (if any) real world users. We might find support
documentation lacking.
## Rejected Options
The following is a longer list of solutions that were briefly considered but quickly rejected for
failing to meet one of our core requirements:
- bolt/bbolt - key/value database, nosql, no built-in encryption
- immudb - all rows immutable, focused on client, checking cryptographic integrity of rows.
- NutsDB - nosql database, no built-in encryption, no Android support.
- scribble - not cross-platform, not maintained.
- tiedot - nosql, focused on document storage, no built-in encryption, not maintained.
# Channel Structure
- Chat - (mostly) linear flow of messages forming a conversation.
- List - hierarchical todo list with task states, sub tasks.
- Bulletin - hierarchical discussion forum with threads and sub-threads.
- Anonymous Voting - protocol based on a public bulletin board with encrypted messages.
# Open Questions
1) Do we want one table per conversation OR one master table for each profile?
One table per conversation likely means better performance.
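
To make the two options concrete, the following sketch creates both layouts with Python's built-in `sqlite3` module. Everything here is an assumption for illustration: the `quote_ident` helper, the `<id>.chat` naming, and the `chat_by_conversation` index are not part of any existing Cwtch code. It makes no performance claim either way; it only shows what each choice implies for table creation and querying.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_per_conversation(db, conversation_id):
    """Option A: one chat table per conversation, named '<id>.chat'."""
    table = quote_ident("%d.chat" % int(conversation_id))
    db.execute(
        "CREATE TABLE IF NOT EXISTS " + table +
        " (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
    )
    return table


def create_master(db):
    """Option B: one master table per profile, keyed by conversation."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS chat ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " conversation_id INTEGER NOT NULL,"
        " body TEXT)"
    )
    # Without an index like this, per-conversation queries scan every message.
    db.execute(
        "CREATE INDEX IF NOT EXISTS chat_by_conversation"
        " ON chat (conversation_id, id)"
    )


def master_recent(db, conversation_id, n):
    """Most recent n messages of one conversation from the master table."""
    return db.execute(
        "SELECT id, body FROM chat WHERE conversation_id = ?"
        " ORDER BY id DESC LIMIT ?",
        (conversation_id, n),
    ).fetchall()
```

Option A keeps each query simple but requires building table names dynamically (and quoting them safely); Option B uses fixed table names at the cost of an extra column and index.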
# Hierarchical Message Database Design
Each Profile has a number of named tables, one for each conversation.
We will denote this table: `profile.kv`
| Column | Type | Description |
| ----------- | ------------ | --------------------- |
| KeyType | string | Application Specific |
| Key | string | Reference to ID |
| Value | blob | Application Specific |
We will denote this table: `conversations`
| Column | Type | Description |
| ----------- | ------------ | --------------------- |
| ID | integer | Primary Key / AI |
| Handle | text | Contact Public Key / Group ID |
| Attributes | blob | K/V Store |
| ACL | blob | TBD |
We will denote this table: `<conversation_id>.channels`
| Column | Type | Description |
| ----------- | ------------ | --------------------- |
| ID | integer | Primary Key / AI |
| ChannelName | text | Application Specific |
| ChannelType | int | Application Specific |
| Attributes | blob | Application Specific |
We will denote this table: `<conversation_id>.<channel_id>.chat`
| Column | Type | Description |
| ----------- | ------------ | --------------------- |
| ID | integer | Primary Key / AI |
| Body | text | Application Specific |
| Attributes | blob | Application Specific |
| Expiry | DateTime | For pruning / disappearing messages |
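
The `Expiry` column is intended for pruning and disappearing messages. A minimal sketch of such a pruning pass, using Python's built-in `sqlite3` module, might look like the following. It assumes (this is not decided in this document) that `Expiry` is stored as an integer Unix timestamp and that a NULL expiry means "never expires"; the table name `"1.1.chat"` is a hypothetical stand-in for `<conversation_id>.<channel_id>.chat`.

```python
import sqlite3
import time

# Hypothetical concrete name for <conversation_id>.<channel_id>.chat.
CHAT = '"1.1.chat"'


def create_chat(db):
    """Create the chat table with the columns listed above."""
    db.execute(
        "CREATE TABLE " + CHAT + " ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " body TEXT,"
        " attributes BLOB,"
        " expiry INTEGER)"  # assumption: Unix timestamp, NULL = keep forever
    )


def add_message(db, body, expiry=None):
    """Append a message with an optional expiry time."""
    with db:
        cur = db.execute(
            "INSERT INTO " + CHAT + " (body, expiry) VALUES (?, ?)",
            (body, expiry),
        )
    return cur.lastrowid


def prune_expired(db, now=None):
    """Delete messages whose expiry has passed; return how many were removed."""
    if now is None:
        now = int(time.time())
    with db:
        cur = db.execute(
            "DELETE FROM " + CHAT + " WHERE expiry IS NOT NULL AND expiry <= ?",
            (now,),
        )
    return cur.rowcount
```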
We will denote this table: `<conversation_id>.<channel_id>.tree`
| Column | Type | Description |
| ----------- | ------------ | --------------------- |
| ID | integer | Primary Key / AI |
| Parent | integer / key| Reference to ID |
| Body | blob | Application Specific |
| Attributes | blob | Application Specific |
| Expiry | DateTime | For pruning / disappearing messages |
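
Because each row in the tree table references its `Parent`, a whole thread can be fetched with a recursive query. The sketch below uses Python's built-in `sqlite3` module and SQLite's `WITH RECURSIVE` syntax. The table name `"1.2.tree"` and the helper names are hypothetical, and root posts are assumed to have a NULL parent.

```python
import sqlite3

# Hypothetical concrete name for <conversation_id>.<channel_id>.tree.
TREE = '"1.2.tree"'


def create_tree(db):
    """Create the tree table with the columns listed above."""
    db.execute(
        "CREATE TABLE " + TREE + " ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " parent INTEGER,"  # assumption: NULL marks a top-level post
        " body BLOB,"
        " attributes BLOB,"
        " expiry INTEGER)"
    )


def add_post(db, body, parent=None):
    """Insert a post, optionally as a reply to an existing post."""
    with db:
        cur = db.execute(
            "INSERT INTO " + TREE + " (parent, body) VALUES (?, ?)",
            (parent, body),
        )
    return cur.lastrowid


def thread(db, root_id):
    """Return (id, parent, depth) for a post and all of its descendants."""
    return db.execute(
        "WITH RECURSIVE t(id, parent, depth) AS ("
        "  SELECT id, parent, 0 FROM " + TREE + " WHERE id = ?"
        "  UNION ALL"
        "  SELECT c.id, c.parent, t.depth + 1"
        "  FROM " + TREE + " AS c JOIN t ON c.parent = t.id"
        ") SELECT id, parent, depth FROM t ORDER BY depth, id",
        (root_id,),
    ).fetchall()
```

An index on `parent` would likely be needed for this to stay fast on large boards; that is left out of the sketch.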
We will denote this table: `<conversation_id>.<channel_id>.list`
| Column | Type | Description |
| ----------- | ------------ | --------------------- |
| ID | integer | Primary Key / AI |
| Parent | integer / key| Reference to ID |
| SortOrder | integer | Application Specific |
| Body | blob | Application Specific |
| Attributes | blob | Application Specific |
| Expiry | DateTime | For pruning / disappearing messages |
We will denote this table: `<conversation_id>.<channel_id>.kv`
| Column | Type | Description |
| ----------- | ------------ | --------------------- |
| ID | integer | Primary Key / AI |
| KeyType | string | Application Specific |
| Key | string | Reference to ID |
| Value | blob | Application Specific |
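
A key/value table like this is typically accessed through get and put operations. The following is a minimal sketch with Python's built-in `sqlite3` module. It assumes a uniqueness constraint on (`KeyType`, `Key`), which the table above does not state, and uses the hypothetical table name `"1.1.kv"`. The put is written as update-then-insert so that an existing row keeps its `ID`.

```python
import sqlite3

# Hypothetical concrete name for <conversation_id>.<channel_id>.kv.
KV = '"1.1.kv"'


def create_kv(db):
    """Create the kv table; the UNIQUE constraint is an assumption."""
    db.execute(
        "CREATE TABLE " + KV + " ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " keytype TEXT NOT NULL,"
        ' "key" TEXT NOT NULL,'
        " value BLOB,"
        ' UNIQUE (keytype, "key"))'
    )


def put(db, keytype, key, value):
    """Store a value, overwriting any existing value for (keytype, key)."""
    with db:  # both statements run in one transaction
        cur = db.execute(
            "UPDATE " + KV + ' SET value = ? WHERE keytype = ? AND "key" = ?',
            (value, keytype, key),
        )
        if cur.rowcount == 0:
            db.execute(
                "INSERT INTO " + KV + ' (keytype, "key", value) VALUES (?, ?, ?)',
                (keytype, key, value),
            )


def get(db, keytype, key):
    """Return the stored value, or None if the key is absent."""
    row = db.execute(
        "SELECT value FROM " + KV + ' WHERE keytype = ? AND "key" = ?',
        (keytype, key),
    ).fetchone()
    return row[0] if row else None
```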
## Common Queries
`N` Most Recent Messages: `SELECT * FROM <conversation_id>.<channel_id>.chat order by id desc limit N;`
Select a page of `M` messages, skipping the `N` most recent: `SELECT * FROM <conversation_id>.<channel_id>.chat order by id desc limit N,M;`
Count number of messages: `SELECT COUNT(*) FROM <conversation_id>.<channel_id>.chat;`
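
The queries above can be exercised with Python's built-in `sqlite3` module, as in the sketch below. The table name `"1.1.chat"` is a hypothetical stand-in for `<conversation_id>.<channel_id>.chat` (a name containing dots must be quoted), and the helper names are illustrative only. Note that in SQLite's comma form `LIMIT a, b`, the first value is the offset and the second is the row count; the sketch uses the explicit `LIMIT ... OFFSET ...` form to avoid that ambiguity.

```python
import sqlite3

# Hypothetical concrete name for <conversation_id>.<channel_id>.chat.
CHAT = '"1.1.chat"'


def create_chat(db):
    """Create a reduced chat table (id and body only) for the example."""
    db.execute(
        "CREATE TABLE " + CHAT +
        " (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
    )


def most_recent(db, n):
    """The n most recent messages, newest first."""
    return db.execute(
        "SELECT id, body FROM " + CHAT + " ORDER BY id DESC LIMIT ?",
        (n,),
    ).fetchall()


def page(db, offset, count):
    """A page of `count` messages after skipping the `offset` newest ones."""
    return db.execute(
        "SELECT id, body FROM " + CHAT + " ORDER BY id DESC LIMIT ? OFFSET ?",
        (count, offset),
    ).fetchall()


def count_messages(db):
    """Total number of messages in the channel."""
    return db.execute("SELECT COUNT(*) FROM " + CHAT).fetchone()[0]
```

For fixed-size pages, page number `p` (counting from zero) of size `s` corresponds to `page(db, p * s, s)`.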