Add 'Storage DRAFT'
parent
dedf8970d4
commit
f32a678c38
|
@ -0,0 +1,243 @@
|
|||
# Persistence
|
||||
|
||||
## Introduction
|
||||
|
||||
Cwtch currently makes limited use of custom encrypted storage to store profile structs and conversation timelines.
|
||||
|
||||
Every time a Profile is updated (new contacts, new messages, new attributes etc.) the entire structure must be
serialized, encrypted and re-written to disk. Because profiles tend to be limited in size, this performance is
currently acceptable.
|
||||
|
||||
For conversation timelines, each new message is appended to an encrypted log which is occasionally rotated, and the oldest
messages are deleted entirely. This implementation has several drawbacks:
|
||||
|
||||
- In order to be accessible, the entire conversation is loaded into memory at startup.
- Updating any attributes associated with a message requires re-encrypting the entire conversation.
|
||||
|
||||
|
||||
Further, our current storage engine is built upon the Cwtch event bus, rather than being built into our core Cwtch Peer
object. This means that at any given time at least 2 versions of a profile are loaded into memory, including conversation
timelines. This requires non-trivial code updates to ensure that each store is synced, and is an overall inefficient use
of memory.
|
||||
|
||||
Finally, we have a desire to support more exotic conversation structures within Cwtch in the near future (e.g.
lists, bulletin boards, anonymous voting etc.) and the current storage engine is incompatible with initial
designs.
|
||||
|
||||
The purpose of this document is to explore requirements for a new storage engine, investigate potential implementations
and document the eventual decision.
|
||||
|
||||
## Requirements for a New Storage Engine
|
||||
|
||||
The following are core requirements that any solution **must** meet:
|
||||
|
||||
- **Cross Platform** - Cwtch currently runs on Windows, Mac, Linux and Android. Any solution needs to run on
  those operating systems at a minimum. We also require that any bindings be available in Go and Rust.
- **Free Software** - We don't use proprietary software in Cwtch.
- **Daemon-less** - The database itself must be able to run as part of a larger application and must not require the
  existence of a server, daemon or any external interface.
|
||||
|
||||
### Secondary Requirements
|
||||
|
||||
The following are required for the final solution, but may be obtained using application-level code and do not
necessarily need to be part of the core database:
|
||||
|
||||
- **Encrypted** - it is essential that any adversary with file system or disk read permissions be unable to access Cwtch data.
- **Incremental** - storing small updates to stored structures should not require large updates, i.e. re-encryption of
  previously stored data should be rare and limited to special events (e.g. password changes).
- **Memory Efficient** - it should be possible to load small subsets of stored structures into memory without
  requiring the entire structure to be decrypted/deserialized.
|
||||
|
||||
Note: While physical adversaries are [not considered in-scope in the Cwtch risk model](https://docs.openprivacy.ca/cwtch-security-handbook/risk.html#a-note-on-physical-attacks) we do make it a goal that such attacks should not be trivial.
|
||||
|
||||
## Investigating Implementations
|
||||
|
||||
We can split our considerations into two categories: generic database implementations requiring a tailored solution,
and niche databases that we could mold our own solution around.
|
||||
|
||||
### Generic Database Solutions
|
||||
|
||||
### sqlite
|
||||
|
||||
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | Yes |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | Page-based, via extension |
| Incremental | Custom |
| Memory Efficient | Custom |
|
||||
|
||||
We already use sqlite in the Cwtch server implementation. It is cross-platform and memory efficient.
|
||||
|
||||
There exist extensions such as SqlCipher (go bindings: https://github.com/xeodou/go-sqlcipher) which
provide 256-bit AES encryption of database files.
|
||||
|
||||
In SqlCipher, encryption is based on database pages, rather than rows or records. SqlCipher manages access to, and freeing of,
plaintext pages. The actual encryption is provided by well-exercised libraries like libcrypto.
|
||||
|
||||
As with all generic solutions, basing the new storage engine on sqlite would require us to invest time
in database and structure design to ensure performance and future extensibility.
|
||||
|
||||
#### A note on room
|
||||
|
||||
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | No |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | No |
| Incremental | N/A |
| Memory Efficient | N/A |
|
||||
|
||||
Recently, many platforms have discouraged developers from engaging with sqlite directly, e.g. Android
now [directs developers to use their sqlite wrapper Room](https://developer.android.com/training/data-storage/room) as it
provides compile time safety checks.
|
||||
|
||||
Room is neither **Cross Platform** nor **Encrypted**, and as such we will not consider it further.
|
||||
|
||||
### H2 (as used in Briar)
|
||||
|
||||
We include a brief analysis of the H2 DBMS, as it is the primary database engine for Briar.
|
||||
|
||||
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | See Notes |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | File based |
| Incremental | N/A |
| Memory Efficient | N/A |
|
||||
|
||||
Briar breaks down their app into several distinct tables and queries e.g. `messages`, `groups`, `statuses` linked
by secondary keys and indexes.
|
||||
|
||||
Note: Experimental bindings exist for Go, but not Rust. This would make H2 an unstable option for Cwtch.
|
||||
|
||||
### Niche Databases

### Chestnut
|
||||
|
||||
| Requirement | Supports |
| ----------- | ----------- |
| Cross Platform | Yes |
| Free Software | Yes |
| Daemonless | Yes |
| Encrypted | Record-based |
| Incremental | N/A |
| Memory Efficient | N/A |
|
||||
|
||||
Chestnut is an encrypted nosql store for Go which supports multiple backend databases. It supports
saving and loading of tagged structs. We would need to use Chestnut with a Bolt backend, which does support Android.
|
||||
|
||||
Drawbacks: nosql means no queries; the project is also very new, with few (if any) real-world users. We might find support
documentation lacking.
|
||||
|
||||
## Rejected Options
|
||||
|
||||
The following is a longer list of solutions that were briefly considered, but quickly rejected for
failure to meet one of our core requirements:
|
||||
|
||||
- bolt/bbolt - key/value database, nosql, no built-in encryption
- immudb - all rows immutable, focused on client, checking cryptographic integrity of rows.
- NutsDB - nosql database, no built-in encryption, no Android support.
- scribble - not cross-platform, not maintained.
- tiedot - nosql, focused on document storage, no built-in encryption, not maintained.
|
||||
|
||||
|
||||
# Channel Structure
|
||||
|
||||
- Chat - (mostly) linear flow of messages forming a conversation.
- List - hierarchical todo list with task states, sub tasks.
- Bulletin - hierarchical discussion forum with threads and sub-threads.
- Anonymous Voting - protocol based on public bulletin board with encrypted messages.
|
||||
|
||||
# Open Questions
|
||||
|
||||
1) Do we want one table per conversation OR one master table for each profile?
|
||||
|
||||
One table per conversation likely means better performance.
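
The two layouts in question 1 can be sketched side by side with SQLite. This is only an illustration of the trade-off, not a benchmark or a proposed schema. It uses Python's standard `sqlite3` module for brevity, and the names `messages`, `conversation_id` and `conv_<id>` are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Option A: one master table per profile; every query filters on conversation_id.
db.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " conversation_id INTEGER NOT NULL,"
    " body TEXT)"
)
# Without an index like this, reading one conversation scans the whole table.
db.execute("CREATE INDEX messages_by_conversation ON messages (conversation_id, id)")


# Option B: one table per conversation; the table name carries the conversation id.
def create_conversation_table(conn, conversation_id):
    # The id is coerced to int so it is safe to interpolate into the table name.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "conv_{int(conversation_id)}" ('
        " id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
    )


for conversation_id in (1, 2):
    create_conversation_table(db, conversation_id)
    for n in range(3):
        body = f"message {n} in conversation {conversation_id}"
        db.execute(
            "INSERT INTO messages (conversation_id, body) VALUES (?, ?)",
            (conversation_id, body),
        )
        db.execute(f'INSERT INTO "conv_{conversation_id}" (body) VALUES (?)', (body,))

# The same conversation read back through each layout.
master_rows = db.execute(
    "SELECT body FROM messages WHERE conversation_id = ? ORDER BY id", (1,)
).fetchall()
per_table_rows = db.execute('SELECT body FROM "conv_1" ORDER BY id').fetchall()
```

With the master table, message ids are unique across the profile but interleaved between conversations. With one table per conversation, each conversation has its own id sequence and needs no secondary index, at the cost of building table names dynamically.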
|
||||
|
||||
# Hierarchical Message Database Design
|
||||
|
||||
Each Profile has a number of named tables, one for each conversation.
|
||||
|
||||
|
||||
We will denote this table: `profile.kv`
|
||||
|
||||
| Column      | Type         | Description           |
| ----------- | ------------ | --------------------- |
| KeyType     | string       | Application Specific  |
| Key         | string       | Reference to ID       |
| Value       | blob         | Application Specific  |
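
A minimal sketch of how a table like `profile.kv` could meet the **Incremental** requirement: changing one attribute rewrites one row rather than the whole profile. It uses Python's standard `sqlite3` module for brevity. The composite primary key on (`KeyType`, `Key`) is an assumption the draft does not state, and the helper names and sample values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Assumption: (KeyType, Key) uniquely identifies a value.
# The table name contains a dot, so it must be double-quoted.
db.execute(
    'CREATE TABLE "profile.kv" ('
    ' KeyType TEXT NOT NULL,'
    ' "Key" TEXT NOT NULL,'
    ' Value BLOB,'
    ' PRIMARY KEY (KeyType, "Key"))'
)


def set_value(conn, key_type, key, value):
    # INSERT OR REPLACE overwrites only the row with the matching primary key.
    conn.execute(
        'INSERT OR REPLACE INTO "profile.kv" (KeyType, "Key", Value) VALUES (?, ?, ?)',
        (key_type, key, value),
    )


def get_value(conn, key_type, key):
    row = conn.execute(
        'SELECT Value FROM "profile.kv" WHERE KeyType = ? AND "Key" = ?',
        (key_type, key),
    ).fetchone()
    return row[0] if row else None


set_value(db, "attribute", "name", b"alice")
set_value(db, "attribute", "status", b"online")
set_value(db, "attribute", "name", b"alice (work)")  # replaces a single row

name = get_value(db, "attribute", "name")
missing = get_value(db, "attribute", "unknown")
row_count = db.execute('SELECT COUNT(*) FROM "profile.kv"').fetchone()[0]
```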
|
||||
|
||||
|
||||
We will denote this table: `conversations`
|
||||
|
||||
| Column      | Type         | Description           |
| ----------- | ------------ | --------------------- |
| ID          | integer      | Primary Key / AI      |
| Handle      | text         | Contact Public Key / Group ID |
| Attributes  | blob         | K/V Store             |
| ACL         | blob         | TBD                   |
|
||||
|
||||
|
||||
We will denote this table: `<conversation_id>.channels`
|
||||
|
||||
| Column      | Type         | Description           |
| ----------- | ------------ | --------------------- |
| ID          | integer      | Primary Key / AI      |
| ChannelName | text         | Application Specific  |
| ChannelType | int          | Application Specific  |
| Attributes  | blob         | Application Specific  |
|
||||
|
||||
We will denote this table: `<conversation_id>.<channel_id>.chat`
|
||||
|
||||
| Column      | Type         | Description           |
| ----------- | ------------ | --------------------- |
| ID          | integer      | Primary Key / AI      |
| Body        | text         | Application Specific  |
| Attributes  | blob         | Application Specific  |
| Expiry      | DateTime     | For pruning / disappearing messages |
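
As a rough sketch, the chat table and its `Expiry` column could be realised in SQLite as follows. It uses Python's standard `sqlite3` module for brevity. The helper names (`chat_table`, `create_chat_table`, `prune_expired`), the integer ids and the ISO-8601 text timestamps are assumptions for illustration, not part of the draft.

```python
import sqlite3


def chat_table(conversation_id, channel_id):
    # Dots separate schema and table names in SQL, so the full name is double-quoted.
    return f'"{int(conversation_id)}.{int(channel_id)}.chat"'


def create_chat_table(conn, conversation_id, channel_id):
    # SQLite has no native date type; ISO-8601 strings compare correctly as text.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {chat_table(conversation_id, channel_id)} ("
        " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " Body TEXT,"
        " Attributes BLOB,"
        " Expiry DATETIME)"
    )


def prune_expired(conn, conversation_id, channel_id, now):
    # Rows with a NULL Expiry never satisfy the comparison, so they are kept.
    cur = conn.execute(
        f"DELETE FROM {chat_table(conversation_id, channel_id)} WHERE Expiry <= ?",
        (now,),
    )
    return cur.rowcount


db = sqlite3.connect(":memory:")
create_chat_table(db, 1, 0)
table = chat_table(1, 0)
db.executemany(
    f"INSERT INTO {table} (Body, Attributes, Expiry) VALUES (?, ?, ?)",
    [
        ("kept forever", b"{}", None),
        ("already expired", b"{}", "2021-01-01 00:00:00"),
        ("expires later", b"{}", "2099-01-01 00:00:00"),
    ],
)

deleted = prune_expired(db, 1, 0, "2021-06-01 00:00:00")
remaining = [row[0] for row in db.execute(f"SELECT Body FROM {table} ORDER BY ID")]
```

Pruning touches only the expired rows, so disappearing messages do not force a rewrite of the rest of the conversation.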
|
||||
|
||||
We will denote this table: `<conversation_id>.<channel_id>.tree`
|
||||
|
||||
| Column      | Type         | Description           |
| ----------- | ------------ | --------------------- |
| ID          | integer      | Primary Key / AI      |
| Parent      | integer / key| Reference to ID       |
| Body        | blob         | Application Specific  |
| Attributes  | blob         | Application Specific  |
| Expiry      | DateTime     | For pruning / disappearing messages |
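
The `Parent` column makes it possible to load a single thread without reading the whole channel. One way to do this in SQLite is a recursive common table expression. The sketch below uses Python's standard `sqlite3` module for brevity and assumes that a `NULL` parent marks a top-level thread. The table name `1.0.tree`, the sample rows and the `load_thread` helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
tree = '"1.0.tree"'  # hypothetical conversation 1, channel 0
db.execute(
    f"CREATE TABLE {tree} ("
    " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
    f" Parent INTEGER REFERENCES {tree} (ID),"  # assumption: NULL = top-level thread
    " Body BLOB,"
    " Attributes BLOB,"
    " Expiry DATETIME)"
)
db.executemany(
    f"INSERT INTO {tree} (ID, Parent, Body) VALUES (?, ?, ?)",
    [
        (1, None, b"thread A"),
        (2, 1, b"reply A.1"),
        (3, 2, b"reply A.1.1"),
        (4, None, b"thread B"),
        (5, 4, b"reply B.1"),
    ],
)


def load_thread(conn, root_id):
    # Start at the root row, then repeatedly join in rows whose Parent is already
    # part of the result, tracking the nesting depth as we go.
    return conn.execute(
        f"""
        WITH RECURSIVE thread(ID, Parent, Body, Depth) AS (
            SELECT ID, Parent, Body, 0 FROM {tree} WHERE ID = ?
            UNION ALL
            SELECT t.ID, t.Parent, t.Body, thread.Depth + 1
            FROM {tree} AS t JOIN thread ON t.Parent = thread.ID
        )
        SELECT ID, Depth FROM thread ORDER BY ID
        """,
        (root_id,),
    ).fetchall()


thread_a = load_thread(db, 1)
thread_b = load_thread(db, 4)
```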
|
||||
|
||||
|
||||
We will denote this table: `<conversation_id>.<channel_id>.list`
|
||||
|
||||
| Column      | Type         | Description           |
| ----------- | ------------ | --------------------- |
| ID          | integer      | Primary Key / AI      |
| Parent      | integer / key| Reference to ID       |
| SortOrder   | integer      | Application Specific  |
| Body        | blob         | Application Specific  |
| Attributes  | blob         | Application Specific  |
| Expiry      | DateTime     | For pruning / disappearing messages |
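
A small sketch of how `SortOrder` might be used: sub-tasks are read in order for a given `Parent`, and reordering one item updates only that row. It uses Python's standard `sqlite3` module for brevity. The table name `1.0.list`, the sample tasks and the `children`/`move` helpers are hypothetical, and the meaning of `SortOrder` is left application specific in the draft.

```python
import sqlite3

db = sqlite3.connect(":memory:")
todo = '"1.0.list"'  # hypothetical conversation 1, channel 0
db.execute(
    f"CREATE TABLE {todo} ("
    " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " Parent INTEGER,"
    " SortOrder INTEGER NOT NULL,"
    " Body BLOB,"
    " Attributes BLOB,"
    " Expiry DATETIME)"
)
db.executemany(
    f"INSERT INTO {todo} (ID, Parent, SortOrder, Body) VALUES (?, ?, ?, ?)",
    [
        (1, None, 0, b"release checklist"),
        (2, 1, 0, b"write changelog"),
        (3, 1, 1, b"tag release"),
        (4, 1, 2, b"announce"),
    ],
)


def children(conn, parent_id):
    # ID is a tie-breaker so that equal SortOrder values still sort deterministically.
    rows = conn.execute(
        f"SELECT Body FROM {todo} WHERE Parent = ? ORDER BY SortOrder, ID",
        (parent_id,),
    )
    return [row[0] for row in rows]


def move(conn, item_id, new_order):
    # Only the moved row is rewritten; its siblings are untouched.
    conn.execute(
        f"UPDATE {todo} SET SortOrder = ? WHERE ID = ?", (new_order, item_id)
    )


before = children(db, 1)
move(db, 4, -1)  # place "announce" ahead of its siblings
after = children(db, 1)
```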
|
||||
|
||||
We will denote this table: `<conversation_id>.<channel_id>.kv`
|
||||
|
||||
| Column      | Type         | Description           |
| ----------- | ------------ | --------------------- |
| ID          | integer      | Primary Key / AI      |
| KeyType     | string       | Application Specific  |
| Key         | string       | Reference to ID       |
| Value       | blob         | Application Specific  |
|
||||
|
||||
## Common Queries
|
||||
|
||||
`N` Most Recent Messages: `SELECT * FROM <conversation_id>.<channel_id>.chat order by id desc limit N;`
|
||||
|
||||
Select a page of `M` messages, skipping the `N` most recent: `SELECT * FROM <conversation_id>.<channel_id>.chat order by id desc limit N,M;`
|
||||
|
||||
Count number of messages: `SELECT COUNT(*) FROM <conversation_id>.<channel_id>.chat;`
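
The queries above can be exercised against an in-memory SQLite database. The following is a minimal sketch using Python's standard `sqlite3` module. The table name `1.0.chat`, the sample data and the helper names are hypothetical. Note that in SQLite's `LIMIT a, b` form the first number is the offset and the second is the row count, and that a table name containing dots has to be double-quoted.

```python
import sqlite3

db = sqlite3.connect(":memory:")
chat = '"1.0.chat"'  # hypothetical conversation 1, channel 0
db.execute(f"CREATE TABLE {chat} (ID INTEGER PRIMARY KEY AUTOINCREMENT, Body TEXT)")
db.executemany(
    f"INSERT INTO {chat} (Body) VALUES (?)",
    [(f"message {n}",) for n in range(1, 11)],
)


def most_recent(conn, n):
    # Newest first; only n rows are read, regardless of conversation length.
    rows = conn.execute(f"SELECT ID FROM {chat} ORDER BY ID DESC LIMIT ?", (n,))
    return [row[0] for row in rows]


def page(conn, offset, count):
    # "LIMIT offset, count": skip `offset` newest rows, then return `count` rows.
    rows = conn.execute(
        f"SELECT ID FROM {chat} ORDER BY ID DESC LIMIT ?, ?", (offset, count)
    )
    return [row[0] for row in rows]


def count_messages(conn):
    return conn.execute(f"SELECT COUNT(*) FROM {chat}").fetchone()[0]


newest = most_recent(db, 3)
second_page = page(db, 3, 3)
total = count_messages(db)
```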
|
||||
|
||||
|
||||
|