1 Storage DRAFT
Sarah Jamie Lewis edited this page 2021-10-21 20:19:56 +00:00

Persistence

Introduction

Cwtch currently makes limited use of custom encrypted storage to store profile structs and conversation timelines.

Every time a Profile is updated (new contacts, new messages, new attributes etc.) the entire structure must be serialized, encrypted and re-written to disk. Because profiles tend to be limited in size, this performance is currently acceptable.

For conversation timelines, each new message is appended to an encrypted log which is occasionally rotated, and the oldest messages are deleted entirely. This implementation has several drawbacks:

  • In order to be accessible, the entire conversation is loaded into memory at startup.
  • Updating any attributes associated with a message requires re-encrypting the entire conversation.

Further, our current storage engine is built upon the Cwtch event bus, rather than being built into our core Cwtch Peer object. This means that at any given time at least two copies of a profile are loaded into memory, including conversation timelines. Keeping each store in sync requires non-trivial code, and the duplication is an inefficient use of memory.

Finally, we have a desire to support more exotic conversation structures within Cwtch in the near future (e.g. lists, bulletin boards, anonymous voting etc.) and the current storage engine is incompatible with initial designs.

The purpose of this document is to explore requirements for a new storage engine, investigate potential implementations and document the eventual decision.

Requirements for a New Storage Engine

The following are core requirements that any solution must meet:

  • Cross Platform - Cwtch currently runs on Windows, Mac, Linux and Android. Any solution needs to run on those operating systems at a minimum. We also require that any bindings be available in Go and Rust.
  • Free Software - We don't use proprietary software in Cwtch.
  • Daemon-less - The database itself must be able to run as part of a larger application and must not require the existence of a server, daemon or any external interface.

Secondary Requirements

The following are required for the final solution, but may be obtained using application-level code and do not necessarily need to be part of the core database:

  • Encrypted - it is essential that any adversary with file system or disk read permissions be unable to access Cwtch data.
  • Incremental - storing small updates to stored structures should not require large writes, i.e. re-encryption of previously stored data should be rare and limited to special events (e.g. password changes).
  • Memory Efficient - it should be possible to load small subsets of stored structures into memory without requiring the entire structure to be decrypted/deserialized.

Note: While physical adversaries are not considered in-scope in the Cwtch risk model, we do make it a goal that such attacks should not be trivial.

Investigating Implementations

We can split our considerations into two categories: generic database implementations requiring a tailored solution, and niche databases that we could mold our own solution around.

Generic Database Solutions

sqlite

Requirement       Supports
Cross Platform    Yes
Free Software     Yes
Daemonless        Yes
Encrypted         Page-based, via extension
Incremental       Custom
Memory Efficient  Custom

We already use sqlite in the Cwtch server implementation. It is cross-platform and memory efficient.

There exist extensions such as SQLCipher (go bindings: https://github.com/xeodou/go-sqlcipher) which provide 256-bit AES encryption of database files.

In SQLCipher, encryption operates on database pages rather than on rows or records. SQLCipher manages access to, and freeing of, plaintext pages. The actual encryption is provided by well-exercised libraries like libcrypto.

As with all generic solutions, basing the new storage engine on sqlite would require us to invest time in database and structure design to ensure performance and future extensibility.
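As a rough illustration of the Daemon-less and Incremental requirements, the sketch below opens a sqlite database in-process and updates the attributes of a single message without touching any other row. It uses Python's bundled sqlite3 module purely because it is easy to run; Cwtch itself would go through Go bindings. The `messages` table and its columns are hypothetical, and no encryption is involved (SQLCipher is not part of this example).

```python
import sqlite3

# In-memory database for illustration; a real store would pass a file path.
# No server or daemon is involved: the engine runs inside this process.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT, attributes BLOB)"
)

# Appending messages only writes the new rows.
with db:
    db.executemany(
        "INSERT INTO messages (body, attributes) VALUES (?, ?)",
        [("hello", b"{}"), ("world", b"{}")],
    )

# Updating one message's attributes rewrites that row, not the conversation.
with db:
    cur = db.execute(
        "UPDATE messages SET attributes = ? WHERE id = ?",
        (b'{"ack": true}', 1),
    )
    updated_rows = cur.rowcount

first = db.execute(
    "SELECT body, attributes FROM messages WHERE id = 1"
).fetchone()
```

Contrast this with the current engine described in the introduction, where changing any attribute of a message requires re-encrypting the whole conversation.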

A note on room

Requirement       Supports
Cross Platform    No
Free Software     Yes
Daemonless        Yes
Encrypted         No
Incremental       N/A
Memory Efficient  N/A

Recently, many platforms have discouraged developers from engaging with sqlite directly, e.g. Android now directs developers to use their sqlite wrapper Room as it provides compile-time safety checks.

Room is neither Cross Platform nor Encrypted, and as such we will not consider it further.

H2 (as used in Briar)

We include a brief analysis of the H2 DBMS, as it is the primary database engine for Briar.

Requirement       Supports
Cross Platform    See Notes
Free Software     Yes
Daemonless        Yes
Encrypted         File based
Incremental       N/A
Memory Efficient  N/A

Briar breaks its data down into several distinct tables and queries (e.g. messages, groups, statuses) linked by secondary keys and indexes.

Note: Experimental bindings exist for Go, but not Rust. This makes H2 an unstable option for Cwtch.

Niche Databases

Chestnut

Requirement       Supports
Cross Platform    Yes
Free Software     Yes
Daemonless        Yes
Encrypted         Record-based
Incremental       N/A
Memory Efficient  N/A

Chestnut is an encrypted NoSQL storage library for Go which supports multiple backend databases, and supports saving and loading of tagged structs. We would need to use Chestnut with a Bolt backend, which does support Android.

Drawbacks: NoSQL means no queries. The project is also very new, with few (if any) real-world users, so we might find support and documentation lacking.

Rejected Options

The following is a longer list of solutions that were briefly considered but quickly rejected for failing to meet one of our core requirements:

  • bolt/bbolt - key/value database, nosql, no built-in encryption
  • immudb - all rows immutable, focused on clients checking the cryptographic integrity of rows.
  • NutsDB - nosql database, no built-in encryption, no Android support.
  • scribble - not cross-platform, not maintained.
  • tiedot - nosql, focused on document storage, no built-in encryption, not maintained.

Channel Structure

  • Chat - (mostly) linear flow of messages forming a conversation.
  • List - hierarchical todo list with task states, sub tasks.
  • Bulletin - hierarchical discussion forum with threads and sub-threads
  • Anonymous Voting - protocol based on a public bulletin board with encrypted messages

Open Questions

  1. Do we want one table per conversation OR one master table for each profile?

One table per conversation likely means better performance.

Hierarchical Message Database Design

Each Profile has a number of named tables, one for each conversation.

We will denote this table: profile.kv

Column   Type    Description
KeyType  string  Application Specific
Key      string  Reference to ID
Value    blob    Application Specific

We will denote this table: conversations

Column      Type     Description
ID          integer  Primary Key / AI
Handle      text     Contact Public Key / Group ID
Attributes  blob     K/V Store
ACL         blob     TBD

We will denote this table: <conversation_id>.channels

Column       Type     Description
ID           integer  Primary Key / AI
ChannelName  text     Application Specific
ChannelType  int      Application Specific
Attributes   blob     Application Specific

We will denote this table: <conversation_id>.<channel_id>.chat

Column      Type      Description
ID          integer   Primary Key / AI
Body        text      Application Specific
Attributes  blob      Application Specific
Expiry      DateTime  For pruning / disappearing messages
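The Expiry column is what makes pruning and disappearing messages a single statement rather than a log rotation. A minimal sketch follows, using Python's bundled sqlite3 module for illustration only. It assumes, since the design does not specify it, that Expiry is stored as a UTC "YYYY-MM-DD HH:MM:SS" string and that a NULL Expiry means the message never expires. The helper names `chat_table` and `prune_expired` are hypothetical.

```python
import sqlite3

def chat_table(conversation_id, channel_id):
    # Dots are not allowed in bare SQLite identifiers, so the name is quoted.
    # int() keeps anything other than numeric IDs out of the SQL text.
    return f'"{int(conversation_id)}.{int(channel_id)}.chat"'

def prune_expired(db, conversation_id, channel_id, now):
    """Delete messages whose Expiry is at or before `now`; return how many."""
    table = chat_table(conversation_id, channel_id)
    with db:  # commit on success, roll back on error
        cur = db.execute(
            f"DELETE FROM {table} WHERE Expiry IS NOT NULL AND Expiry <= ?",
            (now,),
        )
    return cur.rowcount

db = sqlite3.connect(":memory:")
table = chat_table(1, 1)
db.execute(
    f"CREATE TABLE {table} (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Body TEXT, Attributes BLOB, Expiry TEXT)"
)
with db:
    db.executemany(
        f"INSERT INTO {table} (Body, Expiry) VALUES (?, ?)",
        [
            ("old", "2021-01-01 00:00:00"),     # already expired
            ("keep", None),                     # assumed: never expires
            ("future", "2030-01-01 00:00:00"),  # not yet expired
        ],
    )

removed = prune_expired(db, 1, 1, "2021-10-21 00:00:00")
remaining = [row[0] for row in db.execute(f"SELECT Body FROM {table} ORDER BY ID")]
```

Timestamps in this fixed-width format compare correctly as plain strings, which is why the sketch can use `<=` directly.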

We will denote this table: <conversation_id>.<channel_id>.tree

Column      Type           Description
ID          integer        Primary Key / AI
Parent      integer / key  Reference to ID
Body        blob           Application Specific
Attributes  blob           Application Specific
Expiry      DateTime       For pruning / disappearing messages
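With a Parent column, a whole thread (a post plus all of its nested replies) can be fetched with a recursive query instead of loading the entire channel. The sketch below shows one way this might look, again using Python's bundled sqlite3 module only for illustration. It assumes top-level posts have a NULL Parent; the table name "1.2.tree" (conversation 1, channel 2) and the `thread` helper are hypothetical.

```python
import sqlite3

TREE = '"1.2.tree"'  # hypothetical: conversation 1, channel 2

db = sqlite3.connect(":memory:")
db.execute(
    f"CREATE TABLE {TREE} (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Parent INTEGER, Body BLOB, Attributes BLOB, Expiry TEXT)"
)
with db:
    db.executemany(
        f"INSERT INTO {TREE} (Parent, Body) VALUES (?, ?)",
        [
            (None, b"thread A"),   # ID 1, top-level (assumed: NULL parent)
            (1, b"reply A.1"),     # ID 2
            (2, b"reply A.1.1"),   # ID 3
            (None, b"thread B"),   # ID 4
            (1, b"reply A.2"),     # ID 5
        ],
    )

def thread(db, root_id):
    """Return (ID, depth) for root_id and every descendant, ordered by ID."""
    return db.execute(
        f"""
        WITH RECURSIVE subtree(ID, Depth) AS (
            SELECT ID, 0 FROM {TREE} WHERE ID = ?
            UNION ALL
            SELECT t.ID, s.Depth + 1
            FROM {TREE} AS t JOIN subtree AS s ON t.Parent = s.ID
        )
        SELECT ID, Depth FROM subtree ORDER BY ID
        """,
        (root_id,),
    ).fetchall()

thread_a = thread(db, 1)
thread_b = thread(db, 4)
```

An index on Parent would likely be needed for this to stay fast on large channels; that is left out of the sketch.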

We will denote this table: <conversation_id>.<channel_id>.list

Column      Type           Description
ID          integer        Primary Key / AI
Parent      integer / key  Reference to ID
SortOrder   integer        Application Specific
Body        blob           Application Specific
Attributes  blob           Application Specific
Expiry      DateTime       For pruning / disappearing messages
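SortOrder lets the application control how sibling entries are displayed independently of insertion order. A small sketch of how the children of a list entry might be read back, using Python's bundled sqlite3 module for illustration. The table name "1.3.list", the sample tasks and the `children` helper are hypothetical, as is the convention that top-level entries have a NULL Parent.

```python
import sqlite3

LIST = '"1.3.list"'  # hypothetical: conversation 1, channel 3

db = sqlite3.connect(":memory:")
db.execute(
    f"CREATE TABLE {LIST} (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Parent INTEGER, SortOrder INTEGER, Body BLOB, Attributes BLOB, Expiry TEXT)"
)
with db:
    db.executemany(
        f"INSERT INTO {LIST} (Parent, SortOrder, Body) VALUES (?, ?, ?)",
        [
            (None, 0, b"write design doc"),  # ID 1, top-level task
            (1, 1, b"pick a database"),      # ID 2, shown second
            (1, 0, b"list requirements"),    # ID 3, shown first
        ],
    )

def children(db, parent_id):
    """Return (ID, Body) of the entries under parent_id in display order."""
    if parent_id is None:
        where, args = "Parent IS NULL", ()
    else:
        where, args = "Parent = ?", (parent_id,)
    return db.execute(
        f"SELECT ID, Body FROM {LIST} WHERE {where} ORDER BY SortOrder, ID",
        args,
    ).fetchall()

top_level = children(db, None)
sub_tasks = children(db, 1)
```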

We will denote this table: <conversation_id>.<channel_id>.kv

Column   Type     Description
ID       integer  Primary Key / AI
KeyType  string   Application Specific
Key      string   Reference to ID
Value    blob     Application Specific
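To make the naming scheme above concrete, the following sketch creates the tables for one profile, one conversation and one chat channel in sqlite. It is only one possible reading of the design: the mapping from channel kind to a numeric ChannelType, the use of TEXT for the DateTime columns, and the helper names (`init_profile`, `add_conversation`, `add_channel`) are all assumptions. Python's bundled sqlite3 module is used for illustration.

```python
import sqlite3

ID_COL = "ID INTEGER PRIMARY KEY AUTOINCREMENT"

# Column definitions per channel kind, following the tables above.
CHANNEL_SCHEMAS = {
    "chat": f"{ID_COL}, Body TEXT, Attributes BLOB, Expiry TEXT",
    "tree": f"{ID_COL}, Parent INTEGER, Body BLOB, Attributes BLOB, Expiry TEXT",
    "list": f"{ID_COL}, Parent INTEGER, SortOrder INTEGER, Body BLOB, "
            "Attributes BLOB, Expiry TEXT",
    "kv": f'{ID_COL}, KeyType TEXT, "Key" TEXT, "Value" BLOB',
}
# Hypothetical ChannelType numbering: position of the kind in this list.
CHANNEL_TYPES = list(CHANNEL_SCHEMAS)

def init_profile(db):
    with db:
        db.execute('CREATE TABLE IF NOT EXISTS "profile.kv" '
                   '(KeyType TEXT, "Key" TEXT, "Value" BLOB)')
        db.execute(f"CREATE TABLE IF NOT EXISTS conversations "
                   f"({ID_COL}, Handle TEXT, Attributes BLOB, ACL BLOB)")

def add_conversation(db, handle):
    with db:
        cid = db.execute("INSERT INTO conversations (Handle) VALUES (?)",
                         (handle,)).lastrowid
        # Names containing dots must be double-quoted in sqlite.
        db.execute(f'CREATE TABLE "{cid}.channels" ({ID_COL}, '
                   "ChannelName TEXT, ChannelType INTEGER, Attributes BLOB)")
    return cid

def add_channel(db, cid, name, kind):
    channel_type = CHANNEL_TYPES.index(kind)  # raises ValueError if unknown
    with db:
        chid = db.execute(
            f'INSERT INTO "{int(cid)}.channels" (ChannelName, ChannelType) '
            "VALUES (?, ?)", (name, channel_type)).lastrowid
        db.execute(f'CREATE TABLE "{int(cid)}.{chid}.{kind}" '
                   f"({CHANNEL_SCHEMAS[kind]})")
    return chid

db = sqlite3.connect(":memory:")
init_profile(db)
cid = add_conversation(db, "hypothetical-contact-handle")
chid = add_channel(db, cid, "default", "chat")
tables = {row[0] for row in
          db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
```

Only integers and a whitelisted channel kind are ever interpolated into table names; user-supplied values such as the handle and channel name always go through bound parameters.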

Common Queries

N most recent messages: SELECT * FROM <conversation_id>.<channel_id>.chat ORDER BY id DESC LIMIT N;

Select a page of M messages, skipping the N most recent: SELECT * FROM <conversation_id>.<channel_id>.chat ORDER BY id DESC LIMIT N, M;

Count the number of messages: SELECT COUNT(*) FROM <conversation_id>.<channel_id>.chat;
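The queries above can be exercised against a small in-memory database. The sketch below uses Python's bundled sqlite3 module for illustration and a hypothetical table "1.1.chat" (conversation 1, channel 1); the helper names are likewise assumptions. Note that in sqlite the two-argument form `LIMIT a, b` skips `a` rows and returns `b`, so the paging helper uses the more explicit `LIMIT ... OFFSET ...` spelling.

```python
import sqlite3

CHAT = '"1.1.chat"'  # hypothetical: conversation 1, channel 1

db = sqlite3.connect(":memory:")
db.execute(
    f"CREATE TABLE {CHAT} (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Body TEXT, Attributes BLOB, Expiry TEXT)"
)
with db:
    db.executemany(
        f"INSERT INTO {CHAT} (Body) VALUES (?)",
        [(f"message {i}",) for i in range(1, 11)],  # IDs 1..10
    )

def recent(db, n):
    """The n most recent messages, newest first."""
    return db.execute(
        f"SELECT ID, Body FROM {CHAT} ORDER BY ID DESC LIMIT ?", (n,)
    ).fetchall()

def page(db, page_index, page_size):
    """Zero-based page of messages, newest first."""
    return db.execute(
        f"SELECT ID, Body FROM {CHAT} ORDER BY ID DESC LIMIT ? OFFSET ?",
        (page_size, page_index * page_size),
    ).fetchall()

def count(db):
    return db.execute(f"SELECT COUNT(*) FROM {CHAT}").fetchone()[0]

newest_ids = [row[0] for row in recent(db, 3)]
second_page_ids = [row[0] for row in page(db, 1, 3)]
total = count(db)
```

Only the requested rows are materialized, which is the property the Memory Efficient requirement asks for.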