Table of Contents

Persistence

Secondary Requirements

Investigating Implementations

Generic Database Solutions
sqlite

A note on room

H2 (as used in Briar)

Niche Databases
Chestnut
Rejected Options

Channel Structure
Open Questions
Hierarchical Message Database Design

Common Queries

Persistence

Introduction

Cwtch currently makes limited use of custom encrypted storage to store profile structs and conversation timelines.

Every time a Profile is updated (new contacts, new messages, new attributes etc.) the entire structure must be serialized, encrypted and rewritten to disk. Because profiles tend to be limited in size, this performance is currently acceptable.

For conversation timelines, each new message is appended to an encrypted log which is occasionally rotated, and the oldest messages are deleted entirely. This implementation has several drawbacks:

In order to be accessible, the entire conversation is loaded into memory at startup.
Updating any attributes associated with a message requires re-encrypting the entire conversation.

Further, our current storage engine is built upon the Cwtch event bus, rather than being built into our core Cwtch Peer object. This means that at any given time at least 2 versions of a profile are loaded into memory, including conversation timelines. This requires non-trivial code updates to keep each store in sync, and is an inefficient use of memory overall.

Finally, we have a desire to support more exotic conversation structures within Cwtch in the near future (e.g. lists, bulletin boards, anonymous voting etc.) and the current storage engine is incompatible with initial designs.

The purpose of this document is to explore requirements for a new storage engine, investigate potential implementations and document the eventual decision.

Requirements for a New Storage Engine

The following are core requirements that any solution must meet:

Cross Platform - Cwtch currently runs on Windows, Mac, Linux and Android. Any solution needs to run on those operating systems at a minimum. We also require that any bindings be available in Go and Rust.
Free Software - We don't use proprietary software in Cwtch.
Daemon-less - The database itself must be able to run as part of a larger application and must not require the existence of a server, daemon or any external interface.

Secondary Requirements

The following are required for the final solution, but may be obtained using application-level code and do not necessarily need to be part of the core database:

Encrypted - it is essential that any adversary with file system or disk read permissions be unable to access Cwtch data.
Incremental - storing small updates to stored structures should not require large updates, i.e. re-encryption of previously stored data should be rare and limited to special events (e.g. password changes).
Memory Efficient - it should be possible to load small subsets of stored structures into memory without requiring the entire structure to be decrypted/deserialized.

Note: While physical adversaries are not considered in scope in the Cwtch risk model, we do make it a goal that such attacks should not be trivial.
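To make the Incremental and Memory Efficient requirements concrete, here is a minimal sketch using Python's built-in sqlite3 module. Python is used only as a convenient harness for illustration; the table name `messages` and its columns are hypothetical and not part of any Cwtch schema, and no encryption is applied. It shows a single-row update and a bounded read, in contrast to rewriting or loading an entire timeline.

```python
import sqlite3

# Hypothetical schema for illustration only; not the Cwtch design.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT, attributes TEXT)"
)
conn.executemany(
    "INSERT INTO messages (body, attributes) VALUES (?, ?)",
    [(f"message {i}", "{}") for i in range(1000)],
)

# Incremental: changing one message's attributes touches one row,
# not the whole conversation.
cur = conn.execute(
    "UPDATE messages SET attributes = ? WHERE id = ?", ('{"ack": true}', 42)
)
rows_changed = cur.rowcount

# Memory efficient: only the 10 most recent messages are materialized.
recent = conn.execute(
    "SELECT id, body FROM messages ORDER BY id DESC LIMIT 10"
).fetchall()
conn.commit()
```

Whether the bytes written to disk stay small also depends on how encryption is layered on top, which this sketch does not model.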

Investigating Implementations

We can split our considerations into two categories: generic database implementations requiring a tailored solution, and niche databases that we could mold our own solution around.

Generic Database Solutions

sqlite

Requirement	Supports
Cross Platform	Yes
Free Software	Yes
Daemonless	Yes
Encrypted	Page-based, via extension
Incremental	Custom
Memory Efficient	Custom

We already use sqlite in the Cwtch server implementation. It is cross-platform and memory efficient.

Extensions such as SqlCipher (Go bindings: https://github.com/xeodou/go-sqlcipher) provide 256-bit AES encryption of database files.

In SqlCipher, encryption is applied to database pages rather than to rows or records. SqlCipher manages access to, and freeing of, plaintext pages. The actual encryption is provided by well-exercised libraries like libcrypto.

As with all generic solutions, basing the new storage engine on sqlite would require us to invest time in database and structure design to ensure performance and future extensibility.

A note on room

Requirement	Supports
Cross Platform	No
Free Software	Yes
Daemonless	Yes
Encrypted	No
Incremental	N/A
Memory Efficient	N/A

Recently, many platforms have discouraged developers from engaging with sqlite directly, e.g. Android now directs developers to use its sqlite wrapper Room, as it provides compile-time safety checks.

Room is neither cross-platform nor encrypted, and as such we will not consider it further.

H2 (as used in Briar)

We include a brief analysis of the H2 DBMS, as it is the primary database engine for Briar.

Requirement	Supports
Cross Platform	See Notes
Free Software	Yes
Daemonless	Yes
Encrypted	File based
Incremental	N/A
Memory Efficient	N/A

Briar breaks down their app into several distinct tables and queries e.g. messages, groups, statuses linked by secondary keys and indexes.

Note: Experimental bindings exist for Go, but not for Rust. This would make H2 an unstable option for Cwtch.

Niche Databases

Chestnut

Requirement	Supports
Cross Platform	Yes
Free Software	Yes
Daemonless	Yes
Encrypted	Record-based
Incremental	N/A
Memory Efficient	N/A

Chestnut is an encrypted nosql store for Go which supports multiple backend databases and the saving and loading of tagged structs. We would need to use Chestnut with a Bolt backend, which does support Android.

Drawbacks: nosql means no queries. The project is also very new, with few (if any) real-world users, and we might find support and documentation lacking.

Rejected Options

The following is a longer list of solutions that were briefly considered, but quickly rejected for failing to meet one of our core requirements:

bolt/bbolt - key/value database, nosql, no built-in encryption
immudb - all rows immutable, focused on clients checking the cryptographic integrity of rows.
NutsDB - nosql database, no built-in encryption, no Android support.
scribble - not cross-platform, not maintained.
tiedot - nosql, focused on document storage, no built-in encryption, not maintained.

Channel Structure

Chat - (mostly) linear flow of messages forming a conversation.
List - hierarchical todo list with task states, sub tasks.
Bulletin - hierarchical discussion forum with threads and sub-threads.
Anonymous Voting - protocol based on public bulletin board with encrypted messages.

Open Questions

Do we want one table per conversation OR one master table for each profile?

One table per conversation likely means better performance.
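The two layouts in this open question can be sketched side by side. This is a minimal illustration using Python's built-in sqlite3 module; the table names `conversation_<id>` and `all_messages`, the index name, and the helper `create_conversation_table` are all hypothetical. It shows only the shape of the schema and queries and does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


# Option 1: one table per conversation. Table names cannot be bound as
# query parameters, so the name is built from an integer ID and quoted.
def create_conversation_table(conn, conversation_id):
    name = f"conversation_{int(conversation_id)}"
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{name}" '
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
    )
    return name


# Option 2: one master table per profile, with the conversation stored
# as a column and covered by an index.
conn.execute(
    "CREATE TABLE all_messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "conversation_id INTEGER NOT NULL, body TEXT)"
)
conn.execute(
    "CREATE INDEX idx_all_messages_conversation "
    "ON all_messages (conversation_id, id)"
)

t1 = create_conversation_table(conn, 1)
conn.execute(f'INSERT INTO "{t1}" (body) VALUES (?)', ("hello",))
conn.execute(
    "INSERT INTO all_messages (conversation_id, body) VALUES (?, ?)",
    (1, "hello"),
)

# The same "latest message" lookup under each layout.
per_table = conn.execute(
    f'SELECT body FROM "{t1}" ORDER BY id DESC LIMIT 1'
).fetchall()
master = conn.execute(
    "SELECT body FROM all_messages WHERE conversation_id = ? "
    "ORDER BY id DESC LIMIT 1",
    (1,),
).fetchall()
```

With one table per conversation, deleting a conversation is a single DROP TABLE; with a master table, every query carries a conversation filter and relies on the index.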

Hierarchical Message Database Design

Each Profile has a number of named tables, one for each conversation.

We will denote this table: profile.kv

Column	Type	Description
KeyType	string	Application Specific
Key	string	Reference to ID
Value	blob	Application Specific

We will denote this table: conversations

Column	Type	Description
ID	integer	Primary Key / AI
Handle	text	Contact Public Key / Group ID
Attributes	blob	K/V Store
ACL	blob	TBD

We will denote this table: <conversation_id>.channels

Column	Type	Description
ID	integer	Primary Key / AI
ChannelName	text	Application Specific
ChannelType	int	Application Specific
Attributes	blob	Application Specific

We will denote this table: <conversation_id>.<channel_id>.chat

Column	Type	Description
ID	integer	Primary Key / AI
Body	text	Application Specific
Attributes	blob	Application Specific
Expiry	DateTime	For pruning / disappearing messages
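As an illustration of how the Expiry column could drive pruning, here is a minimal sketch using Python's built-in sqlite3 module. It assumes Expiry is stored as a Unix timestamp in seconds and that NULL means "never expires"; both are assumptions, as are the concrete table name `"1.2.chat"` and the helper `prune_expired`. Note that a dotted table name must be quoted in SQLite, otherwise the prefix is parsed as a schema name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
TABLE = '"1.2.chat"'  # hypothetical <conversation_id>.<channel_id>.chat
conn.execute(
    f"CREATE TABLE {TABLE} ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Body TEXT, Attributes BLOB, Expiry INTEGER)"
)


def prune_expired(conn, now):
    """Delete messages whose Expiry has passed; return how many were removed."""
    cur = conn.execute(
        f"DELETE FROM {TABLE} WHERE Expiry IS NOT NULL AND Expiry <= ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


now = 1_700_000_000  # fixed timestamp so the example is deterministic
conn.executemany(
    f"INSERT INTO {TABLE} (Body, Attributes, Expiry) VALUES (?, ?, ?)",
    [
        ("already expired", b"", now - 60),
        ("disappears later", b"", now + 3600),
        ("kept forever", b"", None),
    ],
)

deleted = prune_expired(conn, now)
remaining = [row[0] for row in conn.execute(f"SELECT Body FROM {TABLE} ORDER BY ID")]
```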

We will denote this table: <conversation_id>.<channel_id>.tree

Column	Type	Description
ID	integer	Primary Key / AI
Parent	integer / key	Reference to ID
Body	blob	Application Specific
Attributes	blob	Application Specific
Expiry	DateTime	For pruning / disappearing messages
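A Parent column of this kind allows a whole thread to be fetched with a recursive query. The following is a minimal sketch using Python's built-in sqlite3 module and SQLite's WITH RECURSIVE support. It assumes that a NULL Parent marks a top-level thread; the table name `"1.3.tree"`, the sample rows and the helper `load_thread` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
TABLE = '"1.3.tree"'  # hypothetical <conversation_id>.<channel_id>.tree
conn.execute(
    f"CREATE TABLE {TABLE} ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Parent INTEGER, "  # assumption: NULL marks a top-level thread
    "Body BLOB, Attributes BLOB, Expiry INTEGER)"
)
conn.executemany(
    f"INSERT INTO {TABLE} (Parent, Body) VALUES (?, ?)",
    [
        (None, b"thread A"),            # ID 1
        (1, b"reply to A"),             # ID 2
        (2, b"reply to the reply"),     # ID 3
        (None, b"thread B"),            # ID 4
        (1, b"second reply to A"),      # ID 5
    ],
)


def load_thread(conn, root_id):
    """Return (ID, Depth) for a message and all of its descendants."""
    query = f"""
        WITH RECURSIVE thread(ID, Depth) AS (
            SELECT ID, 0 FROM {TABLE} WHERE ID = ?
            UNION ALL
            SELECT t.ID, thread.Depth + 1
            FROM {TABLE} AS t JOIN thread ON t.Parent = thread.ID
        )
        SELECT ID, Depth FROM thread ORDER BY ID
    """
    return conn.execute(query, (root_id,)).fetchall()


thread_a = load_thread(conn, 1)
thread_b = load_thread(conn, 4)
```

Only the rows belonging to the requested thread are read, so other threads in the same channel never need to be loaded.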

We will denote this table: <conversation_id>.<channel_id>.list

Column	Type	Description
ID	integer	Primary Key / AI
Parent	integer / key	Reference to ID
SortOrder	integer	Application Specific
Body	blob	Application Specific
Attributes	blob	Application Specific
Expiry	DateTime	For pruning / disappearing messages
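One possible reading of SortOrder is as a per-parent position used to order sibling tasks. The sketch below, using Python's built-in sqlite3 module, illustrates that reading only; the table is described above as application specific, so the semantics here are an assumption, as are the table name `"1.4.list"` and the helpers `children` and `move_to`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
TABLE = '"1.4.list"'  # hypothetical <conversation_id>.<channel_id>.list
conn.execute(
    f"CREATE TABLE {TABLE} ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, Parent INTEGER, "
    "SortOrder INTEGER NOT NULL DEFAULT 0, "
    "Body BLOB, Attributes BLOB, Expiry INTEGER)"
)
conn.executemany(
    f"INSERT INTO {TABLE} (Parent, SortOrder, Body) VALUES (?, ?, ?)",
    [(None, 0, b"groceries"), (1, 0, b"milk"), (1, 1, b"bread"), (1, 2, b"eggs")],
)


def children(conn, parent_id):
    """Return the bodies of a task's direct sub-tasks in display order."""
    rows = conn.execute(
        # IS (rather than =) also matches NULL, i.e. top-level tasks.
        f"SELECT Body FROM {TABLE} WHERE Parent IS ? ORDER BY SortOrder, ID",
        (parent_id,),
    )
    return [row[0] for row in rows]


def move_to(conn, item_id, parent_id, position):
    """Move a sub-task to `position` by shifting later siblings down.

    This can leave gaps in SortOrder, which is harmless for ordering.
    """
    with conn:  # both updates commit together
        conn.execute(
            f"UPDATE {TABLE} SET SortOrder = SortOrder + 1 "
            "WHERE Parent IS ? AND SortOrder >= ?",
            (parent_id, position),
        )
        conn.execute(
            f"UPDATE {TABLE} SET SortOrder = ? WHERE ID = ?",
            (position, item_id),
        )


before = children(conn, 1)
move_to(conn, 4, 1, 0)  # move "eggs" to the top of "groceries"
after = children(conn, 1)
```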

We will denote this table: <conversation_id>.<channel_id>.kv

Column	Type	Description
ID	integer	Primary Key / AI
KeyType	string	Application Specific
Key	string	Reference to ID
Value	blob	Application Specific

Common Queries

N most recent messages: SELECT * FROM <conversation_id>.<channel_id>.chat ORDER BY id DESC LIMIT N;

Select a page of M messages, skipping the first N results: SELECT * FROM <conversation_id>.<channel_id>.chat ORDER BY id DESC LIMIT N,M;

Count the number of messages: SELECT COUNT(*) FROM <conversation_id>.<channel_id>.chat;
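The queries above can be exercised with the following minimal sketch, which uses Python's built-in sqlite3 module against a hypothetical table named `"1.2.chat"`. The helper names `most_recent`, `page` and `count` are illustrative only. Two SQLite details are worth noting: a dotted table name has to be quoted, and in the comma form `LIMIT a, b` the first number is the offset and the second is the row count, so the sketch uses the more explicit `LIMIT ... OFFSET ...` form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
TABLE = '"1.2.chat"'  # hypothetical <conversation_id>.<channel_id>.chat
conn.execute(
    f"CREATE TABLE {TABLE} ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Body TEXT, Attributes BLOB, Expiry INTEGER)"
)
conn.executemany(
    f"INSERT INTO {TABLE} (Body) VALUES (?)",
    [(f"message {i}",) for i in range(1, 26)],  # IDs 1..25
)


def most_recent(conn, n):
    """IDs of the n most recent messages, newest first."""
    rows = conn.execute(f"SELECT ID FROM {TABLE} ORDER BY ID DESC LIMIT ?", (n,))
    return [row[0] for row in rows]


def page(conn, page_index, page_size):
    """IDs on the zero-based page `page_index`, newest first."""
    rows = conn.execute(
        f"SELECT ID FROM {TABLE} ORDER BY ID DESC LIMIT ? OFFSET ?",
        (page_size, page_index * page_size),
    )
    return [row[0] for row in rows]


def count(conn):
    """Total number of messages in the channel."""
    return conn.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]


latest = most_recent(conn, 3)
second_page = page(conn, 1, 10)
total = count(conn)
```

Offset-based paging is simple but shifts when new messages arrive between requests; paging by "ID less than the last ID seen" is a common alternative if that matters.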