Rust code platform - part 1
posted on 2024 Sep 22
A small project idea to have some fun with Rust and try to build something as close as possible to production-ready. This project also aims to dabble a bit with Rust on the web backend side.
Constraints
- Minimal requirements
- Observability
The idea I was thinking about is a simplified coding platform in the neetcode style. The system will be seeded with a set of problems, users can register, and admin users can also upload new problems (although authentication and authorization may be out of scope for the time being). Problems can be run inside self-contained environments (think Docker containers, maybe providing a set of pre-built images for each programming language offered; C, C++, Python and Go can be a good starting point).
Tech stack
Rust is the chosen language for this project. I'm not really familiar yet with the current trends and best practices; last time I checked, actix and sqlx were the backbone of any backend application. After some reconnaissance work, this is what I came up with:
- Actix
- Postgres
- Prometheus
- Grafana
We're probably gonna use Docker and docker-compose to run the entire environment locally in dev; it's a convenient way to emulate a staging/prod environment without the hassle of deploying stuff to a Heroku dyno or any other cloud provider.
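A minimal docker-compose sketch of what the dev environment could look like; service names, images, ports and the app build are assumptions at this stage, not final choices:

services:
  api:
    build: .                  # the Actix app, hypothetical Dockerfile
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://postgres:postgres@db:5432/platform
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: platform
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"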
Data model
The main entity in the application is represented by the problems. They will include a title, a description and some metadata such as category, difficulty and a per-language starting skeleton, something like:
{
  "title": "Two Sum",
  "description": "Given an array of integers `nums` and an integer `target`, return the indices `i` and `j` such that `nums[i] + nums[j] == target`..",
  "categories": ["arrays"],
  "difficulty": "easy",
  "starting_snippets": {
    "c": "int *two_sum(int *nums, size_t len, int target) ..",
    "python": "def two_sum(nums, target)..."
  },
  "solved": false
}
To start very simple they could be stored in a static JSON or YAML file on the filesystem, but I prefer to store them in the DB to allow easier querying and, at a later stage, extensibility and upload of new problems. Every problem can clearly have multiple solutions, so that would be another table joined to the problem as a has_many relationship.
To cut it short, a loosely detailed definition of the schemas we're going to need:
- User
  - email: string
  - nickname: string
  - has_many(Problem)
- Problem
  - title: string
  - description: string
  - categories: list(string)
  - difficulty: string
  - starting_snippets: map
  - test_cases: list(string)
  - solved: boolean
  - editorial: map
  - has_many(Solution)
- Solution
  - description: string
  - snippet: string
Keeping things simple as a first draft, we can assume that the image repository with various metadata (e.g. language versions, OS, etc.) can be stored locally in a static JSON or YAML file.
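A sketch of what that static file could look like; the fields and image names are assumptions at this stage:

{
  "images": [
    { "language": "c", "version": "c17", "os": "debian-bookworm", "image": "runner-c:latest" },
    { "language": "python", "version": "3.12", "os": "debian-bookworm", "image": "runner-python:latest" }
  ]
}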
APIs
It will be a pure REST application, no frontend at all, but it should provide all the endpoints required to be easily integrated into a JS app.
ASSUMPTION: we can start with a set of problems and editorials pre-seeded in the system; at a later stage we can decide to extend the functionality and allow user roles, auth, upload of new problems, maybe contests, and notifications to subscribed users.
User management
- POST /v1/users
- GET /v1/users/:user_id
Problems management
- GET /v1/problems/:problem_id
- GET /v1/problems
- POST /v1/problems/:problem_id
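As a sketch of the contract, user creation could look something like this; field names and the id format are assumptions at this stage:

POST /v1/users
{
  "email": "jane@example.com",
  "nickname": "jane"
}

201 Created
{
  "id": 1,
  "email": "jane@example.com",
  "nickname": "jane"
}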
Time-series adventures
posted on 2024 Apr 13
Databases, and how they interact with the filesystem at a low level, have always been a fascinating topic for me. I have implemented countless in-memory key-value stores at various levels of abstraction and various stages of development, in multiple languages: Scala, Python, Elixir, Go. Gradually going into more and more detail, it was finally time to try and write something a bit more challenging.
A relational database was too big a step to begin with, and persistence being the very first big problem I could see myself facing almost immediately, I wanted something that could be implemented on top of a simple but extremely versatile and powerful structure: the log. In my mind the system should've basically been a variation of a Kafka commit log, but instead of forwarding binary chunks to connected consumers (sendfile and how Kafka works internally is another quite interesting topic I explored and would like to dig into a bit more in the future), a time series seemed interesting and a good fit for my initial thoughts.
The programming language to write it in was the next decision to make; I considered a few choices:
- Elixir/Erlang - a marvel, fantastic back-end language, the BEAM is a piece of art; I use Elixir daily at work though, so I craved a change of scenery
- Go - great design, simplicity and productivity like no other, but also a bit boring
- Rust - a different beast from anything else, requires discipline and perseverance to retain productivity. I like it, but for its very nature and focus on safety, I always feel kinda constrained, fighting the compiler more than the problem I'm solving (gotta put some more effort into it to make it click completely). I will certainly pick it up again for something else in the future.
- C++ - Nope, this was a joke, didn’t consider this at all
Eventually I always find myself coming back to a place of comfort with C. I can't really say why; it's a powerful tool with many dangers, and when I used it for my toy projects in the past, I almost never felt confident enough to say "this could be used in a prod environment". I fought with myself on it for many years, learning Rust, flirting with Go, but I eventually gave up and embraced my comfort place. They're great languages: I used Go for work for a while, and Rust seems a good C++ replacement most of the time, provided you use it with consistency (the borrow checker and lifetimes can be hellish to deal with and re-learn if out of shape).
With C, I reckon it all boils down to its compactness: it's conceptually simple and leaves very little to the imagination about what's happening under the hood. I'm perfectly conscious of how easy it is to introduce memory-related bugs, and implementing the same dynamic array for each project can get old pretty fast; but what matters to me, ultimately, is having fun, and C provides me with that.
Design
Conceptually it's not a really efficient or particularly smart architecture; I approached the implementation in a best-effort way, the idea being to delay as much as possible the need for anything beyond basic data structures such as arrays.
The two main segments are just fixed-size arrays where each position stores a pointer to the first position of a dynamic array. This makes it possible to store all the data points included in the time range that the segment covers.
Each time a new record is to be inserted, it is first appended as a new entry in the Write-Ahead Log (WAL) and only then stored in the in-memory segment where it belongs. The WAL acts as a disaster-recovery mechanism; it is solely responsible for storing incoming records in order to be able to read them back and re-populate the segment on restarts.
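To make the in-memory layout concrete, here is a rough C sketch of a segment; names and fields are illustrative assumptions, not the actual definitions from the codebase:

#include <stddef.h>
#include <stdint.h>

#define SEGMENT_SPAN 900 /* 15 minutes, one slot per second */

/* A single data point: nanosecond-precision timestamp plus value */
typedef struct {
    uint64_t timestamp;
    double value;
} Point;

/* A dynamic array of the points falling within one second */
typedef struct {
    Point *points;
    size_t length;
    size_t capacity;
} PointArray;

/* A 15-minute in-memory segment: one slot per second, each pointing
 * to the dynamic array of points recorded during that second */
typedef struct {
    uint64_t base_offset;            /* timestamp (in seconds) of slot 0 */
    PointArray *slots[SEGMENT_SPAN]; /* NULL until a point lands there */
} Segment;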
In short:
- two segments of 15 minutes each
- each position represents a second, so 900 is the length of the segments
- each second points to a dynamic array storing points ordered by the sub-second (nanosecond) portion of their timestamp
- each time a new point is inserted, it is first stored on a WAL on disk in order to be able to recover in case of a crash
- once the main segment is full, or a timestamp too far in the future is received (i.e. more than 15 minutes past the 1st point stored):
  - the tail segment gets persisted on disk and becomes immutable, its WAL is cleared
  - the head segment gets persisted on disk and becomes immutable, its WAL is cleared
  - the head segment becomes the new tail segment
  - a new head in-memory segment is generated
- immutable segments on disk are paired with an index file to read records from the past
Although there are plenty of improvements, fixes and other things to take care of, this function is roughly the heart of the logic (there is a secondary piece of logic in here where we also flush to disk based on the WAL size; nothing of interest, glossed over for simplicity):
/*
 * Set a record in a timeseries.
 *
 * This function sets a record with the specified timestamp and value in the
 * given timeseries. The function handles the storage of records in memory and
 * on disk to ensure data integrity and efficient usage of resources.
 *
 * @param ts A pointer to the Timeseries structure representing the timeseries.
 * @param timestamp The timestamp of the record to be set, in nanoseconds.
 * @param value The value of the record to be set.
 * @return 0 on success, -1 on failure.
 */
int ts_insert(Timeseries *ts, uint64_t timestamp, double_t value) {
    // Extract seconds and nanoseconds from the timestamp
    uint64_t sec = timestamp / (uint64_t)1e9;
    uint64_t nsec = timestamp % (uint64_t)1e9;
    char pathbuf[MAX_PATH_SIZE];
    snprintf(pathbuf, sizeof(pathbuf), "%s/%s/%s", BASE_PATH, ts->db_data_path,
             ts->name);
    // If the limit is reached we dump the chunks to disk and create 2 new
    // ones
    if (wal_size(&ts->head.wal) >= TS_FLUSH_SIZE) {
        uint64_t base = ts->prev.base_offset > 0 ? ts->prev.base_offset
                                                 : ts->head.base_offset;
        size_t partition_nr = ts->partition_nr == 0 ? 0 : ts->partition_nr - 1;
        if (ts->partitions[partition_nr].clog.base_timestamp < base) {
            if (partition_init(&ts->partitions[ts->partition_nr], pathbuf,
                               base) < 0) {
                return -1;
            }
            partition_nr = ts->partition_nr;
            ts->partition_nr++;
        }
        // Dump chunks to disk and create new ones
        if (partition_flush_chunk(&ts->partitions[partition_nr], &ts->prev) < 0)
            return -1;
        if (partition_flush_chunk(&ts->partitions[partition_nr], &ts->head) < 0)
            return -1;
        // Reset and clean both head and prev in-memory chunks
        ts_deinit(ts);
    }
    // Let it crash for now if the timestamp is out of bounds in the
    // out-of-order (ooo) chunk
    if (sec < ts->head.base_offset) {
        // If the chunk is empty, it also means the base offset is 0, we set
        // it here with the first record inserted
        if (ts->prev.base_offset == 0)
            ts_chunk_init(&ts->prev, pathbuf, sec, 0);
        // Persist to disk for disaster recovery
        wal_append_record(&ts->prev.wal, timestamp, value);
        // If the record fits into the prev chunk, insert it there and return
        if (ts_chunk_record_fit(&ts->prev, sec) == 0)
            return ts_chunk_set_record(&ts->prev, sec, nsec, value);
    }
    if (ts->head.base_offset == 0)
        ts_chunk_init(&ts->head, pathbuf, sec, 1);
    // Persist to disk for disaster recovery
    wal_append_record(&ts->head.wal, timestamp, value);
    // Check if the timestamp is in range of the current chunk, otherwise
    // create a new in-memory segment
    if (ts_chunk_record_fit(&ts->head, sec) < 0) {
        // Flush the prev chunk to persistence
        if (partition_flush_chunk(&ts->partitions[ts->partition_nr],
                                  &ts->prev) < 0)
            return -1;
        // Clean up the prev chunk and delete its WAL
        ts_chunk_destroy(&ts->prev);
        wal_delete(&ts->prev.wal);
        // Set the current head as the new prev
        ts->prev = ts->head;
        // Reset the current head as a new head
        ts_chunk_destroy(&ts->head);
        wal_delete(&ts->head.wal);
    }
    // Insert it into the head chunk
    return ts_chunk_set_record(&ts->head, sec, nsec, value);
}
The current state
At the current stage of development, it's still a very crude core set of features, but it seems to be working as expected, with definitely many edge cases and assertions still to sort out; the heart of the DB is there, and it can be built into a dynamic library to be used by a server. The repository can be found at https://github.com/codepr/roach.
Main features
- Fixed-size records: to keep things simple, each record is represented by just a timestamp with nanosecond precision and a double
- In-memory segments: data is stored in time-series format, allowing efficient querying and retrieval based on timestamp, with the last slice of data kept in memory, composed of two segments (currently covering 15 minutes of data each):
  - the last 15 minutes of data
  - the previous 15 minutes, for records out of order, totaling 30 minutes
- Commit log: persistence is achieved using a commit log at the base, ensuring durability of data on disk
- Write-Ahead Log (WAL): in-memory segments are managed using a write-ahead log, providing durability and recovery in case of crashes or failures
What’s in the road map
- Duplicate points policy
- CRC32 of records for data integrity
- Adopt an arena for memory allocations
- Memory mapped indexes, above a threshold enable binary search
- Schema definitions
Timeseries library APIs
- tsdb_init(1) creates a new database
- tsdb_close(1) closes the database
- ts_create(3) creates a new Timeseries in a given database
- ts_get(2) retrieves an existing Timeseries from a database
- ts_insert(3) inserts a new point into the Timeseries
- ts_find(3) finds a point inside the Timeseries
- ts_range(4) finds a range of points in the Timeseries, returning a vector with the results
- ts_close(1) closes a Timeseries
Writing a Makefile
A simple Makefile to build the library as a .so file that can be linked to any project as an external lightweight dependency or used on its own:
CC=gcc
CFLAGS=-Wall -Wextra -Werror -Wunused -std=c11 -pedantic -ggdb -D_DEFAULT_SOURCE=200809L -Iinclude -Isrc
LDFLAGS=-L. -ltimeseries

LIB_SOURCES=src/timeseries.c src/partition.c src/wal.c src/disk_io.c src/binary.c src/logging.c src/persistent_index.c src/commit_log.c
LIB_OBJECTS=$(LIB_SOURCES:.c=.o)

libtimeseries.so: $(LIB_OBJECTS)
	$(CC) -shared -o $@ $(LIB_OBJECTS)

%.o: %.c
	$(CC) $(CFLAGS) -fPIC -c $< -o $@

.PHONY: clean
clean:
	@rm -f $(LIB_OBJECTS) libtimeseries.so
Building the library is a simple thing now, just a single make command; to link it to a main, just a one-liner:
gcc -o my_project main.c -I/path/to/timeseries/include -L/path/to/timeseries -ltimeseries
LD_LIBRARY_PATH=/path/to/timeseries ./my_project
to run the main. A basic example of interaction with the library:
#include "timeseries.h"
int main() {
// Initialize the database
Timeseries_DB *db = tsdb_init("testdb");
if (!db)
abort();
// Create a timeseries, retention is not implemented yet
Timeseries *ts = ts_create(db, "temperatures", 0, DP_IGNORE);
if (!ts)
abort();
// Insert records into the timeseries
ts_insert(&ts, 1710033421702081792, 25.5);
ts_insert(&ts, 1710033422047657984, 26.0);
// Find a record by timestamp
Record r;
int result = ts_find(&ts, 1710033422047657984, &r);
if (result == 0)
printf("Record found: timestamp=%lu, value=%.2lf\n", r.timestamp, r.value);
else
printf("Record not found.\n");
// Release the timeseries
ts_close(&ts);
// Close the database
tsdb_close(db);
return 0;
}
A server draft
Event-based server (relying on ev, at least initially), TCP as the main transport protocol, and a text-based custom protocol inspired by RESP but simpler:
- $ string type
- ! error type
- # array type
- : integer type
- ; float type
- \r\n delimiter
With the following encoding:
<type><length>\r\n<payload>\r\n
For example, a simple "Hello" string would be
$5\r\nHello\r\n
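A minimal C sketch of how such a string frame could be encoded; encode_string here is a hypothetical helper, not part of the actual codebase:

#include <stdio.h>
#include <string.h>

/* Encode a string payload into a $<length>\r\n<payload>\r\n frame.
 * Returns the number of bytes written, or -1 if the buffer is too small. */
static int encode_string(char *buf, size_t bufsize, const char *payload) {
    size_t len = strlen(payload);
    int n = snprintf(buf, bufsize, "$%zu\r\n%s\r\n", len, payload);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : n;
}

int main(void) {
    char frame[64];
    if (encode_string(frame, sizeof(frame), "Hello") > 0)
        printf("%s", frame); /* prints $5, CRLF, Hello, CRLF */
    return 0;
}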
Simple query language
Definition of a simple, text-based format for clients to interact with the server, allowing them to send commands and receive responses.
Basic outline
- Text-Based Format: Use a text-based format where each command and response is represented as a single line of text.
- Commands: Define a set of commands that clients can send to the server to perform various operations such as inserting data, querying data, and managing the database.
- Responses: Define the format of responses that the server sends back to clients after processing commands. Responses should provide relevant information or acknowledge the completion of the requested operation.
Core commands
Define the basic operations in a SQL-like query language; a few made-up example commands follow the list.

- CREATE - creates a database or a timeseries

  CREATE <database name>
  CREATE <timeseries name> INTO <database name> [<retention period>] [<duplication policy>]

- INSERT - insertion of point(s) in a timeseries

  INSERT <timeseries name> INTO <database name> <timestamp | *> <value>, ...

- SELECT - query a timeseries, selection of point(s) and aggregations

  SELECT <timeseries name> FROM <database name> AT/RANGE <start_timestamp> TO <end_timestamp> WHERE value [>|<|=|<=|>=|!=] <literal> AGGREGATE [AVG|MIN|MAX] BY <literal>

- DELETE - deletes a timeseries or a database

  DELETE <database name>
  DELETE <timeseries name> FROM <database name>
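A few made-up commands to give the flavor, following the grammar above (database and series names, the retention period in seconds and the IGNORE duplication policy are all placeholders):

CREATE load-db
CREATE cpu-usage INTO load-db 3600 IGNORE
INSERT cpu-usage INTO load-db * 0.73
SELECT cpu-usage FROM load-db RANGE 1710033421000000000 TO 1710033425000000000 WHERE value > 0.5 AGGREGATE AVG BY 60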
Flow:
- Client sends command: clients send commands to the server in the specified text format.
- Server parses command: the server parses the received command and executes the corresponding operation on the timeseries database.
- Server sends response: after processing the command, the server sends a response back to the client indicating the result of the operation or providing the requested data.
- Client processes response: clients receive and process the response from the server, handling success or error conditions accordingly.
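Putting the protocol and the query language together, a single round trip could look like this (a sketch; the OK acknowledgement is an assumption, not a defined response):

client: $36\r\nINSERT cpu-usage INTO load-db * 0.73\r\n
server: $2\r\nOK\r\n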