Type-safe and fault-tolerant mesh services with Rust

Speaker: Nikita Lapkov

Description

This talk covers how Rust enabled us to transform our actor system, elfo, into a distributed mesh of services.

Elfo started out as an async Rust actor system in which all actors lived on a single node. It was created to serve the extremely I/O-heavy workloads of the high-frequency trading industry, with a focus on developer ergonomics and performance. As the trading business grew, a single-node deployment could no longer meet the latency requirements of connecting to different exchanges, and the need for a distributed deployment arose.

The way messages are delivered is opaque to actors, since they send and receive through the API provided by elfo. All messages are also defined as Rust structs, which we have complete control over. This means that if we “just” implemented a network transport for delivering messages, two actors living on different nodes could talk to each other as if they were on the same node.
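The idea of a delivery mechanism that is opaque to the sender can be illustrated with a small sketch. This is not elfo's API: the `Quote` message, the `Router` type, the `Route` enum and the toy "symbol|price" wire format are all hypothetical names invented for illustration, and an in-process byte channel stands in for a real network socket. The point is only that the caller uses the same `send` call whether the destination is local or remote.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical message type, defined as a plain Rust struct.
#[derive(Debug, Clone, PartialEq)]
pub struct Quote {
    pub symbol: String,
    pub price_cents: u64,
}

impl Quote {
    /// Toy wire encoding ("symbol|price"), a stand-in for a real serialisation format.
    pub fn encode(&self) -> Vec<u8> {
        format!("{}|{}", self.symbol, self.price_cents).into_bytes()
    }

    /// Inverse of `encode`; returns `None` on malformed input.
    pub fn decode(bytes: &[u8]) -> Option<Quote> {
        let text = std::str::from_utf8(bytes).ok()?;
        let (symbol, price) = text.rsplit_once('|')?;
        Some(Quote {
            symbol: symbol.to_string(),
            price_cents: price.parse().ok()?,
        })
    }
}

/// Where a destination lives, as seen by the router (assumed design).
enum Route {
    /// Same node: the struct is handed over as-is.
    Local(Sender<Quote>),
    /// Other node: the struct is serialised; the byte channel mimics a socket.
    Remote(Sender<Vec<u8>>),
}

pub struct Router {
    routes: HashMap<String, Route>,
}

impl Router {
    pub fn new() -> Self {
        Router { routes: HashMap::new() }
    }

    /// Registers a local destination and returns its mailbox.
    pub fn add_local(&mut self, name: &str) -> Receiver<Quote> {
        let (tx, rx) = channel();
        self.routes.insert(name.to_string(), Route::Local(tx));
        rx
    }

    /// Registers a remote destination and returns the "wire" that would feed the network.
    pub fn add_remote(&mut self, name: &str) -> Receiver<Vec<u8>> {
        let (tx, rx) = channel();
        self.routes.insert(name.to_string(), Route::Remote(tx));
        rx
    }

    /// The sender calls this the same way for both kinds of destination.
    /// Returns `false` if the destination is unknown or its receiver is gone.
    pub fn send(&self, to: &str, msg: Quote) -> bool {
        match self.routes.get(to) {
            Some(Route::Local(tx)) => tx.send(msg).is_ok(),
            Some(Route::Remote(tx)) => tx.send(msg.encode()).is_ok(),
            None => false,
        }
    }
}
```

In this sketch the sending code never inspects `Route`, so moving a destination from `add_local` to `add_remote` changes nothing at the call site. The receiving node would call `Quote::decode` on the bytes it reads to recover the original struct.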

The reality is, of course, not so simple. The talk will dive deep into how we chose multiple formats for message serialisation, implemented message compression while balancing compression ratio against latency, and implemented back-pressure to keep fast actors from overwhelming slow ones, all hidden behind just a few Rust macros and API calls. The talk will also cover how we leverage total control of the transport to make everything observable and debuggable.
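Back-pressure in its simplest form can be sketched with a bounded queue from the Rust standard library. This is only an illustration of the general technique, not a description of how elfo implements it: the function name `run_with_backpressure`, the `u32` payload and the 1 ms processing delay are assumptions made for the example. When the queue is full, the fast producer blocks until the slow consumer catches up, so no more than `capacity` items are ever buffered.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;
use std::time::Duration;

/// Sends `total` items from a fast producer to a slow consumer through a
/// bounded queue. Returns the items in the order the consumer received them
/// and the number of times the producer found the queue full.
pub fn run_with_backpressure(capacity: usize, total: u32) -> (Vec<u32>, u32) {
    // A bounded channel: at most `capacity` items are buffered at once.
    let (tx, rx) = sync_channel::<u32>(capacity);

    // Slow consumer: simulates per-message work with a short sleep.
    let consumer = thread::spawn(move || {
        let mut seen = Vec::new();
        for item in rx {
            thread::sleep(Duration::from_millis(1));
            seen.push(item);
        }
        seen
    });

    // Fast producer: tries a non-blocking send first so stalls can be counted.
    let mut stalls = 0;
    for i in 0..total {
        match tx.try_send(i) {
            Ok(()) => {}
            Err(TrySendError::Full(item)) => {
                stalls += 1;
                // Blocking send: the producer waits here until there is room.
                tx.send(item).expect("consumer is still running");
            }
            Err(TrySendError::Disconnected(_)) => break,
        }
    }

    // Dropping the sender closes the channel and ends the consumer loop.
    drop(tx);
    (consumer.join().expect("consumer panicked"), stalls)
}
```

Calling `run_with_backpressure(2, 20)` delivers every item in order. The stall count depends on thread timing, so it is reported rather than fixed; a distributed version would additionally have to carry the "queue is full" signal across the network.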

Track

Technology & Community

Level

Intermediate

