Multi-server broadcast system

Multi-server system for broadcasting messages between clients
java

This is a multi-server system for broadcasting messages between clients inplemented in Java. The codes and basic introduction of the project can be found on my github repo here.

The message broadcasting is done by common implementation of the Socket/ServerSocket class and JSON object handling in Java, nothing too fancy. The code quality and structure can probably be improved quite a bit, like applying more software design pattern and making it more object-oriented, but that’s not the focus of this post.

The part I found interesting is to about maintaining high availability in a distributed system. According to the CAP theorem, a distributed system has to choose between availability and consistency in the presence of a network partition, as it’s impossible to guarantee both. We are required to focus on availability, i.e. to ensure servers can communicate with each other as soon as possible in the event of network partitioning.

Architecture of the system.

The system is implemented over the TCP protocol, meaning that it’s connection-oriented. Each server can receive multiple incoming connections from other servers or clients, but it can only make one outgoing connection to another server. In other words, the servers form a tree structure. Between any two nodes, there should always be only one connection, as more will result in abnormal behaviour of the broadcast protocol.

To achieve availability under network partition, I implemented a mechanism for servers to make outgoing connection to another server in the system, when the server it was originally connected to crashes.

The backup algorithm

In order to discuss the details of the algorithm, let’s call the servers that is making the outgoing connection children, and the servers that is receiving incoming connection parents, similar to a tree structure. We further define a backup server for each server, which is always the parent of its parent. The servers will follow the rules below:

The idea is simple: to ensure that when a server’s parent crashes it will connect to its backup server, so that all the remaining servers in the system would form a single tree. However, what about the root of the tree which doesn’t have a parent? To solve this, we need a few extra special rules:

These might seem a bit complicated at a first glance, so an example with diagram would be helpful. In the following diagram, the arrows indicate a parent-child relationship, and the parent and backup server of each node is shown beside each node.


server state graph 1           server state graph 2

a. The initial state of the system                                     b. The system after server A crashed


server state graph 3

c. The system after server B crashed (D,E,F fisrt connect to C, then C chooses F as its new parent)

Hopefully, we can see from the above example how the system works. This mechanism ensures that a single tree structure is always maintained in the system, and there will be only one connection between any two nodes. Therefore, availability is achieved(to some extent) under netwrok partitioning.

Flaws

This mechanism assumes that only one server would crash at a time. Imagine the scenario where a server’s parent and backup both crashes at the same time, before the server can connect to the backup. Then the server would try to connect to a server(its backup) that’s no longer available, which would of course fail. Although this simultaneous crash is unlikely to happen, it’s still a flaw in the system.

comments powered by Disqus