This is a multi-server system for broadcasting messages between clients inplemented in Java. The codes and basic introduction of the project can be found on my github repo here.
The message broadcasting is done by common implementation of the Socket/ServerSocket class and JSON object handling in Java, nothing too fancy. The code quality and structure can probably be improved quite a bit, like applying more software design pattern and making it more object-oriented, but that’s not the focus of this post.
The part I found interesting is to about maintaining high availability in a distributed system. According to the CAP theorem, a distributed system has to choose between availability and consistency in the presence of a network partition, as it’s impossible to guarantee both. We are required to focus on availability, i.e. to ensure servers can communicate with each other as soon as possible in the event of network partitioning.
Architecture of the system.
The system is implemented over the TCP protocol, meaning that it’s connection-oriented. Each server can receive multiple incoming connections from other servers or clients, but it can only make one outgoing connection to another server. In other words, the servers form a tree structure. Between any two nodes, there should always be only one connection, as more will result in abnormal behaviour of the broadcast protocol.
To achieve availability under network partition, I implemented a mechanism for servers to make outgoing connection to another server in the system, when the server it was originally connected to crashes.
The backup algorithm
In order to discuss the details of the algorithm, let’s call the servers that is making the outgoing connection children, and the servers that is receiving incoming connection parents, similar to a tree structure. We further define a backup server for each server, which is always the parent of its parent. The servers will follow the rules below:
- When a server detects that its parent crashes, it will try to connect to its backup server, and set that as its new parent.
- After a server connects to a new parent successfully, it will set up a new backup server, which is the parent of its new parent.
- Following the above, a server will broadcast a message to all of its children to inform them that its parent has changed.
- When a server receives a message that its parent’s parent has changed, it will change its backup server accordingly.
- A server would do nothing if any of its children crashes.
The idea is simple: to ensure that when a server’s parent crashes it will connect to its backup server, so that all the remaining servers in the system would form a single tree. However, what about the root of the tree which doesn’t have a parent? To solve this, we need a few extra special rules:
- The first two servers (the root and its first child) in a system will not have a backup server, i.e. their backup server is null, and they will set each other as their own parent.
- When a server without a backup detects that its parent crashes, it will choose one of its existing connected server as its parent.
- When a server A receives a message from its parent B, that B chooses A as its new parent, A will set its backup server to null. In other words, there will always be two servers without backup server in the system.
These might seem a bit complicated at a first glance, so an example with diagram would be helpful. In the following diagram, the arrows indicate a parent-child relationship, and the parent and backup server of each node is shown beside each node.
a. The initial state of the system b. The system after server A crashed
c. The system after server B crashed (D,E,F fisrt connect to C, then C chooses F as its new parent)
Hopefully, we can see from the above example how the system works. This mechanism ensures that a single tree structure is always maintained in the system, and there will be only one connection between any two nodes. Therefore, availability is achieved(to some extent) under netwrok partitioning.
Flaws
This mechanism assumes that only one server would crash at a time. Imagine the scenario where a server’s parent and backup both crashes at the same time, before the server can connect to the backup. Then the server would try to connect to a server(its backup) that’s no longer available, which would of course fail. Although this simultaneous crash is unlikely to happen, it’s still a flaw in the system.