10 tips for building “Purely” horizontal scalable Architectures

Braga J
Published in Francium Tech
Mar 20, 2018

To scale horizontally (or scale out/in) means to add more nodes to (or remove nodes from) a system, such as adding a new node or a computer to a distributed software application.

This type of scaling should increase the overall throughput of the application. I say should, because this may not always be the case. Adding a node to the list of available nodes always carries some overhead. The more nodes you add horizontally, the greater the risk of pummelling a shared system somewhere else in your architecture, which could choke and die because of the addition.

The following graph shows how typical horizontal scaling affects the throughput.

The blue bars are each node’s contribution to the horizontal system, whereas the red bars are the overall throughput.

Adding extra nodes does not always improve the throughput; in some cases it is detrimental. Such an architecture costs 10 times as much as a single node but does not perform 10 times faster. Even worse, there can be times when adding more nodes brings the system’s throughput to zero, as shown below.

The throughput (red bar) becomes zero after the addition of “Node 13”. Adding further nodes contributes nothing to the system.

Enter “purely” horizontal scalable Architecture

If horizontal scaling is meant to improve the throughput, a “purely” horizontal scalable architecture improves it with zero or only a minuscule increase in overhead. This means that if one node can complete 10 tasks in 1 hour and we add 9 more nodes, the system can complete 100 tasks in the same hour.
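The arithmetic above can be sketched as a small model. The rate of 10 tasks per hour per node comes from the example; the `overhead_per_node` parameter and the linear penalty it applies are assumptions made purely for illustration, not a measured scaling law.

```python
def throughput(nodes: int, rate_per_node: float = 10.0,
               overhead_per_node: float = 0.0) -> float:
    """Tasks per hour completed by `nodes` identical nodes.

    Assumed, simplified model: every node loses a fraction
    `overhead_per_node * (nodes - 1)` of its own rate to coordination
    with the other nodes. With zero overhead the scaling is "pure".
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # Efficiency never drops below zero: a choked system does no work.
    efficiency = max(0.0, 1.0 - overhead_per_node * (nodes - 1))
    return nodes * rate_per_node * efficiency


if __name__ == "__main__":
    for n in (1, 5, 9, 10):
        pure = throughput(n)
        contended = throughput(n, overhead_per_node=0.125)
        print(f"{n:>2} nodes: pure={pure:6.1f}  contended={contended:6.1f}")
```

With zero overhead, ten nodes deliver ten times the work of one. With any non-zero overhead in this model, throughput peaks and then falls back towards zero as nodes are added, which is the failure mode described earlier.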

Continuing with the same graphs, a purely horizontal architecture will look like this:

A Purely horizontal scalable Architecture

This is exactly the type of architecture every system should try to achieve.

Factors that would improve the pure scalability

Now that we have understood, at a very high level, what effect horizontal scaling has on the throughput, let us look at some of the parameters we can tune to make a system purely horizontal.

  1. List out all the bottlenecks: At the beginning of your design you should know, at least in theory, which resources will be shared once you scale horizontally. A simple example is a database. Assuming each node needs 50 connections and the database can provide a maximum of 500, you can scale your system to only 10 nodes. By the time you add the 11th node, you will have exhausted all the connections and entered a “contention phase”, which will certainly have a detrimental effect on the overall throughput.
  2. Make dependencies atomic: After identifying the bottlenecks, a.k.a. shared resources, see if you can minimise how often each individual node communicates with them. In the best case, see if you can eliminate that communication altogether.
  3. Caching: Caching, especially a distributed cache that is scalable by design, eases the pressure on the individual nodes. A lot of caching is available out of the box. One example that comes to mind is Amazon ElastiCache.
  4. Framework’s role: This topic is often ignored, but choosing the right stack/framework is vital when you are trying to go purely horizontal. For instance, a framework like Elasticsearch has all the ingredients built in for scaling out. Still, choosing a framework that advertises itself as scale friendly is not going to help unless you know exactly what you are doing.
  5. Simulation Driven/Visualise: You should have a load script or a test script ready to simulate your horizontal scalability. It is very difficult to know what will or will not break until you increase the load to the required thresholds. You also need to be able to see visually how the scaling affects the throughput, so building such a feature beforehand is extremely handy.
  6. Unit Test cases: Some might find it strange to need unit test cases when scaling infrastructure. The simple answer is that, whether or not you write code for scale, unit test cases are extremely important. They have saved me millions of times and have proved extremely useful when tuning different variables of the system without compromising functionality.
  7. Talk to other highly scalable systems: When you want to up the ante and go purely scalable, the systems you depend on need to be in the same league. Make sure you know exactly “whom” you are committing to. Often, you can delegate your throughput issues to these systems and be at peace.
  8. Deployment: Having the ability to scale is not enough; you should also be able to deploy with minimum effort. The preparation you do before scaling out should be as minimal as possible, otherwise it will become a headache in itself. Frameworks like Rancher give superb support in making deployments and scaling painless.
  9. Error handling: When you scale out your systems to 250,000 independent processes (as we literally did), it is vital that you capture any errors that occur during this insane scale period. Otherwise you will never know what is going wrong where, and you will end up spending a lot of time debugging. Often a fraction of that debugging time is enough to build a fool-proof error handling system.
  10. Going Serverless: See if you can move your problem set to a serverless architecture. Serverless platforms provide a lot of advantages and support this type of scaling out of the box. But they aren’t a breeze either; they come with their own drawbacks such as cold starts, limited language support, code size limitations and library dependencies. Check out our cost study for one of the use cases we picked for evaluating its pros and cons.
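The connection budget from tip 1 can be checked with a few lines of arithmetic before any node is deployed. This is a minimal sketch: the function name `max_nodes` and the `reserved` parameter (connections held back for administration or monitoring) are hypothetical and not part of the original example.

```python
def max_nodes(db_max_connections: int, connections_per_node: int,
              reserved: int = 0) -> int:
    """Largest node count that stays within the shared connection limit.

    `reserved` is an assumed allowance for connections that the
    application nodes must not consume.
    """
    if connections_per_node <= 0:
        raise ValueError("connections_per_node must be positive")
    usable = max(0, db_max_connections - reserved)
    # Integer division: a node that cannot get its full share does not fit.
    return usable // connections_per_node


if __name__ == "__main__":
    limit = max_nodes(db_max_connections=500, connections_per_node=50)
    print(f"The database supports at most {limit} nodes before contention")
```

Running the same calculation for every shared resource (queues, file handles, third-party rate limits) gives a rough ceiling for each, and the lowest ceiling is the bottleneck to remove first.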

The above factors are not exhaustive, of course, and there are various other factors too, but these should cover the main pieces required to build purely horizontal architectures.

Francium Tech is a technology company laser focussed on delivering top quality software of scale at extreme speeds. Numbers don’t scare us. If you have any requirement or want a free health check of your systems architecture, feel free to shoot an email at contact@francium.tech, we will get in touch!
