How to crack a System Design Interview Series — Twitter Snowflake Approach

Yash Jain
Jul 6, 2023

Pre-requisite — Design a Unique ID Generator in Distributed Systems

Suppose you are asked to design a unique ID generator for a distributed system. Your first thought might be to use a primary key with the auto_increment attribute in a relational database. However, auto_increment does not work well in a distributed environment: a single database server is not large enough, and generating unique IDs across multiple databases with minimal delay is challenging.

One common use case for unique IDs is user_id.

In this article we take a deep dive into the Twitter Snowflake approach to unique ID generation. It is a thoughtful design that scales well. In this approach, instead of generating an ID directly, we divide the ID into different sections. We will take the example of generating a 64-bit ID.

Bitwise Distribution (64 bits) —

Sign (1 bit) | Timestamp (41 bits) | Datacenter ID (5 bits) | Machine ID (5 bits) | Sequence number (12 bits)

Explanation of each section —

  • Sign bit: 1 bit. It is always 0 and is reserved for future use. It could be used to distinguish between signed and unsigned numbers, or for any other purpose, depending on the system’s requirements.
  • Timestamp: 41 bits. Milliseconds elapsed since a custom epoch. Refer to the sample calculations below for a better understanding. It is best to choose a custom epoch as close as possible to the day you launch the system.
    (Figure: Sample timestamp calculations)
  • Datacenter ID: 5 bits, which gives us 2⁵ = 32 datacenters.
  • Machine ID: 5 bits, which gives us 2⁵ = 32 machines per datacenter.
  • Sequence Number: 12 bits. For every ID generated on that machine/process, the sequence number is incremented by 1. The number is reset to 0 every millisecond.

Datacenter IDs and machine IDs are chosen at startup and are generally fixed once the system is up and running. Any change to datacenter IDs or machine IDs requires careful review, since an accidental change in those values can lead to ID conflicts. Timestamps and sequence numbers are generated while the ID generator is running.
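
To make the layout concrete, here is a minimal sketch of how the four fields could be packed into one 64-bit integer with bit shifts, and split apart again. The function names `pack_id` and `unpack_id` are made up for this illustration; only the field widths come from the layout described above.

```python
# Field widths from the 64-bit layout (the sign bit is the unused top bit).
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

# How far each field is shifted left from the least significant bit.
MACHINE_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS        # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22


def pack_id(timestamp_ms, datacenter_id, machine_id, sequence):
    """Combine the four fields into one integer; the sign bit stays 0."""
    fields = [
        ("timestamp_ms", timestamp_ms, TIMESTAMP_BITS),
        ("datacenter_id", datacenter_id, DATACENTER_BITS),
        ("machine_id", machine_id, MACHINE_BITS),
        ("sequence", sequence, SEQUENCE_BITS),
    ]
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )


def unpack_id(snowflake_id):
    """Split an ID back into (timestamp_ms, datacenter_id, machine_id, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        (snowflake_id >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1),
        snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )
```

Because the timestamp occupies the highest bits, comparing two packed IDs as plain integers compares their timestamps first.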

Timestamp Analysis —

The 41 most important bits make up the timestamp section. Because the timestamp grows with time, IDs are sortable by time. The binary timestamp can be converted to a UTC time, and a UTC time can be converted back to the binary representation in the same way (libraries for this are available in almost every language).
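
As a rough sketch of that conversion, the snippet below uses Python's standard `datetime` module. The epoch value and the helper names `id_to_utc` and `utc_to_timestamp_field` are assumptions chosen for illustration; a real system would use its own epoch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical custom epoch, picked only for this example.
CUSTOM_EPOCH = datetime(2023, 7, 6, tzinfo=timezone.utc)

# The timestamp sits above the 5 + 5 + 12 lower bits.
TIMESTAMP_SHIFT = 22


def id_to_utc(snowflake_id):
    """Recover the UTC creation time encoded in an ID."""
    offset_ms = snowflake_id >> TIMESTAMP_SHIFT
    return CUSTOM_EPOCH + timedelta(milliseconds=offset_ms)


def utc_to_timestamp_field(moment):
    """Convert a timezone-aware datetime to the 41-bit timestamp value."""
    return (moment - CUSTOM_EPOCH) // timedelta(milliseconds=1)
```

For example, an ID whose timestamp field holds 1000 maps to one second after the chosen epoch, and converting that moment back yields 1000 again.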

The maximum timestamp that can be represented in 41 bits is

2⁴¹ - 1 = 2199023255551 milliseconds (ms), which gives us ~69 years

2199023255551 ms / 1000 (ms per second) / 3600 (seconds per hour) / 24 (hours per day) / 365 (days per year) ≈ 69 years

This means the ID generator will work for about 69 years, and choosing a custom epoch close to today’s date delays the overflow. After 69 years, we will need a new epoch or another technique to migrate IDs.
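
The arithmetic can be checked with a few lines of Python. This is only a sketch: the `epoch` value is a hypothetical launch date, and a year is taken as 365 days to match the calculation above.

```python
from datetime import datetime, timedelta, timezone

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365

# Largest offset that fits in the 41-bit timestamp field.
max_offset_ms = 2**41 - 1

# Lifetime of the scheme in (365-day) years: a little under 70.
years = max_offset_ms / MS_PER_YEAR

# Hypothetical custom epoch; the overflow moment moves with it.
epoch = datetime(2023, 7, 6, tzinfo=timezone.utc)
overflow_at = epoch + timedelta(milliseconds=max_offset_ms)

print(f"{years:.2f} years, overflow at {overflow_at.isoformat()}")
```

Moving `epoch` forward by a year moves `overflow_at` forward by a year too, which is why an epoch close to launch day gets the most out of the 41 bits.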

Sequence Number Analysis —

Sequence number is 12 bits, which gives us 2¹² = 4096 combinations. This field is 0 unless more than one ID is generated in a millisecond on the same server. Note: In theory, a machine can support a maximum of 4096 new IDs per millisecond.
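
Putting the timestamp and sequence rules together, a generator might look like the sketch below. It is not Twitter's implementation. The class name, the injectable `clock` parameter, the busy-wait when all 4096 sequence values are used within one millisecond, and the error raised when the clock moves backwards are all assumptions made for this example.

```python
import threading
import time


class SnowflakeGenerator:
    """Illustrative Snowflake-style ID generator (a sketch, not production code)."""

    def __init__(self, datacenter_id, machine_id, epoch_ms, clock=None):
        if not 0 <= datacenter_id < 32:
            raise ValueError("datacenter_id must fit in 5 bits")
        if not 0 <= machine_id < 32:
            raise ValueError("machine_id must fit in 5 bits")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms
        # clock() returns the current time in milliseconds; injectable for tests.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumption: refuse to generate rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence number.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # All 4096 values used; wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: the sequence restarts at 0.
                self._sequence = 0
            self._last_ms = now
            offset = now - self._epoch_ms
            if not 0 <= offset < (1 << 41):
                raise OverflowError("timestamp does not fit in 41 bits")
            return (
                (offset << 22)
                | (self._datacenter_id << 17)
                | (self._machine_id << 12)
                | self._sequence
            )
```

A caller would create one generator per process, for example `SnowflakeGenerator(datacenter_id=1, machine_id=2, epoch_ms=...)`, and call `next_id()` whenever a new ID is needed. The lock keeps the sequence consistent when several threads share the generator.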

Pros of Snowflake approach —

  • Works with distributed systems.
  • Takes care of the multiple Datacenter problem as well.
  • Highly Scalable.
  • IDs are sortable by time.

Cons of Snowflake approach —

  • Clock Synchronization: In our design, we assume ID generation servers have the same clock. This assumption might not be true when a server is running on multiple cores. The same challenge exists in multi-machine scenarios. You don’t need to discuss solutions to this problem in the interview; doing so increases the complexity and might send the discussion down a different path altogether. Network Time Protocol is the most popular solution to this problem. For interested readers, refer to the reference materials below.
  • Section Length Tuning: The section lengths may need tuning for your workload. For example, fewer sequence bits and more timestamp bits work well for low-concurrency, long-lived applications.
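
To see what section length tuning trades off, the sketch below computes the lifetime, machine count, and per-millisecond capacity for a given bit split. The helper `layout_capacity` is hypothetical, and the 43/10 split is just one example of moving bits from the sequence to the timestamp.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365


def layout_capacity(timestamp_bits, datacenter_bits, machine_bits, sequence_bits):
    """Summarize what a bit layout can represent (1 sign bit is implied)."""
    total = 1 + timestamp_bits + datacenter_bits + machine_bits + sequence_bits
    if total != 64:
        raise ValueError(f"layout uses {total} bits, expected 64")
    return {
        "years": 2**timestamp_bits / MS_PER_YEAR,
        "machines": 2 ** (datacenter_bits + machine_bits),
        "ids_per_ms_per_machine": 2**sequence_bits,
    }


default = layout_capacity(41, 5, 5, 12)     # the layout used in this article
long_lived = layout_capacity(43, 5, 5, 10)  # 2 bits moved to the timestamp
```

Moving two bits from the sequence to the timestamp cuts the per-machine rate from 4096 to 1024 IDs per millisecond and multiplies the lifetime by four.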

References —

If you have reached this far, sit back and relax. Congratulations! You have learned something awesome today. Great job!

Please, follow #tech-granth
