Interview Question: Explain Consistent Hashing

Publish date: 2025-02-04
Tags: Distributed-Systems, Interview-Questions

What is Consistent Hashing?

Imagine you’re distributing candy (data) among friends (servers). If a friend leaves or joins, you don’t want to reshuffle all the candy—only what’s necessary. Consistent hashing solves this problem by minimizing data movement when servers (nodes) are added or removed. It’s used by companies like Discord, Netflix, and Amazon DynamoDB to scale their systems efficiently.

Explain Consistent Hashing


How Consistent Hashing Works: A Step-by-Step Guide

1. The Hash Ring: A Circular Timeline

Think of a hash ring as a clock face. Both data objects (e.g., user profiles, videos) and nodes (servers) are hashed and placed on this ring.

Example Hash Ring:
Node A (Position 10) → Data X (Position 15) → Node B (Position 20)  

2. Adding or Removing Nodes

Diagram:

Before: [Node A] → (Data 1, Data 2) → [Node B]  
After adding Node C between A and B:  
[Node A] → (Data 1) → [Node C] → (Data 2) → [Node B]  

3. Virtual Nodes: Balancing the Load

To prevent one node from becoming a hotspot (handling too much data), each physical node is assigned multiple virtual nodes on the ring.

Physical Node A → Virtual Nodes (A1, A2, A3)  
Physical Node B → Virtual Nodes (B1, B2, B3)  

This spreads data evenly, even if some nodes are more powerful than others.


Key Benefits of Consistent Hashing

  1. Scalability: Add/remove nodes without reshuffling all data.
  2. Fault Tolerance: Failed nodes only affect their immediate data.
  3. Load Balancing: Virtual nodes prevent hotspots.

Challenges & Drawbacks

Pros and Cons of Consistent Hashing


Real-World Examples

  1. Discord: Uses consistent hashing to route messages across chat servers.
  2. Netflix: Distributes video content efficiently across CDN nodes.
  3. Amazon DynamoDB: Partitions NoSQL data dynamically.

Implementing Consistent Hashing

Data Structures

A self-balancing binary search tree (BST) is used to store node positions, enabling fast lookups (O(log n)).

Steps to Implement:

  1. Add a Node:
    • Hash its ID and add it to the BST.
    • Reassign data from neighboring nodes.
  2. Remove a Node:
    • Delete it from the BST.
    • Transfer its data to the next node.

Common Interview Questions (and How to Answer)

  1. “Explain consistent hashing in simple terms.”
    Compare it to a circular arrangement where data finds the next available server. Mention virtual nodes for load balancing.

  2. “What happens when a node fails?”
    Its data is reassigned to the next node clockwise. The system remains available.

  3. “How do virtual nodes improve performance?”
    They spread data evenly, prevent hotspots, and handle heterogeneous servers.


Final Tips for Your Interview


Consistent hashing is a foundational technique for building scalable systems. By mastering it, you’ll ace system design interviews and stand out as a candidate who understands real-world distributed systems. Good luck! 🚀

Tags: Distributed-Systems, Interview-Questions