Exploring Data Structures in Redis

Redis is an open-source, in-memory data structure store used as a database, cache, and message broker. One of the key features that sets Redis apart from other databases is its support for multiple data structures from simple ones like strings, hashes, lists, sorted sets to more complicated ones like Hyperloglog, Bitmaps, and Streams.

In this article, we'll dive into each of these data structures, discussing their use cases and how to work with them in Redis. Buckle up and let's get started!

Setting up Redis in docker container

Install docker. You may refer this or this for the same.

docker run --name redis -d -p 6379:6379 redis # To run redis in docker container
docker exec -it redis redis-cli # To access redis cli (Optional)

Strings

Strings are the most basic data structure in Redis, and they're pretty much what they sound like: a sequence of characters. In Redis, strings are used to store values with an associated key. Here's an example of setting and retrieving a string value in Redis:

127.0.0.1:6379> SET my_key "hello world"
OK
127.0.0.1:6379> SET my_int 30
OK
127.0.0.1:6379> GET my_key
"hello world"
127.0.0.1:6379> GET my_int
"30" # Note that the value is returned as a string

In addition to simple string values, Redis strings also have several useful commands for working with string values, such as APPEND to concatenate two strings and INCR to increment a numerical value stored as a string.

Hashes

Redis hashes are similar to dictionaries in other programming languages. Hashes allow you to store multiple key-value pairs within a single Redis key. Here's an example of setting and retrieving values from a Redis hash:

127.0.0.1:6379> HSET my_hash name "John Doe" email "john.doe@example.com"
(integer) 1
127.0.0.1:6379> HSET my_hash age 30
(integer) 1
127.0.0.1:6379> HGET my_hash name
"John Doe"

Hashes are incredibly useful for storing structured data, such as user profiles, product information, and more. Redis hashes also provide several commands for working with hash values, such as HGETALL to retrieve all key-value pairs within a hash and HINCRBY to increment a numerical value stored within a hash (I hear you. No you can't use INCR on a hash value)

Lists

Redis lists are ordered collections of strings, similar to arrays in other programming languages. You can think of a Redis list as a series of values, where each value has a unique index. Here's an example of using a Redis list:

127.0.0.1:6379> LPUSH my_list "apple"
(integer) 1
127.0.0.1:6379> LPUSH my_list "banana"
(integer) 2
127.0.0.1:6379> LPUSH my_list "cherry"
(integer) 3
127.0.0.1:6379> LRANGE my_list 0 -1 # 0 is the first index, -1 is the last index
1) "cherry"
2) "banana"
3) "apple"

Redis lists are commonly used for tasks such as message queues and task management. Redis provides several commands for working with lists, such as LPOP to remove the first value in a list and RPOP to remove the last value in a list. You can also use LINDEX to retrieve a value at a specific index in a list.

Sets

Redis sets are unordered collections of unique strings. Sets are similar to arrays, but with the added constraint that all values in a set must be unique. Here's an example of using a Redis set:

127.0.0.1:6379> SADD my_set "apple"
(integer) 1
127.0.0.1:6379> SADD my_set "banana"
(integer) 1
127.0.0.1:6379> SADD my_set "cherry"
(integer) 1
127.0.0.1:6379> SADD my_set "apple"
(integer) 0
127.0.0.1:6379> SMEMBERS my_set
1) "banana"
2) "apple"
3) "cherry"

As you can see, even though we tried to add the value "apple" twice to the set, Redis only stored it once since sets only allow unique values.

Redis sets are incredibly useful for tasks such as membership testing, set intersection, and set union operations. Some of the popular commands for sets include SISMEMBER to check if a value is in the set, SINTER to find the intersection of two or more sets, and SUNION to find the union of two or more sets.

Sorted Sets

Redis sorted sets are similar to sets, but each value in a sorted set is associated with a score. This score is used to sort the values in the set, with the highest score appearing first. Here's an example of using a Redis sorted set:

127.0.0.1:6379> ZADD my_sorted_set 100 "apple"
(integer) 1
127.0.0.1:6379> ZADD my_sorted_set 200 "banana"
(integer) 1
127.0.0.1:6379> ZADD my_sorted_set 150 "cherry"
(integer) 1
127.0.0.1:6379> ZRANGE my_sorted_set 0 -1
1) "banana"
2) "cherry"
3) "apple"

Reason why sorted set has command ZADD has Z in the beginning is because it represents Z-Score.

As you can see, the values in the sorted set are sorted based on their scores, with the highest score appearing first.

Redis sorted sets are incredibly useful for tasks such as leaderboards, real-time analytics, and more. Some of the popular commands for sorted sets:

ZRANGEBYSCORE to retrieve values within a specific score range.
ZREVRANGE to retrieve values in reverse order (get top N).
ZINCRBY to increment the score of a value in a sorted set.
ZCARD to retrieve the number of values in a sorted set.
ZSCORE to retrieve the score of a value in a sorted set.

Here's how you can use it in real-time analytics

Use ZREVRANGE to retrieve the top N pages with the highest number of page views.
Use ZINCRBY to increment the page view count for a page when a user visits the page.
Use ZSCORE to retrieve the page view count for a page.
Use ZCARD to retrieve the total number of pages in the sorted set.

HyperLogLog

HyperLogLog is a probabilistic data structure in Redis that is used for estimating the number of unique elements in a large data set. Unlike traditional counting methods, HyperLogLog uses a small amount of memory and can estimate the number of unique elements with a high degree of accuracy.

127.0.0.1:6379> PFADD my_hyperloglog "apple"
(integer) 1
127.0.0.1:6379> PFADD my_hyperloglog "banana"
(integer) 1
127.0.0.1:6379> PFADD my_hyperloglog "cherry"
(integer) 1
127.0.0.1:6379> PFCOUNT my_hyperloglog
(integer) 3

As you can see, we were able to use the PFADD command to add values to the HyperLogLog, and the PFCOUNT command to retrieve the estimated count of unique values in the set.

Here is an example of using the HyperLogLog in python to insert millions of values and estimate the number of unique values with and without HyperLogLog:

import redis
import timeit

print("Connecting to Redis server...")

r = redis.Redis(host='localhost', port=6379, db=0)

def million_times(func):
    for i in range(10_00_000):
        if i % 10_000 == 0:
            print(f'Inserted {i} values')
        func(i)

if __name__ == '__main__':
    print("Connected. Inserting values...")

    time = timeit.timeit(million_times(lambda i: r.sadd('unique_values', i)), number=1)
    print(f'Time taken to insert values without HyperLogLog: {time})

    time = timeit.timeit(r.scard('unique_values'), number=1)
    print(f'Time taken to estimate unique values without HyperLogLog: {time})

    time = timeit.timeit(million_times(lambda i: r.pfadd('unique_values_hll', i)), number=1)
    print(f'Time taken to insert values with HyperLogLog: {time})

    time = timeit.timeit(r.pfcount('unique_values_hll'), number=1)
    print(f'Time taken to estimate unique values with HyperLogLog: {time})

To get more performance in Python using multiple threads as well as Redis pipelines:

import redis
import concurrent.futures
import time
from tqdm import tqdm  # pip install tqdm


def insert_values(r, values):
    with r.pipeline() as pipe:
        for i in values:
            pipe.sadd("unique_values", i)
        pipe.execute()

def insert_values_hll(r, values):
    with r.pipeline() as pipe:
        for i in values:
            pipe.pfadd("unique_values_hll", i)
        pipe.execute()


def main():
    r = redis.Redis(host="localhost", port=6379, db=0)

    NUM_VALUES = 10_000_000
    CHUNK_SIZE = 10_000
    num_chunks = NUM_VALUES // CHUNK_SIZE
    chunks = [range(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE) for i in range(num_chunks)]

    print("Connected. Inserting values...")

    start_time = time.perf_counter()

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Insert values without HyperLogLog
        results = [executor.submit(insert_values, r, chunk) for chunk in chunks]
        for result in tqdm(
            concurrent.futures.as_completed(results),
            total=len(results),
            desc="Inserting values without HyperLogLog",
        ):
            result.result()

    end_time = time.perf_counter()
    print(
        f"Time taken to insert values without HyperLogLog: {end_time - start_time:0.2f} seconds"
    )

    start_time = time.perf_counter()
    unique_value_count = r.scard('unique_values')
    end_time = time.perf_counter()
    print(
        f"Time taken to estimate unique values without HyperLogLog: {end_time - start_time} seconds"
    )

    start_time = time.perf_counter()

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Insert values with HyperLogLog
        results = [executor.submit(insert_values_hll, r, chunk) for chunk in chunks]
        for result in tqdm(
            concurrent.futures.as_completed(results),
            total=len(results),
            desc="Inserting values with HyperLogLog",
        ):
            result.result()

    end_time = time.perf_counter()
    print(
        f"Time taken to insert values with HyperLogLog: {end_time - start_time:0.2f} seconds"
    )

    start_time = time.perf_counter()
    unique_value_count_hll = pipe.pfcount("unique_values_hll")
    end_time = time.perf_counter()
    print(
        f"Time taken to estimate unique values with HyperLogLog: {end_time - start_time} seconds"
    )


if __name__ == "__main__":
    main()

You can run htop to see different threads as well montior the CPU/memory usage. Once the data has been inserted you may want to run redis-cli and run MEMORY USAGE unique_values and MEMORY USAGE unique_values_hll to see the memory usage of the two sets. You can also shut down the docker container using docker restart redis and then ru docker stats redis to see the memory usage of the Redis container. You'll notice that it rises over time (reaches 500MB+) as Redis loads the data into memory. This demonstrates persistence in Redis :)

Bitmaps

Redis also supports bitmaps, which are arrays of bits that can be used to represent simple binary values (e.g. true/false or on/off). Bitmaps are a memory-efficient way to store large amounts of binary data and can be used for tasks such as real-time analytics, A/B testing, and more.

Here is an example of using bitmaps in Redis:

127.0.0.1:6379> SETBIT my_bitmap 1 1
(integer) 0
127.0.0.1:6379> SETBIT my_bitmap 0 1
(integer) 0
127.0.0.1:6379> GETBIT my_bitmap 0
(integer) 0
127.0.0.1:6379> GETBIT my_bitmap 1
(integer) 1

In this example, we used the SETBIT command to set the value of two bits in the bitmap, and the GETBIT command to retrieve the values of these bits.

For A/B testing at scale, you can use the BITOP command to count the number of users who have seen the A variant of an A/B test, the number of users who have seen the B variant of an A/B test, and the number of users who have seen both variants of an A/B test.

For analytics, you can use the BITOP command to count the number of users who have visited a certain page, the number of users who have visited a certain page and then visited another page, and the number of users who have visited a certain page and then visited another page and then visited a third page.

Streams

Finally, Redis also supports streams, which are a new data structure in Redis that allow for storing ordered collections of data (e.g. logs, events, and more). Streams provide a scalable way to handle real-time data streams and can be used for tasks such as real-time analytics, event-driven architectures, and more.

Here's an example of using streams in Redis:

127.0.0.1:6379> XADD my_stream * name "John" age 30
1550373738452-0
127.0.0.1:6379> XADD my_stream * name "Jane" age 25
1550373738453-0
127.0.0.1:6379> XRANGE my_stream - +
1) 1) "1550373738452-0"
   2) 1) "name"
      2) "John"
      3) "age"
      4) "30"
2) 1) "1550373738453-0"
   2) 1) "name"
      2) "Jane"
      3) "age"
      4) "25"

Here XRANGE is used to read all the data from the stream. The - and + arguments specify that we want to read all the data from the stream. The XADD command is used to add data to the stream. The * argument specifies that we want to add data to the stream without specifying an ID. The name and age fields are added to the stream.

For real-time analytics or event-driven architectures, you can use the XREAD command to read data from a stream in real-time. For example, you can use the XREAD command to read data from a stream that contains logs from a web server and then use the logs to perform real-time analytics.

Geospatial Indexes

Redis supports geospatial indexes, which can be used to store and query data based on its geographic location. This makes it possible to perform operations such as finding the nearest neighbor, checking if a point is within a given area, and more.

Here's an example of using geospatial indexes in Redis:

127.0.0.1:6379> GEOADD my_geoindex 13.3613 89.3522 "Bangkok"
(integer) 1
127.0.0.1:6379> GEOADD my_geoindex 37.7749 -122.4194 "San Francisco"
(integer) 1
127.0.0.1:6379> GEOPOS my_geoindex "Bangkok" "San Francisco"
1) 1) "13.3613" "89.3522"
2) 1) "37.7749" "-122.4194"

In this example, we used the GEOADD command to add two cities to the geospatial index, and the GEOPOS command to retrieve their geographic locations.

Pub/Sub

Redis also supports a publish/subscribe (pub/sub) messaging system, which allows clients to publish messages to channels and subscribe to messages from these channels. This makes it possible to implement real-time, event-driven systems and enables decoupled communication between different parts of an application.

Here's an example of using the pub/sub system in Redis:

# First client
127.0.0.1:6379> SUBSCRIBE my_channel
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "my_channel"
3) (integer) 1

# Second client
127.0.0.1:6379> PUBLISH my_channel "Hello, World!"
(integer) 1

# First client
1) "message"
2) "my_channel"
3) "Hello, World!"

In this example, the first client subscribes to the channel my_channel, and the second client publishes a message to this channel. The first client receives the message and prints it to the console.

Extra Commands

redis-cli monitor: see the commands being executed by Redis.
redis-cli info: see information about the Redis server.
redis-cli config get *: see the configuration settings of the Redis server.
redis-cli slowlog get: see the slowest commands executed by Redis.

Conclusion

As you can see, Redis has a rich set of data structures that can be used to solve a wide range of problems. From strings, hashes, and lists, to more advanced structures such as HyperLogLog, bitmaps, streams, geospatial indexes, and pub/sub, Redis provides a flexible and powerful toolkit for building high-performance, scalable applications. I hope this blog post has helped you understand the different data structures that Redis supports and how they can be used to solve real-world problems.