Friday, October 30, 2009

Notes on Memcached

Some notes about Memcached, starting with its architecture.


How it works?
Memcached is organized as a farm of N servers. The storage model can be thought of as one huge HashTable partitioned across these N servers.

Every API request takes a "key" parameter, and the client lib performs a 2-step process ...
  • Given the key, locate the server
  • Forward the request to that server
The server receiving the request does a local lookup for that key. The servers within the farm don't gossip with each other at all. Each server uses asynchronous, non-blocking I/O, so one thread can handle a large number of incoming TCP sockets. A thread pool is actually used, but the number of threads is independent of the number of incoming sockets. This architecture scales well to a large number of incoming network connections.
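
As a rough sketch of the client side (the server list, the hashing choice, and the send_request helper below are illustrative stand-ins, not Memcached's actual defaults):

SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

# step 1: given the key, locate the server (simple modulo hashing)
def locate_server(key)
  SERVERS[key.hash % SERVERS.size]
end

# step 2: forward the request to that server
def get(key)
  send_request(locate_server(key), "get", key)   # send_request stands in for the network call
end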

API
Memcached provides a HashTable-like interface, so it has ...
  • get(key)
  • set(key, value)
Memcached also provides a richer "multi-get" so that one read request can retrieve the values for multiple keys. The client library issues separate requests to the relevant servers and does the lookups in parallel.
  • get_multi(["k1", "k2", "k3"])
Some client libs offer a "master-key" concept, where a key contains 2 parts: a master-key prefix and a suffix key. In this model, the client lib uses only the prefix to locate the server (rather than hashing the whole key) and then passes the suffix key to that server. Users can therefore group entries onto the same server by giving them the same prefix key (a routing sketch follows the example below).
  • get_multi(["user1:k1", "user1:k2", "user1:k3"]) -- This request just go to the server hosting all keys of "user1:*"

For updating data, Memcached provides a number of variations; a short usage example follows the list.
  • set(key, value, expiration) -- Memcached guarantees the item will no longer be served from the cache once the expiration time is reached. (Note that an item can also be evicted before expiration when the cache is full.)
  • add(key, value, expiration) -- Succeeds only when no entry for the key exists.
  • replace(key, value, expiration) -- Succeeds only when an entry for the key already exists.
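
For example, assuming a client where these calls return true or false (the key names are made up):

set("views:42", 0, 3600)        # unconditional write; expires in one hour
add("views:42", 0, 3600)        # returns false: "views:42" already exists
replace("views:42", 1, 3600)    # returns true: overwrites the existing entry
replace("views:99", 1, 3600)    # returns false: no existing entry for "views:99"
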
Server Crashes
When one of the servers crashes, all entries owned by that server are lost. Higher resilience can be achieved by storing redundant copies of the data on different servers. Memcached has no support for data replication; this has to be taken care of by the application (or the client lib).

Note that the default server hashing algorithm doesn't handle growth or shrinkage of the server pool very well. When the number of servers changes, the ownership computation (hash(key) mod N) comes out differently for most keys. In other words, if the crashed server has to be taken out of the pool, the total number of servers decreases by one and most existing entries need to be redistributed to a different server. Effectively, the whole cache (across all servers) is invalidated even though just one server crashed.
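
A quick simulation makes the point (this assumes servers keep their index positions in the pool, which is illustrative rather than any particular client's behavior):

keys = (1..100_000).map { |i| "key#{i}" }
# a key survives only if its owner index is the same under N=4 and N=3
survivors = keys.count { |k| k.hash % 4 == k.hash % 3 }
puts "#{survivors} of #{keys.size} keys unmoved"   # typically only ~25%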

So one approach to this problem is to keep the number of Memcached servers constant across crashes. We can have a monitor server that checks the heartbeat of every Memcached server and, when a crash is detected, starts a new server with the same IP address as the dead one. The new server will still have lost all its entries and must repopulate the cache, but the ownership of the keys is unchanged and the data on the surviving nodes doesn't need to be redistributed.

Another approach is to run logical servers on a farm of physical machines. When a physical machine crashes, its logical servers are restarted on the surviving physical machines; in other words, the number of logical servers stays unchanged even when crashes happen. This logical-server approach is also useful when the underlying physical machines have different memory capacities: we can start more Memcached processes on the machines with more memory and spread the cache proportionally to memory capacity, as the sketch below illustrates.
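
A minimal sketch of that proportional placement (the machine addresses and memory sizes are made up):

MACHINES = { "10.0.0.1" => 4, "10.0.0.2" => 8 }   # memory in GB per machine

# one logical server (port) per GB, so the 8GB machine
# receives twice the share of the key space
LOGICAL_SERVERS = MACHINES.flat_map do |host, gb|
  (0...gb).map { |i| "#{host}:#{11211 + i}" }
end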

We can also use a more sophisticated technique called "consistent hashing", which localizes the ownership changes to just the neighbors of the crashed server. Under this scheme, each server is assigned an id in the same key space as the keys themselves. The owner of a key is the first server encountered when walking from the key in the anti-clockwise direction. When a server crashes, its immediate neighbor in that direction adopts the key ownership of the dead server, while every other server's key range remains unchanged.
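
A minimal sketch of such a ring, assuming CRC32 as the hash and walking a sorted id list (which direction counts as "anti-clockwise" is just a convention):

require 'zlib'

servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
# place each server on the ring at the hash of its address, sorted by position
RING = servers.map { |s| [Zlib.crc32(s), s] }.sort_by { |id, _| id }

def locate_server_on_ring(key)
  h = Zlib.crc32(key)
  # owner = first server at or after the key's position, wrapping around
  pair = RING.find { |id, _| id >= h } || RING.first
  pair.last
end

# removing a dead server's entry from RING hands its range to the next
# server on the ring; every other server's range is untouched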


Atomicity

Each request to Memcached is atomic by itself, but there is no direct support for atomicity across multiple requests. However, an app can implement its own locking mechanism using the "add()" operation provided by Memcached, as follows ...
# add() succeeds only if no "lock" entry exists yet (the lock's value is irrelevant);
# the 5-second expiration releases the lock automatically if the holder dies
success = add("lock", 1, 5.seconds)
if success
  set("key1", value1)
  set("key2", value2)
  delete("lock")
else
  raise UpdateException.new("fail to get lock")
end

Memcached also supports a "check-and-set" mechanism that can be used for optimistic concurrency control. The basic idea is to obtain a version stamp when getting an object and to pass that version stamp back in the set method. The system verifies the version stamp to make sure the entry hasn't been modified by anyone else in the meantime; otherwise it fails the update.
data, stamp = get_stamp(key)          # read the value together with its version stamp
...
check_and_set(key, value1, stamp)     # fails if the stamp no longer matches
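
For example, an optimistic update that retries on a version conflict (using the pseudocode names above; real clients expose this through the protocol's "gets" and "cas" commands):

def atomic_increment(key)
  loop do
    data, stamp = get_stamp(key)                    # value plus its version stamp
    return if check_and_set(key, data + 1, stamp)   # succeeds only if the stamp still matches
    # somebody else updated the entry in between; read again and retry
  end
end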

What Memcached doesn't do?
Memcached's design is centered on performance and scalability. By design, it doesn't deal with the following concerns.
  • Authentication for client request
  • Data replication between servers for fault resilience
  • Keys > 250 chars
  • Large objects > 1MB
  • Storing collection objects
Data Replication DIY
First of all, think carefully about whether you really need data replication at the cache level, given that cached data can always be recreated from the original source (although at a higher cost).

The main purpose of a cache is performance. If your system cannot tolerate data loss at the cache level, rethink your design!

Although Memcached doesn't provide data replication, it can easily be done by the client lib or at the application level, based on the idea described below.

On the client side, we use multiple keys to represent different copies of the same data, and attach a monotonically increasing version number to the data. This version number identifies the most up-to-date copy and is incremented on each update.

When doing an update, we update all the copies of the same data via their different keys.
def reliable_set(key, versioned_value)
  key_array = [key + ':1', key + ':2', key + ':3']   # one key per copy
  new_value = versioned_value.value
  new_version = versioned_value.version + 1          # bump the version on every update
  new_versioned_value = combine(new_value, new_version)

  # write the new version to all copies
  for k in key_array
    set(k, new_versioned_value)
  end
end
For reading the data from the cache, use "multi-get" on the multiple keys (one per copy) and return the copy with the latest version. If any discrepancy is detected (i.e. some copies have a stale version, or some copies are missing), start a background thread to write the latest version back to all copies.
def reliable_get(key)
  key_array = [key + ':1', key + ':2', key + ':3']
  value_array = get_multi(key_array)

  latest_version = 0
  latest_value = nil
  need_fix = false

  for v in value_array
    if v.nil?
      # one copy is missing entirely
      need_fix = true
    elsif v.version > latest_version
      # this copy supersedes the ones seen so far
      need_fix = true if latest_version > 0
      latest_version = v.version
      latest_value = v.value
    elsif v.version < latest_version
      # this copy is stale
      need_fix = true
    end
  end
  versioned_value = combine(latest_value, latest_version)

  # write the latest version back to all copies in the background
  if need_fix
    Thread.new do
      reliable_set(key, versioned_value)
    end
  end

  return versioned_value
end
When we delete the data, we don't actually remove it. Instead, we mark the data as deleted, keep it in the cache, and let it expire, as in the sketch below.
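
A hedged sketch of that tombstone-style delete, reusing reliable_set from above (the DELETED sentinel is an assumption):

DELETED = :deleted   # sentinel value marking a logically deleted entry

def reliable_delete(key, versioned_value)
  # overwrite every copy with a tombstone at a bumped version and
  # let the copies age out through their normal expiration
  reliable_set(key, combine(DELETED, versioned_value.version))
end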

User Throttling
An interesting use case other than caching is to throttle users that are too active: basically, you want to disallow requests from a user who submits them too frequently.
user = get(userId)
if user != null
  # an entry exists, so the user made a request within the inactive period
  disallow request and warn user
else
  # remember this user for the inactive period
  add(userId, anything, inactive_period)
  handle request
end
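
Since add() is atomic and fails when the key already exists, a tighter variant keys off add()'s return value directly, so two concurrent first requests cannot both slip through:

if add(userId, anything, inactive_period)
  handle request
else
  disallow request and warn user
end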

9 comments:

Dustin said...

Note that as of 1.4.2 you *can* store objects greater than 1MB, but that's still discouraged (most people who try that are doing the wrong thing).

Also, as of 1.4.3 (which will be in RC in a few hours), the server can be configured to use SASL authentication and refuse clients that don't authenticate. The clients you have deployed today won't have this support, but the functionality should be trivial for your client vendor to add.

Ricky Ho said...

Thanks for the comment!

I just added a section on data replication. Would love to hear your comments.

Dustin said...

I think your replication section is pretty well on. I like to ask people if they replicate their L1 or L2 processor caches. :)

But there are clients that do the replication for you without so much work at the application level (though yours is a good idea, I think).

Ricky Ho said...

Thanks for the comment.

Yes, if the client lib provides replication, that's even better. But even doing it at the app level is not too much work; you have seen pretty much all the code already.

Anonymous said...

Do I misunderstand the throttling example or did you mean to write:

if user != null

?

Ricky Ho said...

You are right! I corrected it.
Thanks for pointing it out.

Don Xmar said...

very nicely explained article, keep it up!

Unknown said...

Hi Ricky,
As the last point in the section 'What Memcached doesn't do?', you have mentioned that Memcached can't store collection objects. Do you mean Java collection objects like Lists, Sets, HashMaps, etc.?

faha said...

hello,

nice article, i know it's 2 years old, but this could be interesting for somebody.

there is a patch for memcached 1.2.x that adds a multi-master replication feature

http://repcached.lab.klab.org/

thx, faha