JBrisbin.com

Updates to RabbitMQ NoSQL cache

10 Jul

I'm going on vacation next week, so I ended up pushing an extremely rough, alpha-quality draft of the asynchronous distributed cache I blogged about earlier in the week. I told you I wouldn't be able to stay away from it! I also made some tweaks this morning that eke out a few more milliseconds by reusing the message-sending objects, which eliminates the overhead of creating new channels all the time (though that operation is not particularly "expensive" to begin with). I'll be replacing the existing code in the session manager with code that uses this new distributed cache. This should boost performance in the session manager too, since when I wrote it I wasn't focused on making the object send/receive code as efficient as possible. By breaking out this particular bit of code, I was able to isolate it and boost its performance.

It would seem to me that parallelizing cache interaction and making it asynchronous has the potential to increase the performance of applications that fetch a lot of objects at once. If you're only pulling in one object at a time, however, you might not see as much of a performance boost. Either way, you gain the flexibility of dynamic cache backends. It's really hard to do meaningful performance testing on my local machine because I'm running the broker, the cache provider, and doing the cache tests all on the same machine. That results in a lot of context switching and the performance is noticeably degraded. When I ran my tests on three different machines, however, the performance was hovering around 10 milliseconds or less. I tripled the number of objects I was loading in my tests and the performance did not change. I don't know what the balance is between cache providers, workers, and number of objects because I haven't had much time to do significant testing to find that out.

Parent/Child Relationships

One thing I haven't really fleshed out, though it has been built in from the beginning, is the idea of parent-child relationships for creating arbitrarily large object graphs. There's no reason it can't be done with this system, and it might even significantly improve the performance of nested retrievals because everything is parallelized. Assuming a parent with 1,000 children in the cache, spread evenly across 5 cache providers, each provider would store 200 objects. If each cache provider had 100 workers running (in 100 threads), the application cache could request up to 500 objects at once (5 providers × 100 workers). Since the application cache is processing objects as they come in via callbacks, the application code could either wait to accumulate the entire object graph, or it could do more processing with each object as it is returned. Which you do would depend on the situation, of course.
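To make the arithmetic and the two consumption styles concrete, here is a minimal sketch. It is not the code from the repository: the class `ParallelChildLoad`, its methods, and the in-memory maps standing in for cache providers are all assumptions made for illustration, and a plain thread pool stands in for the provider workers instead of RabbitMQ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical illustration only; in-memory maps stand in for cache providers. */
class ParallelChildLoad {

    /** Spreads child ids round-robin across the given number of stand-in providers. */
    static List<Map<String, String>> shard(int children, int providers) {
        List<Map<String, String>> shards = new ArrayList<>();
        for (int p = 0; p < providers; p++) {
            shards.add(new ConcurrentHashMap<>());
        }
        for (int i = 0; i < children; i++) {
            shards.get(i % providers).put("child-" + i, "object-" + i);
        }
        return shards;
    }

    /**
     * Requests every child in parallel. onEach runs as each object arrives
     * (possibly on a worker thread, so it must be thread-safe); the returned
     * list is the fully accumulated graph.
     */
    static List<String> loadAll(List<Map<String, String>> shards,
                                ExecutorService workers,
                                Consumer<String> onEach) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (Map<String, String> shard : shards) {
            for (String id : shard.keySet()) {
                pending.add(CompletableFuture
                        .supplyAsync(() -> shard.get(id), workers)
                        .thenApply(obj -> { onEach.accept(obj); return obj; }));
            }
        }
        List<String> graph = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            graph.add(f.join()); // block until this child has arrived
        }
        return graph;
    }

    public static void main(String[] args) {
        List<Map<String, String>> shards = shard(1000, 5); // 200 objects per provider
        ExecutorService workers = Executors.newFixedThreadPool(16); // small pool for the demo
        AtomicInteger seen = new AtomicInteger();
        List<String> graph = loadAll(shards, workers, obj -> seen.incrementAndGet());
        workers.shutdown();
        System.out.println(graph.size() + " children loaded, " + seen.get() + " callbacks");
    }
}
```

The per-object callback and the accumulated list correspond to the two options described above: process each child as it arrives, or wait for the whole graph. The demo pool is deliberately much smaller than the 500 workers in the example scenario.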

Problems and Caveats

I'm very pleased so far with the performance and the flexibility of an asynchronous, RabbitMQ-backed, distributed NoSQL cache. I'm kind of excited by the possibilities it opens up for private cloud architectures.

One of the problems of a distributed cache I haven't solved to my satisfaction yet is what happens with requests for objects that aren't in the cache. In the session manager, I solved this problem by keeping a list of valid session IDs that exist anywhere in the cloud. This works great when you want to know if a session ID is valid or not. But it didn't seem feasible to me to extend this to a generic cache. It seemed redundant. The approach I'm taking here is to always send a message whether the object exists in the cache or not. An empty message body is basically a cache miss. Via the heartbeat messages, the cache knows how many providers it should expect responses from. If it gets that many messages back without an object in any of them, then that ID doesn't exist anywhere in the cloud. But this also means that if 1 out of the 5 providers has an object, every object load will result in 5 messages: 1 with the object and 4 "null" messages. At the moment, the cache load method calls the callback only when it receives a valid object. My plan is to implement a CountDownLatch based on the number of active cache nodes and call the callback with either the object or "null" after it's gotten the expected number of responses.
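One way that plan could look, as a rough sketch rather than the actual implementation: the class `LoadResponseCollector` and its method names are hypothetical, a `null` argument stands in for an empty message body, and the expected provider count is assumed to come from the heartbeat messages.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical collector for a single load request; not taken from the repository. */
class LoadResponseCollector<T> {
    private final CountDownLatch pending;                        // one count per expected provider
    private final AtomicReference<T> found = new AtomicReference<>();
    private final AtomicBoolean delivered = new AtomicBoolean(false);
    private final Consumer<T> callback;

    LoadResponseCollector(int expectedProviders, Consumer<T> callback) {
        this.pending = new CountDownLatch(expectedProviders);
        this.callback = callback;
    }

    /** Call once per provider response; null represents an empty message body (a miss). */
    void onResponse(T objectOrNull) {
        if (objectOrNull != null) {
            found.compareAndSet(null, objectOrNull);             // keep the first hit
        }
        pending.countDown();
        // Fire exactly once, after the last expected response has been counted.
        if (pending.getCount() == 0 && delivered.compareAndSet(false, true)) {
            callback.accept(found.get());                        // the object, or null on a full miss
        }
    }

    /** Lets a caller bound the wait in case a provider never answers. */
    boolean awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
        return pending.await(timeout, unit);
    }

    public static void main(String[] args) {
        LoadResponseCollector<String> load =
                new LoadResponseCollector<>(5, obj -> System.out.println("loaded: " + obj));
        load.onResponse(null);
        load.onResponse("cached-object");
        for (int i = 0; i < 3; i++) {
            load.onResponse(null);                               // callback fires on the fifth response
        }
    }
}
```

A sketch like this still leaves open what to do when a provider drops out between heartbeats; a timeout on `awaitAll` is one possible fallback.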

There's still more to do to make this a robust and re-usable distributed cache. As I always say, patches and feedback are warmly welcomed!

Code is on GitHub: http://github.com/jbrisbin/vcloud/tree/master/async-cache/
