Re: Large Rock Store

From: Kinkie <gkinkie_at_gmail.com>
Date: Wed, 17 Oct 2012 16:57:47 +0200

On Wed, Oct 17, 2012 at 4:34 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 10/16/2012 04:45 PM, Amos Jeffries wrote:
>> On 17.10.2012 11:02, Alex Rousskov wrote:
>>> Hello,
>>>
>>> We have started working on caching "large" (i.e., multi-slot)
>>> responses in Rock Store. Initial design overview is available at the
>>> following wiki page, along with several design alternatives we have
>>> considered. Feedback is welcomed.
>>>
>>> http://wiki.squid-cache.org/Features/LargeRockStore
>>>
>>>
>>> As part of the Large Rock work, I will also try to fix at least one of
>>> the Store API quality problems that complicate any Store-related
>>> improvements:
>>>
>>> 1) Store class is a parent of too many, barely related classes:
>>> StoreController, StoreHashIndex, SwapDir, MemStore (and TestStore). This
>>> "top heavy" design forces us to add pure virtual methods to Store and
>>> then supply dummy implementations in many Store kids. And, more
>>> importantly, it makes it difficult to understand which part of the
>>> storage API each Store kid is responsible for, leading to boundary
>>> violations and other problems.
>>>
>>> 2) There is no class dedicated to a non-shared memory cache.
>>> StoreController currently implements most of the non-shared memory cache
>>> logic while MemStore implements the shared memory cache. StoreController's
>>> non-shared memory caching code should be moved to a dedicated class
>>> instead.
>>>
>>> 3) StoreHashIndex should probably become responsible for all disk caches
>>> (SwapDirs) as a whole, while StoreController will coordinate disk and
>>> memory caching (using StoreHashIndex and MemStore objects). Currently,
>>> some disk-related manipulations reside in StoreHashIndex and some in
>>> StoreController.
>>>
>>> 4) Get rid of the global store_table. Make it local to non-shared
>>> caching code. As we have discussed previously, the existence of this
>>> global makes shared caching life very difficult because we cannot share
>>> such a complex table and yet some older code expects to find all entries
>>> there. It also leads to problems with entry locking where the code
>>> assumes that an idle entry will remain in the global table at least for
>>> a while (I believe there are a few bugzilla reports about associated
>>> core dumps). I doubt I can solve all the StoreEntry locking problems (it
>>> may require a significant client-side rewrite), but removing the
>>> store_table global is a step in the right direction.
>>>
>>>
>>> If you have suggestions on priorities or want to add other large-scale
>>> Store annoyances to the above list, please let me know.
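
A purely illustrative sketch, not actual Squid code, of the "top heavy"
shape described in (1): one fat abstract base forces every kid to stub
out methods it does not need.

  // Illustration only; method names merely approximate the real Store API.
  class StoreEntry;                    // forward declaration for the sketch
  typedef unsigned char cache_key;     // as in Squid's typedefs

  class Store {
  public:
      virtual StoreEntry *get(const cache_key *) = 0;  // lookup by key
      virtual void maintain() = 0;                     // periodic upkeep
      // ... many more pure virtuals; StoreController, StoreHashIndex,
      // SwapDir, MemStore (and TestStore) each end up supplying dummy
      // bodies for the ones that do not apply to them.
  };
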
>
>
>
>> Thank you; architecture clarification for the Store code would be very welcome.
>>
>> IMHO;
>> - "Store" should be the namespace - or the name of a base class for
>> shared API accessors to a *single* store regardless of type.
>
> Good idea! A common namespace (and source directory?) for core "object
> store" classes would be a nice addition. If we need a single store
> interface class, it should probably be named "Store". The namespace
> would then be "Cache" (too general?), "Swap" (too disk-specific), or
> "Storage" (too long?).
>
>
>> - Low-level virtual methods should be in another class specific to the
>> type of store (disk, memory, shared, ...), which the particular store
>> controller inherits from alongside the "Store" interface class.
>
> Yes, of course, except I am not sure a common parent for all classes is
> actually needed. This will become clear during polishing.
>
>
>> - Index hash object/s should be independent of the store backend and
>> the class re-used by all (separate instances though).
>
> Actually, indexing objects is a store's prerogative, and its details are
> invisible to callers. Today, each indexing class is required by some
> stores and cannot be reused by other stores, and I do not expect that to
> change. There is one serious challenge here. I will discuss it in a
> separate email.
>
>
>> - store controller - managing the set of available stores for
>> startup/reload/clean/dirty/up/down/search/object-move handling?
>
> Yes, it already more-or-less does that. We just need to polish it by
> removing code that is meant to be elsewhere and bringing
> Controller-specific code into that class. It will manage/coordinate two
> objects: "memory store" and "disk stores".
>
>
>> Since you mention a difficulty identifying what each of the store API
>> classes does, can you suggest anything better than the rough architecture
>> brainstorm above?
>
> I think we are in agreement on the high-level stuff. I do not think we
> should try to document most of the polishing/reshuffling details now
> because there are too many of them, they are determined by high-level
> class roles, the changes should be pretty clear in the patches, and I
> may not fix everything anyway. There is one exception that I will
> discuss separately.
>
>
>> For example, it's not clear to me in the slightest why StoreHashIndex is
>> relevant to disks but not to in-memory caches. Surely the in-memory caches
>> have a hash index of their content as well? (Or they should, after the
>> global store_table is gone.) Or is the class name irrelevant to what it
>> actually does nowadays?
>
> Please ignore the current StoreHashIndex class name. It will not survive
> this polishing.
>
> This class is needed to represent all disk caches taken together and
> coordinate activities among individual disk caches (e.g., cache_dir
> selection for a given miss object).
>
> Why separate the memory store from disk stores? While there is a lot in
> common between disk and memory caches, there are a few significant
> differences. For example:
> * There may be many disk caches (that need coordination) but there is at
> most one memory cache. That is why disk caches need a "disk cache
> coordination" class and memory cache does not.
>
> * Memory cache access methods are synchronous. Most disk cache access
> methods are not (or should not be).

This is an interesting argument, in my opinion, and one I have spent
some time thinking about in the past.
In my opinion, the "memory cache" name is misleading.
IMVHO we could eventually have a tiered cache system, with a
consistent API modeled after the current disk caches: a "small and
very fast" cache (e.g. RAM-backed), a "bigger and quite fast" cache
(e.g. rock store + SSD), a "big and quite slow" cache (e.g. aufs), and
a "very big and very slow" cache (e.g. somehow distributed, for
instance memcached, aufs over NFS, hadoop/cassandra/gfs...), plus
policies to promote/demote objects across the various tiers.
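
Very roughly, and only to make the idea concrete (every name below is
invented for this sketch, nothing like it exists in Squid today), the
per-tier surface could look like:

  #include <stdint.h>

  // Invented sketch of a tier plus a promotion/demotion policy hook.
  class CacheTier {
  public:
      virtual ~CacheTier() {}
      virtual uint64_t capacity() const = 0;   // how much the tier holds
      virtual bool synchronous() const = 0;    // RAM-like vs disk-like access
  };

  class TierPolicy {
  public:
      virtual ~TierPolicy() {}
      // should a frequently hit object move up to a faster tier?
      virtual bool shouldPromote(const CacheTier &to, int hits) const = 0;
      // should an idle object move down to a slower (bigger) tier?
      virtual bool shouldDemote(const CacheTier &from, int idleSecs) const = 0;
  };
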

What is now the memory cache would, in that context, become more of a
specialized, synchronous, transient area used to shuttle data (partial
objects, collapsed forwarding, etc.) to/from the HTTP pipe, and
possibly to support the object promotion/demotion activities. But then
it wouldn't really be a cache anymore, would it?

> * An object may be placed in at most one disk cache, in at most one
> memory cache, but it can be cached both on disk and in memory. The first
> algorithm is (or will be) implemented by the StoreHashIndex replacement,
> the second by MemStore itself, and the third by StoreController.

Why?
If we accept that we can promote/demote objects in tiered caches, it
means that an object could be in different caches at the same time
(with different lifetimes, determined e.g. by each tier's capacity).
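
To illustrate (Tier and Entry below are invented stand-ins, not
existing Squid classes): a lookup would simply walk the tiers
fastest-first; the first hit wins, but nothing would prevent a slower
tier from holding its own copy of the same object at the same time.

  // Illustration only; nothing here exists in Squid.
  #include <vector>

  struct Entry {};                     // stand-in for cached object metadata

  class Tier {
  public:
      virtual ~Tier() {}
      virtual Entry *get(const char *key) = 0;   // nil on miss
  };

  // Walk tiers fastest-first; the first hit wins, yet a slower tier
  // may well hold another copy of the same key at the same time.
  Entry *lookup(std::vector<Tier *> &tiers, const char *key)
  {
      typedef std::vector<Tier *>::iterator TI;
      for (TI i = tiers.begin(); i != tiers.end(); ++i)
          if (Entry *e = (*i)->get(key))
              return e;
      return 0;
  }
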

-- 
    /kinkie