Re: pseudo-specs for a String class: char *buf from Kinkie on 2008-09-03 (squid-dev)

From: Kinkie <gkinkie_at_gmail.com>
Date: Thu, 4 Sep 2008 00:12:57 +0200

On Wed, Sep 3, 2008 at 5:49 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On Wed, 2008-09-03 at 16:53 +0200, Kinkie wrote:
>> On Wed, Sep 3, 2008 at 3:59 PM, Alex Rousskov
>> <rousskov_at_measurement-factory.com> wrote:
>> >
>> > I looked at your StringNg wiki page and noticed that your string has
>> > a "char *buf" pointer into the memory buffer (in addition to the buffer
>> > pointer itself). I think it would be better to use an offset instead of
>> > the pointer into internal buffer area:
>>
>> Yes, I had a discussion with Adrian about the same issue earlier on on IRC.
>> You make excellent points - as Adrian did :)
>> Here's my take
>>
>> > - cleaner design: no peeking into other object's privates
>> Yes. At the same time the KBuf::Buf class (the "other object") is a
>> private member class of the KBuf class;
>> it's actually little more than a glorified struct, and shouldn't be
>> thought as a first-level citizen on its own.
>
> Private or not, it is still another object.
>
> And, FWIW, I doubt the memory buffer class will remain inside the string
> class.

I'm not planning to take it out, and if noone else does...

>> > - easier to change memory buffer internals
>> If you mean "change the buffer contents", that's an operation which
>> should be quite rare.
>> If you mean "change the code" that should be even rarer.
>
> I meant "change the code". I do expect those changes in the foreseeable
> future.
>
>> > - easier to support several buffer types with different internals
>> I didn't really think of different buffer types. Do you have in mind
>> any scenario where it would be useful?
>
> Yes, I do (e.g., small versus large, thread-safe versus not, and
> contiguous versus chunked).

Chunks are out-of-scope, they are better dealt by a KBufList class.
Honestly the only argument I buy 100% is easier thread-safety.

> In fact, you kind of documented different buffer implementations
> yourself: "small Bufs (<8Kb) should be managed by MemPools. - Bufs
> bigger than 8Kb should be allocated in sizes compatible with the system
> page size"

The "magic" is in the allocation strategy function not for the Buf
itself, but for its underlying storage.
We want to allocate a bit more than strictly needed to be able to grow
but not too much.
Quoting:

        void init(size_t size)
        {
            pagesize=sysconf(_SC_PAGESIZE); //FIXME: make this autoconf-based
            size_t actualsize=(size*12)/10; //FIXME: make self-tuning
using nreallocs
            // arbitrary allocation algorithm
            if (actualsize <= 64)
                actualsize=64+malloc_overhead;
            else if (actualsize <= 64*pagesize)
                actualsize=nearestPowerOf2(actualsize);
            else
                actualsize+=actualsize%pagesize; //increase to closest pagesize
            //shrink: we WANT to fit in a page
            actualsize-=malloc_overhead;
            mem=new char[actualsize];
            bufsize=actualsize;
            bufused=0;
            refs=1;
            debugs("Buf@" <<(void *)this <<"::init(). req:"<<size<<",gave:"
                << actualsize);
            stats.live++;
        }

Different strategies are of course possible, including defining a few
MemPools-backed sizes (1k, 2k, 4k, more than that it's probably not
wise)

>> > - easier to support re-allocation of buffer memory
>> > - easier to provide a thread-safe implementation.
>>
>> On the other hand, char* are significantly more efficient for common
>> operations, consistently with the design goals..
>
> I do not think an offset would be significantly less efficient in this
> context. I bet 90+% of operations that require raw data access are far
> more expensive than adding an offset to a pointer.

The most common one is a NULL check, which is hard to express using
(offset/length).
Further extra work would be the need of more temporary storage to
rebuild the char* out of (memhandle->mem + offset).
Extremely small details, but I expect they'll be very common
operations (I bumped the size of the stat-counters to 64bits).

>> I'm not saying that I won't change them, I'd just like to be shown
>> scenarios where it makes a difference.
>
> I believe I provided more than enough reasons and you agreed with at
> least some of them. You have provided one so far ("significantly more
> efficient for common operations"). I think the burden of proof should be
> on you in this case.

What I can (and will) do is try and see if and how much these changes
would complicate the code.
In the meantime, I ask everyone who can spare 5 minutes to just grab
the code off launchapd and see it live ("make" will compile and launch
a builtin testsuite/demo, "make dox" will generate the class reference
- the code is extensively documented).

-- 
 /kinkie

Received on Wed Sep 03 2008 - 22:12:59 MDT

This archive was generated by hypermail 2.2.0 : Thu Sep 04 2008 - 12:00:04 MDT