Regarding the understanding of the malloc(3) code


neerajpal
Hi there,

I am reading and learning the internals of malloc(3). I compiled a
debug version of libc and am using it with a basic sample program that
calls malloc(3).

I am not able to understand some parts of the following code snippet:

void
_malloc_init(int from_rthreads)
{
    u_int i, nmutexes;
    struct dir_info *d;

    _MALLOC_LOCK(1);
    if (!from_rthreads && mopts.malloc_pool[1]) {
        _MALLOC_UNLOCK(1);
        return;
    }
    if (!mopts.malloc_canary)
        omalloc_init();

    nmutexes = from_rthreads ? mopts.malloc_mutexes : 2;
    if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
        mprotect(&malloc_readonly, sizeof(malloc_readonly),
            PROT_READ | PROT_WRITE);
    for (i = 0; i < nmutexes; i++) {
        if (mopts.malloc_pool[i])
            continue;
        if (i == 0) {
            omalloc_poolinit(&d, MAP_CONCEAL);
            d->malloc_junk = 2;
            d->malloc_cache = 0;
        } else {
            omalloc_poolinit(&d, 0);
            d->malloc_junk = mopts.def_malloc_junk;
            d->malloc_cache = mopts.def_malloc_cache;
        }
        d->mutex = i;
        mopts.malloc_pool[i] = d;
    }

    if (from_rthreads)
        mopts.malloc_mt = 1;
    else
        mopts.internal_funcs = 1;

    /*
     * Options have been set and will never be reset.
     * Prevent further tampering with them.
     */
    if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
        mprotect(&malloc_readonly, sizeof(malloc_readonly), PROT_READ);
    _MALLOC_UNLOCK(1);
}

In the above code snippet, could someone please throw some light on the
following queries:
1. What is nmutexes used for?
2. Why does it loop up to nmutexes and call
omalloc_poolinit(&d, MAP_CONCEAL) /* when i == 0 */
but omalloc_poolinit(&d, 0) /* when i != 0 */?
So, supposing nmutexes = 2, I am not sure where these 3 initialized
pools are used, that is, malloc_pool[0], malloc_pool[1] and
malloc_pool[2]?


static void
omalloc_poolinit(struct dir_info **dp, int mmap_flag)
{
    char *p;
    size_t d_avail, regioninfo_size;
    struct dir_info *d;
    int i, j;

    /*
     * Allocate dir_info with a guard page on either side. Also
     * randomise offset inside the page at which the dir_info
     * lies (subject to alignment by 1 << MALLOC_MINSHIFT)
     */
    if ((p = MMAPNONE(DIR_INFO_RSZ + (MALLOC_PAGESIZE * 2), mmap_flag)) ==
        MAP_FAILED)
        wrterror(NULL, "malloc init mmap failed");
    mprotect(p + MALLOC_PAGESIZE, DIR_INFO_RSZ, PROT_READ | PROT_WRITE);
    d_avail = (DIR_INFO_RSZ - sizeof(*d)) >> MALLOC_MINSHIFT;
    d = (struct dir_info *)(p + MALLOC_PAGESIZE +
        (arc4random_uniform(d_avail) << MALLOC_MINSHIFT));

    rbytes_init(d);
    d->regions_free = d->regions_total = MALLOC_INITIAL_REGIONS;
    regioninfo_size = d->regions_total * sizeof(struct region_info);
    d->r = MMAP(regioninfo_size, mmap_flag);
    if (d->r == MAP_FAILED) {
        d->regions_total = 0;
        wrterror(NULL, "malloc init mmap failed");
    }
    for (i = 0; i <= MALLOC_MAXSHIFT; i++) {
        LIST_INIT(&d->chunk_info_list[i]);
        for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
            LIST_INIT(&d->chunk_dir[i][j]);
...
...
...
In the above code, inside omalloc_poolinit(), it first allocates the
dir_info structure with a guard page on *both sides*, like
[guard page] <dir_info> [guard page]?

And why is it initializing chunk_info_list 32 times and chunk_dir
32 x 32 times, that is, chunk_info_list[0...31] and
chunk_dir[0...31][0...31]?


Could someone please provide some hints on the above queries?

Regards,
Neeraj

Re: Regarding the understanding of the malloc(3) code

Otto Moerbeek
On Tue, Mar 10, 2020 at 03:04:00AM +0530, Neeraj Pal wrote:

> [the original message and the _malloc_init() code quoted in full]
>
> In the above code snippet, could someone please throw some light on the
> following queries:
> 1. What is nmutexes used for?
> 2. Why does it loop up to nmutexes and call
> omalloc_poolinit(&d, MAP_CONCEAL) /* when i == 0 */
> but omalloc_poolinit(&d, 0) /* when i != 0 */?
> So, supposing nmutexes = 2, I am not sure where these 3 initialized
> pools are used, that is, malloc_pool[0], malloc_pool[1] and
> malloc_pool[2]?

There's an off-by-one in your question :-)

For single-threaded programs, two malloc dir pools are maintained:
one for MAP_CONCEAL'ed memory (#0) and one for regular allocations (#1).
For multi-threaded programs more pools are created. This is to avoid
contention: accesses to different pools can run concurrently.
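
To illustrate the idea, here is a hypothetical sketch of how a pool
could be picked per thread (illustration only, not the actual libc
code; thread_id() is a placeholder):

    /*
     * Pool #0 is reserved for MAP_CONCEAL allocations; regular
     * allocations spread over the remaining pools so that threads
     * rarely contend on the same mutex.
     */
    static struct dir_info *
    getpool_sketch(void)
    {
        if (!mopts.malloc_mt)           /* single-threaded: pool #1 */
            return mopts.malloc_pool[1];
        return mopts.malloc_pool[1 +
            thread_id() % (mopts.malloc_mutexes - 1)];
    }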

>
>
> [the omalloc_poolinit() code quoted in full]
> In the above code, inside omalloc_poolinit(), it first allocates the
> dir_info structure with a guard page on *both sides*, like
> [guard page] <dir_info> [guard page]?

Yes. That way both underflow and overflow have a chance of being caught.

>
> And why is it initializing chunk_info_list 32 times and chunk_dir
> 32 x 32 times, that is, chunk_info_list[0...31] and
> chunk_dir[0...31][0...31]?

The second index of chunk_dir has size MALLOC_CHUNK_LISTS, which is 4,
not 32.

More than one list of free chunk pages per chunk size is maintained to
allow for more randomization.

        -Otto



Re: Regarding the understanding of the malloc(3) code

neerajpal
On Tue, Mar 10, 2020 at 4:03 PM Otto Moerbeek <[hidden email]> wrote:
> There's an off-by-one in your question :-)
Yeah, sorry about that; in the flow of writing the mail I didn't notice.

> For single-threaded programs, two malloc dir pools are maintained:
> one for MAP_CONCEAL'ed memory (#0) and one for regular allocations (#1).
> For multi-threaded programs more pools are created. This is to avoid
> contention: accesses to different pools can run concurrently.
Okay, thanks for the information. So, likewise, for multi-threaded
applications malloc_mutexes is 8 by default (#0 for MAP_CONCEAL and
the other 7 for regular allocations), as set in the code below:

    static void
    omalloc_init(void)
    {
        char *p, *q, b[16];
        int i, j, mib[2];
        size_t sb;

        /*
         * Default options
         */
        mopts.malloc_mutexes = 8;
        mopts.def_malloc_junk = 1;
        ...
        ...
        ...



> Yes. That way both underflow and overflow have a chance of being caught.
Yeah, that's good, but I am not sure about it from the code. From the
snippet below it seems that by default (that is, vm.malloc_conf without
G) it is not on both sides?
 <guard page> dir_info

    static void
    omalloc_poolinit(struct dir_info **dp, int mmap_flag)
    {
        char *p;
        size_t d_avail, regioninfo_size;
        struct dir_info *d;
        int i, j;

        /*
         * Allocate dir_info with a guard page on either side. Also
         * randomise offset inside the page at which the dir_info
         * lies (subject to alignment by 1 << MALLOC_MINSHIFT)
         */
        if ((p = MMAPNONE(DIR_INFO_RSZ + (MALLOC_PAGESIZE * 2), mmap_flag)) ==
            MAP_FAILED)
                wrterror(NULL, "malloc init mmap failed");
        mprotect(p + MALLOC_PAGESIZE, DIR_INFO_RSZ, PROT_READ | PROT_WRITE);
        d_avail = (DIR_INFO_RSZ - sizeof(*d)) >> MALLOC_MINSHIFT;
        d = (struct dir_info *)(p + MALLOC_PAGESIZE +
            (arc4random_uniform(d_avail) << MALLOC_MINSHIFT));
        ...
        ...
        ...

From the above code, my observations are:
sizeof(*d) = 4824
MALLOC_PAGEMASK = 4095
DIR_INFO_RSZ = (4824 + 4095) & ~4095 = 8192

Now, MMAPNONE maps len = 8192 + (4096 * 2) = 16384 bytes,
then the pages from p + MALLOC_PAGESIZE up to p + MALLOC_PAGESIZE +
DIR_INFO_RSZ - 1 are mprotect'ed r/w, and
d_avail = (8192 - 4824) >> 4 = 3368 >> 4 = 210

Now, d = p + MALLOC_PAGESIZE + (random_no_under_210 << 4),

where d is the randomized offset inside the page at which dir_info lies.
So, let's suppose p is 1000; then d = 1000 + 4096 + (100 << 4) = 6696.
So it means [p + MALLOC_PAGESIZE] can be treated as a guard page before
the dir_info offset, and if so, after that there is no guard page by
default, right?


> The second index of chunk_dir has size MALLOC_CHUNK_LISTS which is 4,
> not 32.
Yeah, sorry for incorrect values.
> More than one list of free chunk pages per chunk size is maintained to
> allow for more randomization.
Okay, so in short the code below will create 12 chunk_info_list
entries, where i runs from 0 to 11, and for every ith index there is a
j, so:
chunk_dir[0][0]
chunk_dir[0][1]
chunk_dir[0][2]
chunk_dir[0][3]
...
...
...
chunk_dir[11][0]
chunk_dir[11][1]
chunk_dir[11][2]
chunk_dir[11][3]

    ...
    for (i = 0; i <= MALLOC_MAXSHIFT; i++) {
        LIST_INIT(&d->chunk_info_list[i]);
        for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
            LIST_INIT(&d->chunk_dir[i][j]);
    ...
So these many lists simply allow more randomization wherever they are
used, for example when allocating a chunk via omalloc_make_chunks() in
malloc_bytes():
    ...
    j = find_chunksize(size);
    r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
    listnum = r % MALLOC_CHUNK_LISTS;
    /* If it's empty, make a page more of that size chunks */
    if ((bp = LIST_FIRST(&d->chunk_dir[j][listnum])) == NULL) {
        bp = omalloc_make_chunks(d, j, listnum);
        if (bp == NULL)
            return NULL;
    }
    ...
And does it also mean that one could increase MALLOC_MAXSHIFT and
MALLOC_CHUNK_LISTS to get more randomization?

Also, may I know the use of the structure "struct region_info"? Is it
used to keep track of mmap'ed regions by storing their address and size
in a hash table, as mentioned in
https://www.openbsd.com.au/papers/eurobsdcon2009/otto-malloc.pdf?


Please confirm whether my understanding is correct or not.

Regards,
Neeraj

Re: Regarding the understanding of the malloc(3) code

Otto Moerbeek
On Fri, Mar 13, 2020 at 03:43:21AM +0530, Neeraj Pal wrote:

> On Tue, Mar 10, 2020 at 4:03 PM Otto Moerbeek <[hidden email]> wrote:
> > There's an off-by-one in your question :-)
> Yeah, sorry about that; in the flow of writing the mail I didn't notice.
>
> > For single-threaded programs, two malloc dir pools are maintained:
> > one for MAP_CONCEAL'ed memory (#0) and one for regular allocations (#1).
> > For multi-threaded programs more pools are created. This is to avoid
> > contention: accesses to different pools can run concurrently.
> Okay, thanks for the information. So, likewise, for multi-threaded
> applications malloc_mutexes is 8 by default (#0 for MAP_CONCEAL and
> the other 7 for regular allocations), as set in the code below:

Please indent your code snippets.

>
> [omalloc_init() snippet quoted above]
>
> > Yes. That way both underflow and overflow have a chance of being caught.
> Yeah, that's good, but I am not sure about it from the code. From the
> snippet below it seems that by default (that is, vm.malloc_conf without
> G) it is not on both sides?
>  <guard page> dir_info

dir_info is special. Having a guard page on both sides of every regular
allocation could be done, but it would waste more pages. Note that
allocations are already spread throughout the address space, so it is
very likely that an allocation is surrounded by unmapped pages anyway.

>
> [omalloc_poolinit() snippet quoted above]
>
> From the above code, my observations are:
> sizeof(*d) = 4824
> MALLOC_PAGEMASK = 4095
> DIR_INFO_RSZ = (4824 + 4095) & ~4095 = 8192

We need two pages to store dir_info.

>
> Now, MMAPNONE maps len = 8192 + (4096 * 2) = 16384 bytes,

We allocate 4 pages PROT_NONE.

> then the pages from p + MALLOC_PAGESIZE up to p + MALLOC_PAGESIZE +
> DIR_INFO_RSZ - 1 are mprotect'ed r/w,

The two middle pages are made r/w.

> d_avail = (8192 - 4824) >> 4 = 3368 >> 4 = 210
>
> Now, d = p + MALLOC_PAGESIZE + (random_no_under_210 << 4),

dir_info ends up on an aligned address somewhere in the middle pages, at
an offset between 0 and (210 << 4) = 0..3360, counting from the start of
the two middle pages.
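
To make the layout concrete, here is a standalone sketch under the
assumptions discussed above (4 KB pages, DIR_INFO_RSZ of two pages); it
is an illustration, not the libc code:

    #include <stdlib.h>
    #include <sys/mman.h>

    #define PAGESZ        4096
    #define DIR_INFO_RSZ  (2 * PAGESZ) /* sizeof(struct dir_info), rounded */
    #define MINSHIFT      4            /* alignment: 1 << MALLOC_MINSHIFT */

    /* [guard page][two r/w pages holding dir_info][guard page] */
    static void *
    place_dir_info(size_t sz)          /* sz = sizeof(struct dir_info) */
    {
        char *p = mmap(NULL, DIR_INFO_RSZ + 2 * PAGESZ, PROT_NONE,
            MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        /* only the two middle pages become readable/writable */
        mprotect(p + PAGESZ, DIR_INFO_RSZ, PROT_READ | PROT_WRITE);
        /* random, 16-byte aligned offset within the middle pages */
        return p + PAGESZ +
            (arc4random_uniform((DIR_INFO_RSZ - sz) >> MINSHIFT) << MINSHIFT);
    }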

>
> where d is the randomized offset inside the page at which dir_info lies.
> So, let's suppose p is 1000; then d = 1000 + 4096 + (100 << 4) = 6696.
> So it means [p + MALLOC_PAGESIZE] can be treated as a guard page before
> the dir_info offset, and if so, after that there is no guard page by
> default, right?

No, there will be a guard page on each side.

>
>
> > The second index of chunk_dir has size MALLOC_CHUNK_LISTS which is 4,
> > not 32.
> Yeah, sorry for incorrect values.
> > More than one list of free chunk pages per chunk size is maintained to
> > allow for more randomization.
> Okay, so in short the code below will create 12 chunk_info_list
> entries, where i runs from 0 to 11, and for every ith index there is a
> j.
> [chunk_dir[0..11][0..3] enumeration and malloc_bytes() snippet quoted
> above]
> And does it also mean that one could increase MALLOC_MAXSHIFT and
> MALLOC_CHUNK_LISTS to get more randomization?

MALLOC_CHUNK_LISTS could be increased at the cost of overhead.
MALLOC_MAXSHIFT cannot; it is the shift of the max chunk size that fits
in a page.
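
(For example, assuming MALLOC_PAGESIZE = 4096: the largest chunk size
is half a page, 2048 = 1 << 11, so MALLOC_MAXSHIFT is 11, which is why
the init loop runs i = 0..11; larger requests are served with whole
pages rather than chunks.)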


>
> Also, may I know the use of the structure "struct region_info"? Is it
> used to keep track of mmap'ed regions by storing their address and size
> in a hash table, as mentioned in
> https://www.openbsd.com.au/papers/eurobsdcon2009/otto-malloc.pdf?

Yes.
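
As an illustration of that idea, a much-simplified sketch (the real
find()/insert() in malloc.c differ in hashing and flag handling):

    #include <stddef.h>
    #include <stdint.h>

    /* one entry per mmap'ed region: its address plus its size (or, for
       a chunk page, a pointer to its chunk_info) */
    struct region_info {
        void *p;
        uintptr_t size;
    };

    /* look up a page address by linear probing; regions_total is
       assumed to be a power of two */
    static struct region_info *
    region_find(struct region_info *r, size_t regions_total, void *p)
    {
        size_t i = ((uintptr_t)p >> 12) & (regions_total - 1);

        while (r[i].p != NULL) {
            if (r[i].p == p)
                return &r[i];
            i = (i + 1) & (regions_total - 1);
        }
        return NULL;
    }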


Re: Regarding the understanding of the malloc(3) code

neerajpal
On Fri, Mar 13, 2020, at 11:45 AM Otto Moerbeek <[hidden email]> wrote:
>
> Please indent your code snippets.
Yeah, my apologies; I shall indent the code snippets.

>
> dir_info is special. Having a guard page on both sides of every regular
> allocation could be done, but it would waste more pages. Note that
> allocations are already spread throughout the address space, so it is
> very likely that an allocation is surrounded by unmapped pages anyway.
>
>
Okay. So, as seen from the omalloc_poolinit() code it is not there for
regular allocations; we could add it, but it would waste an extra page,
and the address space is already sprinkled with unmapped pages. So it is
very likely that there will be some unmapped page beside an allocation
anyway.

>
> We need two pages to store dir_info.
>
>
> > Now, MMAPNONE maps len = 8192 + (4096 * 2) = 16384 bytes,
>
> We allocate 4 pages PROT_NONE.
>
> > then the pages from p + MALLOC_PAGESIZE up to p + MALLOC_PAGESIZE +
> > DIR_INFO_RSZ - 1 are mprotect'ed r/w,
>
> The two middle pages are made r/w.
>
> > d_avail = (8192 - 4824) >> 4 = 3368 >> 4 = 210
> >
> > Now, d = p + MALLOC_PAGESIZE + (random_no_under_210 << 4),
>
> dir_info ends up on an aligned address somewhere in the middle pages, at
> an offset between 0 and (210 << 4) = 0..3360, counting from the start of
> the two middle pages.
Thank you for the information. So it is like [(p + MALLOC_PAGESIZE) +
(0..3360)]; kind of like an array: for example, suppose x = p +
MALLOC_PAGESIZE, then it will be x[0..3360]. Am I right?
>
> MALLOC_CHUNK_LISTS could be increased at the cost of overhead.
> MALLOC_MAXSHIFT cannot; it is the shift of the max chunk size that fits
> in a page.
Okay, I understand. And yeah, more randomization means more overhead.

Thank you, Otto, for your detailed information.

Please find the code below:

 948 /*
 949  * Allocate a chunk
 950  */
 951 static void *
 952 malloc_bytes(struct dir_info *d, size_t size, void *f)
 953 {
 954         u_int i, r;
 955         int j, listnum;
 956         size_t k;
 957         u_short *lp;
 958         struct chunk_info *bp;
 959         void *p;
 960
 961         if (mopts.malloc_canary != (d->canary1 ^ (u_int32_t)(uintptr_t)d) ||
 962             d->canary1 != ~d->canary2)
 963                 wrterror(d, "internal struct corrupt");
 964
 965         j = find_chunksize(size);
 966
 967         r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
 968         listnum = r % MALLOC_CHUNK_LISTS;
 969         /* If it's empty, make a page more of that size chunks */
 970         if ((bp = LIST_FIRST(&d->chunk_dir[j][listnum])) == NULL) {
 971                 bp = omalloc_make_chunks(d, j, listnum);
 972                 if (bp == NULL)
 973                         return NULL;
 974         }
 975
 976         if (bp->canary != (u_short)d->canary1)
 977                 wrterror(d, "chunk info corrupted");


In the code above, we can see the checks on lines 961 and 976. I don't
understand why it is checking the canary of malloc_readonly against d,
and then the canary of the allocated chunk info bp against d, because I
had seen canary validation happen in free(3), not in malloc(3). So does
it mean there may be cases where one can corrupt these as well?


 978
 979         i = (r / MALLOC_CHUNK_LISTS) & (bp->total - 1);
 980
 981         /* start somewhere in a short */
 982         lp = &bp->bits[i / MALLOC_BITS];
 983         if (*lp) {
 984                 j = i % MALLOC_BITS;
 985                 k = ffs(*lp >> j);
 986                 if (k != 0) {
 987                         k += j - 1;
 988                         goto found;
 989                 }
 990         }
 991         /* no bit halfway, go to next full short */
 992         i /= MALLOC_BITS;
 993         for (;;) {
 994                 if (++i >= bp->total / MALLOC_BITS)
 995                         i = 0;
 996                 lp = &bp->bits[i];
 997                 if (*lp) {
 998                         k = ffs(*lp) - 1;
 999                         break;
1000                 }
1001         }
1002 found:
1003 #ifdef MALLOC_STATS
1004         if (i == 0 && k == 0) {
1005                 struct region_info *r = find(d, bp->page);
1006                 r->f = f;
1007         }
1008 #endif
1009
1010         *lp ^= 1 << k;
1011
1012         /* If there are no more free, remove from free-list */
1013         if (--bp->free == 0)
1014                 LIST_REMOVE(bp, entries);
1015
1016         /* Adjust to the real offset of that chunk */
1017         k += (lp - bp->bits) * MALLOC_BITS;
1018
1019         if (mopts.chunk_canaries && size > 0)
1020                 bp->bits[bp->offset + k] = size;
1021
1022         k <<= bp->shift;
1023
1024         p = (char *)bp->page + k;
1025         if (bp->size > 0) {
1026                 if (d->malloc_junk == 2)
1027                         memset(p, SOME_JUNK, bp->size);
1028                 else if (mopts.chunk_canaries)
1029                         fill_canary(p, size, bp->size);
1030         }
1031         return p;
1032 }

And the calculations above, are they for computing the offset in the
page, that is, k? I have seen that during free(3) this offset is
calculated again to find the same chunk. So is it only doing offset
calculation?

Please confirm.

Regards,
Neeraj
Re: Regarding the understanding of the malloc(3) code

Otto Moerbeek
On Wed, Mar 18, 2020 at 07:29:51AM +0530, Neeraj Pal wrote:

> [previous message quoted in full, up to the first half of the
> malloc_bytes() snippet]
>
> In the code above, we can see the checks on lines 961 and 976. I don't
> understand why it is checking the canary of malloc_readonly against d,
> and then the canary of the allocated chunk info bp against d, because I
> had seen canary validation happen in free(3), not in malloc(3). So does
> it mean there may be cases where one can corrupt these as well?

There are several types of canaries. They try to detect corruption of
various metadata structures. There are also canaries for user-allocated
data; they are enabled with the C option.
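
To illustrate the C-option chunk canaries, a hedged sketch (the real
fill_canary() and its validation in malloc.c differ in details such as
the exact junk pattern): the slack between the requested size and the
chunk size is filled at malloc time and checked at free time.

    #include <stddef.h>
    #include <string.h>

    #define JUNK 0xdb   /* assumed pattern; see SOME_JUNK in malloc.c */

    static void
    fill_canary_sketch(unsigned char *p, size_t requested, size_t chunksz)
    {
        memset(p + requested, JUNK, chunksz - requested);
    }

    /* returns 0 if intact, -1 if the caller wrote past its allocation */
    static int
    check_canary_sketch(unsigned char *p, size_t requested, size_t chunksz)
    {
        size_t i;

        for (i = requested; i < chunksz; i++)
            if (p[i] != JUNK)
                return -1;
        return 0;
    }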

>
> [second half of the malloc_bytes() snippet quoted above]
>
> And the calculations above, are they for computing the offset in the
> page, that is, k? I have seen that during free(3) this offset is
> calculated again to find the same chunk. So is it only doing offset
> calculation?

In general, adding an int to a pointer calculates an offset, so yes.
Study what the roles of p and k are. Let the code speak. If you fail to
understand parts, study further and play with it. You'll learn more
from that than from asking for confirmation all the time.

        -Otto

Re: Regarding the understanding of the malloc(3) code

neerajpal
On Wed, 18 Mar, 2020, 12:46 pm Otto Moerbeek, <[hidden email]> wrote:

> There are several types of canaries. They try to detect corruption of
> various metadata structures. There are also canaries for user-allocated
> data; they are enabled with the C option.
>
Yeah, I am using option C through sysctl(8) to understand the canary
concept, and I understood the user-controlled part, which is validated
during free(3).

But I was wondering about the idea behind writing the canary checks. I
know a canary is a random cookie that is usually placed to detect
overflow/underflow vulnerabilities. So I thought: if there is no way one
can corrupt the metadata, would it be possible to remove the checks,
since that might improve performance somewhat? But I don't know the main
reason for them, so I may be wrong; that's why I asked.

>
> In general, adding an int to a pointer calculates an offset, so yes.
>
Yeah, I understood.

> Study what the roles of p and k are. Let the code speak. If you fail to
> understand parts, study further and play with it. You'll learn more
> from that than from asking for confirmation all the time.

Yeah, sure. I understood the basic idea of those calculations: p is the
page here, and k is the offset of the chunk in the page. I think the
whole calculation relates to that, but due to the many arithmetic
operations I could not understand some parts; maybe I have to read it
again and again.

Actually, after compiling libc with debug symbols, I wrote one basic
sample program and am debugging it with gdb while reading the source
code side by side.

I am learning something new every day from reading the malloc(3) code.
But sometimes I am not able to match the thoughts I get from reading the
code with the thoughts the developer had while developing it. That's why
I have asked about my understanding from the developer's point of view.

Thank you for resolving my queries :)

Regards
Neeraj
Re: Regarding the understanding of the malloc(3) code

Otto Moerbeek
On Wed, Mar 18, 2020 at 03:35:45PM +0530, Neeraj Pal wrote:

> [previous message quoted]
>
> But I was wondering about the idea behind writing the canary checks. I
> know a canary is a random cookie that is usually placed to detect
> overflow/underflow vulnerabilities. So I thought: if there is no way one
> can corrupt the metadata, would it be possible to remove the checks,
> since that might improve performance somewhat? But I don't know the main
> reason for them, so I may be wrong; that's why I asked.

Not all metadata canaries live in r/o memory.


A thing that also helps is to follow the CVS history of the file. The
first version of my malloc (from 2008) was simpler, and looking at the
diffs through the years gives you great hints at what features were
added over the years, plus a few bugfixes.

        -Otto

Re: Regarding the understanding of the malloc(3) code

neerajpal
On Wed, Mar 18, 2020 at 4:01 PM Otto Moerbeek <[hidden email]> wrote:

> Not all meta-data canaries live in r/o memory.
>

> A thing that also helps is to follow the CVS history of the file. The
> first version of my malloc (from 2008) was simpler, and looking at the
> diffs through the years gives you great hints at what features were
> added over the years, plus a few bugfixes.
>

Thanks for the tip. I have gone through the GitHub history and also read
the CVS history; it was very helpful. I also got some information from
the phkmalloc code and some old papers.

I have read and tried to understand the malloc internals, and I have
written up whatever I have learned on my blog, but I haven't published
it yet.

I have sent the link to you @drijf.net for review, as I don't want to
convey any wrong understanding or misguide people. So I request you to
review it whenever you have time.

Thanks and regards,
Neeraj
Re: Regarding the understanding of the malloc(3) code

neerajpal
Hi Otto,

I have two small issues or confusions:

First Query:

 885 /*
 886  * Allocate a page of chunks
 887  */
 888 static struct chunk_info *
 889 omalloc_make_chunks(struct dir_info *d, int bits, int listnum)
 890 {
 891         struct chunk_info *bp;
 892         void *pp;
 893
 894         /* Allocate a new bucket */
 895         pp = map(d, NULL, MALLOC_PAGESIZE, 0);
 896         if (pp == MAP_FAILED)
 897                 return NULL;
 898
 899         /* memory protect the page allocated in the malloc(0) case */
 900         if (bits == 0 && mprotect(pp, MALLOC_PAGESIZE, PROT_NONE) == -1)
 901                 goto err;
 902
 903         bp = alloc_chunk_info(d, bits);
 904         if (bp == NULL)
 905                 goto err;
 906         bp->page = pp;
 907
 908         if (insert(d, (void *)((uintptr_t)pp | (bits + 1)), (uintptr_t)bp,
 909             NULL))
 910                 goto err;
 911         LIST_INSERT_HEAD(&d->chunk_dir[bits][listnum], bp, entries);
 912         return bp;
 913
 914 err:
 915         unmap(d, pp, MALLOC_PAGESIZE, 0, d->malloc_junk);
 916         return NULL;
 917 }

So, as per the comment on line no. 885 in the above code, it is
mentioned that this allocates a page of chunks. But what I observe from
the code is that pp is the new bucket, meaning pp is the page that is
full of chunks. As we can see, on line 903 it calls alloc_chunk_info(),
which in turn calls init_chunk_info(); so, in short, we can say that it
first allocates some chunk, then initializes that allocated chunk, and
returns it.

So, comparing from here, it means bp is the allocated and initialized
chunk, and bp->page = pp stores the page pp into bp->page. Then, after
the hash table insert, it returns bp, meaning it returns the
allocated-and-initialized chunk. But at the same time, I was referring
to https://junk.tintagel.pl/openbsd-daily-malloc-3.txt by @mulander,
where he mentioned that "so bp is a page of chunks". So I became a
little confused, because pp is the page of chunks, which is used in
malloc_bytes() where it calculates the page offset and adds it to
bp->page; that is what the user writes into, like the address
bp->page + k returned by malloc(3).

Second Query:

And bp->page is pp, and k is the offset. So is it possible to get the
address of a specific chunk, because (bp->page + k) belongs to some
chunk; or can we say k is the index of the chunk inside the bucket
bp->page, or pp?


And in the structure chunk_info, is u_short bits[1] the bit for
tracking whether a chunk is free or not? So, does it belong to each and
every chunk? For example, if there are 10 chunks in a page and the 5th
chunk is not free, then its bit will be set to 0 and the other 9 will
be 1.

OR

Is it that bits denotes the bits, like in init_chunk_info(), where at
the end it memsets 0xff bytes into p->bits with length 30, and then
calculates p->bits[15] = 65535, so it is like making the last bits 1?

Sometimes I also wonder: if the declaration is bits[1], how is it
possible to assign bits[15]? It means it performs operations past the
declared array; that is what I observed while debugging.

I have referred to the paper http://www.ouah.org/BSD-heap-smashing.txt
for the understanding of the bits[1] value, but I still don't have
proper confidence in my understanding of the bits logic.

Apart from the issues discussed above, I have understood most of the
malloc(3) internals, but for the above I still don't have a convincing
understanding.


Regards,
Neeraj
Re: Regarding the understanding of the malloc(3) code

Otto Moerbeek
On Wed, Mar 25, 2020 at 01:54:51AM +0530, Neeraj Pal wrote:

> Hi Otto,
>
> I am having two small issues or confusions:
>
> First Query:
>
> [omalloc_make_chunks() code quoted above]
>
> So, as per the comment on line no. 885 in the above code, it is
> mentioned that this allocates a page of chunks. But what I observe from
> the code is that pp is the new bucket, meaning pp is the page that is
> full of chunks. As we can see, on line 903 it calls alloc_chunk_info(),
> which in turn calls init_chunk_info(); so, in short, we can say that it
> first allocates some chunk, then initializes that allocated chunk, and
> returns it.

pp points to a page of chunks.
bp points to the associated meta info: a bitmap that says which chunks
in the page are free. The bitmap is an array of shorts, so 16 bits per
entry.
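
For reference, an abridged view of that meta info, reconstructed from
the fields used in the snippets in this thread (check malloc.c for the
authoritative definition):

    #include <sys/queue.h>   /* LIST_ENTRY */
    #include <sys/types.h>   /* u_short */

    struct chunk_info {
        LIST_ENTRY(chunk_info) entries; /* link in chunk_dir[bits][listnum] */
        void *page;                     /* the page of chunks itself */
        u_short canary;
        u_short size;                   /* chunk size the page is split into */
        u_short shift;                  /* log2(size) */
        u_short free;                   /* free chunks left in the page */
        u_short total;                  /* total chunks in the page */
        u_short offset;                 /* where per-chunk sizes start in bits[] */
        u_short bits[1];                /* free bitmap; extends past the struct */
    };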

>
> So, comparing from here, it means bp is the allocated and initialized
> chunk, and bp->page = pp stores the page pp into bp->page. Then, after
> the hash table insert, it returns bp, meaning it returns the
> allocated-and-initialized chunk. But at the same time, I was referring
> to https://junk.tintagel.pl/openbsd-daily-malloc-3.txt by @mulander,
> where he mentioned that "so bp is a page of chunks". So I became a
> little confused, because pp is the page of chunks, which is used in
> malloc_bytes() where it calculates the page offset and adds it to
> bp->page; that is what the user writes into, like the address
> bp->page + k returned by malloc(3).

Again: bp is the meta info; bp->page is the page of chunks itself.

>
> Second Query:
>
> And bp->page is pp, and k is the offset. So is it possible to get the
> address of a specific chunk, because (bp->page + k) belongs to some
> chunk; or can we say k is the index of the chunk inside the bucket
> bp->page, or pp?

In the code, k is first the chunk number, and is then multiplied (by
shifting it by bp->shift) to get the byte offset of the chunk inside
the chunk page.
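
For example, with 16-byte chunks (bp->shift = 4), chunk number k = 5
becomes byte offset 5 << 4 = 80, so malloc(3) returns
(char *)bp->page + 80.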

>
>
> And in the structure chunk_info, is u_short bits[1] the bit for
> tracking whether a chunk is free or not? So, does it belong to each and
> every chunk? For example, if there are 10 chunks in a page and the 5th
> chunk is not free, then its bit will be set to 0 and the other 9 will
> be 1.
>
> OR
>
> Is it that bits denotes the bits, like in init_chunk_info(), where at
> the end it memsets 0xff bytes into p->bits with length 30, and then
> calculates p->bits[15] = 65535, so it is like making the last bits 1?

p->bits is a bit mask. Each short in it holds 16 bits, so the first 16
chunks end up in the first short, the next 16 in the 2nd short, etc.

The *lp ^= 1 << k line actually sets the bit.
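
A minimal sketch of that bitmap walk (illustration only; the real
malloc_bytes() starts the search at a random position rather than at
bit 0):

    #include <strings.h>        /* ffs() */
    #include <sys/types.h>      /* u_short */

    /* bit j of bits[i] describes chunk i * 16 + j; 1 means free.
       Returns the chunk number taken, or -1 if the page is full. */
    static int
    take_first_free(u_short *bits, int nshorts)
    {
        int i, k;

        for (i = 0; i < nshorts; i++) {
            if (bits[i] != 0) {
                k = ffs(bits[i]) - 1;   /* lowest set bit */
                bits[i] ^= 1 << k;      /* flip it to "in use" */
                return i * 16 + k;
            }
        }
        return -1;
    }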


Re: Regarding the understanding of the malloc(3) code

neerajpal
On Wed, Mar 25, 2020 at 2:06 AM Otto Moerbeek <[hidden email]> wrote:

> pp points to a page of chunks.
> bp points to the associated meta info: a bitmap that says which chunks
> in the page are free. The bitmap is an array of shorts, so 16 bits per
> entry.
>

Per entry means, for our case of bits[1], only one entry?

> In the code, k is first the chunk number, and is then multiplied (by
> shifting it by bp->shift) to get the byte offset of the chunk inside
> the chunk page.
>

Okay, so here we have 3 things:
1. pp is the page.
2. k is the chunk number before shifting, which means k is the index of
the chunk in the page pp.
3. After shifting, k becomes the byte offset of the chunk inside the pp
page.

> p->bits is a bit mask. Each short in it holds 16 bits, so the first 16
> chunks end up in the first short, the next 16 in the 2nd short, etc.
>
> The *lp ^= 1 << k line actually sets the bit.
>

Okay, 16 chunks per short, because each chunk has a different bit and a
total of 16 bits represents 16 different chunks.

So, as per the init_chunk_info() function, for the bitmap operations
given below:
840
841         /* set all valid bits in the bitmap */
842         i = p->total - 1;
843         memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
844         p->bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;
845 }

Here, it first calculates i, which is 256 - 1, that is, 255, or in
other words 0xff.
Then memset(3) writes len bytes of 0xff to p->bits.

Here, len is sizeof(p->bits[0]) * (i / MALLOC_BITS), where
sizeof(p->bits[0]) = 2 bytes and (i / MALLOC_BYTES) is 255 / 16, that
is, 15. In total it becomes 2 * 15 = 30 bytes.

So, it copies 30 bytes of 0xff to p->bits. But here the main confusion
lies: bits[1] is of type u_short, and as we know sizeof(u_short) is
2 bytes, but we are writing 30 bytes through memset(3).

Then, the next line again makes 2 bytes 0xff by calculating the last
index of the bit array. That is,

p->bits[255 / 16] = (2U << (255 % 16)) - 1
p->bits[15] = 65536 - 1 = 65535, which is 0xffff.

So, after the overall calculation, it is writing 30 bytes into an array
whose declared element type is u_short, which is 2 bytes.

Below are the observations from the debugger:

openbsd# LD_PRELOAD=/usr/src/lib/libc/obj/libc.so.95.1 gdb -q sample
(gdb) br main
Breakpoint 1 at 0x1363: file sample.c, line 7.
(gdb) r 12345
Starting program: /root/test/sample 12345
Breakpoint 1 at 0xfa68fc8a363: file sample.c, line 7.
Error while reading shared library symbols:
Dwarf Error: wrong version in compilation unit header (is 4, should be 2)
[in module /usr/libexec/ld.so]

Breakpoint 1, main (argc=2, argv=0x7f7ffffea018) at sample.c:7
7 char *buff1, *buff2 = NULL;
Current language:  auto; currently minimal
(gdb) s
8 buff1 = (char *)malloc(8);
(gdb) s
malloc (size=8) at /usr/src/lib/libc/stdlib/malloc.c:1293
1293 int saved_errno = errno;
(gdb) br init_chunk_info
Breakpoint 2 at 0xfa951dafa20: file /usr/src/lib/libc/stdlib/malloc.c, line
832.
(gdb) c
Continuing.

Breakpoint 2, init_chunk_info (d=0xfa9011de110, p=0xfa910ecedb0, bits=4)
    at /usr/src/lib/libc/stdlib/malloc.c:832
832 if (bits == 0) {
(gdb) n
838 p->shift = bits;
(gdb)
839 p->total = p->free = MALLOC_PAGESIZE >> p->shift;
(gdb)
840 p->size = 1U << bits;
(gdb)
841 p->offset = howmany(p->total, MALLOC_BITS);
(gdb)
843 p->canary = (u_short)d->canary1;
(gdb)
846 i = p->total - 1;
(gdb)
848 memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
(gdb) x/30wx p->bits
0xfa910ecedd4: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecede4: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecedf4: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee04: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee14: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee24: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee34: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee44: 0x00000000 0x00000000
(gdb) p/x i
$1 = 0xff
(gdb) n
849 p->bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;
(gdb) x/30wx p->bits
0xfa910ecedd4: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
0xfa910ecede4: 0xffffffff 0xffffffff 0xffffffff 0x0000ffff
0xfa910ecedf4: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee04: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee14: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee24: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee34: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee44: 0x00000000 0x00000000
(gdb) p/x i / 16
$2 = 0xf
(gdb) n
850 }

As we can see above, it set 30 bytes to 0xff, and the last 2 bytes are
still 0. Then, after line no. 849, it calculates p->bits[15], which is
the last remaining short, and makes it 0xffff.

(gdb) x/30wx p->bits
0xfa910ecedd4: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
0xfa910ecede4: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
0xfa910ecedf4: 0x00000000 0x00000000  0x00000000 0x00000000
0xfa910ecee04: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee14: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee24: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee34: 0x00000000 0x00000000 0x00000000 0x00000000
0xfa910ecee44: 0x00000000 0x00000000
(gdb) n

Also, I have read the mail
https://marc.info/?l=openbsd-tech&m=131537857923062&w=2, where the
u_short bits[(MALLOC_PAGESIZE / MALLOC_MINSIZE) / MALLOC_BITS], which
means bits[15], was changed to bits[1]. That change made chunk_info a
variable-sized struct so that less space is wasted on metadata.
So, in short, it seems that change was a space improvement.

And, as seen from the code snippet of phkmalloc given below:

    /* set all valid bits in the bitmap */
    k = bp->total;
    i = 0;

    /* Do a bunch at a time */
    for (; k - i >= MALLOC_BITS; i += MALLOC_BITS)
        bp->bits[i / MALLOC_BITS] = ~0;

Here, in my understanding, it seems that it is setting all 16 bits to 1,
but the same code logic is a little different in OpenBSD.


So, regarding the observation above of writing 30 bytes (beyond the
2 bytes of the declared u_short): is that due to some improvement that I
am missing?


Thanks,
Neeraj
Re: Regarding the understanding of the malloc(3) code

Otto Moerbeek
On Fri, Mar 27, 2020 at 02:21:44PM +0530, Neeraj Pal wrote:

> [previous message quoted]
>
> So, as per the init_chunk_info() function, for the bitmap operations
> given below:
> 840
> 841         /* set all valid bits in the bitmap */
> 842         i = p->total - 1;
> 843         memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
> 844         p->bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;
> 845 }
>
> Here, it first calculates i, which is 256 - 1, that is, 255, or in
> other words 0xff.
> Then memset(3) writes len bytes of 0xff to p->bits.
>
> Here, len is sizeof(p->bits[0]) * (i / MALLOC_BITS), where
> sizeof(p->bits[0]) = 2 bytes and (i / MALLOC_BYTES) is 255 / 16, that
> is, 15. In total it becomes 2 * 15 = 30 bytes.

For chunk size 256, there will indeed be 16 chunks in a page. i will
*not* be 255 in that case, but 15. There is no such thing as
MALLOC_BYTES. The memset becomes memset(p->bits, 0xff, 0), a no-op, and
the line below it sets p->bits[0] to (2U << 15) - 1 = 0xffff. So all 16
bits needed are set to 1.

The debug session below is for chunk size 16, so the numbers are
different.
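
(Worked through for the chunk size 16 case of the debug session:
p->total = 4096 >> 4 = 256, so i = 255; the memset length is
sizeof(u_short) * (255 / 16) = 2 * 15 = 30 bytes, setting bits[0..14]
to 0xffff; and the final line sets bits[255 / 16] = bits[15] to
(2U << (255 % 16)) - 1 = 0xffff. So all 256 bits end up set, exactly as
the x/30wx dumps show.)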

>
> [rest of the message, including the gdb session and the phkmalloc
> snippet, quoted above]
>
> So, regarding the observation above of writing 30 bytes (beyond the
> 2 bytes of the declared u_short): is that due to some improvement that
> I am missing?

You need to look at the bigger picture.

Depending on the chunk size, a different number of bits is needed,
since the number of chunks in a page differs.

In OpenBSD, struct chunk_info is a variable-size data structure. In
alloc_chunk_info() the size needed to store the struct itself, plus the
shorts needed for the bitmap and the canary info (if enabled), is
computed.

For a chunk of half a page, we need two bits, so a single short is
enough. For chunk size 16, we need MALLOC_PAGESIZE/16 bits for the
bitmap. That translates to (MALLOC_PAGESIZE/16)/16 shorts, at 16 bits
per short. The rest of the calculation adjusts for the other fields in
chunk_info.
        -Otto

Re: Regarding the understanding of the malloc(3) code

neerajpal
On Fri, Mar 27, 2020 at 3:21 PM Otto Moerbeek <[hidden email]> wrote:


> For chunk size 256, there will indeed be 16 chunks in a page. i will
> *not* be 255 in that case, but 15. There is no such thing as
> MALLOC_BYTES. The memset becomes memset(p->bits, 0xff, 0), a no-op, and
> the line below it sets p->bits[0] to (2U << 15) - 1 = 0xffff. So all 16
> bits needed are set to 1.
>
> The debug session below is for chunk size 16, so the numbers are
> different.
>

Yeah, my apologies for the typo; I meant to write MALLOC_BITS instead
of MALLOC_BYTES.

And I would request you to explain a little more, because all the
observations here are based on this sample code:

    #include <stdlib.h>
    #include <string.h>

    int
    main(int argc, char **argv) {
         char *buff1;
         buff1 = (char *)malloc(8);
         strcpy(buff1, argv[1]);
         free(buff1);
         return 0;
    }

So p->total is different for different sizes, as it depends on bits.
But as per the sample code given above, in init_chunk_info() as given
below:

 823 static void
 824 init_chunk_info(struct dir_info *d, struct chunk_info *p, int bits)
 825 {
 826         int i;
 827
 828         if (bits == 0) {
 829                 p->shift = MALLOC_MINSHIFT;
 830                 p->total = p->free = MALLOC_PAGESIZE >> p->shift;
 831                 p->size = 0;
 832                 p->offset = 0xdead;
 833         } else {
 834                 p->shift = bits;
 835                 p->total = p->free = MALLOC_PAGESIZE >> p->shift;
 836                 p->size = 1U << bits;
 837                 p->offset = howmany(p->total, MALLOC_BITS);
 838         }
 839         p->canary = (u_short)d->canary1;
 840
 841         /* set all valid bits in the bitmap */
 842         i = p->total - 1;
 843         memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
 844         p->bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;
 845 }

On line no. 835 we can see that it calculates p->total and p->free as
MALLOC_PAGESIZE >> p->shift, and for the sample code p->shift = 4, so
p->total becomes 256.
Then, from line no. 842, i = p->total - 1, which means i becomes 255.

Then memset(p->bits, 0xff, 2 * (255 / 16)) => memset(p->bits, 0xff, 30);
I am not able to understand how it would become memset(p->bits, 0xff, 0).

> You need to look at the bigger picture.
>
> Depending on the chunk size, a different number of bits is needed,
> since the number of chunks in a page differs.

Yes, thanks for reminding me. I have verified this with different malloc
requests: malloc(8) gives bits = 4, malloc(24) gives bits = 5,
malloc(54) gives bits = 6 and malloc(100) gives bits = 7.

> In OpenBSD, struct chunk_info is a variable-size data structure. In
> alloc_chunk_info() the size needed to store the struct itself, plus the
> shorts needed for the bitmap and the canary info (if enabled), is
> computed.
>
> For a chunk of half a page, we need two bits, so a single short is
> enough. For chunk size 16, we need MALLOC_PAGESIZE/16 bits for the
> bitmap. That translates to (MALLOC_PAGESIZE/16)/16 shorts, at 16 bits
> per short. The rest of the calculation adjusts for the other fields in
> chunk_info.

So here it means that for chunk size 16 we need 256 bits, or 32 bytes,
for the bitmap. That translates to 16 shorts. So the u_short bits[1]
array effectively has 16 entries in that case?

So you are saying that in the alloc_chunk_info() function, on line
no. 859 as mentioned below:

 855
 856                 if (bits == 0)
 857                         count = MALLOC_PAGESIZE / MALLOC_MINSIZE;
 858                 else
 859                         count = MALLOC_PAGESIZE >> bits;
 860
 861                 size = howmany(count, MALLOC_BITS);
 862                 size = sizeof(struct chunk_info) + (size - 1) * sizeof(u_short);
 863                 if (mopts.chunk_canaries)
 864                         size += count * sizeof(u_short);

Here, as per the sample code, the value of bits is 4, so count =
4096 >> 4 = 256; the count variable is the number of bitmap bits.
Then size = howmany(count, MALLOC_BITS) = (count + (MALLOC_BITS - 1)) /
MALLOC_BITS = 271 / 16 = 16.

Then line no. 862 calculates the minimum size required for the bitmap
plus the chunk_info structure itself: for the count = 256 case the
extra bitmap bytes are (16 - 1) * 2 = 30. Then, if canaries are on, it
adds space for the canary sizes.

So the size for the bitmap is calculated here, and as I mentioned above
the memset writes 30 bytes for bits = 4. So those 30 bytes come from
here?

Thanks,
Neeraj Pal