[[!meta copyright="Copyright © 2011, 2012, 2013, 2014 Free Software Foundation,
Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

There is a [[!FF_project 266]][[!tag bounty]] on this task.

[[!toc]]


# IRC, freenode, #hurd, 2011-04-12

    <antrik> braunr: do you think the allocator you wrote for x15 could be used
      for gnumach? and would you be willing to mentor this? :-)
    <braunr> antrik: to be willing to isn't my current problem
    <braunr> antrik: and yes, I think my allocator can be used
    <braunr> it's a slab allocator after all, it only requires reap() and
      grow()
    <braunr> or mmap()/munmap() whatever you want to call it
    <braunr> a backend
    <braunr> antrik: although i've been having other ideas recently
    <braunr> that would have more impact on our usage patterns I think
    <antrik> mcsim: have you investigated how the zone allocator works and how
      it's hooked into the system yet?
    <braunr> mcsim: now let me give you a link
    <braunr> mcsim:
      http://git.sceen.net/rbraun/libbraunr.git/?a=blob;f=mem.c;h=330436e799f322949bfd9e2fedf0475660309946;hb=HEAD
    <braunr> mcsim: this is an implementation of the slab allocator i've been
      working on recently
    <braunr> mcsim: i haven't made it public because i reworked the per
      processor layer, and this part isn't complete yet
    <braunr> mcsim: you could use it as a reference for your project
    <mcsim> braunr: ok
    <braunr> it used to be close to the 2001 vmem paper
    <braunr> but after many tests, fragmentation and accounting issues have
      been found
    <braunr> so i rewrote it to be closer to the linux implementation (cache
      filling/draining in bukl transfers)
    <braunr> bulk*
    <braunr> they actually use the word draining in linux too :)
    <mcsim> antrik: not complete yet.
    <antrik> braunr: oh, it's unfinished? that's unfortunate...
    <braunr> antrik: only the per processor part
    <braunr> antrik: so it doesn't matter much for gnumach
    <braunr> and it's not difficult to set up
    <antrik> mcsim: hm, OK... but do you think you will have a fairly good
      understanding in the next couple of days?...
    <antrik> I'm asking because I'd really like to see a proposal a bit more
      specific than "I'll look into things..."
    <antrik> i.e. you should have an idea which things you will actually have
      to change to hook up a new allocator etc.
    <antrik> braunr: OK. will the interface remain unchanged, so it could be
      easily replaced with an improved implementation later?
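To make the backend braunr mentions above concrete, here is a minimal sketch of the two hooks a slab allocator needs from the rest of the system; the structure and function names are invented for this illustration and are not the actual mem.c interface. In a kernel the hooks would wrap something like kmem_alloc_wired()/kmem_free(); plain malloc()/free() stand in here so the sketch compiles in userspace.

    /* Illustrative only: the page-level backend ("grow"/"reap") a slab
     * allocator needs, per the discussion above.  Names are hypothetical. */
    #include <stdlib.h>

    struct slab_backend {
        void *(*grow)(size_t size);             /* map fresh wired pages */
        void (*reap)(void *addr, size_t size);  /* return them to the VM */
    };

    static void *grow_stub(size_t size)
    {
        return malloc(size);
    }

    static void reap_stub(void *addr, size_t size)
    {
        (void)size;
        free(addr);
    }

    static const struct slab_backend userspace_backend = {
        .grow = grow_stub,
        .reap = reap_stub,
    };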
- <braunr> the zone allocator in gnumach is a badly written bare object - allocator actually, there aren't many things to understand about it - <braunr> antrik: yes - <antrik> great :-) - <braunr> and the per processor part should be very close to the phys - allocator sitting next to it - <braunr> (with the slight difference that, as per cpu caches have variable - sizes, they are allocated on the free path rather than on the allocation - path) - <braunr> this is a nice trick in the vmem paper i've kept in mind - <braunr> and the interface also allows to set a "source" for caches - <antrik> ah, good point... do you think we should replace the physmem - allocator too? and if so, do it in one step, or one piece at a time?... - <braunr> no - <braunr> too many drivers currently depend on the physical allocator and - the pmap module as they are - <braunr> remember linux 2.0 drivers need a direct virtual to physical - mapping - <braunr> (especially true for dma mappings) - <antrik> OK - <braunr> the nice thing about having a configurable memory source is that - <antrik> whot do you mean by "allocated on the free path"? - <braunr> even if most caches will use the standard vm_kmem module as their - backend - <braunr> there is one exception in the vm_map module, allowing us to get - rid of either a static limit, or specific allocation code - <braunr> antrik: well, when you allocate a page, the allocator will lookup - one in a per cpu cache - <braunr> if it's empty, it fills the cache - <braunr> (called pools in my implementations) - <braunr> it then retries - <braunr> the problem in the slab allocator is that per cpu caches have - variable sizes - <braunr> so per cpu pools are allocated from their own pools - <braunr> (remember the magazine_xx caches in the output i showed you, this - is the same thing) - <braunr> but if you allocate them at allocation time, you could end up in - an infinite loop - <braunr> so, in the slab allocator, when a per cpu cache is empty, you just - fall back to the slab layer - <braunr> on the free path, when a per cpu cache doesn't exist, you allocate - it from its own cache - <braunr> this way you can't have an infinite loop - <mcsim> antrik: I'll try, but I have exams now. - <mcsim> As I understand amount of elements which could be allocated we - determine by zone initialization. And at this time memory for zone is - reserved. I'm going to change this. And make something similar to kmalloc - and vmalloc (support for pages consecutive physically and virtually). And - pages in zones consecutive always physically. - <mcsim> Am I right? - <braunr> mcsim: don't try to do that - <mcsim> why? 
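A rough sketch of the free-path trick braunr describes above, with invented names throughout (cpu_id(), slab_alloc(), slab_free() and cpu_pool_create() are assumed helpers, not the real mem.c functions): the allocation path never creates a CPU pool, so a cache cannot recurse into itself, while the free path may create one, because it already holds an object it can always hand back to the slab layer if that creation fails.

    /* Sketch only -- helper functions are assumed to exist elsewhere. */
    #define NR_CPUS 1

    struct cpu_pool {
        void **objs;    /* cached object pointers */
        int nr_objs;    /* currently cached */
        int size;       /* capacity, depends on the cache's object size */
    };

    struct cache {
        struct cpu_pool *cpu_pools[NR_CPUS]; /* NULL until first free on a CPU */
        struct cache *cpu_pool_cache;        /* pools come from their own cache */
    };

    /* Assumed helpers (not shown): */
    int cpu_id(void);
    void *slab_alloc(struct cache *c);
    void slab_free(struct cache *c, void *obj);
    struct cpu_pool *cpu_pool_create(struct cache *pool_cache, struct cache *c);

    void *cache_alloc(struct cache *c)
    {
        struct cpu_pool *pool = c->cpu_pools[cpu_id()];

        /* Allocation path: if the pool is missing or empty, fall back to
         * the slab layer instead of creating or filling the pool here. */
        if (pool != NULL && pool->nr_objs > 0)
            return pool->objs[--pool->nr_objs];

        return slab_alloc(c);
    }

    void cache_free(struct cache *c, void *obj)
    {
        struct cpu_pool *pool = c->cpu_pools[cpu_id()];

        /* Free path: the only place a pool gets created.  If that fails,
         * the object simply goes back to the slab layer, so there is no
         * recursion and no infinite loop. */
        if (pool == NULL) {
            pool = cpu_pool_create(c->cpu_pool_cache, c);
            if (pool == NULL) {
                slab_free(c, obj);
                return;
            }
            c->cpu_pools[cpu_id()] = pool;
        }

        if (pool->nr_objs < pool->size)
            pool->objs[pool->nr_objs++] = obj;
        else
            slab_free(c, obj);  /* simplified: real code drains in bulk */
    }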
- <braunr> mcsim: we just need a slab allocator with an interface close to - the zone allocator - <antrik> mcsim: IIRC the size of the complete zalloc map is fixed; but not - the number of elements per zone - <braunr> we don't need two allocators like kmalloc and vmalloc - <braunr> actually we just need vmalloc - <braunr> IIRC the limits are only present because the original developers - wanted to track leaks - <braunr> they assumed zones would be large enough, which isn't true any - more today - <braunr> but i didn't see any true reservation - <braunr> antrik: i'm not sure i was clear enough about the "allocation of - cpu caches on the free path" - <braunr> antrik: for a better explanation, read the vmem paper ;) - <antrik> braunr: you mean there is no fundamental reason why the zone map - has a limited maximal size; and it was only put in to catch cases where - something eats up all memory with kernel object creation?... - <antrik> braunr: I think I got it now :-) - <braunr> antrik: i'm pretty certin of it yes - <antrik> I don't see though how it is related to what we were talking - about... - <braunr> 10:55 < braunr> and the per processor part should be very close to - the phys allocator sitting next to it - <braunr> the phys allocator doesn't have to use this trick - <braunr> because pages have a fixed size, so per cpu caches all have the - same size too - <braunr> and the number of "caches", that is, physical segments, is limited - and known at compile time - <braunr> so having them statically allocated is possible - <antrik> I see - <braunr> it would actually be very difficult to have a phys allocator - requiring dynamic allocation when the dynamic allocator isn't yet ready - <antrik> hehe :-) - <mcsim> total size of all zone allocations is limited to 12 MB. And is "was - only put in to catch cases where something eats up all memory with kernel - object creation?" - <braunr> mcsim: ah right, there could be a kernel submap backing all the - zones - <braunr> but this can be increased too - <braunr> submaps are kind of evil :/ - <antrik> mcsim: I think it's actually 32 MiB or something like that in the - Debian version... - <antrik> braunr: I'm not sure I ever fully understood what the zalloc map - is... I looked through the code once, and I think I got a rough - understading, but I was still pretty uncertain about some bits. and I - don't remember the details anyways :-) - <braunr> antrik: IIRC, it's a kernel submap - <braunr> it's named kmem_map in x15 - <antrik> don't know what a submap is - <braunr> submaps are vm_map objects - <braunr> in a top vm_map, there are vm_map_entries - <braunr> these entries usually point to vm_objects - <braunr> (for the page cache) - <braunr> but they can point to other maps too - <braunr> the goal is to reduce fragmentation by isolating allocations - <braunr> this also helps reducing contention - <braunr> for exemple, on BSD, there is a submap for mbufs, so that the - network code doesn't interfere too much with other kernel allocations - <braunr> antrik: they are similar to spans in vmem, but vmem has an elegant - importing mechanism which eliminates the static limit problem - <antrik> so memory is not directly allocated from the physical allocator, - but instead from another map which in turn contains physical memory, or - something like that?... 
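For reference, the submap being discussed is created at boot in zone_init(); the call, quoted again further down in this log, reserves the virtual range between zone_min and zone_max inside kernel_map and returns a new vm_map covering only that range:

    /* gnumach, zone_init() -- as quoted later in this log.  The signature
     * is kmem_suballoc(parent, &min, &max, size, pageable). */
    zone_map = kmem_suballoc(kernel_map, &zone_min, &zone_max,
                             zone_map_size, FALSE);

    /* zalloc() then takes its backing pages from this submap, e.g.:
     * kmem_alloc_wired(zone_map, &addr, zone->alloc_size); */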
- <braunr> no, this is entirely virtual - <braunr> submaps are almost exclusively used for the kernel_map - <antrik> you are using a lot of identifies here, but I don't remember (or - never knew) what most of them mean :-( - <braunr> sorry :) - <braunr> the kernel map is the vm_map used to represent the ~1 GiB of - virtual memory the kernel has (on i386) - <braunr> vm_map objects are simple virtual space maps - <braunr> they contain what you see in linux when doing /proc/self/maps - <braunr> cat /proc/self/maps - <braunr> (linux uses entirely different names but it's roughly the same - structure) - <braunr> each line is a vm_map_entry - <braunr> (well, there aren't submaps in linux though) - <braunr> the pmap tool on netbsd is able to show the kernel map with its - submaps, but i don't have any image around - <mcsim> braunr: is limit for zones is feature and shouldn't be changed? - <braunr> mcsim: i think we shouldn't have fixed limits for zones - <braunr> mcsim: this should be part of the debugging facilities in the slab - allocator - <braunr> is this fixed limit really a major problem ? - <braunr> i mean, don't focus on that too much, there are other issues - requiring more attention - <antrik> braunr: at 12 MiB, it used to be, causing a lot of zalloc - panics. after increasing, I don't think it's much of a problem anymore... - <antrik> but as memory sizes grow, it might become one again - <antrik> that's the problem with a fixed size... - <braunr> yes, that's the issue with submaps - <braunr> but gnumach is full of those, so let's fix them by order of - priority - <antrik> well, I'm still trying to digest what you wrote about submaps :-) - <braunr> i'm downloading netbsd, so you can have a good view of all this - <antrik> so, when the kernel allocates virtual address space regions - (mostly for itself), instead of grabbing chunks of the address space - directly, it takes parts out of a pre-reserved region? - <braunr> not exactly - <braunr> both statements are true - <mcsim> antrik: only virtual addresses are reserved - <braunr> it grabs chunks of the address space directly, but does so in a - reserved region of the address space - <braunr> a submap is like a normal map, it has a start address, a size, and - is empty, then it's populated with vm_map_entries - <braunr> so instead of allocating from 3-4 GiB, you allocate from, say, - 3.1-3.2 GiB - <antrik> yeah, that's more or less what I meant... - <mcsim> braunr: I see two problems: limited zones and absence of caching. - <mcsim> with caching absence of readahead paging will be not so significant - <braunr> please avoid readahead - <mcsim> ok - <braunr> and it's not about paging, it's about kernel memory, which is - wired - <braunr> (well most of it) - <braunr> what about limited zones ? - <braunr> the whole kernel space is limited, there has to be limits - <braunr> the problem is how to handle them - <antrik> braunr: almost all. I looked through all zones once, and IIRC I - found exactly one that actually allows paging... - <braunr> currently, when you reach the limit, you have an OOM error - <braunr> antrik: yes, there are - <braunr> i don't remember which implementation does that but, when - processes haven't been active for a minute or so, they are "swapedout" - <braunr> completely - <braunr> even the kernel stack - <braunr> and the page tables - <braunr> (most of the pmap structures are destroyed, some are retained) - <antrik> that might very well be true... 
at least inactive processes often - show up with 0 memory use in top on Hurd - <braunr> this is done by having a pageable kernel map, with wired entries - <braunr> when the swapper thread swaps tasks out, it unwires them - <braunr> but i think modern implementations don't do that any more - <antrik> well, I was talking about zalloc only :-) - <braunr> oh - <braunr> so the zalloc_map must be pageable - <braunr> or there are two submaps ? - <antrik> not sure whether "morden implementations" includes Linux ;-) - <braunr> no, i'm talking about the bsd family only - <antrik> but it's certainly true that on Linux even inactive processes - retain some memory - <braunr> linux doesn't make any difference between processor-bound and - I/O-bound processes - <antrik> braunr: I have no idea how it works. I just remember that when - creating zones, one of the optional flags decides whether the zone is - pagable. but as I said, IIRC there is exactly one that actually is... - <braunr> zone_map = kmem_suballoc(kernel_map, &zone_min, &zone_max, - zone_map_size, FALSE); - <braunr> kmem_suballoc(parent, min, max, size, pageable) - <braunr> so the zone_map isn't - <antrik> IIRC my conclusion was that pagable zones do not count in the - fixed zone map limit... but I'm not sure anymore - <braunr> zinit() has a memtype parameter - <braunr> with ZONE_PAGEABLE as a possible flag - <braunr> this is wierd :) - <mcsim> There is no any zones which use ZONE_PAGEABLE flag - <antrik> mcsim: are you sure? I think I found one... - <braunr> if (zone->type & ZONE_PAGEABLE) { - <antrik> admittedly, it is several years ago that I looked into this, so my - memory is rather dim... - <braunr> if (kmem_alloc_pageable(zone_map, &addr, ... - <braunr> calling kmem_alloc_pageable() on an unpageable submap seems wrong - <mcsim> I've greped gnumach code and there is no any zinit procedure call - with ZONE_PAGEABLE flag - <braunr> good - <antrik> hm... perhaps it was in some code that has been removed - alltogether since ;-) - <antrik> actually I think it would be pretty neat to have pageable kernel - objects... but I guess it would require considerable effort to implement - this right - <braunr> mcsim: you also mentioned absence of caching - <braunr> mcsim: the zone allocator actually is a bare caching object - allocator - <braunr> antrik: no, it's easy - <braunr> antrik: i already had that in x15 0.1 - <braunr> antrik: the problem is being sure the objects you allocate from a - pageable backing store are never used when resolving a page fault - <braunr> that's all - <antrik> I wouldn't expect that to be easy... but surely you know better - :-) - <mcsim> braunr: indeed. I was wrong. - <antrik> braunr: what is a caching object allocator?... - <braunr> antrik: ok, it's not easy - <braunr> antrik: but once you have vm_objects implemented, having pageable - kernel object is just a matter of using the right options, really - <braunr> antrik: an allocator that caches its buffers - <braunr> some years ago, the term "object" would also apply to - preconstructed buffers - <antrik> I have no idea what you mean by "caches its buffers" here :-) - <braunr> well, a memory allocator which doesn't immediately free its - buffers caches them - <mcsim> braunr: but can it return objects to system? - <braunr> mcsim: which one ? - <antrik> yeah, obviously the *implementation* of pageable kernel objects is - not hard. the tricky part is deciding which objects can be pageable, and - which need to be wired... 
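Pieced together from the fragments quoted earlier in this exchange (so a reconstruction, not verbatim gnumach source), the suspicious path in zalloc() looks roughly like this; zone_map itself was created unpageable, and no zinit() call in the tree actually passes ZONE_PAGEABLE:

    /* Reconstruction from the quoted fragments; not verbatim source. */
    if (zone->type & ZONE_PAGEABLE) {
        /* dead in practice: no zone is created with ZONE_PAGEABLE, and
         * calling this on the unpageable zone_map looks wrong anyway */
        if (kmem_alloc_pageable(zone_map, &addr, zone->alloc_size)
            != KERN_SUCCESS)
            /* ... error handling ... */;
    } else {
        if (kmem_alloc_wired(zone_map, &addr, zone->alloc_size)
            != KERN_SUCCESS)
            /* ... error handling ... */;
    }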
- <mcsim> Can zone allocator return cached objects to system as in slab? - <mcsim> I mean reap() - <braunr> well yes, it does so, and it does that too often - <braunr> the caching in the zone allocator is actually limited to the - pagesize - <braunr> once page is completely free, it is returned to the vm - <mcsim> this is bad caching - <braunr> yes - <mcsim> if object takes all page than there is now caching at all - <braunr> caching by side effect - <braunr> true - <braunr> but the linux slab allocator does the same thing :p - <braunr> hm - <braunr> no, the solaris slab allocator does so - <mcsim> linux's slab returns objects only when system ask - <antrik> without preconstructed objects, is there actually any point in - caching empty slabs?... - <mcsim> Once I've changed my allocator to slab and it cached more than 1GB - of my memory) - <braunr> ok wait, need to fix a few mistakes first - <mcsim> s/ask/asks - <braunr> the zone allocator (in gnumach) actually has a garbage collector - <antrik> braunr: well, the Solaris allocator follows the slab/magazine - paper, right? so there is caching at the magazine layer... in that case - caching empty slabs too would be rather redundant I'd say... - <braunr> which is called when running low on memory, similar to the slab - allocaotr - <braunr> antrik: yes - <antrik> (or rather the paper follows the Solaris allocator ;-) ) - <braunr> mcsim: the zone allocator reap() is zone_gc() - <antrik> braunr: hm, right, there is a "collectable" flag for zones... but - I never understood what it means - <antrik> braunr: BTW, I heard Linux has yet another allocator now called - "slob"... do you happen to know what that is? - <braunr> slob is a very simple allocator for embedded devices - <mcsim> AFAIR this is just heap allocator - <braunr> useful when you have a very low amount of memory - <braunr> like 1 MiB - <braunr> yes - <antrik> just googled it :-) - <braunr> zone and slab are very similar - <antrik> sounds like a simple heap allocator - <mcsim> there is another allocator that calls slub, and it better than slab - in many cases - <braunr> the main difference is the data structures used to store slabs - <braunr> mcsim: i disagree - <antrik> mcsim: ah, you already said that :-) - <braunr> mcsim: slub is better for systems with very large amounts of - memory and processors - <braunr> otherwise, slab is better - <braunr> in addition, there are accounting issues with slub - <braunr> because of cache merging - <mcsim> ok. This strange that slub is default allocator - <braunr> well both are very good - <braunr> iirc, linus stated that he really doesn't care as long as its - works fine - <braunr> he refused slqb because of that - <braunr> slub is nice because it requires less memory than slab, while - still being as fast for most cases - <braunr> it gets slower on the free path, when the cpu performing the free - is different from the one which allocated the object - <braunr> that's a reasonable cost - <mcsim> slub uses heap for large object. Are there any tests that compare - what is better for large objects? - <antrik> well, if slub requires less memory, why do you think slab is - better for smaller systems? :-) - <braunr> antrik: smaller is relative - <antrik> mcsim: for large objects slab allocation is rather pointless, as - you don't have multiple objects in a page anyways... 
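Both zone_gc() and the slab allocator's reap follow the same general pattern, which comes up again in the 2011-04-19 excerpt below: one pass under the cache lock to detach the fully free slabs onto a private list, and a second pass, with the lock released, to hand their pages back to the VM. A schematic sketch, with invented names and types:

    /* Two-phase reclaim sketch; all names invented.  Phase 1 detaches the
     * fully free slabs while holding the cache lock, phase 2 returns
     * their pages to the VM with the lock released, to limit contention
     * and avoid deadlocks with the page-out path. */
    #include <stddef.h>

    struct lock;                                /* assumed lock type */
    void lock_acquire(struct lock *l);          /* assumed */
    void lock_release(struct lock *l);          /* assumed */
    void page_free(void *pages, size_t size);   /* assumed VM hook */

    struct slab {
        struct slab *next;
        void *pages;
    };

    struct object_cache {
        struct lock *lock;
        struct slab *free_slabs;    /* slabs whose objects are all free */
        size_t slab_size;
    };

    void object_cache_reap(struct object_cache *cache)
    {
        struct slab *slab, *next, *dead;

        /* Phase 1: unlink the free slabs onto a private list. */
        lock_acquire(cache->lock);
        dead = cache->free_slabs;
        cache->free_slabs = NULL;
        lock_release(cache->lock);

        /* Phase 2: give the pages back, without holding the lock. */
        for (slab = dead; slab != NULL; slab = next) {
            next = slab->next;
            page_free(slab->pages, cache->slab_size);
        }
    }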
- <braunr> antrik: when lameter wrote slub, it was intended for systems with - several hundreds processors - <antrik> BTW, was slqb really refused only because the other ones are "good - enough"?... - <braunr> yes - <antrik> wow, that's a strange argument... - <braunr> linus is already unhappy of having "so many" allocators - <antrik> well, if the new one is better, it could replace one of the others - :-) - <antrik> or is it useful only in certain cases? - <braunr> that's the problem - <braunr> nobody really knows - <antrik> hm, OK... I guess that should be tested *before* merging ;-) - <antrik> is anyone still working on it, or was it abandonned? - <antrik> mcsim: back to caching... - <antrik> what does caching in the kernel object allocator got to do with - readahead (i.e. clustered paging)?... - <mcsim> if we cached some physical pages we don't need to find new ones for - allocating new object. And that's why there will not be a page fault. - <mcsim> antrik: Regarding kam. Hasn't he finished his project? - <antrik> err... what? - <antrik> one of us must be seriously confused - <antrik> I totally fail to see what caching of physical pages (which isn't - even really a correct description of what slab does) has to do with page - faults - <antrik> right, KAM didn't finish his project - <mcsim> If we free the physical page and return it to system we need - another one for next allocation. But if we keep it, we don't need to find - new physical page. - <mcsim> And physical page is allocated only then when page fault - occurs. Probably, I'm wrong - <antrik> what does "return to system" mean? we are talking about the - kernel... - <antrik> zalloc/slab are about allocating kernel objects. this doesn't have - *anything* to do with paging of userspace processes - <antrik> only thing the have in common is that they need to get pages from - the physical page allocator. but that's yet another topic - <mcsim> Under "return to system" I mean ability to use this page for other - needs. - <braunr> mcsim: consider kernel memory to be wired - <braunr> here, return to system means releasing a page back to the vm - system - <braunr> the vm_kmem module then unmaps the physical page and free its - virtual address in the kernel map - <mcsim> ok - <braunr> antrik: the problem with new allocators like slqb is that it's - very difficult to really know if they're better, even with extensive - testing - <braunr> antrik: there are papers (like wilson95) about the difficulties in - making valuable results in this field - <braunr> see - http://www.sceen.net/~rbraun/dynamic_storage_allocation_a_survey_and_critical_review.pdf - <mcsim> how can be allocated physically continuous object now? - <braunr> mcsim: rephrase please - <mcsim> what is similar to kmalloc in Linux to gnumach? - <braunr> i know memory is reserved for dma in a direct virtual to physical - mapping - <braunr> so even if the allocation is done similarly to vmalloc() - <braunr> the selected region of virtual space maps physical memory, so - memory is physically contiguous too - <braunr> for other allocation types, a block large enough is allocated, so - it's contiguous too - <mcsim> I don't clearly understand. If we have fragmentation in physical - ram, so there aren't 2 free pages in a row, but there are able apart, we - can't to allocate these 2 pages along? 
- <braunr> no - <braunr> but every system has this problem - <mcsim> But since we have only 12 or 32 MB of memory the problem becomes - more significant - <braunr> you're confusing virtual and physical memory - <braunr> those 32 MiB are virtual - <braunr> the physical pages backing them don't have to be contiguous - <mcsim> Oh, indeed - <mcsim> So the only problem are limits? - <braunr> and performance - <braunr> and correctness - <braunr> i find the zone allocator badly written - <braunr> antrik: mcsim: here is the content of the kernel pmap on NetBSD - (which uses a virtual memory system close to the Mach VM) - <braunr> antrik: mcsim: http://www.sceen.net/~rbraun/pmap.out - -[[pmap.out]] - - <braunr> you can see the kmem_map (which is used for most general kernel - allocations) is 128 MiB large - <braunr> actually it's not the kernel pmap, it's the kernel_map - <antrik> braunr: why is it called pmap.out then? ;-) - <braunr> antrik: because the tool is named pmap - <braunr> for process map - <braunr> it also exists under Linux, although direct access to - /proc/xx/maps gives more info - <mcsim> braunr: I've said that this is kernel_map. Can I see kernel_map for - Linux? - <braunr> mcsim: I don't know how to do that - <mcsim> s/I've/You've - <braunr> but Linux doesn't have submaps, and uses a direct virtual to - physical mapping, so it's used differently - <antrik> how are things (such as zalloc zones) entered into kernel_map? - <braunr> in zone_init() you have - <braunr> zone_map = kmem_suballoc(kernel_map, &zone_min, &zone_max, - zone_map_size, FALSE); - <braunr> so here, kmem_map is named zone_map - <braunr> then, in zalloc() - <braunr> kmem_alloc_wired(zone_map, &addr, zone->alloc_size) - <antrik> so, kmem_alloc just deals out chunks of memory referenced directly - by the address, and without knowing anything about the use? - <braunr> kmem_alloc() gives virtual pages - <braunr> zalloc() carves them into buffers, as in the slab allocator - <braunr> the difference is essentially the lack of formal "slab" object - <braunr> which makes the zone code look like a mess - <antrik> so kmem_suballoc() essentially just takes a bunch of pages from - the main kernel_map, and uses these to back another map which then in - turn deals out pages just like the main kernel_map? - <braunr> no - <braunr> kmem_suballoc creates a vm_map_entry object, and sets its start - and end address - <braunr> and creates a vm_map object, which is then inserted in the new - entry - <braunr> maybe that's what you meant with "essentially just takes a bunch - of pages from the main kernel_map" - <braunr> but there really is no allocation at this point - <braunr> except the map entry and the new map objects - <antrik> well, I'm trying to understand how kmem_alloc() manages things. so - it has map_entry structures like the maps of userspace processes? do - these also reference actual memory objects? 
- <braunr> kmem_alloc just allocates virtual pages from a vm_map, and backs - those with physical pages (unless the user requested pageable memory) - <braunr> it's not "like the maps of userspace processes" - <braunr> these are actually the same structures - <braunr> a vm_map_entry can reference a memory object or a kernel submap - <braunr> in netbsd, it can also referernce nothing (for pure wired kernel - memory like the vm_page array) - <braunr> maybe it's the same in mach, i don't remember exactly - <braunr> antrik: this is actually very clear in vm/vm_kern.c - <braunr> kmem_alloc() creates a new kernel object for the allocation - <braunr> allocates a new entry (or uses a previous existing one if it can - be extended) through vm_map_find_entry() - <braunr> then calls kmem_alloc_pages() to back it with wired memory - <antrik> "creates a new kernel object" -- what kind of kernel object? - <braunr> kmem_alloc_wired() does roughly the same thing, except it doesn't - need a new kernel object because it knows the new area won't be pageable - <braunr> a simple vm_object - <braunr> used as a container for anonymous memory in case the pages are - swapped out - <antrik> vm_object is the same as memory object/pager? or yet something - different? - <braunr> antrik: almost - <braunr> antrik: a memory_object is the user view of a vm_object - <braunr> as in the kernel/user interfaces used by external pagers - <braunr> vm_object is a more internal name - <mcsim> Is fragmentation a big problem in slab allocator? - <mcsim> I've tested it on my computer in Linux and for some caches it - reached 30-40% - <antrik> well, fragmentation is a major problem for any allocator... - <antrik> the original slab allocator was design specifically with the goal - of reducing fragmentation - <antrik> the revised version with the addition of magazines takes a step - back on this though - <antrik> have you compared it to slub? would be pretty interesting... - <mcsim> I have an idea how can it be decreased, but it will hurt by - performance... - <mcsim> antrik: no I haven't, but there will be might the same, I think - <mcsim> if each cache will handle two types of object: with sizes that will - fit cache sizes (or I bit smaller) and with sizes which are much smaller - than maximal cache size. For first type of object will be used standard - slab allocator and for latter type will be used (within page) heap - allocator. - <mcsim> I think that than fragmentation will be decreased - <antrik> not at all. heap allocator has much worse fragmentation. that's - why slab allocator was invented - <antrik> the problem is that in a long-running program (such an the - kernel), objects tend to have vastly varying lifespans - <mcsim> but we use heap only for objects of specified sizes - <antrik> so often a few old objects will keep a whole page hostage - <mcsim> for example for 32 byte cache it could be 20-28 byte objects - <antrik> that's particularily visible in programs such as firefox, which - will grow the heap during use even though actual needs don't change - <antrik> the slab allocator groups objects in a fashion that makes it more - likely adjacent objects will be freed at similar times - <antrik> well, that's pretty oversimplyfied, but I hope you get the - idea... it's about locality - <mcsim> I agree, but I speak not about general heap allocation. We have - many heaps for objects with different sizes. - <mcsim> Could it be better? - <antrik> note that this has been a topic of considerable research. 
you - shouldn't seek to improve the actual algorithms -- you would have to read - up on the existing research at least before you can contribute anything - to the field :-) - <antrik> how would that be different from the slab allocator? - <mcsim> slab will allocate 32 byte for both 20 and 32 byte requests - <mcsim> And if there was request for 20 bytes we get 12 unused - <antrik> oh, you mean the implementation of the generic allocator on top of - slabs? well, that might not be optimal... but it's not an often used case - anyways. mostly the kernel uses constant-sized objects, which get their - own caches with custom tailored size - <antrik> I don't think the waste here matters at all - <mcsim> affirmative. So my idea is useless. - <antrik> does the statistic you refer to show the fragmentation in absolute - sizes too? - <mcsim> Can you explain what is absolute size? - <mcsim> I've counted what were requested (as parameter of kmalloc) and what - was really allocated (according to best fit cache size). - <antrik> how did you get that information? - <mcsim> I simply wrote a hook - <antrik> I mean total. i.e. how many KiB or MiB are wasted due to - fragmentation alltogether - <antrik> ah, interesting. how does it work? - <antrik> BTW, did you read the slab papers? - <mcsim> Do you mean articles from lwn.net? - <antrik> no - <antrik> I mean the papers from the Sun hackers who invented the slab - allocator(s) - <antrik> Bonwick mostly IIRC - <mcsim> Yes - <antrik> hm... then you really should know the rationale behind it... - <mcsim> There he says about 11% percent of memory waste - <antrik> you didn't answer my other questions BTW :-) - <mcsim> I've corrupted kernel tree with patch, and tomorrow I'm going to - read myself up for exam (I have it on Thursday). But than I'll send you a - module which I've used for testing. - <antrik> OK - <mcsim> I can send you module now, but it will not work without patch. - <mcsim> It would be better to rewrite it using debugfs, but when I was - writing this test I didn't know about trace_* macros - - -# IRC, freenode, #hurd, 2011-04-15 - - <mcsim> There is a hack in zone_gc when it allocates and frees two - vm_map_kentry_zone elements to make sure the gc will be able to allocate - two in vm_map_delete. Isn't it better to allocate memory for these - entries statically? - <youpi> mcsim: that's not the point of the hack - <youpi> mcsim: the point of the hack is to make sure vm_map_delete will be - able to allocate stuff - <youpi> allocating them statically will just work once - <youpi> it may happen several times that vm_map_delete needs to allocate it - while it's empty (and thus zget_space has to get called, leading to a - hang) - <youpi> funnily enough, the bug is also in macos X - <youpi> it's still in my TODO list to manage to find how to submit the - issue to them - <braunr> really ? - <braunr> eh - <braunr> is that because of map entry splitting ? - <youpi> it's git commit efc3d9c47cd744c316a8521c9a29fa274b507d26 - <youpi> braunr: iirc something like this, yes - <braunr> netbsd has this issue too - <youpi> possibly - <braunr> i think it's a fundamental problem with the design - <braunr> people think of munmap() as something similar to free() - <braunr> whereas it's really unmap - <braunr> with a BSD-like VM, unmap can easily end up splitting one entry in - two - <braunr> but your issue is more about harmful recursion right ? 
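The hack discussed here amounts to "touch the zone before you depend on it": before entering a path that must not fail (vm_map_delete(), which may need up to two fresh kernel map entries when unmapping splits an entry), allocate and immediately free the worst-case number of elements so the zone is known to have grown while growing it is still safe. A schematic illustration only, with types approximated; this is not the actual zone_gc code:

    /* Illustration: make sure vm_map_kentry_zone can satisfy the two
     * entries vm_map_delete() may need, at a point where growing the
     * zone is still safe. */
    void reserve_kentries_before_gc(void)
    {
        vm_offset_t e1, e2;

        e1 = zalloc(vm_map_kentry_zone);
        e2 = zalloc(vm_map_kentry_zone);
        zfree(vm_map_kentry_zone, e1);
        zfree(vm_map_kentry_zone, e2);

        /* ... vm_map_delete() can now run without hitting zget_space() ... */
    }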
- <youpi> I don't remember actually - <youpi> it's quite some time ago :) - <braunr> ok - <braunr> i think that's why i have "sources" in my slab allocator, the - default source (vm_kern) and a custom one for kernel map entries - - -# IRC, freenode, #hurd, 2011-04-18 - - <mcsim> braunr: you've said that once page is completely free, it is - returned to the vm. - <mcsim> who else, besides zone_gc, can return free pages to the vm? - <braunr> mcsim: i also said i was wrong about that - <braunr> zone_gc is the only one - - -# IRC, freenode, #hurd, 2011-04-19 - - <braunr> antrik: mcsim: i added back a new per-cpu layer as planned - <braunr> - http://git.sceen.net/rbraun/libbraunr.git/?a=blob;f=mem.c;h=c629b2b9b149f118a30f0129bd8b7526b0302c22;hb=HEAD - <braunr> mcsim: btw, in mem_cache_reap(), you can clearly see there are two - loops, just as in zone_gc, to reduce contention and avoid deadlocks - <braunr> this is really common in memory allocators - - -# IRC, freenode, #hurd, 2011-04-23 - - <mcsim> I've looked through some allocators and all of them use different - per cpu cache policy. AFAIK gnuhurd doesn't support multiprocessing, but - still multiprocessing must be kept in mind. So, what do you think what - kind of cpu caches is better? As for me I like variant with only per-cpu - caches (like in slqb). - <antrik> mcsim: well, have you looked at the allocator braunr wrote - himself? :-) - <antrik> I'm not sure I suggested that explicitly to you; but probably it - makes most sense to use that in gnumach - - -# IRC, freenode, #hurd, 2011-04-24 - - <mcsim> antrik: Yes, I have. He uses both global and per cpu caches. But he - also suggested to look through slqb, where there are only per cpu - caches.\ - <braunr> i don't remember slqb in detail - <braunr> what do you mean by "only per-cpu caches" ? - <braunr> a whole slab sytem for each cpu ? - <mcsim> I mean that there are no global queues in caches, but there are - special queues for each cpu. - <mcsim> I've just started investigating slqb's code, but I've read an - article on lwn about it. And I've read that it is used for zen kernel. - <braunr> zen ? - <mcsim> Here is this article http://lwn.net/Articles/311502/ - <mcsim> Yes, this is linux kernel with some patches which haven't been - approved to torvald's tree - <mcsim> http://zen-kernel.org/ - <braunr> i see - <braunr> well it looks nice - <braunr> but as for slub, the problem i can see is cross-CPU freeing - <braunr> and I think nick piggins mentions it - <braunr> piggin* - <braunr> this means that sometimes, objects are "burst-free" from one cpu - cache to another - <braunr> which has the same bad effects as in most other allocators, mainly - fragmentation - <mcsim> There is a special list for freeing object allocated for another - CPU - <mcsim> And garbage collector frees such object on his own - <braunr> so what's your question ? - <mcsim> It is described in the end of article. - <mcsim> What cpu-cache policy do you think is better to implement? 
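The slqb layout mcsim describes boils down to something like the following data structure (names invented for the sketch): every CPU owns its queues, and an object freed on a CPU other than its home CPU goes onto a separate "remote free" list that is drained later by the owning CPU's garbage collection, rather than onto the local queue.

    /* Schematic per-CPU queues in the slqb style described above.
     * Invented names; locking and draining details omitted. */
    #include <stddef.h>

    #define NR_CPUS 4   /* illustrative */

    struct remote_free_list {
        void **objs;
        int nr_objs;        /* drained periodically by the owning CPU */
    };

    struct percpu_queues {
        void **free_objs;   /* objects freed locally, reused locally */
        int nr_free;
        struct remote_free_list remote; /* freed here, owned elsewhere */
    };

    struct slqb_cache {
        struct percpu_queues *cpu[NR_CPUS]; /* no global queue at all */
        size_t obj_size;
    };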
- <braunr> at this point, any - <braunr> and even if we had a kernel that perfectly supports - multiprocessor, I wouldn't care much now - <braunr> it's very hard to evaluate such allocators - <braunr> slqb looks nice, but if you have the same amount of fragmentation - per slab as other allocators do (which is likely), you have tat amount of - fragmentation multiplied by the number of processors - <braunr> whereas having shared queues limit the problem somehow - <braunr> having shared queues mean you have a bit more contention - <braunr> so, as is the case most of the time, it's a tradeoff - <braunr> by the way, does pigging say why he "doesn't like" slub ? :) - <braunr> piggin* - <mcsim> http://lwn.net/Articles/311093/ - <mcsim> here he describes what slqb is better. - <braunr> well it doesn't describe why slub is worse - <mcsim> but not very particularly - <braunr> except for order-0 allocations - <braunr> and that's a form of fragmentation like i mentioned above - <braunr> in mach those problems have very different impacts - <braunr> the backend memory isn't physical, it's the kernel virtual space - <braunr> so the kernel allocator can request chunks of higher than order-0 - pages - <braunr> physical pages are allocated one at a time, then mapped in the - kernel space - <mcsim> Doesn't order of page depend on buffer size? - <braunr> it does - <mcsim> And why does gnumach allocates higher than order-0 pages more? - <braunr> why more ? - <braunr> i didn't say more - <mcsim> And why in mach those problems have very different impact? - <braunr> ? - <braunr> i've just explained why :) - <braunr> 09:37 < braunr> physical pages are allocated one at a time, then - mapped in the kernel space - <braunr> "one at a time" means order-0 pages, even if you allocate higher - than order-0 chunks - <mcsim> And in Linux they allocated more than one at time because of - prefetching page reading? - <braunr> do you understand what virtual memory is ? - <braunr> linux allocators allocate "physical memory" - <braunr> mach kernel allocator allocates "virtual memory" - <braunr> so even if you allocate a big chunk of virtual memory, it's backed - by order-0 physical pages - <mcsim> yes, I understand this - <braunr> you don't seem to :/ - <braunr> the problem of higher than order-0 page allocations is - fragmentation - <braunr> do you see why ? - <mcsim> yes - <braunr> so - <braunr> fragmentation in the kernel space is less likely to create issues - than it does in physical memory - <braunr> keep in mind physical memory is almost always full because of the - page cache - <braunr> and constantly under some pressure - <braunr> whereas the kernel space is mostly empty - <braunr> so allocating higher then order-0 pages in linux is more dangerous - than it is in Mach or BSD - <mcsim> ok - <braunr> on the other hand, linux focuses pure performance, and not having - to map memory means less operations, less tlb misses, quicker allocations - <braunr> the Mach VM must map pages "one at a time", which can be expensive - <braunr> it should be adapted to handle multiple page sizes (e.g. 2 MiB) so - that many allocations can be made with few mappings - <braunr> but that's not easy - <braunr> as always: tradeoffs - <mcsim> There are other benefits of physical allocating. In big DMA - transfers can be needed few continuous physical pages. How does mach - handles such cases? 
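What braunr describes above ("physical pages are allocated one at a time, then mapped in the kernel space") can be pictured with the following invented pseudo-kernel code; these are not Mach signatures, just an illustration that a virtually contiguous kernel allocation needs no physically contiguous pages.

    /* Illustration only: a contiguous *virtual* range backed by scattered
     * order-0 physical pages.  All helpers are assumed, not Mach APIs. */
    #define PAGE_SIZE 4096UL

    unsigned long reserve_kernel_va(unsigned long size);        /* assumed */
    unsigned long alloc_phys_page(void);                        /* assumed */
    void map_one_page(unsigned long va, unsigned long pa);      /* assumed */

    unsigned long alloc_virtually_contiguous(unsigned long size)
    {
        unsigned long addr, va;

        /* 1. reserve a contiguous range of virtual addresses */
        addr = reserve_kernel_va(size);

        /* 2. back it one order-0 physical page at a time; the physical
         *    pages can come from anywhere, so physical fragmentation
         *    does not matter for this kind of allocation */
        for (va = addr; va < addr + size; va += PAGE_SIZE)
            map_one_page(va, alloc_phys_page());

        return addr;
    }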
- <braunr> gnumach does that awfully - <braunr> it just reserves the whole DMA-able memory and uses special - allocation functions on it, IIRC - <braunr> but kernels which have a MAch VM like memory sytem such as BSDs - have cleaner methods - <braunr> NetBSD provides a function to allocate contiguous physical memory - <braunr> with many constraints - <braunr> FreeBSD uses a binary buddy system like Linux - <braunr> the fact that the kernel allocator uses virtual memory doesn't - mean the kernel has no mean to allocate contiguous physical memory ... - - -# IRC, freenode, #hurd, 2011-05-02 - - <braunr> hm nice, my allocator uses less memory than glibc (squeeze - version) on both 32 and 64 bits systems - <braunr> the new per-cpu layer is proving effective - <neal> braunr: Are you reimplementation malloc? - <braunr> no - <braunr> it's still the slab allocator for mach, but tested in userspace - <braunr> so i wrote malloc wrappers - <neal> Oh. - <braunr> i try to heavily test most of my code in userspace now - <neal> it's easier :-) - <neal> I agree - <braunr> even the physical memory allocator has been implemented this way - <neal> is this your mach version? - <braunr> virtual memory allocation will follow - <neal> or are you working on gnu mach? - <braunr> for now it's my version - <braunr> but i intend to spend the summer working on ipc port names - management - -[[rework_gnumach_IPC_spaces]]. - - <braunr> and integrate the result in gnu mach - <neal> are you keeping the same user-space API? - <neal> Or are you experimenting with something new? - <antrik> braunr: to be fair, it's not terribly hard to use less memory than - glibc :-) - <braunr> yes - <braunr> antrik: well ptmalloc3 received some nice improvements - <braunr> neal: the goal is to rework some of the internals only - <braunr> neal: namely, i simply intend to replace the splay tree with a - radix tree - <antrik> braunr: the glibc allocator is emphasising performace, unlike some - other allocators that trade some performance for much better memory - utilisation... - <antrik> ptmalloc3? - <braunr> that's the allocator used in glibc - <braunr> http://www.malloc.de/en/ - <antrik> OK. haven't seen any recent numbers... the comparision I have in - mind is many years old... - <braunr> i also made some additions to my avl and red-black trees this week - end, which finally make them suitable for almost all generic uses - <braunr> the red-black tree could be used in e.g. gnu mach to augment the - linked list used in vm maps - <braunr> which is what's done in most modern systems - <braunr> it could also be used to drop the overloaded (and probably over - imbalanced) page cache hash table - -[[gnumach_vm_map_red-black_trees]]. - - -# IRC, freenode, #hurd, 2011-05-03 - - <mcsim> antrik: How should I start porting? Have I just include rbraun's - allocator to gnumach and make it compile? - <antrik> mcsim: well, basically yes I guess... 
but you will have to look at - the code in question first before we know anything more specific :-) - <antrik> I guess braunr might know better how to start, but he doesn't - appear to be here :-( - <braunr> mcsim: you can't juste put my code into gnu mach and make it run, - it really requires a few careful changes - <braunr> mcsim: you will have to analyse how the current zone allocator - interacts with regard to locking - <braunr> if it is used in interrupt handlers - <braunr> what kind of locks it should use instead of the pthread stuff - available in userspace - <braunr> you will have to change the reclamiing policy, so that caches are - reaped on demand - <braunr> (this basically boils down to calling the new reclaiming function - instead of zone_gc()) - <braunr> you must be careful about types too - <braunr> there is work to be done ;) - <braunr> (not to mention the obvious about replacing all the calls to the - zone allocator, and testing/debugging afterwards) - - -# IRC, freenode, #hurd, 2011-07-14 - - <braunr> can you make your patch available ? - <mcsim> it is available in gnumach repository at savannah - <mcsim> tree mplaneta/libbraunr/master - <braunr> mcsim: i'll test your branch - <mcsim> ok. I'll give you a link in a minute - <braunr> hm why balloc ? - <mcsim> Braun's allocator - <braunr> err - <braunr> - http://git.sceen.net/rbraun/x15mach.git/?a=blob;f=kern/kmem.c;h=37173fa0b48fc9d7e177bf93de531819210159ab;hb=HEAD - <braunr> mcsim: this is the interface i had in mind for a kernel version :) - <braunr> very similar to the original slab allocator interface actually - <braunr> well, you've been working - <mcsim> But I have a problem with this patch. When I apply it to gnumach - code from debian repository. I have to make a change in file ramdisk.c - with sed -i 's/kernel_map/\&kernel_map/' device/ramdisk.c - <mcsim> because in git repository there is no such file - <braunr> mcsim: how do you configure the kernel before building ? - <braunr> mcsim: you should keep in touch more often i think, so that you - get feedback from us and don't spend too much time "off course" - <mcsim> I didn't configure it. I just run dpkg-buildsource -b. - <braunr> oh you build the debian package - <braunr> well my version was by configure --enable-kdb --enable-rtl8139 - <braunr> and it seems stuck in an infinite loop during bootstrap - <mcsim> and printf doesn't work. The first function called by c_boot_entry - is printf(version). - <braunr> mcsim: also, you're invited to get the x15mach version of my - files, which are gplv2+ licensed - <braunr> be careful of my macros.h file, it can conflict with the - macros_help.h file from gnumach iirc - <mcsim> There were conflicts with MACRO_BEGIN and MACRO_END. But I solved - it - <braunr> ok - <braunr> it's tricky - <braunr> mcsim: try to find where the first use of the allocator is made - - -# IRC, freenode, #hurd, 2011-07-22 - - <mcsim> braunr, hello. Kernel with your allocator already compiles and - runs. There still some problems, but, certainly, I'm on the final stage - already. I hope I'll finish in a few days. - <tschwinge> mcsim: Oh, cool! Have you done some measurements already? - <mcsim> Not yet - <tschwinge> OK. - <tschwinge> But if it able to run a GNU/Hurd system, then that already is - something, a big milestone! - <braunr> nice - <braunr> although you'll probably need to tweak the garbage collecting - process - <mcsim> tschwinge: thanks - <mcsim> braunr: As back-end for allocating memory I use - kmem_alloc_wired. 
But in zalloc was an opportunity to use as back-end - kmem_alloc_pageable. Although there was no any zone that used - kmem_alloc_pageable. Do I need to implement this functionality? - <braunr> mcsim: do *not* use kmem_alloc_pageable() - <mcsim> braunr: Ok. This is even better) - <braunr> mcsim: in x15, i've taken this even further: there is *no* kernel - vm object, which means all kernel memory is wired and unmanaged - <braunr> making it fast and safe - <braunr> pageable kernel memory was useful back when RAM was really scarce - <braunr> 20 years ago - <braunr> but it's a source of deadlock - <mcsim> Indeed. I'll won't use kmem_alloc_pageable. - - -# IRC, freenode, #hurd, 2011-08-09 - - < braunr> mcsim: what's the "bug related to MEM_CF_VERIFY" you refer to in - one of your commits ? - < braunr> mcsim: don't use spin_lock_t as a member of another structure - < mcsim> braunr: I confused with types in *_verify functions, so they - didn't work. Than I fixed it in the commit you mentioned. - < braunr> in gnumach, most types are actually structure pointers - < braunr> use simple_lock_data_t - < braunr> mcsim: ok - < mcsim> > use simple_lock_data_t - < mcsim> braunr: ok - < braunr> mcsim: don't make too many changes to the code base, and if - you're unsure, don't hesitate to ask - < braunr> also, i really insist you rename the allocator, as done in x15 - for example - (http://git.sceen.net/rbraun/x15mach.git/?a=blob;f=vm/kmem.c), instead of - a name based on mine :/ - < mcsim> braunr: Ok. It was just work name. When I finish I'll rename the - allocator. - < braunr> other than that, it's nice to see progress - < braunr> although again, it would be better with some reports along - < braunr> i won't be present at the meeting tomorrow unfortunately, but you - should use those to report the status of your work - < mcsim> braunr: You've said that I have to tweak gc process. Did you mean - to call mem_gc() when physical memory ends instead of calling it every x - seconds? Or something else? - < braunr> there are multiple topics, alhtough only one that really matters - < braunr> study how zone_gc was called - < braunr> reclaiming memory should happen when there is pressure on the VM - subsystem - < braunr> but it shouldn't happen too ofte, otherwise there is trashing - < braunr> and your caches become mostly useless - < braunr> the original slab allocator uses a 15-second period after a - reclaim during which reclaiming has no effect - < braunr> this allows having a somehow stable working set for this duration - < braunr> the linux slab allocator uses 5 seconds, but has a more - complicated reclaiming mechanism - < braunr> it releases memory gradually, and from reclaimable caches only - (dentry for example) - < braunr> for x15 i intend to implement the original 15 second interval and - then perform full reclaims - < mcsim> In zalloc mem_gc is called by vm_pageout_scan, but not often than - once a second. - < mcsim> In balloc I've changed interval to once in 15 seconds. - < braunr> don't use the code as it is - < braunr> the version you've based your work on was meant for userspace - < braunr> where there isn't memory pressure - < braunr> so a timer is used to trigger reclaims at regular intervals - < braunr> it's different in a kernel - < braunr> mcsim: where did you see vm_pageout_scan call the zone gc once a - second ? - < mcsim> vm_pageout_scan calls consider_zone_gc and consider_zone_gc checks - if second is passed. - < braunr> where ? - < mcsim> Than zone_gc can be called. 
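The rate limiting discussed here (consider_zone_gc() running at most about once a second from the pageout path, the original slab paper using a 15-second window) is just a timestamp check in front of the actual reclaim. A schematic version with invented names; HZ, current_ticks() and the interval value are assumptions for the sketch, mem_gc() stands for the reclaim entry point mentioned above:

    /* Schematic reclaim throttle.  Called from the page-out path under
     * memory pressure; the real work only happens if enough time has
     * passed since the last garbage collection. */
    #define GC_INTERVAL_TICKS (15 * HZ)   /* 15 s, as in the slab paper */

    static unsigned long last_gc_ticks;

    void consider_mem_gc(void)
    {
        unsigned long now = current_ticks();    /* assumed clock source */

        if (now - last_gc_ticks < GC_INTERVAL_TICKS)
            return;                             /* too soon, keep caches warm */

        last_gc_ticks = now;
        mem_gc();                               /* walk caches, reap free slabs */
    }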
- < braunr> ah ok, it's in zaclloc.c then - < braunr> zalloc.c - < braunr> yes this function is fine - < mcsim> so old gc didn't consider vm pressure. Or I missed something. - < braunr> it did - < mcsim> how? - < braunr> well, it's called by the pageout daemon - < braunr> under memory pressure - < braunr> so it's fine - < mcsim> so if mem_gc is called by pageout daemon is it fine? - < braunr> it must be changed to do something similar to what - consider_zone_gc does - < mcsim> It does. mem_gc does the same work as consider_zone_gc and - zone_gc. - < braunr> good - < mcsim> so gc process is fine? - < braunr> should be - < braunr> i see mem.c only includes mem.h, which then includes other - headers - < braunr> don't do that - < braunr> always include all the headers you need where you need them - < braunr> if you need avltree.h in both mem.c and mem.h, include it in both - files - < braunr> and by the way, i recommend you use the red black tree instead of - the avl type - < braunr> (it's the same interface so it shouldn't take long) - < mcsim> As to report. If you won't be present at the meeting, I can tell - you what I have to do now. - < braunr> sure - < braunr> in addition, use GPLv2 as the license, teh BSD one is meant for - the userspace version only - < braunr> GPLv2+ actually - < braunr> hm you don't need list.c - < braunr> it would only add dead code - < braunr> "Zone for dynamical allocator", don't mix terms - < braunr> this comment refers to a vm_map, so call it a map - < mcsim> 1. Change constructor for kentry_alloc_cache. - < mcsim> 2. Make measurements. - < mcsim> + - < mcsim> 3. Use simple_lock_data_t - < mcsim> 4. Replace license - < braunr> kentry_alloc_cache <= what is that ? - < braunr> cache for kernel map entries in vm_map ? - < braunr> the comment for mem_cpu_pool_get doesn't apply in gnumach, as - there is no kernel preemption - -[[microkernel/mach/gnumach/preemption]]. - - < braunr> "Don't attempt mem GC more frequently than hz/MEM_GC_INTERVAL - times a second. - < braunr> " - < mcsim> sorry. I meant vm_map_kentry_cache - < braunr> hm nothing actually about this comment - < braunr> mcsim: ok - < braunr> yes kernel map entries need special handling - < braunr> i don't know how it's done in gnumach though - < braunr> static preallocation ? 
- < mcsim> yes - < braunr> that's ugly :p - < mcsim> but it uses dynamic allocation further even for vm_map kernel - entries - < braunr> although such bootstrapping issues are generally difficult to - solve elegantly - < braunr> ah - < mcsim> now I use only static allocation, but I'll add dynamic allocation - too - < braunr> when you have time, mind the coding style (convert everything to - gnumach style, which mostly implies using tabs instead of 4-spaces - indentation) - < braunr> when you'll work on dynamic allocation for the kernel map - entries, you may want to review how it's done in x15 - < braunr> the mem_source type was originally intended for that purpose, but - has slightly changed once the allocator was adapted to work in my kernel - < mcsim> ok - < braunr> vm_map_kentry_zone is the only zone created with ZONE_FIXED - < braunr> and it is zcram()'ed immediately after - < braunr> so you can consider it a statically allocated zone - < braunr> in x15 i use another strategy: there is a special kernel submap - named kentry_map which contains only one map entry (statically allocated) - < braunr> this map is the backend (mem_source) for the kentry_cache - < braunr> the kentry_cache is created with a special flag that tells it - memory can't be reclaimed - < braunr> when the cache needs to grow, the single map entry is extended to - cover the allocated memory - < braunr> it's similar to the way pmap_growkernel() works for kernel page - table pages - < braunr> (and is actually based on that idea) - < braunr> it's a compromise between full static and dynamic allocation - types - < braunr> the advantage is that the allocator code can be used (so there is - no need for a special allocator like in netbsd) - < braunr> the drawback is that some resources can never be returned to - their source (and under peaks, the amount of unfreeable resources could - become large, but this is unexpected) - < braunr> mcsim: for now you shouldn't waste your time with this - < braunr> i see the number of kernel map entries is fixed at 256 - < braunr> and i've never seen the kernel use more than around 30 entries - < mcsim> Do you think that I have to left this problem to the end? - < braunr> yes - - -# IRC, freenode, #hurd, 2011-08-11 - - < mcsim> braunr: Hello. Can you give me an advice how can I make - measurements better? - < braunr> mcsim: what kind of measurements - < mcsim> braunr: How much is your allocator better than zalloc. - < braunr> slightly :p - < braunr> that's why i never took the time to put it in gnumach - < mcsim> braunr: Just I thought that there are some rules or - recommendations of such measurements. Or I can do them any way I want? - < braunr> mcsim: i don't know - < braunr> mcsim: benchmarking is an art of its own, and i don't even know - how to use the bits of profiling code available in gnumach (if it still - works) - < antrik> mcsim: hm... are you saying you already have a running system - with slab allocator?... :-) - < braunr> mcsim: the main advantage i can see is the removal of many - arbitrary hard limits - < mcsim> antrik: yes - < antrik> \o/ - < antrik> nice work! - < braunr> :) - < braunr> the cpu layer should also help a bit, but it's hard to measure - < braunr> i guess it could be seen on the ipc path for very small buffers - < mcsim> antrik: Thanks. But I still have to 1. Change constructor for - kentry_alloc_cache. and 2. Make measurements. - < braunr> and polish the whole thing :p - < antrik> mcsim: I'm not sure this can be measured... 
the performance - differente in any real live usage is probably just a few percent at most - -- it's hard to construct a benchmark giving enough precision so it's not - drowned in noise... - < antrik> perhaps it conserves some memory -- but that too would be hard to - measure I fear - < braunr> yes - < braunr> there *should* be better allocation times, less fragmentation, - better accounting ... :) - < braunr> and no arbitrary limits ! - < antrik> :-) - < braunr> oh, and the self debugging features can be nice too - < mcsim> But I need to prove that my work wasn't useless - < braunr> well it wasn't, but that's hard to measure - < braunr> it's easy to prove though, since there are additional features - that weren't present in the zone allocator - < mcsim> Ok. If there are some profiling features in gnumach can you give - me a link with their description? - < braunr> mcsim: sorry, no - < braunr> mcsim: you could still write the basic loop test, which counts - the number of allocations performed in a fixed time interval - < braunr> but as it doesn't match many real life patterns, it won't be very - useful - < braunr> and i'm afraid that if you consider real life patterns, you'll - see how negligeable the improvement can be compared to other operations - such as memory copies or I/O (ouch) - < mcsim> Do network drivers use this allocator? - < mcsim> ok. I'll scrape up some test and than I'll report results. - - -# IRC, freenode, #hurd, 2011-08-26 - - < mcsim> hello. Are there any analogs of copy_to_user and copy_from_user in - linux for gnumach? - < mcsim> Or how can I determine memory map if I know address? I need this - for vm_map_copyin - < guillem> mcsim: vm_map_lookup_entry? - < mcsim> guillem: but I need to transmit map to this function and it will - return an entry which contains specified address. - < mcsim> And I don't know what map have I transmit. - < mcsim> I need to transfer static array from kernel to user. What map - contains static data? - < antrik> mcsim: Mach doesn't have copy_{from,to}_user -- instead, large - chunks of data are transferred as out-of-line data in IPC messages - (i.e. using VM magic) - < mcsim> antrik: can you give me an example? I just found using - vm_map_copyin in host_zone_info. - < antrik> no idea what vm_map_copyin is to be honest... - - -# IRC, freenode, #hurd, 2011-08-27 - - < braunr> mcsim: the primitives are named copyin/copyout, and they are used - for messages with inline data - < braunr> or copyinmsg/copyoutmsg - < braunr> vm_map_copyin/out should be used for chunks larger than a page - (or roughly a page) - < braunr> also, when writing to a task space, see which is better suited: - vm_map_copyout or vm_map_copy_overwrite - < mcsim> braunr: and what will be src_map for vm_map_copyin/out? - < braunr> the caller map - < braunr> which you can get with current_map() iirc - < mcsim> braunr: thank you - < braunr> be careful not to leak anything in the transferred buffers - < braunr> memset() to 0 if in doubt - < mcsim> braunr:ok - < braunr> antrik: vm_map_copyin() is roughly vm_read() - < antrik> braunr: what is it used for? - < braunr> antrik: 01:11 < antrik> mcsim: Mach doesn't have - copy_{from,to}_user -- instead, large chunks of data are transferred as - out-of-line data in IPC messages (i.e. using VM magic) - < braunr> antrik: that "VM magic" is partly implemented using vm_map_copy* - functions - < antrik> braunr: oh, you mean it doesn't actually copy data, but only page - table entries? 
if so, that's *not* really comparable to - copy_{from,to}_user()... - - -# IRC, freenode, #hurd, 2011-08-28 - - < braunr> antrik: the equivalent of copy_{from,to}_user are - copy{in,out}{,msg} - < braunr> antrik: but when the data size is about a page or more, it's - better not to copy, of course - < antrik> braunr: it's actually not clear at all that it's really better to - do VM magic than to copy... - - -# IRC, freenode, #hurd, 2011-08-29 - - < braunr> antrik: at least, that used to be the general idea, and with a - simpler VM i suspect it's still true - < braunr> mcsim: did you progress on your host_zone_info replacement ? - < braunr> mcsim: i think you should stick to what the original - implementation did - < braunr> which is making an inline copy if caller provided enough space, - using kmem_alloc_pageable otherwise - < braunr> specify ipc_kernel_map if using kmem_alloc_pageable - < mcsim> braunr: yes. And it works. But I use kmem_alloc, not pageable. Is - it worse? - < mcsim> braunr: host_zone_info replacement is pushed to savannah - repository. - < braunr> mcsim: i'll have a look - < mcsim> braunr: I've pushed one more commit just now, which has attitude - to host_zone_info. - < braunr> mem_alloc_early_init should be renamed mem_bootstrap - < mcsim> ok - < braunr> mcsim: i don't understand your call to kmem_free - < mcsim> braunr: It shouldn't be there? - < braunr> why should it be there ? - < braunr> you're freeing what the copy object references - < braunr> it's strange that it even works - < braunr> also, you shouldn't pass infop directly as the copy object - < braunr> i guess you get a warning for that - < braunr> do what the original code does: use an intermediate copy object - and a cast - < mcsim> ok - < braunr> another error (without consequence but still, you should mind it) - < braunr> simple_lock(&mem_cache_list_lock); - < braunr> [...] - < braunr> kr = kmem_alloc(ipc_kernel_map, &info, info_size); - < braunr> you can't hold simple locks while allocating memory - < braunr> read how the original implementation works around this - < mcsim> ok - < braunr> i guess host_zone_info assumes the zone list doesn't change much - while unlocked - < braunr> or that's it's rather unimportant since it's for debugging - < braunr> a strict snapshot isn't required - < braunr> list_for_each_entry(&mem_cache_list, cache, node) max_caches++; - < braunr> you should really use two separate lines for readability - < braunr> also, instead of counting each time, you could just maintain a - global counter - < braunr> mcsim: use strncpy instead of strcpy for the cache names - < braunr> not to avoid overflow but rather to clear the unused bytes at the - end of the buffer - < braunr> mcsim: about kmem_alloc vs kmem_alloc_pageable, it's a minor - issue - < braunr> you're handing off debugging data to a userspace application - < braunr> a rather dull reporting tool in most cases, which doesn't require - wired down memory - < braunr> so in order to better use available memory, pageable memory - should be used - < braunr> in the future i guess it could become a not-so-minor issue though - < mcsim> ok. I'll fix it - < braunr> mcsim: have you tried to run the kernel with MC_VERIFY always on - ? - < braunr> MEM_CF_VERIFY actually - < mcsim1> yes. - < braunr> oh - < braunr> nothing wrong - < braunr> ? - < mcsim1> it is always set - < braunr> ok - < braunr> ah, you set it in macros.h .. 
- < braunr> don't - < braunr> put it in mem.c if you want, or better, make it a compile-time - option - < braunr> macros.h is a tiny macro library, it shouldn't define such - unrelated options - < mcsim1> ok. - < braunr> mcsim1: did you try fault injection to make sure the checking - code actually works and how it behaves when an error occurs ? - < mcsim1> I think that when I finish I'll merge files cpu.h and macros.h - with mem.c - < braunr> yes that would simplify things - < mcsim1> Yes. When I confused with types mem_buf_fill worked wrong and - panic occurred. - < braunr> very good - < braunr> have you progressed concerning the measurements you wanted to do - ? - < mcsim1> not much. - < braunr> ok - < mcsim1> I think they will be ready in a few days. - < antrik> what measurements are these? - < mcsim1> braunr: What maximal size for static data and stack in kernel? - < braunr> what do you mean ? - < braunr> kernel stacks are one page if i'm right - < braunr> static data (rodata+data+bss) are limited by grub bugs only :) - < mcsim1> braunr: probably they are present, because when I created too big - array I couldn't boot kernel - < braunr> local variable or static ? - < mcsim1> static - < braunr> how large ? - < mcsim1> 4Mb - < braunr> hm - < braunr> it's not a grub bug then - < braunr> i was able to embed as much as 32 MiB in x15 while doing this - kind of tests - < braunr> I guess it's the gnu mach boot code which only preallocates one - page for the initial kernel mapping - < braunr> one PTP (page table page) maps 4 MiB - < braunr> (x15 does this completely dynamically, unlike mach or even - current BSDs) - < mcsim1> antrik: First I want to measure time of each cache - creation/allocation/deallocation and then compile kernel. - < braunr> cache creation is irrelevant - < braunr> because of the cpu pools in the new allocator, you should test at - least two different allocation patterns - < braunr> one with quick allocs/frees - < braunr> the other with large numbers of allocs then their matching frees - < braunr> (larger being at least 100) - < braunr> i'd say the cpu pool layer is the real advantage over the - previous zone allocator - < braunr> (from a performance perspective) - < mcsim1> But there is only one cpu - < braunr> it doesn't matter - < braunr> it's stil a very effective cache - < braunr> in addition to reducing contention - < braunr> compare mem_cpu_pool_pop() against mem_cache_alloc_from_slab() - < braunr> mcsim1: work is needed to polish the whole thing, but getting it - actually working is a nice achievement for someone new on the project - < braunr> i hope it helped you learn about memory allocation, virtual - memory, gnu mach and the hurd in general :) - < antrik> indeed :-) - - -# IRC, freenode, #hurd, 2011-09-06 - - [some performance testing] - <braunr> i'm not sure such long tests are relevant but let's assume balloc - is slower - <braunr> some tuning is needed here - <braunr> first, we can see that slab allocation occurs more often in balloc - than page allocation does in zalloc - <braunr> so yes, as slab allocation is slower (have you measured which part - actually is slow ? 
i guess it's the kmem_alloc call) - <braunr> the whole process gets a bit slower too - <mcsim> I used alloc_size = 4096 for zalloc - <braunr> i don't know what that is exactly - <braunr> but you can't hold 500 16 bytes buffers in a page so zalloc must - have had free pages around for that - <mcsim> I use kmem_alloc_wired - <braunr> if you have time, measure it, so that we know how much it accounts - for - <braunr> where are the results for dealloc ? - <mcsim> I can't give you result right now because internet works very - bad. But for first DEALLOC result are the same, exept some cases when it - takes balloc for more than 1000 ticks - <braunr> must be the transfer from the cpu layer to the slab layer - <mcsim> as to kmem_alloc_wired. I think zalloc uses this function too for - allocating objects in zone I test. - <braunr> mcsim: yes, but less frequently, which is why it's faster - <braunr> mcsim: another very important aspect that should be measured is - memory consumption, have you looked into that ? - <mcsim> I think that I made too little iterations in test SMALL - <mcsim> If I increase constant SMALL_TESTS will it be good enough? - <braunr> mcsim: i don't know, try both :) - <braunr> if you increase the number of iterations, balloc average time will - be lower than zalloc, but this doesn't remove the first long - initialization step on the allocated slab - <mcsim> SMALL_TESTS to 500, I mean - <braunr> i wonder if maintaining the slabs sorted through insertion sort is - what makes it slow - <mcsim> braunr: where do you sort slabs? I don't see this. - <braunr> mcsim: mem_cache_alloc_from_slab and its free counterpart - <braunr> mcsim: the mem_source stuff is useless in gnumach, you can remove - it and directly call the kmem_alloc/free functions - <mcsim> But I have to make special allocator for kernel map entries. - <braunr> ah right - <mcsim> btw. It turned out that 256 entries are not enough. - <braunr> that's weird - <braunr> i'll make a patch so that the mem_source code looks more like what - i have in x15 then - <braunr> about the results, i don't think the slab layer is that slow - <braunr> it's the cpu_pool_fill/drain functions that take time - <braunr> they preallocate many objects (64 for your objects size if i'm - right) at once - <braunr> mcsim: look at the first result page: some times, a number around - 8000 is printed - <braunr> the common time (ticks, whatever) for a single object is 120 - <braunr> 8132/120 is 67, close enough to the 64 value - <mcsim> I forgot about SMALL tests here are they: - http://paste.debian.net/128533/ (balloc) http://paste.debian.net/128534/ - (zalloc) - <mcsim> braunr: why do you divide 8132 by 120? - <braunr> mcsim: to see if it matches my assumption that the ~8000 number - matches the cpu_pool_fill call - <mcsim> braunr: I've got it - <braunr> mcsim: i'd be much interested in the dealloc results if you can - paste them too - <mcsim> dealloc: http://paste.debian.net/128589/ - http://paste.debian.net/128590/ - <braunr> mcsim: thanks - <mcsim> second dealloc: http://paste.debian.net/128591/ - http://paste.debian.net/128592/ - <braunr> mcsim: so the main conclusion i retain from your tests is that the - transfers from the cpu and the slab layers are what makes the new - allocator a bit slower - <mcsim> OPERATION_SMALL dealloc: http://paste.debian.net/128593/ - http://paste.debian.net/128594/ - <braunr> mcsim: what needs to be measured now is global memory usage - <mcsim> braunr: data from /proc/vmstat after kernel compilation will be - enough? 
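
braunr's point just above about measuring global memory usage can be answered with a small per-cache accounting loop. The fragment below is only an illustrative sketch meant to sit next to the allocator code; it reuses the `mem_cache_list` iteration and the `nr_slabs`/`slab_size` fields that come up in this log, but the actual declarations in the slab branch may differ.

    /* Illustrative sketch, not the actual gnumach code: total memory
     * currently held by the slab caches.  Nothing is allocated here,
     * so holding the list lock is fine. */
    static unsigned long
    mem_usage_total(void)
    {
        struct mem_cache *cache;
        unsigned long total = 0;

        simple_lock(&mem_cache_list_lock);

        list_for_each_entry(&mem_cache_list, cache, node)
            total += cache->nr_slabs * cache->slab_size;

        simple_unlock(&mem_cache_list_lock);
        return total;
    }

Sampling this right after a garbage collection pass, as suggested a little further down, avoids counting slabs that are merely waiting to be reclaimed.
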
- <braunr> mcsim: let me check - <braunr> mcsim: no it won't do, you need to measure kernel memory usage - <braunr> the best moment to measure it is right after zone_gc is called - <mcsim> Are there any facilities in gnumach for memory measurement? - <braunr> it's specific to the allocators - <braunr> just count the number of used pages - <braunr> after garbage collection, there should be no free page, so this - should be rather simple - <mcsim> ok - <mcsim> braunr: When I measure memory usage in balloc, what formula is - better cache->nr_slabs * cache->bufs_per_slab * cache->buf_size or - cache->nr_slabs * cache->slab_size? - <braunr> the latter - - -# IRC, freenode, #hurd, 2011-09-07 - - <mcsim> braunr: I've disabled calling of mem_cpu_pool_fill and allocator - became faster - <braunr> mcsim: sounds nice - <braunr> mcsim: i suspect the free path might not be as fast though - <mcsim> results for first calling: http://paste.debian.net/128639/ second: - http://paste.debian.net/128640/ and with many alloc/free: - http://paste.debian.net/128641/ - <braunr> mcsim: thanks - <mcsim> best result are for second call: average time decreased from 159.56 - to 118.756 - <mcsim> First call slightly worse, but this is because I've added some - profiling code - <braunr> i still see some ~8k lines in 128639 - <braunr> even some around ~12k - <mcsim> I think this is because of mem_cache_grow I'm investigating it now - <braunr> i guess so too - <mcsim> I've measured time for first call in cache and from about 22000 - mem_cache_grow takes 20000 - <braunr> how did you change the code so that it doesn't call - mem_cpu_pool_fill ? - <braunr> is the cpu layer still used ? - <mcsim> http://paste.debian.net/128644/ - <braunr> don't forget the free path - <braunr> mcsim: anyway, even with the previous slightly slower behaviour we - could observe, the performance hit is negligible - <mcsim> Is free path a compilation? (I'm sorry for my english) - <braunr> mcsim: mem_cache_free - <braunr> mcsim: the last two measurements i'd advise are with big (>4k) - object sizes and, really, kernel allocator consumption - <mcsim> http://paste.debian.net/128648/ http://paste.debian.net/128646/ - http://paste.debian.net/128649/ (first, second, small) - <braunr> mcsim: these numbers are closer to the zalloc ones, aren't they ? - <mcsim> deallocating slighty faster too - <braunr> it may not be the case with larger objects, because of the use of - a tree - <mcsim> yes, they are closer - <braunr> but then, i expect some space gains - <braunr> the whole thing is about compromise - <mcsim> ok. I'll try to measure them today. Anyway I'll post result and you - could read them in the morning - <braunr> at least, it shows that the zone allocator was actually quite good - <braunr> i don't like how the code looks, there are various hacks here and - there, it lacks self inspection features, but it's quite good - <braunr> and there was little room for true improvement in this area, like - i told you :) - <braunr> (my allocator, like the current x15 dev branch, focuses on mp - machines) - <braunr> mcsim: thanks again for these numbers - <braunr> i wouldn't have had the courage to make the tests myself before - some time eh - <mcsim> braunr: hello. Look at the small_4096 results - http://paste.debian.net/128692/ (balloc) http://paste.debian.net/128693/ - (zalloc) - <braunr> mcsim: wow, what's that ? 
:) - <braunr> mcsim: you should really really include your test parameters in - the report - <braunr> like object size, purpose, and other similar details - <mcsim> for balloc I specified only object_size = 4096 - <mcsim> for zalloc object_size = 4096, alloc_size = 4096, memtype = 0; - <braunr> the results are weird - <braunr> apart from the very strange numbers (e.g. 0 or 4429543648), none - is around 3k, which is the value matching a kmem_alloc call - <braunr> happy to see balloc behaves quite good for this size too - <braunr> s/good/well/ - <mcsim> Oh - <mcsim> here is significant only first 101 lines - <mcsim> I'm sorry - <braunr> ok - <braunr> what does the test do again ? 10 loops of 10 allocs/frees ? - <mcsim> yes - <braunr> ok, so the only slowdown is at the beginning, when the slabs are - created - <braunr> the two big numbers (31844 and 19548) are strange - <mcsim> on the other hand time of compilation is - <mcsim> balloc zalloc - <mcsim> 38m28.290s 38m58.400s - <mcsim> 38m38.240s 38m42.140s - <mcsim> 38m30.410s 38m52.920s - <braunr> what are you compiling ? - <mcsim> gnumach kernel - <braunr> in 40 mins ? - <mcsim> yes - <braunr> you lack hvm i guess - <mcsim> is it long? - <mcsim> I use real PC - <braunr> very - <braunr> ok - <braunr> so it's normal - <mcsim> in vm it was about 2 hours) - <braunr> the difference really is negligible - <braunr> ok i can explain the big numbers - <braunr> the slab size depends on the object size, and for 4k, it is 32k - <braunr> you can store 8 4k buffers in a slab (lines 2 to 9) - <mcsim> so we need use kmem_alloc_* 8 times? - <braunr> on line 10, the ninth object is allocated, which adds another slab - to the cache, hence the big number - <braunr> no, once for a size of 32k - <braunr> and then the free list is initialized, which means accessing those - pages, which means tlb misses - <braunr> i guess the zone allocator already has free pages available - <mcsim> I see - <braunr> i think you can stop performance measurements, they show the - allocator is slightly slower, but so slightly we don't care about that - <braunr> we need numbers on memory usage now (at the page level) - <braunr> and this isn't easy - <mcsim> For balloc I can get numbers if I summarize nr_slabs*slab_size for - each cache, isn't it? - <braunr> yes - <braunr> you can have a look at the original implementation, function - mem_info - <mcsim> And for zalloc I have to summarize of cur_size and then add - zalloc_wasted_space? 
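
The slab geometry braunr explains just above can be made concrete with a couple of illustrative constants (these names are made up for the example, not taken from the code): a 32 KiB slab holds eight 4 KiB buffers, so the ninth allocation in a row forces the cache to grow by another slab, which is where the large timings in the test come from.

    #define BUF_SIZE        4096UL
    #define SLAB_SIZE       (32UL * 1024)
    #define BUFS_PER_SLAB   (SLAB_SIZE / BUF_SIZE)  /* 8 buffers per slab */

    /* Number of slabs needed for nr_bufs live objects of this size. */
    static unsigned long
    slabs_needed(unsigned long nr_bufs)
    {
        return (nr_bufs + BUFS_PER_SLAB - 1) / BUFS_PER_SLAB;
    }
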
- <braunr> i don't know :/ - <braunr> i think the best moment to obtain accurate values is after zone_gc - removes the collected pages - <braunr> for both allocators, you could fill a stats structure at that - moment, and have an rpc copy that structure when a client tool requests - it - <braunr> concerning your tests, there is another point to have in mind - <braunr> the very first loop in your code shows a result of 31844 - <braunr> although you disabled the call to cpu_pool_fill - <braunr> but the reason why it's so long is that the cpu layer still exists - <braunr> and if you look carefully, the cpu pools are created as needed on - the free path - <mcsim> I removed cpu_pool_drain - <braunr> but not cpu_pool_push/pop i guess - <mcsim> http://paste.debian.net/128698/ - <braunr> see, you still allocate the cpu pool array on the free path - <mcsim> but I don't fill it - <braunr> that's not the point - <braunr> it uses mem_cache_alloc - <braunr> so in a call to free, you can also have an allocation, that can - potentially create a new slab - <mcsim> I see, so I have to create cpu_pool at the initialization stage? - <braunr> no, you can't - <braunr> there is a reason why they're allocated on the free path - <braunr> but since you don't have the fill/drain functions, i wonder if you - should just comment out the whole cpu layer code - <braunr> but hmm - <braunr> no really, it's not worth the effort - <braunr> even with drains/fills, the results are really good enough - <braunr> it makes the allocator smp ready - <braunr> we should just keep it that way - <braunr> mcsim: fyi, the reason why cpu pool arrays are allocated on the - free path is to avoid recursion - <braunr> because cpu pool arrays are allocated from caches just as almost - everything else - <mcsim> ok - <mcsim> summ of cur_size and then adding zalloc_wasted_space gives 0x4e1954 - <mcsim> but this value isn't even page aligned - <mcsim> For balloc I've got 0x4c6000 0x4aa000 0x48d000 - <braunr> hm can you report them in decimal, >> 10 so that values are in KiB - ? - <mcsim> 4888 4776 4660 for balloc - <mcsim> 4998 for zalloc - <braunr> when ? - <braunr> after boot ? - <mcsim> boot, compile, zone_gc - <mcsim> and then measure - <braunr> ? - <mcsim> I call garbage collector before measuring - <mcsim> and I measure after kernel compilation - <braunr> i thought it took you 40 minutes - <mcsim> for balloc I got results at night - <braunr> oh so you already got them - <braunr> i can't beleive the kernel only consumes 5 MiB - <mcsim> before gc it takes about 9052 Kib - <braunr> can i see the measurement code ? - <braunr> oh, and how much ram does your machine have ? - <mcsim> 758 mb - <mcsim> 768 - <braunr> that's really weird - <braunr> i'd expect the kernel to consume much more space - <mcsim> http://paste.debian.net/128703/ - <mcsim> it's only dynamically allocated data - <braunr> yes - <braunr> ipc ports, rights, vm map entries, vm objects, and lots of other - hanging buffers - <braunr> about how much is zalloc_wasted_space ? - <braunr> if it's small or constant, i guess you could ignore it - <mcsim> about 492 - <mcsim> KiB - <braunr> well it's another good point, mach internal structures don't imply - much overhead - <braunr> or, the zone allocator is underused - - <tschwinge> mcsim, braunr: The memory allocator project is coming along - good, as I get from your IRC messages? - <braunr> tschwinge: yes, but as expected, improvements are minor - <tschwinge> But at the very least it's now well-known, maintainable code. 
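
The reason braunr gives above for allocating the per-CPU pool array on the free path rather than at cache creation is that the array itself comes from a cache, so setting it up from the allocation path could recurse. Below is a hedged sketch of that free path, with illustrative names loosely based on the ones mentioned in this log and with locking omitted; it is not the code from the branch.

    /* Illustrative fragment: the CPU pool array is set up lazily here,
     * because allocating it may itself go through mem_cache_alloc(). */
    void
    mem_cache_free(struct mem_cache *cache, void *obj)
    {
        struct mem_cpu_pool *cpu_pool = mem_cpu_pool_get(cache);

        if (cpu_pool->array == NULL) {
            /* May allocate from another cache. */
            cpu_pool->array =
                mem_cache_alloc(cache->cpu_pool_type->array_cache);

            if (cpu_pool->array == NULL) {
                /* Can't set the pool up yet: hand the object straight
                 * back to the slab layer. */
                mem_cache_free_to_slab(cache, obj);
                return;
            }
        }

        if (mem_cpu_pool_full(cpu_pool))
            mem_cpu_pool_drain(cpu_pool, cache);

        mem_cpu_pool_push(cpu_pool, obj);
    }
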
- <braunr> yes, it's readable, easier to understand, provides self inspection - and is smp ready - <braunr> there also are less hacks, but a few less features (there are no - way to avoid sleeping so it's unusable - and unused - in interrupt - handlers) - <braunr> is* no way - <braunr> tschwinge: mcsim did a good job porting and measuring it - - -# IRC, freenode, #hurd, 2011-09-08 - - <antrik> braunr: note that the zalloc map used to be limited to 8 MiB or - something like that a couple of years ago... so it doesn't seems - surprising that the kernel uses "only" 5 MiB :-) - <antrik> (yes, we had a *lot* of zalloc panics back then...) - - -# IRC, freenode, #hurd, 2011-09-14 - - <mcsim> braunr: hello. I've written a constructor for kernel map entries - and it can return resources to their source. Can you have a look at it? - http://paste.debian.net/130037/ If all be OK I'll push it tomorrow. - <braunr> mcsim: send the patch through mail please, i'll apply it on my - copy - <braunr> are you sure the cache is reapable ? - <mcsim> All slabs, except first I allocate with kmem_alloc_wired. - <braunr> how can you be sure ? - <mcsim> First slab I allocate during bootstrap and use pmap_steal_memory - and further I use only kmem_alloc_wired - <braunr> no, you use kmem_free - <braunr> in kentry_dealloc_cache() - <braunr> which probably creates a recursion - <braunr> using the constructor this way isn't a good idea - <braunr> constructors are good for preconstructed state (set counters to 0, - init lists and locks, that kind of things, not allocating memory) - <braunr> i don't think you should try to make this special cache reapable - <braunr> mcsim: keep in mind constructors are applied on buffers at *slab* - creation, not at object allocation - <braunr> so if you allocate a single slab with, say, 50 or 100 objects per - slab, kmem_alloc_wired would be called that number of times - <mcsim> why kentry_dealloc_cache can create recursion? kentry_dealloc_cache - is called only by mem_cache_reap. - <braunr> right - <braunr> but are you totally sure mem_cache_reap() can't be called by - kmem_free() ? - <braunr> i think you're right, it probably can't - - -# IRC, freenode, #hurd, 2011-09-25 - - <mcsim> braunr: hello. I rewrote constructor for kernel entries and seems - that it works fine. I think that this was last milestone. Only moving of - memory allocator sources to more appropriate place and merge with main - branch left. - <braunr> mcsim: it needs renaming and reindenting too - <mcsim> for reindenting C-x h Tab in emacs will be enough? - <braunr> mcsim: make sure which style must be used first - <mcsim> and what should I rename and where better to place allocator? For - example, there is no lib directory, like in x15. Should I create it and - move list.* and rbtree.* to lib/ or move these files to util/ or - something else? - <braunr> mcsim: i told you balloc isn't a good name before, use something - more meaningful (kmem is already used in gnumach unfortunately if i'm - right) - <braunr> you can put the support files in kern/ - <mcsim> what about vm_alloc? - <braunr> you should prefix it with vm_ - <braunr> shouldn't - <braunr> it's a top level allocator - <braunr> on top of the vm system - <braunr> maybe mcache - <braunr> hm no - <braunr> maybe just km_ - <mcsim> kern/km_alloc.*? - <braunr> no - <braunr> just km - <mcsim> ok. - - -# IRC, freenode, #hurd, 2011-09-27 - - <mcsim> braunr: hello. When I've tried to speed of new allocator and bad - I've removed function mem_cpu_pool_fill. 
But you've said to undo this. I - don't understand why this function is necessary. Can you explain it, - please? - <mcsim> When I've tried to compare speed of new allocator and old* - <braunr> i'm not sure i said that - <braunr> i said the performance overhead is negligible - <braunr> so it's better to leave the cpu pool layer in place, as it almost - doesn't hurt - <braunr> you can implement the KMEM_CF_NO_CPU_POOL I added in the x15 mach - version - <braunr> so that cpu pools aren't used by default, but the code is present - in case smp is implemented - <mcsim> I didn't remove cpu pool layer. I've just removed filling of cpu - pool during creation of slab. - <braunr> how do you fill the cpu pools then ? - <mcsim> If object is freed than it is added to cpu poll - <braunr> so you don't fill/drain the pools ? - <braunr> you try to get/put an object and if it fails you directly fall - back to the slab layer ? - <mcsim> I drain them during garbage collection - <braunr> oh - <mcsim> yes - <braunr> you shouldn't touch the cpu layer during gc - <braunr> the number of objects should be small enough so that we don't care - much - <mcsim> ok. I can drain cpu pool at any other time if it is prohibited to - in mem_gc. - <mcsim> But why do we need to fill cpu poll during slab creation? - <mcsim> In this case allocation consist of: get object from slab -> put it - to cpu pool -> get it from cpu pool - <mcsim> I've just remove last to stages - <braunr> hm cpu pools aren't filled at slab creation - <braunr> they're filled when they're empty, and drained when they're full - <braunr> so that the number of objects they contain is increased/reduced to - a value suitable for the next allocations/frees - <braunr> the idea is to fall back as little as possible to the slab layer - because it requires the acquisition of the cache lock - <mcsim> oh. You're right. I'm really sorry. The point is that if cpu pool - is empty we don't need to fill it first - <braunr> uh, yes we do :) - <mcsim> Why cache locking is so undesirable? If we have free objects in - slabs locking will not take a lot if time. - <braunr> mcsim: it's undesirable on a smp system - <mcsim> ok. - <braunr> mcsim: and spin locks are normally noops on a up system - <braunr> which is the case in gnumach, hence the slightly better - performances without the cpu layer - <braunr> but i designed this allocator for x15, which only supports mp - systems :) - <braunr> mcsim: sorry i couldn't look at your code, sick first, busy with - server migration now (new server almost ready for xen hurds :)) - <mcsim> ok. - <mcsim> I ended with allocator if didn't miss anything important:) - <braunr> i'll have a look soon i hope :) - - -# IRC, freenode, #hurd, 2011-09-27 - - <antrik> braunr: would it be realistic/useful to check during GC whether - all "used" objects are actually in a CPU pool, and if so, destroy them so - the slab can be freed?... - <antrik> mcsim: BTW, did you ever do any measurements of memory - use/fragmentation? - <mcsim> antrik: I couldn't do this for zalloc - <antrik> oh... why not? - <antrik> (BTW, I would be interested in a comparision between using the CPU - layer, and bare slab allocation without CPU layer) - <mcsim> Result I've got were strange. It wasn't even aligned to page size. - <mcsim> Probably is it better to look into /proc/vmstat? - <mcsim> Because I put hooks in the code and probably I missed something - <antrik> mcsim: I doubt vmstat would give enough information to make any - useful comparision... 
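
The fill-on-empty/drain-on-full behaviour braunr describes in the first conversation above amounts to an allocation fast path that only takes the cache lock when the per-CPU pool has to be refilled in bulk (around 64 objects at a time for small sizes, as measured earlier). Again a hedged sketch with illustrative names, not the actual code from the branch:

    /* Illustrative fragment: objects normally come from the per-CPU
     * pool; the slab layer and its cache lock are only touched when
     * the pool is empty and must be refilled in bulk. */
    void *
    mem_cache_alloc(struct mem_cache *cache)
    {
        struct mem_cpu_pool *cpu_pool = mem_cpu_pool_get(cache);

        if (mem_cpu_pool_empty(cpu_pool)) {
            /* Bulk transfer from the slab layer, under the cache lock. */
            if (!mem_cpu_pool_fill(cpu_pool, cache))
                return mem_cache_alloc_from_slab(cache);
        }

        return mem_cpu_pool_pop(cpu_pool);
    }
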
- <braunr> antrik: isn't this draining cpu pools at gc time ? - <braunr> antrik: the cpu layer was found to add a slight overhead compared - to always falling back to the slab layer - <antrik> braunr: my idea is only to drop entries from the CPU cache if they - actually prevent slabs from being freed... if other objects in the slab - are really in use, there is no point in flushing them from the CPU cache - <antrik> braunr: I meant comparing the fragmentation with/without CPU - layer. the difference in CPU usage is probably negligable anyways... - <antrik> you might remember that I was (and still am) sceptical about CPU - layer, as I suspect it worsens the good fragmentation properties of the - pure slab allocator -- but it would be nice to actually check this :-) - <braunr> antrik: right - <braunr> antrik: the more i think about it, the more i consider slqb to be - a better solution ...... :> - <braunr> an idea for when there's time - <braunr> eh - <antrik> hehe :-) - - -# IRC, freenode, #hurd, 2011-10-13 - - <braunr> mcsim: what's the current state of your gnumach branch ? - <mcsim> I've merged it with master in September - <braunr> yes i've seen that, but does it build and run fine ? - <mcsim> I've tested it on gnumach from debian repository, but for building - I had to make additional change in device/ramdisk.c, as I mentioned. - <braunr> mcsim: why ? - <mcsim> And it runs fine for me. - <braunr> mcsim: why did you need to make other changes ? - <mcsim> because there is a patch which comes with from-debian-repository - kernel and it addes some code, where I have to make changes. Earlier - kernel_map was a pointer to structure, but I change that and now - kernel_map is structure. So handling to it should be by taking the - address (&kernel_map) - <braunr> why did you do that ? - <braunr> or put it another way: what made you do that type change on - kernel_map ? - <mcsim> Earlier memory for kernel_map was allocating with zalloc. But now - salloc can't allocate memory before it's initialisation - <braunr> that's not a good reason - <braunr> a simple workaround for your problem is this : - <braunr> static struct vm_map kernel_map_store; - <braunr> vm_map_t kernel_map = &kernel_map_store; - <mcsim> braunr: Ok. I'll correct this. - - -# IRC, freenode, #hurd, 2011-11-01 - - <braunr> etenil: but mcsim's work is, for one, useful because the allocator - code is much clearer, adds some debugging support, and is smp-ready - - -# IRC, freenode, #hurd, 2011-11-14 - - <braunr> i've just realized that replacing the zone allocator removes most - (if not all) static limit on allocated objects - <braunr> as we have nothing similar to rlimits, this means kernel resources - are actually exhaustible - <braunr> and i'm not sure every allocation is cleanly handled in case of - memory shortage - <braunr> youpi: antrik: tschwinge: is this acceptable anyway ? - <braunr> (although IMO, it's also a good thing to get rid of those limits - that made the kernel panic for no valid reason) - <youpi> there are actually not many static limits on allocated objects - <youpi> only a few have one - <braunr> those defined in kern/mach_param.h - <youpi> most of them are not actually enforced - <braunr> ah ? 
- <braunr> they are used at zinit() time - <braunr> i thought they were - <youpi> yes, but most zones are actually fine with overcoming the max - <braunr> ok - <youpi> see zone->max_size += (zone->max_size >> 1); - <youpi> you need both !EXHAUSTIBLE and FIXED - <braunr> ok - <pinotree> making having rlimits enforced would be nice... - <pinotree> s/making// - <braunr> pinotree: the kernel wouldn't handle many standard rlimits anyway - - <braunr> i've just committed my final patch on mcsim's branch, which will - serve as the starting point for integration - <braunr> which means code in this branch won't change (or only last minute - changes) - <braunr> you're invited to test it - <braunr> there shouldn't be any noticeable difference with the master - branch - <braunr> a bit less fragmentation - <braunr> more memory can be reclaimed by the VM system - <braunr> there are debugging features - <braunr> it's SMP ready - <braunr> and overall cleaner than the zone allocator - <braunr> although a bit slower on the free path (because of what's - performed to reduce fragmentation) - <braunr> but even "slower" here is completely negligible - - -# IRC, freenode, #hurd, 2011-11-15 - - <mcsim> I enabled cpu_pool layer and kentry cache exhausted at "apt-get - source gnumach && (cd gnumach-* && dpkg-buildpackage)" - <mcsim> I mean kernel with your last commit - <mcsim> braunr: I'll make patch how I've done it in a few minutes, ok? It - will be more specific. - <braunr> mcsim: did you just remove the #if NCPUS > 1 directives ? - <mcsim> no. I replaced macro NCPUS > 1 with SLAB_LAYER, which equals NCPUS - > 1, than I redefined macro SLAB_LAYER - <braunr> ah, you want to make the layer optional, even on UP machines - <braunr> mcsim: can you give me the commands you used to trigger the - problem ? - <mcsim> apt-get source gnumach && (cd gnumach-* && dpkg-buildpackage) - <braunr> mcsim: how much ram & swap ? - <braunr> let's see if it can handle a quite large aptitude upgrade - <mcsim> how can I check swap size? - <braunr> free - <braunr> cat /proc/meminfo - <braunr> top - <braunr> whatever - <mcsim> total used free shared buffers - cached - <mcsim> Mem: 786368 332296 454072 0 0 - 0 - <mcsim> -/+ buffers/cache: 332296 454072 - <mcsim> Swap: 1533948 0 1533948 - <braunr> ok, i got the problem too - <mcsim> braunr: do you run hurd in qemu? - <braunr> yes - <braunr> i guess the cpu layer increases fragmentation a bit - <braunr> which means more map entries are needed - <braunr> hm, something's not right - <braunr> there are only 26 kernel map entries when i get the panic - <braunr> i wonder why the cache gets that stressed - <braunr> hm, reproducing the kentry exhaustion problem takes quite some - time - <mcsim> braunr: what do you mean? - <braunr> sometimes, dpkg-buildpackage finishes without triggering the - problem - <mcsim> the problem is in apt-get source gnumach - <braunr> i guess the problem happens because of drains/fills, which - allocate/free much more object than actually preallocated at boot time - <braunr> ah ? - <braunr> ok - <braunr> i've never had it at that point, only later - <braunr> i'm unable to trigger it currently, eh - <mcsim> do you use *-dbg kernel? - <braunr> yes - <braunr> well, i use the compiled kernel, with the slab allocator, built - with the in kernel debugger - <mcsim> when you run apt-get source gnumach, you run it in clean directory? - Or there are already present downloaded archives? 
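
The `SLAB_LAYER` switch mcsim describes above, like the per-cache `KMEM_CF_NO_CPU_POOL` flag braunr mentions earlier, is about keeping the CPU pool code in the tree for SMP readiness while disabling it on uniprocessor builds. A rough, purely illustrative sketch of the compile-time variant (`SLAB_USE_CPU_POOLS` is a made-up name):

    #if NCPUS > 1
    #define SLAB_USE_CPU_POOLS 1
    #else
    #define SLAB_USE_CPU_POOLS 0
    #endif

    #if SLAB_USE_CPU_POOLS
    /* per-CPU pool declarations, fill/drain and fast paths */
    #endif /* SLAB_USE_CPU_POOLS */
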
- <braunr> completely empty - <braunr> ah just got it - <braunr> ok the limit is reached, as expected - <braunr> i'll just bump it - <braunr> the cpu layer drains/fills allocate several objects at once (64 if - the size is small enough) - <braunr> the limit of 256 (actually 252 since the slab descriptor is - embedded in its slab) is then easily reached - <antrik> mcsim: most direct way to check swap usage is vmstat - <braunr> damn, i can't live without slabtop and the amount of - active/inactive cache memory any more - <braunr> hm, weird, we have active/inactive memory in procfs, but not - buffers/cached memory - <braunr> we could set buffers to 0 and everything as cached memory, since - we're currently unable to communicate the purpose of cached memory - (whether it's used by disk servers or file system servers) - <braunr> mcsim: looks like there are about 240 kernel map entries (i forgot - about the ones used in kernel submaps) - <braunr> so yes, addin the cpu layer is what makes the kernel reach the - limit more easily - <mcsim> braunr: so just increasing limit will solve the problem? - <braunr> mcsim: yes - <braunr> slab reclaiming looks very stable - <braunr> and unfrequent - <braunr> (which is surprising) - <pinotree> braunr: "unfrequent"? - <braunr> pinotree: there isn't much memory pressure - <braunr> slab_collect() gets called once a minute on my hurd - <braunr> or is it infrequent ? - <braunr> :) - <pinotree> i have no idea :) - <braunr> infrequent, yes - - -# IRC, freenode, #hurd, 2011-11-16 - - <braunr> for those who want to play with the slab branch of gnumach, the - slabinfo tool is available at http://git.sceen.net/rbraun/slabinfo.git/ - <braunr> for those merely interested in numbers, here is the output of - slabinfo, for a hurd running in kvm with 512 MiB of RAM, an unused swap, - and a short usage history (gnumach debian packages built, aptitude - upgrade for a dozen of packages, a few git commands) - <braunr> http://www.sceen.net/~rbraun/slabinfo.out - <antrik> braunr: numbers for a long usage history would be much more - interesting :-) - - -## IRC, freenode, #hurd, 2011-11-17 - - <braunr> antrik: they'll come :) - <etenil> is something going on on darnassus? it's mighty slow - <braunr> yes - <braunr> i've rebooted it to run a modified kernel (with the slab - allocator) and i'm building stuff on it to stress it - <braunr> (i don't have any other available machine with that amount of - available physical memory) - <etenil> ok - <antrik> braunr: probably would be actually more interesting to test under - memory pressure... - <antrik> guess that doesn't make much of a difference for the kernel object - allocator though - <braunr> antrik: if ram is larger, there can be more objects stored in - kernel space, then, by building something large such as eglibc, memory - pressure is created, causing caches to be reaped - <braunr> our page cache is useless because of vm_object_cached_max - <braunr> it's a stupid arbitrary limit masking the inability of the vm to - handle pressure correctly - <braunr> if removing it, the kernel freezes soon after ram is filled - <braunr> antrik: it may help trigger the "double swap" issue you mentioned - <antrik> what may help trigger it? - <braunr> not checking this limit - <antrik> hm... 
indeed I wonder whether the freezes I see might have the - same cause - - -## IRC, freenode, #hurd, 2011-11-19 - - <braunr> http://www.sceen.net/~rbraun/slabinfo.out <= state of the slab - allocator after building the debian libc packages and removing all files - once done - <braunr> it's mostly the same as on any other machine, because of the - various arbitrary limits in mach (most importantly, the max number of - objects in the page cache) - <braunr> fragmentation is still quite low - <antrik> braunr: actually fragmentation seems to be lower than on the other - run... - <braunr> antrik: what makes you think that ? - <antrik> the numbers of currently unused objects seem to be in a similar - range IIRC, but more of them are reclaimable I think - <antrik> maybe I'm misremembering the other numbers - <braunr> there had been more reclaims on the other run - - -# IRC, freenode, #hurd, 2011-11-25 - - <braunr> mcsim: i've just updated the slab branch, please review my last - commit when you have time - <mcsim> braunr: Do you mean compilation/tests? - <braunr> no, just a quick glance at the code, see if it matches what you - intended with your original patch - <mcsim> braunr: everything is ok - <braunr> good - <braunr> i think the branch is ready for integration - - -# IRC, freenode, #hurd, 2011-12-17 - - <braunr> in the slab branch, there now is no use for the defines in - kern/mach_param.h - <braunr> should the file be removed or left empty as a placeholder for - future arbitrary limits ? - <braunr> (i'd tend ro remove it as a way of indicating we don't want - arbitrary limits but there may be a good reason to keep it around .. :)) - <youpi> I'd just drop it - <braunr> ok - <braunr> hmm maybe we do want to keep that one : - <braunr> #define IMAR_MAX (1 << 10) /* Max number of - msg-accepted reqs */ - <antrik> whatever that is... - <braunr> it gets returned in ipc_marequest_info - <braunr> but the mach_debug interface has never been used on the hurd - <braunr> there now is a master-slab branch in the gnumach repo, feel free - to test it - - -# IRC, freenode, #hurd, 2011-12-22 - - <youpi> braunr: does the new gnumach allocator has profiling features? - <youpi> e.g. to easily know where memory leaks reside - <braunr> youpi: you mean tracking call traces to allocated blocks ? - <youpi> not necessarily traces - <youpi> but at least means to know what kind of objects is filling memory - <braunr> it's very close to the zone allocator - <braunr> but instead of zones, there are caches - <braunr> each named after the type they store - <braunr> see http://www.sceen.net/~rbraun/slabinfo.out - <youpi> ok, so we can know, per-type, how much memory is used - <braunr> yes - <youpi> good - <braunr> if backtraces can easily be forged, it wouldn't be hard to add - that feature too - <youpi> does it dump such info when memory goes short? - <braunr> no but it can - <braunr> i've done this during tests - <youpi> it'd be good - <youpi> because I don't know in advance when a buildd will crash due to - that :) - <braunr> each time slab_collect() is called for example - <youpi> I mean not on collect, but when it's too late - <youpi> and thus always enabled - <braunr> ok - <youpi> (because there's nothing better to do than at least give infos) - <braunr> you just have to define "when it's too late", and i can add that - <youpi> when there is no memory left - <braunr> you mean when the number of free pages strictly reaches 0 ? - <youpi> yes - <braunr> ok - <youpi> i.e. 
just before crashing the kernel
- <braunr> i see
-
-
-# IRC, freenode, #hurdfr, 2012-01-02
-
- <youpi> braunr: the slab allocator code, was it written from scratch?
- <youpi> there is still some Carnegie Mellon copyright in it
- <youpi> (in slab_info.h at least)
- <youpi> ipc_hash_global_size = 256;
- <youpi> that 256 should be made a constant in a header
- <youpi> otherwise it's yet another arbitrary value hidden in the code
- <youpi> same for ipc_marequest_size etc.
- <braunr> youpi: yes, from scratch
- <braunr> slab_info.h was originally zone_info.h
- <braunr> as for the fixed values, they were already present in that form; i
-   thought it was better to leave them as is to make the diffs easier to read
- <braunr> i'll turn them into macros instead
- <braunr> which means mach_param.h may have to be brought back
- <braunr> or else they could go into the ipc headers
-
-
-# IRC, freenode, #hurd, 2012-01-18
-
- <braunr> does the slab branch need other reviews/reports before being
-   integrated ?
-
-
-# IRC, freenode, #hurd, 2012-01-30
-
- <braunr> youpi: do you have some idea about when you want to get the slab
-   branch in master ?
- <youpi> I was considering as soon as mcsim gets his paper
- <braunr> right
-
-
-# IRC, freenode, #hurd, 2012-02-22
-
- <mcsim> Do I understand correctly that a real memory page should
-   necessarily be in one of the following lists: vm_page_queue_active,
-   vm_page_queue_inactive, vm_page_queue_free?
- <braunr> cached pages are
- <braunr> some special pages used only by the kernel aren't
- <braunr> pages can be both wired and cached (i.e. managed by the page
-   cache), so that they can be passed to external applications and then
-   unwired (as is the case with your host_slab_info() function if you
-   remember)
- <braunr> use "physical" instead of "real memory"
- <mcsim> braunr: thank you.
-
-
-# IRC, freenode, #hurd, 2012-04-22
-
- <braunr> youpi: tschwinge: when the slab code was added, a few new files
-   made it into gnumach that come from my git repo and are used in other
-   projects as well
- <braunr> they're licensed under BSD upstream and GPL in gnumach, and though
-   it initially didn't disturb me, now it does
- <braunr> i think i should fix this by leaving the original copyright and
-   adding the GPL on top
- <youpi> sure, submit a patch
- <braunr> hm i have direct commit access if i'm right
- <youpi> then fix it :)
- <braunr> do you want to review ?
- <youpi> I don't think there is any need to
- <braunr> ok
-
-
-# IRC, freenode, #hurd, 2012-12-08
-
- <mcsim> braunr: hi. Do I understand correctly that much the same technique
-   is used in linux to determine the slab where the object to be freed
-   resides?
- <braunr> yes but it's faster on linux since it uses a direct mapping of
-   physical memory
- <braunr> it just has to shift the virtual address to obtain the physical
-   one, whereas x15 has to walk the page tables
- <braunr> of course it only works for kmalloc, vmalloc is entirely different
- <mcsim> btw, does it make sense to use some kind of B-tree instead of AVL to
-   decrease the number of cache misses? AFAIK, in modern processors the L1
-   cache line is at least 64 bytes, so in one node we can put at least 4
-   leaves (key + pointer to data), making the search faster.
- <braunr> that would be a b-tree - <braunr> and yes, red-black trees were actually developed based on - properties observed on b-trees - <braunr> but increasing the size of the nodes also increases memory - overhead - <braunr> and code complexity - <braunr> that's why i have a radix trees for cases where there are a large - number of entries with keys close to each other :) - <braunr> a radix-tree is basically a b-tree using the bits of the key as - indexes in the various arrays it walks instead of comparing keys to each - other - <braunr> the original avl tree used in my slab allocator was intended to - reduce the average height of the tree (avl is better for that) - <braunr> avl trees are more suited for cases where there are more lookups - than inserts/deletions - <braunr> they make the tree "flatter" but the maximum complexity of - operations that change the tree is 2log2(n), since rebalancing the tree - can make the algorithm reach back to the tree root - <braunr> red-black trees have slightly bigger heights but insertions are - limited to 2 rotations and deletions to 3 - <mcsim> there should be not much lookups in slab allocators - <braunr> which explains why they're more generally found in generic - containers - <mcsim> or do I misunderstand something? - <braunr> well, there is a lookup for each free() - <braunr> whereas there are insertions/deletions when a slab becomes - non-empty/empty - <mcsim> I see - <braunr> so it was very efficient for caches of small objects, where slabs - have many of them - <braunr> also, i wrote the implementation in userspace, without - functionality pmap provides (although i could have emulated it - afterwards) - - -# IRC, freenode, #hurd, 2013-01-06 - - <youpi> braunr: panic: vm_map: kentry memory exhausted - <braunr> youpi: ouch - <youpi> that's what I usually get - <braunr> ok - <braunr> the kentry area is a preallocated memory area that is used to back - the vm_map_kentry cache - <braunr> objects from this cache are used to describe kernel virtual memory - <braunr> so in this case, i simply assume the kentry area must be enlarged - <braunr> (currently, both virtual and physical memory is preallocated, an - improvement could be what is now done in x15, to preallocate virtual - memory only - <braunr> ) - <youpi> Mmm, why do we actually have this limit? 
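
The per-free() lookup braunr refers to above (finding the slab that owns a buffer by searching a tree keyed on slab addresses, since gnumach cannot simply shift a direct-mapped address the way Linux's kmalloc does) can be pictured with a plain binary search over address ranges. This self-contained illustration merely stands in for the red-black tree actually used by the allocator:

    #include <stddef.h>

    /* Slabs kept in a tree ordered by address; free() finds the slab
     * whose [addr, addr + size) range contains the buffer. */
    struct slab_node {
        struct slab_node *left, *right;
        char *addr;     /* start of the slab */
        size_t size;    /* slab size, e.g. 32 KiB */
    };

    static struct slab_node *
    slab_lookup(struct slab_node *root, const char *buf)
    {
        while (root != NULL) {
            if (buf < root->addr)
                root = root->left;
            else if (buf >= root->addr + root->size)
                root = root->right;
            else
                return root;    /* buf falls inside this slab */
        }

        return NULL;
    }
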
- <braunr> the kentry area must be described by one entry - <youpi> ah, sorry, vm/vm_resident.c: kentry_data = - pmap_steal_memory(kentry_data_size); - <braunr> a statically allocated one - <youpi> I had missed that one - <braunr> previously, the zone allocator would do that - <braunr> the kentry area is required to avoid recursion when allocating - memory - <braunr> another solution would be a custom allocator in vm_map, but i - wanted to use a common cache for those objects too - <braunr> youpi: you could simply try doubling KENTRY_DATA_SIZE - <youpi> already doing that - <braunr> we might even consider a much larger size until it's reworked - <youpi> well, it's rare enough on buildds already - <youpi> doubling should be enough - <youpi> or else we have leaks - <braunr> right - <braunr> it may not be leaks though - <braunr> it may be poor map entry merging - <braunr> i'd expected the kernel map entries to be easier to merge, but it - may simply not be the case - <braunr> (i mean, when i made my tests, it looked like there were few - kernel map entries, but i may have missed corner cases that could cause - more of them to be needed) - - -## IRC, freenode, #hurd, 2014-02-11 - - <braunr> youpi: what's the issue with kentry_data_size ? - <youpi> I don't know - <braunr> so back to 64pages from 256 ? - <youpi> in debian for now yes - <braunr> :/ - <braunr> from what i recall with x15, grub is indeed allowed to put modules - and command lines around as it likes - <braunr> restricted to 4G - <braunr> iirc, command lines were in the first 1M while modules could be - loaded right after the kernel or at the end of memory, depending on the - versions - <youpi> braunr: possibly VM_KERNEL_MAP_SIZE is then not big enough - <braunr> youpi: what's the size of the ramdisk ? - <braunr> youpi: or kmem_map too big - <braunr> we discussed this earlier with teythoon - -[[user-space_device_drivers]], *Open Issues*, *System Boot*, *IRC, freenode, -\#hurd, 2011-07-27*, *IRC, freenode, #hurd, 2014-02-10* - - <braunr> or maybe we want to remove kmem_map altogether and directly use - kernel_map - <youpi> it's 6.2MiB big - <braunr> hm - <youpi> err no - <braunr> looks small - <youpi> 70MiB - <braunr> ok yes - <youpi> (uncompressed) - <braunr> well - <braunr> kernel_map is supposed to have 64M on i386 ... - <braunr> it's 192M large, with kmem_map taking 128M - <braunr> so at most 64M, with possible fragmentation - <teythoon> i believe the compressed initrd is stored in the ramdisk - <youpi> ah, right it's ext2fs which uncompresses it - <braunr> uncompresses it where - <braunr> ? - <teythoon> libstore does that - <youpi> module --nounzip /boot/${gtk}initrd.gz - <youpi> braunr: in userland memory - <youpi> it's not grub which uncompresses it for sure - <teythoon> braunr: so my ramdisk isn't 64 megs either - <braunr> which explains why it sometimes works - <teythoon> yes - <teythoon> mine is like 15 megs - <braunr> kentry_data_size calls pmap_steal_memory, an early allocation - function which changes virtual_space_start, which is later used to create - the first kernel map entry - <braunr> err, pmap_steal_memory is called with kentry_data_size as its - argument - <braunr> this first kernel map entry is installed inside kernel_map and - reduces the amount of available virtual memory there - <braunr> so yes, it all points to a layout problem - <braunr> i suggest reducing kmem_map down to 64M - <youpi> that's enough to get d-i back to boot - <youpi> what would be the downside? 
- <youpi> (why did you raise it to 128 actually? :) ) - <braunr> i merged the map used by generic kalloc allocations into kmem_map - <braunr> both were 64M - <braunr> i don't see any downside for the moment - <braunr> i rarely see more than 50M used by the slab allocator - <braunr> and with the recent code i added to collect reclaimable memory on - kernel allocation failures, it's unlikely the slab allocator will be - starved - <youpi> but then we need that patch too - <braunr> no - <braunr> it would be needed if kmem_map gets filled - <braunr> this very rarely happens - <youpi> is "very rarely" enough ? :) - <braunr> actualy i've never seen it happen - <braunr> i added it because i had port leaks with fakeroot - <braunr> port rights are a bit special because they're stored in a table in - kernel space - <braunr> this table is enlarged with kmem_realloc - <braunr> when an ipc space gets very large, fragmentation makes it very - difficult to successfully resize it - <braunr> that should be the only possible issue - <braunr> actually, there is another submap that steals memory from - kernel_map: device_io_map is 16M large - <braunr> so kernel_map gets down to 48M - <braunr> if the initial entry (that is, kentry_data_size + the physical - page table size) gets a bit large, kernel_map may have very little - available room - <braunr> the physical page table size obviously varies depending on the - amount of physical memory loaded, which may explain why the installer - worked on some machines - <youpi> well, it works up to 1855M - <youpi> at 1856 it doesn't work any more :) - <braunr> heh :) - <youpi> and that's about the max gnumach can handle anyway - <braunr> then reducing kmem_map down to 96M should be enough - <youpi> it works indeed - <braunr> could you check the amount of available space in kernel_map ? - <braunr> the value of kernel_map->size should do - <youpi> printing it "multiboot modules" print should be fine I guess? - - -### IRC, freenode, #hurd, 2014-02-12 - - <braunr> probably - <teythoon> ? - <braunr> i expect a bit more than 160M - <braunr> (for the value of kernel_map->size) - <braunr> teythoon: ? - <youpi> well, it's 2110210048 - <teythoon> what is multiboot modules printing ? - <youpi> almost last in gnumach bootup - <braunr> humm - <braunr> it must account directly mapped physical pages - <braunr> considering the kernel has exactly 2G, this means there is 36M - available in kernel_map - <braunr> youpi: is the ramdisk loaded at that moment ? - <youpi> what do you mean by "loaded" ? :) - <braunr> created - <youpi> where? - <braunr> allocated in kernel memory - <youpi> the script hasn't started yet - <braunr> ok - <braunr> its size was 6M+ right ? - <braunr> so it leaves around 30M - <youpi> something like this yes - <braunr> and changing kmem_map from 128M to 96M gave us 32M - <braunr> so that's it - - -# IRC, freenode, #hurd, 2013-04-18 - - <braunr> oh nice, i've found a big scalability issue with my slab allocator - <braunr> it shouldn't affect gnumach much though - - -## IRC, freenode, #hurd, 2013-04-19 - - <ArneBab> braunr: is it fixable? 
<braunr> yes
- <braunr> well, i'll do it in x15 for a start
- <braunr> again, i don't think gnumach is much affected
- <braunr> it's a scalability issue
- <braunr> when millions of objects are in use
- <braunr> gnumach rarely has more than a few hundred thousands
- <braunr> it's also related to heavy multithreading/smp
- <braunr> and by multithreading, i also mean preemption
- <braunr> gnumach isn't preemptible and uniprocessor
- <braunr> if the resulting diff is clean enough, i'll push it to gnumach
-   though :)
-
-
-### IRC, freenode, #hurd, 2013-04-21
-
- <braunr> ArneBab_: i fixed the scalability problems btw
-
-
-## IRC, freenode, #hurd, 2013-04-20
-
- <braunr> well, there is also a locking error in the slab allocator,
-   although not a problem for a non preemptible kernel like gnumach
- <braunr> non preemptible / uniprocessor
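
To put numbers on the virtual memory layout discussed in the 2014-02-11/12 conversations above: the macro names below are made up for the example (the log itself only mentions `VM_KERNEL_MAP_SIZE` and the map names), but the arithmetic is the one quoted there.

    /* Rough i386 figures from the discussion above, illustrative names. */
    #define KERNEL_MAP_SIZE     (192 << 20)  /* kernel virtual space        */
    #define KMEM_MAP_SIZE       ( 96 << 20)  /* was 128 MiB before the fix  */
    #define DEVICE_IO_MAP_SIZE  ( 16 << 20)

    /* Room left in kernel_map for everything else (kentry area, physical
     * page table, ramdisk, ...):
     *   before: 192 - 128 - 16 = 48 MiB
     *   after:  192 -  96 - 16 = 80 MiB
     * so shrinking kmem_map recovered about 32 MiB of kernel virtual
     * space, which is what let the installer's ramdisk fit again. */
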