[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] [[!tag open_issue_gnumach]] # IRC, freenode, #hurd, 2012-04-23 <braunr> btw, i'm running a gnumach version using red-black trees for vm map entries <antrik> braunr: sounds fashionable ;-) <youpi> braunr: with some perf improvement? <braunr> looks promising for our ext2fs instances showing several thousands of map entries <braunr> youpi: i'm not using it for lookups yet <braunr> but the tree is there, maintained, used for both regular and copy maps, and it doesn't crash <youpi> good :) <braunr> antrik: isn't it ? :) <braunr> youpi: and the diff stat is like 50/15 <antrik> braunr: what's the goal of using the fashionable trees? <braunr> antrik: speeding up lookups in address spaces <antrik> braunr: so the idea is that if we have a heavily fragmented address space, the performance penalty is smaller? <braunr> yes <antrik> OK <antrik> I take it you gave up on attempts to actually decrease fragmentation?... <braunr> it's not as good as reducing fragmentation, which requires implementing a powerful merge, but it's still better <braunr> yes <braunr> it's too messy for my brain :/ <antrik> I see <antrik> oh <braunr> it will add some overhead though <youpi> I guess log(n) ? <braunr> but if there is a significant performance gain, it'll be worth it <braunr> yes <braunr> i was more thinking about the memory overhead <antrik> right now it's a linear list? <youpi> I don't think we care nowadays :) <braunr> antrik: yes <antrik> ouch <braunr> antrik: yes ... :> <braunr> the original authors expected vm maps to have like 30 entries <braunr> so they used a list for the maps, and a hash table for the object/offset to physical page lookups <braunr> there is a small lookup cache though, which is a nice optimization <braunr> my code now uses it first, and falls back to the RB tree if the hint didn't help <antrik> braunr: well, don't forget to check whether it actually *is* still an optimisation, when using fashionable trees ;-) <braunr> antrik: i checked that already :) <braunr> i did the same in x15 <antrik> I see <braunr> both bsd and linux uses a similar technique <braunr> use* <braunr> (well, bsd actually use what is done in mach :) <antrik> (or perhaps the other way around... ;-) ) <braunr> i don't think so, as the bsd vm is really the mach vm <braunr> but we don't care much <antrik> oh, right... that part actually went full circle <braunr> youpi: i have a patch ready for test on machines with significant amounts of map entries (e.g. the buildds ..) <braunr> youpi: i won't have time for tests tonight, are you interested ? <braunr> (i've been running it for 15 minutes without any issue for now) <youpi> I'd say post to the list <braunr> ok <youpi> braunr: your patch uses the rb tree for lookups, right? <youpi> braunr: the buildd using rbtree seems swift <youpi> but maybe it's just a psychologic effect :) <youpi> the chroot ext2fs already has 1392 lines in vminfo <youpi> an rbtree can't hurt there :) <youpi> braunr: it really seems faster <youpi> the reboot might have helped too <youpi> benchmarks shall say <youpi> for now, I'll just let ironforge use it <antrik> youpi: it's always fast after a reboot ;-) <youpi> sure <youpi> but still <youpi> I mean <youpi> *obviously* the reboot has helped <youpi> but it might not be all <youpi> at least it feels so <youpi> and obviously only benchmarks can say <antrik> the major benefit AIUI is rather that the slowdown happening over time will be less noticable [[performance/degradation]]. <youpi> "over time" is actually quite fast <youpi> ext2 fills up very quickly when you build a package <youpi> it went up to 1700 lines very quickly <youpi> and stabilized around there <antrik> yeah, it can be very fast under heavy load <youpi> that's why I say reboot seems not all <youpi> it's already not so fresh <youpi> with 1700 lines in vminfo <antrik> well, I don't know how much of the slowdown I'm experiencing (after doing stuff under memory pressure) is actually related to VM map fragmentation... <antrik> might be all, might be none, might be something in between <youpi> then try his patch <antrik> guess I should play a bit with vminfo... <antrik> well, currently I actually experience pretty little slowdown, as for certain reasons (only indirectly related to the Hurd) I'm not running mutt on this machine, so I don't really have memory pressure... ## IRC, freenode, #hurd, 2012-04-24 <braunr> youpi: yes, it uses bst lookups <braunr> youpi: FYI, the last time i checked, one ext2fs instance had 4k+ map entries, and another around 7.5k <braunr> (on ironforge) ## IRC, freenode, #hurd, 2012-04-24 <youpi> braunr: $ sudo vminfo 624 | wc -l <youpi> 22957 <youpi> there's no way it can not help :) <braunr> youpi: 23k, that's really huge ## IRC, freenode, #hurd, 2012-04-26 <braunr> youpi: any new numbers wrt the rbtree patch ? <youpi> well, buildd times are not really accurate :) <youpi> but what I can tell is that it managed to build qtwebkit fine <braunr> ok <youpi> so the patch is probably safe :) <braunr> i'll commit it soon then <youpi> I'd say feel free to, yes <braunr> thanks ## IRC, freenode, #hurd, 2012-04-27 <braunr> elmig: don't expect anything grand though, this patch is mostly useful when address spaces get really fragmented, which mainly happens on buildds <braunr> (and it only speeds lookups, which isn't as good as reducing fragmentation; things like fork still have to copy thousands of map entries) [[glibc/fork]]. ## IRC, freenode, #hurdfr, 2012-06-02 <youpi> braunr: oh, un bug de rbtree <youpi> Assertion `diff != 0' failed in file "vm/vm_map.c", line 1002 <youpi> c'est dans rbtree_insert() <youpi> vm_map_enter (vm/vm_map.c:1002). <youpi> vm_map (vm/vm_user.c:373). <youpi> syscall_vm_map (kern/ipc_mig.c:657). <youpi> erf j'ai tué mon débuggueur, je ne peux pas inspecter plus <youpi> le peu qui me reste c'est qu'apparemment target_map == 1, size == 0, mask == 0 <youpi> anywhere = 1 <braunr> youpi: ça signifie sûrement que des adresses overlappent <braunr> je rejetterai un coup d'oeil sur le code demain <braunr> (si ça se trouve c'est un bug rare de la vm, le genre qui fait crasher le noyau) <braunr> (enfin jveux dire, qui faisait crasher le noyau de façon très obscure avant le patch rbtree) ### IRC, freenode, #hurd, 2012-07-15 <bddebian> I get errors in vm_map.c whenever I try to "mount" a CD <bddebian> Hmm, this time it rebooted the machine <bddebian> braunr: The translator set this time and the machine reboots before I can get the full message about vm_map, but here is some of the crap I get: http://paste.debian.net/179191/ <braunr> oh <braunr> nice <braunr> that may be the bug youpi saw with my redblack tree patch <braunr> bddebian: assert(diff != 0); ? <bddebian> Aye <braunr> good <braunr> it means we're trying to insert a vm_map_entry at a region in a map which is already occupied <bddebian> Oh <braunr> and unlike the previous code, the tree actually checks that <braunr> it has to <braunr> so you just simply use the iso9660fs translator and it crashes ? <bddebian> Well it used to on just trying to set the translator. This time I was able to set the translator but as soon as I cd to the mount point I get all that crap <braunr> that's very good <braunr> more test cases to fix the vm ### IRC, freenode, #hurd, 2012-11-01 <youpi> braunr: Assertion `diff != 0' failed in file "vm/vm_map.c", line 1002 <youpi> that's in rbtree_insert <braunr> youpi: the problem isn't the tree, it's the map entries <braunr> some must overlap <braunr> if you can inspect that, it would be helpful <youpi> I have a kdb there <youpi> it's within a port_name_to_task system call <braunr> this assertion basically means there already is an item in the tree where the new item is supposed to be inserted <youpi> this port_name_to_task presence in the stack is odd <braunr> it's in vm_map_enter <youpi> there's a vm_map just after that (and the assembly trap code before) <youpi> I know <youpi> I'm wondering about the caller <braunr> do you have a way to inspect the inserted map entry ? <youpi> I'm actually wondering whether I have the right kernel in gdb <braunr> oh <youpi> better <youpi> with the right kernel :) <youpi> 0x80039acf (syscall_vm_map) (target_map=d48b6640,address=d3b63f90,size=0,mask=0,anywhere=1) <youpi> size == 0 seems odd to me <youpi> (same parameters for vm_map) <braunr> right <braunr> my code does assume an entry has a non null size <braunr> (in the entry comparison function) <braunr> EINVAL (since Linux 2.6.12) length was 0. <braunr> that's a quick glance at mmap(2) <braunr> might help track bugs from userspace (e.g. in exec .. :)) <braunr> posix says the saem <braunr> same* <braunr> the gnumach manual isn't that precise <youpi> I don't seem to manage to read the entry <youpi> but I guess size==0 is the problem anyway <mcsim> youpi, braunr: Is there another kernel fault? Was that in my kernel? <braunr> no that's another problem <braunr> which became apparent following the addition of red black trees in the vm_map code <braunr> (but which was probably present long before) <mcsim> braunr: BTW, do you know if there where some specific circumstances that led to memory exhaustion in my code? Or it just aggregated over time? <braunr> mcsim: i don't know <mcsim> s/where/were <mcsim> braunr: ok ### IRC, freenode, #hurd, 2012-11-05 <tschwinge> braunr: I have now also hit the diff != 0 assertion error; sitting in KDB, waiting for your commands. <braunr> tschwinge: can you check the backtrace, have a look at the system call and its parameters like youpi did ? <tschwinge> If I manage to figure out how to do that... :-) * tschwinge goes read scrollback. <braunr> "trace" i suppose <braunr> if running inside qemu, you can use the integrated gdb server <tschwinge> braunr: No, hardware. And work intervened. And mobile phone <-> laptop via bluetooth didn't work. But now: <tschwinge> Pretty similar to Samuel's: <tschwinge> Assert([...]) <tschwinge> vm_map_enter(0xc11de6c8, 0xc1785f94, 0, 0, 1) <tschwinge> vm_map(0xc11de6c8, 0xc1785f94, 0, 0, 1) <tschwinge> syscall_vm_map(1, 0x1024a88, 0, 0, 1) <tschwinge> mach_call_call(1, 0x1024a88, 0, 0, 1) <braunr> thanks <braunr> same as youpi observed, the requested size for the mapping is 0 <braunr> tschwinge: thanks <tschwinge> braunr: Anything else you'd like to see before I reboot? <braunr> tschwinge: no, that's enough for now, and the other kind of info i'd like are much more difficult to obtain <braunr> if we still have the problem once a small patch to prevent null size is applied, then it'll be worth looking more into it <pinotree> isn't it possible to find out who called with that size? <braunr> not easy, no <braunr> it's also likely that the call that fails isn't the first one <pinotree> ah sure <pinotree> braunr: making mmap reject 0 size length could help? posix says such size should be rejected straight away <braunr> 17:09 < braunr> if we still have the problem once a small patch to prevent null size is applied, then it'll be worth looking more into it <braunr> that's the idea <braunr> making faulty processes choke on it should work fine :) <pinotree> «If len is zero, mmap() shall fail and no mapping shall be established.» <pinotree> braunr: should i cook up such patch for mmap? <braunr> no, the change must be applied in gnumach <pinotree> sure, but that could simply such condition in mmap (ie avoiding to call io_map on a file) <braunr> such calls are erroneous and rare, i don't see the need <pinotree> ok <braunr> i bet it comes from the exec server anyway :p <tschwinge> braunr: Is the mmap with size 0 already a reproducible testcase you can use for the diff != 0 assertion? <tschwinge> Otherwise I'd have a reproducer now. <braunr> tschwinge: i'm not sure but probably yes <tschwinge> braunr: Otherwise, take GDB sources, then: gcc -fsplit-stack gdb/testsuite/gdb.base/morestack.c && ./a.out <tschwinge> I have not looked what exactly this does; I think -fsplit-stack is not really implemented for us (needs something in libgcc we might not have), is on my GCC TODO list already. <braunr> tschwinge: interesting too :) ### IRC, freenode, #hurd, 2012-11-19 <tschwinge> braunr: Hmm, I have now hit the diff != 0 GNU Mach assertion failure during some GCC invocation (GCC testsuite) that does not relate to -fsplit-stack (as the others before always have). <tschwinge> Reproduced: /media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/xgcc -B/media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/ /home/thomas/tmp/gcc/hurd/master/gcc/testsuite/gcc.dg/torture/pr42878-1.c -fno-diagnostics-show-caret -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fcompare-debug -S -o pr42878-1.s <tschwinge> Will check whether it's the same backtrace in GNU Mach. <tschwinge> Yes, same. <braunr> tschwinge: as youpi seems quite busy these days, i'll cook a patch and commit it directly <tschwinge> braunr: Thanks! I have, by the way, confirmed that the following is enough to trigger the issue: vm_map(mach_task_self(), 0, 0, 0, 1, 0, 0, 0, 0, 0, 0); <tschwinge> ... and before the allocator patch, GNU Mach did accept that and return 0 -- though I did not check what effect it actually has. (And I don't think it has any useful one.) I'm also reading that as of lately (Linux 2.6.12), mmap (length = 0) is to return EINVAL, which I think is the foremost user of vm_map. <pinotree> tschwinge: posix too says to return EINVAL for length = 0 <braunr> yes, we checked that earlier with youpi [[!message-id "87sj8522zx.fsf@kepler.schwinge.homeip.net"]]. <braunr> tschwinge: well, actually your patch is what i had in mind (although i'd like one in vm_map_enter to catch wrong kernel requests too) <braunr> tschwinge: i'll work on it tonight, and do some testing to make sure we don't regress critical stuff (exec is another major direct user of vm_map iirc) <tschwinge> braunr: Oh, OK. :-)