From 2603401fa1f899a8ff60ec6a134d5bd511073a9d Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Tue, 7 Aug 2012 23:25:26 +0200 Subject: IRC. --- open_issues/alarm_setitimer.mdwn | 8 + .../automatic_backtraces_when_assertions_hit.mdwn | 60 +- open_issues/bpf.mdwn | 8 + open_issues/dde.mdwn | 28 +- open_issues/ext2fs_deadlock.mdwn | 5 +- ...t2fs_libports_reference_counting_assertion.mdwn | 93 ++ open_issues/glibc/t/tls-threadvar.mdwn | 29 + open_issues/gnat.mdwn | 51 +- open_issues/gnumach_page_cache_policy.mdwn | 146 +++ open_issues/gnumach_vm_map_red-black_trees.mdwn | 26 + open_issues/libmachuser_libhurduser_rpc_stubs.mdwn | 11 +- open_issues/libpager_deadlock.mdwn | 165 +++ open_issues/libpthread.mdwn | 524 +++++++++ open_issues/libpthread_CLOCK_MONOTONIC.mdwn | 27 + open_issues/mission_statement.mdwn | 41 +- open_issues/multithreading.mdwn | 85 ++ open_issues/packaging_libpthread.mdwn | 50 + open_issues/pci_arbiter.mdwn | 256 +++++ open_issues/performance.mdwn | 29 + open_issues/performance/io_system/read-ahead.mdwn | 280 +++++ open_issues/pfinet_vs_system_time_changes.mdwn | 31 +- open_issues/select.mdwn | 1180 ++++++++++++++++++++ open_issues/strict_aliasing.mdwn | 10 + open_issues/synchronous_ipc.mdwn | 64 ++ open_issues/usleep.mdwn | 25 + open_issues/virtualbox.mdwn | 44 +- open_issues/wait_errors.mdwn | 25 + 27 files changed, 3272 insertions(+), 29 deletions(-) create mode 100644 open_issues/ext2fs_libports_reference_counting_assertion.mdwn create mode 100644 open_issues/libpager_deadlock.mdwn create mode 100644 open_issues/pci_arbiter.mdwn create mode 100644 open_issues/synchronous_ipc.mdwn create mode 100644 open_issues/usleep.mdwn create mode 100644 open_issues/wait_errors.mdwn (limited to 'open_issues') diff --git a/open_issues/alarm_setitimer.mdwn b/open_issues/alarm_setitimer.mdwn index 99b2d7b6..3255683c 100644 --- a/open_issues/alarm_setitimer.mdwn +++ b/open_issues/alarm_setitimer.mdwn @@ -21,3 +21,11 @@ See also the attached file: on other OSes (e.g. Linux) it blocks waiting for a signal, while on GNU/Hurd it gets a new alarm and exits. [[alrm.c]] + + +# IRC, freenode, #hurd, 2012-07-29 + + our setitimer is bugged + it seems doesn't seem to leave a timer disarmed when the interval + is set to 0 + (which means a one shot timer is actually periodic ..) diff --git a/open_issues/automatic_backtraces_when_assertions_hit.mdwn b/open_issues/automatic_backtraces_when_assertions_hit.mdwn index 1cfacaf5..71007f99 100644 --- a/open_issues/automatic_backtraces_when_assertions_hit.mdwn +++ b/open_issues/automatic_backtraces_when_assertions_hit.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -10,9 +10,65 @@ License|/fdl]]."]]"""]] [[!tag open_issue_glibc]] -IRC, unknown channel, unknown date. + +# IRC, unknown channel, unknown date tschwinge: ext2fs.static: thread-cancel.c:55: hurd_thread_cancel: Assertion `! __spin_lock_locked (&ss->critical_section_lock)' failed. it'd be great if we could have backtraces in such case at least just the function names and in this case (static), just addresses would be enough + + +# IRC, freenode, #hurd, 2012-07-19 + +In context of the [[ext2fs_libports_reference_counting_assertion]]. + + pinotree: tschwinge: do you know if our packages are built with + -rdynamic ? 
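+
+(For reference, a minimal sketch of the glibc interface discussed here;
+`execinfo.h` is real glibc API, but whether symbol names come out
+resolved depends on `-rdynamic`, as noted below.)
+
+    #include <execinfo.h>
+    #include <unistd.h>
+
+    /* Dump the calling thread's return addresses to stderr.  With
+       -rdynamic, backtrace_symbols_fd() can resolve function names;
+       without it, only raw addresses are printed.  */
+    static void
+    dump_backtrace (void)
+    {
+      void *frames[32];
+      int depth = backtrace (frames, 32);
+      backtrace_symbols_fd (frames, depth, STDERR_FILENO);
+    }
+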
+ braunr: debian's cflags don't include it, so unless the upstream + build systems do, -rdynamic is not added + i doubt glibc' backtrace() is able to find debugging symbol files + on its own + what do you mean? + the port reference bug youpi noticed is rare + even on linux, a program compiled with normal optimizations (eg + -O2 -g) can give just pointer values in backtrace()'s output + core dumps are unreliable at best + +[[crash_server]]. + + uh, no, backtrace does give names + but not with -fomit-frame-pointer + unless the binary is built with -rdynamic + at least it used to + not really, when being optimized some steps can be optimized + away (eg inlines) + that's ok + anyway, the point is i'd like a way that can give us as much + information as possible when the problem happens + the stack trace being the most useful imo + do you face issues currently with backtrace()? + not tried yet + i guess i could make the application trap in the kernel, and fault + there, so we can attach gdb while still in the pager address space :> + that would imply the need for interactivity when the fault + happens, wouldn't it? + no + it would remain this way until someone comes, hours, days later + pinotree: well ok, it would require interactivity, but not *when* + it happens ;p + pinotree: right, it needs -rdynamic + + +## IRC, freenode, #hurd, 2012-07-21 + + tschwinge: my current "approach" is to introduce an infinite loop + it makes the faulting task mapped in often enough to use gdb + through qemu + ... :) + My understanding is that glibc already does have some mechanism + for that: I have seen it print backtraces whendetecting malloc + inconsistencies (double free and the lite). + yes, i thought it used the backtrace functions internally though + that is, execinfo + but this does require -rdynamic diff --git a/open_issues/bpf.mdwn b/open_issues/bpf.mdwn index e24d761b..02dc7f87 100644 --- a/open_issues/bpf.mdwn +++ b/open_issues/bpf.mdwn @@ -585,3 +585,11 @@ This is a collection of resources concerning *Berkeley Packet Filter*s. in libpcap, and let users of that library benefit from it instead of implementing the low level bpf interface, which nonetheless has some system-specific variants .. + + +## IRC, freenode, #hurd, 2012-08-03 + +In context of the [[select]] issue. + + i understand now why my bpf translator was so buggy + the condition_timedwait i wrote at the time was .. incomplete :) diff --git a/open_issues/dde.mdwn b/open_issues/dde.mdwn index aff988d5..8f00c950 100644 --- a/open_issues/dde.mdwn +++ b/open_issues/dde.mdwn @@ -31,6 +31,18 @@ A similar problem is described in [[community/gsoc/project_ideas/unionfs_boot]], and needs to be implemented. +### IRC, freenode, #hurd, 2012-07-17 + + OK, here is a stupid question I have always had. If you move + PCI and disk drivers in to userspace, how do do initial bootstrap to get + the system booting? + that's hard + basically you make the boot loader load all the components you + need in ram + then you make it give each component something (ports) so they can + communicate + + # Upstream Status @@ -90,6 +102,9 @@ At the microkernel davroom at [[community/meetings/FOSDEM_2012]]: automatically, or you have to settrans yourself to setup a device? there's no autoloader for now we'd need a bus arbitrer that'd do autoprobing + +[[PCI_arbiter]]. 
+ i see (you see i'm not really that low level, so pardon the flood of posssibly-noobish questions ;) ) @@ -200,21 +215,10 @@ At the microkernel davroom at [[community/meetings/FOSDEM_2012]]: right -# IRC, freenode, #hurd, 2012-02-19 - - antrik: we should probably add a gsoc idea on pci bus arbitration - DDE is still experimental for now so it's ok that you have to - configure it by hand, but it should be automatic at some ponit - +# [[PCI_Arbiter]] ## IRC, freenode, #hurd, 2012-02-21 - i'm not familiar with the new gnumach interface for userspace - drivers, but can this pci enumerator be written with it as it is ? - (i'm not asking for a precise answer, just yes - even probably - - or no) - (idk or utsl will do as well) - I'd say yes since all drivers need is interrupts, io ports and iomem the latter was already available through /dev/mem io ports through the i386 rpcs diff --git a/open_issues/ext2fs_deadlock.mdwn b/open_issues/ext2fs_deadlock.mdwn index 369875fe..23f54a4a 100644 --- a/open_issues/ext2fs_deadlock.mdwn +++ b/open_issues/ext2fs_deadlock.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -44,9 +44,8 @@ pull the information out of the process' memory manually (how to do that, anyways?), and also didn't have time to continue with debugging GDB itself, but this sounds like a [[!taglink open_issue_gdb]]...) ---- -IRC, #hurd, 2010-10-27 +# IRC, freenode, #hurd, 2010-10-27 thread 8 hung on ports_begin_rpc that's probably where one could investigated first diff --git a/open_issues/ext2fs_libports_reference_counting_assertion.mdwn b/open_issues/ext2fs_libports_reference_counting_assertion.mdwn new file mode 100644 index 00000000..ff1c4c38 --- /dev/null +++ b/open_issues/ext2fs_libports_reference_counting_assertion.mdwn @@ -0,0 +1,93 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + + libports/port-ref.c:31: ports_port_ref: Assertion `pi->refcnt || pi->weakrefcnt' failed + +This is seen every now and then. + + +# [[gnumach_page_cache_policy]] + +With that patch in place, the assertion failure is seen more often. + + +## IRC, freenode, #hurd, 2012-07-14 + + braunr: I'm getting ext2fs.static: + /usr/src/hurd-debian/./libports/port-ref.c:31: ports_port_ref: Assertion + `pi->refcnt || pi->weakrefcnt' failed. + oddly enough, that happens on one of the buildds only + :/ + i fear the patch can wake many of these issues + + +## IRC, freenode, #hurd, 2012-07-15 + + braunr: same assertion failed on a second buildd + can you paste it again please ? + ext2fs.static: /usr/src/hurd-debian/./libports/port-ref.c:31: + ports_port_ref: Assertion `pi->refcnt || pi->weakrefcnt' failed. 
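+
+(For context, the function tripping this assertion, paraphrased from
+libports/port-ref.c; abridged from memory, so check the tree for the
+exact code.)
+
+    void
+    ports_port_ref (void *portstruct)
+    {
+      struct port_info *pi = portstruct;
+
+      mutex_lock (&_ports_lock);
+      /* Taking a new strong reference is only legal while at least one
+         strong or weak reference still keeps the object alive;
+         zero/zero means it was already deallocated, hence the
+         assertion.  */
+      assert (pi->refcnt || pi->weakrefcnt);
+      pi->refcnt++;
+      mutex_unlock (&_ports_lock);
+    }
+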
+ or better, answer the ml thread for future reference + thanks + braunr: I can't keep your patch on the buildds, it makes them too + unreliable + youpi: ok + i never got this error though, that's weird + youpi: was the failure during the same build ? + no, it was during package installation, and not the same + braunr: note that I've already seen such errors, it's not new, but + it was way rarer + like every month only + ah ok + yes it's less surprising then + a tricky reference counting / locking mistake somewhere in the + hurd :) ... + ah ! just got it ! + braunr: Got the error or found the problem? :) + the former unfortunately :/ + + +## IRC, freenode, #hurd, 2012-07-19 + + hm, i think those ext2fs port refs errors may also be due to stack + overflows + --verbose + hm ? + http://lists.gnu.org/archive/html/bug-hurd/2012-07/msg00051.html + i mean, why do you think they could be due to that? + the error is that both strong and weak refs in a port are 0 when + adding a reference + weak refs are almost never used so let's forget about them + when a ref count drops to 0, the port is automatically deallocated + so what other than memory corruption setting this counter to 0 + could possibly do that ? :) + one could also guess an unbalanced ref/unref logic, somehow + what do you mean ? + that for a bug, an early return, etc a port gets unref'ed often + than it is ref'ed + highly unlikely, as they're protected by a lock + pinotree: ah you mean, the object gets deallocated early because + of an deref overflow ? + pinotree: could be, yes + pinotree: i wonder if it could happen because of the periodic sync + duplicating the node table without holding references + rah, libports uses a big lock in many places :( + braunr: yes, i meant that + we could try using libduma some day + i wonder if it could work out of the box + but that wouldn't help to find out whether a port gets deref'ed + too often, for instance + although it could be adapted to do so, i guess + reproducing + a call trace or core would be best, but i'm not even + sure we can get that easily lol + +[[automatic_backtraces_when_assertions_hit]]. diff --git a/open_issues/glibc/t/tls-threadvar.mdwn b/open_issues/glibc/t/tls-threadvar.mdwn index e72732ab..4afd8a1a 100644 --- a/open_issues/glibc/t/tls-threadvar.mdwn +++ b/open_issues/glibc/t/tls-threadvar.mdwn @@ -29,3 +29,32 @@ IRC, freenode, #hurd, 2011-10-23: After this has been done, probably the whole `__libc_tsd_*` stuff can be dropped altogether, and `__thread` directly be used in glibc. + + +# IRC, freenode, #hurd, 2012-08-07 + + r5219: Update libpthread patch to replace threadvar with tls + for pthread_self + r5224: revert r5219 too, it's not ready either + as the changelog says, the __thread revertal is because it posed + problems + and I just didn't have any time to check them while the freeze was + so close + OK. What kind of problems? Should it be reverted upstream, + too? + I don't remember exactly + it should just be fixed + we can revert it upstream, but it'd be good that we manage to + progress, at some point... + Of course -- however as long as we don't know what kind of + problem, it is a bit difficult. ;-) + since I didn't left a note, it was most probably a mere glibc run, + or boot with the patched libpthread + *testsuite run + OK. + The libpthread testsuite doesn't show any issues with that + patch applied, though. But I didn'T test anything else. 
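+
+(To illustrate what r5219 changed, a hedged sketch of the two ways
+libpthread can locate its self pointer; the identifiers are quoted from
+memory and not verified against the tree.)
+
+    /* Threadvar scheme: a per-thread slot located relative to the
+       stack pointer.  */
+    #define _pthread_self()                                        \
+      (*(struct __pthread **)                                      \
+       __hurd_threadvar_location (_HURD_THREADVAR_THREAD))
+
+    /* TLS scheme (r5219): let the compiler's __thread machinery find
+       it instead.  */
+    static __thread struct __pthread *___pthread_self;
+    #define _pthread_self() ___pthread_self
+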
+ youpi: Also, you have probably seen my glibc __thread errno + email -- rmcgrath wanted to find some time this week to comment/help, and + I take it you don't have any immediate comments to that issue? + I saw the mails, but didn't investigate at all diff --git a/open_issues/gnat.mdwn b/open_issues/gnat.mdwn index fb624fad..2d17e275 100644 --- a/open_issues/gnat.mdwn +++ b/open_issues/gnat.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -38,6 +38,55 @@ svn://svn.debian.org/gcccvs/branches/sid@5638 6ca36cf4-e1d1-0310-8c6f-e303bb2178ca' +## IRC, freenode, #hurd, 2012-07-17 + + I've found the remaining problem with gnat backtrace for Hurd! + Related to the stack frame. + This version does not work: one relying on static assumptions + about the frame layout + Causing segfaults. + Any interest to create a test case out of that piece of code, + taken from gcc/ada/tracebak.c? + gnu_srs: sure + + +### IRC, freenode, #hurd, 2012-07-18 + + "Digging further revealed that the GNU/Hurd stack frame does not + seem to + be static enough to define USE_GENERIC_UNWINDER in + gcc/ada/tracebak.c. + " + what do you mean by a "stack frame does not seem to be static + enough" ? + I can qoute from the source file if you want. Otherwise look at + the code yourself: gcc/ada/tracebak,c + I mean that something is wrong with the stack frame for + Hurd. This is the code I wanted to use as a test case for the stack. + Remember? + more or less + ah, "static assumptions" + all right, i don't think anything is "wrong" with stack frames + but if you use a recent version of gcc, as indicated in the code, + -fomit-frame-pointer is enabled by default + so your stack frame won't look like it used to be without the + option + hence the need for USE_GCC_UNWINDER + http://en.wikipedia.org/wiki/Call_stack explains this very well + However, kfreebsd does not seem to need USE_GCC_UNWINDER, how + come? + i guess they don't omit the frame pointer + your fix is good btw + thanks + + +### IRC, freenode, #hurd, 2012-07-19 + + tschwinge: The bug in #681998 should go upstream. Applied in + Debian already. Hopefully this is the last patch needed for the port of + GNAT to Hurd. + + --- diff --git a/open_issues/gnumach_page_cache_policy.mdwn b/open_issues/gnumach_page_cache_policy.mdwn index 03cb3725..375e153b 100644 --- a/open_issues/gnumach_page_cache_policy.mdwn +++ b/open_issues/gnumach_page_cache_policy.mdwn @@ -108,6 +108,9 @@ License|/fdl]]."]]"""]] 12k random data i'll try with other values i get crashes, deadlocks, livelocks, and it's not pretty :) + +[[libpager_deadlock]]. + and always in ext2, mach doesn't seem affected by the issue, other than the obvious (well i get the usual "deallocating an invalid port", but as @@ -625,3 +628,146 @@ License|/fdl]]."]]"""]] ## [[metadata_caching]] + + +## IRC, freenode, #hurd, 2012-07-12 + + i'm only adding a cached pages count you know :) + (well actually, this is now a vm_stats call that can replace + vm_statistics, and uses flavors similar to task_info) + my goal being to see that yellow bar in htop + ... :) + yellow? 
+ yes, yellow + as in http://www.sceen.net/~rbraun/htop.png + ah + + +## IRC, freenode, #hurd, 2012-07-13 + + i always get a "no more room for vm_map_enter" error when building + glibc :/ + but the build continues, probably a failed test + ah yes, i can see the yellow bar :> + braunr: congrats :-) + antrik: thanks + but i think my patch can't make it into the git repo until the + swap deadlock is solved (or at least very infrequent ..) + +[[libpager_deadlock]]. + + well, the page cache accounting tells me something is wrong there + too lol + during a build 112M of data was created, of which only 28M made it + into the cache + which may imply something is still holding references on the + others objects (shadow objects hold references to their underlying + object, which could explain this) + ok i'm stupid, i just forgot to subtract the cached pages from the + used pages .. :> + (hm, actually i'm tired, i don't think this should be done) + ahh yes much better + i simply forgot to convert pages in kilobytes .... :> + with the fix, the accounting of cached files is perfect :) + + +## IRC, freenode, #hurd, 2012-07-14 + + braunr: btw, if you want to stress big builds, you might want to + try webkit, ppl, rquantlib, rheolef, yade + they don't pass on bach (1.3GiB), but do on ironforge (1.8GiB) + youpi: i don't need to, i already know my patch triggers swap + deadlocks more often, which was expected + k + there are 3 tasks concerning my work : 1/ page cache accounting + (i'm sending the patch right now) 2/ removing the fixed limit and 3/ + hunting the swap deadlock and fixing as much as possible + 2/ can't get in the repository without 3/ imo + btw, the increase of PAGE_FREE_* in your 2/ could go already, + couldn't it? + yes + but we should test with higher thresholds + well + it really depends on the usage pattern :/ + + +## [[ext2fs_libports_reference_counting_assertion]] + + +## IRC, freenode, #hurd, 2012-07-15 + + concerning the page cache patch, i've been using for quite some + time now, did lots of builds with it, and i actually wonder if it hurts + stability as much as i think + considering i didn't stress the system as much before + and it really improves performance + + cached memobjs: 138606 + cache: 1138M + i bet ext2fs can have a hard time scanning 138k entries in a + linked list, using callback functions on each of them :x + + + +## IRC, freenode, #hurd, 2012-07-16 + + braunr: Sorry that I didn't have better results to present. + :-/ + eh, that was expected :) + my biggest problem is the hurd itself :/ + for my patch to be useful (and the rest of the intended work), the + hurd needs some serious fixing + not syncing from the pagers + and scalable algorithms everywhere of course + + +## IRC, freenode, #hurd, 2012-07-23 + + youpi: FYI, the branches rbraun/page_cache in the gnupach and hurd + repos are ready to be merged after review + gnumach* + so you fixed the hangs & such? + they only the cache stats, not the "improved" cache + no + it requires much more work for that :) + braunr: my concern is that the tests on buildds show stability + regression + youpi: tschwinge also reported performance degradation + and not the minor kind + uh + :-/ + far less pageins, but twice as many pageouts, and probably high + cpu overhead + building (which is what buildds do) means lots of small files + so lots of objects + huge lists, long scans, etc.. + so it definitely requires more work + the stability issue comes first in mind, and i don't see a way to + obtain a usable trace + do you ? 
+ nope + (except making it loop forever instead of calling assert() and + attach gdb to a qemu instance) + youpi: if you think the infinite loop trick is ok, we could + proceed with that + which assert? + the port refs one + which one? + whicih prevented you from using the page cache patch on buildds + ah, the libports one + for that one, I'd tend to take the time to perhaps use coccicheck + actually + +[[code_analysis]]. + + oh + it's one of those which is supposed to be statically ananyzable + s/n/l + that would be great + :-) + And set precedence. + + +## IRC, freenode, #hurd, 2012-07-26 + + hm i killed darnassus, probably the page cache patch again diff --git a/open_issues/gnumach_vm_map_red-black_trees.mdwn b/open_issues/gnumach_vm_map_red-black_trees.mdwn index d7407bfe..7a54914f 100644 --- a/open_issues/gnumach_vm_map_red-black_trees.mdwn +++ b/open_issues/gnumach_vm_map_red-black_trees.mdwn @@ -172,3 +172,29 @@ License|/fdl]]."]]"""]] crasher le noyau) (enfin jveux dire, qui faisait crasher le noyau de façon très obscure avant le patch rbtree) + + +### IRC, freenode, #hurd, 2012-07-15 + + I get errors in vm_map.c whenever I try to "mount" a CD + Hmm, this time it rebooted the machine + braunr: The translator set this time and the machine reboots + before I can get the full message about vm_map, but here is some of the + crap I get: http://paste.debian.net/179191/ + oh + nice + that may be the bug youpi saw with my redblack tree patch + bddebian: assert(diff != 0); ? + Aye + good + it means we're trying to insert a vm_map_entry at a region in a + map which is already occupied + Oh + and unlike the previous code, the tree actually checks that + it has to + so you just simply use the iso9660fs translator and it crashes ? + Well it used to on just trying to set the translator. This time + I was able to set the translator but as soon as I cd to the mount point I + get all that crap + that's very good + more test cases to fix the vm diff --git a/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn b/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn index 80fc9fcd..57eb403d 100644 --- a/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn +++ b/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011, 2012 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -104,3 +105,11 @@ License|/fdl]]."]]"""]] of embedding it ? right now that's a good question... no idea TBH :-) + + +# IRC, freenode, #hurd, 2012-07-23 + + aren't libmachuser and libhurduser supposed to be slowly faded + out? + pinotree: That discussion has not yet come to a conclusion, I + think. (I'd say: yes.) diff --git a/open_issues/libpager_deadlock.mdwn b/open_issues/libpager_deadlock.mdwn new file mode 100644 index 00000000..017ecff6 --- /dev/null +++ b/open_issues/libpager_deadlock.mdwn @@ -0,0 +1,165 @@ +[[!meta copyright="Copyright © 2010, 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + +Deadlocks in libpager/periodic sync have been found. + + +# [[gnumach_page_cache_policy]] + + +## IRC, freenode, #hurd, 2012-07-12 + + ah great, a paper about the mach pageout daemon ! + braunr: Where is paper about the mach pageout daemon? + ftp://ftp.cs.cmu.edu/project/mach/doc/published/defaultmm.ps + might give us a clue about the swap deadlock (although i still + have a few ideas to check) + + http://www.sceen.net/~rbraun/moving_the_default_memory_manager_out_of_the_mach_kernel.pdf + we should more seriously consider sergio's advisory pageout branch + some day + i'll try to get in touch with him about that before he completely + looses interest + i'll include it in my "make that page cache as decent as possible" + task + many of his comments match what i've seen + and we both did a few optimizations the same way + (like not deactivating pages when they enter the cache) + + +## IRC, freenode, #hurd, 2012-07-13 + + antrik: i'm able to consistently reproduce the swap deadlocks you + regularly had when using apt with my page cache patch + it happens when lots of dirty pages are write back to their pagers + so apt, or a big file copy or anything that writes several MiB + very quickly is a good candidate + written* + braunr: nice... + antrik: well in a way, yes, as it will allow us to track it more + easily + + +## IRC, freenode, #hurd, 2012-07-15 + + oh btw, i think i can say with confidence that the hurd *doesn't* + deadlock + (at least, concerning swapping) + lol, one of my hurd systems has been hitting the "swap deadlock" + for more than an hour, and suddenly got out of it + something is really wrong in the pageout daemon, but it's not a + deadlock + a livelock then + do you get out of livelocks ? + i mean, it's not even a "lock" + just a big damn tricky slowdown + yes, you can, by giving a few more resources for instance + depends on the kind of livelock of course + i think it's that + the pageout daemon clearly throttles itself, waiting for pagers to + complete + and another dangerous thing is the line in vm_resident, which only + wakes on thread to avoid starvation + hum, during the livelock, the kernel spends much time waiting in + db_read_address + could be a bad stack + so, the pageout daemon seems to slow itself as much as waiting + several seconds between each iteration when under load + but each iteration possibly removes clean pages + so at some point, there is enough memory to unblock waiting pagers + for now i'll try a simple solution, like limiting the pausing + delay + but we'll need more page lists in the future (inactive-clean, + inactive-dirty, etc..) + limiting the amount of dirty pages is the only way to really make + it safe actually + wow, the pageout loop is still running even after many pages were + freed, and it unable to free more pages + i think i have an idea about the livelock + i think it comes from the periodic syncing + Too often? 
+ that's not the problem + the problem is that it can happen at the same time with paging + Oh + if paging gets slow, it won't stop the periodic syncing + which will grab any page it can as soon as some are free + but then, before it even finishes, another sync may occur + i have yet to check that it is possible + and i don't understand why syncing isn't done by the kernel + the kernel is supposed to handle the paging policy + and it would make paging really scale + It's done on the Hurd side? + (instead of having external pagers make one request for each + object, even if they're clean) + yes + Hmm, interesting + ofc, with ext2fs --debug, i can't reproduce anything + Ugh + sync are serialized + grmbl + there is a big lock taken at sync time though + uhg + + +## IRC, freenode, #hurd, 2012-07-16 + + all right so, there *is* a deadlock, and it may be due to the + default pager actually + the vm_page_laundry_count doesn't decrease at some point, even + when there are more than enough free pages + antrik: the thing is, i think the deadlock concerns the default + pager + the deadlock? + yes + when swapping + + +## IRC, freenode, #hurd, 2012-07-17 + + i can't even reproduce the swap deadlock when using upstrea ext2fs + :( + upstream* + + +## IRC, freenode, #hurd, 2012-07-19 + + the libpager deadlock patch looks wrong to me + hm no, the libpager patch is ok acually + + +## [[synchronous_ipc]] + +### IRC, freenode, #hurd, 2012-07-20 + + but actually after reviewing more, the debian patch for this + particular issue seems correct + well, it's most probably done by youpi, so I would be shocked if + it wasn't correct... ;-) + he wasn't sure at all about it + still ;-) + :) + well, if you also think it's correct, I guess it's time to push it + upstream... + + +## IRC, freenode, #hurd, 2012-07-23 + + i still can't conclude if we have any pageout deadlock, or if it's + simply a side effect of the active and inactive lists getting very very + large + but almost every time this issue happens, it somehow recovers, + sometimes hours later + + +# See Also + + * [[ext2fs_deadlock]] diff --git a/open_issues/libpthread.mdwn b/open_issues/libpthread.mdwn index c5054b7f..03a52218 100644 --- a/open_issues/libpthread.mdwn +++ b/open_issues/libpthread.mdwn @@ -42,3 +42,527 @@ There is a [[!FF_project 275]][[!tag bounty]] on this task. there'll still be the issue that only one will be initialized and one that provides libc thread safety functions, etc. that's what i wanted to knew, thanks :) + + +## IRC, freenode, #hurd, 2012-07-23 + + So I am not sure what to do with the hurd_condition_wait stuff + i would also like to know what's the real issue with cancellation + here + because my understanding is that libpthread already implements it + does it look ok to you to make hurd_condition_timedwait return an + errno code (like ETIMEDOUT and ECANCELED) ? + braunr: that's what pthread_* function usually do, yes + i thought they used their own code + no + thanks + well, first, do you understand what hurd_condition_wait is ? + it's similar to condition_wait or pthread_cond_wait with a subtle + difference + it differs from the original cthreads version by handling + cancellation + but it also differs from the second by how it handles cancellation + instead of calling registered cleanup routines and leaving, it + returns an error code + (well simply !0 in this case) + so there are two ways + first, change the call to pthread_cond_wait + Are you saying we could fix stuff to use pthread_cond_wait() + properly? 
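+
+(An editorial sketch of the difference under discussion.  The prototype
+is roughly the cthreads-era declaration; the two loops are schematic,
+not code from the tree.)
+
+    /* Hurd extension: block like condition_wait(), but if
+       hurd_thread_cancel() hits this thread, return nonzero instead of
+       unwinding through registered cleanup routines.  */
+    extern int hurd_condition_wait (condition_t condition, mutex_t mutex);
+
+    /* Typical server-side use: turn cancellation into EINTR.  */
+    while (!ready)
+      if (hurd_condition_wait (&cond, &lock))
+        {
+          mutex_unlock (&lock);
+          return EINTR;
+        }
+
+    /* The plain POSIX route instead unwinds the thread, so the RPC
+       reply would have to be sent from a cleanup handler.  */
+    pthread_cleanup_push (cleanup_routine, &lock);
+    while (!ready)
+      pthread_cond_wait (&cond, &lock);   /* may never return */
+    pthread_cleanup_pop (1);
+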
+ it's possible but not easy + because you'd have to rewrite the cancellation code + probably writing cleanup routines + this can be hard and error prone + and is useless if the code already exists + so it seems reasonable to keep this hurd extension + but now, as it *is* a hurd extension noone else uses + braunr: BTW, when trying to figure out a tricky problem with the + auth server, cfhammer digged into the RPC cancellation code quite a bit, + and it's really a horrible complex monstrosity... plus the whole concept + is actually broken in some regards I think -- though I don't remember the + details + antrik: i had the same kind of thoughts + antrik: the hurd or pthreads ones ? + not sure what you mean. I mean the RPC cancellation code -- which + is involves thread management too + ok + I don't know how it is related to hurd_condition_wait though + well i found two main entry points there + hurd_thread_cancel and hurd_condition_wait + and it didn't look that bad + whereas in the pthreads code, there are many corner cases + and even the standard itself looks insane + well, perhaps the threading part is not that bad... + it's not where we saw the problems at any rate :-) + rpc interruption maybe ? + oh, right... interruption is probably the right term + yes that thing looks scary + :)) + the migration thread paper mentions some things about the problems + concerning threads controllability + I believe it's a very strong example for why building around + standard Mach features is a bad idea, instead of adapting the primitives + to our actual needs... + i wouldn't be surprised if the "monstrosities" are work arounds + right + + +## IRC, freenode, #hurd, 2012-07-26 + + Uhm, where does /usr/include/hurd/signal.h come from? + head -n4 /usr/include/hurd/signal. + h + Ohh glibc? + That makes things a little more difficult :( + why ? + Hurd includes it which brings in cthreads + ? + the hurd already brings in cthreads + i don't see what you mean + Not anymore :) + the system cthreads header ? + well it's not that difficult to trick the compiler not to include + them + signal.h includes cthreads.h I need to stop that + just define the _CTHREADS_ macro before including anything + remember that header files are normally enclosed in such macros to + avoid multiple inclusions + this isn't specific to cthreads + converting hurd from cthreads to pthreads will make hurd and + glibc break source and binary compatibility + Of course + reminds me of the similar issues of the late 90s + Ugh, why is he using _pthread_self()? + maybe because it accesses to the internals + "he" ? + Thomas in his modified cancel-cond.c + well, you need the internals to implement it + hurd_condition_wait is similar to pthread_condition_wait, except + that instead of stopping the thread and calling cleanup routines, it + returns 1 if cancelled + not that i looked at it, but there's really no way to implement + it using public api? + Even if I am using glibc pthreads? + unlikely + God I had all of this worked out before I dropped off for a + couple years.. :( + this will come back :p + that makes you the perfect guy to work on it ;) + I can't find a pt-internal.h anywhere.. :( + clone the hurd/libpthread.git repo from savannah + Of course when I was doing this libpthread was still in hurd + sources... + So if I am using glibc pthread, why can't I use pthread_self() + instead? + that won't give you access to the internals + OK, dumb question time. What internals? 
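+
+(The internals in question, abridged from libpthread's pt-internal.h;
+field names quoted from memory, so treat this as a sketch.)
+
+    struct __pthread
+    {
+      /* ... */
+      /* Cancellation state, which hurd_condition_wait needs to read
+         and which has no public accessor.  */
+      int cancel_state;
+      int cancel_type;
+      int cancel_pending;
+      struct __pthread_cancelation_handler *cancelation_handlers;
+      /* ... */
+    };
+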
+ the libpthread ones + that's where you will find if your thread has been cancelled or + not + pinotree: But isn't that assuming that I am using hurd's + libpthread? + if you aren't inside libpthread, no + pthread_self is normally not portable + you can only use it with pthread_equal + so unless you *know* the internals, you can't use it + and you won't be able to do much + so, as it was done with cthreads, hurd_condition_wait should be + close to the libpthread implementation + inside, normally + now, if it's too long for you (i assume you don't want to build + glibc) + you can just implement it outside, grabbing the internal headers + for now + another "not that i looked at it" question: isn't there no way + to rewrite the code using that custom condwait stuff to use the standard + libpthread one? + and once it works, it'll get integrated + pinotree: it looks very hard + braunr: But the internal headers are assuming hurd libpthread + which isn't in the source anymore + from what i could see while working on select, servers very often + call hurd_condition_wait + and they return EINTR if canceleld + so if you use the standard pthread_cond_wait function, your thread + won't be able to return anything, unless you push the reply in a + completely separate callback + i'm not sure how well mig can cope with that + i'd say it can't :) + no really it looks ugly + it's far better to have this hurd specific function and keep the + existing user code as it is + bddebian: you don't need the implementation, only the headers + the thread, cond, mutex structures mostly + I should turn to "pt-internal.h" and just put it + in libshouldbelibc, no? + no, that header is not installed + Obviously not the "best" way + pinotree: ?? + pinotree: what does it change ? + braunr: it == ? + bddebian: you could even copy it entirely in your new + cancel-cond.C and mention where it was copied from + pinotree: it == pt-internal.H not being installed + that he cannot include it in libshouldbelibc sources? + ah, he wants to copy it? + yes + i want him to copy it actually :p + it may be hard if there are a lot of macro options + the __pthread struct changes size and content depending on other + internal sysdeps headers + well he needs to copy those too :p + Well even if this works we are going to have to do something + more "correct" about hurd_condition_wait. Maybe even putting it in + glibc? + sure + but again, don't waste time on this for now + make it *work*, then it'll get integrated + Like it has already? This "patch" is only about 5 years old + now... ;-P + but is it complete ? + Probably not :) + Hmm, I wonder how many undefined references I am going to get + though.. :( + Shit, 5 + One of which is ___pthread_self.. :( + Does that mean I am actually going to have to build hurds + libpthreads in libshouldbeinlibc? + Seriously, do I really need ___pthread_self, __pthread_self, + _pthread_self and pthread_self??? + I'm still unclear what to do with cancel-cond.c. It seems to me + that if I leave it the way it is currently I am going to have to either + re-add libpthreads or still all of the libpthreads code under + libshouldbeinlibc. + then add it in libc + glib + glibc + maybe under the name __hurd_condition_wait + Shouldn't I be able to interrupt cancel-cond stuff to use glibc + pthreads? + interrupt ? + Meaning interject like they are doing. I may be missing the + point but they are just obfuscating libpthreads thread with some other + "namespace"? (I know my terminology is wrong, sorry). + they ? 
+ Well Thomas in this case but even in the old cthreads code, + whoever wrote cancel-cond.c + but they use internal thread structures .. + Understood but at some level they are still just getting to a + libpthread thread, no? + absolutely not .. + there is *no* pthread stuff in the hurd + that's the problem :p + Bah damnit... + cthreads are directly implement on top of mach threads + implemeneted* + implemented* + Sure but hurd_condition_wait wasn't + of course it is + it's almost the same as condition_wait + but returns 1 if a cancelation request was made + Grr, maybe I am just confusing myself because I am looking at + the modified (pthreads) version instead of the original cthreads version + of cancel-cond.c + well if the modified version is fine, why not directly use that ? + normally, hurd_condition_wait should sit next to other pthread + internal stuff + it could be renamed __hurd_condition_wait, i'm not sure + that's irrelevant for your work anyway + I am using it but it relies on libpthread and I am trying to use + glibc pthreads + hum + what's the difference between libpthread and "glibc pthreads" ? + aren't glibc pthreads the merged libpthread ? + quite possibly but then I am missing something obvious. I'm + getting ___pthread_self in libshouldbeinlibc but it is *UND* + bddebian: with unmodified binaries ? + braunr: No I added cancel-cond.c to libshouldbeinlibc + And some of the pt-xxx.h headers + well it's normal then + i suppose + braunr: So how do I get those defined without including + pthreads.c from libpthreads? :) + pinotree: hm... I think we should try to make sure glibc works + both whith cthreads hurd and pthreads hurd. I hope that shoudn't be so + hard. + breaking binary compatibility for the Hurd libs is not too + terrible I'd say -- as much as I'd like that, we do not exactly have a + lot of external stuff depending on them :-) + bddebian: *sigh* + bddebian: just add cancel-cond to glibc, near the pthread code :p + braunr: Wouldn't I still have the same issue? + bddebian: what issue ? + is hurd_condition_wait() the name of the original cthreads-based + function? + antrik: the original is condition_wait + I'm confused + is condition_wait() a standard cthreads function, or a + Hurd-specific extension? + antrik: as standard as you can get for something like cthreads + braunr: Where hurd_condition_wait is looking for "internals" as + you call them. I.E. there is no __pthread_self() in glibc pthreads :) + hurd_condition_wait is the hurd-specific addition for cancelation + bddebian: who cares ? + bddebian: there is a pthread structure, and conditions, and + mutexes + you need those definitions + so you either import them in the hurd + braunr: so hurd_condition_wait() *is* also used in the original + cthread-based implementation? + or you write your code directly where they're available + antrik: what do you call "original" ? + not transitioned to pthreads + ok, let's simply call that cthreads + yes, it's used by every hurd servers + virtually + if not really everyone of them + braunr: That is where you are losing me. If I can just use + glibc pthreads structures, why can't I just use them in the new pthreads + version of cancel-cond.c which is what I was originally asking.. :) + you *have* to do that + but then, you have to build the whole glibc + * bddebian shoots himself + and i was under the impression you wanted to avoid that + do any standard pthread functions use identical names to any + standard cthread functions? 
+ what you *can't* do is use the standard pthreads interface + no, not identical + but very close + bddebian: there is a difference between using pthreads, which + means using the standard posix interface, and using the glibc pthreads + structure, which means toying with the internale implementation + you *cannot* implement hurd_condition_wait with the standard posix + interface, you need to use the internal structures + hurd_condition_wait is actually a shurd specific addition to the + threading library + hurd* + well, in that case, the new pthread-based variant of + hurd_condition_wait() should also use a different name from the + cthread-based one + so it's normal to put it in that threading library, like it was + done for cthreads + 21:35 < braunr> it could be renamed __hurd_condition_wait, i'm not + sure + Except that I am trying to avoid using that threading library + what ? + If I am understanding you correctly it is an extention to the + hurd specific libpthreads? + to the threading library, whichever it is + antrik: although, why not keeping the same name ? + braunr: I don't think having hurd_condition_wait() for the cthread + variant and __hurd_condition_wait() would exactly help clarity... + I was talking about a really new name. something like + pthread_hurd_condition_wait() or so + braunr: to avoid confusion. to avoid accidentally pulling in the + wrong one at build and/or runtime. + to avoid possible namespace conflicts + ok + well yes, makes sense + braunr: Let me state this as plainly as I hope I can. If I want + to use glibc's pthreads, I have no choice but to add it to glibc? + and pthread_hurd_condition_wait is a fine name + bddebian: no + bddebian: you either add it there + bddebian: or you copy the headers defining the internal structures + somewhere else and implement it there + but adding it to glibc is better + it's just longer in the beginning, and now i'm working on it, i'm + really not sure + add it to glibc directly :p + That's what I am trying to do but the headers use pthread + specific stuff would should be coming from glibc's pthreads + yes + well it's not the headers you need + you need the internal structure definitions + sometimes they're in c files for opacity + So ___pthread_self() should eventually be an obfuscation of + glibcs pthread_self(), no? + i don't know what it is + read the cthreads variant of hurd_condition_wait, understand it, + do the same for pthreads + it's easy :p + For you bastards that have a clue!! ;-P + I definitely vote for adding it to the hurd pthreads + implementation in glibc right away. trying to do it externally only adds + unnecessary complications + and we seem to agree that this new pthread function should be + named pthread_hurd_condition_wait(), not just hurd_condition_wait() :-) + + +## IRC, freenode, #hurd, 2012-07-27 + + OK this hurd_condition_wait stuff is getting ridiculous the way + I am trying to tackle it. :( I think I need a new tactic. + bddebian: what do you mean ? + braunr: I know I am thick headed but I still don't get why I + cannot implement it in libshouldbeinlibc for now but still use glibc + pthreads internals + I thought I was getting close last night by bringing in all of + the hurd pthread headers and .c files but it just keeps getting uglier + and uglier + youpi: Just to verify. The /usr/lib/i386-gnu/libpthread.so that + ships with Debian now is from glibc, NOT libpthreads from Hurd right? + Everything I need should be available in glibc's libpthreads? (Except for + hurd_condition_wait obviously). 
+ 22:35 < antrik> I definitely vote for adding it to the hurd + pthreads implementation in glibc right away. trying to do it externally + only adds unnecessary complications + bddebian: yes + same as antrik + fuck + libpthread *already* provides some odd symbols (cthread + compatibility), it can provide others + bddebian: don't curse :p it will be easier in the long run + * bddebian breaks out glibc :( + but you should tell thomas that too + braunr: I know it just adds a level of complexity that I may not + be able to deal with + we wouldn't want him to waste too much time on the external + libpthread + which one ? + glibc for one. hurd_condition_wait() for another which I don't + have a great grasp on. Remember my knowledge/skillsets are limited + currently. + bddebian: tschwinge has good instructions to build glibc + keep your tree around and it shouldn't be long to hack on it + for hurd_condition_wait, i can help + Oh I was thinking about using Debian glibc for now. You think I + should do it from git? + no + debian rules are even more reliable + (just don't build all the variants) + `debian/rules build_libc` builds the plain i386 variant only + So put pthread_hurd_cond_wait in it's own .c file or just put it + in pt-cond-wait.c ? + i'd put it in pt-cond-wait.C + youpi or braunr: OK, another dumb question. What (if anything) + should I do about hurd/hurd/signal.h. Should I stop it from including + cthreads? + it's not a dumb question. it should probably stop, yes, but there + might be uncovered issues, which we'll have to take care of + Well I know antrik suggested trying to keep compatibility but I + don't see how you would do that + compability between what ? + and source and/or binary ? + hurd/signal.h implicitly including cthreads.h + ah + well yes, it has to change obviously + Which will break all the cthreads stuff of course + So are we agreeing on pthread_hurd_cond_wait()? + that's fine + Ugh, shit there is stuff in glibc using cthreads?? + like what ? + hurdsig, hurdsock, setauth, dtable, ... + it's just using the compatibility stuff, that pthread does provide + but it includes cthreads.h implicitly + s/it/they in many cases + not a problem, we provide the functions + Hmm, then what do I do about signal.h? It includes chtreads.h + because it uses extern struct mutex ... + ah, then keep the include + the pthread mutexes are compatible with that + we'll clean that afterwards + arf, OK + that's what I meant by "uncover issues" + + +## IRC, freenode, #hurd, 2012-07-28 + + Well crap, glibc built but I have no symbol for + pthread_hurd_cond_wait in libpthread.so :( + Hmm, I wonder if I have to add pthread_hurd_cond_wait to + forward.c and Versions? (Versions obviously eventually) + bddebian: most probably not about forward.c, but definitely you + have to export public stuff using Versions + + +## IRC, freenode, #hurd, 2012-07-29 + + braunr: http://paste.debian.net/181078/ + ugh, inline functions :/ + "Tell hurd_thread_cancel how to unblock us" + i think you need that one too :p + ?? + well, they work in pair + one cancels, the other notices it + hurd_thread_cancel is in the hurd though, iirc + or uh wait + no it's in glibc, hurd/thread-cancel.c + otherwise it looks like a correct reuse of the original code, but + i need to understand the pthreads internals better to really say anything + + +## IRC, freenode, #hurd, 2012-08-03 + + pinotree: what do you think of + condition_implies/condition_unimplies ? 
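+
+(For context: a Hurd cthreads extension, prototypes quoted from memory.
+After the call, a broadcast on the implicator also wakes threads
+blocked on the implicand; libpipe/pflocal rely on it for select().)
+
+    void condition_implies (condition_t implicator, condition_t implicatand);
+    void condition_unimplies (condition_t implicator, condition_t implicatand);
+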
+ the work on pthread will have to replace those + + +## IRC, freenode, #hurd, 2012-08-06 + + bddebian: so, where is the work being done ? + braunr: Right now I would just like to testing getting my glibc + with pthread_hurd_cond_wait installed on the clubber subhurd. It is in + /home/bdefreese/glibc-debian2 + we need a git branch + braunr: Then I want to rebuild hurd with Thomas's pthread + patches against that new libc + Aye + i don't remember, did thomas set a git repository somewhere for + that ? + He has one but I didn't have much luck with it since he is using + an external libpthreads + i can manage the branches + I was actually patching debian/hurd then adding his patches on + top of that. It is in /home/bdefreese/debian-hurd but he has updateds + some stuff since then + Well we need to agree on a strategy. libpthreads only exists in + debian/glibc + it would be better to have something upstream than to work on a + debian specific branch :/ + tschwinge: do you think it can be done + ? + + +## IRC, freenode, #hurd, 2012-08-07 + + braunr: You mean to create on Savannah branches for the + libpthread conversion? Sure -- that's what I have been suggesting to + Barry and Thomas D. all the time. + + braunr: OK, so I installed my glibc with + pthread_hurd_condition_wait in the subhurd and now I have built Debian + Hurd with Thomas D's pthread patches. + bddebian: i'm not sure we're ready for tests yet :p + braunr: Why not? :) + bddebian: a few important bits are missing + braunr: Like? + like condition_implies + i'm not sure they have been handled everywhere + it's still interesting to try, but i bet your system won't finish + booting + Well I haven't "installed" the built hurd yet + I was trying to think of a way to test a little bit first, like + maybe ext2fs.static or something + Ohh, it actually mounted the partition + How would I actually "test" it? + git clone :p + building a debian package inside + removing the whole content after + that sort of things + Hmm, I think I killed clubber :( + Yep.. Crap! :( + ? + how did you do that ? + Mounted a new partition with the pthreads ext2fs.static then did + an apt-get source hurd to it.. + what partition, and what mount point ? + I added a new 2Gb partition on /dev/hd0s6 and set the translator + on /home/bdefreese/part6 + shouldn't kill your hurd + Well it might still be up but killed my ssh session at the very + least :) + ouch + braunr: Do you have debugging enabled in that custom kernel you + installed? Apparently it is sitting at the debug prompt. diff --git a/open_issues/libpthread_CLOCK_MONOTONIC.mdwn b/open_issues/libpthread_CLOCK_MONOTONIC.mdwn index 2c8f10f8..86a613d3 100644 --- a/open_issues/libpthread_CLOCK_MONOTONIC.mdwn +++ b/open_issues/libpthread_CLOCK_MONOTONIC.mdwn @@ -76,3 +76,30 @@ License|/fdl]]."]]"""]] kind of, yes I have reverted the change in libc for now ok + + +## IRC, freenode, #hurd, 2012-07-22 + + pinotree, youpi: I once saw you discussing issue with librt + usage is libpthread -- is it this issue? 
http://sourceware.org/PR14304 + tschwinge: (librt): no + it's the converse + tschwinge: kind of + unexpectedly loading libpthread is almost never a problem + it's unexpectedly loading librt which was a problem for glib + tschwinge: basically what happened with glib is that at configure + time, it could find clock_gettime without any -lrt, because of pulling + -lpthread, but at link time that wouldn't happen + + +## IRC, freenode, #hurd, 2012-07-23 + + pinotree: oh, i see you changed __pthread_timedblock to use + clock_gettime + i wonder if i should do the same in libthreads + yeah, i realized later it was a bad move + ok + i'll stick to gettimeofday for now + it'll be safe when implementing some private + __hurd_clock_get{time,res} in libc proper, making librt just forward to + it and adapting the gettimeofday to use it diff --git a/open_issues/mission_statement.mdwn b/open_issues/mission_statement.mdwn index 17f148a9..b32d6ba6 100644 --- a/open_issues/mission_statement.mdwn +++ b/open_issues/mission_statement.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -658,3 +658,42 @@ License|/fdl]]."]]"""]] FUSE in this case though... it doesn't really change the functionality of the VFS; only rearranges the tree a bit (might even be doable with standard Linux features) + + +# IRC, freenode, #hurd, 2012-07-25 + + because it has design problems, because it has implementation + problems, lots of problems, and far too few people to keep up with other + systems that are already dominating + also, considering other research projects get much more funding + than we do, they probably have a better chance at being adopted + you consider the Hurd to be a research project? + and as they're more recent, they sometimes overcome some of the + issues we have + yes and no + yes because it was, at the time of its creation, and it hasn't + changed much, and there aren't many (any?) other systems with such a + design + and no because the hurd is actually working, and being released as + part of something like debian + which clearly shows it's able to do the stuff it was intended for + i consider it a technically very interesting project for + developers who want to know more about microkernel based extensible + systems + rah: I don't expect the Hurd to achieve world domination, because + most people consider Linux "good enough" and will stick with it + I for my part think though we could do better than Linux (in + certain regards I consider important), which is why I still consider it + interesting and worthwhile + I think that in some respect the OS scene may evolve a bit + like the PL one, where everyone progressively adopts ideas from Lisp but + doesn't want to do Lisp: everyone slowly shifts towards what µ-kernels + OSes have done from the start, but they don't want µ-kernels... 
+ nowhere_man: that's my opinion too + and this is why i think something like the hurd still has valuable + purpose + braunr: in honesty, I still ponder the fact that it's my + coping mechanism to accept being a Lisp and Hurd fan ;-) + nowhere_man: it can be used that way too + functional programming is getting more and more attention + so it's fine if you're a lisp fan really diff --git a/open_issues/multithreading.mdwn b/open_issues/multithreading.mdwn index 5924d3f9..c9567828 100644 --- a/open_issues/multithreading.mdwn +++ b/open_issues/multithreading.mdwn @@ -49,6 +49,91 @@ Tom Van Cutsem, 2009. right +## IRC, freenode, #hurd, 2012-07-16 + + hm interesting + when many threads are creating to handle requests, they + automatically create a pool of worker threads by staying around for some + time + this time is given in the libport call + but the thread always remain + they must be used in turn each time a new requet comes in + ah no :(, they're maintained by the periodic sync :( + hm, still not that, so weird + braunr: yes, that's a known problem: unused threads should go away + after some time, but that doesn't actually happen + don't remember though whether it's broken for some reason, or + simply not implemented at all... + (this was already a known issue when thread throttling was + discussed around 2005...) + antrik: ok + hm threads actually do finish .. + libthreads retain them in a pool for faster allocations + hm, it's worse than i thought + i think the hurd does its job well + the cthreads code never reaps threads + when threads are finished, they just wait until assigned a new + invocation + + i don't understand ports_manage_port_operations_multithread :/ + i think i get it + why do people write things in such a complicated way .. + such code is error prone and confuses anyone + + i wonder how well nested functions interact with threads when + sharing variables :/ + the simple idea of nested functions hurts my head + do you see my point ? :) variables on the stack automatically + shared between threads, without the need to explicitely pass them by + address + braunr: I don't understand. why would variables on the stack be + shared between threads?... + antrik: one function declares two variables, two nested functions, + and use these in separate threads + are the local variables still "local" + ? + braunr: I would think so? why wouldn't they? threads have separate + stacks, right?... + I must admit though that I have no idea how accessing local + variables from the parent function works at all... + me neither + + why don't demuxers get a generic void * like every callback does + :(( + ? + antrik: they get pointers to the input and output messages only + why is this a problem? 
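+
+(For reference, the shape of the interface being criticized; the
+cookie-passing variant below is purely hypothetical.)
+
+    /* What a demuxer looks like today: request and reply messages
+       only, no user-supplied context argument.  */
+    typedef boolean_t (*demuxer_t) (mach_msg_header_t *request,
+                                    mach_msg_header_t *reply);
+
+    /* A hypothetical cookie-passing variant, which would let each
+       ports_manage_port_operations_multithread() call carry its own
+       context without resorting to nested functions.  */
+    typedef boolean_t (*demuxer_cookie_t) (void *cookie,
+                                           mach_msg_header_t *request,
+                                           mach_msg_header_t *reply);
+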
+ ports_manage_port_operations_multithread can be called multiple + times in the same process + each call must have its own context + currently this is done by using nested functions + also, why demuxers return booleans while mach_msg_server_timeout + happily ignores them :( + callbacks shouldn't return anything anyway + but then you have a totally meaningless "return 1" in the middle + of the code + i'd advise not using a single nested function + I don't understand the remark about nested function + they're just horrible extensions + the compiler completely hides what happens behind the scenes, and + nasty bugs could come out of that + i'll try to rewrite ports_manage_port_operations_multithread + without them and see if it changes anything + but it's not easy + also, it makes debugging harder :p + i suspect gdb hangs are due to that, since threads directly start + on a nested function + and if i'm right, they are created on the stack + (which is also horrible for security concerns, but that's another + story) + (at least the trampolines) + I seriously doubt it will change anything... but feel free to + prove me wrong :-) + well, i can see really weird things, but it may have nothing to do + with the fact functions are nested + (i still strongly believe those shouldn't be used at all) + + # Alternative approaches: * diff --git a/open_issues/packaging_libpthread.mdwn b/open_issues/packaging_libpthread.mdwn index d243aaaa..528e0b01 100644 --- a/open_issues/packaging_libpthread.mdwn +++ b/open_issues/packaging_libpthread.mdwn @@ -137,3 +137,53 @@ License|/fdl]]."]]"""]] I know, I've asked tschwinge about it it's not urging anyway right + + +## IRC, freenode, #hurd, 2012-07-21 + + tschwinge: btw, samuel suggested to rename in libpthread ia32 → + i386, to better fit with glibc + pinotree: Hmm, that'd somewhat break interopability with + Viengoos' use of libpthread. + how would it break with viengoos? + I assume it is using the i386 names. Hmm, no isn't it x86_64 + only? + I'll check. + does it use automake (with the Makefile.am in repo)? + I have no idea what the current branch arrangement is. + tschwinge: it looks like ia32 is hardcoded in Makefile and + Makefile.am + + +## IRC, freenode, #hurd, 2012-08-07 + + Also, the Savannah hurd/glibc.git one does not/not yet include + libpthread. + But that could easily be added as a Git submodule. + youpi: To put libpthread into glibc it is literally enough to + make Savannah hurd/libpthread.git appear at [glibc]/libpthread? + tschwinge: there are some patches needed in the rest of the tree + see in debian, libpthread_clean.diff, tg-libpthread_depends.diff, + unsubmitted-pthread.diff, unsubmitted-pthread_posix_options.diff + The libpthread in Debian glibc is + hurd/libpthread.git:b428baaa85c0adca9ef4884c637f289a0ab5e2d6 but with + 25260994c812050a5d7addf125cdc90c911ca5c1 »Store self in __thread variable + instead of threadvar« reverted (why?), and the following additional + change applied to Makefile: + ifeq ($(IN_GLIBC),yes) + $(inst_libdir)/libpthread.so: + $(objpfx)libpthread.so$(libpthread.so-version) \ + $(+force) + - ln -sf $(slibdir)/libpthread.so$(libpthread.so-version) + $@ + + ln -sf libpthread.so$(libpthread.so-version) $@ + tschwinge: is there any plan to merge libpthread.git in glibc.git + upstream ? + braunr, youpi: Has not yet been discussed with Roland, as far + as I know. 
+ has not + libpthread.diff is supposed to be a verbatim copy of the repository + and then there are a couple patches which don't (yet) make sense + upstream + the slibdir change, however, is odd + it must be a leftover diff --git a/open_issues/pci_arbiter.mdwn b/open_issues/pci_arbiter.mdwn new file mode 100644 index 00000000..7730cee0 --- /dev/null +++ b/open_issues/pci_arbiter.mdwn @@ -0,0 +1,256 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + +For [[DDE]]/X.org/... + + +# IRC, freenode, #hurd, 2012-02-19 + + antrik: we should probably add a gsoc idea on pci bus arbitration + DDE is still experimental for now so it's ok that you have to + configure it by hand, but it should be automatic at some ponit + + +## IRC, freenode, #hurd, 2012-02-21 + + i'm not familiar with the new gnumach interface for userspace + drivers, but can this pci enumerator be written with it as it is ? + (i'm not asking for a precise answer, just yes - even probably - + or no) + (idk or utsl will do as well) + I'd say yes + since all drivers need is interrupts, io ports and iomem + the latter was already available through /dev/mem + io ports through the i386 rpcs + the changes provide both interrupts, and physical-contiguous + allocation + it should be way enough + youpi: ok + youpi: thanks for the details :) + braunr: this was mentioned in the context of the interrupt + forwarding interface... the original one implemented by zhengda isn't + suitable for a PCI server; but the ones proposed by youpi and tschwinge + would work + same for the physical memory interface: the current implementation + doesn't allow delegation; but I already said that it's wrong + + +# IRC, freenode, #hurd, 2012-07-15 + + youpi: Oh, BTW, I keep meaning to ask you. Could sound be done + with dde or would there still need to be some kernel work? + bddebian: we'd need a PCI arbitrer for that + for now just one userland poking with PCI is fine + but two can produce bonks + They can't use the same? + that's precisely the matter + they have to use the same + and not poke with it themselves + that's what an arbiter is for + OK, so if we don't have a PCI arbiter now, how do things like + netdde and video not collide currently? + s/netdde/network/ + or disk for that matter + bddebian: ah currently, well currently, the network is the only + thing using the pci bus + How is that possible when I have a PCI video card and disk + controller? + they are accessed through compatible means + I suppose one of the hardest parts is prioritization? + i don't think it matters much, no + bddebian: netdde and Xorg don't collide essentially because they + are not started at the same time (hopefully) + braunr: What do you mean it doesn't matter? + bddebian: well the point is rather serializing access, we don't + need more + do other systems actually schedule access to the pci bus ? 
+ From what I am reading, yes + ok + + +# IRC, freenode, #hurd, 2012-07-16 + + youpi: the lack of a PCI arbiter is a problem, but I wounldn't + consider it a precondition for adding another userspace driver + class... it's up to the user to make sure he has only one class active, + or take the risk of not doing so... + (plus, I suspect writing the arbiter is a smaller task than + implementing another DDE class anyways...) + Where would the arbiter need to reside, in gnumach? + bddebian: kernel would be one possible place (with the advantage + of running both userspace and kernel drivers without the potential for + conflicts) + but I think I would prefer a userspace server + antrik: we'd rather have PCI devices automatically set up + just like /dev/netdde is already set up for the user + so you can't count on the user + for the arbitrer, it could as well be userland, while still + interacting with the kernel for some devices + we however "just" need to get disk drivers in userland to drop PCI + drivers from kernel, actually + + +# IRC, freenode, #hurd, 2012-07-17 + + youpi: So this PCI arbiter should be a hurd server? + that'd be better + youpi: Is there anything existing to look at as a basis? + no idea off-hand + I mean you couldn't take what netdde does and generalize it? + netdde doesn't do any arbitration + + +# IRC, OFTC, #debian-hurd, 2012-07-19 + + youpi: Well at some point if you ever have time I'd like to + understand better how you see the PCI architecture working in Hurd. + I.E. would you expect the server to do enumeration and arbitration? + I'd expect both, yes, but that's probably to be discussed rather + with antrik, he's the one who took some time to think about it + netdde uses libpciaccess currently, right? + yes + libpciaccess would have to be fixed into using the arbitrer + (that'd fix xorg as well) + Man, I am still a bit unclear on how this all interacting + currently.. :( + currently it's not + and it's just by luck that it doesn't break + Long term xxxdde would use the new server, correct? + (well, we are also sure that the gnumach enumeration comes always + before the netdde enumeration, and xorg is currently not started + automatically, so its enumeration is also always after that) + yes + the server would essentially provide an interface equivalent to + libpciaccess + Right + In general, where does the pci map get "stored"? In GNU/Linux, + is it all /proc based? + what do you mean by "pci map" ? + Once I have enumerated all of the buses and devices, does it + stay stored or is it just redone for every call to a pci device? + in linux it's stored in the kernel + the abritrator would store it itself + + +# IRC, freenode, #hurd, 2012-07-20 + + antrik: BTW, youpi says you are the one to talk to for design of + a PCI server :) + oh, am I? + * antrik feels honoured :-) + I guess it's true though: I *did* spent a little thought on + it... even mentioned something in my thesis IIRC + there is one tricky aspect to it though, which I'm not sure how to + handle best: we need two different instances of libpciaccess + Why two instances of libpciaccess? + one used by the PCI server to access the hardware directly (using + the existing port poking backend), and one using a new backend to access + our PCI server... + bddebian: hum, both i guess ? + antrik: Why wouldn't the server access the hardware directly? I + thought libpciaccess was supposed to be generic on purpose? + hm... 
guess I wasn't clear + the point is that the PCI server should use the direct hardware + access backend of libpciaccess + however, *clients* should use the PCI server backend of + libpciaccess + I'm not sure backends can be selected at runtime... + which might mean that we actually have to compile two different + versions of the library. erk. + So you are saying the pci server should itself use libpci access + rather than having it's own? + admittedly, that's not the most fundamental design decision to + make ;-) + bddebian: yes. no need to rewrite (or copy) this code... + Hmm + actually that was the plan all along when I first suggested + implementing the register poking backend for libpciaccess + Hmm, not sure I like it but I am certainly in no position to + question it right now :) + why don't you like it ? + I shouldn't need an Xorg specific library to access PCI on my OS + :) + oh + Though I don't disagree that reinventing the wheel is a bit + tedious. :) + bddebian: although it originates from X.Org, I don't think there + is anything about the library technically making it X-specific... + yes that's my opinion too + (well, there are some X-specific functions IIRC, but these do not + hurt the other functionality) + But what is there is api/abi breakage? :) + s/is/if/ + BTW according to rdepends there appear to be a number of non-X + things using the library now + like, uhm, hurd + yeah, that too... we are already using it for DDE + if you have deb-src lines in your sources.list, use the + grep-dctrl power: + grep-dctrl -sPackage -FBuild-Depends libpciaccess-dev + /var/lib/apt/lists/*_source_Sources | sort -u + I know we are using it for netdde. + nice thing about it is that once we have the PCI server and an + appropriate backend for libpciaccess, the same netdde and X binaries + should work either with or without the PCI server + Then why have the server at all? + it's the arbiter + you can use the library directly only if you're the only user + and what antrik means is that the interface should be the same for + both modes + Ugh, that is where I am getting confused + In that case shouldn't everything use libpciaccess and the PCI + server has to arbitrate the requests? + bd ? + bddebian: yes + bddebian: but they use the indirect version of the library + whereas the server uses the raw version + OK, I gotcha (I think) + (but they both provide the same interface, so if you don't have a + pci server and you know you're the only user, the direct version can be + used) + But I am not sure I see the difference between creating a second + library or just moving the raw access to the PCI server :) + uh, there is no difference in that + and you shouldn't do it + (if that's what antrik meant at least) + if you can select the backend (raw or pci server) easily, then + stick to the same code base + That's where I struggle. In my worthless opinion, raw access + should be the OS job while indirect access would be the libraries + responsibility + that's true + but as an optimization, if an application is the only user, it can + directly use raw access + How would you know that? + I'm sorry if these are dumb questions + hum, don't try to make this behaviour automatic + it would be selected by the user through command line switches + But the OS itself uses PCI for things like disk access and + video, no? 
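+
+Whatever backend ends up selected -- today's raw port poking, or a
+hypothetical PCI-server backend later -- consumers keep calling the very same
+libpciaccess interface, which is the point being made in this discussion.  As
+a minimal sketch (standard libpciaccess calls only, nothing Hurd-specific),
+client-side enumeration looks like:
+
+    /* Enumerate PCI devices through libpciaccess; netdde and X.Org go
+       through this same interface, so an arbiter only has to slot in
+       below it as another backend.  */
+    #include <stdio.h>
+    #include <pciaccess.h>
+
+    int
+    main (void)
+    {
+      if (pci_system_init ())
+        return 1;
+
+      /* A NULL slot match iterates over every device.  */
+      struct pci_device_iterator *it = pci_slot_match_iterator_create (NULL);
+      struct pci_device *dev;
+
+      while ((dev = pci_device_next (it)) != NULL)
+        {
+          pci_device_probe (dev);
+          printf ("%04x:%02x:%02x.%u %04x:%04x\n",
+                  dev->domain, dev->bus, dev->dev, dev->func,
+                  dev->vendor_id, dev->device_id);
+        }
+
+      pci_iterator_destroy (it);
+      pci_system_cleanup ();
+      return 0;
+    }
+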
+ (it could be automatic but it makes things more complicated) + you don't need an arbiter all the time + i can't tell you more, wait for antrik to return + i realize i might already have said some bullshit + bddebian: well, you have a point there that once we have the + arbiter and use it for everthing, it isn't strictly useful to still have + the register poking in the library + however, the code will remain in the library anyways, so we better + continue using it rather than introducing redundancy... + but again, that's rather a side issue concerning the design of the + PCI server + antrik: Fair enough. :) So how would I even start on this? + bddebian: actually, libpciaccess is a good starting point: + checking the API should give you a fairly good idea what functionality + the server needs to implement + (+1 on library (re)use) + antrik: KK + sorry, I'm a bit busy right now... diff --git a/open_issues/performance.mdwn b/open_issues/performance.mdwn index 8dbe1160..ec14fa52 100644 --- a/open_issues/performance.mdwn +++ b/open_issues/performance.mdwn @@ -52,3 +52,32 @@ call|/glibc/fork]]'s case. the more i study the code, the more i think a lot of time is wasted on cpu, unlike the common belief of the lack of performance being only due to I/O + + +## IRC, freenode, #hurd, 2012-07-23 + + there are several kinds of scalability issues + iirc, i found some big locks in core libraries like libpager and + libdiskfs + but anyway we can live with those + in the case i observed, ext2fs, relying on libdiskfs and libpager, + scans the entire file list to ask for writebacks, as it can't know if the + pages are dirty or not + the mistake here is moving part of the pageout policy out of the + kernel + so it would require the kernel to handle periodic synces of the + page cache + braunr: as for big locks: considering that we don't have any SMP + so far, does it really matter?... + antrik: yes + we have multithreading + there is no reason to block many threads while if most of them + could continue + -while + so that's more about latency than throughput? + considering sleeping/waking is expensive, it's also about + throughput + currently, everything that deals with sleepable locks (both + gnumach and the hurd) just wake every thread waiting for an event when + the event occurs (there are a few exceptions, but not many) + ouch diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn index 710c746b..657318cd 100644 --- a/open_issues/performance/io_system/read-ahead.mdwn +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -1565,3 +1565,283 @@ License|/fdl]]."]]"""]] mcsim1: just use sane values inside the kernel :p this simplifies things by only adding the new vm_advise call and not change the existing external pager interface + + +## IRC, freenode, #hurd, 2012-07-12 + + mcsim: so, to begin with, tell us what state you've reached please + braunr: I'm writing code for hurd and gnumach. For gnumach I'm + implementing memory policies now. RANDOM and NORMAL seems work, but in + hurd I found error that I made during editing ext2fs. So for now ext2fs + does not work + policies ? + what about mechanism ? + also I moved some translators to new interface. + It works too + well that's impressive + braunr: I'm not sure yet that everything works + right, but that's already a very good step + i thought you were still working on the interfaces to be honest + And with mechanism I didn't implement moving pages to inactive + queue + what do you mean ? 
+ ah you mean with the sequential policy ? + yes + you can consider this a secondary goal + sequential I was going to implement like you've said, but I still + want to support moving pages to inactive queue + i think you shouldn't + first get to a state where clustered transfers do work fine + policies are implemented in function calculate_clusters + then, you can try, and measure the difference + ok. I'm now working on fixing ext2fs + so, except from bug squashing, what's left to do ? + finish policies and ext2fs; move fatfs, ufs, isofs to new + interface; test this all; edit patches from debian repository, that + conflict with my changes; rearrange commits and fix code indentation; + update documentation; + think about measurements too + mcsim: Please don't spend a lot of time on ufs. No testing + required for that one. + and keep us informed about your progress on bug fixing, so we can + test soon + Forgot about moving system to new interfaces (I mean determine form + of vm_advise and memory_object_change_attributes) + s/determine/final/ + braunr: ok. + what do you mean "moving system to new interfaces" ? + braunr: I also pushed code changes to gnumach and hurd git + repositories + I met an issue with memory_object_change_attributes when I tried to + use it as I have to update all applications that use it. This includes + libc and translators that are not in hurd repository or use debian + patches. So I will not be able to run system with new + memory_object_change_attributes interface, until I update all software + that use this rpc + this is a bit like the problem i had with my change + the solution is : don't do it + i mean, don't change the interface in an incompatible way + if you can't change an existing call, add a new one + temporary I changed memory_object_set_attributes as it isn't used + any more. + braunr: ok. Adding new call is a good idea :) + + +## IRC, freenode, #hurd, 2012-07-16 + + mcsim: how did you deal with multiple page transfers towards the + default pager ? + braunr: hello. Didn't handle this yet, but AFAIR default pager + supports multiple page transfers. + mcsim: i'm almost sure it doesn't + braunr: indeed + braunr: So, I'll update it just other translators. + like other translators you mean ? + *just as + braunr: yes + ok + be aware also that it may need some support in vm_pageout.c in + gnumach + braunr: thank you + if you see anything strange in the default pager, don't hesitate + to talk about it + braunr: ok. I didn't finish with ext2fs yet. + so it's a good thing you're aware of it now, before you begin + working on it :) + braunr: I'm working on ext2 now. + yes i understand + i meant "before beginning work on the default pager" + ok + + mcsim: BTW, we were mostly talking about readahead (pagein) over + the past weeks, so I wonder what the status on clustered page*out* is?... + antrik: I don't work on this, but following, I think, is an example + of *clustered* pageout: _pager_seqnos_memory_object_data_return: object = + 113, seqno = 4, control = 120, start_address = 0, length = 8192, dirty = + 1. This is an example of debugging printout that shows that pageout + manipulates with chunks bigger than page sized. + antrik: Another one with bigger length + _pager_seqnos_memory_object_data_return: object = 125, seqno = 124, + control = 132, start_address = 131072, length = 126976, dirty = 1, kcopy + mcsim: that's odd -- I didn't know the functionality for that even + exists in our codebase... 
+ my understanding was that Mach always sends individual pageout + requests for ever single page it wants cleaned... + (and this being the reason for the dreadful thread storms we are + facing...) + antrik: ok + antrik: yes that's what is happening + the thread storms aren't that much of a problem now + (by carefully throttling pageouts, which is a task i intend to + work on during the following months, this won't be an issue any more) + + +## IRC, freenode, #hurd, 2012-07-19 + + I moved fatfs, ufs, isofs to new interface, corrected some errors + in other that I already moved, moved kernel to new interface (renamed + vm_advice to vm_advise and added rpcs memory_object_set_advice and + memory_object_get_advice). Made some changes in mechanism and tried to + finish ext2 translator. + braunr: I've got an issue with fictitious pages... + When I determine bounds of cluster in external object I never know + its actual size. So, mo_data_request call could ask data that are behind + object bounds. The problem is that pager returns data that it has and + because of this fictitious pages that were allocated are not freed. + why don't you know the size ? + I see 2 solutions. First one is do not allocate fictitious pages at + all (but I think that there could be issues). Another lies in allocating + fictitious pages, but then freeing them with mo_data_lock. + braunr: Because pages does not inform kernel about object size. + i don't understand what you mean + I think that second way is better. + so how does it happen ? + you get a page fault + Don't you understand problem or solutions? + then a lookup in the map finds the map entry + and the map entry gives you the link to the underlying object + from vm_object.h: vm_size_t size; /* + Object size (only valid if internal) */ + mcsim: ugh + For external they are either 0x8000 or 0x20000... + and for internal ? + i'm very surprised to learn that + braunr: for internal size is actual + right sorry, wrong question + did you find what 0x8000 and 0x20000 are ? + for external I met only these 2 magic numbers when printed out + arguments of functions _pager_seqno_memory_object_... when they were + called. + yes but did you try to find out where they come from ? + braunr: no. I think that 0x2000(many zeros) is maximal possible + object size. + what's the exact value ? + can't tell exactly :/ My hurd box has broken again. + mcsim: how does the vm find the backing content then ? + braunr: Do you know if it is guaranteed that map_entry size will be + not bigger than external object size? + mcsim: i know it's not + but you can use the map entry boundaries though + braunr: vm asks pager + but if the page is already present + how does it know ? + it must be inside a vm_object .. + If I can use these boundaries than the problem, I described is not + actual. + good + it makes sense to use these boundaries, as the application can't + use data outside the mapping + I ask page with vm_page_lookup + it would matter for shared objects, but then they have their own + faults :p + ok + so the size is actually completely ignord + if it is present than I stop expansion of cluster. + which makes sense + braunr: yes, for external. + all right + use the mapping boundaries, it will do + mcsim: i have only one comment about what i could see + mcsim: there are 'advice' fields in both vm_map_entry and + vm_object + there should be something else in vm_object + i told you about pages before and after + mcsim: how are you using this per object "advice" currently ? 
+ (in addition, using the same name twice for both mechanism and + policy is very sonfusing) + confusing* + braunr: I try to expand cluster as much as it possible, but not + much than limit + they both determine policy, but advice for entry has bigger + priority + that's wrong + mapping and content shouldn't compete for policy + the mapping tells the policy (=the advice) while the content tells + how to implement (e.g. how much content) + IMO, you could simply get rid of the per object "advice" field and + use default values for now + braunr: What sense these values for number of pages before and + after should have? + or use something well known, easy, and effective like preceding + and following pages + they give the vm the amount of content to ask the backing pager + braunr: maximal amount, minimal amount or exact amount? + neither + that's why i recommend you forget it for now + but + imagine you implement the three standard policies (normal, random, + sequential) + then the pager assigns preceding and following numbers for each of + them, say [5;5], [0;0], [15;15] respectively + these numbers would tell the vm how many pages to ask the pagers + in a single request and from where + braunr: but in fact there could be much more policies. + yes + also in kernel context there is no such unit as pager. + so there should be a call like memory_object_set_advice(int + advice, int preceding, int following); + for example + what ? + the pager is the memory manager + it does exist in kernel context + (or i don't understand what you mean) + there is only port, but port could be either pager or something + else + no, it's a pager + it's a port whose receive right is hold by a task implementing the + pager interface + either the default pager or an untrusted task + (or null if the object is anonymous memory not yet sent to the + default pager) + port is always pager? + the object port is, yes + struct ipc_port *pager; /* Where to get + data */ + So, you suggest to keep set of advices for each object? + i suggest you don't change anything in objects for now + keep the advice in the mappings only, and implement default + behaviour for the known policies + mcsim: if you understand this point, then i have nothing more to + say, and we should let nowhere_man present his work + braunr: ok. I'll implement only default behaviors for know policies + for now. + (actually, using the mapping boundaries is slightly unoptimal, as + we could have several mappings for the same content, e.g. a program with + read only executable mapping, then ro only) + mcsim: another way to know the "size" is to actually lookup for + pages in objects + hm no, that's not true + braunr: But if there is no page we have to ask it + and I don't understand why using mappings boundaries is unoptimal + here is bash + 0000000000400000 868K r-x-- /bin/bash + 00000000006d9000 36K rw--- /bin/bash + two entries, same file + (there is the anonymous memory layer for the second, but it would + matter for the first cow faults) + + +## IRC, freenode, #hurd, 2012-08-02 + + braunr: You said that I probably need some support in vm_pageout.c + to make defpager work with clustered page transfers, but TBH I thought + that I have to implement only pagein. Do you expect from me implementing + pageout either? Or I misunderstand role of vm_pageout.c? + no + you're expected to implement only pagins for now + pageins + well, I'm finishing merging of ext2fs patch for large stores and + work on defpager in parallel. 
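+
+The per-policy defaults suggested above ([5;5], [0;0], [15;15] for normal,
+random and sequential), clamped to the boundaries of the faulted mapping,
+could look roughly like the following.  This is a hedged sketch: the names,
+the VM_ADVICE_* values and the clamping details are illustrative, not the
+code mcsim is actually writing.
+
+    /* Hedged sketch of fixed per-policy clustering defaults.  */
+    #include <mach/vm_param.h>    /* PAGE_SIZE and friends */
+
+    enum { VM_ADVICE_NORMAL, VM_ADVICE_RANDOM, VM_ADVICE_SEQUENTIAL };
+
+    static const struct
+    {
+      vm_size_t before, after;    /* pages around the faulted page */
+    } advice_defaults[] =
+    {
+      [VM_ADVICE_NORMAL]     = {  5,  5 },
+      [VM_ADVICE_RANDOM]     = {  0,  0 },
+      [VM_ADVICE_SEQUENTIAL] = { 15, 15 },
+    };
+
+    /* Clamp the cluster to the faulted map entry, since the
+       application cannot access anything outside its mapping anyway.  */
+    static void
+    compute_cluster (vm_offset_t fault, vm_offset_t entry_start,
+                     vm_offset_t entry_end, int advice,
+                     vm_offset_t *start, vm_offset_t *end)
+    {
+      vm_offset_t s = fault - advice_defaults[advice].before * PAGE_SIZE;
+      vm_offset_t e = fault + (advice_defaults[advice].after + 1) * PAGE_SIZE;
+
+      if (s > fault || s < entry_start)    /* underflow, or below entry */
+        s = entry_start;
+      if (e < fault || e > entry_end)      /* overflow, or past entry */
+        e = entry_end;
+
+      *start = s;
+      *end = e;
+    }
+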
+ braunr: Also I didn't get your idea about configuring of paging + mechanism on behalf of pagers. + which one ? + braunr: You said that pager has somehow pass size of desired + clusters for different paging policies. + mcsim: i said not to care about that + and the wording isn't correct, it's not "on behalf of pagers" + servers? + pagers could tell the kernel what size (before and after a faulted + page) they prefer for each existing policy + but that's one way to do it + defaults work well too + as shown in other implementations diff --git a/open_issues/pfinet_vs_system_time_changes.mdwn b/open_issues/pfinet_vs_system_time_changes.mdwn index 46705047..09b00d30 100644 --- a/open_issues/pfinet_vs_system_time_changes.mdwn +++ b/open_issues/pfinet_vs_system_time_changes.mdwn @@ -11,14 +11,16 @@ License|/fdl]]."]]"""]] [[!tag open_issue_hurd]] -IRC, unknown channel, unknown date. + +# IRC, unknown channel, unknown date I did a sudo date... and the machine hangs -This was very likely a misdiagnosis: +This was very likely a misdiagnosis. + -IRC, freenode, #hurd, 2011-03-25: +# IRC, freenode, #hurd, 2011-03-25 antrik: I suspect it'S some timing stuff in pfinet that perhaps uses absolute time, and somehow wildely gets confused? @@ -42,7 +44,8 @@ IRC, freenode, #hurd, 2011-03-25: wrap-around, and thus the same result.) Yes. -IRC, freenode, #hurd, 2011-10-26: + +# IRC, freenode, #hurd, 2011-10-26 anyways, when ntpdate adjusts to the past, the connections hang, roughly for the amount of time being adjusted @@ -50,7 +53,8 @@ IRC, freenode, #hurd, 2011-10-26: (well, if it's long enough, they probably timeout on the other side...) -IRC, freenode, #hurd, 2011-10-27: + +# IRC, freenode, #hurd, 2011-10-27 oh, another interesting thing I observed is that the the subhurd pfinet did *not* drop the connection... only the main Hurd one. I thought @@ -60,7 +64,8 @@ IRC, freenode, #hurd, 2011-10-27: where I set the date is affected, and not the pfinet in the other instance -IRC, freenode, #hurd, 2012-06-28: + +# IRC, freenode, #hurd, 2012-06-28 great, now setting the date/time fucked my machine yes, we lack a monotonic clock @@ -80,3 +85,17 @@ IRC, freenode, #hurd, 2012-06-28: it fucked me because I now cannot get to it.. :) bddebian: that's odd... you should be able to just log in again IIRC + + +# IRC, freenode, #hurd, 2012-07-29 + + pfinet can't cope with larger system time changes because it can't + use a monotonic clock + +[[clock_gettime]]. + + well when librt becomes easily usable everywhere (it it's + possible), it will be quite easy to work around this issue + yes and no, you just need a monotonic clock and clock_gettime + able to use it + why "no" ? 
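+
+The work-around discussed above amounts to driving pfinet's timers from a
+monotonic clock instead of the wall clock, so that date changes cannot make
+pending timeouts jump.  Assuming a working clock_gettime (CLOCK_MONOTONIC)
+-- precisely what the [[clock_gettime]] issue is about -- the time-keeping
+helper is small:
+
+    /* Sketch: monotonic millisecond counter for protocol timers.
+       Unlike gettimeofday, this is unaffected by sudden date changes.  */
+    #include <time.h>
+
+    static long long
+    monotonic_ms (void)
+    {
+      struct timespec ts;
+
+      clock_gettime (CLOCK_MONOTONIC, &ts);
+      return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
+    }
+
+    /* A retransmit deadline computed as monotonic_ms () + delay stays
+       valid even if someone runs ntpdate or date in the meantime.  */
+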
diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn index abec304d..6bed94ca 100644 --- a/open_issues/select.mdwn +++ b/open_issues/select.mdwn @@ -215,6 +215,1186 @@ IRC, unknown channel, unknown date: it's better than nothing yes +# IRC, freenode, #hurd, 2012-07-21 + + damn, select is actually completely misdesigned :/ + iiuc, it makes servers *block*, in turn :/ + can't be right + ok i understand it better + yes, timeouts should be passed along with the other parameters to + correctly implement non blocking select + (or the round-trip io_select should only ask for notification + requests instead of making a server thread block, but this would require + even more work) + adding the timeout in the io_select call should be easy enough for + whoever wants to take over a not-too-complicated-but-not-one-liner-either + task :) + braunr: why is a blocking server thread a problem? + antrik: handling the timeout at client side while server threads + block is the problem + the timeout must be handled along with blocking obviously + so you either do it at server side when async ipc is available, + which is the case here + or request notifications (synchronously) and block at client side, + waiting forthose notifications + braunr: are you saying the client has a receive timeout, but when + it elapses, the server thread keeps on blocking?... + antrik: no i'm referring to the non-blocking select issue we have + antrik: the client doesn't block in this case, whereas the servers + do + which obviously doesn't work .. + see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=79358 + this is the reason why vim (and probably others) are slow on the + hurd, while not consuming any cpu + the current work around is that whenevever a non-blocking select + is done, it's transformed into a blocking select with the smallest + possible timeout + whenever* + braunr: well, note that the issue only began after fixing some + other select issue... it was fine before + apparently, the issue was raised in 2000 + also, note that there is a delay between sending the io_select + requests and blocking on the replies + when machines were slow, this delay could almost guarantee a + preemption between these steps, making the servers reply soon enough even + for a non blocking select + the problem occurs when sending all the requests and checking for + replies is done before servers have a chance the send the reply + braunr: I don't know what issue was raised in 2000, but I do know + that vim worked perfectly fine until last year or so. then some select + fix was introduced, which in turn broke vim + antrik: could be the timeout rounding, Aug 2 2010 + hum but, the problem wasn't with vim + vim does still work fine (in fact, glibc is patched to check some + well known process names and selectively fix the timeout) + which is why vim is fast and view isn't + the problem was with other services apparently + and in order to fix them, that workaround had to be introduced + i think it has nothing to do with the timeout rounding + it must be the time when youpi added the patch to the debian + package + braunr: the problem is that with the patch changing the timeout + rounding, vim got extremely slow. this is why the ugly hacky exception + was added later... + after reading the report, I agree that the timeout needs to be + handled by the server. at least the timeout=0 case. 
+ vim uses often 0-time selects to check whether there's input + client-side handling might still be OK for other timeout settings + I guess + I'm a bit ambivalent about that + I tend to agree with Neal though: it really doesn't make much + sense to have a client-side watchdog timer for this specific call, while + for all other ones we trust the servers not to block... + or perhaps not. for standard sync I/O, clients should expect that + an operation could take long (though not forever); but they might use + select() precisely to avoid long delays in I/O... so it makes some sense + to make sure that select() really doesn't delay because of a busy server + OTOH, unless the server is actually broken (in which anything + could happen), a 0-time select should never actually block for an + extended period of time... I guess it's not wrong to trust the servers on + that + pinotree: hm... that might explain a certain issue I *was* + observing with Vim on Hurd -- though I never really thought about it + being an actual bug, as opposed to just general Hurd sluggishness... + but it makes sense now + antrik: + http://patch-tracker.debian.org/patch/series/view/eglibc/2.13-34/hurd-i386/local-select.diff + so I guess we all agree that moving the select timeout to the + server is probably the most reasonably approach... + braunr: BTW, I wouldn't really consider the sync vs. async IPC + cases any different. the client blocks waiting for the server to reply + either way... + the only difference is that in the sync IPC case, the server might + want to take some special precaution so it doesn't have to block until + the client is ready to receive the reply + but that's optional and not really select-specific I'd say + (I'd say the only sane approach with sync IPC is probably for the + server never to wait -- if the client fails to set up for receiving the + reply in time, it looses...) + and with the receive buffer approach in Viengoos, this can be done + really easy and nice :-) + + +## IRC, freenode, #hurd, 2012-07-22 + + antrik: you can't block in servers with sync ipc + so in this case, "select" becomes a request for notifications + whereas with async ipc, you can, so it's less efficient to make a + full round trip just to ask for requests when you can just do async + requests (doing the actual blocking) and wait for any reply after + braunr: I don't understand. why can't you block in servers with + async IPC? + braunr: err... with sync IPC I mean + antrik: because select operates on more than one fd + braunr: and what does that got to do with sync vs. async IPC?... + maybe you are thinking of endpoints here, which is a whole + different story + traditional L4 has IPC ports bound to specific threads; so + implementing select requires a separate client thread for each + server. but that's not mandatory for sync IPC. Viengoos has endpoints not + bound to threads + antrik: i don't know what "endpoint" means here + but, you can't use sync IPC to implement select on multiple fds + (and thus possibly multiple servers) by blocking in the servers + you'd block in the first and completely miss the others + braunr: I still don't see why... 
or why async IPC would change + anything in that regard + antrik: well, you call select on 3 fds, each implemented by + different servers + antrik: you call a sync select on the first fd, obviously you'll + block there + antrik: if it's async, you don't block, you just send the + requests, and wait for any reply + like we do + braunr: I think you might be confused about the meaning of sync + IPC. it doesn't in any way imply that after sending an RPC request you + have to block on some particular reply... + antrik: what does sync mean then? + braunr: you can have any number of threads listening for replies + from the various servers (if using an L4-like model); or even a single + thread, if you have endpoints that can listen on replies from different + sources (which was pretty much the central concern in the Viengoos IPC + design AIUI) + antrik: I agree with your "so it makes some sense to make sure that + select() really doesn't delay because of a busy server" (for blocking + select) and "OTOH, unless the server is actually broken (in which + anything could happen), a 0-time select should never actually block" (for + non-blocking select) + youpi: regarding the select, I was thinking out loud; the former + statement was mostly cancelled by my later conclusions... + and I'm not sure the latter statement was quite clear + do you know when it was? + after rethinking it, I finally concluded that it's probably *not* + a problem to rely on the server to observe the timout. if it's really + busy, it might take longer than the designated timeout (especially if + timeout is 0, hehe) -- but I don't think this is a problem + and if it doens't observe the timout because it's + broken/malicious, that's not more problematic that any other RPC the + server doesn't handle as expected + ok + did somebody wrote down the conclusion "let's make select timeout + handled at server side" somewhere? + youpi: well, neal already said that in a followup to the select + issue Debian bug... and after some consideration, I completely agree with + his reasoning (as does braunr) + + +## IRC, freenode, #hurd, 2012-07-23 + + antrik: i was meaning sync in the most common meaning, yes, the + client blocking on the reply + braunr: I think you are confusing sync IPC with sync I/O ;-) + braunr: by that definition, the vast majority of Hurd IPC would be + sync... but that's obviously not the case + synchronous IPC means that send and receive happen at the same + time -- nothing more, nothing less. that's why it's called synchronous + antrik: yes + antrik: so it means the client can't continue unless he actually + receives + in a pure sync model such as L4 or EROS, this means either the + sender or the receiver has to block, so synchronisation can happen. which + one is server and which one is client is completely irrelevant here -- + this is about individual message transfer, not any RPC model on top of it + i the case of select, i assume sender == client + in Viengoos, the IPC is synchronous in the sense that transfer + from the send buffer to the receive buffer happens at the same time; but + it's asynchronous in the sense that the receiver doesn't necessarily have + to be actively waiting for the incoming message + ok, i was talking about a pure sync model + (though it most cases it will still do so...) + braunr: BTW, in the case of select, the sender is *not* the + client. 
the reply is relevant here, not the request -- so the client is + the receiver + (the select request is boring) + sorry, i don't understand, you seem to dismiss the select request + for no valid reason + I still don't see how sync vs. async affects the select reply + receive though... blocking seems the right approach in either case + blocking is required + but you either block in the servers, or in the client + (and if blocking in the servers, the client also blocks) + i'll explain how i see it again + there are two approaches to implementing select + 1/ send requests to all servers, wait for any reply, this is what + the hurd does + but it's possible because you can send all the requests without + waiting for the replies + 2/ send notification requests, wait for a notification + this doesn't require blocking in the servers (so if you have many + clients, you don't need as many threads) + i was wondering which approach was used by the hurd, and if it + made sense to change + TBH I don't see the difference between 1) and 2)... whether the + message from the server is called an RPC reply or a notification is just + a matter of definition + I think I see though what you are getting at + with sync IPC, if the client sent all requests and only afterwards + started to listen for replies, the servers might need to block while + trying to deliver the reply because the client is not ready yet + that's one thing yes + but even in the sync case, the client can immediately wait for + replies to each individual request -- it might just be more complicated, + depending on the specifics of the IPC design + what i mean by "send notification requests" is actually more than + just sending, it's a complete RPC + and notifications are non-blocking, yes + (with L4, it would require a separate client thread for each + server contacted... which is precisely why a different mechanism was + designed for Viengoos) + seems weird though + don't they have a portset like abstraction ? + braunr: well, having an immediate reply to the request and a + separate notification later is just a waste of resources... the immediate + reply would have no information value + no, in original L4 IPC is always directed to specific threads + antrik: some could see the waste of resource as being the + duplication of the number of client threads in the server + you could have one thread listening to replies from several + servers -- but then, replies can get lost + i see + (or the servers have to block on the reply) + so, there are really no capabilities in the original l4 design ? + though I guess in the case of select() it wouldn't really matter + if replies get lost, as long as at least one is handled... would just + require the listener thread by separate from the thread sending the + requests + braunr: right. no capabilities of any kind + that was my initial understanding too + thanks + so I partially agree: in a purely sync IPC design, it would be + more complicated (but not impossible) to make sure the client gets the + replies without the server having to block while sending replies + + arg, we need hurd_condition_timedwait (and possible + condition_timedwait) to cleanly fix io_select + luckily, i still have my old patch for condition_timedwait :> + bddebian: in order to implement timeouts in select calls, servers + now have to use a hurd_condition_timedwait function + is it possible that a thread both gets canceled and timeout on a + wait ? 
+ looks unlikely to me + + hm, i guess the same kind of compatibility constraints exist for + hurd interfaces + so, should we have an io_select1 ? + braunr: I would use a more descriptive name: io_select_timeout() + antrik: ah yes + well, i don't really like the idea of having 2 interfaces for the + same call :) + because all select should be select_timeout :) + but ok + antrik: actually, having two select calls may be better + oh it's really minor, we do'nt care actually + braunr: two select calls? + antrik: one with a timeout and one without + the glibc would choose at runtime + right. that was the idea. like with most transitions, that's + probably the best option + there is no need to pass the timeout value if it's not needed, and + it's easier to pass NULL this way + oh + nah, that would make the transition more complicated I think + ? + ok + :) + this way, it becomes very easy + the existing io_select call moves into a select_common() function + the old variant doesn't know that the server has to return + immediately; changing that would be tricky. better just use the new + variant for the new behaviour, and deprecate the old one + and the entry points just call this common function with either + NULL or the given timeout + no need to deprecate the old one + that's what i'm saying + and i don't understand "the old variant doesn't know that the + server has to return immediately" + won't the old variant block indefinitely in the server if there + are no ready fds? + yes it will + oh, you mean using the old variant if there is no timeout value? + yes + well, I guess this would work + well of course, the question is rather if we want this or not :) + hm... not sure + we need something to improve the process of changing our + interfaces + it's really painful currnelty + inside the servers, we probably want to use common code + anyways... so in the long run, I think it simplifies the code when we can + just drop the old variant at some point + a lot of the work we need to do involves changing interfaces, and + we very often get to the point where we don't know how to do that and + hardly agree on a final version : + :/ + ok but + how do you tell the server you don't want a timeout ? + a special value ? like { -1; -1 } ? + hm... good point + i'll do it that way for now + it's the best way to test it + which way you mean now? + keeping io_select as it is, add io_select_timeout + yeah, I thought we agreed on that part... the question is just + whether io_select_timeout should also handle the no-timeout variant going + forward, or keep io_select for that. I'm really not sure + maybe I'll form an opinion over time :-) + but right now I'm undecided + i say we keep io_select + anyway it won't change much + we can just change that at the end if we decide otherwise + right + even passing special values is ok + with a carefully written hurd_condition_timedwait, it's very easy + to add the timeouts :) + antrik, braunr: I'm wondering, another solution is to add an + io_probe, i.e. the server has to return an immediate result, and the + client then just waits for all results, without timeout + that'd be a mere addition in the glibc select() call: when timeout + is 0, use that, and otherwise use the previous code + the good point is that it looks nicer in fs.defs + are there bad points? + (I don't have the whole issues in the mind now, so I'm probably + missing things) + youpi: the bad point is duplicating the implementation maybe + what duplication ? 
+ ah you mean for the select case + yes + although it would be pretty much the same + that is, if probe only, don't enter the wait loop + could that be just some ifs here and there? + (though not making the code easier to read...) + hm i'm not sure it's fine + in that case oi_select_timeout looks ncier ideed :) + my problem with the current implementation is having the timeout + at the client side whereas the server side is doing the blocking + I wonder how expensive a notification is, compared to blocking + a blocking indeed needs a thread stack + (and kernel thread stuff) + with the kind of async ipc we have, it's still better to do it + that way + and all the code already exists + having the timeout at the client side also have its advantage + has* + latency is more precise + so the real problem is indeed the non blocking case only + isn't it bound to kernel ticks anyway ? + uh, not if your server sucks + or is loaded for whatever reason + ok, that's not what I understood by "precision" :) + I'd rather call it robustness :) + hm + right + there are several ways to do this, but the io_select_timeout one + looks fine to me + and is already well on its way + and it's reliable + (whereas i'm not sure about reliability if we keep the timeout at + client side) + btw make the timeout nanoseconds + ?? + pselect uses timespec, not timeval + do we want pselect ? + err, that's the only safe way with signals + not only, no + and poll is timespec also + not only?? + you mean ppol + ppoll + no, poll too + by "the only safe way", I mean for select calls + i understand the race issue + ppoll is a gnu extension + int poll(struct pollfd *fds, nfds_t nfds, int timeout); + ah, right, I was also looking at ppoll + any + way + we can use nanosecs + most event loops use a pipe or a socketpair + there's no reason not to + youpi: I briefly considered special-casisg 0 timeouts last time we + discussed this; but I concluded that it's probably better to handle all + timeouts server-side + I don't see why we should even discuss that + and translate signals to writes into the pipe/socketpair + antrik: ok + you can't count on select() timout precision anyways + a few ms more shouldn't hurt any sanely written program + braunr: "most" doesn't mean "all" + there *are* applications which use pselect + well mach only handles millisedonds + seconds + and it's not going out of the standard + mach is not the hurd + if we change mach, we can still keep the hurd ipcs + anyway + agagin + I reallyt don't see the point of the discussion + is there anything *against* using nanoseconds? + i chose the types specifically because of that :p + but ok i can change again + becaus what?? + i chose to use mach's native time_value_t + because it matches timeval nicely + but it doesn't match timespec nicely + no it doesn't + should i add a hurd specific time_spec_t then ? + "how do you tell the server you don't want a timeout ? a special + value ? like { -1; -1 } ?" + you meant infinite blocking? + youpi: yes + oh right, pselect is posix + actually posix says that there can be limitations on the maximum + timeout supported, which should be at least 31 days + -1;-1 is thus fine + yes + which is why i could choose time_value_t (a struct of 2 integer_t) + well, I'd say gnumach could grow a nanosecond-precision time value + e.g. for clock_gettime precision and such + so you would prefer me adding the time_spec_t time to gnumach + rather than the hurd ? 
+ well, if hurd RPCs are using mach types and there's no mach type + for nanoseconds, it m akes sense to add one + I don't know about the first part + yes some hurd itnerfaces also use time_value_t + in general, I don't think Hurd interfaces should rely on a Mach + timevalue. it's really only meaningful when Mach is involved... + we could even pass the time value as an opaque struct. don't + really need an explicit MIG type for that. + opaque ? + an opaque type would be a step backward from multi-machine support + ;) + youpi: that's a sham anyways ;-) + what? + ah, using an opaque type, yes :) + probably why my head bugged while reading that + it wouldn't be fully opaque either. it would be two ints, right? + even if Mach doesn't know what these two ints mean, it still could to + byte order conversion, if we ever actually supported setups where it + matters... + so uh, should this new time_spec_t be added in gnumach or the hurd + ? + youpi: you're the maintainer, you decide :p + *** antrik (~olaf@port-92-195-60-96.dynamic.qsc.de) has joined channel + #hurd + well, I don't like deciding when I didn't even have read fs.defs :) + but I'd say the way forward is defining it in the hurd + and put a comment "should be our own type" above use of the mach + type + ok + *** antrik (~olaf@port-92-195-60-96.dynamic.qsc.de) has quit: Remote host + closed the connection + and, by the way, is using integer_t fine wrt the 64-bits port ? + I believe we settled on keeping integer_t a 32bit integer, like xnu + does + *** elmig (~elmig@a89-155-34-142.cpe.netcabo.pt) has quit: Quit: leaving + ok so it's not + *** antrik (~olaf@port-92-195-60-96.dynamic.qsc.de) has joined channel + #hurd + uh well + why "not" ? + keeping it 32-bits for the 32-bits userspace hurd + but i'm talking about a true 64-bits version + wouldn't integer_t get 64-bits then ? + I meant we settled on a no + like xnu does + xnu uses 32-bits integer_t even when userspace runs in 64-bits + mode ? + because things for which we'd need 64bits then are offset_t, + vm_size_t, and such + yes + ok + youpi: but then what is the type to use for long integers ? + or uintptr_t + braunr: uintptr_t + the mig type i mean + type memory_object_offset_t = uint64_t; + (and size) + well that's a 64-bits type + well, yes + natural_t and integer_t were supposed to have the processor word + size + probably I didn't understand your question + if we remove that property, what else has it ? + yes, but see rolands comment on this + ah ? + ah, no, he just says the same + braunr: well, it's debatable whether the processor word size is + really 64 bit on x86_64... + all known compilers still consider int to be 32 bit + (and int is the default word size) + not really + as in? + the word size really is 64-bits + the question concerns the data model + with ILP32 and LP64, int is always 32-bits, and long gets the + processor word size + and those are the only ones current unices support + (which is why long is used everywhere for this purpose instead of + uintptr_t in linux) + I don't think int is 32 bit on alpha? + (and probably some other 64 bit arches) + also, assuming we want to maintain the ability to support single + system images, do we really want RPC with variable size types ? 
+ antrik: linux alpha's int is 32bit + sparc64 too + I don't know any 64bit port with 64bit int + i wonder how posix will solve the year 2038 problem ;p + time_t is a long + the hope is that there'll be no 32bit systems by 2038 :) + :) + but yes, that matters to us + number of seconds should not be just an int + we can force a 64-bits type then + i tend to think we should have no variable size type in any mig + interface + youpi: so, new hurd type, named time_spec_t, composed of two + 64-bits signed integers + braunr: i added that in my prototype of monotonic clock patch + for gnumach + oh + braunr: well, 64bit is not needed for the nanosecond part + right + it will be aligned anyway :p + I know + uh, actually linux uses long there + pinotree: i guess your patch is still in debian ? + youpi: well yes + youpi: why wouldn't it ? :) + no, never applied + braunr: because 64bit is not needed + ah, i see what you mean + oh, posix says longa ctually + *exactly* long + i'll use the same sizes + so it fits nicely with timespec + hm + but timespec is only used at the client side + glibc would simply move the timespec values into our hurd specific + type (which can use 32-bits nanosecs) and servers would only use that + type + all right, i'll do it that way, unless there are additional + comments next morning :) + braunr: we never supported federations, and I'm pretty sure we + never will. the remnants of network IPC code were ripped out some years + ago. some of the Hurd interfaces use opaque structs too, so it wouldn't + even work if it existed. as I said earlier, it's really all a sham + as for the timespec type, I think it's easier to stick with the + API definition at RPC level too + + +## IRC, freenode, #hurd, 2012-07-24 + + youpi: antrik: is vm_size_t an appropriate type for a c long ? + (appropriate mig type) + I wouldn't say so. while technically they are pretty much + guaranteed to be the same, conceptually they are entirely different + things -- it would be confusing at least to do it that way... + antrik: well which one then ? :( + braunr: no idea TBH + antrik_: that should have been natural_t and integer_t + so maybe we should new types to replace them + braunr: actually, RPCs should never have nay machine-specific + types... which makes me realise that a 1:1 translation to the POSIX + definition is actually not possible if we want to follow the Mach ideals + i agree + (well, the original mach authors used natural_t in quite a bunch + of places ..) + the mig interfaces look extremely messy to me because of this type + issue + and i just want to move forward with my work now + i could just use 2 integer_t, that would get converted in the + massive future revamp of the interfaces for the 64-bits userspace + or 2 64-bits types + i'd like us to agree on one of the two not too late so i can + continue + + +## IRC, freenode, #hurd, 2012-07-25 + + braunr: well, for actual kernel calls, machine-specific types are + probably hard to avoid... the problem is when they are used in other RPCs + antrik: i opted for a hurd specific time_data_t = struct[2] of + int64 + and going on with this for now + once it works we'll finalize the types if needed + I'm really not sure how to best handle such 32 vs. 64 bit issues + in Hurd interfaces... + you *could* consider time_t and long to be machine specific types + well, they clearly are + long is + time_t isn't really + didn't you say POSIX demands it to be longs? 
+ didn't you say POSIX demands it to be longs?
+ we could decide to make it 64 bits in all versions of the hurd
+ no
+ posix requires the nanoseconds field of timespec to be long
+ the way i see it, i don't see any problem (other than a little bit
+ of storage and performance) using 64-bits types here
+ well, do we really want to use a machine-independent time format,
+ if the POSIX interfaces we are mapping do not?...
+ (perhaps we should; I'm just uncertain what's better in this case)
+ this would require creating new types for that
+ probably mach types for consistency
+ to replace natural_t and integer_t
+ now this concerns a totally different issue than select
+ which is how we're gonna handle the 64-bits port
+ because natural_t and integer_t are used almost everywhere
+ indeed
+ and we must think of 2 ports
+ the 32-bits over 64-bits gnumach, and the complete 64-bits one
+ what do we do for the interfaces that are explicitly 64 bit?
+ what do you mean ?
+ i'm not sure there is anything to do
+ I mean what is done in the existing ones?
+ like off64_t ?
+ yeah
+ they use int64 and unsigned64
+ OK. so we shouldn't have any trouble with that at least...
+ braunr: were you adding a time_value_t in mach, but for
+ nanoseconds?
+ no i'm adding a time_data_t to the hurd
+ for nanoseconds yes
+ ah ok
+ (make sure it is available in hurd/hurd_types.defs)
+ yes it's there
+ \o/
+ i mean, i didn't forget to add it there
+ for now it's a struct[2] of int64
+ but we're not completely sure of that
+ currently i'm teaching the hurd how to use timeouts
+ cool
+ which basically involves adding a time_data_t *timeout parameter
+ to many functions
+ and replacing hurd_condition_wait with hurd_condition_timedwait
+ and making sure a timeout isn't an error on the return path
+ * pinotree has a simpler idea for time_data_t: add a file_utimesns to
+ fs.defs
+ hmm, some functions have a nonblocking parameter
+ i'm not sure if it's better to replace them with the timeout, or add the timeout parameter
+ considering the functions involved may return EWOULDBLOCK
+ for now i'll add a timeout parameter, so that the code requires as little modification as possible
+ tell me your opinion on that please
+ braunr: what functions?
+ connq_listen in pflocal for example
+ braunr: I don't really understand what you are talking about :-(
+ some servers implement select this way :
+ 1/ call a function in non-blocking mode, if it indicates data is available, return immediately
+ 2/ call the same function, in blocking mode
+ normally, with the new timeout parameter, non-blocking could be passed in the timeout parameter (with a timeout of 0)
+ operating in non-blocking mode, i mean
+ antrik: is it clear now ? :)
+ i wonder how the hurd managed to grow so much code without a cond_timedwait function :/
+ i think i have finished my io_select_timeout patch on the hurd side
+ :)
+ a small step for the hurd, but a big one against vim latencies !!
+ (which is the true reason i'm working on this haha)
+ new hurd rbraun/io_select_timeout branch for those interested
+ hm, my changes clash hard with the debian pflocal patch by neal :/
+ braunr: replace I'd say. no need to introduce redundancy; and code changes not affecting interfaces are cheap
+ (in general, I'm always in favour of refactoring)
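+
+A minimal sketch of the server-side pattern described above, with the
+proposed timeout parameter subsuming the old non-blocking flag; the
+primitive names and signatures here are assumptions, not the real
+pflocal code:
+
+    #include <errno.h>
+    #include <stddef.h>
+    #include <time.h>
+
+    /* Stand-ins for the cthreads primitives; only the shape matters.
+       The timedwait variant is assumed to return ETIMEDOUT on expiry.  */
+    struct condition;
+    struct mutex;
+    extern int hurd_condition_timedwait (struct condition *wait,
+                                         struct mutex *lock,
+                                         const struct timespec *tsp);
+
+    /* tsp == NULL blocks forever; a zero tsp means poll, which is what
+       the old non-blocking mode did (step 1/ above), so no separate
+       flag is needed.  */
+    static int
+    wait_for_data (struct condition *wait, struct mutex *lock,
+                   int (*data_available) (void *), void *arg,
+                   const struct timespec *tsp)
+    {
+      while (!data_available (arg))
+        {
+          if (tsp != NULL && tsp->tv_sec == 0 && tsp->tv_nsec == 0)
+            return EWOULDBLOCK; /* 1/ the non-blocking probe */
+          int err = hurd_condition_timedwait (wait, lock, tsp);
+          if (err)
+            return err;         /* e.g. ETIMEDOUT */
+        }
+      return 0;                 /* 2/ data arrived while blocking */
+    }
+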
+ antrik: replace what ?
+ braunr: wow, didn't think moving the timeouts to server would be such a quick task :-)
+ antrik: :)
+ 16:57 < braunr> hmm, some functions have a nonblocking parameter
+ 16:58 < braunr> i'm not sure if it's better to replace them with the timeout, or add the timeout parameter
+ antrik: ah about that, ok
+
+
+## IRC, freenode, #hurd, 2012-07-26
+
+ braunr: wrt your select_timeout branch, why not push only the
+ time_data stuff to master?
+ pinotree: we didn't agree on that yet
+
+ ah better, with the correct ordering of io routines, my hurd boots
+ :)
+ and works too? :p
+ so far yes
+ i've spotted some issues in libpipe but nothing major
+ i "only" have to adjust the client side select implementation now
+
+
+## IRC, freenode, #hurd, 2012-07-27
+
+ io_select should remain a routine (i.e. synchronous) for server
+ side stub code
+ but should be asynchronous (send only) for client side stub code
+ (since _hurd_select manually handles replies through a port set)
+
+
+## IRC, freenode, #hurd, 2012-07-28
+
+ why are there both REPLY_PORTS and IO_SELECT_REPLY_PORT macros in
+ the hurd ..
+ and for the select call only :(
+ and doing the exact same thing unless i'm mistaken
+ the reply port is required for select anyway ..
+ i just want to squeeze them into a new IO_SELECT_SERVER macro
+ i don't think i can keep using the existing io_select call
+ as it is
+ grr, the io_request/io_reply files aren't synced with the io.defs
+ file
+ calls like io_sigio_request seem totally unused
+ yeah, that's a major shortcoming of MIG -- we shouldn't need to
+ have separate request/reply defs
+ they're not even used :/
+ i did something a bit ugly but it seems to do what i wanted
+
+
+## IRC, freenode, #hurd, 2012-07-29
+
+ good, i have a working client-side select
+ now i need to fix the servers a bit :x
+ arg, my test cases work, but vim doesn't :((
+ i hate select :p
+ ah good, my problems are caused by a deadlock because of my glibc
+ changes
+ ah yes, found my locking problem
+ building my final libc now
+ * braunr crosses fingers
+ (the deadlock issue was of course a one liner)
+ grr deadlocks again
+ grmbl, my deadlock is in pfinet :/
+ my select_timeout code makes servers deadlock on the libports
+ global lock :/
+ wtf..
+ youpi: it may be related to the failed assertion
+ deadlocking on mutex_unlock oO
+ grr
+ actually, mutex_unlock sends a message to notify other threads
+ that the lock is ready
+ and that's what is blocking ..
+ i'm not sure it's a fundamental problem here
+ it may simply be a corruption
+ i have several (but not that many) threads blocked in mutex_unlock
+ and one blocked in mutex_lock
+ i fail to see how my changes can create such a behaviour
+ the weird thing is that i can't reproduce this with my test cases
+ :/
+ only vim makes things crazy
+ and i suppose it's related to the terminal
+ (don't terminals relay select requests ?)
+ when starting vim through ssh, pfinet deadlocks, and when starting
+ it on the mach console, the console term deadlocks
+ no help/hints when started with rpctrace?
+ i only get assertions with rpctrace
+ it's completely unusable for me
+ gdb tells vim is indeed blocked in a select request
+ and i can't see any in the remote servers :/
+ this is so weird ..
+ when using vim with the unmodified c library, i clearly see the
+ select call, and everything works fine ....
+ 2e27: a1 c4 d2 b7 f7 mov 0xf7b7d2c4,%eax
+ 2e2c: 62 (bad)
+ 2e2d: f6 47 b6 69 testb $0x69,-0x4a(%edi)
+ what's the "bad" line ??
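+
+For reference, the client-side shape described on 2012-07-27: one
+asynchronous (send-only) io_select request per descriptor, with all
+replies collected on a single port set.  A rough sketch only;
+send_io_select_request() is a placeholder for the MIG-generated stub,
+not its real name or signature:
+
+    #include <mach.h>
+
+    /* Placeholder for the send-only MIG stub.  */
+    extern kern_return_t send_io_select_request (mach_port_t io_object,
+                                                 mach_port_t reply);
+
+    static kern_return_t
+    select_sketch (const mach_port_t *io_objects, int nfds,
+                   mach_msg_timeout_t timeout_ms)
+    {
+      mach_port_t port_set, reply;
+      mach_msg_empty_rcv_t msg;
+      int i;
+
+      mach_port_allocate (mach_task_self (),
+                          MACH_PORT_RIGHT_PORT_SET, &port_set);
+
+      for (i = 0; i < nfds; i++)
+        {
+          /* One reply port per descriptor, all members of the set.  */
+          mach_port_allocate (mach_task_self (),
+                              MACH_PORT_RIGHT_RECEIVE, &reply);
+          mach_port_move_member (mach_task_self (), reply, port_set);
+          send_io_select_request (io_objects[i], reply);
+        }
+
+      /* A single receive with MACH_RCV_TIMEOUT implements the whole
+         select timeout on the client side; msgh_local_port then tells
+         which descriptor became ready.  */
+      return mach_msg (&msg.header, MACH_RCV_MSG | MACH_RCV_TIMEOUT,
+                       0, sizeof msg, port_set, timeout_ms,
+                       MACH_PORT_NULL);
+    }
+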
+ ew, i think i understand my problem now
+ the timeout makes blocking threads wake prematurely
+ but on a mutex unlock, or a condition signal/broadcast, a message
+ is still sent, as it is expected a thread is still waiting
+ but the receiving thread, having returned sooner than expected
+ from mach_msg, doesn't dequeue the message
+ as vim does a lot of non blocking selects, this fills the message
+ queue ...
+
+
+## IRC, freenode, #hurd, 2012-07-30
+
+ hm nice, the problem i have with my hurd_condition_timedwait seems
+ to also exist in libpthread
+
+[[!taglink open_issue_libpthread]].
+
+ although at a lesser degree (the implementation already correctly
+ removes a thread that timed out from a condition queue, and there is a
+ nice FIXME comment asking what to do with any stale wakeup message)
+ and the only solution i can think of for now is to drain the
+ message queue
+ ah yes, i now have vim running with my io_select_timeout code :>
+ but hum
+ eating all cpu
+ ah nice, an infinite loop in _hurd_critical_section_unlock
+ grmbl
+ braunr: But not this one?
+ http://www.gnu.org/software/hurd/open_issues/fork_deadlock.html
+ it looks similar, yes
+ let me try again to compare in detail
+ pretty much the same yes
+ there is only one difference but i really don't think it matters
+ (#3 _hurd_sigstate_lock (ss=0x2dff718) at hurdsig.c:173
+ instead of
+ #3 _hurd_sigstate_lock (ss=0x1235008) at hurdsig.c:172)
+ ok so we need to review jeremie's work
+ tschwinge: thanks for pointing me at this
+ the good thing with my patch is that i can reproduce in a few
+ seconds
+ consistently
+ braunr: You're welcome. Great -- a reproducer!
+ You might also build a glibc without his patches as a
+ cross-test to see if the issues go away?
+ right
+ i hope they're easy to find :)
+ Hmm, have you already done changes to glibc? Otherwise you
+ might also simply use a Debian package from before?
+ yes i have local changes to _hurd_select
+ OK, too bad.
+ braunr: debian/patches/hurd-i386/tg-hurdsig-*, I think.
+ ok
+ hmmmmm
+ it may be related to my last patch on the select_timeout branch
+ (i mean, this may be caused by what i mentioned earlier this
+ morning)
+ damn i can't build glibc without the signal disposition patches :(
+ libpthread_sigmask.diff depends on it
+ tschwinge: doesn't libpthread (as implemented in the debian glibc
+ patches) depend on global signal dispositions ?
+ i think i'll use an older glibc for now
+ but hmm which one ..
+ oh whatever, let's fix the deadlock, it's simpler
+ and more productive anyway
+ braunr: May be that you need to revert some libpthread patch,
+ too. Or even take out the libpthread build completely (you don't need it
+ for your current work, I think).
+ braunr: Or, of course, you locate the deadlock. :-)
+ hum, now why would __io_select_timeout return
+ EMACH_SEND_INVALID_DEST :(
+ the current glibc code just transparently reports any such error
+ as a false positive oO
+ hm nice, segfault through recursion
+ "task foo destroying an invalid port bar" everywhere :((
+ i still have problems at the server side ..
+ ok i think i have a solution for the "synchronization problem"
+ (by this name, i refer to the way mutex and condition variables
+ are implemented)
+ (the problem being that, when a thread unblocks early, because of
+ a timeout, another may still send a message to attempt to wake it, which
+ may fill up the message queue and make the sender block, causing a
+ deadlock)
+ Attempts to wake a dead thread?
+ no
+ attempt to wake an already active thread
+ which won't dequeue the message because it's doing something else
+ bddebian: i'm mentioning this because the problem potentially also
+ exists in libpthread
+
+[[!taglink open_issue_libpthread]].
+
+ since the underlying algorithms are exactly the same
+ (fortunately the time-out versions are not often used)
+ for now :)
+ for reference, my idea is to make the wake call truly non
+ blocking, by setting a timeout of 0
+ i also limit the message queue size to 1, to limit the amount of
+ spurious wakeups
+ i'll be able to test that in 30 mins or so
+ hum
+ how can mach_msg block with a timeout of 0 ??
+ never mind :p
+ unfortunately, my idea alone isn't enough
+ for those interested in the problem, i've updated the analysis in
+ my last commit
+ (http://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?h=rbraun/select_timeout&id=40fe717ba9093c0c893d9ea44673e46a6f9e0c7d)
+
+
+## IRC, freenode, #hurd, 2012-08-01
+
+ damn, i can't manage to make threads calling condition_wait
+ dequeue themselves from the condition queue :(
+ (instead of the one sending the signal/broadcast)
+ my changes on cthreads introduce 2 intrusive changes
+ the first is that the wakeup port is limited to 1 message, and the
+ wakeup operation is totally non blocking
+ which is something we should probably add in any case
+ the second is that condition_wait dequeues itself after blocking,
+ instead of condition_signal/broadcast
+ and this second change seems to introduce deadlocks, for reasons
+ completely unknown to me :((
+ if anyone has an idea about why it is bad for a thread to remove
+ itself from a condition/mutex queue, i'm all ears
+ i'm hitting a wall :(
+ antrik: if you have some motivation, can you review this please ?
+ http://www.sceen.net/~rbraun/0001-Rework-condition-signal-broadcast.patch
+ with this patch, i get threads blocked in condition_wait,
+ apparently waiting for a wakeup that never comes (or was already
+ consumed)
+ and i don't understand why :(
+ braunr: The condition never happens?
+ bddebian: it works without the patch, so i guess that's not the
+ problem
+ bddebian: hm, you could be right actually :p
+ braunr: About what? :)
+ 17:50 < bddebian> braunr: The condition never happens?
+ although i doubt it again
+ this problem is getting very very frustrating
+ :(
+ it frightens me because i don't see any flaw in the logic :(
+
+
+## IRC, freenode, #hurd, 2012-08-02
+
+ ah, seems i found a reliable workaround to my deadlock issue, and
+ more than a workaround, it should increase efficiency by reducing
+ messaging
+ * braunr happy
+ congrats :)
+ the downside is that we may have a problem with non blocking send
+ calls :/
+ which are used for signals
+ i mean, this could be a mach bug
+ let's try running a complete hurd with the change
+ arg, the boot doesn't complete with the patch .. :(
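+
+The non-blocking wakeup described above, as a standalone sketch (not the
+actual cthreads code): the zero send timeout turns a full queue into an
+error instead of a block, and the queue limit of 1 bounds the number of
+spurious wakeup messages that can ever be pending:
+
+    #include <mach.h>
+
+    static void
+    wakeup (mach_port_t wakeup_port)
+    {
+      mach_msg_header_t msg;
+
+      msg.msgh_bits = MACH_MSGH_BITS (MACH_MSG_TYPE_COPY_SEND, 0);
+      msg.msgh_size = sizeof msg;
+      msg.msgh_remote_port = wakeup_port;
+      msg.msgh_local_port = MACH_PORT_NULL;
+      msg.msgh_id = 0;
+
+      /* With MACH_SEND_TIMEOUT and a timeout of 0, a full queue yields
+         MACH_SEND_TIMED_OUT instead of blocking the waker; the result
+         can be ignored, since a pending message already means a wakeup
+         is on its way.  */
+      (void) mach_msg (&msg, MACH_SEND_MSG | MACH_SEND_TIMEOUT,
+                       sizeof msg, 0, MACH_PORT_NULL,
+                       0 /* timeout (ms) */, MACH_PORT_NULL);
+    }
+
+The queue limit itself would be set once on the receive side, e.g. with
+mach_port_set_qlimit() (assuming a plain port with no special
+attributes).
+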
+ grmbl, by changing only a few bits in cthreads, the boot process
+ freezes in an infinite loop in something started after auth
+ (/etc/hurd/runsystem i assume)
+
+
+## IRC, freenode, #hurd, 2012-08-03
+
+ glibc actually makes some direct use of cthreads condition
+ variables
+ and my patch seems to work with servers in an already working
+ hurd, but doesn't allow it to boot
+ and the hang happens on bash, the first thing that doesn't come
+ from the hurd package
+ (i mean, during the boot sequence)
+ which means we can't change cthreads headers (as some primitives
+ are macros)
+ *sigh*
+ the thing is, i can't fix select until i have a
+ condition_timedwait primitive
+ and i can't add this primitive until either 1/ cthreads are fixed
+ not to allow the inlining of its primitives, or 2/ the switch to pthreads
+ is done
+ which might take a loong time :p
+ i'll have to rebuild a whole libc package with a fixed cthreads
+ version
+ let's do this
+ pinotree: i see two __condition_wait calls in glibc, how is the
+ double underscore handled ?
+ where do you see it?
+ sysdeps/mach/hurd/setpgid.c and sysdeps/mach/hurd/setsid.c
+ i wonder if it's even used
+ looks like we use posix/setsid.c now
+ #ifdef noteven
+ ?
+ the two __condition_wait calls you pointed out are in such
+ preprocessor blocks
+ but what does it mean ?
+ no idea
+ ok
+ these two files should definitely be used, they are found
+ earlier in the vpath
+ hum, posix/setsid.c is a nop stub
+ i don't see anything defining "noteven" in glibc itself nor in
+ hurd
+ :(
+ yes, most of the stuff in posix/, misc/, signal/, time/ are
+ ENOSYS stubs, to be reimplemented in a sysdep
+ hm, i may have made a small mistake in cthreads itself actually
+ right
+ when i try to debug using a subhurd, gdb tells me the blocked
+ process is spinning in ld ..
+ i mean ld.so
+ and i can't see any debugging symbol
+ some progress, it hangs at process_envvars
+ eh
+ i've partially traced my problem
+ when a "normal" program starts, libc creates the signal thread
+ early
+ the main thread waits for the creation of this thread by polling
+ its address
+ (i.e. while (signal_thread == 0); )
+ for some reason, it is stuck in this loop
+ cthread creation being actually governed by
+ condition_wait/broadcast, it makes some sense
+ braunr: When you say the "main" thread, do you mean the main
+ thread of the program?
+ bddebian: yes
+ i think i've determined my mistake
+ glibc has its own variants of the mutex primitives
+ and i changed one :/
+ Ah
+ it's good news for me :)
+ hum no, that's not exactly what i described
+ glibc has some stubs, but it's not the problem, the problem is
+ that mutex_lock/unlock are macros, and i changed one of them
+ so everything that used that macro inside glibc wasn't changed
+ yes!
+ my patched hurd now boots :)
+ * braunr relieved
+ this experience at least taught me that it's not possible to
+ easily change the singly linked queues of threads (waiting for a mutex or
+ a condition variable) :(
+ for now, i'm using a linear search from the start
+ so, not only does this patched hurd boot, but i was able to use
+ aptitude, git, build a whole hurd, copy the whole thing, and remove
+ everything, and it still runs fine (whereas usually it would fail very
+ early)
+ * braunr happy
+ and vim works fine now?
+ err, wait
+ this patch does only one thing
+ it alters the way condition_signal/broadcast and
+ {hurd_,}condition_wait operate
+ currently, condition_signal/broadcast dequeues threads from a
+ condition queue and wakes them
+ my patch makes these functions only wake the target threads
+ which dequeue themselves
+ (a necessary requirement to allow clean timeout handling)
+ the next step is to fix my hurd_condition_wait patch
+ and reapply the whole hurd patch introducing io_select_timeout
+ then i'll be able to tell you
+ one side effect of my current changes is that the linear search
+ required when a thread dequeues itself is ugly
+ so it'll be an additional reason to help the pthreads porting
+ effort
+ (pthreads have the same sort of issues wrt timeout handling,
+ but threads are on doubly-linked lists, making it way easier to adjust)
+
+ damn i'm happy
+ 3 days on this stupid bug
+ (which is actually responsible for what i initially feared to be a
+ mach bug on non blocking sends)
+ (and because of that, i worked on the code to make sure that 1/
+ waking is truly non blocking and 2/ only one message is required for
+ wakeups
+ )
+ a simple flag is tested instead of sending in a non blocking way
+ :)
+ these improvements should be ported to pthreads some day
+
+[[!taglink open_issue_libpthread]]
+
+ ahah !
+ view is now FAST !
+ braunr: what do you mean by 'view'?
+ mel-: i mean the read-only version of vim
+ aah
+ i still have a few port leaks to fix
+ and some polishing
+ but basically, the non-blocking select issue seems fixed
+ and with some luck, we should get unexpected speedups here and
+ there
+ so vim was considerably slow on the Hurd before? didn't know that.
+ not exactly
+ at first, it wasn't, but the non blocking select/poll calls
+ misbehaved
+ so a patch was introduced to make these block at least 1 ms
+ then vim became slow, because it does a lot of non blocking select
+ so another patch was introduced, not to set the 1ms timeout for a
+ few programs
+ youpi: darnassus is already running the patched hurd, which shows
+ (as expected) that it can safely be used with an older libc
+ i.e. servers with the additional io_select?
+ yes
+ k
+ good :)
+ and the modified cthreads
+ which is the most intrusive change
+ port leaks fixed
+ braunr: Congrats :-D
+ thanks
+ it's not over yet :p
+ tests, reviews, more tests, polishing, commits, packaging
+
+
+## IRC, freenode, #hurd, 2012-08-04
+
+ grmbl, apt-get fails on select in my subhurd with the updated
+ glibc
+ otherwise it boots and runs fine
+ fixed :)
+ grmbl, there is a deadlock in pfinet with my patch
+ deadlock fixed
+ the sigstate and the condition locks must be taken at the same
+ time, for some obscure reason explained in the cthreads code
+ but when a thread awakes and dequeues itself from the condition
+ queue, it only took the condition lock
+ i noted in my todo list that this could create problems, but
+ wanted to leave it as it is to really see it happen
+ well, i saw :)
+ the last commit of my hurd branch includes the 3 line fix
+ these fixes will be required for libpthreads
+ (pthread_mutex_timedlock and pthread_cond_timedwait) some day
+ after the select bug is fixed, i'll probably work on that with you
+ and thomas d
+
+
+## IRC, freenode, #hurd, 2012-08-05
+
+ eh, i made dpkg-buildpackage use the patched c library, and it
+ finished the build oO
+ braunr: :)
+ faked-tcp was blocked in a select call :/
+ (with the old libc i mean)
+ with mine it just worked at the first attempt
+ i'm not sure what it means
+ it could mean that the patched hurd servers are not completely
+ compatible with the current libc, for some weird corner cases
+ the slowness of faked-tcp is apparently inherent to its
+ implementation
+ all right, let's put all these packages online
+ eh, right when i upload them, i get a deadlock
+ this one seems specific to pfinet
+ only one deadlock so far, and the libc wasn't in sync with the
+ hurd
+ :/
+ damn, another deadlock as soon as i send a mail on bug-hurd :(
+ grr
+ thou shall not email
+ aptitude seems to be a heavy user of select
+ oh, it may be due to my script regularly changing the system time
+ or it may not be a deadlock, but simply the linear queue getting
+ extremely large
+
+
+## IRC, freenode, #hurd, 2012-08-06
+
+ i have bad news :( it seems there can be memory corruptions with
+ my io_select patch
+ i've just seen an auth server (!) spinning on a condition lock
+ (the internal spin lock), probably because the condition was corrupted ..
+ i guess it's simply because conditions embedded in dynamically
+ allocated structures can be freed while there are still threads waiting
+ ...
+ so, yes the solution to my problem is simply to dequeue threads
+ from both the waker when there is one, and the waiter when no wakeup
+ message was received
+ simple
+ it's so obvious i wonder how i didn't think of it earlier :(
+ braunr: an elegant solution always seems obvious afterwards... ;-)
+ antrik: let's hope this time, it's completely right
+ good, my latest hurd packages seem fixed finally
+ looks like i got another deadlock
+ * braunr hangs himself
+ that, or again, condition queues can get very large (e.g. on
+ thread storms)
+ looks like this is the case yes
+ after some time the system recovered :(
+ which means a doubly linked list is required to avoid pathological
+ behaviours
+ arg
+ it won't be easy at all to add a doubly linked list to condition
+ variables :(
+ actually, just a bit messy
+ youpi: other than this linear search on dequeue, darnassus has
+ been working fine so far
+ k
+ Mmm, you'd need to bump the abi soname if changing the condition
+ structure layout
+ :(
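+
+Why the doubly linked list matters, in a nutshell (an illustration, not
+the actual cthreads structures): with a link in each direction, a
+waiter that times out can unlink itself in constant time, instead of
+the linear search from the head seen degrading above:
+
+    #include <stddef.h>
+
+    struct waiter
+    {
+      struct waiter *next;
+      struct waiter *prev;
+    };
+
+    struct condition_queue
+    {
+      struct waiter *head;
+    };
+
+    /* O(1) self-removal; with a singly linked queue, finding the
+       predecessor requires walking the whole queue, which is the
+       pathological case hit on thread storms.  */
+    static void
+    dequeue_self (struct condition_queue *q, struct waiter *w)
+    {
+      if (w->prev != NULL)
+        w->prev->next = w->next;
+      else
+        q->head = w->next;
+      if (w->next != NULL)
+        w->next->prev = w->prev;
+      w->next = w->prev = NULL;
+    }
+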
+ youpi: how are we going to solve that ?
+ well, either bump soname, or finish transition to libpthread :)
+ it looks better to work on pthread now
+ to avoid too many abi changes
+
+[[libpthread]].
+
+
 # See Also
 
 See also [[select_bogus_fd]] and [[select_vs_signals]].
diff --git a/open_issues/strict_aliasing.mdwn b/open_issues/strict_aliasing.mdwn
index 01019372..b7d39805 100644
--- a/open_issues/strict_aliasing.mdwn
+++ b/open_issues/strict_aliasing.mdwn
@@ -19,3 +19,13 @@ License|/fdl]]."]]"""]]
 instead?
 pinotree: if we can rely on gcc for the warnings, yes
 but i suspect there might be other silent issues in very old code
+
+
+# IRC, freenode, #hurd, 2012-07-12
+
+ btw, i'm building glibc right now, and i can see a few strict
+ aliasing warnings
+ fixing them will allow us to avoid wasting time on very obscure
+ issues (if gcc catches them all)
+ The strict aliasing things should be fixed, yes. Some might be
+ from MIG.
diff --git a/open_issues/synchronous_ipc.mdwn b/open_issues/synchronous_ipc.mdwn
new file mode 100644
index 00000000..57bcdda7
--- /dev/null
+++ b/open_issues/synchronous_ipc.mdwn
@@ -0,0 +1,64 @@
+[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_hurd]]
+
+
+# IRC, freenode, #hurd, 2012-07-20
+
+From [[Genode RPC|microkernel/genode/rpc]].
+
+ assuming synchronous ipc is the way to go (it seems so), there is
+ still the need for some async ipc (e.g. signalling untrusted recipients
+ without risking blocking on them)
+ 1/ do you agree on that and 2/ how would this low-overhead async
+ ipc be done ? (and 3/ are there relevant examples ?)
+ if you think about this stuff too much you will end up like marcus
+ and neal ;-)
+ antrik: likely :)
+ the truth is that there are various possible designs all with
+ their own tradeoffs, and nobody can really tell which one is better
+ the only sensible one i found is qnx :/
+ but it's still messy
+ they have what they call pulses, with a strictly defined format
+ so it's actually fine because it guarantees low overhead, and can
+ easily be queued
+ but i'm not sure about the format
+ I must say that Neal's half-sync approach in Viengoos still sounds
+ most promising to me. it's actually modelled after the needs of a
+ Hurd-like system; and he thought about it a lot...
+ damn i forgot to reread that
+ stupid me
+ note that you can't come up with a design that allows both a)
+ delivering reliably and b) never blocking the sender -- unless you cache
+ in the kernel, which we don't want
+ but I don't think it's really necessary to fulfill both of these
+ requirements
+ it's up to the receiver to make sure it gets important signals
+ right
+ caching in the kernel is ok as long as the limit allows the
+ receiver to handle its signals
+ in the Viengoos approach, the receiver can allocate a number of
+ receive buffers; so it's even possible to do some queuing if desired
+ ah great, limits in the form of resources lent by the receiver
+ one thing i really don't like in mach is the behaviour on full
+ message queues
+ blocking :/
+ i bet the libpager deadlock is due to that
+
+[[libpager_deadlock]].
+
+ it simply means async ipc doesn't prevent deadlocks at all
+ the sender can set a timeout. blocking only happens when setting
+ it to infinite...
+ which is commonly the case
+ well, if you see places where blocking is done but failing would
+ be more appropriate, try changing them I'd say...
+ it's not that easy :/
diff --git a/open_issues/usleep.mdwn b/open_issues/usleep.mdwn
new file mode 100644
index 00000000..b71cd902
--- /dev/null
+++ b/open_issues/usleep.mdwn
@@ -0,0 +1,25 @@
+[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc]]
+
+# IRC, OFTC, #debian-hurd, 2012-07-14
+
+ eeek, usleep has the issues which i fixed in nanosleep
+ pinotree: ?
+ * pinotree ponders a `cp sysdeps/unix/sysv/linux/usleep.c
+ sysdeps/mach/usleep.c`
+ What the heck is the point of usleep(0) anyway? Isn't that
+ basically saying suspend for 0 microseconds?
+ it's rounded up by the kernel I guess
+ i.e. suspend for the shortest time possible (a clock tick)
+ posix 2001 says that «If the value of useconds is 0, then the
+ call has no effect.»
diff --git a/open_issues/virtualbox.mdwn b/open_issues/virtualbox.mdwn
index 9440284f..d0608b4a 100644
--- a/open_issues/virtualbox.mdwn
+++ b/open_issues/virtualbox.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]]
 
 [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
 id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -8,11 +8,15 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
 is included in the section entitled [[GNU Free Documentation
 License|/fdl]]."]]"""]]
 
+[[!toc]]
+
+
+# Running GNU Mach in VirtualBox crashes during initialization.
+
 [[!tag open_issue_gnumach]]
 
-Running GNU Mach in VirtualBox crashes during initialization.
 
-IRC, freenode, #hurd, 2011-08-15
+## IRC, freenode, #hurd, 2011-08-15
 
 HowTo Reproduce: 1) Use `reboot` to reboot the system. 2) Once you
 see the Grub menu, turn off the debian hurd box.
 3) Let the box boot
@@ -97,3 +101,37 @@ IRC, freenode, #hurd, 2011-08-15
 what's interesting is that that one means that $USER_DS did load
 in %es fine at least once
 and it's the reload that fails
+
+
+# Slow SCSI probing
+
+[[!tag open_issue_gnumach]]
+
+
+## IRC, freenode, #hurd, 2012-08-07
+
+ youpi: it seems the slow boot on virtualbox is really because of
+ scsi (it spends a long time in scsi_init, probing for all the drivers)
+ braunr: we know that
+ isn't it in the io port probe printed at boot?
+ iirc that was that
+ the discussion i found was about eata
+ not the whole scsi group
+ there used to be another in eata, yes
+ oh
+ i must have missed the first discussion then
+ I mean
+ the eata is the first
+ ok
+ and scsi was mentioned later
+ just nobody took the time to track it down
+ ok
+ so it's not just a matter of disabling a single driver :(
+ braunr: I still believe it's a matter of disabling a single driver
+ I don't see why scsi in general should take a lot of time
+ youpi: it doesn't on qemu, it may simply be virtualbox's fault
+ it is, yes
+ and virtualbox people say it's hurd's fault, of course
+ both are possible
+ but we can't expect them to fix it :)
+ that's what I mean
diff --git a/open_issues/wait_errors.mdwn b/open_issues/wait_errors.mdwn
new file mode 100644
index 00000000..855b9add
--- /dev/null
+++ b/open_issues/wait_errors.mdwn
@@ -0,0 +1,25 @@
+[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_hurd]]
+
+# IRC, freenode, #hurd, 2012-07-12
+
+ tschwinge: have you encountered wait() errors ?
+ What kind of wait errors?
+ when running htop or watch vmstat, other apparently unrelated
+ processes calling wait() sometimes fail with an error
+ i saw it mostly during builds, as they spawn lots of children
+ (and used the aforementioned commands to monitor the builds)
+ Sounds nasty... No, don't remember seeing that. But I don't
+ typically invoke such commands during builds.
+ So this wait thing suggests there's something going wrong in
+ the proc server?
+ tschwinge: yes
-- 
cgit v1.2.3