Diffstat (limited to 'open_issues/select.mdwn')
-rw-r--r-- | open_issues/select.mdwn | 236 |
1 files changed, 236 insertions, 0 deletions
diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn
index 6bed94ca..778af530 100644
--- a/open_issues/select.mdwn
+++ b/open_issues/select.mdwn
@@ -1395,6 +1395,242 @@ IRC, unknown channel, unknown date:
 [[libpthread]].
 
+## IRC, freenode, #hurd, 2012-08-07
+
+    <rbraun_hurd> anyone knows of applications extensively using non-blocking
+      networking functions ?
+    <rbraun_hurd> (well, networking functions in a non-blocking way)
+    <antrik> rbraun_hurd: X perhaps?
+    <antrik> it's single-threaded, so I guess it must be pretty async ;-)
+    <antrik> thinking about it, perhaps it's the reason it works so poorly on
+      Hurd...
+    <braunr> it does ?
+    <rbraun_hurd> ah maybe at the client side, right
+    <rbraun_hurd> hm no, the client side is synchronous
+    <rbraun_hurd> oh by the way, i can use gitk on darnassus
+    <rbraun_hurd> i wonder if it's because of the select fix
+    <tschwinge> rbraun_hurd: If you want, you could also have a look if there's
+      any improvement for these:
+      http://www.gnu.org/software/hurd/open_issues/select.html (elinks),
+      http://www.gnu.org/software/hurd/open_issues/dbus.html,
+      http://www.gnu.org/software/hurd/open_issues/runit.html
+    <tschwinge> rbraun_hurd: And congratulations, again! :-)
+    <rbraun_hurd> tschwinge: too bad it can't be merged before the pthread port
+      :(
+    <antrik> rbraun_hurd: I was talking about server. most clients are probably
+      sync.
+    <rbraun_hurd> antrik: i guessed :)
+    <antrik> (though certainly not all... multithreaded clients are not really
+      supported with xlib IIRC)
+    <rbraun_hurd> but i didn't have much trouble with X
+    <antrik> tried something pushing a lot of data? like, say, glxgears? :-)
+    <rbraun_hurd> why not
+    <rbraun_hurd> the problem with tests involving "a lot of data" is that it
+      can easily degenerate into a livelock
+    <antrik> yeah, sounds about right
+    <rbraun_hurd> (with the current patch i mean)
+    <antrik> the symptoms I got were general jerkiness, with occasional long
+      hangs
+    <rbraun_hurd> that applies to about everything on the hurd
+    <rbraun_hurd> so it didn't alarm me
+    <antrik> another interesting testcase is freeciv-gtk... it reproducibly
+      caused a thread explosion after idling for some time -- though I don't
+      remember the details; and never managed to come up with a way to track
+      down how this happens...
+    <rbraun_hurd> dbus is more worthwhile
+    <rbraun_hurd> pinotree: how do i test that ?
+    <pinotree> eh?
+    <rbraun_hurd> pinotree: you once mentioned dbus had trouble with non
+      blocking selects
+    <pinotree> it does a poll() with a 0s timeout
+    <rbraun_hurd> that's the non blocking select part, yes
+    <pinotree> you'll need also fixes for the socket credentials though,
+      otherwise it won't work ootb
+    <rbraun_hurd> right but, isn't it already used somehow ?
+    <antrik> rbraun_hurd: uhm... none of the non-X applications I use expose a
+      visible jerkiness/long hangs pattern... though that may well be a result
+      of general load patterns rather than X I guess
+    <rbraun_hurd> antrik: that's my feeling
+    <rbraun_hurd> antrik: heavy communication channels, unoptimal scheduling,
+      lack of scalability, they're clearly responsible for the generally
+      perceived "jerkiness" of the system
+    <antrik> again, I can't say I observe "general jerkiness". apart from slow
+      I/O the system behaves rather normally for the things I do
+    <antrik> I'm pretty sure the X jerkiness *is* caused by the socket
+      communication
+    <antrik> which of course might be a scheduling issue
+    <antrik> but it seems perfectly possible that it *is* related to the select
+      implementation
+    <antrik> at least worth a try I'd say
+    <rbraun_hurd> sure
+    <rbraun_hurd> there is still some work to do on it though
+    <rbraun_hurd> the client side changes i did could be optimized a bit more
+    <rbraun_hurd> (but i'm afraid it would lead to ugly things like 2 timeout
+      parameters in the io_select_timeout call, one for the client side, the
+      other for the servers, eh)
+
+
+## IRC, freenode, #hurd, 2012-08-07
+
+    <braunr> when running gitk on [darnassus], yesterday, i could push the CPU
+      to 100% by simply moving the mouse in the window :p
+    <braunr> (but it may also be caused by the select fix)
+    <antrik> braunr: that cursor might be "normal"
+    <rbraunrh> antrik: what do you mean ?
+    <antrik> the 100% CPU
+    <rbraunh> antrik: yes i got that, but what would make it normal ?
+    <rbraunh> antrik: right i get similar behaviour on linux actually
+    <rbraunh> (not 100% because two threads are spread on different cores, but
+      their cpu usage add up to 100%)
+    <rbraunh> antrik: so you think as long as there are events to process, the
+      x client is running
+    <rbraunh> that would mean latencies are small enough to allow that, which
+      is actually a very good thing
+    <antrik> hehe... sound kinda funny :-)
+    <rbraunh> this linear search on dequeue is a real pain :/
+
+
+## IRC, freenode, #hurd, 2012-08-09
+
+`screen` doesn't close a window/hangs after exiting the shell.
+
+    <rbraunh> the screen issue seems linked to select :p
+    <rbraunh> tschwinge: the term server may not correctly implement it
+    <rbraunh> tschwinge: the problem looks related to the term consoles not
+      dying
+    <rbraunh> http://www.gnu.org/software/hurd/open_issues/term_blocking.html
+
+[[Term_blocking]].
+
+
+# IRC, freenode, #hurd, 2012-12-05
+
+    <braunr> well if i'm unable to build my own packages, i'll send you the one
+      line patch i wrote that fixes select/poll for the case where there is
+      only one descriptor
+    <braunr> (the current code calls mach_msg twice, each time with the same
+      timeout, doubling the total wait time when there is no event)
+
+
+## IRC, freenode, #hurd, 2012-12-06
+
+    <braunr> damn, my eglibc patch breaks select :x
+    <braunr> i guess i'll just simplify the code by using the same path for
+      both single fd and multiple fd calls
+    <braunr> at least, the patch does fix the case i wanted it to .. :)
+    <braunr> htop and ping act at the right regular interval
+    <braunr> my select patch is :
+    <braunr> /* Now wait for reply messages. */
+    <braunr> - if (!err && got == 0)
+    <braunr> + if (!err && got == 0 && firstfd != -1 && firstfd != lastfd)
+    <braunr> basically, when there is a single fd, the code calls io_select
+      with a timeout
+    <braunr> and later calls mach_msg with the same timeout
+    <braunr> effectively making the maximum wait time twice what it should be
+    <pinotree> ouch
+    <braunr> which is why htop and ping are "laggy"
+    <braunr> and perhaps also why fakeroot is when building libc
+    <braunr> well
+    <braunr> when building packages
+    <braunr> my patch avoids entering the mach_msg call if there is only one fd
+    <braunr> (my failed attempt didn't have the firstfd != -1 check, leading to
+      the 0 fd case skipping mach_msg too, which is wrong since in that case
+      there is just no wait, making applications use select/poll for sleeping
+      consume all cpu)
+
+    <braunr> the second is a fix in select (yet another) for the case where a
+      single fd is passed
+    <braunr> in which case there is one timeout directly passed in the
+      io_select call, but then yet another in the mach_msg call that waits for
+      replies
+    <braunr> this can account for the slowness of a bunch of select/poll users
+
+
+## IRC, freenode, #hurd, 2012-12-07
+
+    <braunr> finally, my select patch works :)
+
+
+## IRC, freenode, #hurd, 2012-12-08
+
+    <braunr> for those interested, i pushed my eglibc packages that include
+      this little select/poll timeout fix on my debian repository
+    <braunr> deb http://ftp.sceen.net/debian-hurd experimental/
+    <braunr> reports are welcome, i'm especially interested in potential
+      regressions
+
+
+## IRC, freenode, #hurd, 2012-12-10
+
+    <gnu_srs> I have verified your double timeout bug in hurdselect.c.
+    <gnu_srs> Since I'm also working on hurdselect I have a few questions
+      about where the timeouts in mach_msg and io_select are implemented.
+    <gnu_srs> Have a big problem to trace them down to actual code: mig magic
+      again?
+    <braunr> yes
+    <braunr> see hurd/io.defs, io_select includes a waittime timeout:
+      natural_t; parameter
+    <braunr> waittime is mig magic that tells the client side not to wait more
+      than the timeout
+    <braunr> and in _hurd_select, you can see these lines :
+    <braunr> err = __io_select (d[i].io_port, d[i].reply_port,
+    <braunr> /* Poll only if there's a single
+      descriptor. */
+    <braunr> (firstfd == lastfd) ? to : 0,
+    <braunr> to being the timeout previously computed
+    <braunr> "to"
+    <braunr> and later, when waiting for replies :
+    <braunr> while ((msgerr = __mach_msg (&msg.head,
+    <braunr> MACH_RCV_MSG | options,
+    <braunr> 0, sizeof msg, portset, to,
+    <braunr> MACH_PORT_NULL)) ==
+      MACH_MSG_SUCCESS)
+    <braunr> the same timeout is used
+    <braunr> hope it helps
+    <gnu_srs> Additional stuff on io-select question is at
+      http://paste.debian.net/215401/
+    <gnu_srs> Sorry, should have posted it before you comment, but was
+      disturbed.
+    <braunr> 14:13 < braunr> waittime is mig magic that tells the client side
+      not to wait more than the timeout
+    <braunr> the waittime argument is a client argument only
+    <braunr> that's one of the main source of problems with select/poll, and
+      the one i fixed 6 months ago
+    <gnu_srs> so there is no relation to the third argument of the client call
+      and the third argument of the server code?
+    <braunr> no
+    <braunr> the 3rd argument at server side is undoubtedly the 4th at client
+      side here
+    <gnu_srs> but for the fourth argument there is?
+    <braunr> i think i've just answered that
+    <braunr> when in doubt, check the code generated by mig when building glibc
+    <gnu_srs> as I said before, I have verified the timeout bug you solved.
+    <gnu_srs> which code to look for RPC_*?
+    <braunr> should be easy to guess
+    <gnu_srs> is it the same with mach_msg()? No explicit usage of the timeout
+      there either.
+    <gnu_srs> in the code for the function I mean.
+    <braunr> gnu_srs: mach_msg is a low level system call
+    <braunr> see
+      http://www.gnu.org/software/hurd/gnumach-doc/Mach-Message-Call.html#Mach-Message-Call
+    <gnu_srs> found the definition of __io_select in: RPC_io_select.c, thanks.
+    <gnu_srs> so the client code to look for wrt RPC_ is in hurd/*.defs? what
+      about the gnumach/*/include/*.defs?
+    <gnu_srs> a final question: why use a timeout if there is a single FD for
+      the __io_select call, not when there are more than one?
+    <braunr> well, the code is obviously buggy, so don't expect me to justify
+      wrong code
+    <braunr> but i suppose the idea was : if there is only one fd, perform a
+      classical synchronous RPC, whereas if there are more use a heavyweight
+      portset and additional code to receive replies
+
+    <youpi> exim4 didn't get fixed by the libc patch, unfortunately
+    <braunr> yes i noticed
+    <braunr> gdb can't attach correctly to exim, so it's probably something
+      completely different
+    <braunr> i'll try the non intrusive mode
+
+
 # See Also
 
 See also [[select_bogus_fd]] and [[select_vs_signals]].