diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn
index 6bed94ca..778af530 100644
--- a/open_issues/select.mdwn
+++ b/open_issues/select.mdwn
@@ -1395,6 +1395,242 @@ IRC, unknown channel, unknown date:
[[libpthread]].
+## IRC, freenode, #hurd, 2012-08-07
+
+ <rbraun_hurd> anyone knows of applications extensively using non-blocking
+ networking functions ?
+ <rbraun_hurd> (well, networking functions in a non-blocking way)
+ <antrik> rbraun_hurd: X perhaps?
+ <antrik> it's single-threaded, so I guess it must be pretty async ;-)
+ <antrik> thinking about it, perhaps it's the reason it works so poorly on
+ Hurd...
+ <braunr> it does ?
+ <rbraun_hurd> ah maybe at the client side, right
+ <rbraun_hurd> hm no, the client side is synchronous
+ <rbraun_hurd> oh by the way, i can use gitk on darnassus
+ <rbraun_hurd> i wonder if it's because of the select fix
+ <tschwinge> rbraun_hurd: If you want, you could also have a look if there's
+ any improvement for these:
+ http://www.gnu.org/software/hurd/open_issues/select.html (elinks),
+ http://www.gnu.org/software/hurd/open_issues/dbus.html,
+ http://www.gnu.org/software/hurd/open_issues/runit.html
+ <tschwinge> rbraun_hurd: And congratulations, again! :-)
+ <rbraun_hurd> tschwinge: too bad it can't be merged before the pthread port
+ :(
+ <antrik> rbraun_hurd: I was talking about server. most clients are probably
+ sync.
+ <rbraun_hurd> antrik: i guessed :)
+ <antrik> (though certainly not all... multithreaded clients are not really
+ supported with xlib IIRC)
+ <rbraun_hurd> but i didn't have much trouble with X
+ <antrik> tried something pushing a lot of data? like, say, glxgears? :-)
+ <rbraun_hurd> why not
+ <rbraun_hurd> the problem with tests involving "a lot of data" is that it
+ can easily degenerate into a livelock
+ <antrik> yeah, sounds about right
+ <rbraun_hurd> (with the current patch i mean)
+ <antrik> the symptoms I got were general jerkiness, with occasional long
+ hangs
+ <rbraun_hurd> that applies to about everything on the hurd
+ <rbraun_hurd> so it didn't alarm me
+ <antrik> another interesting testcase is freeciv-gtk... it reproducibly
+ caused a thread explosion after idling for some time -- though I don't
+ remember the details; and never managed to come up with a way to track
+ down how this happens...
+ <rbraun_hurd> dbus is more worthwhile
+ <rbraun_hurd> pinotree: how do i test that ?
+ <pinotree> eh?
+ <rbraun_hurd> pinotree: you once mentioned dbus had trouble with non
+ blocking selects
+ <pinotree> it does a poll() with a 0s timeout
+ <rbraun_hurd> that's the non blocking select part, yes
+ <pinotree> you'll need also fixes for the socket credentials though,
+ otherwise it won't work ootb
+ <rbraun_hurd> right but, isn't it already used somehow ?
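+
+A zero-timeout `poll()` of the kind pinotree describes is just a readiness
+probe: it asks whether any descriptor is ready and returns immediately.
+A minimal sketch of the pattern (illustrative only, not dbus's actual
+code; the helper name is made up):
+
+    #include <poll.h>
+    #include <stdio.h>
+
+    /* Ask whether FD is readable without blocking: a poll () with a 0 ms
+       timeout, the pattern dbus uses and the one that exercises the
+       non-blocking io_select path on the Hurd.  */
+    static int
+    ready_for_reading (int fd)
+    {
+      struct pollfd pfd = { .fd = fd, .events = POLLIN };
+      int n = poll (&pfd, 1, 0);  /* 0 ms: return immediately */
+
+      if (n < 0)
+        {
+          perror ("poll");
+          return 0;
+        }
+      return n > 0 && (pfd.revents & POLLIN) != 0;
+    }
+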
+ <antrik> rbraun_hurd: uhm... none of the non-X applications I use expose a
+ visible jerkiness/long hangs pattern... though that may well be a result
+ of general load patterns rather than X I guess
+ <rbraun_hurd> antrik: that's my feeling
+ <rbraun_hurd> antrik: heavy communication channels, suboptimal scheduling,
+ lack of scalability, they're clearly responsible for the generally
+ perceived "jerkiness" of the system
+ <antrik> again, I can't say I observe "general jerkiness". apart from slow
+ I/O the system behaves rather normally for the things I do
+ <antrik> I'm pretty sure the X jerkiness *is* caused by the socket
+ communication
+ <antrik> which of course might be a scheduling issue
+ <antrik> but it seems perfectly possible that it *is* related to the select
+ implementation
+ <antrik> at least worth a try I'd say
+ <rbraun_hurd> sure
+ <rbraun_hurd> there is still some work to do on it though
+ <rbraun_hurd> the client side changes i did could be optimized a bit more
+ <rbraun_hurd> (but i'm afraid it would lead to ugly things like 2 timeout
+ parameters in the io_select_timeout call, one for the client side, the
+ other for the servers, eh)
+
+
+## IRC, freenode, #hurd, 2012-08-07
+
+ <braunr> when running gitk on [darnassus], yesterday, i could push the CPU
+ to 100% by simply moving the mouse in the window :p
+ <braunr> (but it may also be caused by the select fix)
+ <antrik> braunr: that cursor might be "normal"
+ <rbraunh> antrik: what do you mean ?
+ <antrik> the 100% CPU
+ <rbraunh> antrik: yes i got that, but what would make it normal ?
+ <rbraunh> antrik: right i get similar behaviour on linux actually
+ <rbraunh> (not 100% because two threads are spread on different cores, but
+ their cpu usage add up to 100%)
+ <rbraunh> antrik: so you think as long as there are events to process, the
+ x client is running
+ <rbraunh> that would mean latencies are small enough to allow that, which
+ is actually a very good thing
+ <antrik> hehe... sounds kinda funny :-)
+ <rbraunh> this linear search on dequeue is a real pain :/
+
+
+## IRC, freenode, #hurd, 2012-08-09
+
+`screen` doesn't close a window/hangs after exiting the shell.
+
+ <rbraunh> the screen issue seems linked to select :p
+ <rbraunh> tschwinge: the term server may not correctly implement it
+ <rbraunh> tschwinge: the problem looks related to the term consoles not
+ dying
+ <rbraunh> http://www.gnu.org/software/hurd/open_issues/term_blocking.html
+
+[[Term_blocking]].
+
+
+# IRC, freenode, #hurd, 2012-12-05
+
+ <braunr> well if i'm unable to build my own packages, i'll send you the one
+ line patch i wrote that fixes select/poll for the case where there is
+ only one descriptor
+ <braunr> (the current code calls mach_msg twice, each time with the same
+ timeout, doubling the total wait time when there is no event)
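+
+The effect is easy to observe: a single-descriptor `poll()` that should
+time out after one second can take close to two. A small self-contained
+test (hypothetical, written for illustration; it is not part of the
+discussion above) that times such a call:
+
+    #include <poll.h>
+    #include <stdio.h>
+    #include <time.h>
+    #include <unistd.h>
+
+    /* Time a 1000 ms poll () on a descriptor that never becomes readable.
+       With the doubled-timeout bug this takes about 2 s on the Hurd; with
+       the fix it returns after about 1 s, as on other systems.  */
+    int
+    main (void)
+    {
+      int fds[2];
+      struct pollfd pfd;
+      struct timespec t0, t1;
+
+      if (pipe (fds) < 0)
+        return 1;                   /* nothing is ever written to it */
+      pfd.fd = fds[0];
+      pfd.events = POLLIN;
+
+      clock_gettime (CLOCK_REALTIME, &t0);
+      poll (&pfd, 1, 1000);         /* single descriptor, 1000 ms timeout */
+      clock_gettime (CLOCK_REALTIME, &t1);
+
+      printf ("poll took %.2f s\n",
+              (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
+      return 0;
+    }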
+
+
+## IRC, freenode, #hurd, 2012-12-06
+
+ <braunr> damn, my eglibc patch breaks select :x
+ <braunr> i guess i'll just simplify the code by using the same path for
+ both single fd and multiple fd calls
+ <braunr> at least, the patch does fix the case i wanted it to .. :)
+ <braunr> htop and ping act at the right regular interval
+ <braunr> my select patch is :
+ <braunr> /* Now wait for reply messages. */
+ <braunr> - if (!err && got == 0)
+ <braunr> + if (!err && got == 0 && firstfd != -1 && firstfd != lastfd)
+ <braunr> basically, when there is a single fd, the code calls io_select
+ with a timeout
+ <braunr> and later calls mach_msg with the same timeout
+ <braunr> effectively making the maximum wait time twice what it should be
+ <pinotree> ouch
+ <braunr> which is why htop and ping are "laggy"
+ <braunr> and perhaps also why fakeroot is when building libc
+ <braunr> well
+ <braunr> when building packages
+ <braunr> my patch avoids entering the mach_msg call if there is only one fd
+ <braunr> (my failed attempt didn't have the firstfd != -1 check, leading to
+ the 0 fd case skipping mach_msg too, which is wrong since in that case
+ there is just no wait, making applications use select/poll for sleeping
+ consume all cpu)
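+
+Putting the quoted one-line change in context, the control flow looks
+roughly like this (a simplified paraphrase of `_hurd_select` in glibc's
+`sysdeps/mach/hurd/hurdselect.c`, with declarations and error handling
+omitted; not the verbatim source):
+
+    /* Send an io_select request for every descriptor.  With a single
+       descriptor the full timeout TO is passed as the RPC's waittime, so
+       the RPC itself does the waiting; otherwise a zero waittime is used
+       and the replies are collected below.  */
+    for (i = firstfd; i <= lastfd; ++i)
+      if (d[i].type)
+        err = __io_select (d[i].io_port, d[i].reply_port,
+                           (firstfd == lastfd) ? to : 0,
+                           &type);
+
+    /* Wait for the io_select replies.  Before the patch this ran whenever
+       no descriptor was ready yet, again with timeout TO, so the
+       single-descriptor case could wait for TO twice.  The patched
+       condition skips it when a single descriptor was already polled
+       synchronously; the firstfd != -1 test is meant to leave the
+       descriptor-less pure-timeout case alone (see the parenthetical
+       above).  */
+    if (!err && got == 0 && firstfd != -1 && firstfd != lastfd)
+      while ((msgerr = __mach_msg (&msg.head, MACH_RCV_MSG | options,
+                                   0, sizeof msg, portset, to,
+                                   MACH_PORT_NULL)) == MACH_MSG_SUCCESS)
+        {
+          /* Record which descriptors reported readiness in GOT.  */
+        }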
+
+ <braunr> the second is a fix in select (yet another) for the case where a
+ single fd is passed
+ <braunr> in which case there is one timeout directly passed in the
+ io_select call, but then yet another in the mach_msg call that waits for
+ replies
+ <braunr> this can account for the slowness of a bunch of select/poll users
+
+
+## IRC, freenode, #hurd, 2012-12-07
+
+ <braunr> finally, my select patch works :)
+
+
+## IRC, freenode, #hurd, 2012-12-08
+
+ <braunr> for those interested, i pushed my eglibc packages that include
+ this little select/poll timeout fix on my debian repository
+ <braunr> deb http://ftp.sceen.net/debian-hurd experimental/
+ <braunr> reports are welcome, i'm especially interested in potential
+ regressions
+
+
+## IRC, freenode, #hurd, 2012-12-10
+
+ <gnu_srs> I have verified your double timeout bug in hurdselect.c.
+ <gnu_srs> Since I'm also working on hurdselect I have a few questions
+ about where the timeouts in mach_msg and io_select are implemented.
+ <gnu_srs> Have a big problem to trace them down to actual code: mig magic
+ again?
+ <braunr> yes
+ <braunr> see hurd/io.defs, io_select includes a waittime timeout:
+ natural_t; parameter
+ <braunr> waittime is mig magic that tells the client side not to wait more
+ than the timeout
+ <braunr> and in _hurd_select, you can see these lines :
+ <braunr> err = __io_select (d[i].io_port, d[i].reply_port,
+ <braunr> /* Poll only if there's a single
+ descriptor. */
+ <braunr> (firstfd == lastfd) ? to : 0,
+ <braunr> to being the timeout previously computed
+ <braunr> "to"
+ <braunr> and later, when waiting for replies :
+ <braunr> while ((msgerr = __mach_msg (&msg.head,
+ <braunr> MACH_RCV_MSG | options,
+ <braunr> 0, sizeof msg, portset, to,
+ <braunr> MACH_PORT_NULL)) ==
+ MACH_MSG_SUCCESS)
+ <braunr> the same timeout is used
+ <braunr> hope it helps
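+
+Joined back together (the paste above is wrapped by the IRC line length),
+the two fragments from `_hurd_select`, reproduced only as far as they were
+quoted, read:
+
+    err = __io_select (d[i].io_port, d[i].reply_port,
+                       /* Poll only if there's a single descriptor.  */
+                       (firstfd == lastfd) ? to : 0,
+
+    while ((msgerr = __mach_msg (&msg.head,
+                                 MACH_RCV_MSG | options,
+                                 0, sizeof msg, portset, to,
+                                 MACH_PORT_NULL)) == MACH_MSG_SUCCESS)
+
+The same `to`, computed from the caller's timeout, appears in both calls:
+once as the io_select waittime and once as the mach_msg receive timeout.
+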
+ <gnu_srs> Additional stuff on io-select question is at
+ http://paste.debian.net/215401/
+ <gnu_srs> Sorry, should have posted it before you commented, but I was
+ interrupted.
+ <braunr> 14:13 < braunr> waittime is mig magic that tells the client side
+ not to wait more than the timeout
+ <braunr> the waittime argument is a client argument only
+ <braunr> that's one of the main sources of problems with select/poll, and
+ the one i fixed 6 months ago
+ <gnu_srs> so there is no relation to the third argument of the client call
+ and the third argument of the server code?
+ <braunr> no
+ <braunr> the 3rd argument at server side is undoubtedly the 4th at client
+ side here
+ <gnu_srs> but for the fourth argument there is?
+ <braunr> i think i've just answered that
+ <braunr> when in doubt, check the code generated by mig when building glibc
+ <gnu_srs> as I said before, I have verified the timeout bug you solved.
+ <gnu_srs> which code to look for RPC_*?
+ <braunr> should be easy to guess
+ <gnu_srs> is it the same with mach_msg()? No explicit usage of the timeout
+ there either.
+ <gnu_srs> in the code for the function I mean.
+ <braunr> gnu_srs: mach_msg is a low level system call
+ <braunr> see
+ http://www.gnu.org/software/hurd/gnumach-doc/Mach-Message-Call.html#Mach-Message-Call
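+
+For reference, `mach_msg` takes the timeout as an ordinary argument (in
+milliseconds); it is only honoured when the corresponding option bit is
+set, which is what the `options` variable in `_hurd_select` arranges. A
+minimal receive with a timeout looks like this (illustrative fragment;
+`port` and `msg` are assumed to be set up elsewhere):
+
+    mach_msg_return_t mr;
+
+    /* Wait at most 1000 ms for a message on PORT.  Without
+       MACH_RCV_TIMEOUT among the options the timeout argument is ignored
+       and the call blocks until a message arrives.  */
+    mr = mach_msg (&msg.head, MACH_RCV_MSG | MACH_RCV_TIMEOUT,
+                   0, sizeof msg, port, 1000, MACH_PORT_NULL);
+    if (mr == MACH_RCV_TIMED_OUT)
+      /* Nothing arrived within the timeout.  */;
+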
+ <gnu_srs> found the definition of __io_select in: RPC_io_select.c, thanks.
+ <gnu_srs> so the client code to look for wrt RPC_ is in hurd/*.defs? what
+ about the gnumach/*/include/*.defs?
+ <gnu_srs> a final question: why use a timeout if there is a single FD for
+ the __io_select call, not when there are more than one?
+ <braunr> well, the code is obviously buggy, so don't expect me to justify
+ wrong code
+ <braunr> but i suppose the idea was : if there is only one fd, perform a
+ classical synchronous RPC, whereas if there are more use a heavyweight
+ portset and additional code to receive replies
+
+ <youpi> exim4 didn't get fixed by the libc patch, unfortunately
+ <braunr> yes i noticed
+ <braunr> gdb can't attach correctly to exim, so it's probably something
+ completely different
+ <braunr> i'll try the non intrusive mode
+
+
# See Also
See also [[select_bogus_fd]] and [[select_vs_signals]].