diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn
index 6bed94ca..778af530 100644
--- a/open_issues/select.mdwn
+++ b/open_issues/select.mdwn
@@ -1395,6 +1395,242 @@ IRC, unknown channel, unknown date:
[[libpthread]].
+## IRC, freenode, #hurd, 2012-08-07
+
+ <rbraun_hurd> anyone knows of applications extensively using non-blocking
+ networking functions ?
+ <rbraun_hurd> (well, networking functions in a non-blocking way)
+ <antrik> rbraun_hurd: X perhaps?
+ <antrik> it's single-threaded, so I guess it must be pretty async ;-)
+ <antrik> thinking about it, perhaps it's the reason it works so poorly on
+ Hurd...
+ <braunr> it does ?
+ <rbraun_hurd> ah maybe at the client side, right
+ <rbraun_hurd> hm no, the client side is synchronous
+ <rbraun_hurd> oh by the way, i can use gitk on darnassus
+ <rbraun_hurd> i wonder if it's because of the select fix
+ <tschwinge> rbraun_hurd: If you want, you could also have a look if there's
+ any improvement for these:
+ http://www.gnu.org/software/hurd/open_issues/select.html (elinks),
+ http://www.gnu.org/software/hurd/open_issues/dbus.html,
+ http://www.gnu.org/software/hurd/open_issues/runit.html
+ <tschwinge> rbraun_hurd: And congratulations, again! :-)
+ <rbraun_hurd> tschwinge: too bad it can't be merged before the pthread port
+ :(
+ <antrik> rbraun_hurd: I was talking about server. most clients are probably
+ sync.
+ <rbraun_hurd> antrik: i guessed :)
+ <antrik> (though certainly not all... multithreaded clients are not really
+ supported with xlib IIRC)
+ <rbraun_hurd> but i didn't have much trouble with X
+ <antrik> tried something pushing a lot of data? like, say, glxgears? :-)
+ <rbraun_hurd> why not
+ <rbraun_hurd> the problem with tests involving "a lot of data" is that it
+ can easily degenerate into a livelock
+ <antrik> yeah, sounds about right
+ <rbraun_hurd> (with the current patch i mean)
+ <antrik> the symptoms I got were general jerkiness, with occasional long
+ hangs
+ <rbraun_hurd> that applies to about everything on the hurd
+ <rbraun_hurd> so it didn't alarm me
+ <antrik> another interesting testcase is freeciv-gtk... it reproducibly
+ caused a thread explosion after idling for some time -- though I don't
+ remember the details; and never managed to come up with a way to track
+ down how this happens...
+ <rbraun_hurd> dbus is more worthwhile
+ <rbraun_hurd> pinotree: how do i test that ?
+ <pinotree> eh?
+ <rbraun_hurd> pinotree: you once mentioned dbus had trouble with non
+ blocking selects
+ <pinotree> it does a poll() with a 0s timeout
+ <rbraun_hurd> that's the non blocking select part, yes
+ <pinotree> you'll need also fixes for the socket credentials though,
+ otherwise it won't work ootb
+ <rbraun_hurd> right but, isn't it already used somehow ?
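+
+A zero-timeout `poll()` of the kind pinotree describes is just a readiness
+probe: it asks whether any descriptor is ready and returns immediately.
+A minimal sketch of the pattern (illustrative only, not dbus's actual
+code; the helper name is made up):
+
+    #include <poll.h>
+    #include <stdio.h>
+
+    /* Ask whether FD is readable without blocking: a poll () with a 0 ms
+       timeout, the pattern dbus uses and the one that exercises the
+       non-blocking io_select path on the Hurd.  */
+    static int
+    ready_for_reading (int fd)
+    {
+      struct pollfd pfd = { .fd = fd, .events = POLLIN };
+      int n = poll (&pfd, 1, 0);  /* 0 ms: return immediately */
+
+      if (n < 0)
+        {
+          perror ("poll");
+          return 0;
+        }
+      return n > 0 && (pfd.revents & POLLIN) != 0;
+    }
+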
+ <antrik> rbraun_hurd: uhm... none of the non-X applications I use expose a
+ visible jerkiness/long hangs pattern... though that may well be a result
+ of general load patterns rather than X I guess
+ <rbraun_hurd> antrik: that's my feeling
+ <rbraun_hurd> antrik: heavy communication channels, suboptimal scheduling,
+ lack of scalability, they're clearly responsible for the generally
+ perceived "jerkiness" of the system
+ <antrik> again, I can't say I observe "general jerkiness". apart from slow
+ I/O the system behaves rather normally for the things I do
+ <antrik> I'm pretty sure the X jerkiness *is* caused by the socket
+ communication
+ <antrik> which of course might be a scheduling issue
+ <antrik> but it seems perfectly possible that it *is* related to the select
+ implementation
+ <antrik> at least worth a try I'd say
+ <rbraun_hurd> sure
+ <rbraun_hurd> there is still some work to do on it though
+ <rbraun_hurd> the client side changes i did could be optimized a bit more
+ <rbraun_hurd> (but i'm afraid it would lead to ugly things like 2 timeout
+ parameters in the io_select_timeout call, one for the client side, the
+ other for the servers, eh)
+
+
+## IRC, freenode, #hurd, 2012-08-07
+
+ <braunr> when running gitk on [darnassus], yesterday, i could push the CPU
+ to 100% by simply moving the mouse in the window :p
+ <braunr> (but it may also be caused by the select fix)
+ <antrik> braunr: that cursor might be "normal"
+ <rbraunh> antrik: what do you mean ?
+ <antrik> the 100% CPU
+ <rbraunh> antrik: yes i got that, but what would make it normal ?
+ <rbraunh> antrik: right i get similar behaviour on linux actually
+ <rbraunh> (not 100% because two threads are spread on different cores, but
+ their cpu usage add up to 100%)
+ <rbraunh> antrik: so you think as long as there are events to process, the
+ x client is running
+ <rbraunh> that would mean latencies are small enough to allow that, which
+ is actually a very good thing
+ <antrik> hehe... sounds kinda funny :-)
+ <rbraunh> this linear search on dequeue is a real pain :/
+
+
+## IRC, freenode, #hurd, 2012-08-09
+
+`screen` doesn't close a window/hangs after exiting the shell.
+
+ <rbraunh> the screen issue seems linked to select :p
+ <rbraunh> tschwinge: the term server may not correctly implement it
+ <rbraunh> tschwinge: the problem looks related to the term consoles not
+ dying
+ <rbraunh> http://www.gnu.org/software/hurd/open_issues/term_blocking.html
+
+[[Term_blocking]].
+
+
+# IRC, freenode, #hurd, 2012-12-05
+
+ <braunr> well if i'm unable to build my own packages, i'll send you the one
+ line patch i wrote that fixes select/poll for the case where there is
+ only one descriptor
+ <braunr> (the current code calls mach_msg twice, each time with the same
+ timeout, doubling the total wait time when there is no event)
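+
+The effect is easy to observe: a single-descriptor `poll()` that should
+time out after one second can take close to two. A small self-contained
+test (hypothetical, written for illustration; it is not part of the
+discussion above) that times such a call:
+
+    #include <poll.h>
+    #include <stdio.h>
+    #include <time.h>
+    #include <unistd.h>
+
+    /* Time a 1000 ms poll () on a descriptor that never becomes readable.
+       With the doubled-timeout bug this takes about 2 s on the Hurd; with
+       the fix it returns after about 1 s, as on other systems.  */
+    int
+    main (void)
+    {
+      int fds[2];
+      struct pollfd pfd;
+      struct timespec t0, t1;
+
+      if (pipe (fds) < 0)
+        return 1;                   /* nothing is ever written to it */
+      pfd.fd = fds[0];
+      pfd.events = POLLIN;
+
+      clock_gettime (CLOCK_REALTIME, &t0);
+      poll (&pfd, 1, 1000);         /* single descriptor, 1000 ms timeout */
+      clock_gettime (CLOCK_REALTIME, &t1);
+
+      printf ("poll took %.2f s\n",
+              (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
+      return 0;
+    }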
+
+
+## IRC, freenode, #hurd, 2012-12-06
+
+ <braunr> damn, my eglibc patch breaks select :x
+ <braunr> i guess i'll just simplify the code by using the same path for
+ both single fd and multiple fd calls
+ <braunr> at least, the patch does fix the case i wanted it to .. :)
+ <braunr> htop and ping act at the right regular interval
+ <braunr> my select patch is :
+ <braunr> /* Now wait for reply messages. */
+ <braunr> - if (!err && got == 0)
+ <braunr> + if (!err && got == 0 && firstfd != -1 && firstfd != lastfd)
+ <braunr> basically, when there is a single fd, the code calls io_select
+ with a timeout
+ <braunr> and later calls mach_msg with the same timeout
+ <braunr> effectively making the maximum wait time twice what it should be
+ <pinotree> ouch
+ <braunr> which is why htop and ping are "laggy"
+ <braunr> and perhaps also why fakeroot is when building libc
+ <braunr> well
+ <braunr> when building packages
+ <braunr> my patch avoids entering the mach_msg call if there is only one fd
+ <braunr> (my failed attempt didn't have the firstfd != -1 check, leading to
+ the 0 fd case skipping mach_msg too, which is wrong since in that case
+ there is just no wait, making applications use select/poll for sleeping
+ consume all cpu)
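+
+Putting the quoted one-line change in context, the control flow looks
+roughly like this (a simplified paraphrase of `_hurd_select` in glibc's
+`sysdeps/mach/hurd/hurdselect.c`, with declarations and error handling
+omitted; not the verbatim source):
+
+    /* Send an io_select request for every descriptor.  With a single
+       descriptor the full timeout TO is passed as the RPC's waittime, so
+       the RPC itself does the waiting; otherwise a zero waittime is used
+       and the replies are collected below.  */
+    for (i = firstfd; i <= lastfd; ++i)
+      if (d[i].type)
+        err = __io_select (d[i].io_port, d[i].reply_port,
+                           (firstfd == lastfd) ? to : 0,
+                           &type);
+
+    /* Wait for the io_select replies.  Before the patch this ran whenever
+       no descriptor was ready yet, again with timeout TO, so the
+       single-descriptor case could wait for TO twice.  The patched
+       condition skips it when a single descriptor was already polled
+       synchronously; the firstfd != -1 test is meant to leave the
+       descriptor-less pure-timeout case alone (see the parenthetical
+       above).  */
+    if (!err && got == 0 && firstfd != -1 && firstfd != lastfd)
+      while ((msgerr = __mach_msg (&msg.head, MACH_RCV_MSG | options,
+                                   0, sizeof msg, portset, to,
+                                   MACH_PORT_NULL)) == MACH_MSG_SUCCESS)
+        {
+          /* Record which descriptors reported readiness in GOT.  */
+        }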
+
+ <braunr> the second is a fix in select (yet another) for the case where a
+ single fd is passed
+ <braunr> in which case there is one timeout directly passed in the
+ io_select call, but then yet another in the mach_msg call that waits for
+ replies
+ <braunr> this can account for the slowness of a bunch of select/poll users
+
+
+## IRC, freenode, #hurd, 2012-12-07
+
+ <braunr> finally, my select patch works :)
+
+
+## IRC, freenode, #hurd, 2012-12-08
+
+ <braunr> for those interested, i pushed my eglibc packages that include
+ this little select/poll timeout fix on my debian repository
+ <braunr> deb http://ftp.sceen.net/debian-hurd experimental/
+ <braunr> reports are welcome, i'm especially interested in potential
+ regressions
+
+
+## IRC, freenode, #hurd, 2012-12-10
+
+ <gnu_srs> I have verified your double timeout bug in hurdselect.c.
+ <gnu_srs> Since I'm also working on hurdselect I have a few questions
+ about where the timeouts in mach_msg and io_select are implemented.
+ <gnu_srs> Have a big problem to trace them down to actual code: mig magic
+ again?
+ <braunr> yes
+ <braunr> see hurd/io.defs, io_select includes a waittime timeout:
+ natural_t; parameter
+ <braunr> waittime is mig magic that tells the client side not to wait more
+ than the timeout
+ <braunr> and in _hurd_select, you can see these lines :
+ <braunr> err = __io_select (d[i].io_port, d[i].reply_port,
+ <braunr> /* Poll only if there's a single
+ descriptor. */
+ <braunr> (firstfd == lastfd) ? to : 0,
+ <braunr> to being the timeout previously computed
+ <braunr> "to"
+ <braunr> and later, when waiting for replies :
+ <braunr> while ((msgerr = __mach_msg (&msg.head,
+ <braunr> MACH_RCV_MSG | options,
+ <braunr> 0, sizeof msg, portset, to,
+ <braunr> MACH_PORT_NULL)) ==
+ MACH_MSG_SUCCESS)
+ <braunr> the same timeout is used
+ <braunr> hope it helps
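+
+Joined back together (the paste above is wrapped by the IRC line length),
+the two fragments from `_hurd_select`, reproduced only as far as they were
+quoted, read:
+
+    err = __io_select (d[i].io_port, d[i].reply_port,
+                       /* Poll only if there's a single descriptor.  */
+                       (firstfd == lastfd) ? to : 0,
+
+    while ((msgerr = __mach_msg (&msg.head,
+                                 MACH_RCV_MSG | options,
+                                 0, sizeof msg, portset, to,
+                                 MACH_PORT_NULL)) == MACH_MSG_SUCCESS)
+
+The same `to`, computed from the caller's timeout, appears in both calls:
+once as the io_select waittime and once as the mach_msg receive timeout.
+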
+ <gnu_srs> Additional stuff on io-select question is at
+ http://paste.debian.net/215401/
+ <gnu_srs> Sorry, should have posted it before you commented, but I was
+ interrupted.
+ <braunr> 14:13 < braunr> waittime is mig magic that tells the client side
+ not to wait more than the timeout
+ <braunr> the waittime argument is a client argument only
+ <braunr> that's one of the main sources of problems with select/poll, and
+ the one i fixed 6 months ago
+ <gnu_srs> so there is no relation to the third argument of the client call
+ and the third argument of the server code?
+ <braunr> no
+ <braunr> the 3rd argument at server side is undoubtedly the 4th at client
+ side here
+ <gnu_srs> but for the fourth argument there is?
+ <braunr> i think i've just answered that
+ <braunr> when in doubt, check the code generated by mig when building glibc
+ <gnu_srs> as I said before, I have verified the timeout bug you solved.
+ <gnu_srs> which code to look for RPC_*?
+ <braunr> should be easy to guess
+ <gnu_srs> is it the same with mach_msg()? No explicit usage of the timeout
+ there either.
+ <gnu_srs> in the code for the function I mean.
+ <braunr> gnu_srs: mach_msg is a low level system call
+ <braunr> see
+ http://www.gnu.org/software/hurd/gnumach-doc/Mach-Message-Call.html#Mach-Message-Call
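+
+For reference, `mach_msg` takes the timeout as an ordinary argument (in
+milliseconds); it is only honoured when the corresponding option bit is
+set, which is what the `options` variable in `_hurd_select` arranges. A
+minimal receive with a timeout looks like this (illustrative fragment;
+`port` and `msg` are assumed to be set up elsewhere):
+
+    mach_msg_return_t mr;
+
+    /* Wait at most 1000 ms for a message on PORT.  Without
+       MACH_RCV_TIMEOUT among the options the timeout argument is ignored
+       and the call blocks until a message arrives.  */
+    mr = mach_msg (&msg.head, MACH_RCV_MSG | MACH_RCV_TIMEOUT,
+                   0, sizeof msg, port, 1000, MACH_PORT_NULL);
+    if (mr == MACH_RCV_TIMED_OUT)
+      /* Nothing arrived within the timeout.  */;
+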
+ <gnu_srs> found the definition of __io_select in: RPC_io_select.c, thanks.
+ <gnu_srs> so the client code to look for wrt RPC_ is in hurd/*.defs? what
+ about the gnumach/*/include/*.defs?
+ <gnu_srs> a final question: why use a timeout if there is a single FD for
+ the __io_select call, not when there are more than one?
+ <braunr> well, the code is obviously buggy, so don't expect me to justify
+ wrong code
+ <braunr> but i suppose the idea was : if there is only one fd, perform a
+ classical synchronous RPC, whereas if there are more use a heavyweight
+ portset and additional code to receive replies
+
+ <youpi> exim4 didn't get fixed by the libc patch, unfortunately
+ <braunr> yes i noticed
+ <braunr> gdb can't attach correctly to exim, so it's probably something
+ completely different
+ <braunr> i'll try the non intrusive mode
+
+
# See Also
See also [[select_bogus_fd]] and [[select_vs_signals]].