diff options
Diffstat (limited to 'open_issues/profiling.mdwn')
-rw-r--r-- | open_issues/profiling.mdwn | 371 |
1 files changed, 0 insertions, 371 deletions
diff --git a/open_issues/profiling.mdwn b/open_issues/profiling.mdwn deleted file mode 100644 index e7dde903..00000000 --- a/open_issues/profiling.mdwn +++ /dev/null @@ -1,371 +0,0 @@ -[[!meta copyright="Copyright © 2010, 2011, 2013, 2014 Free Software Foundation, -Inc."]] - -[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable -id="license" text="Permission is granted to copy, distribute and/or modify this -document under the terms of the GNU Free Documentation License, Version 1.2 or -any later version published by the Free Software Foundation; with no Invariant -Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled [[GNU Free Documentation -License|/fdl]]."]]"""]] - -[[!meta title="Profiling, Tracing"]] - -*Profiling* ([[!wikipedia Profiling_(computer_programming) desc="Wikipedia -article"]]) is a tool for tracing where CPU time is spent. This is usually -done for [[performance analysis|performance]] reasons. - - * [[hurd/debugging/rpctrace]] - - * [[gprof]] - - Should be working, but some issues have been reported, regarding GCC spec - files. Should be possible to fix (if not yet done) easily. - - * [[glibc]]'s sotruss - - * [[ltrace]] - - * [[latrace]] - - * [[community/gsoc/project_ideas/dtrace]] - - Have a look at this, integrate it into the main trees. - - * [[LTTng]] - - * [[SystemTap]] - - * ... or some other Linux thing. - - -# IRC, freenode, #hurd, 2013-06-17 - - <congzhang> is that possible we develop rpc msg analyse tool? make it clear - view system at different level? - <congzhang> hurd was dynamic system, how can we just read log line by line - <kilobug> congzhang: well, you can use rpctrace and then analyze the logs, - but rpctrace is quite intrusive and will slow down things (like strace or - similar) - <kilobug> congzhang: I don't know if a low-overhead solution could be made - or not - <congzhang> that's the problem - <congzhang> when real system run, the msg cross different server, and then - the debug action should not intrusive the process itself - <congzhang> we observe the system and analyse os - <congzhang> when rms choose microkernel, it's expect to accelerate the - progress, but not - <congzhang> microkernel make debug a litter hard - <kilobug> well, it's not limited to microkernels, debugging/tracing is - intrusive and slow things down, it's an universal law of compsci - <kilobug> no, it makes debugging easier - <congzhang> I don't think so - <kilobug> you can gdb the various services (like ext2fs or pfinet) more - easily - <kilobug> and rpctrace isn't any worse than strace - <congzhang> how easy when debug lpc - <kilobug> lpc ? - <congzhang> because cross context - <congzhang> classic function call - <congzhang> when find the bug source, I don't care performance, I wan't to - know it's right or wrong by design, If it work as I expect - <congzhang> I optimize it latter - <congzhang> I have an idea, but don't know weather it's usefull or not - <braunr> rpctrace is a lot less instrusive than ptrace based tools - <braunr> congzhang: debugging is not made hard by the design choice, but by - implementation details - <braunr> as a simple counter example, someone often cited usb development - on l3 being made a lot easier than on a monolithic kernel - <congzhang> Collect the trace information first, and then layout the msg by - graph, when something wrong, I focus the trouble rpc, and found what - happen around - <braunr> "by graph" ? - <congzhang> yes - <congzhang> braunr: directed graph or something similar - <braunr> and not caring about performance when debugging is actually stupid - <braunr> i've seen it on many occasions, people not being able to use - debugging tools because they were far too inefficient and slow - <braunr> why a graph ? - <braunr> what you want is the complete trace, taking into account cross - address space boundaries - <congzhang> yes - <braunr> well it's linear - <braunr> switching server - <congzhang> by independent process view it's linear - <congzhang> it's linear on cpu's view too - <congzhang> yes, I need complete trace, and dynamic control at microkernel - level - <congzhang> os, if server crash, and then I know what's other doing, from - the graph - <congzhang> graph needn't to be one, if the are not connect together, time - sort them - <congzhang> when hurd was complete ok, some tools may be help too - <braunr> i don't get what you want on that graph - <congzhang> sorry, I need a context - <congzhang> like uml sequence diagram, I need what happen one by one - <congzhang> from server's view and from the function's view - <braunr> that's still linear - <braunr> so please stop using the word graph - <braunr> you want a trace - <braunr> a simple call trace - <congzhang> yes, and a tool - <braunr> with some work gdb could do it - <congzhang> you mean under some microkernel infrastructure help - <congzhang> ? - <braunr> if needed - <congzhang> braunr: will that be easy? - <braunr> not too hard - <braunr> i've had this idea for a long time actually - <braunr> another reason i insist on migrating threads (or rather, binding - server and client threads) - <congzhang> braunr: that's great - <braunr> the current problem we have when using gdb is that we don't know - which server thread is handling the request of which client - <braunr> we can guess it - <braunr> but it's not always obvious - <congzhang> I read the talk, know some of your idea - <congzhang> make things happen like classic kernel, just from function - ,sure:) - <braunr> that's it - <congzhang> I think you and other do a lot of work to improve the mach and - hurd, buT we lack the design document and the diagram, one diagram was - great than one thousand words - <braunr> diagrams are made after the prototypes that prove they're doable - <braunr> i'm not a researcher - <braunr> and we have little time - <braunr> the prototype is the true spec - <congzhang> that's why i wan't cllector the trace info and show, you can - know what happen and how happen, maybe just suitable for newbie, hope - more young hack like it - <braunr> once it's done, everything else is just sugar candy around it - - -# IRC, freenode, #hurd, 2014-01-05 - - <teythoon> braunr: do you speak ocaml ? - <teythoon> i had this awesome idea for a universal profiling framework for - c - <teythoon> universal as in not os dependent, so it can be easily used on - hurd or in gnu mach - <teythoon> it does a source transformation, instrumenting what you are - interested in - <teythoon> for this transformation, coccinelle is used - <teythoon> i have a prototype to measure how often a field in a struct is - accessed - <teythoon> unfortunately, coccinelle hangs while processing kern/slab.c :/ - <youpi> teythoon: I do speak ocaml - <teythoon> awesome :) - <teythoon> unfortunately, i do not :/ - <teythoon> i should probably get in touch with the coccinelle devs, most - likely the problem is that coccinelle runs in circles somewhere - <youpi> it's not so complex actually - <youpi> possibly, yes - <teythoon> do you know coccinelle ? - <youpi> the only really peculiar thing in ocaml is lambda calculus - <youpi> +c - <youpi> I know a bit, although I've never really written an semantic patch - myself - <teythoon> i'm okay with that - <youpi> but I can understand them - <youpi> then ocaml should be fine for you :) - <youpi> just ask the few bits that you don't understand :) - <teythoon> yeah, i haven't really made an effort yet - <youpi> writing ocaml is a bit more difficult because you need to - understand the syntax, but for putting printfs it should be easy enough - <youpi> if you get a backtrace with ocamldebug (it basically works like - gdb), I can probably explain you what might be happening - - -## IRC, freenode, #hurd, 2014-01-06 - - <teythoon> braunr: i'm not doing microoptimizations, i'm developing a - profiler :p - <braunr> teythoon: nice :) - <teythoon> i thought you might like it - <braunr> teythoon: you may want to look at - http://pdos.csail.mit.edu/multicore/dprof/ - <braunr> from the same people who brought radixvm - <teythoon> which data structure should i test it with next ? - <braunr> uh, no idea :) - <braunr> the ipc ones i suppose - <teythoon> yeah, or the task related ones - <braunr> but be careful, there many "inline" versions of many ipc functions - in the fast paths - <braunr> and when they say inline, they really mean they copied it - <braunr> +are - <teythoon> but i have a microbenchmark for ipc performance - <braunr> you sure have been busy ;p - <braunr> it's funny you're working on a profiler at the same time a - collegue of mine said he was interested in writing one in x15 :) - <teythoon> i don't think inlining is a problem for my tool - <teythoon> well, you can use my tool for x15 - <braunr> i told him he could look at what you did - <braunr> so i expect he'll ask soon - <teythoon> cool :) - <teythoon> my tool uses coccinelle to instrument c code, so this works in - any environment - <teythoon> one just needs a little glue and a method to get the data - <braunr> seems reasonable - <teythoon> for gnumach, i just stuff a tiny bit of code into the kdb - - <teythoon> hm debians bigmem patch with my code transformation makes - gnumach hang early on - <teythoon> i don't even get a single message from gnumach - <braunr> ouch - <teythoon> or it is somethign else entirely - <teythoon> it didn't even work without my patches o_O - <teythoon> weird - <teythoon> uh oh, the kmem_cache array is not properly aligned - <teythoon> braunr: http://paste.debian.net/74588/ - <braunr> teythoon: do you mean, with your patch ? - <braunr> i'm not sure i understand - <braunr> are you saying gnumach doesn't start because of an alignment issue - ? - <teythoon> no, that's unrelated - <teythoon> i skipped the bigmem patch, have a running gnumach with - instrumentation - <braunr> hum, what is that aliased column ? - <teythoon> but, despite my efforts with __attribute__((align(64))), i see - lot's of accesses to kmem_cache objects which are not properly aligned - <braunr> is that reported by the performance counters ? - <teythoon> no - <teythoon> http://paste.debian.net/74593/ - <braunr> aer those the previous lines accessed by other unrelated code ? - <braunr> previous bytes in the same line* - <teythoon> this is a patch generated to instrument the code - <teythoon> so i instrument field access of the form i->a - <teythoon> but if one does &i->a, my approach will no longer keep track of - any access through that pointer - <teythoon> so i do not count that as an access but as creating an alias for - that field - <braunr> ok - <teythoon> so if that aliased count is not zero, the tool might - underestimate the access count - <teythoon> hm - <teythoon> static struct kmem_cache kalloc_caches[KALLOC_NR_CACHES] - __attribute__((align(64))); - <teythoon> but - <teythoon> nm gnumach|grep kalloc_caches - <teythoon> c0226e20 b kalloc_caches - <teythoon> ah, that's fine - <braunr> yes - <teythoon> nevr mind - <braunr> don't we have a macro for the cache line size ? - <teythoon> ah, there are a great many more kmem_caches around and noone - told me ... - <braunr> teythoon: eh :) - <braunr> aren't you familiar with type-specific caches ? - <teythoon> no, i'm not familiar with anything in gnumach-land - <braunr> well, it's the regular slab allocator, carrying the same ideas - since 1994 - <braunr> it's pretty much the same in linux and other modern unices - <teythoon> ok - <braunr> the main difference is likely that we allocate our caches - statically because we have no kernel modules and know we'll never destroy - them, only reap them - <teythoon> is there a macro for the cache line size ? - <teythoon> there is one burried in the linux source - <teythoon> L1_CACHE_BYTES from linux/src/include/asm-i386/cache.h - <braunr> there is one in kern/slab.h - <teythoon> but it is out of date - <teythoon> there is ? - <braunr> but it's commented out - <braunr> only used when SLAB_USE_CPU_POOLS is defined - <braunr> but the build system should give you CPU_L1_SHIFT - <teythoon> hm - <braunr> and we probably should define CPU_L1_SIZE from that - unconditionnally in config.h or a general param.h file if there is one - <braunr> the architecture-specific one perhaps - <braunr> although it's exported to userland so maybe not - - -## IRC, freenode, #hurd, 2014-01-07 - - <teythoon> braunr: linux defines ____cacheline_aligned : - http://lxr.free-electrons.com/source/include/linux/cache.h#L20 - <teythoon> where would i put a similar definition in gnumach ? - <taylanub> .oO( four underscores ?!? ) - <teythoon> heh - <teythoon> yes, four - <braunr> teythoon: yes :) - - <teythoon> are kmem_cache objects ever allocated dynamically in gnumach ? - <braunr> no - <teythoon> hm - <braunr> i figured that, since there are no kernel modules, there is no - need to allocate them dynamically, since they're never destroyed - <teythoon> so i aligned all statically declarations with - __attribute__((align(1 << CPU_L1_SHIFT))) - <teythoon> but i still see 77% of all accesses being to objects that are - not properly aligned o_O - <teythoon> ah - <teythoon> >,< - <braunr> you could add an assertion in kmem_cache_init to find out what's - wrong - <teythoon> *aligned - <braunr> eh :) - <braunr> right - <teythoon> grr - <teythoon> sweet, the kmem_caches are now all properly aligned :) - <braunr> :) - - <braunr> hm - <braunr> i guess i should change what vmstat reports as "cache" from the - cached objects to the external ones (which map files and not anonymous - memory) - <teythoon> braunr: http://paste.debian.net/74869/ - <teythoon> turned out that struct kmem_cache was actually an easy target - <teythoon> no bitfields, no embedded structs that were addressed as such - (and not aliased) - <braunr> :) - - -## IRC, freenode, #hurd, 2014-01-09 - - <teythoon> braunr: i didn't quite get what you and youpi were talking about - wrt to the alignment attribute - <teythoon> define a type for struct kmem_cache with the alignment attribute - ? is that possible ? - <teythoon> ah, like it's done for kmem_cpu_pool - <braunr> teythoon: that's it :) - <braunr> note that aligning a struct doesn't change what sizeof returns - <teythoon> heh, that save's one a whole lot of trouble indeed - <braunr> you have to align a member inside for that - <teythoon> why would it change the size ? - <braunr> imagine an array of such structs - <teythoon> ah - <teythoon> right - <teythoon> but it fits into two cachelines exactly - <braunr> that wouldn't be a problem with an array either - <teythoon> so an array of those will still be aligned element-wise - <teythoon> yes - <braunr> and it's often used like that, just as i did for the cpu pools - <braunr> but then one is tempted to think the size of each element has - changed too - <braunr> and then use that technique for, say, reserving a whole cache line - for one variable - <teythoon> ah, now i get that remark ;) - <braunr> :) - - <teythoon> braunr: i annotated struct kmem_cache in slab.h with - __cacheline_aligned and it did not have the desired effect - <braunr> can you show the diff please ? - <teythoon> http://paste.debian.net/75192/ - <braunr> i don't know why :/ - <teythoon> that's how it's done for kmem_cpu_pool - <braunr> i'll try it here - <teythoon> wait - <teythoon> i made a typo - <teythoon> >,< - <teythoon> __cachline_aligned - <teythoon> bad one - <braunr> uh :) - <braunr> i don't see it - <braunr> ah yes - <braunr> missing e - <teythoon> yep, works like a charme :) - <teythoon> nice, good to know :) - <braunr> :) - <teythoon> given the previous discussion, shall i send it to the list or - commit it right away ? - <braunr> i'd say go ahead and commit |