diff options
author | Thomas Schwinge <thomas@codesourcery.com> | 2014-02-26 12:32:06 +0100 |
---|---|---|
committer | Thomas Schwinge <thomas@codesourcery.com> | 2014-02-26 12:32:06 +0100 |
commit | c4ad3f73033c7e0511c3e7df961e1232cc503478 (patch) | |
tree | 16ddfd3348bfeec014a4d8bb8c1701023c63678f /open_issues/profiling.mdwn | |
parent | d9079faac8940c4654912b0e085e1583358631fe (diff) | |
download | web-c4ad3f73033c7e0511c3e7df961e1232cc503478.tar.gz web-c4ad3f73033c7e0511c3e7df961e1232cc503478.tar.bz2 web-c4ad3f73033c7e0511c3e7df961e1232cc503478.zip |
IRC.
Diffstat (limited to 'open_issues/profiling.mdwn')
-rw-r--r-- | open_issues/profiling.mdwn | 233 |
1 files changed, 232 insertions, 1 deletions
diff --git a/open_issues/profiling.mdwn b/open_issues/profiling.mdwn index 545edcf6..e7dde903 100644 --- a/open_issues/profiling.mdwn +++ b/open_issues/profiling.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010, 2011, 2013 Free Software Foundation, +[[!meta copyright="Copyright © 2010, 2011, 2013, 2014 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable @@ -138,3 +138,234 @@ done for [[performance analysis|performance]] reasons. know what happen and how happen, maybe just suitable for newbie, hope more young hack like it <braunr> once it's done, everything else is just sugar candy around it + + +# IRC, freenode, #hurd, 2014-01-05 + + <teythoon> braunr: do you speak ocaml ? + <teythoon> i had this awesome idea for a universal profiling framework for + c + <teythoon> universal as in not os dependent, so it can be easily used on + hurd or in gnu mach + <teythoon> it does a source transformation, instrumenting what you are + interested in + <teythoon> for this transformation, coccinelle is used + <teythoon> i have a prototype to measure how often a field in a struct is + accessed + <teythoon> unfortunately, coccinelle hangs while processing kern/slab.c :/ + <youpi> teythoon: I do speak ocaml + <teythoon> awesome :) + <teythoon> unfortunately, i do not :/ + <teythoon> i should probably get in touch with the coccinelle devs, most + likely the problem is that coccinelle runs in circles somewhere + <youpi> it's not so complex actually + <youpi> possibly, yes + <teythoon> do you know coccinelle ? + <youpi> the only really peculiar thing in ocaml is lambda calculus + <youpi> +c + <youpi> I know a bit, although I've never really written an semantic patch + myself + <teythoon> i'm okay with that + <youpi> but I can understand them + <youpi> then ocaml should be fine for you :) + <youpi> just ask the few bits that you don't understand :) + <teythoon> yeah, i haven't really made an effort yet + <youpi> writing ocaml is a bit more difficult because you need to + understand the syntax, but for putting printfs it should be easy enough + <youpi> if you get a backtrace with ocamldebug (it basically works like + gdb), I can probably explain you what might be happening + + +## IRC, freenode, #hurd, 2014-01-06 + + <teythoon> braunr: i'm not doing microoptimizations, i'm developing a + profiler :p + <braunr> teythoon: nice :) + <teythoon> i thought you might like it + <braunr> teythoon: you may want to look at + http://pdos.csail.mit.edu/multicore/dprof/ + <braunr> from the same people who brought radixvm + <teythoon> which data structure should i test it with next ? + <braunr> uh, no idea :) + <braunr> the ipc ones i suppose + <teythoon> yeah, or the task related ones + <braunr> but be careful, there many "inline" versions of many ipc functions + in the fast paths + <braunr> and when they say inline, they really mean they copied it + <braunr> +are + <teythoon> but i have a microbenchmark for ipc performance + <braunr> you sure have been busy ;p + <braunr> it's funny you're working on a profiler at the same time a + collegue of mine said he was interested in writing one in x15 :) + <teythoon> i don't think inlining is a problem for my tool + <teythoon> well, you can use my tool for x15 + <braunr> i told him he could look at what you did + <braunr> so i expect he'll ask soon + <teythoon> cool :) + <teythoon> my tool uses coccinelle to instrument c code, so this works in + any environment + <teythoon> one just needs a little glue and a method to get the data + <braunr> seems reasonable + <teythoon> for gnumach, i just stuff a tiny bit of code into the kdb + + <teythoon> hm debians bigmem patch with my code transformation makes + gnumach hang early on + <teythoon> i don't even get a single message from gnumach + <braunr> ouch + <teythoon> or it is somethign else entirely + <teythoon> it didn't even work without my patches o_O + <teythoon> weird + <teythoon> uh oh, the kmem_cache array is not properly aligned + <teythoon> braunr: http://paste.debian.net/74588/ + <braunr> teythoon: do you mean, with your patch ? + <braunr> i'm not sure i understand + <braunr> are you saying gnumach doesn't start because of an alignment issue + ? + <teythoon> no, that's unrelated + <teythoon> i skipped the bigmem patch, have a running gnumach with + instrumentation + <braunr> hum, what is that aliased column ? + <teythoon> but, despite my efforts with __attribute__((align(64))), i see + lot's of accesses to kmem_cache objects which are not properly aligned + <braunr> is that reported by the performance counters ? + <teythoon> no + <teythoon> http://paste.debian.net/74593/ + <braunr> aer those the previous lines accessed by other unrelated code ? + <braunr> previous bytes in the same line* + <teythoon> this is a patch generated to instrument the code + <teythoon> so i instrument field access of the form i->a + <teythoon> but if one does &i->a, my approach will no longer keep track of + any access through that pointer + <teythoon> so i do not count that as an access but as creating an alias for + that field + <braunr> ok + <teythoon> so if that aliased count is not zero, the tool might + underestimate the access count + <teythoon> hm + <teythoon> static struct kmem_cache kalloc_caches[KALLOC_NR_CACHES] + __attribute__((align(64))); + <teythoon> but + <teythoon> nm gnumach|grep kalloc_caches + <teythoon> c0226e20 b kalloc_caches + <teythoon> ah, that's fine + <braunr> yes + <teythoon> nevr mind + <braunr> don't we have a macro for the cache line size ? + <teythoon> ah, there are a great many more kmem_caches around and noone + told me ... + <braunr> teythoon: eh :) + <braunr> aren't you familiar with type-specific caches ? + <teythoon> no, i'm not familiar with anything in gnumach-land + <braunr> well, it's the regular slab allocator, carrying the same ideas + since 1994 + <braunr> it's pretty much the same in linux and other modern unices + <teythoon> ok + <braunr> the main difference is likely that we allocate our caches + statically because we have no kernel modules and know we'll never destroy + them, only reap them + <teythoon> is there a macro for the cache line size ? + <teythoon> there is one burried in the linux source + <teythoon> L1_CACHE_BYTES from linux/src/include/asm-i386/cache.h + <braunr> there is one in kern/slab.h + <teythoon> but it is out of date + <teythoon> there is ? + <braunr> but it's commented out + <braunr> only used when SLAB_USE_CPU_POOLS is defined + <braunr> but the build system should give you CPU_L1_SHIFT + <teythoon> hm + <braunr> and we probably should define CPU_L1_SIZE from that + unconditionnally in config.h or a general param.h file if there is one + <braunr> the architecture-specific one perhaps + <braunr> although it's exported to userland so maybe not + + +## IRC, freenode, #hurd, 2014-01-07 + + <teythoon> braunr: linux defines ____cacheline_aligned : + http://lxr.free-electrons.com/source/include/linux/cache.h#L20 + <teythoon> where would i put a similar definition in gnumach ? + <taylanub> .oO( four underscores ?!? ) + <teythoon> heh + <teythoon> yes, four + <braunr> teythoon: yes :) + + <teythoon> are kmem_cache objects ever allocated dynamically in gnumach ? + <braunr> no + <teythoon> hm + <braunr> i figured that, since there are no kernel modules, there is no + need to allocate them dynamically, since they're never destroyed + <teythoon> so i aligned all statically declarations with + __attribute__((align(1 << CPU_L1_SHIFT))) + <teythoon> but i still see 77% of all accesses being to objects that are + not properly aligned o_O + <teythoon> ah + <teythoon> >,< + <braunr> you could add an assertion in kmem_cache_init to find out what's + wrong + <teythoon> *aligned + <braunr> eh :) + <braunr> right + <teythoon> grr + <teythoon> sweet, the kmem_caches are now all properly aligned :) + <braunr> :) + + <braunr> hm + <braunr> i guess i should change what vmstat reports as "cache" from the + cached objects to the external ones (which map files and not anonymous + memory) + <teythoon> braunr: http://paste.debian.net/74869/ + <teythoon> turned out that struct kmem_cache was actually an easy target + <teythoon> no bitfields, no embedded structs that were addressed as such + (and not aliased) + <braunr> :) + + +## IRC, freenode, #hurd, 2014-01-09 + + <teythoon> braunr: i didn't quite get what you and youpi were talking about + wrt to the alignment attribute + <teythoon> define a type for struct kmem_cache with the alignment attribute + ? is that possible ? + <teythoon> ah, like it's done for kmem_cpu_pool + <braunr> teythoon: that's it :) + <braunr> note that aligning a struct doesn't change what sizeof returns + <teythoon> heh, that save's one a whole lot of trouble indeed + <braunr> you have to align a member inside for that + <teythoon> why would it change the size ? + <braunr> imagine an array of such structs + <teythoon> ah + <teythoon> right + <teythoon> but it fits into two cachelines exactly + <braunr> that wouldn't be a problem with an array either + <teythoon> so an array of those will still be aligned element-wise + <teythoon> yes + <braunr> and it's often used like that, just as i did for the cpu pools + <braunr> but then one is tempted to think the size of each element has + changed too + <braunr> and then use that technique for, say, reserving a whole cache line + for one variable + <teythoon> ah, now i get that remark ;) + <braunr> :) + + <teythoon> braunr: i annotated struct kmem_cache in slab.h with + __cacheline_aligned and it did not have the desired effect + <braunr> can you show the diff please ? + <teythoon> http://paste.debian.net/75192/ + <braunr> i don't know why :/ + <teythoon> that's how it's done for kmem_cpu_pool + <braunr> i'll try it here + <teythoon> wait + <teythoon> i made a typo + <teythoon> >,< + <teythoon> __cachline_aligned + <teythoon> bad one + <braunr> uh :) + <braunr> i don't see it + <braunr> ah yes + <braunr> missing e + <teythoon> yep, works like a charme :) + <teythoon> nice, good to know :) + <braunr> :) + <teythoon> given the previous discussion, shall i send it to the list or + commit it right away ? + <braunr> i'd say go ahead and commit |