author | Samuel Thibault <samuel.thibault@ens-lyon.org> | 2015-02-18 00:58:35 +0100 |
---|---|---|
committer | Samuel Thibault <samuel.thibault@ens-lyon.org> | 2015-02-18 00:58:35 +0100 |
commit | 49a086299e047b18280457b654790ef4a2e5abfa (patch) | |
tree | c2b29e0734d560ce4f58c6945390650b5cac8a1b /open_issues/ext2fs_page_cache_swapping_leak.mdwn | |
parent | e2b3602ea241cd0f6bc3db88bf055bee459028b6 (diff) | |
Revert "rename open_issues.mdwn to service_solahart_jakarta_selatan__082122541663.mdwn"
This reverts commit 95878586ec7611791f4001a4ee17abf943fae3c1.
Diffstat (limited to 'open_issues/ext2fs_page_cache_swapping_leak.mdwn')
-rw-r--r-- | open_issues/ext2fs_page_cache_swapping_leak.mdwn | 361 |
1 files changed, 361 insertions, 0 deletions
diff --git a/open_issues/ext2fs_page_cache_swapping_leak.mdwn b/open_issues/ext2fs_page_cache_swapping_leak.mdwn
new file mode 100644
index 00000000..81915492
--- /dev/null
+++ b/open_issues/ext2fs_page_cache_swapping_leak.mdwn
@@ -0,0 +1,361 @@
+[[!meta copyright="Copyright © 2011, 2012, 2013 Free Software Foundation,
+Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach open_issue_hurd]]
+
+There is a [[!FF_project 272]][[!tag bounty]] on this task.
+
+[[!toc]]
+
+
+# IRC, OFTC, #debian-hurd, 2011-03-24
+
+    <youpi> I still believe we have an ext2fs page cache swapping leak, however
+    <youpi> as the 1.8GiB swap was full, yet the ld process was only 1.5GiB big
+    <pinotree> a leak at swapping time, you mean?
+    <youpi> I mean the ext2fs page cache being swapped out instead of simply
+      dropped
+    <pinotree> ah
+    <pinotree> so the swap tends to accumulate unuseful stuff, i see
+    <youpi> yes
+    <youpi> the disk content, basicallyt :)
+
+
+# IRC, freenode, #hurd, 2011-04-18
+
+    <antrik> damn, a cp -a simply gobbles down swap space...
+    <braunr> really ?
+    <braunr> that's weird
+    <braunr> why would a copy use so much anonymous memory ?
+    <braunr> unless the external pager is so busy that the kernel falls back to
+      its default pager
+    <youpi> that's what I suggested some time ago
+    <braunr> maybe this case should be traced in the kernel
+    <braunr> a simple message in the kernel buffer to warn that this condition
+      happened may help
+    <youpi> I'm seeing swap space being kept used on buildds for no real reason
+      except possibly backing ext2fs pages
+    <youpi> that could help, yes
+    <antrik> youpi: I think it was actually slpz who suggested that...
+    <youpi> I think we're generally missing feedback from memory behavior
+    <antrik> youpi: do you think andrei's kernel instrumentation work might be
+      helpful with analyzing such things?
+    <youpi> antrik: I think I suggested it too, but never mind
+    <youpi> antrik: no, because it's not a trace of events that you want
+    <youpi> some specific events would be useful
+    <youpi> but then we don't really need a whole framework for that
+    <antrik> apt-get upgrade eats swap too
+    <youpi> the upgrade itself, or the computation of the ugprade?
+    <youpi> apt is a memory eater nowadays
+    <antrik> installing the packages
+    <antrik> seems to have stabilized though after a while...
+    <antrik> so perhaps it's not a leak in this case
+    <youpi> ideally we should have a way to know what was put in the swap
+    <braunr> how would you represent what's in the swap ?
+    <antrik> the apt-get process has 46M of virtual memory above the 128 M
+      baseline
+    <braunr> mostly libraries i guess
+    <braunr> are trheads stacks 8 MiB like on Linux ?
+    <youpi> braunr: at least knowing how much of each process is in the swap
+    <youpi> braunr: 2MiB
+    <braunr> ok
+    <youpi> vminfo could also report which parts of the address space are in
+      the swap
+    <antrik> youpi: would be nice to have some simple utility reporting how
+      much of a process' address space is anonymous
+    <antrik> (in fact, I wonder why it's not reported by standard tools such as
+      ps or top... this shouldn't be too difficult I would think?)
+    <antrik> it would be much more useful information than the total virt size,
+      which includes rather meaningless disk and device mappings...
+    <youpi> agreed
+    <braunr> well
+    <braunr> there are tools like pmap for this
+    <braunr> unfortunately, it's difficult in mach to know what backs a
+      non-anonymous mapping
+    <braunr> pagers should be able to name their mappings
+    <youpi> that'd be helpful for debugging yes
+    <braunr> there is almost no overhead in doing that, and it would be very
+      useful
+    <youpi> and could lead to /proc/pid/maps
+    <braunr> yes
+    <braunr> isn't there a maps already ?
+    <youpi> nope
+    <braunr> ok
+    <youpi> (probably not very useful without the names)
+    <braunr> ithought i remembered maps without names, and guessed it might
+      have been on the hurd for that reason
+    <braunr> but i'm not sure
+    <youpi> there's the vminfo command, yes
+    <braunr> 14:06 < youpi> braunr: at least knowing how much of each process
+      is in the swap
+    <braunr> wouldn't it be clearer to do it the other way around ?
+    <braunr> like a swapinfo tool indicating what it contains ?
+    <youpi> sure, but it's a lot more difficult
+    <braunr> really ?
+    <braunr> why ?
+    <youpi> because you have to traverse all the mappings
+    <youpi> etc
+    <youpi> (in all processes, I mean)
+    <youpi> and you have to name what is waht
+    <braunr> there are other ways
+    <braunr> the swap is a central structure
+    <youpi> while simply introducing the swap % in vminfo
+    <youpi> for a given process you know what is what
+    <braunr> right
+    <youpi> and doing that introduction is probably very simple
+    <braunr> that's a good point
+    <braunr> top-down is effectively easier than bottom-up resolution in Mach
+      VM
+    <antrik> hm... the memory use caused by cp doesn't seem to be reflected in
+      the virtual size of any particular process
+    <antrik> ghost memory
+    <braunr> what's cp vmsize at the time of the problem ?
+    <antrik> it's at 134 M right now... so considering the 128 M baseline,
+      nothing worth speaking of
+    <braunr> right
+    <braunr> maybe a copy map during I/O
+    <braunr> but I don't know Mach copy maps in detail, as they have been
+      eliminated from UVM
+    <antrik> BTW, the memory eatup happens even before swap comes into
+      play... swapping seems to be a result of the problem, not the cause
+    <braunr> what do you mean ?
+    <braunr> I thought swapping was the issue
+    <braunr> you mean RAM is full before swapping ?
+    <antrik> well, I don't know what the actual problem is... I just don't
+      understand why the memory use increases without any particular process
+      seeing an increase in size
+    <antrik> the "free" size in vmstat decreses
+    <antrik> once it's eatun up, swap space use increases
+    <braunr> well it doesn't change much of it
+    <braunr> the anonymous memory pager will use RAM before resorting to the
+      external default-pager
+    <antrik> I would suspect normal block caching... but then, shouldn't this
+      show up in the memory info of the ext2 process?
+    <braunr> although, again, I'm not sure of the behaviour of the anonymous
+      memory pager
+    <braunr> antrik: I don't know how block caching behaves
+    <antrik> BTW, is it a know problem that doing ^C on a "cp -a" seems to hang
+      the whole system?...
+    <antrik> (the whole hurd instance that is... the other instance is not
+      affected)
+    <youpi> not that I know of
+    <braunr> seems like a deadlock in the anonymous memory handling
+    <youpi> (and I've never seen that)
+    <antrik> happens both in my main system (using ancient hurd/libc) and in my
+      subhurd (recently upgraded to current stuff)
+    <antrik> this make testing this stuff quite a lot harder... [sigh]
+    <antrik> any suggestions how to debug this hang?
+    <braunr> antrik: no :/
+
+2011-04-28: [[!taglink open_issue_documentation]]
+
+    <antrik> hm... is it normal that "swap free" doesn't increase as a process'
+      memory is paged back in?
+    <youpi> yes
+    <youpi> there's no real use cleaning swap
+    <youpi> on the contrary, it makes paging the process out again longer
+    <antrik> hm... so essentially, after swapping back and forth a bit, a part
+      of the swap equal to the size of physical RAM will be occupied with stuff
+      that is actually in RAM?
+    <youpi> yes
+    <youpi> so that that RAM can be freed immediately if needed
+    <antrik> hm... that means my effective swap size is only like 300 MB... no
+      wonder I see crashes under load
+    <antrik> err... make that 230 actually
+    <antrik> indeed, quitting the application freed both the physical RAM and
+      swap space
+    <braunr> 02:28 < antrik> hm... is it normal that "swap free" doesn't
+      increase as a process' memory is paged back in?
+    <braunr> swap is the backing store of anonymous memory, like ext2fs is the
+      backing store of memory objects created from its pager
+    <braunr> so you can view swap as the file system for everything that isn't
+      an external memory object
+
+
+# IRC, freenode, #hurd, 2011-11-15
+
+    <braunr> hm, now my system got unstable
+    <braunr> swap is increasing, without any apparent reason
+    <antrik> you mean without any load?
+    <braunr> with load, yes
+    <braunr> :)
+    <antrik> well, with load is "normal"...
+    <antrik> at least for some loads
+    <braunr> i can't create memory pressure to stress reclaiming without any
+      load
+    <antrik> what load are you using?
+    <braunr> ftp mirrorring
+    <antrik> hm... never tried that; but I guess it's similar to apt-get
+    <antrik> so yes, that's "normal". I talked about it several times, and also
+      wrote to the ML
+    <braunr> antrik: ok
+    <antrik> if you find out how to fix this, you are my hero ;-)
+    <braunr> arg :)
+    <antrik> I suspect it's the infamous double swapping problem; but that's
+      just a guess
+    <braunr> looks like this
+    <antrik> BTW, if you give me the exact command, I could check if I see it
+      too
+    <braunr> i use lftp (mirror -Re) from a linux git repository
+    <braunr> through sftp
+    <braunr> (lots of small files, big content)
+    <antrik> can't you just give me the exact command? I don't feel like
+      figuring it out myself
+    <braunr> antrik: cd linux-stable; lftp sftp://hurd_addr/
+    <braunr> inside lftp: mkdir linux-stable; cd linux-stable; mirror -Re
+    <braunr> hm, half of physical memory just got freed
+    <braunr> our page cache is really weird :/
+    <braunr> (i didn't delete any file when that happened)
+    <antrik> hurd_addr?
+    <braunr> ssh server ip address
+    <braunr> or name
+    <braunr> of your hurd :)
+    <antrik> I'm confused. you are mirroring *from* the Hurd box?
+    <braunr> no, to it
+    <antrik> ah, so you login via sftp and then push to it?
+    <braunr> yes
+    <braunr> fragmentation looks very fine
+    <braunr> even for the huge pv_entry cache and its 60k+ entries
+    <braunr> (and i'm running a kernel with the cpu layer enabled)
+    <braunr> git reset/status/diff/log/grep all work correctly
+    <braunr> anyway, mcsim's branch looks quite stable to me
+    <antrik> braunr: I can't reproduce the swap leak with ftp. free memory
+      idles around 6.5 k (seems to be the threshold where paging starts), and
+      swap use is constant
+    <antrik> might be because everything swappable is already present in swap
+      from previous load I guess...
+    <antrik> err... scratch that. was connected to the wrong host, silly me
+    <antrik> indeed swap gets eaten away, as expected
+    <antrik> but only if free memory actually falls below the
+      threshold. otherwise it just oscillates around a constant value, and
+      never touches swap
+    <antrik> so this seems to confirm the double swapping theory
+    <youpi> antrik: is that "double swap" theory written somewhere?
+    <youpi> (no, a quick google didn't tell me)
+
+
+## IRC, freenode, #hurd, 2011-11-16
+
+    <antrik> youpi:
+      http://lists.gnu.org/archive/html/l4-hurd/2002-06/msg00001.html talks
+      about "double paging". probably it's also the term others used for it;
+      however, the term is generally used in a completely different meaning, so
+      I guess it's not really suitable for googling either ;-)
+    <antrik> IIRC slpz (or perhaps someone else?) proposed a solution to this,
+      but I don't remember any details
+    <youpi> ok so it's the same thing I was thinking about with swap getting
+      filled
+    <youpi> my question was: is there something to release the double swap,
+      once the ext2fs pager managed to recover?
+    <antrik> apparently not
+    <antrik> the only way to free the memory seems to be terminating the FS
+      server
+    <youpi> uh :/
+
+
+# IRC, freenode, #hurd, 2011-11-30
+
+    <antrik> slpz: basically, whenever free memory goes below the paging
+      threshold (which seems to be around 6 MiB) while there is other I/O
+      happening, swap usage begins to increase continuously; and only gets
+      freed again when the filesystem translator in question exits
+    <antrik> so it sounds *very* much like pages go to swap because the
+      filesystem isn't quick enough to properly page them out
+    <antrik> slpz: I think it was you who talked about double paging a while
+      back?
+    <slpz> antrik: probably, sounds like me :-)
+    <antrik> slpz: I have some indication that the degenerating performance and
+      ultimate hang issues I'm seeing are partially or entirely caused by
+      double paging...
+    <antrik> slpz: I don't remember, did you propose some possible fix?
+    <slpz> antrik: hmm... perhaps it wasn't me, because I don't remember trying
+      to fix that problem...
+    <slpz> antrik: at which point do you think pages get duplicated?
+    <antrik> slpz: it was a question. I don't remember whether you proposed
+      something or not :-)
+    <antrik> slpz: basically, whenever free memory goes below the paging
+      threshold (which seems to be around 6 MiB) while there is other I/O
+      happening, swap usage begins to increase continuously; and only gets
+      freed again when the filesystem translator in question exits
+    <antrik> so it sounds *very* much like pages go to swap because the
+      filesystem isn't quick enough to properly page them out
+    <slpz> antrik: I see
+    <slpz> antrik: I didn't addressed this problem directly, but when I've
+      modified the pageout mechanism to provide a special treatment for
+      external pages, I also removed the possibility of sending them to the
+      default pager
+    <slpz> antrik: this was in my experimental environment, of course
+    <antrik> slpz: oh, nice... so it may fix the issues I'm seeing? :-)
+    <antrik> anything testable yet?
+    <slpz> antrik: yes, only anonymous memory could be swapped with that
+    <slpz> antrik: it works, but is ugly as hell
+    <antrik> tschwinge: these is also your observation about compilations
+      getting slower on further runs, and my followups... I *suspect* it's the
+      same issue
+
+[[performance/degradation]].
+
+    <slpz> antrik: I'm thinking about establishing a repository for these
+      experimental versions, so they don't get lost with the time
+    <antrik> slpz: please do :-)
+    <slpz> antrik: perhaps in savannah's HARD project
+    <antrik> even if it's not ready for upstream, it would be nice if I could
+      test it -- right now it's bothering me more than any other Hurd issues I
+      think...
+    <slpz> also, there's another problem which causes performance degradation
+      with the simple use of the system
+    <tschwinge> slpz: Please just push to Savannah Hurd. Under your
+      slpz/... or similar.
+    <tschwinge> antrik: Might very well be, yes.
+    <slpz> and I almost sure it is the fragmentation of the task map
+    <slpz> tschwinge: ok
+    <slpz> after playing a bit with a translator, it can easily get more than
+      3000 entries in its map
+    <antrik> slpz: yeah, other issues might play a role here as well. I
+      observed that terminating the problematic FS servers does free most of
+      the memory and remove most of the performance degradation, but in some
+      cases it's still very slow
+    <slpz> that makes vm_map_lookup a lot slower
+    <antrik> on a related note: any idea what can cause paging errors and a
+      system hang even when there is plenty of free swap?
+    <antrik> (I'm not entirely sure, but my impression is that it *might* be
+      related to the swap usage and performance degradation problems)
+    <slpz> I think this degree of fragmentation has something to do with the
+      reiterative mapping of memory objects which is done in pager-memcpy.c
+    <slpz> antrik: which kind of paging errors?
+    <antrik> hm... I don't think I ever noted down the exact message; but I
+      think it's the same you get when actually running out of swap
+    <slpz> antrik: that could be the default pager dying for some internal bug
+    <antrik> well, but it *seems* to go along with the performance degradation
+      and/or swap usage
+    <slpz> I also have the impression that we're using memory objects the wrong
+      way
+    <antrik> basically, once I get to a certain level of swap use and slowness
+      (after about a month of use), the system eventually dies
+    <slpz> antrik: I never had a system running for that time, so it could be a
+      completely different problem from what I've seen before :-/
+    <slpz> Anybody has experience with block-level caches on microkernel
+      environments?
+    <antrik> slpz: yeah, it typically happens after about a month of my normal
+      use... but I can significantly accellerate it by putting some problematic
+      load on it, such as large apt-get runs...
+    <slpz> I wonder if it would be better to put them in kernel or in user
+      space. And in the latter, if it would be better to have one per-device
+      shared for all accesing translators, or just each task should have its
+      own cache...
+    <antrik> slpz:
+      http://lists.gnu.org/archive/html/bug-hurd/2011-09/msg00041.html is where
+      I described the issue(s)
+    <antrik> (should send another update for the most recent findings I
+      guess...)
+    <antrik> slpz: well, if we move to userspace drivers, the kernel part of
+      the question is already answered ;-)
+    <antrik> but I'm not sure about per-device cache vs. caching in FS server