|
|
Created:
4 years, 10 months ago by Primiano Tucci (use gerrit) Modified:
4 years, 9 months ago CC:
chromium-reviews, wfh+watch_chromium.org, Dai Mikurube (NOT FULLTIME), vmpstr+watch_chromium.org, chrisha Base URL:
https://chromium.googlesource.com/chromium/src.git@shim_exp_flag Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionAllocator shim skeleton + Linux impl behind a build flag
TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.
Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
std::new_handler, preventing allocations > 2GB which can trigger
signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).
Today (before this CL) allocation calls are already intercepted in
most Chrome platforms, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
them in allocator_shim_win.cc
On Android: we don't have 1)
The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).
Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
the same code paths of the security enforcement layer.
Notes about execution
---------------------
Essentially on each platform we need to do three things:
- Dismantle the code that defines the malloc()/new symbols.
- Let the malloc()/new symbols route through the shim.
- Have the shim ultimately invoke the actual allocator (tcmalloc,
winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.
Therefore this CL (and likely all the future ones) is meant to be
a no-op until use_experimental_allocator_shim==true.
Design doc: bit.ly/allocator-shim
BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.*
Committed: https://crrev.com/4e68ed2d51f897d12a570e16f7adbcb7595fa031
Cr-Commit-Position: refs/heads/master@{#380196}
Patch Set 1 : rebase #Patch Set 2 : Remove use of c++ classes + test #Patch Set 3 : cleanup #
Total comments: 1
Patch Set 4 : Rebase + some thoughts overnight #Patch Set 5 : rebase #Patch Set 6 : rebase + remove removal #Patch Set 7 : rebase #
Total comments: 32
Patch Set 8 : Rebase + address thakis review #Patch Set 9 : Make self the 1st arg #
Dependent Patchsets: Messages
Total messages: 47 (31 generated)
Description was changed from ========== NOT FOR REVIEW: WIP on unified shim BUG=550886 ========== to ========== [Work In Progress] Unified shim allocator shim (Linux Desktop) NOT FOR REVIEW (but works) BUG=550886 ==========
Description was changed from ========== [Work In Progress] Unified shim allocator shim (Linux Desktop) NOT FOR REVIEW (but works) BUG=550886 ========== to ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and the an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and moving them to base. This CL does NOT introduce any actual change (yet) to production code. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platform, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ==========
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #1 (id:40001) has been deleted
Patchset #1 (id:60001) has been deleted
Patchset #1 (id:80001) has been deleted
Patchset #1 (id:100001) has been deleted
Patchset #1 (id:120001) has been deleted
Patchset #1 (id:140001) has been deleted
Patchset #1 (id:160001) has been deleted
Patchset #1 (id:180001) has been deleted
Patchset #1 (id:200001) has been deleted
Patchset #1 (id:220001) has been deleted
Patchset #1 (id:240001) has been deleted
Patchset #1 (id:260001) has been deleted
Patchset #1 (id:280001) has been deleted
Patchset #1 (id:300001) has been deleted
Patchset #1 (id:320001) has been deleted
Patchset #1 (id:340001) has been deleted
Patchset #1 (id:360001) has been deleted
Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and the an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and moving them to base. This CL does NOT introduce any actual change (yet) to production code. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platform, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ========== to ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and the an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platform, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ==========
primiano@chromium.org changed reviewers: + thakis@chromium.org, wfh@chromium.org
Well this is going to be .... fun! Thanks in advance. https://codereview.chromium.org/1675143004/diff/420001/base/allocator/allocat... File base/allocator/allocator_shim.cc (right): https://codereview.chromium.org/1675143004/diff/420001/base/allocator/allocat... base/allocator/allocator_shim.cc:86: dispatch->next = g_chain_head; Ok, this I think is going to be the most controversial part of the CL. The overall good news is that in production code this is never going to be hit, this is really to install the tracing hooks. But it's not a good reason for not making it future-proof. I know that here I am plain on danger zone as: - writing pointers is not guaranteed to be atomic (but AFAIK they actually are on all the platforms we care about) - There is a memory barrier betweens the writes here, but no barrier on the read. My rationale here is that the readers has a data dependency: you need to get the head pointer (g_chain_head) in order to get the next pointer, so if they are written in order, it shouldn't be possible for the hw itself to read them in the reverse order. - my biggest doubt is about compiler optimizations: the lack of read barriers might allow the compiler to reorder the reads. the read here is just dereferencing g_chain_head but here my knowledge on what the compiler might do or not decays here.
+chrisha as follow-up to the SS16 chat.
Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and the an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platform, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ========== to ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platform, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ==========
Nice design doc and CL description! I'm a bit worried about a) the amount of machinery this adds and b) that we might no longer be able to wrap malloc and friends in 2015 (might be worth talking to scottmg what his plans here were) which might render this whole scheme less useful. But you removed a lot of stuff in preparation for this, and once it's all said and done there's more stuff to remove, so I think it's overall ok. In your research, have you looked at blink's oilpan allocator and blink's partition alloc at all? I guess the plan is to wrap those separately? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.gn File base/allocator/BUILD.gn (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.g... base/allocator/BUILD.gn:307: # Linux/CrOS only. http://crbug.com/550886 . remove one of the two "only" https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.g... base/allocator/BUILD.gn:308: configs += [ "//base:base_implementation" ] # for BASE_EXPORT give this visibility = [ ":base" ] too, to make it harder to misuse this. (else, e.g. //ui/gfx might accidentally add an explicit dep on this target and then it'll export base symbols. gyp doesn't have a protection against this type of mistake) https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim.cc (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:27: const allocator::AllocatorDispatch* volatile g_chain_head = why volatile? should this be an atomicptr? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:37: if (thread_id == kInvalidThreadId) This looks racy for the first few calls. Does it matter? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:66: return true; // Assume the new_handler will abort if it fails. This should probably mention that we don't want new handlers that throw bad_alloc (no exceptions etc) https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:103: // THe Shim* functions below are the entry-points into the shim-layer and s/THe/The/ https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:125: } while (!ptr && CallNewHandler()); Do we currently have new handlers that try to free up memory and then call again? Is this loop really needed? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:147: ptr = chain_head->alloc_zero_initialized_function(n, size, chain_head); Since the wrapper is called calloc and the c function is called calloc, maybe call the member fn calloc too? (as-is is fine too i suppose; calloc isn't a great name :-P) https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:186: return ShimMemalign(GetPageSize(), size); If this doesn't call through to valloc, why have this as part of this interface? do we call valloc anywhere? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim.h (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.h:44: using AllocFn = void*(size_t size, const AllocatorDispatch* self); why make self the last arg? mention that these functions must all be thread-safe. are there any performance concerns about the indirection? with the default setup, it's the same number of indirections as previously, right? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.h:82: BASE_EXPORT void InsertAllocatorDispatch(AllocatorDispatch* dispatch); removing a dispatch object is not possible? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_glibc.cc (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_glibc.cc:17: } // extern "C" do we ever want to call glibc instead of tcmalloc? https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim_unittest.cc (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim_unittest.cc:285: // paths of allocator_shim.cc and smoke-test its thread safey. nice! https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim_unittest.cc:295: WaitableEvent event(true /* manual_reset */, false /* initially_signaled */); nit: use /*manual_reset=*/true, /*initialliy_signaled=*/false With `false /* do thing */` it's not clear if "do thing" is set to false (i.e. "don't do thing") or if false means "do thing". /*do thing=*/false makes this clear. https://codereview.chromium.org/1675143004/diff/500001/build/config/allocator... File build/config/allocator.gni (right): https://codereview.chromium.org/1675143004/diff/500001/build/config/allocator... build/config/allocator.gni:27: assert(!use_experimental_allocator_shim || is_linux || is_nacl, why is this needed?
Super thanks for the review. Will get to the actual inline comments later today, in the meantime addressing top-level comments. > I'm a bit worried about a) the amount of machinery this adds Hmm I think I know what you mean. Even though, most of this CL is really more moving the machinery around, not adding. I think the bad part of this CL is bringing to the light the dark things happening today in the tcmalloc dungeons (but ... either there or here we have them). Honestly don't have any too many ideas on how to deal differently with this. The only other option I see is to leave things as they are and keep following today's pattern, which concretely means that things like the heap profiler will spray yet another layer of #ifdef (TCMALLOC), #if OS(WINDOWS), #if win_allocator_shim all over .cc files and build files. And having to maintain decentralized logic makes me a bit nervous. I'd honestly prefer the pattern: make lot of changes while trying to (functionally) change nothing, and make the extra functionality change a +few lines patch. It should be easier to spot things going wrong without adding any feature. Ditto for chrisha who wants to add a similar interception layer (For SyzyAsan I suppose), which will make maintaining the (Existing) shim + heap profiler + syzyasan ... fun. > and b) that we might no longer be able to wrap malloc and friends in 2015 (might be worth talking to scottmg what his plans here were) which might render this whole scheme less useful. Let's pull +scottmg in the discussion. My understanding from talking with wfh@ and crisha@ last week is that we are supposed to still wrap malloc and friends in VS2015. The difference is that prior to VS2015 we were achieving this by stripping the libcmt. In VS2015 malloc & friends are weak and we are supposed to deal differently, but still override, them (read: the stripping trick does not work anymore). My understanding is that the absence of the shim in VS2015 is temporary (to make VS2015 itself stick) and we should not reach the branch point without the new shim. I believe that wfh@ had some plans for it. scottmg/wfg@ any idea?
On 2016/03/08 11:55:05, Primiano wrote: > Super thanks for the review. > Will get to the actual inline comments later today, in the meantime addressing > top-level comments. > > > I'm a bit worried about a) the amount of machinery this adds > Hmm I think I know what you mean. Even though, most of this CL is really more > moving the machinery around, not adding. > I think the bad part of this CL is bringing to the light the dark things > happening today in the tcmalloc dungeons (but ... either there or here we have > them). > Honestly don't have any too many ideas on how to deal differently with this. > The only other option I see is to leave things as they are and keep following > today's pattern, which concretely means that things like the heap profiler will > spray yet another layer of #ifdef (TCMALLOC), #if OS(WINDOWS), #if > win_allocator_shim all over .cc files and build files. And having to maintain > decentralized logic makes me a bit nervous. > I'd honestly prefer the pattern: make lot of changes while trying to > (functionally) change nothing, and make the extra functionality change a +few > lines patch. It should be easier to spot things going wrong without adding any > feature. > Ditto for chrisha who wants to add a similar interception layer (For SyzyAsan I > suppose), which will make maintaining the (Existing) shim + heap profiler + > syzyasan ... fun. > > > and b) that we might no longer be able to wrap malloc and friends in 2015 > (might be worth > talking to scottmg what his plans here were) which might render this whole > scheme less useful. > > Let's pull +scottmg in the discussion. > My understanding from talking with wfh@ and crisha@ last week is that we are > supposed to still wrap malloc and friends in VS2015. > The difference is that prior to VS2015 we were achieving this by stripping the > libcmt. In VS2015 malloc & friends are weak and we are supposed to deal > differently, but still override, them (read: the stripping trick does not work > anymore). > My understanding is that the absence of the shim in VS2015 is temporary (to make > VS2015 itself stick) and we should not reach the branch point without the new > shim. I believe that wfh@ had some plans for it. > scottmg/wfg@ any idea? Yes, we still need to support limiting allocations, so we need to wrap (somehow) on 2015. (I haven't looked through this CL to know if you really care about that or not yet.)
Thanks for the review, really! New patchset + comments inline. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.gn File base/allocator/BUILD.gn (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.g... base/allocator/BUILD.gn:307: # Linux/CrOS only. http://crbug.com/550886 . On 2016/03/08 03:08:42, Nico wrote: > remove one of the two "only" But it was so symmetric :) Done. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.g... base/allocator/BUILD.gn:308: configs += [ "//base:base_implementation" ] # for BASE_EXPORT On 2016/03/08 03:08:41, Nico wrote: > give this > > visibility = [ ":base" ] > > too, to make it harder to misuse this. (else, e.g. //ui/gfx might accidentally > add an explicit dep on this target and then it'll export base symbols. gyp > doesn't have a protection against this type of mistake) TIL visibility in GN. Done. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim.cc (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:27: const allocator::AllocatorDispatch* volatile g_chain_head = Removed volatile and switched to AtomicWord after lot of thoughts. Here's the braindump. I originally wrote the reply below because I was convinced I still needed volatile. By the time I reached the end I changed my mind (in essence: this is my rationale. If you don't like I have another, better, one :P): On 2016/03/08 03:08:42, Nico wrote: > why volatile? Now, this is the real reason why I waited for a compiler-master-of-puppets on this cl :-) So my rationale for the volatile here is: - InsertAllocationDispatch is, by design, guaranteed to be called on the same thread, so I don't have to worry about races on insertion (good). - However, I want to guarantee that when you InsertAllocationDispatch from thread X, thread Y does always see the sequence of [update next pointer, update chain head] in this order, as that guarantees that Y either sees the old chain head, or the new chain head properly setup. Now I am convinced that from an OOO execution viewpoint, the MemoryBarrier in InsertAllocatorDispatch is enough (The dispatch functions are supposed to be static and indefinitely-lived anyways, so I don't think I need any happens-before semantic w.r.t. the Insert). However, I don't know if the compiler can do some unexpected optimization when accessing g_chain_head. Not even sure which optimizations we are talking about, but in my naive mind, volatile is a way to tell the compiler "mind your business with this var". > should this be an atomicptr? Hmm is your suggestion that this should be volatile AND atomicptr, or atomicptr should be enough. I speculated on the fact that on all architectures we care about, writing a pointer word is intrinstically atomic. But now that you let me think more about this, you are right, I should have used AtomicWord (where NoBarrier_Load/Store really end up just dereferencing a pointer anyways IIRC). My question on volatile still remains though. Having established this should be AtomicWord, should this be Atomic AND volatile, or am I over-worrying about the volatile part? Edit 1h later: just looked at the code and it seems that NoBarrier_Load/Store have volatile in the signature, so I believe that having this switched to AtomicPtr catches three birds with a stone: - I don't have to speculate anymore about r/w ptr-size being intrinsically atomic. - I think I don't need a volatile anymore at this point. - I have the unique opportunity to make a nerdy pun with birds and volatile :-) <img src="self_high_five.gif"> https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:37: if (thread_id == kInvalidThreadId) On 2016/03/08 03:08:42, Nico wrote: > This looks racy for the first few calls. Does it matter? Oh you are right. fail. I made a new version using a CAS, which if I am reasoning correctly, should guarantee that in case of a race one of the two threads will hit the DCHECK. P.S. If my reasoning is correct, we should probably benchmark and update thread_checker_impl.cc. I suspect a single CAS is more efficient than a lock acquire + release as in thread_checker_impl.cc, but right now have no data to back this hypothesis. In my case I really don't have alternatives as I can't afford a base::Lock here :) but maybe we just found a way to make debug builds slightly faster. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:66: return true; // Assume the new_handler will abort if it fails. On 2016/03/08 03:08:42, Nico wrote: > This should probably mention that we don't want new handlers that throw > bad_alloc (no exceptions etc) Oh right. (std::new_handler is such an utopistic API btw.. "The new-handler function may try to make more storage available" yeahhh) https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:103: // THe Shim* functions below are the entry-points into the shim-layer and On 2016/03/08 03:08:42, Nico wrote: > s/THe/The/ Done. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:125: } while (!ptr && CallNewHandler()); On 2016/03/08 03:08:42, Nico wrote: > Do we currently have new handlers that try to free up memory and then call > again? Is this loop really needed? both tcmalloc [1] and the win_shim [2] do this. I think the rationale is: in theory a new handler could return true to say "I freed up some memory, trust me, try again and you will get !=nullptr this time, pinky swear" Realistically the handler will just suicide. If that doesn't happen, the most reasonable thing we can do here, to comply both with the utopistic world (new handler will free memory, ha!) API and safety (never return null) is keep looping until some memory becomes available. From a more practical viewpoint, I like to stick the current behavior, regardless of the reason, I feel this CL is adventurous enough :) [1] https://chromium.googlesource.com/chromium/src/+/f7b03f4b26ee936818a896f244d5... [2] https://chromium.googlesource.com/chromium/src/+/f7b03f4b26ee936818a896f244d5... https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:147: ptr = chain_head->alloc_zero_initialized_function(n, size, chain_head); On 2016/03/08 03:08:42, Nico wrote: > Since the wrapper is called calloc and the c function is called calloc, maybe > call the member fn calloc too? (as-is is fine too i suppose; calloc isn't a > great name :-P) Yeah I think this is mostly driven by: - my long standing hate towards calloc (both the name and the silly two-arg signature). - a headache I had while developing this, when I typoed /calloc/alloc/ and had to blur my eyes to figure out the missing c. No strong opinion, can do this change it in the final pathcset once everything is clear. Up to your preference really. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.cc:186: return ShimMemalign(GetPageSize(), size); On 2016/03/08 03:08:42, Nico wrote: > If this doesn't call through to valloc, why have this as part of this interface? The deal is: we want to expose the full malloc/valloc family as we want to be sure that everything reaches into the same allocator at the end. We don't want to end up in a situation where malloc/free are from tcmalloc, and valloc from libc. However, while we want to *expose* valloc, there is no need to plumb it all the way through the allocator (read: require that from the final *dispatch* interface) as valloc can be really implemented on top of memalign. Essentially there is a mismatching between the number of functions we expose to clients (malloc/free/valloc/pvalloc) and the internal functions required from the dispatchers (the AllocatorDispatch interface). The mismatch is: the minimum set of functions (required from the dispatcher) that allows to implement the full malloc/valloc/pvalloc family consistently. > do we call valloc anywhere? Not sure we have it in the codebase, but we have to cover the case of chrome calling a third_party library (e.g. libcups) which calls it, in which case we want that to be still hooked properly. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim.h (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.h:44: using AllocFn = void*(size_t size, const AllocatorDispatch* self); On 2016/03/08 03:08:42, Nico wrote: > why make self the last arg? Is the question: Q1) why having |self| at all? Q2) why having |self| instead of |next|? Q3) why having |self| as last arg and not first? A1) Because we want to give the dispatch the ability to chain to the next element in the chain, snoop the result, and proxy it back. A2) To avoid the useless indirection in the case of the default chain. In that case |next| is always nullptr, and the default dispatchers don't need to look at |next| at all, so why bothering dereferencing that for them no reason? A3) No reason, can move it if you think it's more readable (or there is some other benefit I am missing). > mention that these functions must all be thread-safe. Done. > are there any performance concerns about the indirection? I will find out soon (I did run some trybots two weeks ago, but forgot and lost the results). Worst case the perf waterfall will tell. > with the default setup, it's the same number of indirections as previously, right? Correct. I don't expect any per hit (last famous words) as, as you say, we already have these layers of indirection in the current code. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.h:82: BASE_EXPORT void InsertAllocatorDispatch(AllocatorDispatch* dispatch); On 2016/03/08 03:08:42, Nico wrote: > removing a dispatch object is not possible? I removed it after a discussion that with chrisha: - none of us has a real use case for the removal. for both of us (future clients of this API) the decision to install is taken at startup time (either command-line or finch). - doing a sane removal (sane: of an arbitrary dispatch) on a singly linked list will make the code funky, and at that point the malloc() needs a lock, which I'd really really try to avoid (that would be a candidate for a perf hit). So, for the tracing case, I perfer adding an if(should_noop) to the dispatch that will be inserted by --enable-heap-profiling. Rationale: that will make the heap profiler code slighly slower (still, no locks) but will keep the default production code lock-free in the malloc fastpath. The lock-free alternative would be RemoveFirstAllocatorDispatch, which is a bit of a awkward and potentially dangerous (you have to know for sure the registration order) API. Thinking more, I need a RemoveForTesing one, to ensure that the unittests always clean up after themselves. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_glibc.cc (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_glibc.cc:17: } // extern "C" On 2016/03/08 03:08:42, Nico wrote: > do we ever want to call glibc instead of tcmalloc? That's the case for use_allocator=none. I am pretty confident there there is some bot/configuration who wants this out there (lesson learnt from the previous refactorings: all the build configs that you expect to not be used are covered by at least one bot you don't know about. All the configurations that you don't know about are covered by at least one bot that you don't expect... in the fyi waterfall... 1 week after landing :P) If I omit this, the base security unittests will fail without the shim when use_allocator=none. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim_unittest.cc (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim_unittest.cc:285: // paths of allocator_shim.cc and smoke-test its thread safey. On 2016/03/08 03:08:42, Nico wrote: > nice! ;-) https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim_unittest.cc:295: WaitableEvent event(true /* manual_reset */, false /* initially_signaled */); On 2016/03/08 03:08:42, Nico wrote: > nit: use > > /*manual_reset=*/true, /*initialliy_signaled=*/false > > With `false /* do thing */` it's not clear if "do thing" is set to false (i.e. > "don't do thing") or if false means "do thing". /*do thing=*/false makes this > clear. Ahhh. Suddenly all that code that previously looked inconsistent (I always saw a mixture of both in the codebase) makes sense. Done. https://codereview.chromium.org/1675143004/diff/500001/build/config/allocator... File build/config/allocator.gni (right): https://codereview.chromium.org/1675143004/diff/500001/build/config/allocator... build/config/allocator.gni:27: assert(!use_experimental_allocator_shim || is_linux || is_nacl, On 2016/03/08 03:08:42, Nico wrote: > why is this needed? Oh that didn't really work as expected, removing it. The full story is that nacl is special in ways I often don't understand. My plan for flipping the flag in the next CL, is to enable only for non-nacl as I don't see a compelling use case to have the shim on nacl (unless I am missing something). What I get when building with nacl and the shim is: ../../base/allocator/allocator_shim_override_cpp_symbols.h:20:25: error: function previously declared with an explicit exception specification redeclared with an implicit exception specification [-Werror,-Wimplicit-exception-spec-mismatch] SHIM_ALWAYS_EXPORT void operator delete(void* p) __THROW ^ /s/chrome-shim/src/native_client/toolchain/linux_x86/pnacl_newlib/le32-nacl/include/c++/v1/new:133:30: note: previous declaration is here _LIBCPP_NEW_DELETE_VIS void operator delete(void* __p) _NOEXCEPT; ^ So that thing was really a note-to-self to rule out nacl from the flag. But it seems that I get if I forget that error is enough of a reminder.
Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platform, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ========== to ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platforms, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op until use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ==========
A'ight, lgtm, let's see how this goes :-) https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim.h (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.h:44: using AllocFn = void*(size_t size, const AllocatorDispatch* self); On 2016/03/08 20:54:05, Primiano wrote: > On 2016/03/08 03:08:42, Nico wrote: > > why make self the last arg? > Is the question: > Q1) why having |self| at all? > Q2) why having |self| instead of |next|? > Q3) why having |self| as last arg and not first? > > A1) Because we want to give the dispatch the ability to chain to the next > element in the chain, snoop the result, and proxy it back. > A2) To avoid the useless indirection in the case of the default chain. In that > case |next| is always nullptr, and the default dispatchers don't need to look at > |next| at all, so why bothering dereferencing that for them no reason? > A3) No reason, can move it if you think it's more readable (or there is some > other benefit I am missing). Q3 – seems a bit more common (languages with explicit self have it first, and in c++ the implicit this is also the first arg). But up to you > > > > > mention that these functions must all be thread-safe. > Done. > > > > are there any performance concerns about the indirection? > I will find out soon (I did run some trybots two weeks ago, but forgot and lost > the results). Worst case the perf waterfall will tell. > > with the default setup, it's the same number of indirections as previously, > right? > Correct. I don't expect any per hit (last famous words) as, as you say, we > already have these layers of indirection in the current code.
https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... File base/allocator/allocator_shim.h (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat... base/allocator/allocator_shim.h:44: using AllocFn = void*(size_t size, const AllocatorDispatch* self); On 2016/03/09 05:34:25, Nico wrote: > On 2016/03/08 20:54:05, Primiano wrote: > > On 2016/03/08 03:08:42, Nico wrote: > > > why make self the last arg? > > Is the question: > > Q1) why having |self| at all? > > Q2) why having |self| instead of |next|? > > Q3) why having |self| as last arg and not first? > > > > A1) Because we want to give the dispatch the ability to chain to the next > > element in the chain, snoop the result, and proxy it back. > > A2) To avoid the useless indirection in the case of the default chain. In that > > case |next| is always nullptr, and the default dispatchers don't need to look > at > > |next| at all, so why bothering dereferencing that for them no reason? > > A3) No reason, can move it if you think it's more readable (or there is some > > other benefit I am missing). > > Q3 – seems a bit more common (languages with explicit self have it first, and in > c++ the implicit this is also the first arg). But up to you > > > > > > > > > > mention that these functions must all be thread-safe. > > Done. > > > > > > > are there any performance concerns about the indirection? > > I will find out soon (I did run some trybots two weeks ago, but forgot and > lost > > the results). Worst case the perf waterfall will tell. > > > with the default setup, it's the same number of indirections as previously, > > right? > > Correct. I don't expect any per hit (last famous words) as, as you say, we > > already have these layers of indirection in the current code. > Alright, point taken. Moved as 1st argument.
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1675143004/540001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1675143004/540001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/03/09 12:42:11, commit-bot: I haz the power wrote: > Dry run: This issue passed the CQ dry run. I did some telemetry and micro benchmarks here. telemetry benchmarks are pretty inconclusive due to the noise (mild shock). However I have some conclusive data from microbenchmarks. TL;DR This CL doesn't seem to show any perf-impact if crbug.com/593344 is fixed (base::NoBarrier_Load not being realy "NoBarrier" in clang+sysroot). (More details below) What should I do here? Option 1) Land this as-is. Hope that the NoBarrier perf bug is visible only in the microbenchmarks and gets lost in the noise in the waterfall. Once that bug will be fixed, all will be good. Option 2) Make a small tweak to GetChainHead() (which is the thing affected by NoBarrier_Load), as follows: ----- // TODO(primiano): this should be a NoBarrier_Load once crbug.com/593344 is fixed. inline const allocator::AllocatorDispatch* GetChainHead() { return reinterpret_cast<const allocator::AllocatorDispatch*>( *static_cast<const volatile subtle::AtomicWord*>(&g_chain_head)); } ---- Option 3) Do option1 and land the workaround of Option2 only if the perf waterfall shouts. Very tempted to go for option 3 Details about the benchmark --------------------------- I have a microbenchmark that spawns 8 threads, does malloc()+realloc() in a tight loop and measures the time (Taken from http://www.citi.umich.edu/projects/linux-scalability/reports/malloc.html) The microbenchmark shows no measurable diff with and without this patch if GetChainHead() is really unbarriered (i.e. if I do my Option 2 hack). With crbug.com/593344, instead, the microbenchmark becomes 2.2x slower (I'm not entirely surprised, that's really what you buy from an x86 arch, an OOO execution engine on steroids and 20years of conflictual evolution of its memory model. If you start stalling its pipeline you might as well just get a RISC and add some resistors to match the power budget)
I'd do 1 and reconsider if there are problems.
On 2016/03/09 18:38:07, Nico wrote: > I'd do 1 and reconsider if there are problems. Which is Option 3 :) SGTM. Landing in 3.. 2.. 1..
The CQ bit was checked by primiano@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thakis@chromium.org Link to the patchset: https://codereview.chromium.org/1675143004/#ps540001 (title: "Make self the 1st arg")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1675143004/540001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1675143004/540001
Message was sent while issue was closed.
Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platforms, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op until use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ========== to ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platforms, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op until use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ==========
Message was sent while issue was closed.
Committed patchset #9 (id:540001)
Message was sent while issue was closed.
Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platforms, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op until use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* ========== to ========== Allocator shim skeleton + Linux impl behind a build flag TL;DR ----- This CL introduces the skeleton for the unified allocator shim and an actual working implementation for Linux. The Linux implementation is really just taking the headers that today define the malloc symbols in tcmalloc and copying them to base. All the changes introduced are conditioned behind the build-time flag use_experimental_allocator_shim, which is disabled by default. Background Context ------------------ There are two reasons why we want to intercept allocations in Chrome: 1) To enforce some security properties (suicide on malloc failure via std::new_handler, preventing allocations > 2GB which can trigger signed vs. unsigned bugs in third_party libraries). 2) For diagnostic purposes (heap profiling). Today (before this CL) allocation calls are already intercepted in most Chrome platforms, but that happens in a disorganized and inconsistent fashion. More in details: On Linux: TCMalloc redefines the malloc()/new() symbols and we added code to the internal fork of tcmalloc to achieve 1). On Mac: we inject our hooks in the default malloc zone in EnableTerminationOnOutOfMemory() (memory_mac.mm) On Windows: we strip the malloc symbols out of libcmt and re-define them in allocator_shim_win.cc On Android: we don't have 1) The purpose of this work is to refactor and uniform this. The goal is to have all the OS call into a single module (herein allocator_shim.h) which performs 1) in base (as opposite to forking allocator-specific code) and which offers a uniform interface for 2). Why is this good? ----------------- - Makes the allocator code more readable. - Allows to unfork code from tcmalloc. - Allows to design a more maintainable heap profiler, which shares the same code paths of the security enforcement layer. Notes about execution --------------------- Essentially on each platform we need to do three things: - Dismantle the code that defines the malloc()/new symbols. - Let the malloc()/new symbols route through the shim. - Have the shim ultimately invoke the actual allocator (tcmalloc, winheap, system allocator) as defined by the build config. This clearly cannot happen atomically in one CL. The plan, therefore, is to make the above happen behind a build-time flag (use_experimental_allocator_shim), so the shim can be easily tested / rolled-back in case of failures, and ultimately drop the build-time flag (and remove the dead cde) once everything works. This also will make dealing with reverts very manageable. Therefore this CL (and likely all the future ones) is meant to be a no-op until use_experimental_allocator_shim==true. Design doc: bit.ly/allocator-shim BUG=550886 TEST=build with use_experimental_allocator_shim=true, base_unittests --gtest_filter=AllocatorShimTest.* Committed: https://crrev.com/4e68ed2d51f897d12a570e16f7adbcb7595fa031 Cr-Commit-Position: refs/heads/master@{#380196} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/4e68ed2d51f897d12a570e16f7adbcb7595fa031 Cr-Commit-Position: refs/heads/master@{#380196} |