Issue 1675143004: Allocator shim skeleton + Linux impl behind a build flag

Primiano Tucci (use gerrit)

Description was changed from ========== NOT FOR REVIEW: WIP on unified shim BUG=550886 ========== to ...

4 years, 10 months ago (2016-02-11 14:55:13 UTC) #1

Primiano Tucci (use gerrit)

Description was changed from ========== [Work In Progress] Unified shim allocator shim (Linux Desktop) NOT ...

4 years, 10 months ago (2016-02-12 16:05:46 UTC) #2

Description was changed from

==========
[Work In Progress] Unified shim allocator shim (Linux Desktop)

NOT FOR REVIEW (but works)

BUG=550886
==========

to

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
the an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and moving them to base.
This CL does NOT introduce any actual change (yet) to production code.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platform, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

Primiano Tucci (use gerrit)

Patchset #1 (id:20001) has been deleted

4 years, 10 months ago (2016-02-16 18:08:39 UTC) #4

Primiano Tucci (use gerrit)

Patchset #1 (id:40001) has been deleted

4 years, 10 months ago (2016-02-16 18:08:43 UTC) #5

Primiano Tucci (use gerrit)

Patchset #1 (id:60001) has been deleted

4 years, 10 months ago (2016-02-16 18:08:48 UTC) #6

Primiano Tucci (use gerrit)

Patchset #1 (id:80001) has been deleted

4 years, 10 months ago (2016-02-16 18:08:54 UTC) #7

Primiano Tucci (use gerrit)

Patchset #1 (id:100001) has been deleted

4 years, 10 months ago (2016-02-16 18:09:00 UTC) #8

Primiano Tucci (use gerrit)

Patchset #1 (id:120001) has been deleted

4 years, 10 months ago (2016-02-16 18:09:06 UTC) #9

Primiano Tucci (use gerrit)

Patchset #1 (id:140001) has been deleted

4 years, 10 months ago (2016-02-16 18:09:11 UTC) #10

Primiano Tucci (use gerrit)

Patchset #1 (id:160001) has been deleted

4 years, 10 months ago (2016-02-16 18:10:34 UTC) #11

Primiano Tucci (use gerrit)

Patchset #1 (id:180001) has been deleted

4 years, 10 months ago (2016-02-16 18:10:42 UTC) #12

Primiano Tucci (use gerrit)

Patchset #1 (id:200001) has been deleted

4 years, 10 months ago (2016-02-16 18:10:48 UTC) #13

Primiano Tucci (use gerrit)

Patchset #1 (id:220001) has been deleted

4 years, 10 months ago (2016-02-16 18:10:52 UTC) #14

Primiano Tucci (use gerrit)

Patchset #1 (id:240001) has been deleted

4 years, 10 months ago (2016-02-16 18:10:57 UTC) #15

Primiano Tucci (use gerrit)

Patchset #1 (id:260001) has been deleted

4 years, 10 months ago (2016-02-16 18:11:04 UTC) #16

Primiano Tucci (use gerrit)

Patchset #1 (id:280001) has been deleted

4 years, 10 months ago (2016-02-16 18:11:10 UTC) #17

Primiano Tucci (use gerrit)

Patchset #1 (id:300001) has been deleted

4 years, 10 months ago (2016-02-16 18:11:18 UTC) #18

Primiano Tucci (use gerrit)

Patchset #1 (id:320001) has been deleted

4 years, 10 months ago (2016-02-16 18:11:24 UTC) #19

Primiano Tucci (use gerrit)

Patchset #1 (id:340001) has been deleted

4 years, 10 months ago (2016-02-16 18:11:31 UTC) #20

Primiano Tucci (use gerrit)

Patchset #1 (id:360001) has been deleted

4 years, 10 months ago (2016-02-16 18:11:38 UTC) #21

Primiano Tucci (use gerrit)

Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag ...

4 years, 10 months ago (2016-02-16 21:14:48 UTC) #22

Description was changed from

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
the an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and moving them to base.
This CL does NOT introduce any actual change (yet) to production code.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platform, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

to

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
the an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platform, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

Primiano Tucci (use gerrit)

primiano@chromium.org changed reviewers: + thakis@chromium.org, wfh@chromium.org

4 years, 10 months ago (2016-02-16 22:58:05 UTC) #23

Primiano Tucci (use gerrit)

Well this is going to be .... fun! Thanks in advance. https://codereview.chromium.org/1675143004/diff/420001/base/allocator/allocator_shim.cc File base/allocator/allocator_shim.cc (right): ...

4 years, 10 months ago (2016-02-16 22:58:06 UTC) #24

Primiano Tucci (use gerrit)

+chrisha as follow-up to the SS16 chat.

4 years, 9 months ago (2016-02-29 20:55:04 UTC) #25

Nico

Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag ...

4 years, 9 months ago (2016-03-08 02:28:59 UTC) #26

Description was changed from

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
the an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platform, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

to

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platform, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

Nico

Nice design doc and CL description! I'm a bit worried about a) the amount of ...

4 years, 9 months ago (2016-03-08 03:08:42 UTC) #27

Primiano Tucci (use gerrit)

Super thanks for the review. Will get to the actual inline comments later today, in ...

4 years, 9 months ago (2016-03-08 11:55:05 UTC) #28

Super thanks for the review.
Will get to the actual inline comments later today, in the meantime addressing
top-level comments.

> I'm a bit worried about a) the amount of machinery this adds
Hmm I think I know what you mean. Even though, most of this CL is really more
moving the machinery around, not adding.
I think the bad part of this CL is bringing to the light the dark things
happening today in the tcmalloc dungeons (but ... either there or here we have
them).
Honestly don't have any too many ideas on how to deal differently with this.
The only other option I see is to leave things as they are and keep following
today's pattern, which concretely means that things like the heap profiler will
spray yet another layer of #ifdef (TCMALLOC), #if OS(WINDOWS), #if
win_allocator_shim all over .cc files and build files. And having to maintain
decentralized logic makes me a bit nervous.
I'd honestly prefer the pattern: make lot of changes while trying to
(functionally) change nothing, and make the extra functionality change a +few
lines patch. It should be easier to spot things going wrong without adding any
feature.
Ditto for chrisha who wants to add a similar interception layer (For SyzyAsan I
suppose), which will make maintaining the (Existing) shim + heap profiler +
syzyasan ... fun.

> and b) that we might no longer be able to wrap malloc and friends in 2015
(might be worth
talking to scottmg what his plans here were) which might render this whole
scheme less useful. 

Let's pull +scottmg in the discussion.
My understanding from talking with wfh@ and crisha@ last week is that we are
supposed to still wrap malloc and friends in VS2015.
The difference is that prior to VS2015 we were achieving this by stripping the
libcmt. In VS2015 malloc & friends are weak and we are supposed to deal
differently, but still override, them (read: the stripping trick does not work
anymore).
My understanding is that the absence of the shim in VS2015 is temporary (to make
VS2015 itself stick) and we should not reach the branch point without the new
shim. I believe that wfh@ had some plans for it.
scottmg/wfg@ any idea?

scottmg

On 2016/03/08 11:55:05, Primiano wrote: > Super thanks for the review. > Will get to ...

4 years, 9 months ago (2016-03-08 17:02:28 UTC) #29

On 2016/03/08 11:55:05, Primiano wrote:
> Super thanks for the review.
> Will get to the actual inline comments later today, in the meantime addressing
> top-level comments.
> 
> > I'm a bit worried about a) the amount of machinery this adds
> Hmm I think I know what you mean. Even though, most of this CL is really more
> moving the machinery around, not adding.
> I think the bad part of this CL is bringing to the light the dark things
> happening today in the tcmalloc dungeons (but ... either there or here we have
> them).
> Honestly don't have any too many ideas on how to deal differently with this.
> The only other option I see is to leave things as they are and keep following
> today's pattern, which concretely means that things like the heap profiler
will
> spray yet another layer of #ifdef (TCMALLOC), #if OS(WINDOWS), #if
> win_allocator_shim all over .cc files and build files. And having to maintain
> decentralized logic makes me a bit nervous.
> I'd honestly prefer the pattern: make lot of changes while trying to
> (functionally) change nothing, and make the extra functionality change a +few
> lines patch. It should be easier to spot things going wrong without adding any
> feature.
> Ditto for chrisha who wants to add a similar interception layer (For SyzyAsan
I
> suppose), which will make maintaining the (Existing) shim + heap profiler +
> syzyasan ... fun.
> 
> > and b) that we might no longer be able to wrap malloc and friends in 2015
> (might be worth
> talking to scottmg what his plans here were) which might render this whole
> scheme less useful. 
> 
> Let's pull +scottmg in the discussion.
> My understanding from talking with wfh@ and crisha@ last week is that we are
> supposed to still wrap malloc and friends in VS2015.
> The difference is that prior to VS2015 we were achieving this by stripping the
> libcmt. In VS2015 malloc & friends are weak and we are supposed to deal
> differently, but still override, them (read: the stripping trick does not work
> anymore).
> My understanding is that the absence of the shim in VS2015 is temporary (to
make
> VS2015 itself stick) and we should not reach the branch point without the new
> shim. I believe that wfh@ had some plans for it.
> scottmg/wfg@ any idea?

Yes, we still need to support limiting allocations, so we need to wrap (somehow)
on 2015. (I haven't looked through this CL to know if you really care about that
or not yet.)

Primiano Tucci (use gerrit)

Thanks for the review, really! New patchset + comments inline. https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.gn File base/allocator/BUILD.gn (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.gn#newcode307 ...

4 years, 9 months ago (2016-03-08 20:54:05 UTC) #30

Thanks for the review, really!
New patchset + comments inline.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.gn
File base/allocator/BUILD.gn (right):

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.g...
base/allocator/BUILD.gn:307: # Linux/CrOS only. http://crbug.com/550886 .
On 2016/03/08 03:08:42, Nico wrote:
> remove one of the two "only"

But it was so symmetric :)
Done.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/BUILD.g...
base/allocator/BUILD.gn:308: configs += [ "//base:base_implementation" ]  # for
BASE_EXPORT
On 2016/03/08 03:08:41, Nico wrote:
> give this
> 
>     visibility = [ ":base" ]
> 
> too, to make it harder to misuse this. (else, e.g. //ui/gfx might accidentally
> add an explicit dep on this target and then it'll export base symbols. gyp
> doesn't have a protection against this type of mistake)

TIL visibility in GN. Done.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
File base/allocator/allocator_shim.cc (right):

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.cc:27: const allocator::AllocatorDispatch*
volatile g_chain_head =
Removed volatile and switched to AtomicWord after lot of thoughts.
Here's the braindump.
I originally wrote the reply below because I was convinced I still needed
volatile. By the time I reached the end I changed my mind (in essence: this is
my rationale. If you don't like I have another, better, one :P):

On 2016/03/08 03:08:42, Nico wrote:
> why volatile?

Now, this is the real reason why I waited for a compiler-master-of-puppets on
this cl :-)
So my rationale for the volatile here is:
 - InsertAllocationDispatch is, by design, guaranteed to be called on the same
thread, so I don't have to worry about races on insertion (good).
 - However, I want to guarantee that when you InsertAllocationDispatch from
thread X, thread Y does always see the sequence of [update next pointer, update
chain head] in this order, as that guarantees that Y either sees the old chain
head, or the new chain head properly setup.

Now I am convinced that from an OOO execution viewpoint, the MemoryBarrier in
InsertAllocatorDispatch is enough (The dispatch functions are supposed to be
static and indefinitely-lived anyways, so I don't think I need any
happens-before semantic w.r.t. the Insert).
However, I don't know if the compiler can do some unexpected optimization when
accessing g_chain_head.
Not even sure which optimizations we are talking about, but in my naive mind,
volatile is a way to tell the compiler "mind your business with this var".

> should this be an atomicptr?
Hmm is your suggestion that this should be volatile AND atomicptr, or atomicptr
should be enough.
I speculated on the fact that on all architectures we care about, writing a
pointer word is intrinstically atomic. But now that you let me think more about
this, you are right, I should have used AtomicWord (where NoBarrier_Load/Store
really end up just dereferencing a pointer anyways IIRC).

My question on volatile still remains though. Having established this should be
AtomicWord, should this be Atomic AND volatile, or am I over-worrying about the
volatile part?

Edit 1h later: just looked at the code and it seems that NoBarrier_Load/Store
have volatile in the signature, so I believe that having this switched to
AtomicPtr catches three birds with a stone:
 - I don't have to speculate anymore about r/w ptr-size being intrinsically
atomic.
 - I think I don't need a volatile anymore at this point.
 - I have the unique opportunity to make a nerdy pun with birds and volatile :-)
<img src="self_high_five.gif">

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.cc:37: if (thread_id == kInvalidThreadId)
On 2016/03/08 03:08:42, Nico wrote:
> This looks racy for the first few calls. Does it matter?

Oh you are right. fail.
I made a new version using a CAS, which if I am reasoning correctly, should
guarantee that in case of a race one of the two threads will hit the DCHECK.

P.S. If my reasoning is correct, we should probably benchmark and update
thread_checker_impl.cc. I suspect a single CAS is more efficient than a lock
acquire + release as in  thread_checker_impl.cc, but right now have no data to
back this hypothesis.
In my case I really don't have alternatives as I can't afford a base::Lock here
:)
but maybe we just found a way to make debug builds slightly faster.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.cc:66: return true;  // Assume the new_handler
will abort if it fails.
On 2016/03/08 03:08:42, Nico wrote:
> This should probably mention that we don't want new handlers that throw
> bad_alloc (no exceptions etc)
Oh right. (std::new_handler is such an utopistic API btw.. "The new-handler
function may try to make more storage available" yeahhh)

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.cc:103: // THe Shim* functions below are the
entry-points into the shim-layer and
On 2016/03/08 03:08:42, Nico wrote:
> s/THe/The/

Done.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.cc:125: } while (!ptr && CallNewHandler());
On 2016/03/08 03:08:42, Nico wrote:
> Do we currently have new handlers that try to free up memory and then call
> again? Is this loop really needed? 

both tcmalloc [1] and the win_shim [2] do this.
I think the rationale is: in theory a new handler could return true to say "I
freed up some memory, trust me, try again and you will get !=nullptr this time,
pinky swear"

Realistically the handler will just suicide. If that doesn't happen, the most
reasonable thing we can do here, to comply both with the utopistic world (new
handler will free memory, ha!) API and safety (never return null) is keep
looping until some memory becomes available.

From a more practical viewpoint, I like to stick the current behavior,
regardless of the reason, I feel this CL is adventurous enough :)

[1]
https://chromium.googlesource.com/chromium/src/+/f7b03f4b26ee936818a896f244d5...
[2]
https://chromium.googlesource.com/chromium/src/+/f7b03f4b26ee936818a896f244d5...

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.cc:147: ptr =
chain_head->alloc_zero_initialized_function(n, size, chain_head);
On 2016/03/08 03:08:42, Nico wrote:
> Since the wrapper is called calloc and the c function is called calloc, maybe
> call the member fn calloc too? (as-is is fine too i suppose; calloc isn't a
> great name :-P)

Yeah I think this is mostly driven by:
 - my long standing hate towards calloc (both the name and the silly two-arg
signature). 
 - a headache I had while developing this, when I typoed /calloc/alloc/ and had
to blur my eyes to figure out the missing c.

No strong opinion, can do this change it in the final pathcset once everything
is clear. Up to your preference really.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.cc:186: return ShimMemalign(GetPageSize(), size);
On 2016/03/08 03:08:42, Nico wrote:
> If this doesn't call through to valloc, why have this as part of this
interface?
The deal is: we want to expose the full malloc/valloc family as we want to be
sure that everything reaches into the same allocator at the end. We don't want
to end up in a situation where malloc/free are from tcmalloc, and valloc from
libc.
However, while we want to *expose* valloc, there is no need to plumb it all the
way through the allocator (read: require that from the final *dispatch*
interface) as valloc can be really implemented on top of memalign.

Essentially there is a mismatching between the number of functions we expose to
clients (malloc/free/valloc/pvalloc) and the internal functions required from
the dispatchers (the AllocatorDispatch interface).
The mismatch is: the minimum set of functions (required from the dispatcher)
that allows to implement the full malloc/valloc/pvalloc family consistently.

> do we call valloc anywhere?
Not sure we have it in the codebase, but we have to cover the case of chrome
calling a third_party library (e.g. libcups) which calls it, in which case we
want that to be still hooked properly.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
File base/allocator/allocator_shim.h (right):

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.h:44: using AllocFn = void*(size_t size, const
AllocatorDispatch* self);
On 2016/03/08 03:08:42, Nico wrote:
> why make self the last arg?
Is the question:
Q1) why having |self| at all?
Q2) why having |self| instead of |next|?
Q3) why having |self| as last arg and not first?

A1) Because we want to give the dispatch the ability to chain to the next
element in the chain, snoop the result, and proxy it back.
A2) To avoid the useless indirection in the case of the default chain. In that
case |next| is always nullptr, and the default dispatchers don't need to look at
|next| at all, so why bothering dereferencing that for them no reason?
A3) No reason, can move it if you think it's more readable (or there is some
other benefit I am missing).

> mention that these functions must all be thread-safe.
Done.

> are there any performance concerns about the indirection?
I will find out soon (I did run some trybots two weeks ago, but forgot and lost
the results). Worst case the perf waterfall will tell.
> with the default setup, it's the same number of indirections as previously,
right?
Correct. I don't expect any per hit (last famous words) as, as you say, we
already have these layers of indirection in the current code.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim.h:82: BASE_EXPORT void
InsertAllocatorDispatch(AllocatorDispatch* dispatch);
On 2016/03/08 03:08:42, Nico wrote:
> removing a dispatch object is not possible?

I removed it after a discussion that with chrisha:
- none of us has a real use case for the removal. for both of us (future clients
of this API) the decision to install is taken at startup time (either
command-line or finch).
- doing a sane removal (sane: of an arbitrary dispatch) on a singly linked list
will make the code funky, and at that point the malloc() needs a lock, which I'd
really really try to avoid (that would be a candidate for a perf hit).

So, for the tracing case, I perfer adding an if(should_noop) to the dispatch
that will be inserted by --enable-heap-profiling. Rationale: that will make the
heap profiler code slighly slower (still, no locks) but will keep the default
production code lock-free in the malloc fastpath.

The lock-free alternative would be RemoveFirstAllocatorDispatch, which is a bit
of a awkward and potentially dangerous (you have to know for sure the
registration order) API.

Thinking more, I need a RemoveForTesing one, to ensure that the unittests always
clean up after themselves.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
File base/allocator/allocator_shim_default_dispatch_to_glibc.cc (right):

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim_default_dispatch_to_glibc.cc:17: }  // extern "C"
On 2016/03/08 03:08:42, Nico wrote:
> do we ever want to call glibc instead of tcmalloc?

That's the case for use_allocator=none.
I am pretty confident there there is some bot/configuration who wants this out
there
(lesson learnt from the previous refactorings: all the build configs that you
expect to not be used are covered by at least one bot you don't know about. All
the configurations that you don't know about are covered by at least one bot
that you don't expect... in the fyi waterfall... 1 week after landing :P)

If I omit this, the base security unittests will fail without the shim when
use_allocator=none.

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
File base/allocator/allocator_shim_unittest.cc (right):

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim_unittest.cc:285: // paths of allocator_shim.cc and
smoke-test its thread safey.
On 2016/03/08 03:08:42, Nico wrote:
> nice!

;-)

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocat...
base/allocator/allocator_shim_unittest.cc:295: WaitableEvent event(true /*
manual_reset */, false /* initially_signaled */);
On 2016/03/08 03:08:42, Nico wrote:
> nit: use
> 
>   /*manual_reset=*/true, /*initialliy_signaled=*/false
> 
> With `false /* do thing */` it's not clear if "do thing" is set to false (i.e.
> "don't do thing") or if false means "do thing". /*do thing=*/false makes this
> clear.

Ahhh. Suddenly all that code that previously looked inconsistent (I always saw a
mixture of both in the codebase) makes sense. Done.

https://codereview.chromium.org/1675143004/diff/500001/build/config/allocator...
File build/config/allocator.gni (right):

https://codereview.chromium.org/1675143004/diff/500001/build/config/allocator...
build/config/allocator.gni:27: assert(!use_experimental_allocator_shim ||
is_linux || is_nacl,
On 2016/03/08 03:08:42, Nico wrote:
> why is this needed?

Oh that didn't really work as expected, removing it.

The full story is that nacl is special in ways I often don't understand.
My plan for flipping the flag in the next CL, is to enable only for non-nacl as
I don't see a compelling use case to have the shim on nacl (unless I am missing
something).

What I get when building with nacl and the shim is:
../../base/allocator/allocator_shim_override_cpp_symbols.h:20:25: error:
function previously declared with an explicit exception specification redeclared
with an implicit exception specification
[-Werror,-Wimplicit-exception-spec-mismatch]
SHIM_ALWAYS_EXPORT void operator delete(void* p) __THROW
                        ^
/s/chrome-shim/src/native_client/toolchain/linux_x86/pnacl_newlib/le32-nacl/include/c++/v1/new:133:30:
note: previous declaration is here
_LIBCPP_NEW_DELETE_VIS void  operator delete(void* __p) _NOEXCEPT;
                             ^

So that thing was really a note-to-self to rule out nacl from the flag. But it
seems that I get if I forget that error is enough of a reminder.

Primiano Tucci (use gerrit)

Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag ...

4 years, 9 months ago (2016-03-08 20:56:25 UTC) #31

Description was changed from

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platform, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

to

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platforms, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op until use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

Nico

A'ight, lgtm, let's see how this goes :-) https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocator_shim.h File base/allocator/allocator_shim.h (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocator_shim.h#newcode44 base/allocator/allocator_shim.h:44: using ...

4 years, 9 months ago (2016-03-09 05:34:25 UTC) #32

Primiano Tucci (use gerrit)

https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocator_shim.h File base/allocator/allocator_shim.h (right): https://codereview.chromium.org/1675143004/diff/500001/base/allocator/allocator_shim.h#newcode44 base/allocator/allocator_shim.h:44: using AllocFn = void*(size_t size, const AllocatorDispatch* self); On ...

4 years, 9 months ago (2016-03-09 10:51:59 UTC) #33

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

4 years, 9 months ago (2016-03-09 10:52:25 UTC) #34

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1675143004/540001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1675143004/540001

4 years, 9 months ago (2016-03-09 10:52:33 UTC) #35

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 9 months ago (2016-03-09 12:42:10 UTC) #36

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 9 months ago (2016-03-09 12:42:11 UTC) #37

Primiano Tucci (use gerrit)

On 2016/03/09 12:42:11, commit-bot: I haz the power wrote: > Dry run: This issue passed ...

4 years, 9 months ago (2016-03-09 16:21:14 UTC) #38

On 2016/03/09 12:42:11, commit-bot: I haz the power wrote:
> Dry run: This issue passed the CQ dry run.

 I did some telemetry and micro benchmarks here.
telemetry benchmarks are pretty inconclusive due to the noise (mild shock).
However I have some conclusive data from microbenchmarks.

TL;DR
This CL doesn't seem to show any perf-impact if crbug.com/593344 is fixed
(base::NoBarrier_Load not being realy "NoBarrier" in clang+sysroot).
(More details below)

What should I do here?
Option 1) Land this as-is. Hope that the NoBarrier perf bug is visible only in
the microbenchmarks and gets lost in the noise in the waterfall. Once that bug
will be fixed, all will be good. 
Option 2) Make a small tweak to GetChainHead() (which is the thing affected by
NoBarrier_Load), as follows:
-----
// TODO(primiano): this should be a NoBarrier_Load once crbug.com/593344 is
fixed.
inline const allocator::AllocatorDispatch* GetChainHead() {
  return reinterpret_cast<const allocator::AllocatorDispatch*>(
      *static_cast<const volatile subtle::AtomicWord*>(&g_chain_head));
}
----

Option 3) Do option1 and land the workaround of Option2 only if the perf
waterfall shouts.
Very tempted to go for option 3

Details about the benchmark
---------------------------
I have a microbenchmark that spawns 8 threads, does malloc()+realloc() in a
tight loop and measures the time (Taken from
http://www.citi.umich.edu/projects/linux-scalability/reports/malloc.html)
The microbenchmark shows no measurable diff with and without this patch if
GetChainHead() is really unbarriered (i.e. if I do my Option 2 hack).
With crbug.com/593344, instead, the microbenchmark becomes 2.2x slower (I'm not
entirely surprised, that's really what you buy from an x86 arch, an OOO
execution engine on steroids and 20years of conflictual evolution of its memory
model. If you start stalling its pipeline you might as well just get a RISC and
add some resistors to match the power budget)

Primiano Tucci (use gerrit)

On 2016/03/09 18:38:07, Nico wrote: > I'd do 1 and reconsider if there are problems. ...

4 years, 9 months ago (2016-03-09 18:41:21 UTC) #40

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org

4 years, 9 months ago (2016-03-09 20:07:44 UTC) #41

Primiano Tucci (use gerrit)

The patchset sent to the CQ was uploaded after l-g-t-m from thakis@chromium.org Link to the ...

4 years, 9 months ago (2016-03-09 20:07:44 UTC) #42

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1675143004/540001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1675143004/540001

4 years, 9 months ago (2016-03-09 20:08:17 UTC) #43

commit-bot: I haz the power

Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag ...

4 years, 9 months ago (2016-03-09 20:13:55 UTC) #44

Message was sent while issue was closed.

Description was changed from

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platforms, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op until use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

to

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platforms, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op until use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

commit-bot: I haz the power

Description was changed from ========== Allocator shim skeleton + Linux impl behind a build flag ...

4 years, 9 months ago (2016-03-09 20:15:54 UTC) #46

Message was sent while issue was closed.

Description was changed from

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platforms, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op until use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*
==========

to

==========
Allocator shim skeleton + Linux impl behind a build flag

TL;DR
-----
This CL introduces the skeleton for the unified allocator shim and
an actual working implementation for Linux.
The Linux implementation is really just taking the headers that today
define the malloc symbols in tcmalloc and copying them to base.
All the changes introduced are conditioned behind the build-time flag
use_experimental_allocator_shim, which is disabled by default.

Background Context
------------------
There are two reasons why we want to intercept allocations in Chrome:
1) To enforce some security properties (suicide on malloc failure via
   std::new_handler, preventing allocations > 2GB which can trigger
   signed vs. unsigned bugs in third_party libraries).
2) For diagnostic purposes (heap profiling).

Today (before this CL) allocation calls are already intercepted in
most Chrome platforms, but that happens in a disorganized and
inconsistent fashion. More in details:
On Linux: TCMalloc redefines the malloc()/new() symbols and we added
  code to the internal fork of tcmalloc to achieve 1).
On Mac: we inject our hooks in the default malloc zone in
  EnableTerminationOnOutOfMemory() (memory_mac.mm)
On Windows: we strip the malloc symbols out of libcmt and re-define
  them in allocator_shim_win.cc
On Android: we don't have 1)

The purpose of this work is to refactor and uniform this.
The goal is to have all the OS call into a single module (herein
allocator_shim.h) which performs 1) in base (as opposite to forking
allocator-specific code) and which offers a uniform interface for 2).

Why is this good?
-----------------
- Makes the allocator code more readable.
- Allows to unfork code from tcmalloc.
- Allows to design a more maintainable heap profiler, which shares
  the same code paths of the security enforcement layer.

Notes about execution
---------------------
Essentially on each platform we need to do three things:
 - Dismantle the code that defines the malloc()/new symbols.
 - Let the malloc()/new symbols route through the shim.
 - Have the shim ultimately invoke the actual allocator (tcmalloc,
   winheap, system allocator) as defined by the build config.
This clearly cannot happen atomically in one CL.
The plan, therefore, is to make the above happen behind a build-time
flag (use_experimental_allocator_shim), so the shim can be easily
tested / rolled-back in case of failures, and ultimately drop the
build-time flag (and remove the dead cde) once everything works.
This also will make dealing with reverts very manageable.

Therefore this CL (and likely all the future ones) is meant to be
a no-op until use_experimental_allocator_shim==true.

Design doc: bit.ly/allocator-shim

BUG=550886
TEST=build with use_experimental_allocator_shim=true, base_unittests
--gtest_filter=AllocatorShimTest.*

Committed: https://crrev.com/4e68ed2d51f897d12a570e16f7adbcb7595fa031
Cr-Commit-Position: refs/heads/master@{#380196}
==========

commit-bot: I haz the power

4 years, 9 months ago (2016-03-09 20:15:55 UTC) #47

Message was sent while issue was closed.

Patchset 9 (id:??) landed as
https://crrev.com/4e68ed2d51f897d12a570e16f7adbcb7595fa031
Cr-Commit-Position: refs/heads/master@{#380196}

Issue 1675143004: Allocator shim skeleton + Linux impl behind a build flag (Closed)

Description

Patch Set 1 : rebase #

Patch Set 2 : Remove use of c++ classes + test #

Patch Set 3 : cleanup #

Patch Set 4 : Rebase + some thoughts overnight #

Patch Set 5 : rebase #

Patch Set 6 : rebase + remove removal #

Patch Set 7 : rebase #

Patch Set 8 : Rebase + address thakis review #

Patch Set 9 : Make self the 1st arg #

Messages