gpu, cmaa: optimize COMBINE_EDGES path to reduce fragment shader invocations

The CMAA fragment shader is heavy, but CMAA as a whole is not expensive,
because CMAA runs the fragment shader only on edge fragments by relying on
early Z rejection. Edge fragments cover only a fraction of the screen.
However, the COMBINE_EDGES path runs the fragment shader on every screen
fragment. This is redundant: the edges combined in the COMBINE_EDGES path are
always a subset of the edges that DETECT_EDGES1 finds. So COMBINE_EDGES only
needs to run inside the area where DETECT_EDGES1 writes depth value 1 to the
depth buffer.
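A toy model of the depth-mask gating described above, as a hedged sketch
rather than the actual GLSL/GL code: the frame is a flat list of pixels,
DETECT_EDGES1 shades all of them and writes depth 1.0 on edges, and a later
pass with the depth test enabled (modelled as "shade only where depth == 1.0")
skips every non-edge fragment. The names run_pass and cmaa_invocations and the
5% edge ratio in the usage example are hypothetical, chosen for illustration.

```python
from typing import Dict, List


def run_pass(depth: List[float], use_depth_test: bool) -> int:
    """Return the number of fragment shader invocations for one full-screen pass.

    Without the depth test every fragment is shaded. With it (modelled here as
    'pass only where depth == 1.0'), early Z rejection drops non-edge fragments
    before the shader runs.
    """
    if not use_depth_test:
        return len(depth)
    return sum(1 for d in depth if d == 1.0)


def cmaa_invocations(is_edge: List[bool], combine_masked: bool) -> Dict[str, int]:
    """Count shader invocations per CMAA pass in this simplified model."""
    # DETECT_EDGES1 shades the whole screen and marks edges with depth 1.0.
    depth = [1.0 if e else 0.0 for e in is_edge]
    return {
        "DETECT_EDGES1": len(is_edge),
        "DETECT_EDGES2": run_pass(depth, use_depth_test=True),
        # Before this CL COMBINE_EDGES ignored the mask; after, it uses it.
        "COMBINE_EDGES": run_pass(depth, use_depth_test=combine_masked),
        "BLUR_EDGES": run_pass(depth, use_depth_test=True),
    }


if __name__ == "__main__":
    # Hypothetical 100-pixel frame where 5 pixels are edges.
    edges = [i % 20 == 0 for i in range(100)]
    print("before:", cmaa_invocations(edges, combine_masked=False))
    print("after: ", cmaa_invocations(edges, combine_masked=True))
```

In this model COMBINE_EDGES drops from 100 invocations to 5, and the whole
pipeline from 210 to 115; real savings depend on how much of the frame is edge.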

For reference, the GPU cost of each CMAA pass is:
* DETECT_EDGES1 : cheap shader on the whole screen.
* DETECT_EDGES2 : cheap shader on edges only.
* COMBINE_EDGES : cheap shader on edges only. <- fixed in this CL
* BLUR_EDGES : heavy shader on edges only.

Performance data:
FPS measured for NoAA, MSAA, CMAA-before and CMAA-after on
http://akirodic.com/p/jellyfish/ with 50 jellyfish.
Test machine: Intel Haswell, Intel(R) Core(TM) i7-4900MQ CPU @ 2.80GHz
FPS is measured with --show-fps-counter --enable-logging=stderr --vmodule="head*=1"
NoAA        25.2 FPS
MSAA        10.6 FPS
CMAA-before 19.9 FPS
CMAA-after  21.3 FPS
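FPS differences are easier to compare as frame times. The sketch below converts
the measured FPS to milliseconds per frame and subtracts the NoAA baseline. It
assumes the anti-aliasing pass is the only difference between runs and ignores
FPS-counter noise, so the result is a rough estimate; frame_ms and overhead_ms
are hypothetical helper names.

```python
from typing import Dict

# FPS values copied from the table above.
FPS: Dict[str, float] = {
    "NoAA": 25.2,
    "MSAA": 10.6,
    "CMAA-before": 19.9,
    "CMAA-after": 21.3,
}


def frame_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def overhead_ms(fps_table: Dict[str, float], baseline: str = "NoAA") -> Dict[str, float]:
    """Estimate extra ms/frame of each mode relative to the baseline mode."""
    base = frame_ms(fps_table[baseline])
    return {name: frame_ms(v) - base for name, v in fps_table.items() if name != baseline}


if __name__ == "__main__":
    for name, extra in overhead_ms(FPS).items():
        print(f"{name:12s} {frame_ms(FPS[name]):6.2f} ms/frame, +{extra:5.2f} ms vs NoAA")
```

Under that assumption, the estimated CMAA cost drops from about 10.6 ms to
about 7.3 ms per frame.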

BUG=535198
TEST=Run a WebGL app on Chromebook Pixel 2015
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2125803002
Cr-Commit-Position: refs/heads/master@{#404132}