384853b1969a9dcb0ada5131b9f703f9ed30c1c2 - chromium/src

commit	384853b1969a9dcb0ada5131b9f703f9ed30c1c2
author	brettw@chromium.org <brettw@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>	Sun Jan 30 17:58:21 2011
committer	brettw@chromium.org <brettw@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>	Sun Jan 30 17:58:21 2011
tree	4354b70414986911cdec1c984d6a95e689f581f6
parent	5261606647f05b40bf43b23a292c257c49241b87

Integration of most changes from the GoogleTV project around the convolver/scaler.
This contains the following improvements:
- Added a few extra convolution filters on top of the existing LANCZOS3 (used
internally in Chrome) and BOX (used in unit tests):
  - LANCZOS2: a variation of LANCZOS3, except that the windowed function is
    limited to the [-2:2] range.
  - HAMMING1: uses a Hamming window over the [-1:1] range.
If we define z as the zoom-down factor and w as the half-width of the window
(3 for LANCZOS3, 2 for LANCZOS2, 1 for HAMMING1), the CPU cost of each filter
is proportional to (w * 2 * z + 1).
So, when zooming down by a factor of 4 (as is common when creating
thumbnails), the cost would be 25 for LANCZOS3, 17 for LANCZOS2, and 9 for
HAMMING1.
As a result, HAMMING1 can end up being roughly three times as fast as the
typical LANCZOS3 (25 / 9 is about 2.8).
In terms of visual quality, HAMMING1 will obviously be worse than filters that
have a larger window.
The motivation for this change is that not all processors are equally capable:
while LANCZOS3 does provide good quality, it is far too slow on slower
processors (such as those found on Google TV), where it is worth trading some
visual quality for speed.
Because what is acceptable will differ from one platform to another, this
change adds generic enums describing various trade-offs between quality and
speed. Depending on the platform, these are then mapped to different filters.
This change does not include the other changes made to all the call sites to
convert LANCZOS3 to the appropriate enum. Another CL will have to be checked in
for the policy definition.
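
The filter definitions and the cost estimate above can be sketched in C++ as
follows. This is an illustrative sketch, not the code in this change: the
names Sinc, Lanczos, Hamming and FilterCost are hypothetical, and HAMMING1 is
assumed to be a sinc kernel multiplied by a Hamming window recentered on 0
(the conventional construction).

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Normalized sinc: sin(pi * x) / (pi * x), with sinc(0) == 1.
double Sinc(double x) {
  if (std::fabs(x) < 1e-9) return 1.0;
  const double xpi = x * kPi;
  return std::sin(xpi) / xpi;
}

// Lanczos kernel of half-width a: non-zero only on (-a, a).
// a == 3 corresponds to LANCZOS3, a == 2 to LANCZOS2.
double Lanczos(int a, double x) {
  if (x <= -a || x >= a) return 0.0;  // Outside the window.
  return Sinc(x) * Sinc(x / a);
}

// Assumption: sinc times a Hamming window centered on x == 0, non-zero only
// on (-w, w). The usual 0.54 - 0.46 * cos(...) window, shifted so that its
// peak is at 0, becomes 0.54 + 0.46 * cos(pi * x / w). w == 1 is HAMMING1.
double Hamming(int w, double x) {
  if (x <= -w || x >= w) return 0.0;  // Outside the window.
  return Sinc(x) * (0.54 + 0.46 * std::cos(kPi * x / w));
}

// Source taps per destination pixel when zooming down by an integer factor z
// with a window of half-width w: the window spans 2 * w destination pixels,
// i.e. 2 * w * z source pixels, plus one for the end point.
int FilterCost(int w, int z) {
  assert(w > 0 && z > 0);
  return w * 2 * z + 1;
}
```

With these definitions, FilterCost(3, 4), FilterCost(2, 4) and FilterCost(1, 4)
return 25, 17 and 9, matching the LANCZOS3, LANCZOS2 and HAMMING1 figures above.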

- Improved speed by around 10%. The actual speed-up depends on the parameters
of the scale (scale ratios, image sizes) as well as on the processor it runs
on. The 10% was measured when scaling 1920x1080 images down to 1920/4x1080/4
with the LANCZOS3 filter on a 32-bit Atom-based machine, using
image_operations_bench. Numbers for a 64-bit processor are discussed below.
This optimization essentially eliminates all zero coefficients on each side of
the filter (of length filter_size), since the calculated window is very likely
to extend a fraction of a pixel beyond the range where the function is
actually non-zero. In many cases, this shortens the convolution by one point,
so, using the math above, (w * 2 * z + 1) has 1 subtracted from it. The code
is generic, though, and will remove more points when possible.
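
A minimal sketch of the zero trimming, assuming a hypothetical Filter struct
that stores a starting source offset plus a weight array, and treating only
exact 0.0f weights as zero (an implementation working in fixed point might
instead trim weights that quantize to zero):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical filter representation: weights[k] applies to source pixel
// offset + k.
struct Filter {
  int offset;
  std::vector<float> weights;
};

// Drops zero weights from both ends and moves the offset forward by the
// number of leading zeros, so the convolution never visits those pixels.
Filter TrimZeros(const std::vector<float>& weights, int offset) {
  std::size_t first = 0;
  while (first < weights.size() && weights[first] == 0.0f) ++first;
  std::size_t last = weights.size();
  while (last > first && weights[last - 1] == 0.0f) --last;

  Filter f;
  f.offset = offset + static_cast<int>(first);
  f.weights.assign(weights.begin() + static_cast<std::ptrdiff_t>(first),
                   weights.begin() + static_cast<std::ptrdiff_t>(last));
  // Postcondition: a non-empty result starts and ends with non-zero weights.
  assert(f.weights.empty() ||
         (f.weights.front() != 0.0f && f.weights.back() != 0.0f));
  return f;
}
```

For example, trimming {0, 0, 0.25, 0.5, 0.25, 0} at offset 10 leaves three
weights starting at source pixel 12, saving three multiply-adds per output
pixel.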

- Added a small utility, image_operations_bench, to measure speed. Its purpose
is simply to measure the speed of the convolution, without regard to the
actual data. Run it with --help for a list of options.
The measured number is in MB/s: (source MB + dest MB) / time.
The following numbers (in MB/s) were measured on a 64-bit Release build on a
z600:
          | zero optimization |
Filter    |   no    |   yes   |
Hamming1  |   459   |   495   |
Lanczos2  |   276   |   294   |
Lanczos3  |   202   |   207   |
The command line was:
for i in HAMMING1 LANCZOS2 LANCZOS3 ; do echo $i; out/Release/image_operations_bench -source 1920x1080 -destination 480x270 -m $i -iter 50 ; done
The improvements from the zero optimization mentioned above are much more
pronounced on a 32-bit Atom.
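
The throughput figure can be reproduced with a helper like the one below. It
is a sketch under stated assumptions: 4 bytes per pixel and
1 MB = 1024 * 1024 bytes, which image_operations_bench may not use;
ThroughputMBps is a hypothetical name.

```cpp
#include <cassert>

// (source MB + destination MB) / time, accumulated over all iterations.
// Assumptions: 4 bytes per pixel, 1 MB == 1024 * 1024 bytes.
double ThroughputMBps(int src_w, int src_h, int dst_w, int dst_h,
                      int iterations, double seconds) {
  assert(iterations > 0 && seconds > 0.0);
  const double kBytesPerPixel = 4.0;
  const double kBytesPerMB = 1024.0 * 1024.0;
  const double src_mb = src_w * src_h * kBytesPerPixel / kBytesPerMB;
  const double dst_mb = dst_w * dst_h * kBytesPerPixel / kBytesPerMB;
  return (src_mb + dst_mb) * iterations / seconds;
}
```

For the 1920x1080 to 480x270 case, each iteration touches about 8.4 MB, so 50
iterations completing in one second would report about 420 MB/s under these
assumptions.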

- Added comments noting a half-pixel error in the code in image_operations.
Fixing it would change the results of many scales used in the win_layout
tests, and would therefore break them. As a result, this change only adds
comments describing what needs to be changed, but does not fix the issue
itself. A subsequent change will remove the comments, enable the fix, and add
the corrected reference images used by the tests.
See bug 69999: http://code.google.com/p/chromium/issues/detail?id=69999
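
The comments do not spell out the exact fix, so the following only
illustrates what a half-pixel error typically looks like when mapping
destination pixels back to the source. The function names are hypothetical,
and the convention that pixel i covers [i, i + 1) with its center at i + 0.5
is an assumption:

```cpp
#include <cassert>

// scale = source size / destination size (> 1 when shrinking).
// Naive mapping: treats the corner of destination pixel dst_x as its center.
double SourceCenterNaive(int dst_x, double scale) {
  assert(scale > 0.0);
  return dst_x * scale;
}

// Center-to-center mapping: the center of destination pixel dst_x,
// expressed in source pixel-center coordinates.
double SourceCenterCorrected(int dst_x, double scale) {
  assert(scale > 0.0);
  return (dst_x + 0.5) * scale - 0.5;
}
```

When shrinking 8 pixels to 2 (scale 4), the corrected mapping puts the two
destination centers at 1.5 and 5.5, symmetric around the source center 3.5,
while the naive mapping gives 0 and 4 and shifts the whole image by 1.5 source
pixels.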

- Enhanced the convolver to support arbitrary strides instead of the
hard-coded 4 * width. That value is correct on most platforms, but not on
GoogleTV, where allocated buffers need to be multiples of 32 pixels to exploit
HW capabilities.
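
Supporting an arbitrary stride amounts to addressing rows as
base + row * stride rather than base + row * 4 * width. A sketch, where
RowPointer and PaddedStride are hypothetical helpers and 4 bytes per pixel is
assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Start of a row, given the number of bytes between consecutive rows.
const std::uint8_t* RowPointer(const std::uint8_t* base, int row,
                               std::size_t stride) {
  assert(row >= 0);
  return base + static_cast<std::size_t>(row) * stride;
}

// Stride in bytes of a 4-byte-per-pixel buffer whose rows are padded up to a
// multiple of pixel_multiple pixels (e.g. 32).
std::size_t PaddedStride(int width, int pixel_multiple) {
  assert(width >= 0 && pixel_multiple > 0);
  const int padded =
      (width + pixel_multiple - 1) / pixel_multiple * pixel_multiple;
  return static_cast<std::size_t>(padded) * 4;
}
```

With 32-pixel padding, a 481-pixel-wide row occupies 2048 bytes rather than
4 * 481 = 1924, which is why a hard-coded 4 * width breaks on such buffers.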

- Added numerous unit tests covering the new filters, as well as other tests
that are more rigorous than the existing ones. One such test is how we found
the half-pixel error mentioned above.

TEST=This was tested against the existing unit tests and the added unit tests
on a 64-bit Linux platform. The tests were then run under valgrind to check
for possible memory leaks and errors. The tests come out clean (except for the
preexisting file descriptor 'leaks' coming from other tests that are linked
with test_shell_tests).

Credit for most of the actual changes goes to various contributors on the
Google TV team.

Note that two further optimizations are possible beyond these changes, but
are not done here:
1/ Use the fact that the filter coefficients are periodic, not so much to
reduce the cost of calculating the coefficients (which is typically in the
noise), but to decrease cache misses on the coefficients when the convolution
is done. Experiments showed that on an Atom, this can yield a 5% improvement.
2/ This code is a prime target for the use of SIMD instructions.
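
The first optimization can be illustrated on a single row. For an integer
zoom-down factor z, every destination pixel uses the same weights at the same
offsets relative to i * z, so one small table can be built once and reused,
which keeps it hot in cache. The sketch below is hypothetical: it uses a tent
kernel as a stand-in for the real filters, clamps at the image edges, and maps
centers with (i + 0.5) * z - 0.5.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// 1D downscale by an integer factor z with a tent kernel of half-width z
// source pixels. The weight table is computed once and shared by all outputs.
std::vector<float> DownscaleRowPeriodic(const std::vector<float>& src, int z) {
  assert(z > 0 && !src.empty() && src.size() % static_cast<std::size_t>(z) == 0);
  const double support = z;
  const double center0 = 0.5 * z - 0.5;  // Center of destination pixel 0.

  // (source offset relative to i * z, weight), built once.
  std::vector<std::pair<int, float>> taps;
  double sum = 0.0;
  for (int p = static_cast<int>(std::floor(center0 - support));
       p <= static_cast<int>(std::ceil(center0 + support)); ++p) {
    const double w = 1.0 - std::fabs(p - center0) / support;
    if (w > 0.0) {
      taps.emplace_back(p, static_cast<float>(w));
      sum += w;
    }
  }
  for (auto& tap : taps) tap.second = static_cast<float>(tap.second / sum);

  const int src_size = static_cast<int>(src.size());
  std::vector<float> dst(src.size() / static_cast<std::size_t>(z));
  for (std::size_t i = 0; i < dst.size(); ++i) {
    float acc = 0.0f;
    for (const auto& tap : taps) {
      // Shift the shared table by i * z and clamp at the edges.
      const int idx = tap.first + static_cast<int>(i) * z;
      const int s = std::min(std::max(idx, 0), src_size - 1);
      acc += tap.second * src[static_cast<std::size_t>(s)];
    }
    dst[i] = acc;
  }
  return dst;
}
```

For a non-integer factor p/q in lowest terms, the pattern instead repeats
every q destination pixels, so q tables would be needed.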

BUG=47447, 62820, 69999
Patch by evannier@google.com
Original review http://codereview.chromium.org/5575010/

git-svn-id: svn://svn.chromium.org/chrome/trunk/src@73110 0039d316-1c4b-4281-b951-d872f2087c98

8 files changed
