Stop ignoring whitespace in the middle of the MIME type in a Content-Type header

extractMIMETypeFromMediaType() has been treating \n (0x0a), \v (0x0b),
\f (0x0c), \r (0x0d) and Unicode characters with the BIDI property WS
as whitespace, in addition to the OWS characters SP (0x20) and HTAB
(0x09). These characters were stripped not only from the head and tail
of the type/subtype part but also from the middle of the value. For
example, a received "te xt/ht ml" was silently normalized into
"text/html".

This CL partially fixes this spec violation by:
- limiting the characters treated as whitespace to SP and HTAB, as
  specified in RFC 7230
- no longer stripping whitespace characters (including SP and HTAB)
  from the middle of the type/subtype value
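
The intended behavior can be illustrated with a minimal standalone
sketch. This is not the code in this CL: IsOws() and
ExtractMimeTypeSketch() are hypothetical names, std::string stands in
for the engine's string types, and cutting the value at the first ';'
is a simplifying assumption about where parameters begin.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true only for the OWS characters SP and HTAB.
constexpr bool IsOws(char c) {
  return c == ' ' || c == '\t';
}

// Hypothetical sketch of the new behavior, not the real
// extractMIMETypeFromMediaType(). OWS is trimmed from the head and tail
// of the type/subtype part only; anything in the middle is preserved.
std::string ExtractMimeTypeSketch(std::string_view media_type) {
  // Assumption: parameters (e.g. "charset=...") start at the first ';'.
  // find() returns npos when there is no ';', which keeps the whole value.
  std::string_view value = media_type.substr(0, media_type.find(';'));

  // Skip leading SP / HTAB.
  std::size_t begin = 0;
  while (begin < value.size() && IsOws(value[begin]))
    ++begin;

  // Skip trailing SP / HTAB.
  std::size_t end = value.size();
  while (end > begin && IsOws(value[end - 1]))
    --end;

  return std::string(value.substr(begin, end - begin));
}
```

Under these assumptions, " text/html\t; charset=utf-8" yields
"text/html", while "te xt/ht ml" and "\ntext/html\r" are returned
unchanged, since neither embedded whitespace nor non-OWS characters are
stripped any more.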

We don't add full ABNF validation that would reject everything that
doesn't conform to the media-type ABNF, as we're not sure how large
the impact of such a strict fix would be.

See also https://bugs.webkit.org/show_bug.cgi?id=8644

R=mkwst@chromium.org
BUG=642346

Review-Url: https://codereview.chromium.org/2310783003
Cr-Commit-Position: refs/heads/master@{#416586}