19 Commits

Author SHA1 Message Date
Roland Shoemaker
e1fcd82abb html: properly handle trailing solidus in unquoted attribute value in foreign content
The parser properly treats tags like <p a=/> as <p a="/">, but the
tokenizer incorrectly emits a SelfClosingTagToken for them. When the
parser is used to parse foreign content, this results in an incorrect
DOM.

Thanks to Sean Ng (https://ensy.zip) for reporting this issue.

Fixes golang/go#73070
Fixes CVE-2025-22872

Change-Id: I65c18df6d6244bf943b61e6c7a87895929e78f4f
Reviewed-on: https://go-review.googlesource.com/c/net/+/661256
Reviewed-by: Neal Patel <nealpatel@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Gopher Robot <gobot@golang.org>
2025-03-27 12:51:24 -07:00
Maciej Mionskowski
643fd162e3 html: fix SOLIDUS '/' handling in attribute parsing
Calling the Tokenizer with HTML elements that contain a SOLIDUS (/)
character in an attribute name results in incorrect tokenization.

This is due to violations of the following state transitions in the WHATWG spec:
- https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state,
  where we do not reconsume the character when '/' is encountered
- https://html.spec.whatwg.org/multipage/parsing.html#after-attribute-name-state,
  where we do not switch to the self-closing start tag state
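The two transitions listed above can be sketched with a deliberately simplified, hypothetical `attrNames` function: it handles attribute names only (no '=' values, no case folding) and stops at '>'. A '/' that is not followed by '>' is not swallowed into a name; it passes through the self-closing start tag state and the next character starts a new attribute, so `<p a/b>` has two attributes, `a` and `b`.

```go
package main

import "strings"

// attrNames is a hypothetical, heavily simplified model of the WHATWG
// before-attribute-name, attribute-name, after-attribute-name and
// self-closing-start-tag states. s is the input after the tag name.
// Only attribute names are handled: '=' and values are not special here.
func attrNames(s string) (names []string, selfClosing bool) {
	const stop = " \t\n\f\r/>" // characters that end an attribute name
	i := 0
	for i < len(s) {
		switch c := s[i]; {
		case c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r':
			i++ // whitespace between attributes is skipped
		case c == '>':
			return names, false
		case c == '/':
			// Self-closing start tag state: "/>" ends a self-closing tag.
			// Otherwise the next character is reconsumed in the
			// before-attribute-name state (a parse error per the spec).
			if i+1 < len(s) && s[i+1] == '>' {
				return names, true
			}
			i++
		default:
			// Attribute name state: whitespace, '/' and '>' end the name
			// and are reconsumed by the cases above, not added to it.
			start := i
			for i < len(s) && !strings.ContainsRune(stop, rune(s[i])) {
				i++
			}
			names = append(names, s[start:i])
		}
	}
	return names, false // EOF inside the tag
}
```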

Fixes golang/go#63402

Change-Id: I90d998dd8decde877bd63aa664f3657aa6161024
GitHub-Last-Rev: 3546db808c
GitHub-Pull-Request: golang/net#195
Reviewed-on: https://go-review.googlesource.com/c/net/+/533518
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2024-02-07 19:23:52 +00:00
Roland Shoemaker
4050002696 html: handle equals sign before attribute
Apply the correct normalization when an equals sign appears before an
attribute name (e.g. '<tag =>' -> '<tag =="">'), per WHATWG 13.2.5.32.
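A minimal sketch of that normalization, assuming a hypothetical `firstAttr` parser that supports only unquoted or missing values with no whitespace around '=': in the before attribute name state the first character always starts a new name, even when it is '=' (a parse error), so `<tag =>` carries an attribute named "=" with an empty value and renders as `=""`.

```go
package main

import "strings"

// firstAttr is a hypothetical, simplified model of the WHATWG
// before-attribute-name state (13.2.5.32) and the states after it.
// s is the input after the tag name. Only unquoted or missing values
// are supported, with no whitespace around '='.
func firstAttr(s string) (name, value string, ok bool) {
	s = strings.TrimLeft(s, " \t\n\f\r")
	if s == "" || s[0] == '>' || s[0] == '/' {
		return "", "", false // no attribute starts here
	}
	// The first character always starts the name, even if it is '='
	// (a parse error per the spec): "<tag =>" has an attribute named "=".
	i := 1
	for i < len(s) && !strings.ContainsRune(" \t\n\f\r/>=", rune(s[i])) {
		i++
	}
	name = s[:i]
	rest := s[i:]
	if strings.HasPrefix(rest, "=") {
		rest = rest[1:]
		j := strings.IndexAny(rest, " \t\n\f\r>")
		if j < 0 {
			j = len(rest)
		}
		value = rest[:j]
	}
	return name, value, true
}

// renderAttr writes an attribute in normalized name="value" form
// (no escaping, for brevity).
func renderAttr(name, value string) string {
	return name + `="` + value + `"`
}
```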

Change-Id: Id21b428bd86117dd073c502767386bc718a3fb7b
Reviewed-on: https://go-review.googlesource.com/c/net/+/488695
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Run-TryBot: Roland Shoemaker <roland@golang.org>
Reviewed-by: Nigel Tao (INACTIVE; USE @golang.org INSTEAD) <nigeltao@google.com>
2023-06-20 17:16:42 +00:00
Nigel Tao
1d46ed8b48 html: have Render escape comments less often
Fixes golang/go#58246

Change-Id: I3effbd2afd7e363a42baa4db20691e57c9a08389
Reviewed-on: https://go-review.googlesource.com/c/net/+/469056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
Reviewed-by: Damien Neil <dneil@google.com>
2023-02-28 08:42:21 +00:00
Nigel Tao
39940adcaa html: parse comments per HTML spec
Updates golang/go#58246

Change-Id: Iaba5ed65f5d244fd47372ef0c08fc4cdb5ed90f9
Reviewed-on: https://go-review.googlesource.com/c/net/+/466776
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Nigel Tao (INACTIVE; USE @golang.org INSTEAD) <nigeltao@google.com>
2023-02-10 18:21:14 +00:00
Roland Shoemaker
430a433969 html: properly handle exclamation marks in comments
Properly handle the case where an HTML comment begins with an
exclamation mark and has no other content, i.e. "<!--!-->". Previously,
such a comment caused the tokenizer to treat everything that followed it
as part of the comment.
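A sketch of the WHATWG comment states, written as a hypothetical `parseComment` function rather than the x/net/html implementation, shows why "<!--!-->" is complete: the '!' is ordinary comment data, and the following "-->" closes the comment as usual. A '!' is only special right after "--", where "--!>" also ends a comment (with a parse error).

```go
package main

// parseComment is a hypothetical sketch of the WHATWG comment states.
// s is the input that follows "<!--". It returns the comment data and
// the number of bytes of s consumed, including the closing delimiter;
// an unterminated comment consumes all of s.
// Simplification: the comment less-than-sign states are omitted; they
// only report nested-comment parse errors and do not change the data.
func parseComment(s string) (data string, n int) {
	const (
		start = iota
		startDash
		comment
		endDash
		end
		endBang
	)
	var buf []byte
	state := start
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch state {
		case start:
			switch c {
			case '-':
				state = startDash
			case '>':
				return "", i + 1 // abrupt closing: "<!-->"
			default:
				state = comment
				i-- // reconsume in the comment state
			}
		case startDash:
			switch c {
			case '-':
				state = end
			case '>':
				return "", i + 1 // abrupt closing: "<!--->"
			default:
				buf = append(buf, '-')
				state = comment
				i--
			}
		case comment:
			if c == '-' {
				state = endDash
			} else {
				buf = append(buf, c)
			}
		case endDash:
			if c == '-' {
				state = end
			} else {
				buf = append(buf, '-')
				state = comment
				i--
			}
		case end:
			switch c {
			case '>':
				return string(buf), i + 1 // normal "-->"
			case '!':
				state = endBang
			case '-':
				buf = append(buf, '-')
			default:
				buf = append(buf, '-', '-')
				state = comment
				i--
			}
		case endBang:
			switch c {
			case '-':
				buf = append(buf, '-', '-', '!')
				state = endDash
			case '>':
				return string(buf), i + 1 // "--!>" also closes
			default:
				buf = append(buf, '-', '-', '!')
				state = comment
				i--
			}
		}
	}
	return string(buf), len(s) // EOF: emit what we have
}
```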

Fixes golang/go#37771

Change-Id: I78ea310debc3846f145d62cba017055abc7fa4e0
Reviewed-on: https://go-review.googlesource.com/c/net/+/442496
Run-TryBot: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
2022-10-20 16:40:45 +00:00
Nigel Tao
0699458419 html: escape comment and doctype tokens' data
Fixes golang/go#48237

Change-Id: I309e3ad30684fb71b9b3e67dfac156da08dbc69b
Reviewed-on: https://go-review.googlesource.com/c/net/+/419334
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-07-26 23:03:23 +00:00
Kunpei Sakai
e7e4b65ae6 html: improve coding style
Change-Id: I05c0ccbad41f5512f8096b0d15991d7d6b5d726e
Reviewed-on: https://go-review.googlesource.com/c/net/+/209398
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-12-07 00:06:13 +00:00
Dario
2ec189313e html: fix tokenizer error
Trailing '<' characters in the text token made the tokenizer fail
for escapable raw text elements such as title and textarea.
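The rule at stake can be sketched with a hypothetical `rawTextEnd` helper: the text of an escapable raw text element ends only at a matching end tag, so any other '<', including one at the very end of the input, has to be treated as ordinary text rather than as a reason to fail.

```go
package main

import "strings"

// rawTextEnd is a hypothetical helper modelling how the text of an
// escapable raw text element (such as <title> or <textarea>) ends.
// s is the input after the start tag and tag is the lower-case tag name.
// The text stops at the first "</"+tag (ASCII case-insensitive) that is
// followed by whitespace, '/' or '>'. Any other '<', including one at the
// very end of the input, is ordinary text.
func rawTextEnd(s, tag string) int {
	for i := 0; i < len(s); i++ {
		if s[i] != '<' {
			continue
		}
		end := i + 2 + len(tag) // index just past "</" + tag
		if end >= len(s) {
			break // too little input left for a complete end tag
		}
		if s[i+1] == '/' && strings.EqualFold(s[i+2:end], tag) &&
			strings.ContainsRune(" \t\n\f\r/>", rune(s[end])) {
			return i
		}
	}
	return len(s)
}
```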

Fixes golang/go#34281

Change-Id: I6fe8f2229b5fd639cf5a02ab1db31f18ea034c8b
GitHub-Last-Rev: 4a9da03177
GitHub-Pull-Request: golang/net#53
Reviewed-on: https://go-review.googlesource.com/c/net/+/196620
Run-TryBot: Kunpei Sakai <kunpei@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-10-02 03:54:40 +00:00
Nigel Tao
2e5a9a9514 html: add Tokenizer.Raw comment re byte offsets
Change-Id: I2a08f28fcc58869b0e8a3b21b9a9c97da5063014
Reviewed-on: https://go-review.googlesource.com/c/net/+/198357
Reviewed-by: David Symonds <dsymonds@golang.org>
2019-10-02 03:42:24 +00:00
Nigel Tao
5ccada7d0a html: fix misleading Tokenizer.Token comment
Change-Id: I39359b5fa52faf5b69005ba47b58be3beec16c4e
Reviewed-on: https://go-review.googlesource.com/87515
Reviewed-by: David Symonds <dsymonds@golang.org>
2018-01-12 01:58:58 +00:00
Andrew Gerrand
fbe893ddcd go.net: use golang.org/x/... import paths
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/167030043
2014-11-10 09:04:43 +11:00
Andrew Balholm
4109fccea4 html: handle '<' before a tag
As pointed out at
https://groups.google.com/forum/#!topic/golang-nuts/LJozHIXAAJY,
`<<p>html</p>` was parsed as `&lt;&lt;p&gt;html</p>`.
There was no test case for this. Chrome parses it as `&lt;<p>html</p>`,
and that seems to be correct. We were missing the
"Reconsume the current input character" step at
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
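The missing reconsume step can be sketched in a few lines. The helpers below are hypothetical and use only the standard library (`html.EscapeString` from the stdlib `html` package): a '<' followed by anything other than an ASCII letter, '!', '/' or '?' is emitted as text, and the next character is looked at again in the data state, so the second '<' in `<<p>` still opens the `p` tag.

```go
package main

import "html"

// startsTag reports whether the '<' at s[i] is consumed as markup (tag,
// end tag, markup declaration or bogus comment) by the WHATWG tag open
// state: an ASCII letter, '!', '/' or '?' must come next. Otherwise,
// including at EOF, the '<' is emitted as text and the next character is
// reconsumed in the data state.
func startsTag(s string, i int) bool {
	if i+1 >= len(s) {
		return false
	}
	c := s[i+1]
	return c == '!' || c == '/' || c == '?' ||
		'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z'
}

// textBeforeTag is a hypothetical helper that splits s at the first '<'
// that really starts markup, returning the text before it and the rest.
func textBeforeTag(s string) (text, rest string) {
	for i := 0; i < len(s); i++ {
		if s[i] == '<' && startsTag(s, i) {
			return s[:i], s[i:]
		}
	}
	return s, ""
}

// renderLeading escapes the leading text and keeps the markup as-is.
func renderLeading(s string) string {
	text, rest := textBeforeTag(s)
	return html.EscapeString(text) + rest
}
```

With this rule, `renderLeading("<<p>html</p>")` gives `&lt;<p>html</p>`, matching the Chrome behaviour quoted above.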

LGTM=nigeltao
R=golang-codereviews, gobot, nigeltao
CC=golang-codereviews, nigeltao
https://golang.org/cl/96060044
2014-05-12 16:42:14 +10:00
Robert Griesemer
a6927df230 go.net: fix various typos
LGTM=adonovan
R=adonovan
CC=golang-codereviews, golang-dev
https://golang.org/cl/97950043
2014-05-02 14:50:26 -07:00
Michael Piatek
4698117464 go.net/html: Expose data read from the input reader but not yet tokenized in Tokenizer.
This allows clients to efficiently reconstruct the original input in the case of ErrBufferExceeded. TestMaxBufferReconstruction now properly verifies this.

R=bradfitz
CC=golang-codereviews
https://golang.org/cl/47770043
2014-01-06 10:51:23 -08:00
Michael Piatek
384e4d292e html: limit buffering during tokenization.
This is optional. By default, buffering is unlimited.
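The opt-in semantics can be pictured with a small hypothetical buffer type; in x/net/html the limit is set with `Tokenizer.SetMaxBuf` (0 meaning unlimited) and exceeding it yields `ErrBufferExceeded`, but the type and error variable below are stand-ins, not that package's internals.

```go
package main

import "errors"

// errBufferExceeded stands in for x/net/html's ErrBufferExceeded in this
// hypothetical sketch.
var errBufferExceeded = errors.New("max buffer exceeded")

// boundedBuf is a hypothetical buffer with an optional size limit.
// A zero max means unlimited, matching the default described above.
type boundedBuf struct {
	buf []byte
	max int
}

// write appends p, or fails without modifying the buffer if doing so
// would exceed a non-zero limit.
func (b *boundedBuf) write(p []byte) error {
	if b.max > 0 && len(b.buf)+len(p) > b.max {
		return errBufferExceeded
	}
	b.buf = append(b.buf, p...)
	return nil
}
```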

Fixes golang/go#7053

R=bradfitz
CC=golang-codereviews
https://golang.org/cl/43190044
2014-01-03 13:16:55 -08:00
Michael Piatek
480e7b06ec go.net/html: Tokenizer.Raw returns the original input when tokenizer errors occur.
Two tweaks enable this:
1) Updating the raw and data span pointers when Tokenizer.Next is called, even
if an error has occurred. This prevents duplicate data from being returned by
Raw in the common case of an EOF.

2) Treating '</>' as an empty comment token to expose the raw text as a
tokenization event. (This matches the semantics of other non-token events,
e.g., '</ >' is treated as '<!-- -->'.)
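Tweak 2 can be sketched as a classification of the input that follows "</". The hypothetical `classifyEndTagOpen` below follows this commit for `</>` (an empty comment) and the WHATWG end tag open state otherwise: a letter starts an end tag, any other character starts a bogus comment running to the next '>', and EOF emits "</" as text. Note that the spec itself drops `</>` with a parse error; treating it as a comment is the choice described here so that its raw text is still exposed.

```go
package main

import "strings"

// classifyEndTagOpen is a hypothetical sketch: s is the input after "</".
// It returns the kind of token produced and its data.
func classifyEndTagOpen(s string) (kind, data string) {
	if s == "" {
		return "text", "</" // EOF right after "</"
	}
	c := s[0]
	switch {
	case c == '>':
		return "comment", "" // "</>": an empty comment token
	case 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z':
		end := strings.IndexAny(s, " \t\n\f\r/>")
		if end < 0 {
			end = len(s)
		}
		return "endtag", strings.ToLower(s[:end])
	default:
		// Bogus comment: everything up to the next '>' (or EOF),
		// so "</ >" becomes the comment " ".
		end := strings.IndexByte(s, '>')
		if end < 0 {
			end = len(s)
		}
		return "comment", s[:end]
	}
}
```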

Fixes golang/go#7029.

R=golang-codereviews, r, bradfitz
CC=golang-codereviews
https://golang.org/cl/46370043
2014-01-02 10:51:00 -08:00
Nigel Tao
e8489d83dd go.net/html: fix the tokenizer when the underlying io.Reader returns
either (0, nil) or an (n, err) such that n > 0 && err != nil. Both
cases are valid by the io.Reader contract.
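Both cases come straight from the io.Reader contract: (0, nil) means nothing happened and the caller may try again, while n > 0 bytes returned alongside a non-nil error must still be processed. The standard-library-only sketch below shows a read loop that handles both; `readAll` and the test reader `chunkReader` are hypothetical names, and the cap of 100 consecutive empty reads is borrowed from bufio's own limit before it reports `io.ErrNoProgress`.

```go
package main

import "io"

// readAll is a hypothetical read loop that honours the io.Reader contract.
func readAll(r io.Reader) ([]byte, error) {
	var out []byte
	buf := make([]byte, 4) // deliberately small to force several Reads
	empty := 0             // consecutive (0, nil) results
	for {
		n, err := r.Read(buf)
		// Keep the n bytes first: they are valid even when err != nil.
		out = append(out, buf[:n]...)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		if n > 0 {
			empty = 0
			continue
		}
		// (0, nil) is legal and means "try again", but give up on a
		// reader that never makes progress.
		empty++
		if empty >= 100 {
			return out, io.ErrNoProgress
		}
	}
}

// chunkReader is a hypothetical test reader that returns one chunk per
// Read call: an empty chunk yields (0, nil), and the last bytes of the
// final chunk are returned together with io.EOF.
type chunkReader struct{ chunks []string }

func (c *chunkReader) Read(p []byte) (int, error) {
	if len(c.chunks) == 0 {
		return 0, io.EOF
	}
	n := copy(p, c.chunks[0])
	c.chunks[0] = c.chunks[0][n:]
	if c.chunks[0] != "" {
		return n, nil // chunk only partly consumed
	}
	c.chunks = c.chunks[1:]
	if len(c.chunks) == 0 {
		return n, io.EOF // data delivered together with io.EOF
	}
	return n, nil // an empty chunk produces (0, nil)
}
```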

R=r
CC=golang-dev
https://golang.org/cl/12513043
2013-08-07 12:55:39 +10:00
Nigel Tao
ea127e889c go.net/html: move exp/html and exp/html/atom here to the go.net
sub-repo.

It's a straight copy, except for these modifications:
* "exp/html" and "exp/html/atom" imports were renamed, and
* the "TODO... When this package moves out of exp" comment was
  deleted from atom/atom.go.

The matching change is at https://golang.org/cl/7317043

The rationale was discussed at
https://groups.google.com/d/topic/golang-nuts/Qq5hTQyPuLg/discussion

R=adg, remyoudompheng, dave
CC=golang-dev
https://golang.org/cl/7310063
2013-02-11 11:55:20 +11:00