Roland Shoemaker
e1fcd82abb
html: properly handle trailing solidus in unquoted attribute value in foreign content
The parser properly treats tags like <p a=/> as <p a="/">, but the
tokenizer incorrectly emits a SelfClosingTagToken for them. When the
parser is used to parse foreign content, this results in an incorrect
DOM.
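A minimal sketch of the rule involved, assuming only the WHATWG "attribute value (unquoted)" state: that state ends a value at whitespace or '>', so the '/' in <p a=/> belongs to the value and the tag is not self-closing. `isSpace` and `unquotedValue` are hypothetical helpers, not the x/net/html code, and only the text after "a=" is examined.

```go
package main

import (
	"fmt"
	"strings"
)

// isSpace reports whether c is HTML ASCII whitespace.
func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r'
}

// unquotedValue is a hypothetical helper, not part of golang.org/x/net/html.
// s is the input that follows "name=" in a start tag. Following the WHATWG
// "attribute value (unquoted)" state, the value ends at whitespace or '>',
// so a '/' is part of the value. selfClosing reports whether "/>" follows
// the value after optional whitespace.
func unquotedValue(s string) (value string, selfClosing bool) {
	n := 0
	for n < len(s) && !isSpace(s[n]) && s[n] != '>' {
		n++
	}
	rest := strings.TrimLeft(s[n:], " \t\n\f\r")
	return s[:n], strings.HasPrefix(rest, "/>")
}

func main() {
	fmt.Println(unquotedValue("/>"))   // <p a=/>: value "/", not self-closing
	fmt.Println(unquotedValue("x />")) // <p a=x />: value "x", self-closing
}
```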
Thanks to Sean Ng (https://ensy.zip) for reporting this issue.
Fixes golang/go#73070
Fixes CVE-2025-22872
Change-Id: I65c18df6d6244bf943b61e6c7a87895929e78f4f
Reviewed-on: https://go-review.googlesource.com/c/net/+/661256
Reviewed-by: Neal Patel <nealpatel@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Gopher Robot <gobot@golang.org>
2025-03-27 12:51:24 -07:00
Maciej Mionskowski
643fd162e3
html: fix SOLIDUS '/' handling in attribute parsing
Calling the Tokenizer with HTML elements containing a SOLIDUS (/) character
in an attribute name results in incorrect tokenization.
This is due to violations of the following state transitions in the WHATWG spec:
- https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state,
where we are not reconsuming the character if '/' is encountered
- https://html.spec.whatwg.org/multipage/parsing.html#after-attribute-name-state,
where we are not switching to the self-closing start tag state
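The two transitions above can be illustrated with a small state machine. This is a hypothetical sketch (`attr`, `scanAttrs` and the state names are not x/net/html identifiers): it handles attribute names, unquoted values and the self-closing start tag state, and leaves out quoted values and character references.

```go
package main

import (
	"fmt"
	"strings"
)

// attr loosely mirrors html.Attribute so the sketch is self-contained.
type attr struct{ Key, Val string }

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r'
}

// scanAttrs is a hypothetical sketch of the WHATWG attribute states, not the
// x/net/html code. s is the text after the tag name, ending with '>'. It
// returns the attributes and whether the tag is self-closing.
func scanAttrs(s string) ([]attr, bool) {
	const (
		beforeName = iota
		name
		afterName
		beforeValue
		unquotedValue
		selfClosingStart
	)
	var attrs []attr
	state, start := beforeName, 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch state {
		case beforeName:
			switch {
			case isSpace(c):
				// Skip whitespace between attributes.
			case c == '/' || c == '>':
				state = afterName
				i-- // reconsume
			default:
				// Start a new name; a leading '=' becomes part of it.
				state, start = name, i
			}
		case name:
			if isSpace(c) || c == '/' || c == '>' || c == '=' {
				attrs = append(attrs, attr{Key: strings.ToLower(s[start:i])})
				if c == '=' {
					state = beforeValue
				} else {
					// Reconsume '/', '>' or whitespace instead of dropping it.
					state = afterName
					i--
				}
			}
		case afterName:
			switch {
			case isSpace(c):
			case c == '/':
				state = selfClosingStart
			case c == '=':
				state = beforeValue
			case c == '>':
				return attrs, false
			default:
				state, start = name, i
			}
		case beforeValue:
			switch {
			case isSpace(c):
			case c == '>':
				return attrs, false
			default:
				state, start = unquotedValue, i
			}
		case unquotedValue:
			// Every byte other than whitespace and '>' belongs to the value.
			if isSpace(c) || c == '>' {
				attrs[len(attrs)-1].Val = s[start:i]
				if c == '>' {
					return attrs, false
				}
				state = beforeName
			}
		case selfClosingStart:
			if c == '>' {
				return attrs, true
			}
			state = beforeName
			i-- // reconsume
		}
	}
	return attrs, false
}

func main() {
	fmt.Println(scanAttrs(" a/>"))  // attribute a, self-closing
	fmt.Println(scanAttrs(" a/b>")) // attributes a and b, not self-closing
}
```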
Fixes golang/go#63402
Change-Id: I90d998dd8decde877bd63aa664f3657aa6161024
GitHub-Last-Rev: 3546db808c
GitHub-Pull-Request: golang/net#195
Reviewed-on: https://go-review.googlesource.com/c/net/+/533518
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2024-02-07 19:23:52 +00:00
Roland Shoemaker
4050002696
html: handle equals sign before attribute
Apply the correct normalization when an equals sign appears before an
attribute name (e.g. '<tag =>' -> '<tag =="">'), per WHATWG 13.2.5.32.
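A sketch of just the before-attribute-name step, using a hypothetical `readAttrName` helper rather than the x/net/html code: after whitespace, the first byte always starts a name, even when it is '=', so '<tag =>' yields an attribute named "=" with an empty value.

```go
package main

import (
	"fmt"
	"strings"
)

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r'
}

// readAttrName is a hypothetical sketch of the WHATWG "before attribute name"
// state followed by the "attribute name" state. After skipping whitespace,
// the first byte is consumed as the start of the name even if it is '=' (a
// parse error the spec recovers from this way). The name then runs until
// whitespace, '/', '>' or '='. ok is false if no attribute starts here.
func readAttrName(s string) (name, rest string, ok bool) {
	i := 0
	for i < len(s) && isSpace(s[i]) {
		i++
	}
	if i == len(s) || s[i] == '/' || s[i] == '>' {
		return "", s[i:], false
	}
	j := i + 1 // the first byte is part of the name unconditionally
	for j < len(s) && !isSpace(s[j]) && s[j] != '/' && s[j] != '>' && s[j] != '=' {
		j++
	}
	return strings.ToLower(s[i:j]), s[j:], true
}

func main() {
	// '<tag =>': the name is "=" and no '=' (hence no value) follows it.
	name, rest, _ := readAttrName(" =>")
	fmt.Printf("%q %q\n", name, rest)
}
```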
Change-Id: Id21b428bd86117dd073c502767386bc718a3fb7b
Reviewed-on: https://go-review.googlesource.com/c/net/+/488695
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Run-TryBot: Roland Shoemaker <roland@golang.org>
Reviewed-by: Nigel Tao (INACTIVE; USE @golang.org INSTEAD) <nigeltao@google.com>
2023-06-20 17:16:42 +00:00
Nigel Tao
1d46ed8b48
html: have Render escape comments less often
Fixes golang/go#58246
Change-Id: I3effbd2afd7e363a42baa4db20691e57c9a08389
Reviewed-on: https://go-review.googlesource.com/c/net/+/469056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
Reviewed-by: Damien Neil <dneil@google.com>
2023-02-28 08:42:21 +00:00
Nigel Tao
39940adcaa
html: parse comments per HTML spec
Updates golang/go#58246
Change-Id: Iaba5ed65f5d244fd47372ef0c08fc4cdb5ed90f9
Reviewed-on: https://go-review.googlesource.com/c/net/+/466776
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Nigel Tao (INACTIVE; USE @golang.org INSTEAD) <nigeltao@google.com>
2023-02-10 18:21:14 +00:00
Roland Shoemaker
430a433969
html: properly handle exclamation marks in comments
Properly handle the case where HTML comments begin with exclamation
marks and have no other content, i.e. "<!--!-->". Previously, such
comments caused the tokenizer to treat everything following them as
part of the comment.
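One way to locate a comment's end, sketched with a hypothetical `commentData` helper (not the x/net/html code): search for "-->" or "--!>" only after the four-byte "<!--" opener, so the opener's dashes cannot combine with what follows, and "<!--!-->" ends after 8 bytes with data "!". For brevity an unterminated comment is reported as not ok, whereas the spec emits it as a comment at end of input.

```go
package main

import (
	"fmt"
	"strings"
)

// commentData is a hypothetical sketch. Given input starting with "<!--", it
// returns the comment data, the number of bytes the comment occupies, and
// whether a closing sequence was found.
func commentData(s string) (data string, n int, ok bool) {
	const opener = "<!--"
	if !strings.HasPrefix(s, opener) {
		return "", 0, false
	}
	body := s[len(opener):]
	// Abruptly closed empty comments: "<!-->" and "<!--->".
	if strings.HasPrefix(body, ">") {
		return "", len(opener) + 1, true
	}
	if strings.HasPrefix(body, "->") {
		return "", len(opener) + 2, true
	}
	// Look for "-->" or "--!>" strictly after the opener.
	for i := 0; i < len(body); i++ {
		switch {
		case strings.HasPrefix(body[i:], "-->"):
			return body[:i], len(opener) + i + 3, true
		case strings.HasPrefix(body[i:], "--!>"):
			return body[:i], len(opener) + i + 4, true
		}
	}
	return "", 0, false
}

func main() {
	data, n, ok := commentData("<!--!--><p>after</p>")
	fmt.Printf("%q %d %v\n", data, n, ok)
}
```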
Fixes golang/go#37771
Change-Id: I78ea310debc3846f145d62cba017055abc7fa4e0
Reviewed-on: https://go-review.googlesource.com/c/net/+/442496
Run-TryBot: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
2022-10-20 16:40:45 +00:00
Nigel Tao
0699458419
html: escape comment and doctype tokens' data
Fixes golang/go#48237
Change-Id: I309e3ad30684fb71b9b3e67dfac156da08dbc69b
Reviewed-on: https://go-review.googlesource.com/c/net/+/419334
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-07-26 23:03:23 +00:00
Kunpei Sakai
e7e4b65ae6
html: improve coding style
Change-Id: I05c0ccbad41f5512f8096b0d15991d7d6b5d726e
Reviewed-on: https://go-review.googlesource.com/c/net/+/209398
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-12-07 00:06:13 +00:00
Dario
2ec189313e
html: fix tokenizer error
Trailing '<' characters in a text token made the tokenizer fail
for escapable raw text elements such as title and textarea.
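A minimal sketch of the expected behavior, assuming a hypothetical `rcdataText` helper rather than the x/net/html code: in escapable raw text (RCDATA) such as title and textarea, a '<' only matters when it begins the element's own end tag followed by whitespace, '/' or '>', so a '<' at the end of the input is plain text.

```go
package main

import (
	"fmt"
	"strings"
)

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r'
}

// rcdataText is a hypothetical sketch. s is the input after a <title> or
// <textarea> start tag and tag is that element's lowercase name. It returns
// the text up to the matching end tag, or all of s if there is none.
func rcdataText(s, tag string) string {
	for i := 0; i < len(s); i++ {
		if s[i] != '<' {
			continue
		}
		rest := s[i+1:]
		// Need "/" + tag name + one terminating byte; anything shorter,
		// including a lone trailing '<', is just text.
		if len(rest) <= 1+len(tag) || rest[0] != '/' || !strings.EqualFold(rest[1:1+len(tag)], tag) {
			continue
		}
		if c := rest[1+len(tag)]; isSpace(c) || c == '/' || c == '>' {
			return s[:i]
		}
	}
	return s
}

func main() {
	// A '<' at the very end of a <title> body is kept as text.
	fmt.Printf("%q\n", rcdataText("a < b <", "title"))
}
```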
Fixes golang/go#34281
Change-Id: I6fe8f2229b5fd639cf5a02ab1db31f18ea034c8b
GitHub-Last-Rev: 4a9da03177
GitHub-Pull-Request: golang/net#53
Reviewed-on: https://go-review.googlesource.com/c/net/+/196620
Run-TryBot: Kunpei Sakai <kunpei@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-10-02 03:54:40 +00:00
Nigel Tao
2e5a9a9514
html: add Tokenizer.Raw comment re byte offsets
Change-Id: I2a08f28fcc58869b0e8a3b21b9a9c97da5063014
Reviewed-on: https://go-review.googlesource.com/c/net/+/198357
Reviewed-by: David Symonds <dsymonds@golang.org>
2019-10-02 03:42:24 +00:00
Nigel Tao
5ccada7d0a
html: fix misleading Tokenizer.Token comment
Change-Id: I39359b5fa52faf5b69005ba47b58be3beec16c4e
Reviewed-on: https://go-review.googlesource.com/87515
Reviewed-by: David Symonds <dsymonds@golang.org>
2018-01-12 01:58:58 +00:00
Andrew Gerrand
fbe893ddcd
go.net: use golang.org/x/... import paths
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/167030043
2014-11-10 09:04:43 +11:00
Andrew Balholm
4109fccea4
html: handle '<' before a tag
As pointed out at
https://groups.google.com/forum/#!topic/golang-nuts/LJozHIXAAJY,
`<<p>html</p>` was misparsed: the second '<' did not start a `<p>` tag.
There was no test case for this. Chrome parses it as a literal '<' followed
by `<p>html</p>`, and that seems to be correct. We were missing the
"Reconsume the current input character" step at
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
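The missing reconsume step can be sketched as follows; `tagStart` is a hypothetical helper, not the x/net/html tokenizer. When the byte after '<' cannot begin a tag, the '<' is text and that next byte is examined again rather than skipped.

```go
package main

import "fmt"

// tagStart is a hypothetical sketch of the WHATWG "tag open" state. It
// returns the index of the first '<' that opens a tag, end tag, comment or
// doctype ('<' followed by an ASCII letter, '/', '!' or '?'), or -1.
func tagStart(s string) int {
	for i := 0; i+1 < len(s); i++ {
		if s[i] != '<' {
			continue
		}
		c := s[i+1]
		if 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || c == '/' || c == '!' || c == '?' {
			return i
		}
		// Otherwise this '<' is plain text. The byte after it is not
		// skipped: the loop's i++ "reconsumes" it, so a second '<' can
		// still open a tag.
	}
	return -1
}

func main() {
	// In "<<p>html</p>" the first '<' is text and the second opens <p>.
	fmt.Println(tagStart("<<p>html</p>"))
}
```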
LGTM=nigeltao
R=golang-codereviews, gobot, nigeltao
CC=golang-codereviews, nigeltao
https://golang.org/cl/96060044
2014-05-12 16:42:14 +10:00
Robert Griesemer
a6927df230
go.net: fix various typos
LGTM=adonovan
R=adonovan
CC=golang-codereviews, golang-dev
https://golang.org/cl/97950043
2014-05-02 14:50:26 -07:00
Michael Piatek
4698117464
go.net/html: Expose data read from the input reader but not yet tokenized in Tokenizer.
This allows clients to efficiently reconstruct the original input in the case of ErrBufferExceeded. TestMaxBufferReconstruction now properly verifies this.
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/47770043
2014-01-06 10:51:23 -08:00
Michael Piatek
384e4d292e
html: limit buffering during tokenization.
This is optional. By default, buffering is unlimited.
Fixes golang/go#7053
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/43190044
2014-01-03 13:16:55 -08:00
Michael Piatek
480e7b06ec
go.net/html: Tokenizer.Raw returns the original input when tokenizer errors occur.
Two tweaks enable this:
1) Updating the raw and data span pointers when Tokenizer.Next is called, even
if an error has occurred. This prevents duplicate data from being returned by
Raw in the common case of an EOF.
2) Treating '</>' as an empty comment token to expose the raw text as a
tokenization event. (This matches the semantics of other non-token events,
e.g., '</ >' is treated as '<!-- -->'.)
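A hypothetical sketch of the classification described in point 2 (`endTagOpen` is not an x/net/html function). The letter check and the case of input ending right after "</" follow the WHATWG spec; the comment cases follow the behavior stated above.

```go
package main

import (
	"fmt"
	"strings"
)

// endTagOpen classifies input that starts with "</". "</>" becomes an empty
// comment, a letter starts an end tag, and anything else becomes a comment
// whose data runs up to the next '>' (so "</ >" is like "<!-- -->").
func endTagOpen(s string) (kind, data string) {
	if !strings.HasPrefix(s, "</") {
		return "text", s
	}
	rest := s[2:]
	switch {
	case rest == "":
		// Input ends right after "</": both bytes are plain text.
		return "text", s
	case rest[0] == '>':
		return "comment", ""
	case 'a' <= rest[0] && rest[0] <= 'z' || 'A' <= rest[0] && rest[0] <= 'Z':
		end := strings.IndexAny(rest, " \t\n\f\r/>")
		if end < 0 {
			end = len(rest)
		}
		return "endtag", strings.ToLower(rest[:end])
	default:
		end := strings.IndexByte(rest, '>')
		if end < 0 {
			end = len(rest)
		}
		return "comment", rest[:end]
	}
}

func main() {
	for _, s := range []string{"</>", "</ >", "</P>"} {
		kind, data := endTagOpen(s)
		fmt.Printf("%s -> %s %q\n", s, kind, data)
	}
}
```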
Fixes golang/go#7029.
R=golang-codereviews, r, bradfitz
CC=golang-codereviews
https://golang.org/cl/46370043
2014-01-02 10:51:00 -08:00
Nigel Tao
e8489d83dd
go.net/html: fix the tokenizer when the underlying io.Reader returns
either (0, nil) or an (n, err) such that n > 0 && err != nil. Both
cases are valid by the io.Reader contract.
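A sketch of a read loop that honors both cases, assuming hypothetical `readAll` and `stutterReader` names (the standard library already provides io.ReadAll; this only shows the loop shape). The cap of 100 consecutive empty reads before returning io.ErrNoProgress is an assumption, similar in spirit to the limit bufio uses.

```go
package main

import (
	"fmt"
	"io"
)

// readAll keeps the n bytes of every Read before looking at its error, and
// treats (0, nil) as "no data yet" rather than end of input.
func readAll(r io.Reader) ([]byte, error) {
	var out []byte
	buf := make([]byte, 512)
	empty := 0
	for {
		n, err := r.Read(buf)
		out = append(out, buf[:n]...) // use the data even if err != nil
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		if n > 0 {
			empty = 0
			continue
		}
		empty++
		if empty >= 100 {
			return out, io.ErrNoProgress
		}
	}
}

// stutterReader is a hypothetical io.Reader that alternates (0, nil) with
// reads of at most three bytes, and returns its final bytes with io.EOF.
type stutterReader struct {
	data  []byte
	calls int
}

func (r *stutterReader) Read(p []byte) (int, error) {
	r.calls++
	if r.calls%2 == 1 {
		return 0, nil
	}
	if len(p) > 3 {
		p = p[:3]
	}
	n := copy(p, r.data)
	r.data = r.data[n:]
	if len(r.data) == 0 {
		return n, io.EOF
	}
	return n, nil
}

func main() {
	got, err := readAll(&stutterReader{data: []byte("hello, reader")})
	fmt.Printf("%q %v\n", got, err)
}
```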
R=r
CC=golang-dev
https://golang.org/cl/12513043
2013-08-07 12:55:39 +10:00
Nigel Tao
ea127e889c
go.net/html: move exp/html and exp/html/atom here to the go.net
sub-repo.
It's a straight copy, except for these modifications:
* "exp/html" and "exp/html/atom" imports were renamed, and
* the "TODO... When this package moves out of exp" comment was
deleted from atom/atom.go.
The matching change is at https://golang.org/cl/7317043
The rationale was discussed at
https://groups.google.com/d/topic/golang-nuts/Qq5hTQyPuLg/discussion
R=adg, remyoudompheng, dave
CC=golang-dev
https://golang.org/cl/7310063
2013-02-11 11:55:20 +11:00