41 Commits

Author SHA1 Message Date
Roland Shoemaker
59706cdaa8 html: impose open element stack size limit
The HTML specification contains a number of algorithms which are
quadratic in complexity by design. Instead of adding complicated
workarounds to prevent these cases from becoming extremely expensive in
pathological cases, we impose a limit of 512 to the size of the stack of
open elements. It is extremely unlikely that non-adversarial HTML
documents will ever hit this limit (but if we see cases of this, we may
want to make the limit configurable via a ParseOption).
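
A minimal sketch of the idea above, not the parser's actual code: a stack that refuses to grow past a fixed limit. The names nodeStack, push, and errStackLimit are hypothetical; only the limit of 512 comes from this message.

```go
package main

import (
	"errors"
	"fmt"
)

// maxOpenElements mirrors the limit of 512 named in the commit message.
const maxOpenElements = 512

// errStackLimit is a hypothetical error value used only in this sketch.
var errStackLimit = errors.New("open element stack size limit exceeded")

// nodeStack is a simplified stand-in for the parser's stack of open elements.
type nodeStack []string

// push appends a tag name, or returns errStackLimit if the stack is already full.
func (s *nodeStack) push(tag string) error {
	if len(*s) >= maxOpenElements {
		return errStackLimit
	}
	*s = append(*s, tag)
	return nil
}

func main() {
	var s nodeStack
	var err error
	// Simulate a pathological document that opens 1000 nested elements.
	for i := 0; i < 1000 && err == nil; i++ {
		err = s.push("div")
	}
	fmt.Println(len(s), err)
}
```

Capping the depth bounds the cost of any algorithm that walks the stack once per token, so the quadratic work has a fixed ceiling.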

Thanks to Guido Vranken and Jakub Ciolek for both independently
reporting this issue.

Fixes CVE-2025-47911
Fixes golang/go#75682

Change-Id: I890517b189af4ffbf427d25d3fde7ad7ec3509ad
Reviewed-on: https://go-review.googlesource.com/c/net/+/709876
Reviewed-by: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-07 11:18:01 -07:00
Roland Shoemaker
6ec8895aa5 html: align in row insertion mode with spec
Update inRowIM to match the HTML specification. This fixes an issue
where a specific HTML document could cause the parser to enter an
infinite loop when trying to parse a </tbody> and implied </tr> next to
each other.

Fixes CVE-2025-58190
Fixes golang/go#70179

Change-Id: Idcb133c87c7d475cc8c7eb1f1550ea21d8bdddea
Reviewed-on: https://go-review.googlesource.com/c/net/+/709875
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
2025-10-07 11:17:53 -07:00
Pukki
312450e473 html: ensure <search> tag closes <p> and update tests
This change ensures that the <search> tag correctly closes an open <p> tag when encountered during parsing.

Changes:
- Added <search> to the list of elements that should close an open <p> tag in parse.go.
- Updated the second list in parse.go to ensure consistency.
- Updated html/atom/gen.go, table.go, and table_test.go accordingly.
- Modified parse_test.go to use strings.Builder instead of bytes.Buffer.
- Updated test error messages to follow Go’s conventions.
- Fixed an accidental colon in the comment in parse.go.

Change-Id: I5835da69f6bb0e14c483e55b7ae82915ae958dc1
Reviewed-on: https://go-review.googlesource.com/c/net/+/655457
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2025-03-12 15:46:46 -07:00
Roland Shoemaker
8e66b04771 html: use strings.EqualFold instead of lowering ourselves
Instead of using strings.ToLower and == to check case insensitive
equality, just use strings.EqualFold, even when the strings are only
ASCII. This prevents us unnecessarily lowering extremely long strings,
which can be a somewhat expensive operation, even if we're only
attempting to compare equality with five characters.

Thanks to Guido Vranken for reporting this issue.

Fixes golang/go#70906
Fixes CVE-2024-45338

Change-Id: I323b919f912d60dab6a87cadfdcac3e6b54cd128
Reviewed-on: https://go-review.googlesource.com/c/net/+/637536
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Gopher Robot <gobot@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Tatiana Bradley <tatianabradley@google.com>
2024-12-18 11:24:30 -08:00
yincong
b935f7b5d7 html: avoid endless loop on error token
Fixes #70179

Change-Id: I2a0a1fc2e96f7d8eefd0abdf7ef8ba243a6e8645
GitHub-Last-Rev: a601ecd849
GitHub-Pull-Request: golang/net#226
Reviewed-on: https://go-review.googlesource.com/c/net/+/624895
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-12-18 08:05:47 -08:00
cui fliter
415cb6d518 all: fix some comments
Change-Id: Iee11c27052222f017b672c06ced9e129ee51619c
Reviewed-on: https://go-review.googlesource.com/c/net/+/465996
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-08 14:49:55 +00:00
cui fliter
0b7e1fb9d4 all: fix a few function names on comments
Change-Id: I6c853dd402d296701e38289bbc418730b068dde8
Reviewed-on: https://go-review.googlesource.com/c/net/+/441716
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
2022-10-12 13:50:44 +00:00
Nigel Tao
37e1c6afe0 html: ignore templates nested within foreign content
Fixes #46288
Fixes CVE-2021-33194

Change-Id: I2fe39702de8e9aab29965c1526e377a6f9cdf056
Reviewed-on: https://go-review.googlesource.com/c/net/+/311090
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
Trust: Roland Shoemaker <roland@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-05-20 17:08:46 +00:00
Kunpei Sakai
942e2f445f html: avoid using raw text mode even when nested noscript tags
In the "in head noscript" insertion mode, the scripting flag is disabled.
Thus, even if noscript tags are nested,
the tokenizer should not enter raw text mode.

This change makes the following test happy:
<head><noscript><noscript class="foo"><!--foo--></noscript>

Change-Id: I2620e751d8be3d313c3a2e2f992b1e21ce2dc2ee
Reviewed-on: https://go-review.googlesource.com/c/net/+/263878
Trust: Kunpei Sakai <namusyaka@gmail.com>
Trust: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2020-10-29 05:50:24 +00:00
Kunpei Sakai
8adf50f3fe html: avoid using raw text mode if there are raw tags to be ignored in select IM
This follows up on https://golang.org/cl/264977

Change-Id: I5d0e2f39173a8bbd07ca53de4df2a7e8772d4197
Reviewed-on: https://go-review.googlesource.com/c/net/+/265960
Trust: Kunpei Sakai <namusyaka@gmail.com>
Trust: Nigel Tao <nigeltao@golang.org>
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2020-10-29 05:33:32 +00:00
Kunpei Sakai
e7e4b65ae6 html: improve coding style
Change-Id: I05c0ccbad41f5512f8096b0d15991d7d6b5d726e
Reviewed-on: https://go-review.googlesource.com/c/net/+/209398
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-12-07 00:06:13 +00:00
Kunpei Sakai
51f093181b html: update adoption agency algorithm
See: https://html.spec.whatwg.org/multipage/parsing.html#adoption-agency-algorithm

This follows up on golang.org/cl/205617

Change-Id: I45862eb81ed421b327e216254169355e63698716
Reviewed-on: https://go-review.googlesource.com/c/net/+/210317
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-12-07 00:05:07 +00:00
Kunpei Sakai
1ddd1de85c html: implement generic raw text element parsing algorithm
See: https://html.spec.whatwg.org/multipage/parsing.html#parsing-elements-that-contain-only-text

This follows up on golang.org/cl/205617

Change-Id: Id99054bc25e9ea90bb3f03b15c14c13573520997
Reviewed-on: https://go-review.googlesource.com/c/net/+/210318
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-12-06 10:30:17 +00:00
Kunpei Sakai
afd1edf42a html: drop <isindex> and <command> specific handlings
This commit also adds remaining tests to follow up on golang.org/cl/205617

Change-Id: I8b155f9f605c6a0eb8745c32f5e785f5b4bc1c7e
Reviewed-on: https://go-review.googlesource.com/c/net/+/208937
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-12-06 10:28:45 +00:00
Kunpei Sakai
ffdde10578 html: implement adjusted current node and make parser support foreign fragment
This follows up on golang.org/cl/205617

Change-Id: Id94a4fcef6a604936c404f75999ba37321b6c2c0
Reviewed-on: https://go-review.googlesource.com/c/net/+/206121
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-11-25 08:49:36 +00:00
Kunpei Sakai
b954d4e333 html: add Main support
This follows up on golang.org/cl/205617

Change-Id: Ic4a232c40a69bcd3ba35abdd36bce933f35248ea
Reviewed-on: https://go-review.googlesource.com/c/net/+/206117
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-11-24 23:23:54 +00:00
Kunpei Sakai
6f6bbb1828 html: add Dialog support
Change-Id: I16afe71ca444afb03526f94e6743a587cd82a8d4
Reviewed-on: https://go-review.googlesource.com/c/net/+/205618
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-11-08 04:52:52 +00:00
Kunpei Sakai
9ce7a6920f html: implement ParseWithOptions and ParseFragmentWithOptions
This commit introduces ParseOption, a type for configuring the parser,
and implements two functions that accept it.
It also introduces ParseOptionEnableScripting, which
sets the scripting flag.
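
The shape described above resembles the functional options pattern. The following is a self-contained sketch of that pattern only; parser, parseOption, enableScripting, and newParser are hypothetical stand-ins, and the default value of the scripting flag is an assumption made for the example.

```go
package main

import "fmt"

// parser is a hypothetical, stripped-down parser state for this sketch.
type parser struct {
	scripting bool
}

// parseOption configures a parser; it plays the role this message gives to ParseOption.
type parseOption func(*parser)

// enableScripting returns an option that sets the scripting flag.
func enableScripting(enable bool) parseOption {
	return func(p *parser) { p.scripting = enable }
}

// newParser applies the options in order on top of the defaults.
func newParser(opts ...parseOption) *parser {
	p := &parser{scripting: true} // assumed default for this sketch
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	fmt.Println(newParser().scripting)
	fmt.Println(newParser(enableScripting(false)).scripting)
}
```

Taking options as variadic function values lets new settings be added later without changing the signature of the parse functions.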

Fixes golang/go#16318

Change-Id: Ie7fd7d8ce286e22e7f57182fc2ce353bce578db6
Reviewed-on: https://go-review.googlesource.com/c/net/+/174157
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-05-01 00:44:15 +00:00
Tom Anthony
ce75fb3bc6 html: Add missing condition to 'in cell' insertion mode, required by spec
In section 12.2.6.4.15 of the spec, there is a condition that the current node is a td or th element, which is not implemented. This can lead to a panic when the open elements stack is popped whilst empty, as outlined in golang/go#30600. This commit implements that check.

Fixes golang/go#30600

Change-Id: I4837815e2edce21b58a985a100d93d146bf71e24
GitHub-Last-Rev: 79084c5a84
GitHub-Pull-Request: golang/net#41
Reviewed-on: https://go-review.googlesource.com/c/net/+/172377
Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-24 02:45:59 +00:00
Kunpei Sakai
574d568418 html: add "in head noscript" im support
Section 12.2.6.4.5 of the spec defines the "in head noscript" insertion mode.
However, this package's parser does not implement that insertion mode,
because the scripting=false case is not currently considered.

This commit adds a test and support for the "in head noscript"
insertion mode. This change has no effect on the actual behavior.

Updates golang/go#16318

Change-Id: I9314c3342bea27fa2acf2fa7d980a127ee0fbf91
Reviewed-on: https://go-review.googlesource.com/c/net/+/172557
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-04-24 02:42:50 +00:00
Mikio Hara
a33f666f30 html: gofmt -w -s
Change-Id: I2da52ff2afbf0417dbe6c08105fafeb168e936ee
Reviewed-on: https://go-review.googlesource.com/c/net/+/169358
Run-TryBot: Mikio Hara <mikioh.public.networking@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2019-03-26 08:36:53 +00:00
Tom Anthony
e3b2ff56ed html: fix parsing where nested tags of unknown types inadvertently close one another
The existing implementation behaves differently from all major browsers in the case where a self-closing element of an unknown tag type is the child of another element of an unknown tag type. The issue appears to be that nested tags of differing unknown types all have an atom value of 0, so `inBodyEndTagOther` incorrectly matches them to one another.
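
A small sketch of the failure mode, using hypothetical types rather than the package's own: when every unknown tag shares atom 0, an atom-only comparison cannot tell them apart, so the comparison has to fall back to the tag name.

```go
package main

import "fmt"

// element is a hypothetical simplification of a parsed node: known tags
// carry a non-zero atom, while all unknown tags share atom 0 and differ
// only by name.
type element struct {
	atom int
	name string
}

// sameTag compares atoms first and falls back to names when the atom is 0.
func sameTag(a, b element) bool {
	if a.atom != b.atom {
		return false
	}
	return a.atom != 0 || a.name == b.name
}

func main() {
	foo := element{atom: 0, name: "foo"}
	bar := element{atom: 0, name: "bar"}

	// An atom-only comparison conflates the two unknown tags.
	fmt.Println(foo.atom == bar.atom)

	// Falling back to the name keeps them distinct.
	fmt.Println(sameTag(foo, bar))
	fmt.Println(sameTag(foo, element{atom: 0, name: "foo"}))
}
```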

Fixes golang/go#30961

Change-Id: I62b0aa49c027c8432df7d077ffba135201b3b786
GitHub-Last-Rev: fb25181f9a
GitHub-Pull-Request: golang/net#37
Reviewed-on: https://go-review.googlesource.com/c/net/+/168638
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-03-24 22:39:53 +00:00
Kunpei Sakai
3a22650c66 html: remove unnecessary break
The ancestor doesn't always match with the first.

Change-Id: I0edcbffab7e19ba1731e849021ffbb7428ec48d7
Reviewed-on: https://go-review.googlesource.com/c/161857
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-02-13 06:11:40 +00:00
Kunpei Sakai
d26f9f9a57 html: update inSelectIM and inSelectInTableIM for the latest spec
Fixes golang/go#27842

Change-Id: I06eb3c0c18be3566bd30a29fca5f3f7e6791d2cc
Reviewed-on: https://go-review.googlesource.com/c/137275
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2019-01-25 09:10:13 +00:00
Kunpei Sakai
f5e5bdd778 html: remove unnecessary namespace checking
Change-Id: I03ebb4369389262b842001e18d0594fd71b19f44
Reviewed-on: https://go-review.googlesource.com/c/138797
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-10-03 01:32:48 +00:00
Kunpei Sakai
cf3bd585ca html: don't set im if <template> doesn't have HTML namespace
If <template> elements are nested and a <template> node is not in the HTML namespace,
the parser could not continue to parse the document correctly.
With this patch, a <template> in the math namespace is skipped when
resetting the insertion mode.

Fixes golang/go#27702

Change-Id: I6eacdb98fe29eb3c61781afca5bc4d83e72ba4ed
Reviewed-on: https://go-review.googlesource.com/136875
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-09-25 07:13:36 +00:00
Kunpei Sakai
2f5d238892 html: avoid panic even if unconsidered <isindex> and <template> combination
The <isindex> element has been removed from the spec, so the
<template> handling does not cover it.
To avoid a panic, this commit adds code that ignores the combination as a workaround.

Fixes golang/go#27704

Change-Id: I847391389285df2fc0eb6a795f8c93b481cdebac
Reviewed-on: https://go-review.googlesource.com/136575
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-09-21 00:03:56 +00:00
Nigel Tao
161cd47e91 html: add more comments to Parse and ParseFragment
They implement the HTML5 parsing algorithm, which is very complicated.

Fixes golang/go#26973

Change-Id: I83a5753ab00fe84f73797fcecd309306d9f24819
Reviewed-on: https://go-review.googlesource.com/133695
Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-09-06 23:31:01 +00:00
Kunpei Sakai
8a410e7b63 html: fix wrong comparison in foster parenting algorithm
Fixes golang/go#23071

Change-Id: I383e13bfd87e32ffb775dff54c46b66b090e5017
Reviewed-on: https://go-review.googlesource.com/131475
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-08-26 01:23:51 +00:00
Kunpei Sakai
faa378e6db html: handle end-of-file cases correctly
Updates golang/go#23071

Change-Id: I02a61109b5738759a9ee3e448981778de7d0ff62
Reviewed-on: https://go-review.googlesource.com/130795
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-08-24 04:51:31 +00:00
Kunpei Sakai
aaf6012214 html: remove special procedure for <template> in frameset im
See more details: https://bugs.chromium.org/p/chromium/issues/detail?id=829668

Updates golang/go#23071

Change-Id: Ib9c963269f814c3f21d3784754729df57dcc8f90
Reviewed-on: https://go-review.googlesource.com/123776
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-08-16 10:28:01 +00:00
Kunpei Sakai
c394268923 html: don't ignore token when current token is not <template>
Updates golang/go#23071

Change-Id: I36b0ee58f61b7de25730e0fb082eeb7ef2787594
Reviewed-on: https://go-review.googlesource.com/123920
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-08-11 02:16:10 +00:00
Kunpei Sakai
32a936f463 html: don't ignore the token if the current node is form
See: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody

Fixes golang/go#25703
Updates golang/go#23071

Change-Id: I09db4c2d07a242cb45c3e37b499c609809dd0b83
Reviewed-on: https://go-review.googlesource.com/120658
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-07-06 05:13:57 +00:00
Kunpei Sakai
d41e817464 html: handle rb and rtc elements
Updates golang/go#23071

Change-Id: Ifef79e077801422eb273af3e5a541c85c63bfce4
Reviewed-on: https://go-review.googlesource.com/107575
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-04-18 06:21:11 +00:00
Kunpei Sakai
8d16fa6dc9 html: avoid invalid nil pointer access
Updates golang/go#23071

Change-Id: I73d7302c5bde4441aa824093fdcce52e8bb51e31
Reviewed-on: https://go-review.googlesource.com/107379
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-04-17 00:37:50 +00:00
namusyaka
500e7a4f95 html: add "in template" insertion mode support
See:
https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-intemplate

Updates golang/go#23071

Change-Id: I36529b7cf5d2adf159ed5c471fba9f67890b7eb9
Reviewed-on: https://go-review.googlesource.com/94838
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-04-15 21:43:07 +00:00
namusyaka
2e7f24ace3 html: update section numbers
Updates golang/go#23071

See https://html.spec.whatwg.org/multipage/

Change-Id: I1bde6e07ae9270ba7b320474b9bec8ec09a79f16
Reviewed-on: https://go-review.googlesource.com/94355
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2018-02-16 11:01:04 +00:00
Dmitry Savintsev
3d87fd621c x/net/html: Sync the html parser and atom with the current whatwg spec
The current documentation as well as set of atoms and attributes has
gotten slightly out of sync with the current state of the WHATWG
html5 specification. The change adds and removes several of the atoms
and attributes, updates the documentation (such as steps numbering in
inBodyEndTagFormatting) and modifies the spec URLs to https://

Change-Id: I6dfa52785858c1521301b20b1e585e19a08b1e98
Reviewed-on: https://go-review.googlesource.com/6173
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-03 04:37:39 +00:00
Andrew Gerrand
fbe893ddcd go.net: use golang.org/x/... import paths
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/167030043
2014-11-10 09:04:43 +11:00
Frederick Kelly Mayle III
5755bc4e75 go.net/html: Fix comment handling for "in select" insertion mode
LGTM=andybalholm, nigeltao
R=golang-codereviews, gobot, nigeltao, andybalholm
CC=golang-codereviews
https://golang.org/cl/93680045
2014-06-12 11:53:57 +10:00
Nigel Tao
ea127e889c go.net/html: move exp/html and exp/html/atom here to the go.net sub-repo.

It's a straight copy, except for these modifications:
* "exp/html" and "exp/html/atom" imports were renamed, and
* the "TODO... When this package moves out of exp" comment was
  deleted from atom/atom.go.

The matching change is at https://golang.org/cl/7317043

The rationale was discussed at
https://groups.google.com/d/topic/golang-nuts/Qq5hTQyPuLg/discussion

R=adg, remyoudompheng, dave
CC=golang-dev
https://golang.org/cl/7310063
2013-02-11 11:55:20 +11:00