13 Commits

Author SHA1 Message Date
Andrew Balholm
4109fccea4 html: handle '<' before a tag
As pointed out at
https://groups.google.com/forum/#!topic/golang-nuts/LJozHIXAAJY,
`<<p>html</p>` was parsed as `&lt;&lt;p&gt;html</p>`.
There was no test case for this. Chrome parses it as `&lt;<p>html</p>`,
and that seems to be correct. We were missing the
"Reconsume the current input character" step at
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state

LGTM=nigeltao
R=golang-codereviews, gobot, nigeltao
CC=golang-codereviews, nigeltao
https://golang.org/cl/96060044
2014-05-12 16:42:14 +10:00
Robert Griesemer
a6927df230 go.net: fix various typos
LGTM=adonovan
R=adonovan
CC=golang-codereviews, golang-dev
https://golang.org/cl/97950043
2014-05-02 14:50:26 -07:00
Michael Piatek
4698117464 go.net/html: Expose data read from the input reader but not yet tokenized in Tokenizer.
This allows clients to efficiently reconstruct the original input in the case of ErrBufferExceeded. TestMaxBufferReconstruction now properly verifies this.

R=bradfitz
CC=golang-codereviews
https://golang.org/cl/47770043
2014-01-06 10:51:23 -08:00
Michael Piatek
384e4d292e html: limit buffering during tokenization.
This is optional. By default, buffering is unlimited.

Fixes golang/go#7053

R=bradfitz
CC=golang-codereviews
https://golang.org/cl/43190044
2014-01-03 13:16:55 -08:00
Michael Piatek
480e7b06ec go.net/html: Tokenizer.Raw returns the original input when tokenizer errors occur.
Two tweaks enable this:
1) Updating the raw and data span pointers when Tokenizer.Next is called, even
if an error has occurred. This prevents duplicate data from being returned by
Raw in the common case of an EOF.

2) Treating '</>' as an empty comment token to expose the raw text as a
tokenization event. (This matches the semantics of other non-token events,
e.g., '</ >' is treated as '<!-- -->'.)

Fixes golang/go#7029.

R=golang-codereviews, r, bradfitz
CC=golang-codereviews
https://golang.org/cl/46370043
2014-01-02 10:51:00 -08:00
Andrew Balholm
3f04d1ffd7 go.net/html/charset: add NewReader
NewReader is a convenience function for finding the encoding of
an io.Reader and making a UTF-8 version of that Reader.

R=nigeltao
CC=golang-dev
https://golang.org/cl/43510043
2013-12-19 17:30:38 +11:00
Andrew Balholm
74213743f3 go.net/html/charset: implement the encoding sniffing algorithm
R=nigeltao
CC=golang-dev
https://golang.org/cl/31220043
2013-12-13 16:04:21 +11:00
Andrew Balholm
7eb0b7e953 go.net/html/charset: encoding names
Lookup now returns the canonical name as well as the Encoding.

This will make it easier for users to discover what encoding they
actually have as a return value from functions in this package.
They will also be able to store the name for re-use.

R=nigeltao, mpvl
CC=golang-dev
https://golang.org/cl/30090043
2013-11-23 10:13:36 +11:00
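
The idea behind commit 7eb0b7e953, returning a canonical name for whatever label the caller supplied, can be sketched as follows. This is an illustration under stated assumptions: `lookupName` and `canonicalNames` are hypothetical names, the table holds only a handful of labels from the WHATWG Encoding Standard, and the sketch returns just the name, whereas the commit describes Lookup as returning the Encoding as well.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalNames is a tiny, illustrative subset of the label-to-name table
// in the WHATWG Encoding Standard. The real table is much larger.
var canonicalNames = map[string]string{
	"utf8":       "utf-8",
	"utf-8":      "utf-8",
	"latin1":     "windows-1252",
	"iso-8859-1": "windows-1252",
	"ascii":      "windows-1252",
}

// lookupName is a hypothetical helper: it normalizes a label (trimming ASCII
// whitespace and lowercasing) and reports the canonical encoding name.
func lookupName(label string) (name string, ok bool) {
	label = strings.ToLower(strings.Trim(label, "\t\n\r\f "))
	name, ok = canonicalNames[label]
	return name, ok
}

func main() {
	name, ok := lookupName(" Latin1 ")
	fmt.Println(name, ok)
}
```

Because several labels map to one name, a caller that stores the returned name rather than the original label gets a stable key for later re-use.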
Andrew Balholm
e2719b3103 go.net/html/charset: new package
Implement retrieving encodings by name, according to the names listed
at http://encoding.spec.whatwg.org/#encodings

This is the first step toward implementing the encoding detection
algorithm.

R=nigeltao
CC=golang-dev
https://golang.org/cl/27110043
2013-11-19 21:51:02 +11:00
Nigel Tao
e8489d83dd go.net/html: fix the tokenizer when the underlying io.Reader returns
either (0, nil) or an (n, err) such that n > 0 && err != nil. Both
cases are valid by the io.Reader contract.

R=r
CC=golang-dev
https://golang.org/cl/12513043
2013-08-07 12:55:39 +10:00
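
The two io.Reader cases named in commit e8489d83dd can be handled with a read loop like the one below. This is a minimal sketch, not the tokenizer's actual code: `readAll` and `quirkyReader` are hypothetical names, and the limit of 100 consecutive empty reads is an assumption chosen for illustration. The loop consumes the `n` bytes before inspecting `err`, and treats `(0, nil)` as "nothing happened" rather than as end of input.

```go
package main

import (
	"fmt"
	"io"
)

// readAll is a hypothetical helper that drains r while tolerating both
// (0, nil) reads and reads that return data together with an error.
func readAll(r io.Reader) ([]byte, error) {
	var out []byte
	buf := make([]byte, 4)
	emptyReads := 0
	for {
		n, err := r.Read(buf)
		// Always use the n bytes first: they are valid even if err != nil.
		out = append(out, buf[:n]...)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		if n == 0 {
			// (0, nil) is allowed; retry, but give up eventually.
			emptyReads++
			if emptyReads >= 100 { // assumed limit, for illustration
				return out, io.ErrNoProgress
			}
			continue
		}
		emptyReads = 0
	}
}

// quirkyReader exercises both cases: its first call returns (0, nil), and
// its last chunk is returned together with io.EOF.
type quirkyReader struct {
	data  []byte
	calls int
}

func (q *quirkyReader) Read(p []byte) (int, error) {
	q.calls++
	if q.calls == 1 {
		return 0, nil
	}
	n := copy(p, q.data)
	q.data = q.data[n:]
	if len(q.data) == 0 {
		return n, io.EOF
	}
	return n, nil
}

func main() {
	data, err := readAll(&quirkyReader{data: []byte("hello")})
	fmt.Printf("%q %v\n", data, err)
}
```

A loop that checked `err` before appending `buf[:n]` would drop the final chunk here, and one that treated `n == 0` as end of input would return nothing at all.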
Andrew Gerrand
46c4a49ebb go.net/html: put escaping tests in escape_test.go
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/11094043
2013-07-10 17:32:24 +10:00
Shenghou Ma
3651a440a7 go.net/html: don't use Go tip io.ByteWriter
So that Go 1.0 users can also use this package.
Fixes golang/go#4931.

R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/7424044
2013-02-28 16:17:17 +08:00
Nigel Tao
ea127e889c go.net/html: move exp/html and exp/html/atom here to the go.net
sub-repo.

It's a straight copy, except for these modifications:
* "exp/html" and "exp/html/atom" imports were renamed, and
* the "TODO... When this package moves out of exp" comment was
  deleted from atom/atom.go.

The matching change is at https://golang.org/cl/7317043

The rationale was discussed at
https://groups.google.com/d/topic/golang-nuts/Qq5hTQyPuLg/discussion

R=adg, remyoudompheng, dave
CC=golang-dev
https://golang.org/cl/7310063
2013-02-11 11:55:20 +11:00