Andy Balholm
f1d3149ecb
html/charset: replace EUC-KR test
...
The old test for EUC-KR was copied from the first web page that I could
find that was encoded in EUC-KR; the new one is the first line of
golang.org/x/text/internal/testtext.Korean.
Change-Id: I3de076256c935088a06138056cde216190766a6d
Reviewed-on: https://go-review.googlesource.com/18063
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
2016-01-08 17:00:32 +00:00
Marcel van Lohuizen
68a055e15f
html/charset: verify correct UTF-8 behavior
...
Change-Id: I4083c38468981128c3d74310cd02335c35eafa5d
Reviewed-on: https://go-review.googlesource.com/17966
Reviewed-by: Andy Balholm <andy@balholm.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-12-19 10:34:51 +00:00
Marcel van Lohuizen
9b9d6d8d11
html/charset: handle unsupported code points for encoding
...
Change-Id: I11ffc61623496fae6b32e678c91f7609d71aefe5
Reviewed-on: https://go-review.googlesource.com/17961
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Andy Balholm <andy@balholm.com>
2015-12-17 16:33:40 +00:00
Marcel van Lohuizen
d28a91ad26
html/charset: use x/text/encoding/htmlindex
...
Saves duplication of work.
Change-Id: I33c715f33cb6cacd8522e480dc96ae71475c5b3c
Reviewed-on: https://go-review.googlesource.com/17805
Reviewed-by: Andy Balholm <andy@balholm.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-12-17 11:26:21 +00:00
Marcel van Lohuizen
72b0708b72
html/charmap: update table with latest data
...
Change-Id: I7ae395999a3e61afa3a6ee15d076edae73d8a83b
Reviewed-on: https://go-review.googlesource.com/17800
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Andy Balholm <andy@balholm.com>
2015-12-14 17:39:18 +00:00
Ian Lance Taylor
05bc443e7e
html: remove license references from benchmark test data
...
The license references puzzle programs that grep for licenses.
Fixes golang/go#13573.
Change-Id: I601fbc6ba2b189b476af1082c48fb02cd72f59d8
Reviewed-on: https://go-review.googlesource.com/17714
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-12-11 03:42:21 +00:00
Dmitri Shuralyov
edab5dc413
html: Use existing standard library interface internally.
...
Now that Go 1.1 is out, commit 3651a440a7
can be reverted.
Change-Id: I7ac8478aafaa5067630e99cec9eca59792107892
Reviewed-on: https://go-review.googlesource.com/11612
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-28 04:31:38 +00:00
Ian Lance Taylor
e0403b4e00
html/charset/testdata: update licensing info in README
...
Update the licensing info following the instructions at
http://www.w3.org/Consortium/Legal/2008/04-testsuite-copyright.html#howtouse
Fixes golang/go#10398.
Change-Id: Ib37b2a696a5287f41d4e85da4eb7ec1cbb346301
Reviewed-on: https://go-review.googlesource.com/8978
Reviewed-by: Martin Packman <martin.packman@canonical.com>
Reviewed-by: Rob Pike <r@golang.org>
2015-05-08 23:18:43 +00:00
Andy Balholm
6460565bec
x/net/html/charset: Change NewReaderByName to NewReaderLabel.
...
Change-Id: Ic4d1df0c4f7048a3e2472cca09ef9390bcfd149d
Reviewed-on: https://go-review.googlesource.com/4533
Reviewed-by: Rob Pike <r@golang.org>
2015-04-03 23:56:49 +00:00
Dmitry Savintsev
3d87fd621c
x/net/html: Sync the html parser and atom with the current whatwg spec
...
The current documentation as well as set of atoms and attributes has
gotten slightly out of sync with the current state of the WHATWG
html5 specification. The change adds and removes several of the atoms
and attributes, updates the documentation (such as steps numbering in
inBodyEndTagFormatting) and modifies the spec URLs to use https://.
Change-Id: I6dfa52785858c1521301b20b1e585e19a08b1e98
Reviewed-on: https://go-review.googlesource.com/6173
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-03-03 04:37:39 +00:00
Andy Balholm
ec18079348
x/net/html/charset: add NewReaderByName
...
This provides a CharsetReader function for xml.Decoder.
Change-Id: Id00787bbdee90d267d38c84c98a06f9e10d93336
Reviewed-on: https://go-review.googlesource.com/4420
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2015-02-10 23:47:13 +00:00
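The CharsetReader hook mentioned in ec18079348 can be sketched with the standard library alone. In the minimal example below, `charsetReader` and `decodeWord` are hypothetical names, and the hand-written ISO-8859-1 conversion is only a stand-in for the decoders that the charset package supplies; it assumes that mapping each byte to the code point of the same value is acceptable, which differs from the windows-1252 treatment that the WHATWG encoding standard gives the same label.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// charsetReader has the signature that xml.Decoder.CharsetReader expects.
// It is a hypothetical stand-in: it only understands ISO-8859-1 and maps
// every input byte to the Unicode code point with the same value.
func charsetReader(label string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(strings.TrimSpace(label)) {
	case "iso-8859-1", "latin1":
		raw, err := io.ReadAll(input)
		if err != nil {
			return nil, err
		}
		runes := make([]rune, len(raw))
		for i, b := range raw {
			runes[i] = rune(b)
		}
		// Converting []rune to string produces UTF-8.
		return strings.NewReader(string(runes)), nil
	}
	return nil, fmt.Errorf("unsupported charset: %q", label)
}

// decodeWord parses a one-element document and returns its character data.
func decodeWord(doc []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(doc))
	dec.CharsetReader = charsetReader
	var v struct {
		Text string `xml:",chardata"`
	}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	return v.Text, nil
}

func main() {
	// \xe9 is "e with acute accent" in ISO-8859-1.
	doc := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\xe9</word>")
	word, err := decodeWord(doc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(word)
}
```

Without a CharsetReader, encoding/xml rejects any document that declares a non-UTF-8 encoding, which is why a function with this signature is useful to have in the package.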
David Symonds
8aa6e209cb
net: add import comments.
...
Change-Id: Ifab0fdaec1d810d268b7c19ad30f476802203b37
2014-12-09 14:17:11 +11:00
Mikio Hara
ccf541d876
x/net/html/charset: add missing copyright
...
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/174240043
2014-11-17 10:54:40 +09:00
Mikio Hara
716c3ccf9b
x/net/html/charset: fix nacl build
...
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/177880043
2014-11-17 10:54:21 +09:00
Andrew Gerrand
fbe893ddcd
go.net: use golang.org/x/... import paths
...
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/167030043
2014-11-10 09:04:43 +11:00
Frederick Kelly Mayle III
5755bc4e75
go.net/html: Fix comment handling for "in select" insertion mode
...
LGTM=andybalholm, nigeltao
R=golang-codereviews, gobot, nigeltao, andybalholm
CC=golang-codereviews
https://golang.org/cl/93680045
2014-06-12 11:53:57 +10:00
Andrew Balholm
4109fccea4
html: handle '<' before a tag
...
As pointed out at
https://groups.google.com/forum/#!topic/golang-nuts/LJozHIXAAJY,
`<<p>html</p>` was parsed as `<<p>html</p>`.
There was no test case for this. Chrome parses it as `<<p>html</p>`,
and that seems to be correct. We were missing the
"Reconsume the current input character" step at
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
LGTM=nigeltao
R=golang-codereviews, gobot, nigeltao
CC=golang-codereviews, nigeltao
https://golang.org/cl/96060044
2014-05-12 16:42:14 +10:00
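The "reconsume" step referred to in 4109fccea4 can be illustrated with a toy splitter. This is a heavily simplified sketch, not the package's tokenizer: `tokens`, `describe`, and `isLetter` are hypothetical helpers, tags are recognised only as `<name...>` or `</name...>`, and attributes, comments, and entities are ignored. The point is the fallback branch: when the character after `<` cannot start a tag, the `<` is emitted as text and the following character is examined again as ordinary data.

```go
package main

import (
	"fmt"
	"strings"
)

// isLetter reports whether c is an ASCII letter.
func isLetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// tokens splits s into "text:..." and "tag:..." pieces.
func tokens(s string) []string {
	var out []string
	var text []byte
	flush := func() {
		if len(text) > 0 {
			out = append(out, "text:"+string(text))
			text = nil
		}
	}
	for i := 0; i < len(s); {
		if s[i] != '<' {
			text = append(text, s[i])
			i++
			continue
		}
		// Tag open state: inspect the character after '<'.
		j := i + 1
		startsTag := j < len(s) && (isLetter(s[j]) ||
			(s[j] == '/' && j+1 < len(s) && isLetter(s[j+1])))
		if startsTag {
			if end := strings.IndexByte(s[j:], '>'); end >= 0 {
				flush()
				out = append(out, "tag:"+s[i:j+end+1])
				i = j + end + 1
				continue
			}
		}
		// Anything else: emit '<' as text and reconsume s[j] as data,
		// so a second '<' still gets the chance to open a tag.
		text = append(text, '<')
		i = j
	}
	flush()
	return out
}

// describe joins the tokens for easy printing and comparison.
func describe(s string) string {
	return strings.Join(tokens(s), "|")
}

func main() {
	fmt.Println(describe("<<p>html</p>"))
}
```

If the loop skipped past the second `<` instead of reconsuming it, the `<p>` start tag would be swallowed into the text.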
Robert Griesemer
a6927df230
go.net: fix various typos
...
LGTM=adonovan
R=adonovan
CC=golang-codereviews, golang-dev
https://golang.org/cl/97950043
2014-05-02 14:50:26 -07:00
Michael Piatek
4698117464
go.net/html: Expose data read from the input reader but not yet tokenized in Tokenizer.
...
This allows clients to efficiently reconstruct the original input in the case of ErrBufferExceeded. TestMaxBufferReconstruction now properly verifies this.
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/47770043
2014-01-06 10:51:23 -08:00
Michael Piatek
384e4d292e
html: limit buffering during tokenization.
...
This is optional. By default, buffering is unlimited.
Fixes golang/go#7053
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/43190044
2014-01-03 13:16:55 -08:00
Michael Piatek
480e7b06ec
go.net/html: Tokenizer.Raw returns the original input when tokenizer errors occur.
...
Two tweaks enable this:
1) Updating the raw and data span pointers when Tokenizer.Next is called, even
if an error has occurred. This prevents duplicate data from being returned by
Raw in the common case of an EOF.
2) Treating '</>' as an empty comment token to expose the raw text as a
tokenization event. (This matches the semantics of other non-token events,
e.g., '</ >' is treated as '<!-- -->'.)
Fixes golang/go#7029.
R=golang-codereviews, r, bradfitz
CC=golang-codereviews
https://golang.org/cl/46370043
2014-01-02 10:51:00 -08:00
Andrew Balholm
3f04d1ffd7
go.net/html/charset: add NewReader
...
NewReader is a convenience function for finding the encoding of
an io.Reader and making a UTF-8 version of that Reader.
R=nigeltao
CC=golang-dev
https://golang.org/cl/43510043
2013-12-19 17:30:38 +11:00
Andrew Balholm
74213743f3
go.net/html/charset: implement the encoding sniffing algorithm
...
R=nigeltao
CC=golang-dev
https://golang.org/cl/31220043
2013-12-13 16:04:21 +11:00
Andrew Balholm
7eb0b7e953
go.net/html/charset: encoding names
...
Lookup now returns the canonical name as well as the Encoding.
This will make it easier for users to discover what encoding they
actually have as a return value from functions in this package.
They will also be able to store the name for re-use.
R=nigeltao, mpvl
CC=golang-dev
https://golang.org/cl/30090043
2013-11-23 10:13:36 +11:00
Andrew Balholm
e2719b3103
go.net/html/charset: new package
...
Implement retrieving encodings by name, according to the names listed
at http://encoding.spec.whatwg.org/#encodings
This is the first step toward implementing the encoding detection
algorithm.
R=nigeltao
CC=golang-dev
https://golang.org/cl/27110043
2013-11-19 21:51:02 +11:00
Nigel Tao
e8489d83dd
go.net/html: fix the tokenizer when the underlying io.Reader returns
...
either (0, nil) or an (n, err) such that n > 0 && err != nil. Both
cases are valid by the io.Reader contract.
R=r
CC=golang-dev
https://golang.org/cl/12513043
2013-08-07 12:55:39 +10:00
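The two io.Reader cases named in e8489d83dd can be shown with a small read loop. This is a sketch under stated assumptions rather than the tokenizer's code: `quirkyReader`, `readAll`, and `drain` are hypothetical, and the limit of 100 consecutive empty reads is an arbitrary choice for the example.

```go
package main

import (
	"fmt"
	"io"
)

// quirkyReader is a hypothetical reader that exercises both legal corner
// cases: it returns (0, nil) on every other call, and it returns the last
// bytes together with io.EOF in the same call.
type quirkyReader struct {
	data  []byte
	stall bool
}

func (r *quirkyReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	r.stall = !r.stall
	if r.stall {
		return 0, nil // legal: no progress and no error
	}
	if len(r.data) == 0 {
		return 0, io.EOF
	}
	n := copy(p, r.data)
	r.data = r.data[n:]
	if len(r.data) == 0 {
		return n, io.EOF // legal: n > 0 and err != nil together
	}
	return n, nil
}

// readAll drains r. It consumes the n bytes before looking at err, and it
// gives up only after many consecutive (0, nil) reads.
func readAll(r io.Reader) (string, error) {
	var out []byte
	buf := make([]byte, 4)
	idle := 0
	for {
		n, err := r.Read(buf)
		out = append(out, buf[:n]...) // use the data first
		if err == io.EOF {
			return string(out), nil
		}
		if err != nil {
			return string(out), err
		}
		if n == 0 {
			idle++
			if idle >= 100 {
				return string(out), io.ErrNoProgress
			}
			continue
		}
		idle = 0
	}
}

// drain runs readAll over a quirkyReader that serves s.
func drain(s string) string {
	got, err := readAll(&quirkyReader{data: []byte(s)})
	if err != nil {
		return "error: " + err.Error()
	}
	return got
}

func main() {
	fmt.Println(drain("hello, tokenizer"))
}
```

A loop that checks err before using the n bytes would drop the final chunk here, and a loop that treats (0, nil) as end of input would stop on the very first call.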
Andrew Gerrand
46c4a49ebb
go.net/html: put escaping tests in escape_test.go
...
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/11094043
2013-07-10 17:32:24 +10:00
Shenghou Ma
3651a440a7
go.net/html: don't use Go tip io.ByteWriter
...
So that Go 1.0 users can also use this package.
Fixes golang/go#4931.
R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/7424044
2013-02-28 16:17:17 +08:00
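One way to avoid depending on a standard library interface that older releases lack is to declare the needed method set locally. The sketch below illustrates that general technique and is not taken from the commit: `byteWriter`, `writer`, `writeQuoted`, and `quoted` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// byteWriter is a local copy of the one-method interface, so the code
// compiles even where the standard library does not declare it.
type byteWriter interface {
	WriteByte(c byte) error
}

// writer is the method set the hypothetical rendering helper needs.
type writer interface {
	io.Writer
	byteWriter
}

// writeQuoted writes s surrounded by double quotes.
func writeQuoted(w writer, s string) error {
	if err := w.WriteByte('"'); err != nil {
		return err
	}
	if _, err := io.WriteString(w, s); err != nil {
		return err
	}
	return w.WriteByte('"')
}

// quoted renders s into a bytes.Buffer, which satisfies writer.
func quoted(s string) string {
	var buf bytes.Buffer
	if err := writeQuoted(&buf, s); err != nil {
		return ""
	}
	return buf.String()
}

func main() {
	fmt.Println(quoted("charset"))
}
```

Because Go interfaces are satisfied structurally, any type that already has a WriteByte method works with the local declaration, and the declaration can later be swapped for the standard one without touching callers.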
Nigel Tao
ea127e889c
go.net/html: move exp/html and exp/html/atom here to the go.net
...
sub-repo.
It's a straight copy, except for these modifications:
* "exp/html" and "exp/html/atom" imports were renamed, and
* the "TODO... When this package moves out of exp" comment was
deleted from atom/atom.go.
The matching change is at https://golang.org/cl/7317043
The rationale was discussed at
https://groups.google.com/d/topic/golang-nuts/Qq5hTQyPuLg/discussion
R=adg, remyoudompheng, dave
CC=golang-dev
https://golang.org/cl/7310063
2013-02-11 11:55:20 +11:00