Since CL 391014, cmd/compile now requires the -p flag to be set the
build system. This CL changes it to initialize LocalPkg.Path to the
provided path, rather than relying on writing out `"".` into object
files and expecting cmd/link to substitute them.
However, this actually involved a rather long tail of fixes. Many have
already been submitted, but a few notable ones that have to land
simultaneously with changing LocalPkg:
1. When compiling package runtime, there are really two "runtime"
packages: types.LocalPkg (the source package itself) and
ir.Pkgs.Runtime (the compiler's internal representation, for synthetic
references). Previously, these ended up creating separate link
symbols (`"".xxx` and `runtime.xxx`, respectively), but now they both
end up as `runtime.xxx`, which causes lsym collisions (notably
inittask and funcsyms).
2. test/codegen tests need to be updated to expect symbols to be named
`command-line-arguments.xxx` rather than `"".foo`.
3. The issue20014 test case is sensitive to the sort order of field
tracking symbols. In particular, the local package now sorts to its
natural place in the list, rather than to the front.
Thanks to David Chase for helping track down all of the fixes needed
for this CL.
Updates #51734.
Change-Id: Iba3041cf7ad967d18c6e17922fa06ba11798b565
Reviewed-on: https://go-review.googlesource.com/c/go/+/393715
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
math/bits.Add64 and math/bits.Sub64 now lower and optimize
directly in SSA form.
The optimization of carry chains focuses around eliding
XER<->GPR transfers of the CA bit when used exclusively as an
input to a single carry operations, or when the CA value is
known.
This also adds support for handling XER spills in the assembler
which could happen if carry chains contain inter-dependencies
on each other (which seems very unlikely with practical usage),
or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA).
With PPC64 Add64/Sub64 lowering into SSA and this patch, the net
performance difference in crypto/elliptic benchmarks on P9/ppc64le
are:
name old time/op new time/op delta
ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34%
ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14%
ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14%
ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27%
ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17%
ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56%
ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86%
ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32%
MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63%
MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06%
MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42%
MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87%
MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93%
MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03%
MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81%
MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67%
Note, P256 uses an asm implementation, thus, little variation is expected.
Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71
Reviewed-on: https://go-review.googlesource.com/c/go/+/346870
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
It is hit ~70k times building go.
This make the go binary, 0.04% smaller.
I didn't included benchmarks because this is just constant foldings
and is hard to mesure objectively.
For example, this enable rewriting things like:
if x == 20 {
return x + 30 + z
}
Into:
if x == 20 {
return 50 + z
}
It's not just fixing programer's code,
the ssa generator generate code like this sometimes.
Change-Id: I0861f342b27f7227b5f1c34d8267fa0057b1bbbc
GitHub-Last-Rev: 4c2f9b5216
GitHub-Pull-Request: golang/go#52669
Reviewed-on: https://go-review.googlesource.com/c/go/+/403735
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
In the load tests, we only want to test the assembly produced by
the load operations. If we use the global variable sink, it will produce
one load operation and one store operation(assign to sink).
For example:
func load_be64(b []byte) uint64 {
sink64 = binary.BigEndian.Uint64(b)
}
If we compile this function with GOAMD64=v3, it may produce MOVBEQload
and MOVQstore or MOVQload and MOVBEQstore, but we only want MOVBEQload.
Discovered when developing CL 395474.
Same for the store tests.
Change-Id: I65c3c742f1eff657c3a0d2dd103f51140ae8079e
Reviewed-on: https://go-review.googlesource.com/c/go/+/397875
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
The SHRX/SHLX instruction can take any general register as the shift count operand, and can read source from memory. This CL introduces some operators to combine load and shift to one instruction.
For #47120
Change-Id: I13b48f53c7d30067a72eb2c8382242045dead36a
Reviewed-on: https://go-review.googlesource.com/c/go/+/385174
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
LZCNT is similar to BSR, but BSR(x) is undefined when x == 0, so using
LZCNT can avoid a special case for zero input. Except that case,
LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x).
And according to https://www.agner.org/optimize/instruction_tables.pdf,
LZCNT instructions are much faster than BSR on AMD CPU.
name old time/op new time/op delta
LeadingZeros-8 0.91ns ± 1% 0.80ns ± 7% -11.68% (p=0.000 n=9+9)
LeadingZeros8-8 0.98ns ±15% 0.91ns ± 1% -7.34% (p=0.000 n=9+9)
LeadingZeros16-8 0.94ns ± 3% 0.92ns ± 2% -2.36% (p=0.001 n=10+10)
LeadingZeros32-8 0.89ns ± 1% 0.78ns ± 2% -12.49% (p=0.000 n=10+10)
LeadingZeros64-8 0.92ns ± 1% 0.78ns ± 1% -14.48% (p=0.000 n=10+10)
Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a
Reviewed-on: https://go-review.googlesource.com/c/go/+/396794
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently we only include static entries in the hint for sizing
the map when allocating a map for a map literal. Change that to
include all entries.
This will be an overallocation if the dynamic entries in the map have
equal keys, but equal keys in map literals are rare, and at worst we
waste a bit of space.
Fixes#43020
Change-Id: I232f82f15316bdf4ea6d657d25a0b094b77884ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/383634
Run-TryBot: Keith Randall <khr@golang.org>
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
The code generated when storing eight bytes loaded from memory in big
endian introduced two successive byte swaps that did not actually
modified the data.
The new rules match this specific pattern both for amd64 and for arm64,
eliminating the double swap.
Fixes#41684
Change-Id: Icb6dc20b68e4393cef4fe6a07b33aba0d18c3ff3
Reviewed-on: https://go-review.googlesource.com/c/go/+/320073
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
This CL adds late expanded memequal(x, const, sz) inlining for 2, 4, 8
bytes size. This PoC is using the same method as CL 248404.
This optimization fires about 100 times in Go compiler (1675 occurrences
reduced to 1574, so -6%).
Also, added unit-tests to codegen/comparisions.go file.
Updates #37275
Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da
Reviewed-on: https://go-review.googlesource.com/c/go/+/328291
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Trust: David Chase <drchase@google.com>
In case of amd64 the compiler issues checks if extensions are
available on a platform. With GOAMD64 microarchitecture levels
provided, some of the checks could be eliminated.
Change-Id: If15c178bcae273b2ce7d3673415cb8849292e087
Reviewed-on: https://go-review.googlesource.com/c/go/+/352010
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.
In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check if the shift amount
exceeds 64 and if so returns zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.
Likewise the arithmetic right shift rules produce sequences that
check if the shift amount exceeds 64 and if so, ensures that the
lower six bits of the shift are all ones. When the shift amount
is a constant we can precompute the shift value.
Arithmetic right shift sequences like:
117fc: 00100513 li a0,1
11800: 04053593 sltiu a1,a0,64
11804: fff58593 addi a1,a1,-1
11808: 0015e593 ori a1,a1,1
1180c: 40b45433 sra s0,s0,a1
Are now a single srai instruction:
117fc: 40145413 srai s0,s0,0x1
Likewise for logical left shift (and logical right shift):
1d560: 01100413 li s0,17
1d564: 04043413 sltiu s0,s0,64
1d568: 40800433 neg s0,s0
1d56c: 01131493 slli s1,t1,0x11
1d570: 0084f433 and s0,s1,s0
Which are now a single slli (or srli) instruction:
1d120: 01131413 slli s0,t1,0x11
This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist in at least other parts of runtime and math/bits.
Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Also, add the FABSS and FABSD pseudo instructions to the assembler.
The compiler could use FSGNJX[SD] directly but there doesn't seem
to be much advantage to doing so and the pseudo instructions are
easier to understand.
Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d
Reviewed-on: https://go-review.googlesource.com/c/go/+/348989
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
In some rewrite rules for arm64 bitfield optimizations, the
bitfield lsb value and the bitfield width value are related
to datasize, some of them use datasize directly to check the
bitfield lsb value is valid, to get the bitfiled width value,
but some of them call isARM64BFMask() and arm64BFWidth()
functions. In order to be consistent, this patch changes them
all to use datasize.
Besides, this patch sorts the codegen test cases.
Run the "toolstash-check -all" command and find one inconsistent code
is as the following.
new: src/math/fma.go:104 BEQ 247
master: src/math/fma.go:104 BEQ 248
The above inconsistence is due to this patch changing the range of the
field lsb value in "UBFIZ" optimization rules from "lc+(32|16|8)<64" to
"lc<64", so that the following code is generated as "UBFIZ". The logical
of changed code is still correct.
The code of src/math/fma.go:160:
const uvinf = 0x7FF0000000000000
func FMA(a, b uint32) float64 {
ps := a+b
return Float64frombits(uint64(ps)<<63 | uvinf)
}
The new assembly code:
TEXT "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
MOVWU "".a(FP), R0
MOVWU "".b+4(FP), R1
ADD R1, R0, R0
UBFIZ $63, R0, $1, R0
ORR $9218868437227405312, R0, R0
MOVD R0, "".~r2+8(FP)
RET (R30)
The master assembly code:
TEXT "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
MOVWU "".a(FP), R0
MOVWU "".b+4(FP), R1
ADD R1, R0, R0
MOVWU R0, R0
LSL $63, R0, R0
ORR $9218868437227405312, R0, R0
MOVD R0, "".~r2+8(FP)
RET (R30)
Change-Id: I9061104adfdfd3384d0525327ae1e5c8b0df5c35
Reviewed-on: https://go-review.googlesource.com/c/go/+/265038
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This test is not executed by default (see #48247) and does not
actually pass. It was added in CL 346689. The code generation
changes made in that CL only change how instructions are assembled,
they do not actually affect the output of the compiler. This test
is unfortunately therefore invalid and will never pass.
Updates #48247.
Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f
Reviewed-on: https://go-review.googlesource.com/c/go/+/348390
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
The UBFX and SBFX already zero/sign extend the result. Further
zero/sign extensions are thus unnecessary as long as they leave
the top bits unaltered. This patch absorbs zero/sign extensions
into UBFX/SBFX.
Add the related test cases.
Change-Id: I7c4516c8b52d677f77bf3aaedab87c4a28056ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/265039
Trust: fannie zhang <Fannie.Zhang@arm.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions.
Argument order is:
OP RS1, RS2, RS3, RD
Also, add support for the FMA intrinsic to the compiler. Automatic
FMA matching is left to a future CL.
Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/293030
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>