Michael Pratt
78e6f2a1c8
runtime: rename mapiterinit and mapiternext
...
mapiterinit allows external linkname. These users must allocate their
own iter struct for initialization by mapiterinit. Since the type is
unexported, they also must define the struct themselves. As a result,
they of course define the struct matching the old hiter definition (in
map_noswiss.go).
The old definition is smaller on 32-bit platforms. On those platforms,
mapiternext will clobber memory outside of the caller's allocation.
On all platforms, the pointer layout between the old hiter and new
maps.Iter does not match. Thus the GC may miss pointers and free
reachable objects early, or it may see non-pointers that look like heap
pointers and throw due to invalid references to free objects.
To avoid these issues, we must keep mapiterinit and mapiternext with the
old hiter definition. The most straightforward way to do this is to use
mapiterinit and mapiternext as a compatibility layer between the old and
new iter types.
The first step to that is to move normal map use off of these functions,
which is what this CL does.
Introduce new mapIterStart and mapIterNext functions that replace the
former functions everywhere in the toolchain. These have the same
behavior as the old functions.
This CL temporarily makes the old functions throw to ensure we don't
have hidden dependencies on them. We cannot remove them entirely because
GOEXPERIMENT=noswissmap still uses the old names, and internal/goobj
requires all builtins to exist regardless of GOEXPERIMENT. The next CL
will introduce the compatibility layer.
I want to avoid using linkname between runtime and reflect, as that
would also allow external linknames. So mapIterStart and mapIterNext are
duplicated in reflect, which can be done trivially, as it imports
internal/runtime/maps.
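The size hazard described above can be sketched in isolation. The two struct types below are hypothetical stand-ins, not the runtime's real hiter or maps.Iter; the only point is that a callee which initializes the larger layout would write past a caller allocation sized for the smaller one.

```go
package main

import (
	"fmt"
	"unsafe"
)

// legacyIter stands in for a caller-defined copy of an old iterator struct.
// The fields are hypothetical, not the runtime's hiter.
type legacyIter struct {
	key  unsafe.Pointer
	elem unsafe.Pointer
	idx  uint32
}

// currentIter stands in for a newer iterator that is larger and places a
// pointer where the old layout had none. Also hypothetical.
type currentIter struct {
	key   unsafe.Pointer
	elem  unsafe.Pointer
	idx   uint64
	state unsafe.Pointer
}

func legacySize() uintptr  { return unsafe.Sizeof(legacyIter{}) }
func currentSize() uintptr { return unsafe.Sizeof(currentIter{}) }

// fitsIn reports whether a callee that writes calleeSize bytes stays within
// a caller allocation of callerSize bytes.
func fitsIn(callerSize, calleeSize uintptr) bool {
	return calleeSize <= callerSize
}

func main() {
	fmt.Println("caller allocates:", legacySize(), "bytes")
	fmt.Println("callee writes:   ", currentSize(), "bytes")
	fmt.Println("write stays in bounds:", fitsIn(legacySize(), currentSize()))
}
```

In this sketch the mismatch shows up on every platform; in the commit above the size difference applies to 32-bit platforms, while the pointer layout difference applies everywhere.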
For #71408.
Change-Id: I6a6a636c6d4bd1392618c67ca648d3f061afe669
Reviewed-on: https://go-review.googlesource.com/c/go/+/643898
Auto-Submit: Michael Pratt <mpratt@google.com >
Reviewed-by: Keith Randall <khr@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@golang.org >
2025-01-28 10:54:43 -08:00
Keith Randall
c5e205e928
internal/runtime/maps: re-enable some tests
...
Re-enable tests for stack-allocated maps and fast map accessors.
Those are implemented now.
Update #54766
Change-Id: I8c019702bd9fb077b2fe3f7c78e8e9e10d2263a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/642376
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Michael Pratt <mpratt@google.com >
Auto-Submit: Keith Randall <khr@golang.org >
2025-01-14 09:55:06 -08:00
Keith Randall
44a6f817ea
cmd/compile: fix write barrier coalescing
...
We can't coalesce a non-WB store with a subsequent Move, as the
result of the store might be the source of the move.
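The ordering dependency can be illustrated with ordinary Go code. This is only a sketch of the pattern (a store that needs no write barrier, followed by a copy of a pointer-containing value that reads the stored field); the type and function names are hypothetical, and it is not claimed to reproduce the original bug.

```go
package main

import "fmt"

// node is a hypothetical type with one non-pointer and one pointer field.
type node struct {
	n    int
	next *node
}

// storeThenCopy first stores to a non-pointer field, which needs no write
// barrier, and then copies the whole struct, which contains a pointer.
// The copy reads src.n, so the store has to happen before it.
//
//go:noinline
func storeThenCopy(dst, src *node, n int) {
	src.n = n
	*dst = *src
}

func main() {
	tail := &node{n: 1}
	src := &node{next: tail}
	dst := &node{}
	storeThenCopy(dst, src, 42)
	fmt.Println(dst.n, dst.next == tail)
}
```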
There's a simple codegen test. Not sure how we might do a real test,
as all the repros I've come up with are very expensive and unreliable.
Fixes #71228
Change-Id: If18bf181a266b9b90964e2591cd2e61a7168371c
Reviewed-on: https://go-review.googlesource.com/c/go/+/642197
Reviewed-by: Keith Randall <khr@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: David Chase <drchase@google.com >
2025-01-12 22:49:39 -08:00
Youlin Feng
c4e6ab9750
cmd/compile: modify CSE to remove redundant OpLocalAddrs
...
Remove the OpLocalAddrs that are unnecessary in the CSE pass, so the
following passes like DSE and memcombine can do their work better.
Fixes #70300
Change-Id: I600025d49eeadb3ca4f092d614428399750f69bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/628075
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: David Chase <drchase@google.com >
Auto-Submit: David Chase <drchase@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@golang.org >
2024-11-22 00:12:03 +00:00
Keith Randall
f0b0109242
cmd/compile: pull multiple adds out of an unsafe.Pointer<->uintptr conversion
...
This came up in some swissmap code.
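A source-level view of the pattern in the title, as a sketch: the first function adds two offsets on the uintptr side of the conversion, and the second performs the same additions as pointer arithmetic. The helper names are hypothetical, and the exact SSA rewrite performed by the CL is not shown here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addInside performs both additions on the uintptr side of the conversion.
func addInside(p unsafe.Pointer, a, b uintptr) unsafe.Pointer {
	return unsafe.Pointer(uintptr(p) + a + b)
}

// addOutside performs the same additions as pointer arithmetic.
func addOutside(p unsafe.Pointer, a, b uintptr) unsafe.Pointer {
	return unsafe.Add(unsafe.Add(p, a), b)
}

// offsetsAgree reports whether both forms yield the same address inside a
// local buffer. a+b must be less than 16.
func offsetsAgree(a, b uintptr) bool {
	var buf [16]byte
	p := unsafe.Pointer(&buf[0])
	return addInside(p, a, b) == addOutside(p, a, b) &&
		addInside(p, a, b) == unsafe.Pointer(&buf[a+b])
}

func main() {
	fmt.Println(offsetsAgree(3, 5))
}
```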
Change-Id: I3c6705a5cafec8cb4953dfa9535cf0b45255cc83
Reviewed-on: https://go-review.googlesource.com/c/go/+/629516
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: David Chase <drchase@google.com >
2024-11-21 22:57:04 +00:00
Xiaolin Zhao
ab55465098
cmd/compile: wire up math/bits.TrailingZeros intrinsics for loong64
...
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
TrailingZeros 1.7240n ± 0% 0.8120n ± 0% -52.90% (p=0.000 n=20)
TrailingZeros8 1.0530n ± 0% 0.8015n ± 0% -23.88% (p=0.000 n=20)
TrailingZeros16 2.072n ± 0% 1.015n ± 0% -51.01% (p=0.000 n=20)
TrailingZeros32 1.7160n ± 0% 0.8122n ± 0% -52.67% (p=0.000 n=20)
TrailingZeros64 2.0060n ± 0% 0.8125n ± 0% -59.50% (p=0.000 n=20)
geomean 1.669n 0.8470n -49.25%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
TrailingZeros 2.6275n ± 0% 0.9120n ± 0% -65.29% (p=0.000 n=20)
TrailingZeros8 1.451n ± 0% 1.163n ± 0% -19.85% (p=0.000 n=20)
TrailingZeros16 3.069n ± 0% 1.201n ± 0% -60.87% (p=0.000 n=20)
TrailingZeros32 2.9060n ± 0% 0.9115n ± 0% -68.63% (p=0.000 n=20)
TrailingZeros64 2.6305n ± 0% 0.9115n ± 0% -65.35% (p=0.000 n=20)
geomean 2.456n 1.011n -58.83%
This patch is a copy of CL 479498.
Co-authored-by: WANG Xuerui <git@xen0n.name >
Change-Id: I1a5b2114a844dc0d02c8e68f41ce2443ac3b5fda
Reviewed-on: https://go-review.googlesource.com/c/go/+/624356
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@google.com >
2024-11-13 00:57:25 +00:00
Paul E. Murphy
745ec75719
cmd/compile/internal/ssa: improve carry addition rules on PPC64
...
Fold constant int16 addends for usages of math/bits.Add64(x,const,0)
on PPC64. This usage shows up in a few crypto implementations;
notably the Go wrapper for CL 626176.
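For readers unfamiliar with the idiom, a sketch of the math/bits.Add64(x, const, 0) usage the rules target: a small constant is added to the low word with a zero carry-in, and the carry-out is propagated into the high word. The constant and the function name are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// step is a hypothetical constant small enough to fit in a signed 16-bit
// immediate.
const step = 0x1234

// addConst adds step to the 128-bit value hi:lo and returns the new words.
func addConst(hi, lo uint64) (uint64, uint64) {
	lo, carry := bits.Add64(lo, step, 0) // constant addend, zero carry-in
	hi, _ = bits.Add64(hi, 0, carry)     // propagate the carry-out
	return hi, lo
}

func main() {
	hi, lo := addConst(0, ^uint64(0))
	fmt.Printf("%#x %#x\n", hi, lo)
}
```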
Change-Id: I6963163330487d04e0479b4fdac235f97bb96889
Reviewed-on: https://go-review.googlesource.com/c/go/+/625899
Reviewed-by: Cherry Mui <cherryyz@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Keith Randall <khr@golang.org >
2024-11-12 17:40:44 +00:00
Guoqi Chen
fb9b946adc
cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64
...
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64}
and make it intrinsic.
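As a reference for what the intrinsic computes, a sketch comparing math/bits.OnesCount64 with a plain bit-clearing loop. The loop and the agrees helper are illustrative only and say nothing about how VPCNT is emitted.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountLoop counts set bits by clearing the lowest set bit until the
// value is zero.
func popcountLoop(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1
		n++
	}
	return n
}

// agrees reports whether the loop and math/bits give the same count.
func agrees(x uint64) bool {
	return popcountLoop(x) == bits.OnesCount64(x)
}

func main() {
	for _, x := range []uint64{0, 1, 0xff, 0xdeadbeef, ^uint64(0)} {
		fmt.Println(x, bits.OnesCount64(x), agrees(x))
	}
}
```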
Benchmark results on Loongson 3A5000 and 3A6000 machines:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 4.413n ± 0% 1.401n ± 0% -68.25% (p=0.000 n=10)
OnesCount8 1.364n ± 0% 1.363n ± 0% ~ (p=0.130 n=10)
OnesCount16 2.112n ± 0% 1.534n ± 0% -27.37% (p=0.000 n=10)
OnesCount32 4.533n ± 0% 1.529n ± 0% -66.27% (p=0.000 n=10)
OnesCount64 4.565n ± 0% 1.531n ± 1% -66.46% (p=0.000 n=10)
geomean 3.048n 1.470n -51.78%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 3.553n ± 0% 1.201n ± 0% -66.20% (p=0.000 n=10)
OnesCount8 0.8021n ± 0% 0.8004n ± 0% -0.21% (p=0.000 n=10)
OnesCount16 1.216n ± 0% 1.000n ± 0% -17.76% (p=0.000 n=10)
OnesCount32 3.006n ± 0% 1.035n ± 0% -65.57% (p=0.000 n=10)
OnesCount64 3.503n ± 0% 1.035n ± 0% -70.45% (p=0.000 n=10)
geomean 2.053n 1.006n -51.01%
Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/620978
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Meidan Li <limeidan@loongson.cn >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
2024-11-12 00:48:04 +00:00
Xiaolin Zhao
583d750fa1
cmd/compile: wire up bits.Reverse intrinsics for loong64
...
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| CL 624576 | this CL |
| sec/op | sec/op vs base |
Reverse 2.8130n ± 0% 0.8008n ± 0% -71.53% (p=0.000 n=20)
Reverse8 0.7014n ± 0% 0.4040n ± 0% -42.40% (p=0.000 n=20)
Reverse16 1.2975n ± 0% 0.6632n ± 1% -48.89% (p=0.000 n=20)
Reverse32 2.7520n ± 0% 0.4042n ± 0% -85.31% (p=0.000 n=20)
Reverse64 2.8970n ± 0% 0.4041n ± 0% -86.05% (p=0.000 n=20)
geomean 1.828n 0.5116n -72.01%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| CL 624576 | this CL |
| sec/op | sec/op vs base |
Reverse 4.0050n ± 0% 0.8011n ± 0% -80.00% (p=0.000 n=20)
Reverse8 0.8010n ± 0% 0.5210n ± 1% -34.96% (p=0.000 n=20)
Reverse16 1.6160n ± 0% 0.6008n ± 0% -62.82% (p=0.000 n=20)
Reverse32 3.8550n ± 0% 0.5179n ± 0% -86.57% (p=0.000 n=20)
Reverse64 3.8050n ± 0% 0.5177n ± 0% -86.40% (p=0.000 n=20)
geomean 2.378n 0.5828n -75.49%
Updates #59120
This patch is a copy of CL 483656.
Co-authored-by: WANG Xuerui <git@xen0n.name >
Change-Id: I98681091763279279c8404bd0295785f13ea1c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/624276
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: David Chase <drchase@google.com >
2024-11-11 00:08:45 +00:00
Xiaolin Zhao
e6cc9d228a
cmd/compile: implement FMA codegen for loong64
...
Benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 25.930n ± 0% 2.002n ± 0% -92.28% (p=0.000 n=10)
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 32.840n ± 0% 2.002n ± 0% -93.90% (p=0.000 n=10)
Updates #59120
This patch is a copy of CL 483355.
Co-authored-by: WANG Xuerui <git@xen0n.name >
Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/625335
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Carlos Amedee <carlos@golang.org >
2024-11-08 01:05:48 +00:00
Xiaolin Zhao
d6fb0ab2c7
cmd/compile: wire up Bswap/ReverseBytes intrinsics for loong64
...
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
ReverseBytes 2.0020n ± 0% 0.4040n ± 0% -79.82% (p=0.000 n=20)
ReverseBytes16 0.8866n ± 1% 0.8007n ± 0% -9.69% (p=0.000 n=20)
ReverseBytes32 1.2195n ± 0% 0.8007n ± 0% -34.34% (p=0.000 n=20)
ReverseBytes64 2.0705n ± 0% 0.8008n ± 0% -61.32% (p=0.000 n=20)
geomean 1.455n 0.6749n -53.62%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
ReverseBytes 2.8040n ± 0% 0.5205n ± 0% -81.44% (p=0.000 n=20)
ReverseBytes16 0.7066n ± 0% 0.8011n ± 0% +13.37% (p=0.000 n=20)
ReverseBytes32 1.5500n ± 0% 0.8010n ± 0% -48.32% (p=0.000 n=20)
ReverseBytes64 2.7665n ± 0% 0.8010n ± 0% -71.05% (p=0.000 n=20)
geomean 1.707n 0.7192n -57.87%
Updates #59120
This patch is a copy of CL 483357.
Co-authored-by: WANG Xuerui <git@xen0n.name >
Change-Id: If355354cd031533df91991fcc3392e5a6c314295
Reviewed-on: https://go-review.googlesource.com/c/go/+/624576
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Carlos Amedee <carlos@golang.org >
2024-11-06 03:12:50 +00:00
Xiaolin Zhao
d98c51809d
cmd/compile: wire up math/bits.Len intrinsics for loong64
...
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.
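The fold mentioned above is plain algebra: c-(-(x-d)) = c+(x-d) = x+(c-d), and it also holds under wraparound integer arithmetic. A sketch checking it in Go follows; subFromLen64 is assumed to have roughly the shape of the codegen test case, which is not quoted in the message.

```go
package main

import (
	"fmt"
	"math/bits"
)

// unfolded spells out c - (-(x - d)) literally.
func unfolded(x, c, d int64) int64 { return c - (-(x - d)) }

// folded is the simplified form x + (c - d).
func folded(x, c, d int64) int64 { return x + (c - d) }

// subFromLen64 is a hypothetical stand-in for the codegen test case:
// 64 minus the bit length equals the number of leading zeros.
func subFromLen64(x uint64) int { return 64 - bits.Len64(x) }

func main() {
	fmt.Println(unfolded(10, 64, 3) == folded(10, 64, 3))
	fmt.Println(subFromLen64(1) == bits.LeadingZeros64(1))
}
```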
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20)
LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20)
LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20)
LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20)
LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20)
geomean 3.229n 1.571n -51.35%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20)
LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20)
LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20)
LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20)
LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20)
geomean 2.159n 1.256n -41.81%
Updates #59120
This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name >
Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Carlos Amedee <carlos@golang.org >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
2024-11-06 00:40:40 +00:00
Xiaolin Zhao
5f88755f43
cmd/compile: add loong64-specific inlining for runtime.memmove
...
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Memmove/0 0.8004n ± 0% 0.4002n ± 0% -50.00% (p=0.000 n=20)
Memmove/1 2.494n ± 0% 2.136n ± 0% -14.35% (p=0.000 n=20)
Memmove/2 2.802n ± 0% 2.512n ± 0% -10.35% (p=0.000 n=20)
Memmove/3 2.802n ± 0% 2.497n ± 0% -10.92% (p=0.000 n=20)
Memmove/4 3.202n ± 0% 2.808n ± 0% -12.30% (p=0.000 n=20)
Memmove/5 2.821n ± 0% 2.658n ± 0% -5.76% (p=0.000 n=20)
Memmove/6 2.819n ± 0% 2.657n ± 0% -5.73% (p=0.000 n=20)
Memmove/7 2.820n ± 0% 2.654n ± 0% -5.87% (p=0.000 n=20)
Memmove/8 3.202n ± 0% 2.814n ± 0% -12.12% (p=0.000 n=20)
Memmove/9 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/10 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/11 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/12 3.202n ± 0% 3.010n ± 0% -6.01% (p=0.000 n=20)
Memmove/13 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/14 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/15 3.202n ± 0% 3.010n ± 0% -6.01% (p=0.000 n=20)
Memmove/16 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/32 3.602n ± 0% 3.603n ± 0% +0.03% (p=0.000 n=20)
Memmove/64 4.202n ± 0% 4.204n ± 0% +0.05% (p=0.000 n=20)
Memmove/128 8.005n ± 0% 8.007n ± 0% +0.02% (p=0.000 n=20)
Memmove/256 11.21n ± 0% 10.81n ± 0% -3.57% (p=0.000 n=20)
Memmove/512 17.65n ± 0% 17.96n ± 0% +1.73% (p=0.000 n=20)
Memmove/1024 30.48n ± 0% 30.46n ± 0% -0.07% (p=0.000 n=20)
Memmove/2048 56.43n ± 0% 56.30n ± 0% -0.24% (p=0.000 n=20)
Memmove/4096 107.7n ± 0% 107.6n ± 0% -0.09% (p=0.000 n=20)
MemmoveOverlap/32 4.002n ± 0% 4.003n ± 0% +0.02% (p=0.002 n=20)
MemmoveOverlap/64 4.603n ± 0% 4.603n ± 0% ~ (p=0.286 n=20)
MemmoveOverlap/128 8.704n ± 0% 8.699n ± 0% ~ (p=0.180 n=20)
MemmoveOverlap/256 12.01n ± 0% 11.76n ± 0% -2.08% (p=0.000 n=20)
MemmoveOverlap/512 18.42n ± 0% 18.36n ± 0% -0.33% (p=0.000 n=20)
MemmoveOverlap/1024 31.23n ± 0% 31.16n ± 0% -0.21% (p=0.000 n=20)
MemmoveOverlap/2048 57.42n ± 0% 56.82n ± 0% -1.04% (p=0.000 n=20)
MemmoveOverlap/4096 108.5n ± 0% 108.0n ± 0% -0.46% (p=0.000 n=20)
MemmoveUnalignedDst/0 2.804n ± 0% 2.447n ± 0% -12.70% (p=0.000 n=20)
MemmoveUnalignedDst/1 2.802n ± 0% 2.491n ± 0% -11.12% (p=0.000 n=20)
MemmoveUnalignedDst/2 3.202n ± 0% 2.808n ± 0% -12.29% (p=0.000 n=20)
MemmoveUnalignedDst/3 3.202n ± 0% 2.814n ± 0% -12.12% (p=0.000 n=20)
MemmoveUnalignedDst/4 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/5 3.202n ± 0% 3.203n ± 0% +0.03% (p=0.014 n=20)
MemmoveUnalignedDst/6 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/7 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/8 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/9 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/10 3.602n ± 0% 3.602n ± 0% ~ (p=0.091 n=20)
MemmoveUnalignedDst/11 3.602n ± 0% 3.602n ± 0% ~ (p=0.613 n=20)
MemmoveUnalignedDst/12 3.602n ± 0% 3.602n ± 0% ~ (p=0.165 n=20)
MemmoveUnalignedDst/13 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/14 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/15 3.602n ± 0% 3.602n ± 0% 0.00% (p=0.027 n=20)
MemmoveUnalignedDst/16 3.602n ± 0% 3.602n ± 0% ~ (p=0.661 n=20)
MemmoveUnalignedDst/32 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/64 6.804n ± 0% 6.804n ± 0% ~ (p=0.204 n=20)
MemmoveUnalignedDst/128 12.61n ± 0% 12.61n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedDst/256 16.33n ± 2% 16.32n ± 2% ~ (p=0.839 n=20)
MemmoveUnalignedDst/512 25.61n ± 0% 24.71n ± 0% -3.51% (p=0.000 n=20)
MemmoveUnalignedDst/1024 42.81n ± 0% 42.82n ± 0% ~ (p=0.973 n=20)
MemmoveUnalignedDst/2048 74.86n ± 0% 76.03n ± 0% +1.56% (p=0.000 n=20)
MemmoveUnalignedDst/4096 152.0n ± 11% 152.0n ± 0% 0.00% (p=0.013 n=20)
MemmoveUnalignedDstOverlap/32 5.319n ± 0% 5.558n ± 1% +4.50% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/64 8.006n ± 0% 8.025n ± 0% +0.24% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/128 9.631n ± 0% 9.601n ± 0% -0.31% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/256 13.79n ± 2% 13.58n ± 1% ~ (p=0.234 n=20)
MemmoveUnalignedDstOverlap/512 21.38n ± 0% 21.30n ± 0% -0.37% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/1024 41.71n ± 0% 41.70n ± 0% ~ (p=0.887 n=20)
MemmoveUnalignedDstOverlap/2048 81.63n ± 0% 81.61n ± 0% ~ (p=0.481 n=20)
MemmoveUnalignedDstOverlap/4096 162.6n ± 0% 162.6n ± 0% ~ (p=0.171 n=20)
MemmoveUnalignedSrc/0 2.808n ± 0% 2.482n ± 0% -11.61% (p=0.000 n=20)
MemmoveUnalignedSrc/1 2.804n ± 0% 2.577n ± 0% -8.08% (p=0.000 n=20)
MemmoveUnalignedSrc/2 3.202n ± 0% 2.806n ± 0% -12.37% (p=0.000 n=20)
MemmoveUnalignedSrc/3 3.202n ± 0% 2.808n ± 0% -12.30% (p=0.000 n=20)
MemmoveUnalignedSrc/4 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/5 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/6 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/7 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/8 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/9 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/10 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/11 3.602n ± 0% 3.602n ± 0% ~ (p=0.746 n=20)
MemmoveUnalignedSrc/12 3.602n ± 0% 3.602n ± 0% ~ (p=0.407 n=20)
MemmoveUnalignedSrc/13 3.603n ± 0% 3.602n ± 0% -0.03% (p=0.001 n=20)
MemmoveUnalignedSrc/14 3.603n ± 0% 3.602n ± 0% -0.01% (p=0.013 n=20)
MemmoveUnalignedSrc/15 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/16 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/32 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/64 4.803n ± 0% 4.803n ± 0% 0.00% (p=0.008 n=20)
MemmoveUnalignedSrc/128 8.405n ± 0% 8.405n ± 0% 0.00% (p=0.003 n=20)
MemmoveUnalignedSrc/256 12.04n ± 3% 12.20n ± 2% ~ (p=0.151 n=20)
MemmoveUnalignedSrc/512 19.11n ± 0% 19.10n ± 3% ~ (p=0.621 n=20)
MemmoveUnalignedSrc/1024 35.62n ± 0% 35.62n ± 0% ~ (p=0.407 n=20)
MemmoveUnalignedSrc/2048 68.04n ± 0% 68.35n ± 0% +0.46% (p=0.000 n=20)
MemmoveUnalignedSrc/4096 133.2n ± 1% 133.3n ± 0% ~ (p=0.131 n=20)
MemmoveUnalignedSrcDst/f_16_0 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_0 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_1 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_1 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_4 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_4 4.202n ± 0% 4.202n ± 0% ~ (p=0.661 n=20)
MemmoveUnalignedSrcDst/f_16_7 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_7 4.203n ± 0% 4.202n ± 0% -0.02% (p=0.008 n=20)
MemmoveUnalignedSrcDst/f_64_0 6.103n ± 0% 6.100n ± 0% ~ (p=0.595 n=20)
MemmoveUnalignedSrcDst/b_64_0 6.103n ± 0% 6.102n ± 0% ~ (p=0.973 n=20)
MemmoveUnalignedSrcDst/f_64_1 7.419n ± 0% 7.226n ± 0% -2.59% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_1 6.745n ± 0% 6.941n ± 0% +2.89% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_4 7.420n ± 0% 7.223n ± 0% -2.65% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_4 6.753n ± 0% 6.941n ± 0% +2.79% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_7 7.423n ± 0% 7.204n ± 0% -2.96% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_7 6.750n ± 0% 6.941n ± 0% +2.83% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_0 12.96n ± 0% 12.99n ± 0% +0.27% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_256_0 12.91n ± 0% 12.94n ± 0% +0.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_1 17.21n ± 0% 17.21n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_1 17.61n ± 0% 17.61n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_4 16.21n ± 0% 16.21n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_4 16.41n ± 0% 16.41n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_7 14.12n ± 0% 14.10n ± 0% ~ (p=0.307 n=20)
MemmoveUnalignedSrcDst/b_256_7 14.81n ± 0% 14.81n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_0 109.3n ± 0% 109.4n ± 0% +0.09% (p=0.004 n=20)
MemmoveUnalignedSrcDst/b_4096_0 109.6n ± 0% 109.6n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_4096_1 113.5n ± 0% 113.5n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_4096_1 113.7n ± 0% 113.7n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_4 112.3n ± 0% 112.3n ± 0% ~ (p=0.763 n=20)
MemmoveUnalignedSrcDst/b_4096_4 112.6n ± 0% 112.9n ± 1% +0.31% (p=0.032 n=20)
MemmoveUnalignedSrcDst/f_4096_7 110.6n ± 0% 110.6n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/b_4096_7 111.1n ± 0% 111.1n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_65536_0 4.801µ ± 0% 4.818µ ± 0% +0.34% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_0 5.027µ ± 0% 5.036µ ± 0% +0.19% (p=0.007 n=20)
MemmoveUnalignedSrcDst/f_65536_1 4.815µ ± 0% 4.729µ ± 0% -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_1 4.659µ ± 0% 4.737µ ± 1% +1.69% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_4 4.807µ ± 0% 4.721µ ± 0% -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_4 4.659µ ± 0% 4.601µ ± 0% -1.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_7 4.868µ ± 0% 4.759µ ± 0% -2.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_7 4.665µ ± 0% 4.709µ ± 0% +0.93% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/32 6.804n ± 0% 6.810n ± 0% +0.09% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/64 10.41n ± 0% 10.42n ± 0% +0.10% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/128 11.59n ± 0% 11.58n ± 0% ~ (p=0.414 n=20)
MemmoveUnalignedSrcOverlap/256 14.22n ± 0% 14.29n ± 0% +0.46% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/512 23.11n ± 0% 23.04n ± 0% -0.28% (p=0.001 n=20)
MemmoveUnalignedSrcOverlap/1024 41.44n ± 0% 41.47n ± 0% ~ (p=0.693 n=20)
MemmoveUnalignedSrcOverlap/2048 81.25n ± 0% 81.25n ± 0% ~ (p=0.405 n=20)
MemmoveUnalignedSrcOverlap/4096 166.1n ± 0% 166.1n ± 0% ~ (p=0.451 n=20)
geomean 13.02n 12.69n -2.51%
¹ all samples are equal
Change-Id: I712adc7670f6ae360714ec5a770d00d76c8700ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/618815
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Carlos Amedee <carlos@golang.org >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
2024-11-05 00:44:11 +00:00
Xiaolin Zhao
aef81a7551
cmd/compile: add rules to optimize go codes to constant 0 on loong64
...
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
BinaryTree17 7.735 ± 1% 7.716 ± 1% -0.23% (p=0.041 n=15)
Fannkuch11 2.645 ± 0% 2.646 ± 0% +0.05% (p=0.013 n=15)
FmtFprintfEmpty 35.87n ± 0% 35.89n ± 0% +0.06% (p=0.000 n=15)
FmtFprintfString 59.54n ± 0% 59.47n ± 0% ~ (p=0.213 n=15)
FmtFprintfInt 62.23n ± 0% 62.06n ± 0% ~ (p=0.212 n=15)
FmtFprintfIntInt 98.16n ± 0% 97.90n ± 0% -0.26% (p=0.000 n=15)
FmtFprintfPrefixedInt 117.0n ± 0% 116.7n ± 0% -0.26% (p=0.000 n=15)
FmtFprintfFloat 204.6n ± 0% 204.2n ± 0% -0.20% (p=0.000 n=15)
FmtManyArgs 456.3n ± 0% 455.4n ± 0% -0.20% (p=0.000 n=15)
GobDecode 7.210m ± 0% 7.156m ± 1% -0.75% (p=0.000 n=15)
GobEncode 8.143m ± 1% 8.177m ± 1% ~ (p=0.806 n=15)
Gzip 280.2m ± 0% 279.7m ± 0% -0.19% (p=0.005 n=15)
Gunzip 32.71m ± 0% 32.65m ± 0% -0.19% (p=0.000 n=15)
HTTPClientServer 53.76µ ± 0% 53.65µ ± 0% ~ (p=0.083 n=15)
JSONEncode 9.297m ± 0% 9.295m ± 0% ~ (p=0.806 n=15)
JSONDecode 46.97m ± 1% 47.07m ± 1% ~ (p=0.683 n=15)
Mandelbrot200 4.602m ± 0% 4.600m ± 0% -0.05% (p=0.001 n=15)
GoParse 4.682m ± 0% 4.670m ± 1% -0.25% (p=0.001 n=15)
RegexpMatchEasy0_32 59.80n ± 0% 59.63n ± 0% -0.28% (p=0.000 n=15)
RegexpMatchEasy0_1K 458.3n ± 0% 457.3n ± 0% -0.22% (p=0.001 n=15)
RegexpMatchEasy1_32 59.39n ± 0% 59.23n ± 0% -0.27% (p=0.000 n=15)
RegexpMatchEasy1_1K 557.9n ± 0% 556.6n ± 0% -0.23% (p=0.001 n=15)
RegexpMatchMedium_32 803.6n ± 0% 801.8n ± 0% -0.22% (p=0.001 n=15)
RegexpMatchMedium_1K 27.32µ ± 0% 27.26µ ± 0% -0.21% (p=0.000 n=15)
RegexpMatchHard_32 1.385µ ± 0% 1.382µ ± 0% -0.22% (p=0.000 n=15)
RegexpMatchHard_1K 40.93µ ± 0% 40.83µ ± 0% -0.24% (p=0.000 n=15)
Revcomp 474.8m ± 0% 474.3m ± 0% ~ (p=0.250 n=15)
Template 77.41m ± 1% 76.63m ± 1% -1.01% (p=0.023 n=15)
TimeParse 271.1n ± 0% 271.2n ± 0% +0.04% (p=0.022 n=15)
TimeFormat 290.0n ± 0% 289.8n ± 0% ~ (p=0.118 n=15)
geomean 51.73µ 51.64µ -0.18%
Change-Id: I45a1e6c85bb3cea0f62766ec932432803e9af10a
Reviewed-on: https://go-review.googlesource.com/c/go/+/619315
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn >
Reviewed-by: Meidan Li <limeidan@loongson.cn >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
Reviewed-by: Michael Pratt <mpratt@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Carlos Amedee <carlos@golang.org >
2024-10-29 01:17:54 +00:00
Youlin Feng
bb07aa644b
cmd/compile: add shift optimization test
...
For #69635
Change-Id: Id5696dc9724c3b3afcd7b60a6994f98c5309eb0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/621755
Reviewed-by: Keith Randall <khr@golang.org >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Michael Pratt <mpratt@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Auto-Submit: Michael Pratt <mpratt@google.com >
2024-10-25 15:35:29 +00:00
Youlin Feng
711552e98a
cmd/compile: optimize type switch for a single runtime known type with a case var
...
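The message has no body, so the following is only a guess at the shape of code involved. It assumes "runtime known type" refers to a case whose type is a type parameter, so the concrete type is only known at run time, and that "case var" refers to the variable bound by the switch. The function name is hypothetical.

```go
package main

import "fmt"

// asT is a type switch with a single case whose type is the type parameter
// T, and which binds the case variable x.
func asT[T any](v any) (T, bool) {
	switch x := v.(type) {
	case T:
		return x, true
	}
	var zero T
	return zero, false
}

func main() {
	n, ok := asT[int](5)
	fmt.Println(n, ok)
	s, ok := asT[string](5)
	fmt.Printf("%q %v\n", s, ok)
}
```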
Change-Id: I03ba70076d6dd3c0b9624d14699b7dd91a3c0e9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/618476
Reviewed-by: Keith Randall <khr@golang.org >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Keith Randall <khr@google.com >
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com >
2024-10-25 02:56:11 +00:00
Paul E. Murphy
1846dd5a31
cmd/compile/internal/ssa: fix PPC64 shift codegen regression
...
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.
Add new rules to fix the test regressions, and clean up tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they no longer test CLRLSLDI rules.
Fixes #70003
Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Keith Randall <khr@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
2024-10-24 17:32:18 +00:00
Xiaolin Zhao
91d07ac71c
cmd/compile: inline constant sized memclrNoHeapPointers calls on loong64
...
Testing on loong64 shows that inlining has a negative effect for
constant sizes greater than 512.
So only enable inlining for constant sizes less than 512.
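A sketch of source code that clears a region whose size is a compile-time constant. The array size of 64 is a hypothetical value below the 512 threshold mentioned above; whether a given clear is actually lowered to memclrNoHeapPointers and then inlined depends on the compiler and is not asserted here.

```go
package main

import "fmt"

// size is a hypothetical constant below the 512 threshold.
const size = 64

// clearFixed zeroes a fixed-size byte array using the range-clear idiom.
func clearFixed(b *[size]byte) {
	for i := range b {
		b[i] = 0
	}
}

// allZero reports whether every byte of b is zero.
func allZero(b *[size]byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

func main() {
	var buf [size]byte
	for i := range buf {
		buf[i] = byte(i + 1)
	}
	clearFixed(&buf)
	fmt.Println(allZero(&buf))
}
```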
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MemclrKnownSize1 2.4070n ± 0% 0.4004n ± 0% -83.37% (p=0.000 n=20)
MemclrKnownSize2 2.1365n ± 0% 0.4004n ± 0% -81.26% (p=0.000 n=20)
MemclrKnownSize4 2.4445n ± 0% 0.4004n ± 0% -83.62% (p=0.000 n=20)
MemclrKnownSize8 2.4200n ± 0% 0.4004n ± 0% -83.45% (p=0.000 n=20)
MemclrKnownSize16 2.8030n ± 0% 0.8007n ± 0% -71.43% (p=0.000 n=20)
MemclrKnownSize32 2.803n ± 0% 1.602n ± 0% -42.85% (p=0.000 n=20)
MemclrKnownSize64 3.250n ± 0% 2.402n ± 0% -26.08% (p=0.000 n=20)
MemclrKnownSize112 6.006n ± 0% 2.819n ± 0% -53.06% (p=0.000 n=20)
MemclrKnownSize128 6.006n ± 0% 3.240n ± 0% -46.05% (p=0.000 n=20)
MemclrKnownSize192 6.807n ± 0% 5.205n ± 0% -23.53% (p=0.000 n=20)
MemclrKnownSize248 7.608n ± 0% 6.301n ± 0% -17.19% (p=0.000 n=20)
MemclrKnownSize256 7.608n ± 0% 6.707n ± 0% -11.84% (p=0.000 n=20)
MemclrKnownSize512 13.61n ± 0% 13.61n ± 0% ~ (p=0.374 n=20)
MemclrKnownSize1024 26.43n ± 0% 26.43n ± 0% ~ (p=0.826 n=20)
MemclrKnownSize4096 103.3n ± 0% 103.3n ± 0% ~ (p=1.000 n=20)
MemclrKnownSize512KiB 26.29µ ± 0% 26.29µ ± 0% -0.00% (p=0.012 n=20)
geomean 10.05n 5.006n -50.18%
| bench.old | bench.new |
| B/s | B/s vs base |
MemclrKnownSize1 396.2Mi ± 0% 2381.9Mi ± 0% +501.21% (p=0.000 n=20)
MemclrKnownSize2 892.8Mi ± 0% 4764.0Mi ± 0% +433.59% (p=0.000 n=20)
MemclrKnownSize4 1.524Gi ± 0% 9.305Gi ± 0% +510.56% (p=0.000 n=20)
MemclrKnownSize8 3.079Gi ± 0% 18.609Gi ± 0% +504.42% (p=0.000 n=20)
MemclrKnownSize16 5.316Gi ± 0% 18.609Gi ± 0% +250.05% (p=0.000 n=20)
MemclrKnownSize32 10.63Gi ± 0% 18.61Gi ± 0% +75.00% (p=0.000 n=20)
MemclrKnownSize64 18.34Gi ± 0% 24.81Gi ± 0% +35.27% (p=0.000 n=20)
MemclrKnownSize112 17.37Gi ± 0% 37.01Gi ± 0% +113.08% (p=0.000 n=20)
MemclrKnownSize128 19.85Gi ± 0% 36.80Gi ± 0% +85.39% (p=0.000 n=20)
MemclrKnownSize192 26.27Gi ± 0% 34.35Gi ± 0% +30.77% (p=0.000 n=20)
MemclrKnownSize248 30.36Gi ± 0% 36.66Gi ± 0% +20.75% (p=0.000 n=20)
MemclrKnownSize256 31.34Gi ± 0% 35.55Gi ± 0% +13.43% (p=0.000 n=20)
MemclrKnownSize512 35.02Gi ± 0% 35.03Gi ± 0% +0.00% (p=0.030 n=20)
MemclrKnownSize1024 36.09Gi ± 0% 36.09Gi ± 0% ~ (p=0.101 n=20)
MemclrKnownSize4096 36.93Gi ± 0% 36.93Gi ± 0% +0.00% (p=0.003 n=20)
MemclrKnownSize512KiB 18.57Gi ± 0% 18.57Gi ± 0% +0.00% (p=0.041 n=20)
geomean 10.13Gi 20.33Gi +100.72%
Change-Id: I460a56f7ccc9f820ca2c1934c1c517b9614809ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/621355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
Reviewed-by: Michael Pratt <mpratt@google.com >
2024-10-24 08:55:31 +00:00
Keith Randall
74163c895a
cmd/compile: use STP/LDP around morestack on arm64
...
The spill/restore code around morestack is almost never executed, so
we should make it as small as possible. Using 2-register loads/stores
makes sense here. Also, the offsets from SP are pretty small so the
offset almost always fits in the (smaller than a normal load/store)
offset field of the instruction.
Makes cmd/go 0.6% smaller.
Change-Id: I8845283c1b269a259498153924428f6173bda293
Reviewed-on: https://go-review.googlesource.com/c/go/+/621556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
2024-10-22 16:23:12 +00:00
Xiaolin Zhao
ef3e1dae2f
cmd/compile: optimize loong64 with register indexed load/store
...
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
BinaryTree17 7.766 ± 1% 7.640 ± 2% -1.62% (p=0.000 n=20)
Fannkuch11 2.649 ± 0% 2.358 ± 0% -10.96% (p=0.000 n=20)
FmtFprintfEmpty 35.89n ± 0% 35.87n ± 0% -0.06% (p=0.000 n=20)
FmtFprintfString 59.44n ± 0% 57.25n ± 2% -3.68% (p=0.000 n=20)
FmtFprintfInt 62.07n ± 0% 60.04n ± 0% -3.27% (p=0.000 n=20)
FmtFprintfIntInt 97.90n ± 0% 97.26n ± 0% -0.65% (p=0.000 n=20)
FmtFprintfPrefixedInt 116.7n ± 0% 119.2n ± 0% +2.14% (p=0.000 n=20)
FmtFprintfFloat 204.5n ± 0% 201.9n ± 0% -1.30% (p=0.000 n=20)
FmtManyArgs 455.9n ± 0% 466.8n ± 0% +2.39% (p=0.000 n=20)
GobDecode 7.458m ± 1% 7.138m ± 1% -4.28% (p=0.000 n=20)
GobEncode 8.573m ± 1% 8.473m ± 1% ~ (p=0.091 n=20)
Gzip 280.2m ± 0% 284.9m ± 0% +1.67% (p=0.000 n=20)
Gunzip 32.68m ± 0% 32.67m ± 0% ~ (p=0.211 n=20)
HTTPClientServer 54.22µ ± 0% 53.24µ ± 0% -1.80% (p=0.000 n=20)
JSONEncode 9.427m ± 1% 9.152m ± 0% -2.92% (p=0.000 n=20)
JSONDecode 47.08m ± 1% 46.85m ± 1% -0.49% (p=0.007 n=20)
Mandelbrot200 4.601m ± 0% 4.605m ± 0% +0.08% (p=0.000 n=20)
GoParse 4.776m ± 0% 4.655m ± 1% -2.52% (p=0.000 n=20)
RegexpMatchEasy0_32 59.77n ± 0% 57.59n ± 0% -3.66% (p=0.000 n=20)
RegexpMatchEasy0_1K 458.1n ± 0% 458.8n ± 0% +0.15% (p=0.000 n=20)
RegexpMatchEasy1_32 59.36n ± 0% 59.24n ± 0% -0.20% (p=0.000 n=20)
RegexpMatchEasy1_1K 557.7n ± 0% 560.2n ± 0% +0.46% (p=0.000 n=20)
RegexpMatchMedium_32 803.1n ± 0% 772.8n ± 0% -3.77% (p=0.000 n=20)
RegexpMatchMedium_1K 27.29µ ± 0% 25.88µ ± 0% -5.18% (p=0.000 n=20)
RegexpMatchHard_32 1.385µ ± 0% 1.304µ ± 0% -5.85% (p=0.000 n=20)
RegexpMatchHard_1K 40.92µ ± 0% 39.58µ ± 0% -3.27% (p=0.000 n=20)
Revcomp 474.3m ± 0% 410.0m ± 0% -13.56% (p=0.000 n=20)
Template 78.16m ± 0% 76.32m ± 1% -2.36% (p=0.000 n=20)
TimeParse 271.8n ± 0% 272.1n ± 0% +0.11% (p=0.000 n=20)
TimeFormat 292.3n ± 0% 294.8n ± 0% +0.86% (p=0.000 n=20)
geomean 51.98µ 50.82µ -2.22%
Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956
Reviewed-on: https://go-review.googlesource.com/c/go/+/615918
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
Reviewed-by: Michael Knyszek <mknyszek@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
2024-10-17 07:32:25 +00:00
Cuong Manh Le
7e2487cf65
cmd/compile: avoid dynamic type when possible
...
If the expression type is a single compile-time known type, use that
type instead of the dynamic one, so the later passes of the compiler
can skip unnecessary runtime calls.
Thanks Youlin Feng for writing the original test case.
Change-Id: I3f65ab90f041474a9731338a82136c1d394c1773
Reviewed-on: https://go-review.googlesource.com/c/go/+/616975
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@golang.org >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
2024-10-07 19:12:01 +00:00
Xiaolin Zhao
f243cf6016
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on loong64
...
Use float <-> int register moves without conversion instead of stores
and loads to move float <-> int values like arm64 and mips64.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 15.98n ± 0% 15.94n ± 0% -0.25% (p=0.000 n=20)
Acosh 27.75n ± 0% 25.56n ± 0% -7.89% (p=0.000 n=20)
Asin 15.85n ± 0% 15.76n ± 0% -0.57% (p=0.000 n=20)
Asinh 39.79n ± 0% 37.69n ± 0% -5.28% (p=0.000 n=20)
Atan 7.261n ± 0% 7.242n ± 0% -0.27% (p=0.000 n=20)
Atanh 28.30n ± 0% 27.62n ± 0% -2.40% (p=0.000 n=20)
Atan2 15.85n ± 0% 15.75n ± 0% -0.63% (p=0.000 n=20)
Cbrt 27.02n ± 0% 21.08n ± 0% -21.98% (p=0.000 n=20)
Ceil 2.830n ± 1% 2.896n ± 1% +2.31% (p=0.000 n=20)
Copysign 0.8022n ± 0% 0.8004n ± 0% -0.22% (p=0.000 n=20)
Cos 11.64n ± 0% 11.61n ± 0% -0.26% (p=0.000 n=20)
Cosh 35.98n ± 0% 33.44n ± 0% -7.05% (p=0.000 n=20)
Erf 10.09n ± 0% 10.08n ± 0% -0.10% (p=0.000 n=20)
Erfc 11.40n ± 0% 11.35n ± 0% -0.44% (p=0.000 n=20)
Erfinv 12.31n ± 0% 12.29n ± 0% -0.16% (p=0.000 n=20)
Erfcinv 12.16n ± 0% 12.17n ± 0% +0.08% (p=0.000 n=20)
Exp 28.41n ± 0% 26.44n ± 0% -6.95% (p=0.000 n=20)
ExpGo 28.68n ± 0% 27.07n ± 0% -5.60% (p=0.000 n=20)
Expm1 17.21n ± 0% 16.75n ± 0% -2.67% (p=0.000 n=20)
Exp2 24.71n ± 0% 23.01n ± 0% -6.88% (p=0.000 n=20)
Exp2Go 25.17n ± 0% 23.91n ± 0% -4.99% (p=0.000 n=20)
Abs 0.8004n ± 0% 0.8004n ± 0% ~ (p=0.224 n=20)
Dim 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
Floor 2.848n ± 0% 2.859n ± 0% +0.39% (p=0.000 n=20)
Max 3.074n ± 0% 3.071n ± 0% ~ (p=0.481 n=20)
Min 3.179n ± 0% 3.176n ± 0% -0.09% (p=0.003 n=20)
Mod 49.62n ± 0% 44.82n ± 0% -9.67% (p=0.000 n=20)
Frexp 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=20)
Gamma 18.01n ± 0% 17.61n ± 0% -2.22% (p=0.000 n=20)
Hypot 7.204n ± 0% 7.604n ± 0% +5.55% (p=0.000 n=20)
HypotGo 7.204n ± 0% 7.604n ± 0% +5.56% (p=0.000 n=20)
Ilogb 6.003n ± 0% 6.003n ± 0% ~ (p=0.407 n=20)
J0 76.43n ± 0% 76.24n ± 0% -0.25% (p=0.000 n=20)
J1 76.44n ± 0% 76.44n ± 0% ~ (p=1.000 n=20)
Jn 168.2n ± 0% 168.5n ± 0% +0.18% (p=0.000 n=20)
Ldexp 8.804n ± 0% 7.604n ± 0% -13.63% (p=0.000 n=20)
Lgamma 19.01n ± 0% 19.01n ± 0% ~ (p=0.695 n=20)
Log 19.38n ± 0% 19.12n ± 0% -1.34% (p=0.000 n=20)
Logb 6.003n ± 0% 6.003n ± 0% ~ (p=1.000 n=20)
Log1p 18.57n ± 0% 16.72n ± 0% -9.96% (p=0.000 n=20)
Log10 20.67n ± 0% 20.45n ± 0% -1.06% (p=0.000 n=20)
Log2 9.605n ± 0% 8.804n ± 0% -8.34% (p=0.000 n=20)
Modf 4.402n ± 0% 4.402n ± 0% ~ (p=1.000 n=20)
Nextafter32 7.204n ± 0% 5.603n ± 0% -22.22% (p=0.000 n=20)
Nextafter64 6.803n ± 0% 6.003n ± 0% -11.76% (p=0.000 n=20)
PowInt 39.62n ± 0% 37.22n ± 0% -6.06% (p=0.000 n=20)
PowFrac 120.9n ± 0% 108.9n ± 0% -9.93% (p=0.000 n=20)
Pow10Pos 1.601n ± 0% 1.601n ± 0% ~ (p=0.487 n=20)
Pow10Neg 2.675n ± 0% 2.675n ± 0% ~ (p=1.000 n=20)
Round 3.018n ± 0% 2.401n ± 0% -20.46% (p=0.000 n=20)
RoundToEven 3.822n ± 0% 3.001n ± 0% -21.48% (p=0.000 n=20)
Remainder 45.62n ± 0% 42.42n ± 0% -7.01% (p=0.000 n=20)
Signbit 0.9075n ± 0% 0.8004n ± 0% -11.81% (p=0.000 n=20)
Sin 12.65n ± 0% 12.65n ± 0% ~ (p=0.503 n=20)
Sincos 14.81n ± 0% 14.60n ± 0% -1.42% (p=0.000 n=20)
Sinh 36.75n ± 0% 35.11n ± 0% -4.46% (p=0.000 n=20)
SqrtIndirect 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
SqrtLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtIndirectLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtGoLatency 52.85n ± 0% 40.82n ± 0% -22.76% (p=0.000 n=20)
SqrtPrime 887.4n ± 0% 887.4n ± 0% ~ (p=0.751 n=20)
Tan 13.95n ± 0% 13.97n ± 0% +0.18% (p=0.000 n=20)
Tanh 36.79n ± 0% 34.89n ± 0% -5.16% (p=0.000 n=20)
Trunc 2.849n ± 0% 2.861n ± 0% +0.42% (p=0.000 n=20)
Y0 77.44n ± 0% 77.64n ± 0% +0.26% (p=0.000 n=20)
Y1 74.41n ± 0% 74.33n ± 0% -0.11% (p=0.000 n=20)
Yn 158.7n ± 0% 159.0n ± 0% +0.19% (p=0.000 n=20)
Float64bits 0.8774n ± 0% 0.4002n ± 0% -54.39% (p=0.000 n=20)
Float64frombits 0.8042n ± 0% 0.4002n ± 0% -50.24% (p=0.000 n=20)
Float32bits 1.1230n ± 0% 0.5336n ± 0% -52.48% (p=0.000 n=20)
Float32frombits 1.0670n ± 0% 0.8004n ± 0% -24.99% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.605 n=20)
geomean 10.87n 10.10n -7.15%
¹ all samples are equal
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 33.10n ± 0% 31.95n ± 2% -3.46% (p=0.000 n=20)
Acosh 58.38n ± 0% 50.44n ± 0% -13.60% (p=0.000 n=20)
Asin 32.70n ± 0% 31.94n ± 0% -2.32% (p=0.000 n=20)
Asinh 57.65n ± 0% 50.83n ± 0% -11.82% (p=0.000 n=20)
Atan 14.21n ± 0% 14.21n ± 0% ~ (p=0.501 n=20)
Atanh 60.86n ± 0% 54.44n ± 0% -10.56% (p=0.000 n=20)
Atan2 32.02n ± 0% 34.02n ± 0% +6.25% (p=0.000 n=20)
Cbrt 55.58n ± 0% 40.64n ± 0% -26.88% (p=0.000 n=20)
Ceil 9.566n ± 0% 9.566n ± 0% ~ (p=0.463 n=20)
Copysign 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.806 n=20)
Cos 18.02n ± 0% 18.02n ± 0% ~ (p=0.191 n=20)
Cosh 64.44n ± 0% 65.64n ± 0% +1.86% (p=0.000 n=20)
Erf 16.15n ± 0% 16.16n ± 0% ~ (p=0.770 n=20)
Erfc 18.71n ± 0% 18.83n ± 0% +0.61% (p=0.000 n=20)
Erfinv 19.33n ± 0% 19.34n ± 0% ~ (p=0.513 n=20)
Erfcinv 18.90n ± 0% 19.78n ± 0% +4.63% (p=0.000 n=20)
Exp 50.04n ± 0% 49.66n ± 0% -0.75% (p=0.000 n=20)
ExpGo 50.03n ± 0% 50.03n ± 0% ~ (p=0.723 n=20)
Expm1 28.41n ± 0% 28.27n ± 0% -0.49% (p=0.000 n=20)
Exp2 50.08n ± 0% 51.23n ± 0% +2.31% (p=0.000 n=20)
Exp2Go 49.77n ± 0% 49.89n ± 0% +0.24% (p=0.000 n=20)
Abs 0.8009n ± 0% 0.8006n ± 0% ~ (p=0.317 n=20)
Dim 1.987n ± 0% 1.993n ± 0% +0.28% (p=0.001 n=20)
Floor 8.543n ± 0% 8.548n ± 0% ~ (p=0.509 n=20)
Max 6.670n ± 0% 6.672n ± 0% ~ (p=0.335 n=20)
Min 6.694n ± 0% 6.694n ± 0% ~ (p=0.459 n=20)
Mod 56.44n ± 0% 53.23n ± 0% -5.70% (p=0.000 n=20)
Frexp 8.409n ± 0% 7.606n ± 0% -9.55% (p=0.000 n=20)
Gamma 35.64n ± 0% 35.23n ± 0% -1.15% (p=0.000 n=20)
Hypot 11.21n ± 0% 10.61n ± 0% -5.31% (p=0.000 n=20)
HypotGo 11.50n ± 0% 11.01n ± 0% -4.30% (p=0.000 n=20)
Ilogb 7.606n ± 0% 6.804n ± 0% -10.54% (p=0.000 n=20)
J0 125.3n ± 0% 126.5n ± 0% +0.96% (p=0.000 n=20)
J1 124.9n ± 0% 125.3n ± 0% +0.32% (p=0.000 n=20)
Jn 264.3n ± 0% 265.9n ± 0% +0.61% (p=0.000 n=20)
Ldexp 9.606n ± 0% 9.204n ± 0% -4.19% (p=0.000 n=20)
Lgamma 38.82n ± 0% 38.85n ± 0% +0.06% (p=0.019 n=20)
Log 38.44n ± 0% 28.04n ± 0% -27.06% (p=0.000 n=20)
Logb 8.405n ± 0% 7.605n ± 0% -9.52% (p=0.000 n=20)
Log1p 31.62n ± 0% 27.11n ± 0% -14.26% (p=0.000 n=20)
Log10 38.83n ± 0% 28.42n ± 0% -26.81% (p=0.000 n=20)
Log2 11.21n ± 0% 10.41n ± 0% -7.14% (p=0.000 n=20)
Modf 5.204n ± 0% 5.205n ± 0% ~ (p=0.983 n=20)
Nextafter32 8.809n ± 0% 7.208n ± 0% -18.18% (p=0.000 n=20)
Nextafter64 8.405n ± 0% 8.406n ± 0% +0.01% (p=0.007 n=20)
PowInt 48.83n ± 0% 44.78n ± 0% -8.28% (p=0.000 n=20)
PowFrac 146.9n ± 0% 142.1n ± 0% -3.23% (p=0.000 n=20)
Pow10Pos 2.334n ± 0% 2.333n ± 0% ~ (p=0.110 n=20)
Pow10Neg 4.803n ± 0% 4.803n ± 0% ~ (p=0.130 n=20)
Round 4.816n ± 0% 3.819n ± 0% -20.70% (p=0.000 n=20)
RoundToEven 5.735n ± 0% 5.204n ± 0% -9.26% (p=0.000 n=20)
Remainder 52.05n ± 0% 49.64n ± 0% -4.63% (p=0.000 n=20)
Signbit 1.201n ± 0% 1.001n ± 0% -16.65% (p=0.000 n=20)
Sin 20.63n ± 0% 20.64n ± 0% +0.05% (p=0.040 n=20)
Sincos 23.82n ± 0% 24.62n ± 0% +3.36% (p=0.000 n=20)
Sinh 71.25n ± 0% 68.44n ± 0% -3.94% (p=0.000 n=20)
SqrtIndirect 2.001n ± 0% 2.001n ± 0% ~ (p=0.182 n=20)
SqrtLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.754 n=20)
SqrtIndirectLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.773 n=20)
SqrtGoLatency 60.84n ± 0% 81.26n ± 0% +33.56% (p=0.000 n=20)
SqrtPrime 1.791µ ± 0% 1.791µ ± 0% ~ (p=0.784 n=20)
Tan 27.22n ± 0% 27.22n ± 0% ~ (p=0.819 n=20)
Tanh 70.88n ± 0% 69.04n ± 0% -2.60% (p=0.000 n=20)
Trunc 8.543n ± 0% 8.543n ± 0% ~ (p=0.784 n=20)
Y0 122.9n ± 0% 122.9n ± 0% ~ (p=0.559 n=20)
Y1 123.3n ± 0% 121.7n ± 0% -1.30% (p=0.000 n=20)
Yn 263.0n ± 0% 262.6n ± 0% -0.15% (p=0.000 n=20)
Float64bits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float64frombits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float32bits 1.7010n ± 0% 0.8005n ± 0% -52.94% (p=0.000 n=20)
Float32frombits 1.5010n ± 0% 0.8005n ± 0% -46.67% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.238 n=20)
geomean 17.41n 16.15n -7.19%
Change-Id: I0a0c263af2f07203eab1782e69c706f20c689d8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/604737
Auto-Submit: Tim King <taking@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
Reviewed-by: Meidan Li <limeidan@loongson.cn >
Reviewed-by: Tim King <taking@google.com >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
2024-09-13 19:29:23 +00:00
Xiaolin Zhao
2c5b707b3b
cmd/compile: optimize RotateLeft8/16 on loong64
...
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
RotateLeft8 1.401n ± 0% 1.201n ± 0% -14.28% (p=0.000 n=20)
RotateLeft16 1.4010n ± 0% 0.8032n ± 0% -42.67% (p=0.000 n=20)
geomean 1.401n 0.9822n -29.90%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
RotateLeft8 1.576n ± 0% 1.310n ± 0% -16.88% (p=0.000 n=20)
RotateLeft16 1.576n ± 0% 1.166n ± 0% -26.02% (p=0.000 n=20)
geomean 1.576n 1.236n -21.58%
Change-Id: I39c18306be0b8fd31b57bd0911714abd1783b50e
Reviewed-on: https://go-review.googlesource.com/c/go/+/604738
Auto-Submit: abner chenc <chenguoqi@loongson.cn >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Tim King <taking@google.com >
2024-09-13 17:15:09 +00:00
Meng Zhuo
2982253c42
test/codegen: add Rotate test for riscv64
...
Change-Id: I7d996b8d46fbeef933943f806052a30f1f8d50c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/588836
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Joel Sing <joel@sing.id.au >
Reviewed-by: Tim King <taking@google.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
2024-09-11 01:37:00 +00:00
Paschalis Tsilias
fe69121bc5
cmd/compile: optimize []byte(string1 + string2)
...
This CL optimizes the compilation of string-to-bytes conversion in the
case of string additions.
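A minimal sketch of the expression shape named in the title. The function name joinBytes is hypothetical, and the log does not say how the generated code changes.

```go
package main

import "fmt"

// joinBytes converts the concatenation of two strings directly to a byte
// slice. This is the []byte(string1 + string2) form the CL targets.
func joinBytes(a, b string) []byte {
	return []byte(a + b)
}

func main() {
	fmt.Printf("%s\n", joinBytes("foo", "bar"))
}
```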
Fixes #62407
Change-Id: Ic47df758478e5d061880620025c4ec7dbbff8a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/527935
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com >
Reviewed-by: Keith Randall <khr@golang.org >
Auto-Submit: Keith Randall <khr@golang.org >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Tim King <taking@google.com >
2024-09-10 21:20:57 +00:00
Joel Sing
e126129d76
cmd/compile/internal/ssa: combine shift and addition for riscv64 rva22u64
...
When GORISCV64 enables rva22u64, combine shift and addition using the
SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba
extension. This results in more than 2000 instructions being removed
from the Go binary on riscv64.
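As an illustration, these are the Go-level expression shapes that a shift-by-1/2/3 followed by an add corresponds to. The function names are hypothetical, and whether a particular expression is actually lowered to SHxADD depends on the compiler's rewrite rules, which this log does not show.

```go
package main

import "fmt"

// Each function shifts x left by a small constant and adds y, the
// operation a single SHnADD instruction performs.
func sh1add(x, y uint64) uint64 { return x<<1 + y }
func sh2add(x, y uint64) uint64 { return x<<2 + y }
func sh3add(x, y uint64) uint64 { return x<<3 + y }

func main() {
	fmt.Println(sh1add(3, 10), sh2add(3, 10), sh3add(3, 10))
}
```

Address arithmetic for slices of 2-, 4- and 8-byte elements has the same base + index<<n shape.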
Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8
Reviewed-on: https://go-review.googlesource.com/c/go/+/606636
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Mark Ryan <markdryan@rivosinc.com >
Reviewed-by: Michael Pratt <mpratt@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com >
2024-08-28 13:46:24 +00:00
Keith Randall
36b45bca66
cmd/compile: regalloc: drop values that aren't used until after a call
...
No point in keeping values in registers when their next use is after
a call, as we'd have to spill/restore them anyway.
cmd/go is 0.1% smaller.
Fixes #59297
Change-Id: I10ee761d0d23229f57de278f734c44d6a8dccd6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/509255
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Michael Pratt <mpratt@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
2024-08-26 22:29:43 +00:00
Paul E. Murphy
2b0a157d68
cmd/compile: intrinsify math.MulUintptr on PPC64
...
This can be done efficiently with few instructions.
This also adds MULHDUCC for further codegen improvement.
Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Archana Ravindar <aravinda@redhat.com >
Reviewed-by: Michael Pratt <mpratt@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
2024-08-26 17:02:43 +00:00
Joel Sing
02a9f51011
test/codegen: add initial codegen tests for integer min/max
...
Change-Id: I006370053748edbec930c7279ee88a805009aa0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/606976
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
2024-08-23 15:17:17 +00:00
Keith Randall
b2cdaf7346
cmd/compile: improve unneeded zeroing removal
...
After newobject, we don't need to write zeroes to initialize the
object. It has already been zeroed by the allocator.
This is already handled in most cases, but because we run builtin
decomposition after the opt pass, we don't handle cases where the zero
of a compound builtin is being written. Improve the zero detector to
handle those cases.
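A hedged sketch of the situation described above. The type pair and function newPair are hypothetical; the example assumes that "compound builtin" refers to types such as strings and slices, which the compiler decomposes into several machine words.

```go
package main

import "fmt"

type pair struct {
	s string
	b []byte
}

// newPair allocates a pair and then writes zero values into it. The
// allocator has already zeroed the memory, so both stores are redundant.
func newPair() *pair {
	p := new(pair)
	p.s = ""  // zero value of a string
	p.b = nil // zero value of a slice
	return p
}

func main() {
	p := newPair()
	fmt.Println(p.s == "", p.b == nil)
}
```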
Fixes #68845
Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc
Reviewed-on: https://go-review.googlesource.com/c/go/+/605255
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com >
Auto-Submit: Keith Randall <khr@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Carlos Amedee <carlos@golang.org >
2024-08-14 18:16:29 +00:00
khr@golang.org
7273509466
cmd/compile: add additional arm64 bit field rules
...
Get rid of TODO in prove pass.
We currently avoid marking shifts of constants as bounded, where
bounded means we don't have to worry about <0 or >=bitwidth shifts.
We do this because it causes different rule applications during lowering
which cause some codegen tests to fail.
Add some new rules which ensure that we get the right final instruction
sequence regardless of the ordering. Then we can remove this special case.
Change-Id: I4e962d4f09992b42ab47e123de5ded3b8b8fb205
Reviewed-on: https://go-review.googlesource.com/c/go/+/602935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Michael Knyszek <mknyszek@google.com >
2024-08-12 21:03:55 +00:00
khr@golang.org
9b4268c3df
cmd/compile: simplify prove pass
...
We don't need noLimit checks in a bunch of places.
Also simplify folding of provable constant results.
At this point in the CL stack, compilebench reports no performance
changes. The only thing of note is that binaries got a bit smaller.
name old text-bytes new text-bytes delta
HelloSize 960kB ± 0% 952kB ± 0% -0.83% (p=0.000 n=10+10)
CmdGoSize 12.3MB ± 0% 12.1MB ± 0% -1.53% (p=0.000 n=10+10)
Change-Id: Id4be75eec0f8c93f2f3b93a8521ce2278ee2ee2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/599197
Reviewed-by: David Chase <drchase@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Michael Knyszek <mknyszek@google.com >
2024-08-07 16:08:20 +00:00
khr@golang.org
3b96eebcbd
cmd/compile: rewrite the constant parts of the prove pass
...
Handles a lot more cases where constant ranges can eliminate
various (mostly bounds failure) paths.
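To illustrate what "constant ranges" means here, a small sketch follows. The names table, lookup and nibble are hypothetical, and the log does not state whether these exact functions change under this CL; they only show index expressions whose value range is a known constant interval.

```go
package main

import "fmt"

var table = [16]int{3: 7}

// lookup masks the index into [0, 15], so the access to the 16-element
// array can be proven in bounds from the constant range alone.
func lookup(i uint) int {
	return table[i&15]
}

// nibble shifts a byte right by 4, which also leaves a value in [0, 15].
func nibble(b byte) int {
	return table[b>>4]
}

func main() {
	fmt.Println(lookup(19), nibble(0x30))
}
```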
Fixes #66826
Fixes #66692
Fixes #48213
Update #57959
TODO: remove constant logic from poset code, no longer needed.
Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/599096
Reviewed-by: David Chase <drchase@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Michael Knyszek <mknyszek@google.com >
2024-08-07 16:07:33 +00:00
Xiaolin Zhao
ff14e08cd3
cmd/compile, math: improve implementation of math.{Max,Min} on loong64
...
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin}
in hardware.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20)
Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20)
MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20)
MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20)
geomean 16.18n 3.792n -76.57%
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A5000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20)
Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20)
MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20)
MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20)
geomean 23.37n 7.566n -67.63%
Updates #59120 .
Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/580283
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: abner chenc <chenguoqi@loongson.cn >
Reviewed-by: Keith Randall <khr@golang.org >
Reviewed-by: David Chase <drchase@google.com >
2024-08-07 01:16:28 +00:00
Michael Pratt
1985c0ccf9
cmd/compile,runtime: disable swissmap fast variants
...
Temporary measure to reduce the required MVP code.
For #54766 .
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I44dc8acd0dc8280c6beb40451998e84bc85c238a
Reviewed-on: https://go-review.googlesource.com/c/go/+/580915
Reviewed-by: Keith Randall <khr@golang.org >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@google.com >
2024-08-02 16:47:38 +00:00
Keith Randall
c18ff29295
cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64
...
Update #61395
Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00
Reviewed-on: https://go-review.googlesource.com/c/go/+/594738
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Nicolas Hillegeer <aktau@google.com >
2024-07-23 21:29:38 +00:00
Keith Randall
f66db49976
cmd/compile: store constant floats using integer constants
...
x86 is better at storing constant ints than constant floats.
(It uses a constant directly in the instruction stream, instead of
loading it from a constant global memory.)
Noticed as part of #67957
Change-Id: I9b7b586ad8e0fe9ce245324f020e9526f82b209d
Reviewed-on: https://go-review.googlesource.com/c/go/+/592596
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
2024-07-23 20:53:57 +00:00
Paul E. Murphy
d5e5b14305
cmd/compile/ssa: fix (MOVWZreg (RLWINM)) folding on PPC64
...
RLWINM does not clear the upper 32 bits of the target register if
the mask wraps around (e.g. 0xF000000F). Don't elide MOVWZreg for
such masks. All other usage clears the upper 32 bits.
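The behaviour can be modelled in a few lines. This is a simplified software model based on the Power ISA description of rlwinm (rotate the low word, replicate it into both halves, AND with MASK(MB+32, ME+32)); the function names are hypothetical and the model is an assumption, not compiler code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mask64 builds a mask in IBM bit numbering (bit 0 is the most significant
// bit) covering bits mb through me, wrapping around when mb > me.
func mask64(mb, me uint) uint64 {
	fromMB := ^uint64(0) >> mb      // ones in bits mb..63
	toME := ^uint64(0) << (63 - me) // ones in bits 0..me
	if mb <= me {
		return fromMB & toME
	}
	return fromMB | toME
}

// rlwinm models the instruction's effect on a 64-bit register.
func rlwinm(rs uint64, sh int, mb, me uint) uint64 {
	w := bits.RotateLeft32(uint32(rs), sh)
	return (uint64(w)<<32 | uint64(w)) & mask64(mb+32, me+32)
}

func main() {
	x := uint64(0x12345678)
	fmt.Printf("%#x\n", rlwinm(x, 0, 24, 31)) // mask 0x000000FF, no wrap
	fmt.Printf("%#x\n", rlwinm(x, 0, 28, 3))  // mask 0xF000000F, wraps
}
```

In this model the wrapping mask covers the whole upper word, so a following zero-extension is still needed.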
Fixes #67844 .
Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/590896
Reviewed-by: Keith Randall <khr@golang.org >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Keith Randall <khr@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
2024-06-07 19:02:52 +00:00
Meng Zhuo
019353d532
test/codegen: add Mul test for riscv64
...
Change-Id: I51e9832317e5dee1e3fe0772e7592b3dae95a625
Reviewed-on: https://go-review.googlesource.com/c/go/+/586797
Reviewed-by: Keith Randall <khr@golang.org >
Reviewed-by: Keith Randall <khr@google.com >
Auto-Submit: Keith Randall <khr@golang.org >
Reviewed-by: Cherry Mui <cherryyz@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
2024-05-23 18:51:17 +00:00
Paul E. Murphy
c6d142c4a7
cmd/compile/internal/ssa: fix ppc64 merging of (CLRLSLDI (SRD ...))
...
The rotate value was not correctly converted from a 64 bit to 32
bit rotate. This caused a miscompile of
golang.org/x/text/unicode/runenames.Names.
Fixes #67526
Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/586915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com >
Reviewed-by: Carlos Amedee <carlos@golang.org >
2024-05-21 18:53:43 +00:00
Paul E. Murphy
d11e417285
cmd/compile/internal/ssa: cleanup ANDCCconst rewrite rules on PPC64
...
Avoid creating duplicate usages of ANDCCconst. This is preparation for
a patch to reintroduce ANDconst to simplify the lower pass while
treating ANDCCconst like other *CC* ssa opcodes.
Also, move many of the similar rules which retarget ANDCCconst users
to the flag result to a common rule for all compares against zero.
Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/585400
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
2024-05-17 15:28:00 +00:00
Paul E. Murphy
0222a028f1
cmd/compile/internal/ssa: combine more shift and masking on PPC64
...
Investigating binaries, these patterns seem to show up frequently.
Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
2024-05-15 13:27:41 +00:00
Paul E. Murphy
7994da4cc1
cmd/compile/internal/ssa: on PPC64, try combining CLRLSLDI and SRDconst into RLWINM
...
This provides a small performance bump to crc64 as measured on ppc64le/power10:
name old time/op new time/op delta
Crc64/ISO64KB 49.6µs ± 0% 46.6µs ± 0% -6.18%
Crc64/ISO4KB 3.16µs ± 0% 2.97µs ± 0% -5.83%
Crc64/ISO1KB 840ns ± 0% 794ns ± 0% -5.46%
Crc64/ECMA64KB 49.6µs ± 0% 46.5µs ± 0% -6.20%
Crc64/Random64KB 53.1µs ± 0% 49.9µs ± 0% -6.04%
Crc64/Random16KB 15.9µs ± 1% 15.0µs ± 0% -5.73%
Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/581937
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com >
Reviewed-by: Eli Bendersky <eliben@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: David Chase <drchase@google.com >
2024-05-03 21:12:29 +00:00
khr@golang.org
1a0b86375f
cmd/compile: remove redundant calls to cmpstring
...
The results of cmpstring are reusable if the second call has the
same arguments and memory.
Note that this gets rid of cmpstring, but we still generate a
redundant </<= test and branch afterwards, because the compiler
doesn't know that cmpstring only ever returns -1,0,1.
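A sketch of source code that compares the same two strings twice, which is the situation described above. The function name compare is hypothetical; the assumption is that each ordered string comparison is implemented by a cmpstring call with identical arguments.

```go
package main

import "fmt"

// compare orders two strings. Both the < and the > test operate on the
// same operands, so the second comparison can reuse the first result.
func compare(a, b string) int {
	if a < b {
		return -1
	}
	if a > b {
		return +1
	}
	return 0
}

func main() {
	fmt.Println(compare("a", "b"), compare("b", "a"), compare("a", "a"))
}
```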
Update #61725
Change-Id: I93a0d1ccca50d90b1e1a888240ffb75a3b10b59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/578835
Reviewed-by: David Chase <drchase@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
2024-04-19 16:31:02 +00:00
Paul E. Murphy
ebf7747dbe
cmd/internal/obj/ppc64: on Power10, use xxspltidp for float constants
...
Any normal float32 constant can be generated by this instruction;
use xxspltidp when possible. This prefixed instruction is much
faster than the two-instruction load sequence from the
float32/float64 constant pool.
Change-Id: Id751d9ffdae71463adbde66427b986f0b2ef74c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/575555
Reviewed-by: Than McIntosh <thanm@google.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
TryBot-Result: Gopher Robot <gobot@golang.org >
Run-TryBot: Paul Murphy <murp@ibm.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com >
2024-04-04 15:24:29 +00:00
Cuong Manh Le
973befe714
cmd/compile: check ODEREF for safe lhs in assignment during static init
...
For #66585
Change-Id: Iddc407e3ef4c3b6ecf5173963b66b3e65e43c92d
Reviewed-on: https://go-review.googlesource.com/c/go/+/575336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Keith Randall <khr@golang.org >
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com >
Reviewed-by: Keith Randall <khr@google.com >
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com >
2024-04-02 17:12:59 +00:00
Paul E. Murphy
dfb17c126c
cmd/compile: support float min/max instructions on PPC64
...
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are FPRs.
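For reference, a minimal use of the builtin min and max functions (available since Go 1.21) on floating-point types. The helper names clamp and clamp32 are hypothetical.

```go
package main

import "fmt"

// clamp restricts x to the interval [lo, hi] using the builtin min and max.
func clamp(x, lo, hi float64) float64 {
	return min(max(x, lo), hi)
}

// clamp32 is the float32 counterpart.
func clamp32(x, lo, hi float32) float32 {
	return min(max(x, lo), hi)
}

func main() {
	fmt.Println(clamp(5, 0, 1), clamp(-1, 0, 1), clamp(0.5, 0, 1))
	fmt.Println(clamp32(2.5, 0, 2))
}
```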
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
TryBot-Result: Gopher Robot <gobot@golang.org >
Run-TryBot: Paul Murphy <murp@ibm.com >
Reviewed-by: Than McIntosh <thanm@google.com >
2024-04-01 18:50:29 +00:00
Andrey Bokhanko
0ae8468b20
cmd/compile,cmd/go,cmd/internal,runtime: remove dynamic checks for atomics for ARM64 targets that support LSE
...
Remove dynamic checks for atomic instructions for ARM64 targets that support LSE extension.
For #66131
Change-Id: I0ec1b183a3f4ea4c8a537430646e6bc4b4f64271
Reviewed-on: https://go-review.googlesource.com/c/go/+/569536
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com >
Reviewed-by: Shu-Chun Weng <scw@google.com >
2024-03-21 20:08:06 +00:00
Keith Randall
802473cfda
cmd/compile: include constant bools in memcombine
...
Constant bools are like constant 1-byte values, they memcombine just fine.
(There are still trickier cases that this pass doesn't catch
yet, see TODO at memcombine.go:503.)
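A hedged sketch of the pattern in question: adjacent one-byte fields written with constant bools. The type flags and function setFlags are hypothetical, and whether the stores are actually merged into one wider store depends on the target and the compiler version.

```go
package main

import "fmt"

type flags struct {
	a, b, c, d bool
}

// setFlags writes four constant bools to adjacent bytes. These stores are
// candidates for being combined into a single 4-byte store.
func setFlags(f *flags) {
	f.a = true
	f.b = false
	f.c = true
	f.d = true
}

func main() {
	var f flags
	setFlags(&f)
	fmt.Println(f)
}
```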
Fixes #66413
Change-Id: Ia67cf72ed1c416e27ac22da443bd88a3f09a6cc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/573416
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: David Chase <drchase@google.com >
Reviewed-by: Joseph Tsai <joetsai@digital-static.net >
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com >
Reviewed-by: Keith Randall <khr@google.com >
2024-03-21 19:45:41 +00:00
Paul E. Murphy
c7065bb9db
cmd/compile/internal: generate ADDZE on PPC64
...
This usage shows up in quite a few places, and helps reduce
register pressure in several complex crypto functions by
removing a MOVD $0,... instruction.
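A sketch of Go code with the "add only the carry" shape that ADDZE (add to zero extended) expresses. The function name addCarry is hypothetical, and the log does not confirm that this exact form triggers the new rule.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addCarry adds yLo to the low word of a 128-bit value (xHi, xLo) and
// propagates the carry into the high word. The second Add64 adds zero plus
// the carry, so no register needs to hold an explicit 0.
func addCarry(xHi, xLo, yLo uint64) (hi, lo uint64) {
	var carry uint64
	lo, carry = bits.Add64(xLo, yLo, 0)
	hi, _ = bits.Add64(xHi, 0, carry)
	return hi, lo
}

func main() {
	hi, lo := addCarry(1, ^uint64(0), 1)
	fmt.Println(hi, lo)
}
```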
Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802
Reviewed-on: https://go-review.googlesource.com/c/go/+/571055
TryBot-Result: Gopher Robot <gobot@golang.org >
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com >
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com >
Reviewed-by: Cherry Mui <cherryyz@google.com >
Reviewed-by: Michael Knyszek <mknyszek@google.com >
Run-TryBot: Paul Murphy <murp@ibm.com >
2024-03-15 17:57:45 +00:00