797 Commits

Author SHA1 Message Date
Julian Zhu
e7e45d770c math: add assembly func archExp and archExp2 for riscv64
goos: linux
goarch: riscv64
pkg: math
                       │   math-old    │               math-new               │
                       │    sec/op     │    sec/op     vs base                │
Exp-64                    41.21n ±  0%    32.03n ±  0%  -22.28% (p=0.000 n=8)
Exp2-64                   38.86n ±  1%    28.18n ±  0%  -27.49% (p=0.000 n=8)
Exp2Go-64                 40.36n ±  1%    40.51n ±  1%   +0.36% (p=0.049 n=8)
Frexp-64                  5.681n ±  1%    5.446n ±  0%   -4.14% (p=0.000 n=8)
Ldexp-64                  7.676n ±  1%    7.555n ±  0%   -1.58% (p=0.001 n=8)

Change-Id: Ic122bf9598302f947c6dbf751db591f403c50373
Reviewed-on: https://go-review.googlesource.com/c/go/+/754687
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2026-03-27 21:02:30 -07:00
Meng Zhuo
f6be78b539 math: add benchmark for float32/float64 conversions
This CL adds benchmarks for float32/float64 conversions
for following functions:

- Abs
- Sqrt
- Round
- RoundToEven
- Floor
- Ceil
- Trunc

Update #75463

Change-Id: Idb57e2e28a0fb59321e480cd676115c6e540cf4f
Reviewed-on: https://go-review.googlesource.com/c/go/+/734680
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
2026-03-01 16:43:52 -08:00
limeidan
287451f09c math/big: optimize the implementation of lshVU on loong64
Change-Id: I1aac166aea4f907a7fb93028a39ef9d1e3888c9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/743800
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2026-02-12 10:17:33 -08:00
Xiaolin Zhao
7f0f671951 math: optimize the floating-point pipeline on loong64
Using the FSEL instruction on loong64 to eliminate branches and reduce
pipeline interruptions.

On the Loongson CPU 3A6000, there is a 0.09% performance improvement, as follows:
goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3A6000-HV @ 2500.00MHz
        │  old.bench  │             new.bench              │
        │   sec/op    │   sec/op     vs base               │
Exp       7.748m ± 0%   7.740m ± 0%  -0.10% (p=0.001 n=10)
Exp2      7.747m ± 0%   7.741m ± 0%  -0.09% (p=0.002 n=10)
geomean   7.747m        7.740m       -0.09%

Change-Id: If62f2e81bf345c83a1fa9350ace131240cfa3b9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/693458
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2026-01-28 17:01:37 -08:00
mohanson
c04335e33a math/pow10: remove overlapping boundary (n=0)
Both the first and second conditions include n=0, creating an
implicit logic that relies on order of evaluation.

Change-Id: Iefbc25a676242fabf272203a3e845363585bff56
Reviewed-on: https://go-review.googlesource.com/c/go/+/730060
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
2026-01-23 08:26:22 -08:00
mohanson
23f7ba554d math: use shared signMask constant
In abs.go and copysign.go, magic numbers and local constants are
used for the sign bit mask (1 << 63), even though a shared constant
signMask already exists in bits.go.

Change-Id: Ic3aeb9b52674538443cbe074acfeb373a3c74a8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/732060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
2026-01-23 08:23:44 -08:00
Srinivas Pokala
489d3dafb7 math: switch s390x math.Pow to generic implementation
The s390x assembly implementation of math.Pow incorrectly handles
certain subnormal cases. This change switches the function to use the
generic implementation instead.

Updates #76247

Cq-Include-Trybots: luci.golang.try:gotip-linux-s390x
Change-Id: I794339080d5a7acf79bbffaeb0214809006fd30c
Reviewed-on: https://go-review.googlesource.com/c/go/+/720540
Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Kiran M Vijay IBM <kiran.m.vijay@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-11-19 13:32:20 -08:00
Alan Donovan
4bfc3a9d14 std,cmd: go fix -any std cmd
This change mechanically replaces all occurrences of interface{}
by 'any' (where deemed safe by the 'any' modernizer) throughout
std and cmd, minus their vendor trees.

Since this fix is relatively numerous, it gets its own CL.

Also, 'go generate go/types'.

Change-Id: I14a6b52856c3291c1d27935409bca8d5fd4242a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/719702
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Alan Donovan <adonovan@google.com>
2025-11-11 19:59:40 -08:00
Joel Sing
bc5ffe5c79 internal/runtime/sys,math/bits: eliminate bounds checks on len8tab
The compiler cannot currently determine that the accesses to len8tab
are within bounds. Cast to uint8 to avoid unnecessary bounds checks.

Fixes #76166

Change-Id: I1fd930bba2b20d3998252c476308642e08ce00b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/718040
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-10 09:48:20 -08:00
Tobias Klauser
b5aefe07e5 all: remove unnecessary loop variable copies in tests
Copying the loop variable is no longer necessary since Go 1.22.

Change-Id: Iebb21dac44a20ec200567f1d786f105a4ee4999d
Reviewed-on: https://go-review.googlesource.com/c/go/+/711640
Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-17 13:10:27 -07:00
Michael Munday
3e596d448f math: rename Modf parameter int to integer
Avoid using int as a parameter name. Also, rename frac to
fractional for consistency.

Addresses comment on CL 694896:
https://go-review.googlesource.com/c/go/+/694896/comment/a9723a07_8352e3aa/

Change-Id: Icedeecf65ad2f51d4e8d5bcf6e64c0eae9885dec
Reviewed-on: https://go-review.googlesource.com/c/go/+/699035
Auto-Submit: Sean Liao <sean@liao.dev>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Sean Liao <sean@liao.dev>
Reviewed-by: Joel Sing <joel@sing.id.au>
2025-09-03 06:50:43 -07:00
Michael Munday
25c2d4109f math: use Trunc to implement Modf
By implementing Modf using Trunc, rather than the other way round,
we can get a significant performance improvement on platforms where
Trunc is implemented as an intrinsic.

Trunc is implemented as an intrinsic on ppc64x and arm64 so the assembly
implementations of Modf are no longer needed (the compiler can generate
very similar code that can now potentially be inlined).

GOAMD64=v1

goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
        │       sec/op       │    sec/op     vs base                │
Gamma            4.257n ± 0%    3.890n ± 0%   -8.61% (p=0.000 n=10)
Modf            1.6110n ± 0%   0.4243n ± 0%  -73.67% (p=0.000 n=10)
geomean          2.619n         1.285n       -50.94%

GOAMD64=v2

goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
        │       sec/op       │    sec/op     vs base                │
Gamma            4.100n ± 1%    3.717n ± 0%   -9.35% (p=0.000 n=10)
Modf            1.6070n ± 0%   0.2158n ± 1%  -86.57% (p=0.000 n=10)
geomean          2.567n        0.8957n       -65.11%

Change-Id: I689a560c344cf1d39ef002b540749bacc3179786
Reviewed-on: https://go-review.googlesource.com/c/go/+/694896
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Sean Liao <sean@liao.dev>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-25 12:46:11 -07:00
Michael Munday
4e05a070c4 math: implement IsInf using Abs
Abs is an intrinsic (or a relatively cheap operation) on most
architectures. Using it in IsInf typically saves a branch when
`sign` is 0 (note the `sign` variable is typically a constant).

This change doesn't make a huge difference on amd64 (these
benchmarks are fairly noisy too) but removing the branch will
allow rewrite rules to detect and optimize infinity checks on
other architectures. For example, riscv64 can check for
infinities with the FCLASSD instruction and s390x can use the
TCDB instruction.

goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
                    │          sec/op          │    sec/op      vs base                │
Acos                              4.317n ±  1%    4.321n ±  0%        ~ (p=0.466 n=10)
Acosh                             8.857n ±  1%    8.411n ±  2%   -5.05% (p=0.001 n=10)
Asin                              4.260n ±  1%    4.204n ±  6%   -1.31% (p=0.021 n=10)
Asinh                             10.63n ±  2%    10.37n ±  0%   -2.49% (p=0.000 n=10)
Atan                              2.493n ±  1%    2.368n ±  0%   -5.01% (p=0.000 n=10)
Atanh                             8.820n ±  4%    8.770n ±  2%        ~ (p=0.579 n=10)
Atan2                             4.212n ±  1%    4.066n ± 11%   -3.45% (p=0.023 n=10)
Cbrt                              4.859n ±  0%    4.845n ±  0%   -0.29% (p=0.000 n=10)
Ceil                             0.3877n ±  3%   0.2514n ±  0%  -35.17% (p=0.000 n=10)
Copysign                         0.3479n ±  2%   0.4179n ±  0%  +20.14% (p=0.000 n=10)
Cos                               4.734n ±  2%    4.486n ±  0%   -5.26% (p=0.000 n=10)
Cosh                              5.244n ±  0%    5.071n ±  0%   -3.29% (p=0.000 n=10)
Erf                               2.975n ±  1%    2.788n ±  0%   -6.29% (p=0.000 n=10)
Erfc                              3.259n ±  1%    3.121n ±  0%   -4.23% (p=0.000 n=10)
Erfinv                            4.015n ±  1%    3.904n ±  0%   -2.76% (p=0.000 n=10)
Erfcinv                           4.166n ±  1%    4.039n ±  0%   -3.04% (p=0.000 n=10)
Exp                               3.567n ±  1%    3.429n ±  0%   -3.87% (p=0.000 n=10)
ExpGo                             9.173n ±  1%    8.368n ±  2%   -8.78% (p=0.000 n=10)
Expm1                             4.466n ±  3%    4.419n ±  0%   -1.05% (p=0.000 n=10)
Exp2                              8.328n ±  0%    8.046n ±  0%   -3.39% (p=0.000 n=10)
Exp2Go                            8.796n ±  5%    8.237n ±  2%   -6.36% (p=0.000 n=10)
Abs                              0.2400n ±  2%   0.2144n ±  0%  -10.71% (p=0.000 n=10)
Dim                              0.4077n ±  3%   0.3795n ±  1%   -6.91% (p=0.000 n=10)
Floor                            0.3616n ±  2%   0.2528n ±  3%  -30.10% (p=0.000 n=10)
Max                               1.401n ±  1%    1.344n ±  1%   -4.14% (p=0.000 n=10)
Min                               1.391n ±  1%    1.345n ±  1%   -3.27% (p=0.000 n=10)
Mod                               15.45n ±  1%    15.62n ±  2%        ~ (p=0.066 n=10)
Frexp                             1.838n ±  2%    1.605n ±  1%  -12.70% (p=0.000 n=10)
Gamma                             4.465n ±  1%    4.458n ±  1%        ~ (p=0.256 n=10)
Hypot                             2.237n ±  1%    2.208n ±  0%   -1.32% (p=0.000 n=10)
HypotGo                           2.610n ±  3%    2.663n ±  5%        ~ (p=0.280 n=10)
Ilogb                             1.793n ±  1%    1.566n ±  1%  -12.66% (p=0.000 n=10)
J0                                22.11n ±  1%    21.45n ±  1%   -2.99% (p=0.000 n=10)
J1                                21.71n ±  1%    21.38n ±  1%   -1.54% (p=0.000 n=10)
Jn                                46.43n ±  1%    45.83n ±  1%   -1.30% (p=0.001 n=10)
Ldexp                             2.360n ±  1%    2.111n ±  1%  -10.51% (p=0.000 n=10)
Lgamma                            4.728n ±  1%    4.850n ±  2%   +2.59% (p=0.000 n=10)
Log                               4.304n ±  2%    4.228n ±  1%   -1.78% (p=0.000 n=10)
Logb                              1.833n ±  2%    1.635n ±  2%  -10.80% (p=0.000 n=10)
Log1p                             5.262n ±  2%    5.173n ±  2%   -1.69% (p=0.001 n=10)
Log10                             4.534n ±  1%    4.474n ±  1%   -1.33% (p=0.024 n=10)
Log2                              2.510n ±  2%    2.246n ±  2%  -10.48% (p=0.000 n=10)
Modf                              1.712n ±  3%    1.700n ±  1%        ~ (p=0.055 n=10)
Nextafter32                       2.190n ±  3%    2.187n ±  0%        ~ (p=0.266 n=10)
Nextafter64                       2.184n ±  0%    2.183n ±  0%   -0.05% (p=0.017 n=10)
PowInt                            11.45n ±  7%    11.32n ±  9%        ~ (p=0.137 n=10)
PowFrac                           27.46n ±  3%    27.04n ±  1%   -1.55% (p=0.001 n=10)
Pow10Pos                         0.5367n ±  3%   0.5466n ±  2%   +1.84% (p=0.009 n=10)
Pow10Neg                         0.8939n ±  1%   0.8720n ±  2%   -2.45% (p=0.000 n=10)
Round                             1.218n ±  1%    1.198n ±  1%   -1.56% (p=0.005 n=10)
RoundToEven                       1.711n ±  0%    1.710n ±  0%        ~ (p=0.464 n=10)
Remainder                         12.87n ± 10%    13.79n ± 14%   +7.11% (p=0.027 n=10)
Signbit                          0.4072n ±  2%   0.3839n ±  2%   -5.71% (p=0.000 n=10)
Sin                               4.102n ±  1%    4.058n ±  3%        ~ (p=0.138 n=10)
Sincos                            5.837n ±  1%    5.715n ±  2%   -2.10% (p=0.000 n=10)
Sinh                              5.622n ±  1%    5.567n ±  2%   -0.96% (p=0.006 n=10)
SqrtIndirect                     0.4284n ±  0%   0.4279n ±  0%        ~ (p=0.084 n=10)
SqrtLatency                       2.779n ±  0%    2.777n ±  0%        ~ (p=0.089 n=10)
SqrtIndirectLatency               2.777n ±  0%    2.778n ±  0%        ~ (p=0.305 n=10)
SqrtGoLatency                     24.00n ±  0%    24.51n ±  0%   +2.12% (p=0.000 n=10)
SqrtPrime                         673.0n ±  0%    673.0n ±  0%        ~ (p=0.574 n=10)
Tan                               4.111n ±  4%    4.123n ±  5%        ~ (p=0.424 n=10)
Tanh                              5.787n ±  1%    5.723n ±  1%   -1.11% (p=0.010 n=10)
Trunc                            0.3441n ±  3%   0.2596n ±  2%  -24.56% (p=0.000 n=10)
Y0                                21.63n ±  2%    21.07n ±  2%   -2.61% (p=0.001 n=10)
Y1                                21.42n ±  1%    20.93n ±  3%   -2.29% (p=0.041 n=10)
Yn                                45.78n ±  1%    45.83n ±  1%        ~ (p=0.671 n=10)
Float64bits                      0.2187n ±  2%   0.2199n ±  2%        ~ (p=0.138 n=10)
Float64frombits                  0.2198n ±  1%   0.2199n ±  1%        ~ (p=0.956 n=10)
Float32bits                      0.2237n ±  2%   0.2213n ±  1%        ~ (p=0.060 n=10)
Float32frombits                  0.2251n ±  1%   0.2219n ±  2%   -1.42% (p=0.000 n=10)
FMA                              0.8557n ±  1%   0.8555n ±  0%        ~ (p=0.286 n=10)
geomean                           3.186n          3.070n         -3.61%

Change-Id: I4814bb1e3d9d20e9d8cd7689e8d5383e36b00331
Reviewed-on: https://go-review.googlesource.com/c/go/+/694955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Sean Liao <sean@liao.dev>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-25 12:43:51 -07:00
Michael Munday
1eed4f32a0 math: optimize Signbit implementation slightly
This small tweak to Signbit improves code generation on riscv64 and
possibly other architectures by removing the need to apply a 64 bit
mask.

Before:
  MOV $-9223372036854775808, X6
  AND X6, X5, X5
  SNEZ X5, X10

After:
  SLTI $0, X5, X10

This transformation could also be added to the optimization rules
but it is quite a special case.

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
        │    sec/op     │   sec/op     vs base                │
Signbit     13.05n ± 0%   11.42n ± 0%  -12.49% (p=0.000 n=10)

Change-Id: Ic218017c5bbb720ec24c6fe7cc230df539b2630c
Reviewed-on: https://go-review.googlesource.com/c/go/+/698419
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Sean Liao <sean@liao.dev>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Sean Liao <sean@liao.dev>
Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-25 12:35:20 -07:00
Michael Munday
0201524c52 math: remove redundant infinity tests
These cases are covered by existing comparisons against constants.

Change-Id: I19ad530e95d2437a8617f5229495da591ceb779a
Reviewed-on: https://go-review.googlesource.com/c/go/+/692255
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Sean Liao <sean@liao.dev>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Sean Liao <sean@liao.dev>
2025-08-08 12:47:59 -07:00
ICHINOSE Shogo
498899e205 math: fix portable FMA implementation when x*y ~ 0, x*y < 0 and z = 0
Adding zero usually does not change the original value.
However, there is an exception with negative zero. (e.g. (-0) + (+0) = (+0))
This applies when x * y is negative and underflows.

Fixes #73757

Change-Id: Ib7b54bdacd1dcfe3d392802ea35cdb4e989f9371
GitHub-Last-Rev: 30d74883b2
GitHub-Pull-Request: golang/go#73759
Reviewed-on: https://go-review.googlesource.com/c/go/+/673856
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-05-19 14:45:48 -07:00
Julian Zhu
6109185cf9 math/big: fix incorrect register allocation for mipsx/mips64x
According to the MIPS ABI, R26/R27 are reserved for OS kernel, and may be clobbered by it. They must not be used by user mode.

See Figure 3-18 of MIPS ELF ABI specification: https://refspecs.linuxfoundation.org/elf/mipsabi.pdf

Fixes #73472

Change-Id: Ifda692a803176bfaab2c70d6623636c5d135f42e
Reviewed-on: https://go-review.googlesource.com/c/go/+/667816
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-01 05:04:39 -07:00
Russ Cox
a4d0269a4f math/big: use clearer loop bounds check elimination
Checking that the lengths are equal and panicking teaches the compiler
that it can assume “i in range for z” implies “i in range for x”, letting us
simplify the actual loops a bit.

It also turns up a few places in math/big that were playing maybe a little
too fast and loose with slice lengths. Update those to explicitly set all the
input slices to the same length.

These speedups are basically irrelevant, since they only happen
in real code if people are compiling with -tags math_big_pure_go.
But at least the code is clearer.

benchmark \ system                   c3h88    c2s16       s7      386   s7-386   c4as16      mac      arm  loong64  ppc64le  riscv64    s390x
AddVV/words=1/impl=go                    ~  +11.20%   +5.11%   -7.67%   -7.77%   +1.90%  +10.76%  -33.22%        ~  +10.98%        ~   +6.60%
AddVV/words=10/impl=go             -22.12%  -13.48%  -10.37%  -17.95%  -18.07%  -24.58%  -22.04%  -29.95%  -14.22%        ~   -6.33%   +3.66%
AddVV/words=16/impl=go              -9.75%  -13.73%        ~  -21.90%  -18.66%  -30.03%  -20.45%  -28.09%  -17.33%   -7.15%   -8.96%  +12.55%
AddVV/words=100/impl=go             -5.91%   -1.02%        ~  -29.23%  -22.18%  -25.62%   -6.49%  -23.59%  -22.31%   -1.88%  -14.13%   +9.23%
AddVV/words=1000/impl=go            -0.52%   -0.19%   -3.58%  -33.89%  -23.46%  -22.46%        ~  -24.00%  -24.73%   +0.93%  -15.79%  +12.32%
AddVV/words=10000/impl=go                ~        ~        ~  -33.79%  -23.72%  -23.79%   -5.98%  -23.92%        ~   +0.78%  -15.45%   +8.59%
AddVV/words=100000/impl=go               ~        ~        ~  -33.90%  -24.25%  -22.82%   -4.09%  -24.63%        ~   +1.00%  -13.56%        ~
SubVV/words=1/impl=go                    ~  +11.64%  +14.05%        ~   -4.07%        ~  +10.79%  -33.69%        ~        ~   +3.89%  +12.33%
SubVV/words=10/impl=go             -10.31%  -14.09%   -7.38%  +13.76%  -13.25%  -18.05%  -20.08%  -24.97%  -14.15%  +10.13%   -0.97%   -2.51%
SubVV/words=16/impl=go              -8.06%  -13.73%   -5.70%  +17.00%  -12.83%  -23.76%  -17.52%  -25.25%  -17.30%   -2.80%   -4.96%  -18.25%
SubVV/words=100/impl=go             -9.22%   -1.30%   -2.76%  +20.88%  -14.35%  -15.29%   -8.49%  -19.64%  -22.31%   -0.68%  -14.30%   -9.04%
SubVV/words=1000/impl=go            -0.60%        ~   -3.43%  +23.08%  -16.14%  -11.96%        ~  -28.52%  -24.73%        ~  -15.95%   -9.91%
SubVV/words=10000/impl=go                ~        ~        ~  +26.01%  -15.24%  -11.92%        ~  -28.26%   +4.25%        ~  -15.42%   -5.95%
SubVV/words=100000/impl=go               ~        ~        ~  +25.71%  -15.83%  -12.13%        ~  -27.88%   -1.27%        ~  -13.57%   -6.72%
LshVU/words=1/impl=go               +0.56%   +0.36%        ~        ~        ~        ~        ~        ~        ~        ~        ~        ~
LshVU/words=10/impl=go             +13.37%   +4.63%        ~        ~        ~        ~        ~   -2.90%        ~        ~        ~        ~
LshVU/words=16/impl=go             +22.83%   +6.47%        ~        ~        ~        ~        ~        ~   +0.80%        ~        ~   +5.88%
LshVU/words=100/impl=go             +7.56%  +13.95%        ~        ~        ~        ~        ~        ~   +0.33%   -2.50%        ~        ~
LshVU/words=1000/impl=go            +0.64%  +17.92%        ~        ~        ~        ~        ~   -6.52%        ~   -2.58%        ~        ~
LshVU/words=10000/impl=go                ~  +17.60%        ~        ~        ~        ~        ~   -6.64%   -6.22%   -1.40%        ~        ~
LshVU/words=100000/impl=go               ~  +14.57%        ~        ~        ~        ~        ~        ~   -5.47%        ~        ~        ~
RshVU/words=1/impl=go                    ~        ~        ~        ~        ~        ~        ~        ~        ~        ~        ~   +2.72%
RshVU/words=10/impl=go                   ~        ~        ~        ~        ~        ~        ~   +2.50%        ~        ~        ~        ~
RshVU/words=16/impl=go                   ~   +0.53%        ~        ~        ~        ~        ~   +3.82%        ~        ~        ~        ~
RshVU/words=100/impl=go                  ~        ~        ~        ~        ~        ~        ~   +6.18%        ~        ~        ~        ~
RshVU/words=1000/impl=go                 ~        ~        ~        ~        ~        ~        ~   +7.00%        ~        ~        ~        ~
RshVU/words=10000/impl=go                ~        ~        ~        ~        ~        ~        ~        ~        ~        ~        ~        ~
RshVU/words=100000/impl=go               ~        ~        ~        ~        ~        ~        ~   +7.05%        ~        ~        ~        ~
MulAddVWW/words=1/impl=go          -10.34%   +4.43%  +10.62%   -1.62%   -4.74%   -2.86%  +11.75%        ~   -8.00%   +8.89%   +3.87%        ~
MulAddVWW/words=10/impl=go          -1.61%   -5.87%        ~   -8.30%   -4.55%   +0.87%        ~   -5.28%  -20.82%        ~        ~   -2.32%
MulAddVWW/words=16/impl=go          -2.96%   -5.28%        ~   -9.22%   -5.28%        ~        ~   -3.74%  -19.52%   -1.48%   -2.53%   -9.52%
MulAddVWW/words=100/impl=go         -3.89%   -7.53%   +1.93%  -10.49%   -4.87%   -8.27%        ~        ~   -0.65%   -0.61%   -7.59%  -20.61%
MulAddVWW/words=1000/impl=go        -0.45%   -3.91%   +4.54%  -11.46%   -4.69%   -8.53%        ~        ~   -0.05%        ~   -8.88%  -19.77%
MulAddVWW/words=10000/impl=go            ~   -3.30%   +4.10%  -11.34%   -4.10%   -9.43%        ~   -0.61%        ~   -0.55%   -8.21%  -18.48%
MulAddVWW/words=100000/impl=go      -0.30%   -3.03%   +4.31%  -11.55%   -4.41%   -9.74%        ~   -0.75%   +0.63%        ~   -7.80%  -19.82%
AddMulVVWW/words=1/impl=go               ~  +13.09%  +12.50%   -7.05%  -10.41%   +2.53%  +13.32%   -3.49%        ~  +15.56%   +3.62%        ~
AddMulVVWW/words=10/impl=go        -15.96%   -9.06%   -5.06%  -14.56%  -11.83%   -5.44%  -26.30%  -14.23%  -11.44%   -1.79%   -5.93%   -6.60%
AddMulVVWW/words=16/impl=go        -19.05%  -12.43%   -6.19%  -14.24%  -12.67%   -8.65%  -18.64%  -16.56%  -10.64%   -3.00%   -7.61%  -12.80%
AddMulVVWW/words=100/impl=go       -22.13%  -16.59%  -13.04%  -13.79%  -11.46%  -12.01%   -6.46%  -21.80%   -5.08%   -3.13%  -13.60%  -22.53%
AddMulVVWW/words=1000/impl=go      -17.07%  -17.05%  -14.08%  -13.59%  -12.13%  -11.21%        ~  -22.81%   -4.27%   -1.27%  -16.35%  -23.47%
AddMulVVWW/words=10000/impl=go     -15.03%  -16.78%  -14.23%  -13.86%  -11.84%  -11.69%        ~  -22.75%  -13.39%   -1.10%  -14.37%  -22.01%
AddMulVVWW/words=100000/impl=go    -13.70%  -14.90%  -14.26%  -13.55%  -12.04%  -11.63%        ~  -22.61%        ~   -2.53%  -10.42%  -23.16%

Change-Id: Ic6f64344484a762b818c7090d1396afceb638607
Reviewed-on: https://go-review.googlesource.com/c/go/+/665155
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-04-19 08:19:27 -07:00
Russ Cox
7f516a31b0 math/big: replace assembly with mini-compiler output
Step 4 of the mini-compiler: switch to the new generated assembly.
No systematic performance regressions, and many many improvements.

In the benchmarks, the systems are:

	c3h88     GOARCH=amd64     c3h88 perf gomote (newer Intel, Google Cloud)
	c2s16     GOARCH=amd64     c2s16 perf gomote (Intel, Google Cloud)
	s7        GOARCH=amd64     rsc basement server (AMD Ryzen 9 7950X)
	386       GOARCH=386       gotip-linux-386 gomote (Intel, Google Cloud)
	s7-386    GOARCH=386       rsc basement server (AMD Ryzen 9 7950X)
	c4as16    GOARCH=arm64     c4as16 perf gomote (Google Cloud)
	mac       GOARCH=arm64     Apple M3 Pro in MacBook Pro
	arm       GOARCH=arm       gotip-linux-arm gomote
	loong64   GOARCH=loong64   gotip-linux-loong64 gomote
	ppc64le   GOARCH=ppc64le   gotip-linux-ppc64le gomote
	riscv64   GOARCH=riscv64   gotip-linux-riscv64 gomote
	s390x     GOARCH=s390x     linux-s390x-ibm old gomote

benchmark \ system           c3h88    c2s16       s7      386   s7-386   c4as16      mac      arm  loong64  ppc64le  riscv64    s390x
AddVV/words=1               -4.03%   +5.21%   -4.04%   +4.94%        ~        ~        ~        ~  -19.51%        ~        ~        ~
AddVV/words=10             -10.20%   +0.34%   -3.46%  -11.50%   -7.46%   +7.66%   +5.97%        ~  -17.90%        ~        ~        ~
AddVV/words=16             -10.91%   -6.45%   -8.45%  -21.86%  -17.90%   +2.73%   -1.61%        ~  -22.47%   -3.54%        ~        ~
AddVV/words=100             -3.77%   -4.30%   -3.17%  -47.27%  -45.34%   -0.78%        ~   -8.74%  -27.19%        ~        ~        ~
AddVV/words=1000            -0.08%   -0.71%        ~  -49.21%  -48.07%        ~        ~  -16.80%  -24.74%        ~        ~        ~
AddVV/words=10000                ~        ~        ~  -48.73%  -48.56%   -0.06%        ~  -17.08%        ~        ~   -4.81%        ~
AddVV/words=100000               ~        ~        ~  -47.80%  -48.38%        ~        ~  -15.10%  -25.06%        ~   -5.34%        ~
SubVV/words=1               -0.84%   +3.43%   -3.62%   +1.34%        ~   -0.76%        ~        ~  -18.18%   +5.58%        ~        ~
SubVV/words=10              -9.99%   +0.34%        ~  -11.23%   -8.24%   +7.53%   +6.15%        ~  -17.55%   +2.77%   -2.08%        ~
SubVV/words=16             -11.94%   -6.45%   -6.81%  -21.82%  -18.11%   +1.58%   -1.21%        ~  -20.36%        ~        ~        ~
SubVV/words=100             -3.38%   -4.32%   -1.80%  -46.14%  -46.43%   +0.41%        ~   -7.20%  -26.17%        ~   -0.42%        ~
SubVV/words=1000            -0.38%   -0.80%        ~  -49.22%  -48.90%        ~        ~  -15.86%  -24.73%        ~        ~        ~
SubVV/words=10000                ~        ~        ~  -49.57%  -49.64%   -0.03%        ~  -15.85%  -26.52%        ~   -5.05%        ~
SubVV/words=100000               ~        ~        ~  -46.88%  -49.66%        ~        ~  -15.45%  -16.11%        ~   -4.99%        ~
LshVU/words=1                    ~   +5.78%        ~        ~   -2.48%   +1.61%   +2.18%   +2.70%  -18.16%  -34.16%  -21.29%        ~
LshVU/words=10             -18.34%   -3.78%   +2.21%        ~        ~   -2.81%  -12.54%        ~  -25.02%  -24.78%  -38.11%  -66.98%
LshVU/words=16             -23.15%   +1.03%   +7.74%   +0.73%        ~   +8.88%   +1.56%        ~  -25.37%  -28.46%  -41.27%        ~
LshVU/words=100            -32.85%   -8.86%   -2.58%        ~   +2.69%   +1.24%        ~  -20.63%  -44.14%  -42.68%  -53.09%        ~
LshVU/words=1000           -37.30%   -0.20%   +5.67%        ~        ~   +1.44%        ~  -27.83%  -45.01%  -37.07%  -57.02%  -46.57%
LshVU/words=10000          -36.84%   -2.30%   +3.82%        ~   +1.86%   +1.57%  -66.81%  -28.00%  -13.15%  -35.40%  -41.97%        ~
LshVU/words=100000         -40.30%        ~   +3.96%        ~        ~        ~        ~  -24.91%  -19.06%  -36.14%  -40.99%  -66.03%
RshVU/words=1               -3.17%   +4.76%   -4.06%   +4.31%   +4.55%        ~        ~        ~  -20.61%        ~  -26.20%  -51.33%
RshVU/words=10             -22.08%   -4.41%  -17.99%   +3.64%  -11.87%        ~  -16.30%        ~  -30.01%        ~  -40.37%  -63.05%
RshVU/words=16             -26.03%   -8.50%  -18.09%        ~  -17.52%   +6.50%        ~   -2.85%  -30.24%        ~  -42.93%  -63.13%
RshVU/words=100            -20.87%  -28.83%  -29.45%        ~  -26.25%   +1.46%   -1.14%  -16.20%  -45.65%  -16.20%  -53.66%  -77.27%
RshVU/words=1000           -24.03%  -21.37%  -26.71%        ~  -28.95%   +0.98%        ~  -18.82%  -45.21%  -23.55%  -57.09%  -71.18%
RshVU/words=10000          -24.56%  -22.44%  -27.01%        ~  -28.88%   +0.78%   -5.35%  -17.47%  -16.87%  -20.67%  -41.97%        ~
RshVU/words=100000         -23.36%  -15.65%  -27.54%        ~  -29.26%   +1.73%   -6.67%  -13.68%  -21.40%  -23.02%  -40.37%  -66.31%
MulAddVWW/words=1           +2.37%   +8.14%        ~   +4.10%   +3.71%        ~        ~        ~  -21.62%        ~   +1.12%        ~
MulAddVWW/words=10               ~   -2.72%  -15.15%   +8.04%        ~        ~        ~   -2.52%  -19.48%        ~   -6.18%        ~
MulAddVWW/words=16               ~   +1.49%        ~   +4.49%   +6.58%   -8.70%   -7.16%  -12.08%  -21.43%   -6.59%   -9.05%        ~
MulAddVWW/words=100         +0.37%   +1.11%   -4.51%  -13.59%        ~  -11.10%   -3.63%  -21.40%  -22.27%   -2.92%  -14.41%        ~
MulAddVWW/words=1000             ~   +0.90%   -7.13%  -18.94%        ~  -14.02%   -9.97%  -28.31%  -18.72%   -2.32%  -15.80%        ~
MulAddVWW/words=10000            ~   +1.08%   -6.75%  -19.10%        ~  -14.61%   -9.04%  -28.48%  -14.29%   -2.25%   -9.40%        ~
MulAddVWW/words=100000           ~        ~   -6.93%  -18.09%        ~  -14.33%   -9.66%  -28.92%  -16.63%   -2.43%   -8.23%        ~
AddMulVVWW/words=1          +2.30%   +4.83%  -11.37%   +4.58%        ~   -3.14%        ~        ~  -10.58%  +30.35%        ~        ~
AddMulVVWW/words=10         -3.27%        ~   +8.96%   +5.74%        ~   +2.67%   -1.44%   -7.64%  -13.41%        ~        ~        ~
AddMulVVWW/words=16         -6.12%        ~        ~        ~   +1.91%   -7.90%  -16.22%  -14.07%  -14.26%   -4.15%   -7.30%        ~
AddMulVVWW/words=100        -5.48%   -2.14%        ~   -9.40%   +9.98%   -1.43%  -12.35%  -18.56%  -21.94%        ~   -9.84%        ~
AddMulVVWW/words=1000      -11.35%   -3.40%   -3.64%  -11.04%  +12.82%   -1.33%  -15.63%  -20.50%  -20.95%        ~  -11.06%  -51.97%
AddMulVVWW/words=10000     -10.31%   -1.61%   -8.41%  -12.15%  +13.10%   -1.03%  -16.34%  -22.46%   -1.00%        ~  -10.33%  -49.80%
AddMulVVWW/words=100000    -13.71%        ~   -8.31%  -12.18%  +12.98%   -1.35%  -15.20%  -21.89%        ~        ~   -9.38%  -48.30%

Change-Id: I0a33c33602c0d053c84d9946e662500cfa048e2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/664938
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-19 08:19:23 -07:00
Russ Cox
39070da4f8 math/big: add shift and mul to mini-compiler
Step 3 of the mini-compiler: add the generators for the shift and mul routines.

Change-Id: I981d5b7086262c740036f5db768d3e63083984e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/664937
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-19 08:19:20 -07:00
Russ Cox
2a88106617 math/big: add all architectures to mini-compiler
Step 2 of the mini-compiler: add all the remaining architectures.

Change-Id: I8c5283aa8baa497785a5c15f2248528fa9ae886e
Reviewed-on: https://go-review.googlesource.com/c/go/+/664936
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-04-19 08:19:17 -07:00
Russ Cox
8cc98a04ef math/big: new mini-compiler for arith assembly
The arith assembly is big enough, and the details that you have to keep
in mind are complex enough and varied enough, that it is worth using
a Go program to generate the assembly. That way, all the architectures
can use the same algorithms, and porting to new architectures will be
easier.

This is the first of a sequence of CLs to introduce a new mini-compiler
for generating the arith assembly, in math/big/internal/asmgen.
This CL has the basics of the compiler as well as a couple simple
architectures and the generator for addVV/subVV. It does not check
in the generated assembly yet. That will happen in a followup CL after
the other architectures and generators have been added.

Change-Id: Ib704c60fd972fc5690ac04d8fae3712ee2c1a80a
Reviewed-on: https://go-review.googlesource.com/c/go/+/664935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-04-19 08:19:14 -07:00
Russ Cox
a11643df8f math/big: replace addVW/subVW assembly with fast pure Go
The vast majority of the time, carry propagation is limited and
addVW/subVW only need to consider a single word for carry propagation.
As Josh Bleecher-Snyder pointed out in 2019 (CL 164968), once carrying
is done, the remaining words can be handled faster with copy (memmove).
In the benchmarks below, this is the data=random case.

Even more important, if the source and destination are the same,
the copy can be optimized away entirely, making a small in-place
addition to a big.Int O(1) instead of O(N). To date, only a few
systems (amd64, arm64, and pure Go, meaning wasm) make use of this
asymptotic improvement. This is the data=shortcut case.

This CL deletes the addVW/subVW assembly and replaces it with
an optimized pure Go version. Using Go makes it easy to call
the real copy builtin, which will use optimized memmove code,
instead of recreating a worse memmove in assembly (as arm64 does)
or omitting the copy optimization entirely (as most others do).

The worst case for the Go version versus assembly is the case
of incrementing 2^N-1 by 1, which has to propagate a carry
the entire length of the array. This is the data=carry case.
On balance, we believe this case is rare enough to be worth
taking a hit in that case, in exchange for significant wins
in the other cases and the deletion of significant amounts of
assembly of varying quality. (Remember that half the assembly has
the copy optimization and shortcut, while half does not.)

In the benchmarks, the systems are:

	c2s16     GOARCH=amd64     c2s16 perf gomote (Intel, Google Cloud)
	c3h88     GOARCH=amd64     c3h88 perf gomote (newer Intel, Google Cloud)
	s7        GOARCH=amd64     rsc basement server (AMD Ryzen 9 7950X)
	c4as16    GOARCH=arm64     c4as16 perf gomote (Google Cloud)
	mac       GOARCH=arm64     Apple M3 Pro in MacBook Pro
	386       GOARCH=386       gotip-linux-386 gomote
	arm       GOARCH=arm       gotip-linux-arm gomote
	loong64   GOARCH=loong64   gotip-linux-loong64 gomote
	ppc64le   GOARCH=ppc64le   gotip-linux-ppc64le gomote
	riscv64   GOARCH=riscv64   gotip-linux-riscv64 gomote

benchmark \ system                    c2s16     c3h88       s7    c4as16       mac       386      arm  loong64   ppc64le  riscv64

AddVW/words=1/data=random            -1.15%    -1.74%   -5.89%    -9.80%   -11.54%   +23.71%  -12.74%  -14.25%   +14.67%  +10.27%
AddVW/words=2/data=random            -2.59%         ~   -4.38%   -19.31%   -15.41%   +24.80%        ~  -19.99%   +13.73%  +19.71%
AddVW/words=3/data=random            -3.75%   -19.10%   -3.79%   -23.15%   -17.04%   +20.04%  -10.07%  -23.20%         ~  +15.39%
AddVW/words=4/data=random            -2.84%    +7.05%   -8.77%   -22.64%   -15.77%   +16.01%   -7.36%  -28.22%         ~  +23.00%
AddVW/words=5/data=random           -10.97%    +2.16%  -12.09%   -20.89%   -17.14%    +9.42%   -4.69%  -32.60%         ~  +10.07%
AddVW/words=6/data=random            -9.87%         ~   -7.54%   -19.08%    -6.46%         ~   -3.44%  -34.61%         ~  +12.19%
AddVW/words=7/data=random           -14.36%         ~  -10.09%   -19.10%   -10.47%    -6.20%   -5.06%  -38.14%   -11.54%   +6.79%
AddVW/words=8/data=random           -17.50%         ~  -11.06%   -25.14%   -12.88%    -8.35%   -5.11%  -41.39%   -14.04%  +11.87%
AddVW/words=9/data=random           -19.76%    -4.05%  -15.47%   -24.08%   -16.50%   -12.34%  -21.56%  -44.25%   -14.82%        ~
AddVW/words=10/data=random          -13.89%         ~   -9.69%   -23.06%    -8.04%   -12.58%  -19.25%  -32.80%   -11.68%        ~
AddVW/words=16/data=random          -29.36%   -15.35%  -21.86%   -25.04%   -19.89%   -32.26%  -16.29%  -42.66%   -25.92%   -3.01%
AddVW/words=32/data=random          -39.02%   -28.76%  -39.87%   -11.22%    -2.85%   -55.40%  -31.17%  -55.37%   -37.92%  -16.28%
AddVW/words=64/data=random          -25.94%   -19.09%  -20.60%    -6.90%    +8.91%   -51.00%  -43.72%  -62.27%   -44.11%  -28.74%
AddVW/words=100/data=random         -22.79%   -18.13%  -18.25%         ~   +33.89%   -67.40%  -51.77%  -63.54%   -53.75%  -30.97%
AddVW/words=1000/data=random         -8.98%    -3.84%        ~    -3.15%         ~   -93.35%  -63.92%  -65.66%   -68.67%  -42.30%
AddVW/words=10000/data=random        -1.38%    -0.38%        ~         ~         ~   -89.16%  -65.18%  -44.65%   -70.35%  -20.08%
AddVW/words=100000/data=random            ~         ~        ~         ~         ~   -87.03%  -64.51%  -36.08%   -61.40%  -16.53%

SubVW/words=1/data=random            -3.67%         ~   -8.38%   -10.26%    -3.07%   +45.78%   -6.06%  -11.17%         ~        ~
SubVW/words=2/data=random            -3.48%   -10.07%   -5.76%   -20.14%    -8.45%   +44.28%        ~  -19.09%         ~  +16.98%
SubVW/words=3/data=random            -7.11%   -26.64%   -4.48%   -22.07%    -9.21%   +35.61%        ~  -23.93%   -18.20%        ~
SubVW/words=4/data=random            -4.23%    +7.19%   -8.95%   -22.62%   -13.89%   +33.20%   -8.96%  -29.96%         ~  +22.23%
SubVW/words=5/data=random           -11.49%    +1.92%  -10.86%   -22.27%   -17.53%   +24.48%   -2.88%  -35.19%   -19.55%        ~
SubVW/words=6/data=random            -7.67%         ~   -7.72%   -18.44%    -6.24%   +12.03%   -2.00%  -39.68%   -10.73%        ~
SubVW/words=7/data=random           -13.69%   -18.32%  -11.82%   -18.92%   -11.57%    +6.63%        ~  -43.54%   -30.81%        ~
SubVW/words=8/data=random           -16.02%         ~  -11.07%   -24.50%   -11.92%    +4.32%   -3.01%  -46.95%   -24.14%        ~
SubVW/words=9/data=random           -18.76%    -3.34%  -14.84%   -23.79%   -17.50%         ~  -21.80%  -49.98%   -29.62%        ~
SubVW/words=10/data=random          -13.23%         ~   -9.25%   -21.26%   -11.63%         ~  -18.58%  -39.19%   -20.09%        ~
SubVW/words=16/data=random          -28.25%   -13.24%  -22.66%   -27.18%   -19.13%   -23.38%  -20.24%  -51.01%   -28.06%   -3.05%
SubVW/words=32/data=random          -38.41%   -28.88%  -40.12%   -11.20%    -2.80%   -49.17%  -34.67%  -63.29%   -39.25%  -15.20%
SubVW/words=64/data=random          -25.51%   -19.24%  -22.20%    -6.57%    +9.98%   -48.52%  -48.14%  -69.50%   -49.44%  -27.92%
SubVW/words=100/data=random         -21.69%   -18.51%        ~    +1.92%   +34.42%   -65.88%  -54.67%  -71.24%   -58.88%  -30.71%
SubVW/words=1000/data=random         -9.81%    -4.05%   -2.14%    -3.06%         ~   -93.37%  -67.33%  -74.12%   -68.36%  -42.17%
SubVW/words=10000/data=random             ~    -0.52%        ~         ~         ~   -88.87%  -68.54%  -44.94%   -70.63%  -19.95%
SubVW/words=100000/data=random            ~         ~        ~         ~         ~   -86.69%  -68.09%  -48.36%   -62.42%  -19.32%

AddVW/words=1/data=shortcut         -29.38%   -25.38%  -27.37%   -23.15%   -25.41%    +3.01%  -33.60%  -36.12%   -15.76%        ~
AddVW/words=2/data=shortcut         -32.79%   -34.72%  -31.47%   -24.47%   -28.21%    -3.75%  -34.66%  -43.89%   -23.65%  -21.56%
AddVW/words=3/data=shortcut         -38.50%   -46.83%  -35.67%   -26.38%   -30.29%   -10.41%  -44.89%  -47.68%   -30.93%  -26.85%
AddVW/words=4/data=shortcut         -40.40%   -28.85%  -34.19%   -29.83%   -32.95%   -16.09%  -42.86%  -51.02%   -34.19%  -26.69%
AddVW/words=5/data=shortcut         -43.87%   -35.42%  -36.46%   -32.59%   -37.72%   -20.82%  -45.14%  -54.01%   -35.49%  -30.48%
AddVW/words=6/data=shortcut         -46.98%   -39.34%  -42.22%   -35.43%   -38.18%   -27.46%  -46.72%  -56.61%   -40.21%  -34.07%
AddVW/words=7/data=shortcut         -49.63%   -47.97%  -46.61%   -35.28%   -41.93%   -31.14%  -49.29%  -58.89%   -41.10%  -37.01%
AddVW/words=8/data=shortcut         -50.48%   -42.33%  -45.40%   -40.24%   -41.74%   -32.92%  -50.62%  -60.98%   -44.85%  -38.10%
AddVW/words=9/data=shortcut         -54.27%   -43.52%  -49.06%   -42.16%   -45.22%   -37.57%  -51.84%  -62.91%   -46.04%  -40.82%
AddVW/words=10/data=shortcut        -56.01%   -45.40%  -51.42%   -43.29%   -46.14%   -38.65%  -53.65%  -64.62%   -47.05%  -43.21%
AddVW/words=16/data=shortcut        -62.73%   -55.66%  -59.31%   -56.38%   -54.31%   -53.16%  -61.03%  -72.29%   -58.24%  -52.57%
AddVW/words=32/data=shortcut        -74.00%   -69.42%  -71.75%   -33.65%   -37.35%   -71.73%  -72.59%  -82.44%   -70.87%  -67.69%
AddVW/words=64/data=shortcut        -56.69%   -52.72%  -52.09%   -35.48%   -36.87%   -84.24%  -83.10%  -90.37%   -82.56%  -80.81%
AddVW/words=100/data=shortcut       -56.68%   -53.18%  -51.49%   -33.49%   -37.72%   -89.95%  -88.21%  -93.37%   -88.47%  -86.52%
AddVW/words=1000/data=shortcut      -56.68%   -52.45%  -51.66%   -35.31%   -36.65%   -98.88%  -98.62%  -99.24%   -98.78%  -98.41%
AddVW/words=10000/data=shortcut     -56.70%   -52.40%  -51.92%   -33.49%   -36.98%   -99.89%  -99.86%  -99.92%   -99.87%  -99.91%
AddVW/words=100000/data=shortcut    -56.67%   -52.46%  -52.38%   -35.31%   -37.20%   -99.99%  -99.99%  -99.99%   -99.99%  -99.99%

SubVW/words=1/data=shortcut         -29.80%   -20.71%  -26.94%   -23.24%   -25.33%   +26.97%  -32.02%  -37.85%   -40.20%  -12.67%
SubVW/words=2/data=shortcut         -35.47%   -36.38%  -31.93%   -25.43%   -30.18%   +18.96%  -33.48%  -46.48%   -39.38%  -18.65%
SubVW/words=3/data=shortcut         -39.22%   -49.96%  -36.90%   -25.82%   -30.96%   +12.53%  -40.67%  -51.07%   -43.71%  -23.78%
SubVW/words=4/data=shortcut         -40.46%   -24.90%  -34.66%   -29.87%   -33.97%    +4.60%  -42.32%  -54.92%   -42.83%  -22.45%
SubVW/words=5/data=shortcut         -43.84%   -34.17%  -38.00%   -32.55%   -37.27%    -2.46%  -43.09%  -58.18%   -45.70%  -26.45%
SubVW/words=6/data=shortcut         -47.69%   -37.49%  -42.73%   -35.90%   -37.73%    -8.52%  -46.55%  -61.01%   -44.00%  -30.14%
SubVW/words=7/data=shortcut         -49.45%   -50.66%  -46.88%   -34.77%   -41.64%   -14.46%  -48.92%  -63.46%   -50.47%  -33.39%
SubVW/words=8/data=shortcut         -50.45%   -39.31%  -47.14%   -40.47%   -41.70%   -15.77%  -50.21%  -65.64%   -47.71%  -34.01%
SubVW/words=9/data=shortcut         -54.28%   -43.07%  -49.42%   -41.34%   -44.99%   -19.39%  -51.55%  -67.61%   -56.92%  -36.82%
SubVW/words=10/data=shortcut        -56.85%   -47.88%  -50.92%   -42.76%   -45.67%   -23.60%  -53.04%  -69.34%   -60.18%  -39.43%
SubVW/words=16/data=shortcut        -62.36%   -54.83%  -58.80%   -55.83%   -53.74%   -41.04%  -60.16%  -76.75%   -60.56%  -48.63%
SubVW/words=32/data=shortcut        -73.68%   -68.64%  -71.57%   -33.52%   -37.34%   -64.73%  -72.67%  -85.89%   -71.87%  -64.56%
SubVW/words=64/data=shortcut        -56.68%   -51.66%  -52.56%   -34.75%   -37.54%   -80.30%  -83.58%  -92.39%   -83.41%  -78.70%
SubVW/words=100/data=shortcut       -56.68%   -50.97%  -51.57%   -33.68%   -36.78%   -87.42%  -88.53%  -94.84%   -88.87%  -84.96%
SubVW/words=1000/data=shortcut      -56.68%   -50.89%  -52.10%   -34.94%   -37.77%   -98.59%  -98.71%  -99.43%   -98.80%  -98.20%
SubVW/words=10000/data=shortcut     -56.68%   -51.00%  -52.44%   -33.65%   -37.27%   -99.86%  -99.87%  -99.94%   -99.88%  -99.90%
SubVW/words=100000/data=shortcut    -56.68%   -50.80%  -52.20%   -34.79%   -37.46%   -99.99%  -99.99%  -99.99%   -99.99%  -99.99%

AddVW/words=1/data=carry             -0.51%    -5.29%  -24.03%   -26.48%         ~         ~  -33.14%  -30.23%         ~  -20.74%
AddVW/words=2/data=carry             -6.36%         ~  -21.05%   -39.40%         ~   +10.72%  -29.12%  -31.34%         ~  -17.29%
AddVW/words=3/data=carry                  ~         ~  -17.46%   -19.53%   +17.58%         ~  -26.23%  -23.61%    +7.80%  -14.34%
AddVW/words=4/data=carry            +19.02%   +16.80%        ~         ~   +28.25%         ~  -27.90%  -20.31%   +19.16%        ~
AddVW/words=5/data=carry             +3.97%   +53.02%        ~         ~   +11.31%         ~  -19.05%  -17.47%   +16.81%        ~
AddVW/words=6/data=carry             +2.98%   +19.83%        ~         ~   +14.84%         ~  -18.48%  -14.92%   +18.25%        ~
AddVW/words=7/data=carry                  ~         ~        ~         ~   +27.17%         ~  -15.50%  -12.74%   +13.00%        ~
AddVW/words=8/data=carry             +0.58%   +22.32%        ~    +6.10%   +29.63%         ~  -13.04%        ~   +28.46%   +2.95%
AddVW/words=9/data=carry                  ~   +31.53%        ~         ~   +14.42%         ~  -11.32%        ~   +18.37%   +3.28%
AddVW/words=10/data=carry            +3.94%   +22.36%        ~    +6.29%   +19.22%         ~  -11.27%        ~   +20.10%   +3.91%
AddVW/words=16/data=carry            +2.82%   +14.23%        ~   +10.06%   +25.91%   -16.12%        ~        ~   +52.28%  +10.40%
AddVW/words=32/data=carry                 ~   +25.35%  +13.66%         ~   +34.89%   -34.39%   +6.51%  -18.71%   +41.06%  +19.42%
AddVW/words=64/data=carry           -42.03%         ~  -39.70%    +6.65%   +32.29%   -39.94%  +14.34%        ~   +19.68%  +20.86%
AddVW/words=100/data=carry          -33.95%   -34.28%  -39.65%         ~   +27.72%   -26.80%  +17.40%        ~   +26.39%  +23.32%
AddVW/words=1000/data=carry         -42.49%   -47.87%  -47.44%    +1.25%    +4.25%   -41.76%  +23.40%        ~   +25.48%  +27.99%
AddVW/words=10000/data=carry        -41.85%   -48.49%  -49.43%         ~         ~   -42.09%  +24.61%  -10.32%   +40.55%  +18.35%
AddVW/words=100000/data=carry       -28.18%   -48.13%  -48.24%    +1.35%         ~   -42.90%  +24.73%   -9.79%   +22.55%  +17.16%

SubVW/words=1/data=carry            -10.32%   -17.16%  -24.14%   -26.24%         ~   +18.43%  -34.10%  -29.54%    -9.57%        ~
SubVW/words=2/data=carry            -19.45%   -23.31%  -20.74%   -39.73%         ~   +15.74%  -28.13%  -30.21%         ~  -18.74%
SubVW/words=3/data=carry                  ~   -16.18%  -15.34%   -19.54%   +17.62%   +12.39%  -27.64%  -27.09%         ~  -14.97%
SubVW/words=4/data=carry            +11.67%   +24.42%        ~         ~   +25.11%   +14.07%  -28.08%  -26.18%         ~        ~
SubVW/words=5/data=carry             +8.08%   +25.64%        ~         ~   +10.35%    +8.12%  -21.75%  -25.50%         ~   -4.86%
SubVW/words=6/data=carry                  ~   +13.82%        ~         ~   +12.92%    +6.79%  -20.25%  -24.70%         ~   -2.74%
SubVW/words=7/data=carry                  ~         ~   +8.29%    +4.51%   +26.59%    +4.62%  -18.01%  -24.09%         ~   -1.26%
SubVW/words=8/data=carry                  ~   +23.16%  +16.19%    +6.16%   +25.46%    +6.74%  -15.57%  -22.74%         ~   +1.44%
SubVW/words=9/data=carry                  ~   +30.71%  +20.81%         ~   +12.36%         ~  -12.99%        ~         ~   +3.13%
SubVW/words=10/data=carry            +5.03%   +19.53%  +14.84%   +14.16%   +16.12%         ~  -11.64%  -16.00%   +15.45%   +3.29%
SubVW/words=16/data=carry           +14.42%   +15.58%  +33.07%   +11.43%   +24.65%         ~        ~  -21.90%   +25.59%   +9.40%
SubVW/words=32/data=carry                 ~   +27.57%  +46.58%         ~   +35.35%    -8.49%        ~  -24.04%   +11.86%  +18.40%
SubVW/words=64/data=carry           -24.34%   -27.83%  -20.90%   +13.34%   +37.17%   -14.90%        ~   -8.81%   +12.88%  +18.92%
SubVW/words=100/data=carry          -25.19%   -34.70%  -27.45%   +12.86%   +28.42%   -14.48%        ~        ~   +25.71%  +21.93%
SubVW/words=1000/data=carry         -24.93%   -47.86%  -47.26%    +2.66%         ~   -23.88%        ~        ~   +25.99%  +27.81%
SubVW/words=10000/data=carry        -24.17%   -36.48%  -49.41%    +1.06%         ~   -25.06%        ~  -26.50%   +27.94%  +18.36%
SubVW/words=100000/data=carry       -22.51%   -35.86%  -49.46%    +3.96%         ~   -25.18%        ~  -22.15%   +26.86%  +15.44%

Change-Id: I8f252073040e674780ac6ec9912082fb205329dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/664898
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-18 15:07:59 -07:00
Russ Cox
b44b360dd4 math/big: add more complete tests and benchmarks of assembly
Also fix a few real but currently harmless bugs from CL 664895.
There were a few places that were still wrong if z != x or if a != 0.

Change-Id: Id8971e2505523bc4708780c82bf998a546f4f081
Reviewed-on: https://go-review.googlesource.com/c/go/+/664897
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-04-18 14:14:06 -07:00
Keith Randall
f4803ddc2c math/big: fix loong64 assembly for vet
Vet is failing on this code because some arguments of mulAddVWW
got renamed in the go decl (CL 664895) but not the assembly accessors.

Looks like the assembly got written before that CL but checked in
after that CL.

Change-Id: I270e8db5f8327aa2029c21a126fab1231a3506a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/665717
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2025-04-15 19:23:32 -07:00
Huang Qiqi
396a48bea6 math/big: optimize subVV function for loong64
Benchmark results on Loongson 3C5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3C5000 @ 2200.00MHz
             │ test/old_3c5000_subvv.log │      test/new_3c5000_subvv.log      │
             │          sec/op           │   sec/op     vs base                │
SubVV/1                     10.920n ± 0%   7.657n ± 0%  -29.88% (p=0.000 n=20)
SubVV/2                     14.100n ± 0%   8.841n ± 0%  -37.30% (p=0.000 n=20)
SubVV/3                      16.38n ± 0%   11.06n ± 0%  -32.48% (p=0.000 n=20)
SubVV/4                      18.65n ± 0%   12.85n ± 0%  -31.10% (p=0.000 n=20)
SubVV/5                      20.93n ± 0%   14.79n ± 0%  -29.34% (p=0.000 n=20)
SubVV/10                     32.30n ± 0%   22.29n ± 0%  -30.99% (p=0.000 n=20)
SubVV/100                    244.3n ± 0%   149.2n ± 0%  -38.93% (p=0.000 n=20)
SubVV/1000                   2.292µ ± 0%   1.378µ ± 0%  -39.88% (p=0.000 n=20)
SubVV/10000                  26.26µ ± 0%   25.64µ ± 0%   -2.33% (p=0.000 n=20)
SubVV/100000                 341.3µ ± 0%   238.0µ ± 0%  -30.26% (p=0.000 n=20)
geomean                      209.1n        144.5n       -30.86%

Change-Id: I3863c2c6728f1b0f8fecbf77de13254299c5b1cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/659877
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15 04:56:52 -07:00
Huang Qiqi
72fa8adbdc math/big: optimize mulAddVWW function for loong64
Benchmark results on Loongson 3A5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3A5000-HV @ 2500.00MHz
                 │ test/old_3a5000_muladdvww.log │    test/new_3a5000_muladdvww.log    │
                 │            sec/op             │   sec/op     vs base                │
MulAddVWW/1                          7.606n ± 0%   6.987n ± 0%   -8.14% (p=0.000 n=20)
MulAddVWW/2                          9.207n ± 0%   8.567n ± 0%   -6.95% (p=0.000 n=20)
MulAddVWW/3                         10.810n ± 0%   9.223n ± 0%  -14.68% (p=0.000 n=20)
MulAddVWW/4                          13.01n ± 0%   12.41n ± 0%   -4.61% (p=0.000 n=20)
MulAddVWW/5                          15.79n ± 0%   12.99n ± 0%  -17.73% (p=0.000 n=20)
MulAddVWW/10                         25.62n ± 0%   20.02n ± 0%  -21.86% (p=0.000 n=20)
MulAddVWW/100                        217.0n ± 0%   170.9n ± 0%  -21.24% (p=0.000 n=20)
MulAddVWW/1000                       2.064µ ± 0%   1.612µ ± 0%  -21.90% (p=0.000 n=20)
MulAddVWW/10000                      24.50µ ± 0%   16.74µ ± 0%  -31.66% (p=0.000 n=20)
MulAddVWW/100000                     239.1µ ± 0%   171.1µ ± 0%  -28.45% (p=0.000 n=20)
geomean                              159.2n        130.3n       -18.18%

Change-Id: I063434bc382f4f1234f879172ab671a3d6f2eb80
Reviewed-on: https://go-review.googlesource.com/c/go/+/659881
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15 04:56:20 -07:00
Huang Qiqi
24daaeea09 math/big: optimize subVW function for loong64
Benchmark results on Loongson 3C5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3C5000 @ 2200.00MHz
                │ test/old_3c5000_subvw.log │      test/new_3c5000_subvw.log      │
                │          sec/op           │   sec/op     vs base                │
SubVW/1                         8.564n ± 0%   5.915n ± 0%  -30.93% (p=0.000 n=20)
SubVW/2                        11.675n ± 0%   6.825n ± 0%  -41.54% (p=0.000 n=20)
SubVW/3                        13.410n ± 0%   7.969n ± 0%  -40.57% (p=0.000 n=20)
SubVW/4                        15.300n ± 0%   9.740n ± 0%  -36.34% (p=0.000 n=20)
SubVW/5                         17.34n ± 1%   10.66n ± 0%  -38.55% (p=0.000 n=20)
SubVW/10                        26.55n ± 0%   15.21n ± 0%  -42.70% (p=0.000 n=20)
SubVW/100                       199.2n ± 0%   102.5n ± 0%  -48.52% (p=0.000 n=20)
SubVW/1000                     1866.5n ± 1%   924.6n ± 0%  -50.46% (p=0.000 n=20)
SubVW/10000                     17.67µ ± 2%   12.04µ ± 2%  -31.83% (p=0.000 n=20)
SubVW/100000                    186.4µ ± 0%   132.0µ ± 0%  -29.17% (p=0.000 n=20)
SubVWext/1                      8.616n ± 0%   5.949n ± 0%  -30.95% (p=0.000 n=20)
SubVWext/2                     11.410n ± 0%   7.008n ± 1%  -38.58% (p=0.000 n=20)
SubVWext/3                     13.255n ± 1%   8.073n ± 0%  -39.09% (p=0.000 n=20)
SubVWext/4                     15.095n ± 0%   9.893n ± 0%  -34.47% (p=0.000 n=20)
SubVWext/5                      16.87n ± 0%   10.86n ± 0%  -35.63% (p=0.000 n=20)
SubVWext/10                     26.00n ± 0%   15.54n ± 0%  -40.22% (p=0.000 n=20)
SubVWext/100                    196.0n ± 0%   104.3n ± 1%  -46.76% (p=0.000 n=20)
SubVWext/1000                  1847.0n ± 0%   923.7n ± 0%  -49.99% (p=0.000 n=20)
SubVWext/10000                  17.30µ ± 1%   11.71µ ± 1%  -32.31% (p=0.000 n=20)
SubVWext/100000                 187.5µ ± 0%   131.6µ ± 0%  -29.82% (p=0.000 n=20)
geomean                         159.7n        97.79n       -38.79%

Change-Id: I21a6903e79b02cb22282e80c9bfe2ae9f1a87589
Reviewed-on: https://go-review.googlesource.com/c/go/+/659878
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-04-15 04:56:10 -07:00
Huang Qiqi
2fe0330cd7 math/big: optimize addVW function for loong64
Benchmark results on Loongson 3C5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3C5000 @ 2200.00MHz
                │ test/old_3c5000_addvw.log │      test/new_3c5000_addvw.log      │
                │          sec/op           │   sec/op     vs base                │
AddVW/1                         9.555n ± 0%   5.915n ± 0%  -38.09% (p=0.000 n=20)
AddVW/2                        11.370n ± 0%   6.825n ± 0%  -39.97% (p=0.000 n=20)
AddVW/3                        12.485n ± 0%   7.970n ± 0%  -36.16% (p=0.000 n=20)
AddVW/4                        14.980n ± 0%   9.718n ± 0%  -35.13% (p=0.000 n=20)
AddVW/5                         16.73n ± 0%   10.63n ± 0%  -36.46% (p=0.000 n=20)
AddVW/10                        24.57n ± 0%   15.18n ± 0%  -38.23% (p=0.000 n=20)
AddVW/100                       184.9n ± 0%   102.4n ± 0%  -44.62% (p=0.000 n=20)
AddVW/1000                     1721.0n ± 0%   921.4n ± 0%  -46.46% (p=0.000 n=20)
AddVW/10000                     16.83µ ± 0%   11.68µ ± 0%  -30.58% (p=0.000 n=20)
AddVW/100000                    184.7µ ± 0%   131.3µ ± 0%  -28.93% (p=0.000 n=20)
AddVWext/1                      9.554n ± 0%   5.915n ± 0%  -38.09% (p=0.000 n=20)
AddVWext/2                     11.370n ± 0%   6.825n ± 0%  -39.97% (p=0.000 n=20)
AddVWext/3                     12.505n ± 0%   7.969n ± 0%  -36.27% (p=0.000 n=20)
AddVWext/4                     14.980n ± 0%   9.718n ± 0%  -35.13% (p=0.000 n=20)
AddVWext/5                      16.70n ± 0%   10.63n ± 0%  -36.33% (p=0.000 n=20)
AddVWext/10                     24.54n ± 0%   15.18n ± 0%  -38.13% (p=0.000 n=20)
AddVWext/100                    185.0n ± 0%   102.4n ± 0%  -44.65% (p=0.000 n=20)
AddVWext/1000                  1721.0n ± 0%   921.4n ± 0%  -46.46% (p=0.000 n=20)
AddVWext/10000                  16.83µ ± 0%   11.68µ ± 0%  -30.60% (p=0.000 n=20)
AddVWext/100000                 184.9µ ± 0%   130.4µ ± 0%  -29.51% (p=0.000 n=20)
geomean                         155.5n        96.87n       -37.70%

Change-Id: I824a90cb365e09d7d0d4a2c53ff4b30cf057a75e
Reviewed-on: https://go-review.googlesource.com/c/go/+/659876
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-04-15 04:55:58 -07:00
Russ Cox
432fd9c60f math/big: remove copy responsibility from, rename shlVU, shrVU
It is annoying that non-x86 implementations of shlVU and shrVU
have to go out of their way to handle the trivial case shift==0
with their own copy loops. Instead, arrange to never call them
with shift==0, so that the code can be removed.

Unfortunately, there are linknames of shlVU, so we cannot
change that function. But we can rename the functions and
then leave behind a shlVU wrapper, so do that.

Since the big.Int API calls the operations Lsh and Rsh, rename
shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr
methods and functions to lsh/rsh.

Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121
Reviewed-on: https://go-review.googlesource.com/c/go/+/664896
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-11 06:03:38 -07:00
Russ Cox
4dffdd797b math/big: replace addMulVVW with addMulVVWW
addMulVVW is an unnecessarily special case.
All other assembly routines taking []Word (V as in vector) arguments
take separate source and destination. For example:

	addVV: z = x+y
	mulAddVWW: z = x*m+a

addMulVVW uses the z parameter as both destination and source:

	addMulVVW: z = z+x*m

Even looking at the signatures is confusing: all the VV routines take
two input vectors x and y, but addMulVVW takes only x: where is y?
(The answer is that the two inputs are z and x.)

It would be nice to fix this, both for understandability and regularity,
and to simplify a future assembly generator.

We cannot remove or redefine addMulVVW, because it has been used
in linknames. Instead, the CL adds a new final addend argument ‘a’
like in mulAddVWW, making the natural name addMulVVWW
(two input vectors, two input words):

	addMulVVWW: z = x+y*m+a

This CL updates all the assembly implementations to rename the
inputs z, x, y -> x, y, m, and then introduces a separate destination z.

Change-Id: Ib76c80b53f6d1f4a901f663566e9c4764bb20488
Reviewed-on: https://go-review.googlesource.com/c/go/+/664895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-04-11 05:42:18 -07:00
Vishwanatha HD
4ae6ab2bdf cmd/asm: add LCDBR instruction on s390x
This CL is to add LCDBR assembly instruction mnemonics, mainly used in math package.

The LCDBR instruction has the same effect as the FNEG pseudo-instructions, just that it sets the flag.

Change-Id: I3f00f1ed19148d074c3b6c5f64af0772289f2802
Reviewed-on: https://go-review.googlesource.com/c/go/+/648036
Reviewed-by: Srinivas Pokala <Pokala.Srinivas@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2025-03-24 07:05:55 -07:00
Russ Cox
a812e5f3c3 math/big: update calibration tests and recalibrate
Refactor calibration tests to use the same logic for all.

Choosing thresholds that are broadly appropriate for all systems is part science
but also part guesswork and judgement. We could instead set per-GOOS/GOARCH
thresholds, but that seems like too much work, and even then there would be
variation between different chips within a GOOS/GOARCH.
(For example see the three linux/amd64 systems benchmarked below.)

The thresholds chosen in this CL are:

	karatsubaThreshold = 40 // unchanged
	basicSqrThreshold = 12 // was 20
	karatsubaSqrThreshold = 80 // was 260
	divRecursiveThreshold = 40 // was 100

The new file calibrate.md explains the calibration process and links to graphs
justifying those values. (The graphs are hosted on swtch.com to avoid adding
a megabyte of extra data to the Go repo and Go distributions.)

A rendered copy of calibrate.md is at https://swtch.com/math/big/calibrate.html.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
                         │     old      │                 new                 │
                         │    sec/op    │   sec/op     vs base                │
Div/20/10-88                13.13n ± 2%   13.14n ± 2%        ~ (p=0.494 n=15)
Div/40/20-88                13.13n ± 2%   13.14n ± 2%        ~ (p=0.137 n=15)
Div/100/50-88               25.50n ± 0%   25.51n ± 0%        ~ (p=0.038 n=15)
Div/200/100-88              113.1n ± 1%   116.0n ± 3%   +2.56% (p=0.000 n=15)
Div/400/200-88              135.3n ± 0%   137.1n ± 1%        ~ (p=0.004 n=15)
Div/1000/500-88             259.9n ± 1%   259.0n ± 2%        ~ (p=0.182 n=15)
Div/2000/1000-88            568.8n ± 1%   564.7n ± 3%        ~ (p=0.927 n=15)
Div/20000/10000-88          25.79µ ± 1%   22.11µ ± 2%  -14.26% (p=0.000 n=15)
Div/200000/100000-88        755.1µ ± 1%   737.6µ ± 1%   -2.32% (p=0.000 n=15)
Div/2000000/1000000-88      31.30m ± 0%   31.20m ± 1%        ~ (p=0.081 n=15)
Div/20000000/10000000-88     1.268 ± 0%    1.265 ± 0%        ~ (p=0.011 n=15)
NatMul/10-88                142.6n ± 0%   142.9n ± 7%        ~ (p=0.145 n=15)
NatMul/100-88               4.347µ ± 0%   4.350µ ± 3%        ~ (p=0.430 n=15)
NatMul/1000-88              187.6µ ± 0%   188.4µ ± 2%        ~ (p=0.004 n=15)
NatMul/10000-88             8.052m ± 0%   8.057m ± 1%        ~ (p=0.148 n=15)
NatMul/100000-88            260.6m ± 0%   260.7m ± 0%        ~ (p=0.512 n=15)
NatSqr/1-88                 26.58n ± 5%   27.96n ± 8%        ~ (p=0.574 n=15)
NatSqr/2-88                 42.35n ± 7%   44.87n ± 6%        ~ (p=0.690 n=15)
NatSqr/3-88                 53.28n ± 4%   55.62n ± 5%        ~ (p=0.151 n=15)
NatSqr/5-88                 76.26n ± 6%   81.43n ± 6%   +6.78% (p=0.000 n=15)
NatSqr/8-88                 110.8n ± 5%   116.4n ± 6%        ~ (p=0.040 n=15)
NatSqr/10-88                141.4n ± 4%   147.8n ± 4%        ~ (p=0.011 n=15)
NatSqr/20-88                325.8n ± 3%   341.7n ± 4%   +4.88% (p=0.000 n=15)
NatSqr/30-88                536.8n ± 3%   556.1n ± 4%        ~ (p=0.027 n=15)
NatSqr/50-88                1.168µ ± 3%   1.197µ ± 3%        ~ (p=0.442 n=15)
NatSqr/80-88                2.527µ ± 2%   2.480µ ± 2%   -1.86% (p=0.000 n=15)
NatSqr/100-88               3.771µ ± 2%   3.535µ ± 2%   -6.26% (p=0.000 n=15)
NatSqr/200-88               14.03µ ± 2%   10.57µ ± 3%  -24.68% (p=0.000 n=15)
NatSqr/300-88               24.06µ ± 2%   20.57µ ± 2%  -14.52% (p=0.000 n=15)
NatSqr/500-88               65.43µ ± 1%   45.45µ ± 1%  -30.55% (p=0.000 n=15)
NatSqr/800-88              126.41µ ± 1%   94.13µ ± 2%  -25.54% (p=0.000 n=15)
NatSqr/1000-88              196.4µ ± 1%   135.1µ ± 1%  -31.18% (p=0.000 n=15)
NatSqr/10000-88             6.404m ± 0%   5.326m ± 1%  -16.84% (p=0.000 n=15)
NatSqr/100000-88            267.2m ± 0%   198.7m ± 0%  -25.64% (p=0.000 n=15)
geomean                     7.318µ        6.948µ        -5.06%

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-16               22.23n ± 0%   22.23n ± 0%        ~ (p=0.973 n=15)
Div/40/20-16               22.23n ± 0%   22.23n ± 0%        ~ (p=0.226 n=15)
Div/100/50-16              55.27n ± 1%   55.59n ± 0%        ~ (p=0.004 n=15)
Div/200/100-16             174.7n ± 3%   175.9n ± 2%        ~ (p=0.645 n=15)
Div/400/200-16             208.8n ± 1%   209.5n ± 2%        ~ (p=0.169 n=15)
Div/1000/500-16            378.7n ± 2%   380.5n ± 2%        ~ (p=0.091 n=15)
Div/2000/1000-16           778.4n ± 1%   781.1n ± 2%        ~ (p=0.104 n=15)
Div/20000/10000-16         25.16µ ± 1%   24.93µ ± 1%   -0.91% (p=0.000 n=15)
Div/200000/100000-16       926.4µ ± 0%   927.7µ ± 1%        ~ (p=0.436 n=15)
Div/2000000/1000000-16     35.58m ± 0%   35.53m ± 0%        ~ (p=0.267 n=15)
Div/20000000/10000000-16    1.333 ± 0%    1.330 ± 0%        ~ (p=0.126 n=15)
NatMul/10-16               172.6n ± 0%   165.4n ± 0%   -4.17% (p=0.000 n=15)
NatMul/100-16              5.706µ ± 0%   5.503µ ± 0%   -3.56% (p=0.000 n=15)
NatMul/1000-16             220.8µ ± 0%   219.1µ ± 0%   -0.76% (p=0.000 n=15)
NatMul/10000-16            8.688m ± 0%   8.621m ± 0%   -0.77% (p=0.000 n=15)
NatMul/100000-16           333.3m ± 0%   333.5m ± 0%        ~ (p=0.512 n=15)
NatSqr/1-16                28.66n ± 1%   28.42n ± 3%   -0.84% (p=0.000 n=15)
NatSqr/2-16                48.29n ± 2%   48.19n ± 2%        ~ (p=0.042 n=15)
NatSqr/3-16                59.93n ± 0%   59.64n ± 2%   -0.48% (p=0.000 n=15)
NatSqr/5-16                88.05n ± 0%   87.89n ± 3%        ~ (p=0.066 n=15)
NatSqr/8-16                127.7n ± 0%   126.9n ± 3%   -0.63% (p=0.000 n=15)
NatSqr/10-16               170.4n ± 0%   169.7n ± 3%        ~ (p=0.004 n=15)
NatSqr/20-16               388.8n ± 0%   392.9n ± 3%        ~ (p=0.123 n=15)
NatSqr/30-16               635.2n ± 0%   641.7n ± 3%        ~ (p=0.123 n=15)
NatSqr/50-16               1.304µ ± 1%   1.314µ ± 3%        ~ (p=0.927 n=15)
NatSqr/80-16               2.709µ ± 1%   2.899µ ± 4%   +7.01% (p=0.000 n=15)
NatSqr/100-16              3.885µ ± 0%   3.981µ ± 4%        ~ (p=0.123 n=15)
NatSqr/200-16              13.29µ ± 2%   12.14µ ± 4%   -8.67% (p=0.000 n=15)
NatSqr/300-16              23.39µ ± 0%   22.51µ ± 3%   -3.78% (p=0.000 n=15)
NatSqr/500-16              58.13µ ± 1%   50.56µ ± 2%  -13.02% (p=0.000 n=15)
NatSqr/800-16              118.4µ ± 1%   107.6µ ± 2%   -9.11% (p=0.000 n=15)
NatSqr/1000-16             172.7µ ± 1%   151.8µ ± 2%  -12.11% (p=0.000 n=15)
NatSqr/10000-16            6.065m ± 1%   5.757m ± 1%   -5.08% (p=0.000 n=15)
NatSqr/100000-16           240.9m ± 0%   228.1m ± 0%   -5.32% (p=0.000 n=15)
geomean                    8.601µ        8.453µ        -1.71%

goos: linux
goarch: amd64
pkg: math/big
cpu: AMD Ryzen 9 7950X 16-Core Processor
                         │     old      │                  new                  │
                         │    sec/op    │    sec/op      vs base                │
Div/20/10-32               11.11n ±  0%    11.11n ±  1%        ~ (p=0.532 n=15)
Div/40/20-32               11.08n ±  1%    11.11n ±  0%        ~ (p=0.815 n=15)
Div/100/50-32              16.81n ±  0%    16.84n ± 29%        ~ (p=0.020 n=15)
Div/200/100-32             73.91n ±  0%    76.85n ± 11%   +3.98% (p=0.000 n=15)
Div/400/200-32             87.35n ±  0%    88.91n ± 34%   +1.79% (p=0.000 n=15)
Div/1000/500-32            169.3n ±  1%    168.9n ±  1%        ~ (p=0.049 n=15)
Div/2000/1000-32           369.3n ±  0%    369.0n ±  0%        ~ (p=0.108 n=15)
Div/20000/10000-32         15.92µ ±  0%    13.55µ ±  2%  -14.91% (p=0.000 n=15)
Div/200000/100000-32       491.4µ ±  0%    482.4µ ±  1%   -1.84% (p=0.000 n=15)
Div/2000000/1000000-32     20.09m ±  0%    19.96m ±  0%   -0.69% (p=0.000 n=15)
Div/20000000/10000000-32   756.5m ±  0%    755.5m ±  0%        ~ (p=0.089 n=15)
NatMul/10-32               125.4n ±  5%    124.8n ±  1%        ~ (p=0.588 n=15)
NatMul/100-32              2.952µ ±  3%    2.969µ ±  0%        ~ (p=0.237 n=15)
NatMul/1000-32             120.7µ ±  0%    121.1µ ±  0%   +0.30% (p=0.000 n=15)
NatMul/10000-32            4.845m ±  0%    4.839m ±  1%        ~ (p=0.653 n=15)
NatMul/100000-32           173.3m ±  0%    173.3m ±  0%        ~ (p=0.838 n=15)
NatSqr/1-32                31.18n ± 23%    32.08n ±  2%        ~ (p=0.015 n=15)
NatSqr/2-32                57.22n ± 28%    58.88n ±  2%        ~ (p=0.054 n=15)
NatSqr/3-32                61.34n ± 18%    64.33n ±  2%        ~ (p=0.237 n=15)
NatSqr/5-32                72.47n ± 17%    79.81n ±  3%        ~ (p=0.067 n=15)
NatSqr/8-32                83.26n ± 26%   100.10n ±  3%        ~ (p=0.016 n=15)
NatSqr/10-32               87.31n ± 43%   125.50n ±  2%        ~ (p=0.003 n=15)
NatSqr/20-32               193.5n ± 25%    244.4n ± 13%        ~ (p=0.002 n=15)
NatSqr/30-32               323.9n ± 17%    380.9n ±  6%        ~ (p=0.003 n=15)
NatSqr/50-32               713.4n ±  9%    761.7n ±  8%        ~ (p=0.419 n=15)
NatSqr/80-32               1.486µ ±  7%    1.609µ ±  5%   +8.28% (p=0.000 n=15)
NatSqr/100-32              2.115µ ±  9%    2.253µ ±  1%        ~ (p=0.104 n=15)
NatSqr/200-32              7.201µ ±  4%    6.610µ ±  1%   -8.21% (p=0.000 n=15)
NatSqr/300-32              13.08µ ±  2%    12.37µ ±  1%   -5.41% (p=0.000 n=15)
NatSqr/500-32              32.56µ ±  2%    27.83µ ±  2%  -14.52% (p=0.000 n=15)
NatSqr/800-32              66.83µ ±  3%    59.59µ ±  1%  -10.83% (p=0.000 n=15)
NatSqr/1000-32             98.09µ ±  1%    83.59µ ±  1%  -14.78% (p=0.000 n=15)
NatSqr/10000-32            3.445m ±  1%    3.245m ±  0%   -5.81% (p=0.000 n=15)
NatSqr/100000-32           137.3m ±  0%    127.0m ±  0%   -7.54% (p=0.000 n=15)
geomean                    4.897µ          4.972µ         +1.52%

goos: linux
goarch: arm64
pkg: math/big
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-16               15.26n ± 2%   15.14n ± 1%        ~ (p=0.212 n=15)
Div/40/20-16               15.22n ± 1%   15.16n ± 0%        ~ (p=0.190 n=15)
Div/100/50-16              26.53n ± 2%   26.42n ± 0%   -0.41% (p=0.000 n=15)
Div/200/100-16             124.3n ± 0%   124.0n ± 0%        ~ (p=0.704 n=15)
Div/400/200-16             142.4n ± 0%   141.8n ± 0%        ~ (p=0.074 n=15)
Div/1000/500-16            262.0n ± 1%   261.3n ± 1%        ~ (p=0.046 n=15)
Div/2000/1000-16           532.6n ± 0%   532.5n ± 1%        ~ (p=0.798 n=15)
Div/20000/10000-16         22.27µ ± 0%   22.88µ ± 0%   +2.73% (p=0.000 n=15)
Div/200000/100000-16       890.4µ ± 0%   902.8µ ± 0%   +1.39% (p=0.000 n=15)
Div/2000000/1000000-16     35.03m ± 0%   35.10m ± 0%        ~ (p=0.305 n=15)
Div/20000000/10000000-16    1.380 ± 0%    1.385 ± 0%        ~ (p=0.019 n=15)
NatMul/10-16               177.6n ± 1%   175.6n ± 3%        ~ (p=0.480 n=15)
NatMul/100-16              5.675µ ± 0%   5.669µ ± 1%        ~ (p=0.705 n=15)
NatMul/1000-16             224.3µ ± 0%   224.6µ ± 0%        ~ (p=0.653 n=15)
NatMul/10000-16            8.735m ± 0%   8.739m ± 0%        ~ (p=0.567 n=15)
NatMul/100000-16           331.6m ± 0%   331.6m ± 1%        ~ (p=0.412 n=15)
NatSqr/1-16                43.69n ± 2%   42.77n ± 6%        ~ (p=0.383 n=15)
NatSqr/2-16                65.26n ± 2%   63.91n ± 5%        ~ (p=0.285 n=15)
NatSqr/3-16                73.95n ± 1%   72.25n ± 6%        ~ (p=0.198 n=15)
NatSqr/5-16                95.06n ± 1%   94.21n ± 3%        ~ (p=0.721 n=15)
NatSqr/8-16                155.5n ± 1%   153.4n ± 4%        ~ (p=0.170 n=15)
NatSqr/10-16               175.4n ± 1%   174.0n ± 2%        ~ (p=0.271 n=15)
NatSqr/20-16               360.8n ± 0%   358.5n ± 2%        ~ (p=0.170 n=15)
NatSqr/30-16               584.7n ± 0%   582.9n ± 1%        ~ (p=0.170 n=15)
NatSqr/50-16               1.323µ ± 0%   1.322µ ± 0%        ~ (p=0.627 n=15)
NatSqr/80-16               2.916µ ± 0%   2.674µ ± 0%   -8.30% (p=0.000 n=15)
NatSqr/100-16              4.365µ ± 0%   3.802µ ± 0%  -12.90% (p=0.000 n=15)
NatSqr/200-16              16.42µ ± 0%   11.29µ ± 0%  -31.26% (p=0.000 n=15)
NatSqr/300-16              28.07µ ± 0%   22.83µ ± 0%  -18.68% (p=0.000 n=15)
NatSqr/500-16              76.30µ ± 0%   50.06µ ± 0%  -34.39% (p=0.000 n=15)
NatSqr/800-16              147.5µ ± 0%   101.2µ ± 1%  -31.41% (p=0.000 n=15)
NatSqr/1000-16             228.6µ ± 0%   149.5µ ± 0%  -34.61% (p=0.000 n=15)
NatSqr/10000-16            7.417m ± 0%   6.025m ± 0%  -18.76% (p=0.000 n=15)
NatSqr/100000-16           309.2m ± 0%   214.9m ± 0%  -30.50% (p=0.000 n=15)
geomean                    8.559µ        7.906µ        -7.63%

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                         │     old      │                 new                 │
                         │    sec/op    │   sec/op     vs base                │
Div/20/10-12                9.577n ± 6%   9.473n ± 5%        ~ (p=0.384 n=15)
Div/40/20-12                9.480n ± 1%   9.430n ± 1%        ~ (p=0.019 n=15)
Div/100/50-12               14.82n ± 0%   14.82n ± 0%        ~ (p=0.845 n=15)
Div/200/100-12              83.94n ± 1%   84.35n ± 4%        ~ (p=0.512 n=15)
Div/400/200-12              102.7n ± 1%   102.9n ± 0%        ~ (p=0.845 n=15)
Div/1000/500-12             185.3n ± 1%   181.9n ± 1%   -1.83% (p=0.000 n=15)
Div/2000/1000-12            397.0n ± 1%   396.7n ± 0%        ~ (p=0.959 n=15)
Div/20000/10000-12          14.05µ ± 0%   13.70µ ± 1%        ~ (p=0.002 n=15)
Div/200000/100000-12        529.4µ ± 3%   526.7µ ± 2%        ~ (p=0.967 n=15)
Div/2000000/1000000-12      20.05m ± 0%   20.05m ± 0%        ~ (p=0.653 n=15)
Div/20000000/10000000-12    788.2m ± 1%   789.0m ± 1%        ~ (p=0.412 n=15)
NatMul/10-12                79.95n ± 1%   80.87n ± 1%   +1.15% (p=0.000 n=15)
NatMul/100-12               2.973µ ± 0%   2.986µ ± 2%        ~ (p=0.051 n=15)
NatMul/1000-12              122.6µ ± 5%   123.0µ ± 1%        ~ (p=0.783 n=15)
NatMul/10000-12             4.990m ± 1%   5.000m ± 1%        ~ (p=0.653 n=15)
NatMul/100000-12            185.3m ± 3%   190.3m ± 1%        ~ (p=0.089 n=15)
NatSqr/1-12                 11.84n ± 1%   11.88n ± 1%        ~ (p=0.735 n=15)
NatSqr/2-12                 21.01n ± 1%   21.44n ± 6%        ~ (p=0.039 n=15)
NatSqr/3-12                 25.59n ± 0%   26.74n ± 9%   +4.49% (p=0.000 n=15)
NatSqr/5-12                 36.78n ± 0%   37.04n ± 1%   +0.71% (p=0.000 n=15)
NatSqr/8-12                 63.09n ± 3%   63.22n ± 1%        ~ (p=0.846 n=15)
NatSqr/10-12                79.98n ± 0%   79.78n ± 0%        ~ (p=0.100 n=15)
NatSqr/20-12                174.0n ± 0%   175.5n ± 1%        ~ (p=0.361 n=15)
NatSqr/30-12                290.0n ± 0%   291.4n ± 0%        ~ (p=0.002 n=15)
NatSqr/50-12                655.2n ± 4%   658.1n ± 0%        ~ (p=0.060 n=15)
NatSqr/80-12                1.506µ ± 0%   1.397µ ± 5%   -7.24% (p=0.000 n=15)
NatSqr/100-12               2.273µ ± 0%   2.005µ ± 5%  -11.79% (p=0.000 n=15)
NatSqr/200-12               8.833µ ± 6%   6.109µ ± 0%  -30.84% (p=0.000 n=15)
NatSqr/300-12               15.15µ ± 4%   12.37µ ± 0%  -18.34% (p=0.000 n=15)
NatSqr/500-12               41.89µ ± 0%   27.70µ ± 1%  -33.88% (p=0.000 n=15)
NatSqr/800-12               80.72µ ± 0%   56.40µ ± 0%  -30.12% (p=0.000 n=15)
NatSqr/1000-12             127.06µ ± 1%   84.06µ ± 1%  -33.84% (p=0.000 n=15)
NatSqr/10000-12             4.130m ± 0%   3.390m ± 0%  -17.91% (p=0.000 n=15)
NatSqr/100000-12            173.2m ± 0%   131.2m ± 6%  -24.25% (p=0.000 n=15)
geomean                     4.489µ        4.189µ        -6.68%

Change-Id: Iaf65fd85457b003ebf07a787c875cda321b40cc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/652058
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-03-12 05:41:50 -07:00
Russ Cox
d037ed62bc math/big: simplify, speed up Karatsuba multiplication
The old Karatsuba implementation only operated on lengths that are
a power of two times a number smaller than karatsubaThreshold.
For example, when karatsubaThreshold = 40, multiplying a pair
of 99-word numbers runs karatsuba on the low 96 (= 39<<2) words
and then has to fix up the answer to include the high 3 words of each.

I suspect this requirement was needed to make the analysis of
how many temporary words to reserve easier, back when the
answer was 3*n and depended on exactly halving the size at
each Karatsuba step.

Now that we have the more flexible temporary allocation stack,
we can change Karatsuba to accept operands of odd length.
Doing so avoids most of the fixup that the old approach required.
For example, multiplying a pair of 99-word numbers runs
karatsuba on all 99 words now.

This is simpler and about the same speed or, for large cases, faster.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-16         99.62n ± 3%   99.10n ± 3%        ~ (p=0.009 n=15)
GCD10x10/WithXY-16            243.4n ± 1%   245.2n ± 1%        ~ (p=0.009 n=15)
GCD100x100/WithoutXY-16       921.9n ± 1%   919.2n ± 1%        ~ (p=0.076 n=15)
GCD100x100/WithXY-16          1.527µ ± 1%   1.526µ ± 0%        ~ (p=0.813 n=15)
GCD1000x1000/WithoutXY-16     9.704µ ± 1%   9.696µ ± 0%        ~ (p=0.532 n=15)
GCD1000x1000/WithXY-16        14.03µ ± 1%   13.96µ ± 0%        ~ (p=0.014 n=15)
GCD10000x10000/WithoutXY-16   206.5µ ± 2%   206.5µ ± 0%        ~ (p=0.967 n=15)
GCD10000x10000/WithXY-16      398.0µ ± 1%   397.4µ ± 0%        ~ (p=0.683 n=15)
Div/20/10-16                  22.22n ± 0%   22.23n ± 0%        ~ (p=0.105 n=15)
Div/40/20-16                  22.23n ± 0%   22.23n ± 0%        ~ (p=0.307 n=15)
Div/100/50-16                 55.47n ± 0%   55.47n ± 0%        ~ (p=0.573 n=15)
Div/200/100-16                174.9n ± 1%   174.6n ± 1%        ~ (p=0.814 n=15)
Div/400/200-16                209.5n ± 1%   210.5n ± 1%        ~ (p=0.454 n=15)
Div/1000/500-16               379.9n ± 0%   383.5n ± 2%        ~ (p=0.123 n=15)
Div/2000/1000-16              780.1n ± 0%   784.6n ± 1%   +0.58% (p=0.000 n=15)
Div/20000/10000-16            25.22µ ± 1%   25.15µ ± 0%        ~ (p=0.213 n=15)
Div/200000/100000-16          921.8µ ± 1%   926.1µ ± 0%        ~ (p=0.009 n=15)
Div/2000000/1000000-16        37.91m ± 0%   35.63m ± 0%   -6.02% (p=0.000 n=15)
Div/20000000/10000000-16       1.378 ± 0%    1.336 ± 0%   -3.03% (p=0.000 n=15)
NatMul/10-16                  166.8n ± 4%   168.9n ± 3%        ~ (p=0.008 n=15)
NatMul/100-16                 5.519µ ± 2%   5.548µ ± 4%        ~ (p=0.032 n=15)
NatMul/1000-16                230.4µ ± 1%   220.2µ ± 1%   -4.43% (p=0.000 n=15)
NatMul/10000-16               8.569m ± 1%   8.640m ± 1%        ~ (p=0.005 n=15)
NatMul/100000-16              376.5m ± 1%   334.1m ± 0%  -11.26% (p=0.000 n=15)
NatSqr/1-16                   27.85n ± 5%   28.60n ± 2%        ~ (p=0.123 n=15)
NatSqr/2-16                   47.99n ± 2%   48.84n ± 1%        ~ (p=0.008 n=15)
NatSqr/3-16                   59.41n ± 2%   60.87n ± 2%   +2.46% (p=0.001 n=15)
NatSqr/5-16                   87.27n ± 2%   89.31n ± 3%        ~ (p=0.087 n=15)
NatSqr/8-16                   124.6n ± 3%   128.9n ± 3%        ~ (p=0.006 n=15)
NatSqr/10-16                  166.3n ± 3%   172.7n ± 3%        ~ (p=0.002 n=15)
NatSqr/20-16                  385.2n ± 2%   394.7n ± 3%        ~ (p=0.036 n=15)
NatSqr/30-16                  622.7n ± 3%   642.9n ± 3%        ~ (p=0.032 n=15)
NatSqr/50-16                  1.274µ ± 3%   1.323µ ± 4%        ~ (p=0.003 n=15)
NatSqr/80-16                  2.606µ ± 4%   2.714µ ± 4%        ~ (p=0.044 n=15)
NatSqr/100-16                 3.731µ ± 4%   3.871µ ± 4%        ~ (p=0.038 n=15)
NatSqr/200-16                 12.99µ ± 2%   13.09µ ± 3%        ~ (p=0.838 n=15)
NatSqr/300-16                 22.87µ ± 2%   23.25µ ± 2%        ~ (p=0.285 n=15)
NatSqr/500-16                 58.43µ ± 1%   58.25µ ± 2%        ~ (p=0.345 n=15)
NatSqr/800-16                 115.3µ ± 3%   116.2µ ± 3%        ~ (p=0.126 n=15)
NatSqr/1000-16                173.9µ ± 1%   174.3µ ± 1%        ~ (p=0.935 n=15)
NatSqr/10000-16               6.133m ± 2%   6.034m ± 1%   -1.62% (p=0.000 n=15)
NatSqr/100000-16              253.8m ± 1%   241.5m ± 0%   -4.87% (p=0.000 n=15)
geomean                       7.745µ        7.760µ        +0.19%

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
                            │     old     │                 new                  │
                            │   sec/op    │    sec/op     vs base                │
GCD10x10/WithoutXY-88         62.17n ± 4%   61.44n ±  0%   -1.17% (p=0.000 n=15)
GCD10x10/WithXY-88            173.4n ± 2%   172.4n ±  4%        ~ (p=0.615 n=15)
GCD100x100/WithoutXY-88       584.0n ± 1%   582.9n ±  0%        ~ (p=0.009 n=15)
GCD100x100/WithXY-88          1.098µ ± 1%   1.091µ ±  2%        ~ (p=0.002 n=15)
GCD1000x1000/WithoutXY-88     6.055µ ± 0%   6.049µ ±  0%        ~ (p=0.007 n=15)
GCD1000x1000/WithXY-88        9.430µ ± 0%   9.417µ ±  1%        ~ (p=0.123 n=15)
GCD10000x10000/WithoutXY-88   153.4µ ± 2%   149.0µ ±  2%   -2.85% (p=0.000 n=15)
GCD10000x10000/WithXY-88      350.6µ ± 3%   349.0µ ±  2%        ~ (p=0.126 n=15)
Div/20/10-88                  13.12n ± 0%   13.12n ±  1%    0.00% (p=0.042 n=15)
Div/40/20-88                  13.12n ± 0%   13.13n ±  0%        ~ (p=0.004 n=15)
Div/100/50-88                 25.49n ± 0%   25.49n ±  0%        ~ (p=0.452 n=15)
Div/200/100-88                115.7n ± 2%   113.8n ±  2%        ~ (p=0.212 n=15)
Div/400/200-88                135.0n ± 1%   136.1n ±  1%        ~ (p=0.005 n=15)
Div/1000/500-88               257.5n ± 1%   259.9n ±  1%        ~ (p=0.004 n=15)
Div/2000/1000-88              567.5n ± 1%   572.4n ±  2%        ~ (p=0.616 n=15)
Div/20000/10000-88            25.65µ ± 0%   25.77µ ±  1%        ~ (p=0.032 n=15)
Div/200000/100000-88          777.4µ ± 1%   754.3µ ±  1%   -2.97% (p=0.000 n=15)
Div/2000000/1000000-88        33.66m ± 0%   31.37m ±  0%   -6.81% (p=0.000 n=15)
Div/20000000/10000000-88       1.320 ± 0%    1.266 ±  0%   -4.04% (p=0.000 n=15)
NatMul/10-88                  151.9n ± 7%   143.3n ±  7%        ~ (p=0.878 n=15)
NatMul/100-88                 4.418µ ± 2%   4.337µ ±  3%        ~ (p=0.512 n=15)
NatMul/1000-88                206.8µ ± 1%   189.8µ ±  1%   -8.25% (p=0.000 n=15)
NatMul/10000-88               8.531m ± 1%   8.095m ±  0%   -5.12% (p=0.000 n=15)
NatMul/100000-88              298.9m ± 0%   260.5m ±  1%  -12.85% (p=0.000 n=15)
NatSqr/1-88                   27.55n ± 6%   28.25n ±  7%        ~ (p=0.024 n=15)
NatSqr/2-88                   44.71n ± 6%   46.21n ±  9%        ~ (p=0.024 n=15)
NatSqr/3-88                   55.44n ± 4%   58.41n ± 10%        ~ (p=0.126 n=15)
NatSqr/5-88                   80.71n ± 5%   81.41n ±  5%        ~ (p=0.032 n=15)
NatSqr/8-88                   115.7n ± 4%   115.4n ±  5%        ~ (p=0.814 n=15)
NatSqr/10-88                  147.4n ± 4%   147.3n ±  4%        ~ (p=0.505 n=15)
NatSqr/20-88                  337.8n ± 3%   337.3n ±  4%        ~ (p=0.814 n=15)
NatSqr/30-88                  556.9n ± 3%   557.6n ±  4%        ~ (p=0.814 n=15)
NatSqr/50-88                  1.208µ ± 4%   1.208µ ±  3%        ~ (p=0.910 n=15)
NatSqr/80-88                  2.591µ ± 3%   2.581µ ±  3%        ~ (p=0.705 n=15)
NatSqr/100-88                 3.870µ ± 3%   3.858µ ±  3%        ~ (p=0.846 n=15)
NatSqr/200-88                 14.43µ ± 3%   14.28µ ±  2%        ~ (p=0.383 n=15)
NatSqr/300-88                 24.68µ ± 2%   24.49µ ±  2%        ~ (p=0.624 n=15)
NatSqr/500-88                 66.27µ ± 1%   66.18µ ±  1%        ~ (p=0.735 n=15)
NatSqr/800-88                 128.7µ ± 1%   127.4µ ±  1%        ~ (p=0.050 n=15)
NatSqr/1000-88                198.7µ ± 1%   197.7µ ±  1%        ~ (p=0.229 n=15)
NatSqr/10000-88               6.582m ± 1%   6.426m ±  1%   -2.37% (p=0.000 n=15)
NatSqr/100000-88              274.3m ± 0%   267.3m ±  0%   -2.57% (p=0.000 n=15)
geomean                       6.518µ        6.438µ         -1.22%

goos: linux
goarch: arm64
pkg: math/big
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-16         61.70n ± 1%   61.32n ± 1%        ~ (p=0.361 n=15)
GCD10x10/WithXY-16            217.3n ± 1%   217.0n ± 1%        ~ (p=0.395 n=15)
GCD100x100/WithoutXY-16       569.7n ± 0%   572.6n ± 2%        ~ (p=0.213 n=15)
GCD100x100/WithXY-16          1.241µ ± 1%   1.236µ ± 1%        ~ (p=0.157 n=15)
GCD1000x1000/WithoutXY-16     5.558µ ± 0%   5.566µ ± 0%        ~ (p=0.228 n=15)
GCD1000x1000/WithXY-16        9.319µ ± 0%   9.326µ ± 0%        ~ (p=0.233 n=15)
GCD10000x10000/WithoutXY-16   126.4µ ± 2%   128.7µ ± 3%        ~ (p=0.081 n=15)
GCD10000x10000/WithXY-16      279.3µ ± 0%   278.3µ ± 5%        ~ (p=0.187 n=15)
Div/20/10-16                  15.12n ± 1%   15.21n ± 1%        ~ (p=0.490 n=15)
Div/40/20-16                  15.11n ± 0%   15.23n ± 1%        ~ (p=0.107 n=15)
Div/100/50-16                 26.53n ± 0%   26.50n ± 0%        ~ (p=0.299 n=15)
Div/200/100-16                123.7n ± 0%   124.0n ± 0%        ~ (p=0.086 n=15)
Div/400/200-16                142.5n ± 0%   142.4n ± 0%        ~ (p=0.039 n=15)
Div/1000/500-16               259.9n ± 1%   261.2n ± 1%        ~ (p=0.044 n=15)
Div/2000/1000-16              539.4n ± 1%   532.3n ± 1%   -1.32% (p=0.001 n=15)
Div/20000/10000-16            22.43µ ± 0%   22.32µ ± 0%   -0.49% (p=0.000 n=15)
Div/200000/100000-16          898.3µ ± 0%   889.6µ ± 0%   -0.96% (p=0.000 n=15)
Div/2000000/1000000-16        38.37m ± 0%   35.11m ± 0%   -8.49% (p=0.000 n=15)
Div/20000000/10000000-16       1.449 ± 0%    1.384 ± 0%   -4.48% (p=0.000 n=15)
NatMul/10-16                  182.0n ± 1%   177.8n ± 1%   -2.31% (p=0.000 n=15)
NatMul/100-16                 5.537µ ± 0%   5.693µ ± 0%   +2.82% (p=0.000 n=15)
NatMul/1000-16                229.9µ ± 0%   224.8µ ± 0%   -2.24% (p=0.000 n=15)
NatMul/10000-16               8.985m ± 0%   8.751m ± 0%   -2.61% (p=0.000 n=15)
NatMul/100000-16              371.1m ± 0%   331.5m ± 0%  -10.66% (p=0.000 n=15)
NatSqr/1-16                   46.77n ± 6%   42.76n ± 1%   -8.57% (p=0.000 n=15)
NatSqr/2-16                   66.99n ± 4%   63.62n ± 1%   -5.03% (p=0.000 n=15)
NatSqr/3-16                   76.79n ± 4%   73.42n ± 1%        ~ (p=0.007 n=15)
NatSqr/5-16                   99.00n ± 3%   95.35n ± 1%   -3.69% (p=0.000 n=15)
NatSqr/8-16                   160.0n ± 3%   155.1n ± 1%   -3.06% (p=0.001 n=15)
NatSqr/10-16                  178.4n ± 2%   175.9n ± 0%   -1.40% (p=0.001 n=15)
NatSqr/20-16                  361.9n ± 2%   361.3n ± 0%        ~ (p=0.083 n=15)
NatSqr/30-16                  584.7n ± 0%   586.8n ± 0%   +0.36% (p=0.000 n=15)
NatSqr/50-16                  1.327µ ± 0%   1.329µ ± 0%        ~ (p=0.349 n=15)
NatSqr/80-16                  2.893µ ± 1%   2.925µ ± 0%   +1.11% (p=0.000 n=15)
NatSqr/100-16                 4.330µ ± 1%   4.381µ ± 0%   +1.18% (p=0.000 n=15)
NatSqr/200-16                 16.25µ ± 1%   16.43µ ± 0%   +1.07% (p=0.000 n=15)
NatSqr/300-16                 27.85µ ± 1%   28.06µ ± 0%   +0.77% (p=0.000 n=15)
NatSqr/500-16                 76.01µ ± 0%   76.34µ ± 0%        ~ (p=0.002 n=15)
NatSqr/800-16                 146.8µ ± 0%   148.1µ ± 0%   +0.83% (p=0.000 n=15)
NatSqr/1000-16                228.2µ ± 0%   228.6µ ± 0%        ~ (p=0.123 n=15)
NatSqr/10000-16               7.524m ± 0%   7.426m ± 0%   -1.31% (p=0.000 n=15)
NatSqr/100000-16              316.7m ± 0%   309.2m ± 0%   -2.36% (p=0.000 n=15)
geomean                       7.264µ        7.172µ        -1.27%

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                            │     old     │                new                 │
                            │   sec/op    │   sec/op     vs base               │
GCD10x10/WithoutXY-12         32.61n ± 1%   32.42n ± 1%       ~ (p=0.021 n=15)
GCD10x10/WithXY-12            87.70n ± 1%   88.42n ± 1%       ~ (p=0.010 n=15)
GCD100x100/WithoutXY-12       305.9n ± 0%   306.4n ± 0%       ~ (p=0.003 n=15)
GCD100x100/WithXY-12          560.3n ± 2%   556.6n ± 1%       ~ (p=0.018 n=15)
GCD1000x1000/WithoutXY-12     3.509µ ± 2%   3.464µ ± 1%       ~ (p=0.145 n=15)
GCD1000x1000/WithXY-12        5.347µ ± 2%   5.372µ ± 1%       ~ (p=0.046 n=15)
GCD10000x10000/WithoutXY-12   73.75µ ± 1%   73.99µ ± 1%       ~ (p=0.004 n=15)
GCD10000x10000/WithXY-12      148.4µ ± 0%   147.8µ ± 1%       ~ (p=0.076 n=15)
Div/20/10-12                  9.481n ± 0%   9.462n ± 1%       ~ (p=0.631 n=15)
Div/40/20-12                  9.457n ± 0%   9.462n ± 1%       ~ (p=0.798 n=15)
Div/100/50-12                 14.91n ± 0%   14.79n ± 1%  -0.80% (p=0.000 n=15)
Div/200/100-12                84.56n ± 1%   84.60n ± 1%       ~ (p=0.271 n=15)
Div/400/200-12                103.8n ± 0%   102.8n ± 0%  -0.96% (p=0.000 n=15)
Div/1000/500-12               181.3n ± 1%   184.2n ± 2%       ~ (p=0.091 n=15)
Div/2000/1000-12              397.5n ± 0%   397.4n ± 0%       ~ (p=0.299 n=15)
Div/20000/10000-12            14.04µ ± 1%   13.99µ ± 0%       ~ (p=0.221 n=15)
Div/200000/100000-12          523.1µ ± 0%   514.0µ ± 3%       ~ (p=0.775 n=15)
Div/2000000/1000000-12        21.58m ± 0%   20.01m ± 1%  -7.29% (p=0.000 n=15)
Div/20000000/10000000-12      813.5m ± 0%   796.2m ± 1%  -2.13% (p=0.000 n=15)
NatMul/10-12                  80.46n ± 1%   80.02n ± 1%       ~ (p=0.063 n=15)
NatMul/100-12                 2.904µ ± 0%   2.979µ ± 1%  +2.58% (p=0.000 n=15)
NatMul/1000-12                127.8µ ± 0%   122.3µ ± 0%  -4.28% (p=0.000 n=15)
NatMul/10000-12               5.141m ± 0%   4.975m ± 1%  -3.23% (p=0.000 n=15)
NatMul/100000-12              208.8m ± 0%   189.6m ± 3%  -9.21% (p=0.000 n=15)
NatSqr/1-12                   11.90n ± 1%   11.76n ± 1%       ~ (p=0.059 n=15)
NatSqr/2-12                   21.33n ± 1%   21.12n ± 0%       ~ (p=0.063 n=15)
NatSqr/3-12                   26.05n ± 1%   25.79n ± 0%       ~ (p=0.002 n=15)
NatSqr/5-12                   37.31n ± 0%   36.98n ± 1%       ~ (p=0.008 n=15)
NatSqr/8-12                   63.07n ± 0%   62.75n ± 1%       ~ (p=0.061 n=15)
NatSqr/10-12                  79.48n ± 0%   79.59n ± 0%       ~ (p=0.455 n=15)
NatSqr/20-12                  173.1n ± 0%   173.2n ± 1%       ~ (p=0.518 n=15)
NatSqr/30-12                  288.6n ± 1%   289.2n ± 0%       ~ (p=0.030 n=15)
NatSqr/50-12                  653.3n ± 0%   653.3n ± 0%       ~ (p=0.361 n=15)
NatSqr/80-12                  1.492µ ± 0%   1.496µ ± 0%       ~ (p=0.018 n=15)
NatSqr/100-12                 2.270µ ± 1%   2.270µ ± 0%       ~ (p=0.326 n=15)
NatSqr/200-12                 8.776µ ± 1%   8.784µ ± 1%       ~ (p=0.083 n=15)
NatSqr/300-12                 15.07µ ± 0%   15.09µ ± 0%       ~ (p=0.455 n=15)
NatSqr/500-12                 41.71µ ± 0%   41.77µ ± 1%       ~ (p=0.305 n=15)
NatSqr/800-12                 80.77µ ± 1%   80.59µ ± 0%       ~ (p=0.113 n=15)
NatSqr/1000-12                126.4µ ± 1%   126.5µ ± 0%       ~ (p=0.683 n=15)
NatSqr/10000-12               4.204m ± 0%   4.119m ± 0%  -2.02% (p=0.000 n=15)
NatSqr/100000-12              177.0m ± 0%   172.9m ± 0%  -2.31% (p=0.000 n=15)
geomean                       3.790µ        3.757µ       -0.87%

Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652057
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-03-12 05:41:17 -07:00
Russ Cox
70dcc78871 math/big: avoid negative slice size in nat.rem
In a division, normally the answer to N digits / D digits has N-D digits,
but not when N-D is negative. Fix the calculation of the number of
digits for the temporary in nat.rem not to be negative.

Fixes #72043.

Change-Id: Ib9faa430aeb6c5f4c4a730f1ec631d2bf3f7472c
Reviewed-on: https://go-review.googlesource.com/c/go/+/655156
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-06 08:08:34 -08:00
Xiaolin Zhao
6ba91df153 math: implement func archExp and archExp2 in assembly on loong64
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
        |  bench.old  |              bench.new              |
        |   sec/op    |   sec/op     vs base                |
Exp       26.30n ± 0%   12.93n ± 0%  -50.85% (p=0.000 n=10)
ExpGo     26.86n ± 0%   26.92n ± 0%   +0.22% (p=0.000 n=10)
Expm1     16.76n ± 0%   16.75n ± 0%        ~ (p=0.060 n=10)
Exp2      23.05n ± 0%   12.12n ± 0%  -47.42% (p=0.000 n=10)
Exp2Go    23.41n ± 0%   23.47n ± 0%   +0.28% (p=0.000 n=10)
geomean   22.97n        17.54n       -23.64%

goos: linux
goarch: loong64
pkg: math/cmplx
cpu: Loongson-3A6000 @ 2500.00MHz
    |  bench.old  |              bench.new              |
    |   sec/op    |   sec/op     vs base                |
Exp   51.32n ± 0%   35.41n ± 0%  -30.99% (p=0.000 n=10)

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
        |  bench.old  |              bench.new              |
        |   sec/op    |   sec/op     vs base                |
Exp       50.27n ± 0%   48.75n ± 1%   -3.01% (p=0.000 n=10)
ExpGo     50.72n ± 0%   50.44n ± 0%   -0.55% (p=0.000 n=10)
Expm1     28.40n ± 0%   28.32n ± 0%        ~ (p=0.360 n=10)
Exp2      50.09n ± 0%   21.49n ± 1%  -57.10% (p=0.000 n=10)
Exp2Go    50.05n ± 0%   49.69n ± 0%   -0.72% (p=0.000 n=10)
geomean   44.85n        37.52n       -16.35%

goos: linux
goarch: loong64
pkg: math/cmplx
cpu: Loongson-3A5000 @ 2500.00MHz
    |  bench.old  |              bench.new              |
    |   sec/op    |   sec/op     vs base                |
Exp   88.56n ± 0%   67.29n ± 0%  -24.03% (p=0.000 n=10)

Change-Id: I89e456d26fc075d83335ee4a31227d2aface5714
Reviewed-on: https://go-review.googlesource.com/c/go/+/653935
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-03-05 18:30:54 -08:00
Russ Cox
2b73707358 math/big: add tests for allocation during multiply
Test that big.Int.Mul reusing the same target is not allocating
temporary garbage during its computation. That code is going
to be modified in an upcoming CL.

Change-Id: I3ed55c06da030282233c29cd7af2a04f395dc7a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/652056
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 06:05:02 -08:00
Russ Cox
0ab2253ce1 math/big: move multiplication to natmul.go
No code changes.

This CL moves the multiplication (and squaring) code into natmul.go,
in preparation for cleaning up Karatsuba and then adding Toom-Cook
and FFT-based multiplication.

Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/652055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-02-27 06:05:00 -08:00
Russ Cox
1f7e28acdf math/big: optimize atoi of base 2, 4, 16
Avoid multiplies when converting base 2, 4, 16 inputs,
reducing conversion time from O(N²) to O(N).

The Base8 and Base10 code paths should be unmodified,
but the base-2,4,16 changes tickle the compiler to generate
better (amd64) or worse (arm64) when really it should not.
This is described in detail in #71868 and should be ignored
for the purposes of this CL.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
                      │     old      │                 new                 │
                      │    sec/op    │   sec/op     vs base                │
Scan/10/Base2-16         324.4n ± 0%   258.7n ± 0%  -20.25% (p=0.000 n=15)
Scan/100/Base2-16        2.376µ ± 0%   1.968µ ± 0%  -17.17% (p=0.000 n=15)
Scan/1000/Base2-16       23.89µ ± 0%   19.16µ ± 0%  -19.80% (p=0.000 n=15)
Scan/10000/Base2-16      311.5µ ± 0%   190.4µ ± 0%  -38.86% (p=0.000 n=15)
Scan/100000/Base2-16    10.508m ± 0%   1.904m ± 0%  -81.88% (p=0.000 n=15)
Scan/10/Base8-16         138.3n ± 0%   127.9n ± 0%   -7.52% (p=0.000 n=15)
Scan/100/Base8-16        886.1n ± 0%   790.2n ± 0%  -10.82% (p=0.000 n=15)
Scan/1000/Base8-16       9.227µ ± 0%   8.234µ ± 0%  -10.76% (p=0.000 n=15)
Scan/10000/Base8-16      165.8µ ± 0%   155.6µ ± 0%   -6.19% (p=0.000 n=15)
Scan/100000/Base8-16     9.044m ± 0%   8.935m ± 0%   -1.20% (p=0.000 n=15)
Scan/10/Base10-16        129.9n ± 0%   120.0n ± 0%   -7.62% (p=0.000 n=15)
Scan/100/Base10-16       816.3n ± 0%   730.0n ± 0%  -10.57% (p=0.000 n=15)
Scan/1000/Base10-16      8.518µ ± 0%   7.628µ ± 0%  -10.45% (p=0.000 n=15)
Scan/10000/Base10-16     158.6µ ± 0%   149.4µ ± 0%   -5.80% (p=0.000 n=15)
Scan/100000/Base10-16    8.962m ± 0%   8.855m ± 0%   -1.20% (p=0.000 n=15)
Scan/10/Base16-16        114.5n ± 0%   108.6n ± 0%   -5.15% (p=0.000 n=15)
Scan/100/Base16-16       648.3n ± 0%   525.0n ± 0%  -19.02% (p=0.000 n=15)
Scan/1000/Base16-16      7.375µ ± 0%   5.636µ ± 0%  -23.58% (p=0.000 n=15)
Scan/10000/Base16-16    171.18µ ± 0%   66.99µ ± 0%  -60.87% (p=0.000 n=15)
Scan/100000/Base16-16   9490.9µ ± 0%   682.8µ ± 0%  -92.81% (p=0.000 n=15)
geomean                  20.11µ        13.69µ       -31.94%

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
                      │      old      │                 new                 │
                      │    sec/op     │   sec/op     vs base                │
Scan/10/Base2-88          275.4n ± 0%   215.0n ± 0%  -21.93% (p=0.000 n=15)
Scan/100/Base2-88         1.869µ ± 0%   1.629µ ± 0%  -12.84% (p=0.000 n=15)
Scan/1000/Base2-88        18.56µ ± 0%   15.81µ ± 0%  -14.82% (p=0.000 n=15)
Scan/10000/Base2-88       270.0µ ± 0%   157.2µ ± 0%  -41.77% (p=0.000 n=15)
Scan/100000/Base2-88     11.518m ± 0%   1.571m ± 0%  -86.36% (p=0.000 n=15)
Scan/10/Base8-88          108.9n ± 0%   106.0n ± 0%   -2.66% (p=0.000 n=15)
Scan/100/Base8-88         655.2n ± 0%   594.9n ± 0%   -9.20% (p=0.000 n=15)
Scan/1000/Base8-88        6.467µ ± 0%   5.966µ ± 0%   -7.75% (p=0.000 n=15)
Scan/10000/Base8-88       151.2µ ± 0%   147.4µ ± 0%   -2.53% (p=0.000 n=15)
Scan/100000/Base8-88      10.33m ± 0%   10.30m ± 0%   -0.25% (p=0.000 n=15)
Scan/10/Base10-88        100.20n ± 0%   98.53n ± 0%   -1.67% (p=0.000 n=15)
Scan/100/Base10-88        596.9n ± 0%   543.3n ± 0%   -8.98% (p=0.000 n=15)
Scan/1000/Base10-88       5.904µ ± 0%   5.485µ ± 0%   -7.10% (p=0.000 n=15)
Scan/10000/Base10-88      145.7µ ± 0%   142.0µ ± 0%   -2.55% (p=0.000 n=15)
Scan/100000/Base10-88     10.26m ± 0%   10.24m ± 0%   -0.18% (p=0.000 n=15)
Scan/10/Base16-88         90.33n ± 0%   87.60n ± 0%   -3.02% (p=0.000 n=15)
Scan/100/Base16-88        506.4n ± 0%   437.7n ± 0%  -13.57% (p=0.000 n=15)
Scan/1000/Base16-88       5.056µ ± 0%   4.007µ ± 0%  -20.75% (p=0.000 n=15)
Scan/10000/Base16-88     163.35µ ± 0%   65.37µ ± 0%  -59.98% (p=0.000 n=15)
Scan/100000/Base16-88   11027.2µ ± 0%   735.1µ ± 0%  -93.33% (p=0.000 n=15)
geomean                   17.13µ        11.74µ       -31.46%

goos: linux
goarch: arm64
pkg: math/big
                      │     old      │                 new                  │
                      │    sec/op    │    sec/op     vs base                │
Scan/10/Base2-16         324.7n ± 0%    348.4n ± 0%   +7.30% (p=0.000 n=15)
Scan/100/Base2-16        2.604µ ± 0%    3.031µ ± 0%  +16.40% (p=0.000 n=15)
Scan/1000/Base2-16       26.15µ ± 0%    29.94µ ± 0%  +14.52% (p=0.000 n=15)
Scan/10000/Base2-16      334.3µ ± 0%    298.8µ ± 0%  -10.64% (p=0.000 n=15)
Scan/100000/Base2-16    10.664m ± 0%    2.991m ± 0%  -71.95% (p=0.000 n=15)
Scan/10/Base8-16         144.4n ± 1%    162.2n ± 1%  +12.33% (p=0.000 n=15)
Scan/100/Base8-16        917.2n ± 0%   1084.0n ± 0%  +18.19% (p=0.000 n=15)
Scan/1000/Base8-16       9.367µ ± 0%   10.901µ ± 0%  +16.38% (p=0.000 n=15)
Scan/10000/Base8-16      164.2µ ± 0%    181.2µ ± 0%  +10.34% (p=0.000 n=15)
Scan/100000/Base8-16     8.871m ± 1%    9.140m ± 0%   +3.04% (p=0.000 n=15)
Scan/10/Base10-16        134.6n ± 1%    148.3n ± 1%  +10.18% (p=0.000 n=15)
Scan/100/Base10-16       837.1n ± 0%    986.6n ± 0%  +17.86% (p=0.000 n=15)
Scan/1000/Base10-16      8.563µ ± 0%    9.936µ ± 0%  +16.03% (p=0.000 n=15)
Scan/10000/Base10-16     156.5µ ± 1%    171.3µ ± 0%   +9.41% (p=0.000 n=15)
Scan/100000/Base10-16    8.863m ± 1%    9.011m ± 0%   +1.66% (p=0.000 n=15)
Scan/10/Base16-16        115.7n ± 2%    129.1n ± 1%  +11.58% (p=0.000 n=15)
Scan/100/Base16-16       708.6n ± 0%    796.8n ± 0%  +12.45% (p=0.000 n=15)
Scan/1000/Base16-16      7.314µ ± 0%    7.554µ ± 0%   +3.28% (p=0.000 n=15)
Scan/10000/Base16-16    149.05µ ± 0%    74.60µ ± 0%  -49.95% (p=0.000 n=15)
Scan/100000/Base16-16   9091.6µ ± 0%    741.5µ ± 0%  -91.84% (p=0.000 n=15)
geomean                  20.39µ         17.65µ       -13.44%

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                      │     old      │                 new                 │
                      │    sec/op    │   sec/op     vs base                │
Scan/10/Base2-12         193.8n ± 2%   157.3n ± 1%  -18.83% (p=0.000 n=15)
Scan/100/Base2-12        1.445µ ± 2%   1.362µ ± 1%   -5.74% (p=0.000 n=15)
Scan/1000/Base2-12       14.28µ ± 0%   13.51µ ± 0%   -5.42% (p=0.000 n=15)
Scan/10000/Base2-12      177.1µ ± 0%   134.6µ ± 0%  -24.04% (p=0.000 n=15)
Scan/100000/Base2-12     5.429m ± 1%   1.333m ± 0%  -75.45% (p=0.000 n=15)
Scan/10/Base8-12         75.52n ± 2%   76.09n ± 1%        ~ (p=0.010 n=15)
Scan/100/Base8-12        528.4n ± 1%   532.1n ± 1%        ~ (p=0.003 n=15)
Scan/1000/Base8-12       5.423µ ± 1%   5.427µ ± 0%        ~ (p=0.183 n=15)
Scan/10000/Base8-12      89.26µ ± 1%   89.37µ ± 0%        ~ (p=0.237 n=15)
Scan/100000/Base8-12     4.543m ± 2%   4.560m ± 1%        ~ (p=0.595 n=15)
Scan/10/Base10-12        69.87n ± 1%   70.51n ± 0%        ~ (p=0.002 n=15)
Scan/100/Base10-12       488.4n ± 1%   491.2n ± 0%        ~ (p=0.060 n=15)
Scan/1000/Base10-12      5.014µ ± 1%   5.008µ ± 0%        ~ (p=0.783 n=15)
Scan/10000/Base10-12     84.90µ ± 0%   85.10µ ± 0%        ~ (p=0.109 n=15)
Scan/100000/Base10-12    4.516m ± 1%   4.521m ± 1%        ~ (p=0.713 n=15)
Scan/10/Base16-12        59.21n ± 1%   57.70n ± 1%   -2.55% (p=0.000 n=15)
Scan/100/Base16-12       380.0n ± 1%   360.7n ± 1%   -5.08% (p=0.000 n=15)
Scan/1000/Base16-12      3.775µ ± 0%   3.421µ ± 0%   -9.38% (p=0.000 n=15)
Scan/10000/Base16-12     80.62µ ± 0%   34.44µ ± 1%  -57.28% (p=0.000 n=15)
Scan/100000/Base16-12   4826.4µ ± 2%   450.9µ ± 2%  -90.66% (p=0.000 n=15)
geomean                  11.05µ        8.448µ       -23.52%

Change-Id: Ifdb2049545f34072aa75cdbb72bed4cf465f0ad7
Reviewed-on: https://go-review.googlesource.com/c/go/+/650640
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-02-27 06:04:57 -08:00
Russ Cox
f8f08bfd7c math/big: improve scan test and benchmark
Add a few more test cases for scanning (integer conversion),
which were helpful in debugging some upcoming changes.

BenchmarkScan currently times converting the value 10**N
represented in base B back into []Word form.
When B = 10, the text is 1 followed by many zeros, which
could hit a "multiply by zero" special case when processing
many digit chunks, misrepresenting the actual time required
depending on whether that case is optimized.

Change the benchmark to use 9**N, which is about as big and
will not cause runs of zeros in any of the tested bases.

The benchmark comparison below is not showing faster code,
since of course the code is not changing at all here. Instead,
it is showing that the new benchmark work is roughly the same
size as the old benchmark work.

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                      │     old     │                new                 │
                      │   sec/op    │   sec/op     vs base               │
ScanPi-12               43.35µ ± 1%   43.59µ ± 1%       ~ (p=0.069 n=15)
Scan/10/Base2-12        202.3n ± 2%   193.7n ± 1%  -4.25% (p=0.000 n=15)
Scan/100/Base2-12       1.512µ ± 3%   1.447µ ± 1%  -4.30% (p=0.000 n=15)
Scan/1000/Base2-12      15.06µ ± 2%   14.33µ ± 0%  -4.83% (p=0.000 n=15)
Scan/10000/Base2-12     188.0µ ± 5%   177.3µ ± 1%  -5.65% (p=0.000 n=15)
Scan/100000/Base2-12    5.814m ± 3%   5.382m ± 1%  -7.43% (p=0.000 n=15)
Scan/10/Base8-12        78.57n ± 2%   75.02n ± 1%  -4.52% (p=0.000 n=15)
Scan/100/Base8-12       548.2n ± 2%   526.8n ± 1%  -3.90% (p=0.000 n=15)
Scan/1000/Base8-12      5.674µ ± 2%   5.421µ ± 0%  -4.46% (p=0.000 n=15)
Scan/10000/Base8-12     94.42µ ± 1%   88.61µ ± 1%  -6.15% (p=0.000 n=15)
Scan/100000/Base8-12    4.906m ± 2%   4.498m ± 3%  -8.31% (p=0.000 n=15)
Scan/10/Base10-12       73.42n ± 1%   69.56n ± 0%  -5.26% (p=0.000 n=15)
Scan/100/Base10-12      511.9n ± 1%   488.2n ± 0%  -4.63% (p=0.000 n=15)
Scan/1000/Base10-12     5.254µ ± 2%   5.009µ ± 0%  -4.66% (p=0.000 n=15)
Scan/10000/Base10-12    90.22µ ± 2%   84.52µ ± 0%  -6.32% (p=0.000 n=15)
Scan/100000/Base10-12   4.842m ± 3%   4.471m ± 3%  -7.65% (p=0.000 n=15)
Scan/10/Base16-12       62.28n ± 1%   58.70n ± 1%  -5.75% (p=0.000 n=15)
Scan/100/Base16-12      398.6n ± 0%   377.9n ± 1%  -5.19% (p=0.000 n=15)
Scan/1000/Base16-12     4.108µ ± 1%   3.782µ ± 0%  -7.94% (p=0.000 n=15)
Scan/10000/Base16-12    83.78µ ± 2%   80.51µ ± 1%  -3.90% (p=0.000 n=15)
Scan/100000/Base16-12   5.080m ± 3%   4.698m ± 3%  -7.53% (p=0.000 n=15)
geomean                 12.41µ        11.74µ       -5.36%

Change-Id: If3ce290ecc7f38672f11b42fd811afb53dee665d
Reviewed-on: https://go-review.googlesource.com/c/go/+/650639
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 05:58:49 -08:00
Russ Cox
872496da2b math/big: replace nat pool with Word stack
In the early days of math/big, algorithms that needed more space
grew the result larger than it needed to be and then used the
high words as extra space. This made results their own temporary
space caches, at the cost that saving a result in a data structure
might hold significantly more memory than necessary.
Specifically, new(big.Int).Mul(x, y) returned a big.Int with a
backing slice 3X as big as it strictly needed to be.
If you are storing many multiplication results, or even a single
large result, the 3X overhead can add up.

This approach to storage for temporaries also requires being able
to analyze the algorithms to predict the exact amount they need,
which can be difficult.

For both these reasons, the implementation of recursive long division,
which came later, introduced a “nat pool” where temporaries could be
stored and reused, or reclaimed by the GC when no longer used.
This avoids the storage and bookkeeping overheads but introduces a
per-temporary sync.Pool overhead. divRecursiveStep takes an array
of cached temporaries to remove some of that overhead.
The nat pool was better but is still not quite right.

This CL introduces something even better than the nat pool
(still probably not quite right, but the best I can see for now):
a sync.Pool holding stacks for allocating temporaries.
Now an operation can get one stack out of the pool and then
allocate as many temporaries as it needs during the operation,
eventually returning the stack back to the pool. The sync.Pool
operations are now per-exported-operation (like big.Int.Mul),
not per-temporary.

This CL converts both the pre-allocation in nat.mul and the
uses of the nat pool to use stack pools instead. This simplifies
some code and sets us up better for more complex algorithms
(such as Toom-Cook or FFT-based multiplication) that need
more temporaries. It is also a little bit faster.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-16               23.68n ± 0%   22.21n ± 0%   -6.21% (p=0.000 n=15)
Div/40/20-16               23.68n ± 0%   22.21n ± 0%   -6.21% (p=0.000 n=15)
Div/100/50-16              56.65n ± 0%   55.53n ± 0%   -1.98% (p=0.000 n=15)
Div/200/100-16             194.6n ± 1%   172.8n ± 0%  -11.20% (p=0.000 n=15)
Div/400/200-16             232.1n ± 0%   206.7n ± 0%  -10.94% (p=0.000 n=15)
Div/1000/500-16            405.3n ± 1%   383.8n ± 0%   -5.30% (p=0.000 n=15)
Div/2000/1000-16           810.4n ± 1%   795.2n ± 0%   -1.88% (p=0.000 n=15)
Div/20000/10000-16         25.88µ ± 0%   25.39µ ± 0%   -1.89% (p=0.000 n=15)
Div/200000/100000-16       931.5µ ± 0%   924.3µ ± 0%   -0.77% (p=0.000 n=15)
Div/2000000/1000000-16     37.77m ± 0%   37.75m ± 0%        ~ (p=0.098 n=15)
Div/20000000/10000000-16    1.367 ± 0%    1.377 ± 0%   +0.72% (p=0.003 n=15)
NatMul/10-16               168.5n ± 3%   164.0n ± 4%        ~ (p=0.751 n=15)
NatMul/100-16              6.086µ ± 3%   5.380µ ± 3%  -11.60% (p=0.000 n=15)
NatMul/1000-16             238.1µ ± 3%   228.3µ ± 1%   -4.12% (p=0.000 n=15)
NatMul/10000-16            8.721m ± 2%   8.518m ± 1%   -2.33% (p=0.000 n=15)
NatMul/100000-16           369.6m ± 0%   371.1m ± 0%   +0.42% (p=0.000 n=15)
geomean                    19.57µ        18.74µ        -4.21%

                 │     old      │                  new                   │
                 │     B/op     │     B/op      vs base                  │
NatMul/10-16         192.0 ± 0%     192.0 ± 0%        ~ (p=1.000 n=15) ¹
NatMul/100-16      4.750Ki ± 0%   1.751Ki ± 0%  -63.14% (p=0.000 n=15)
NatMul/1000-16     48.16Ki ± 0%   16.02Ki ± 0%  -66.73% (p=0.000 n=15)
NatMul/10000-16    482.9Ki ± 1%   165.4Ki ± 3%  -65.75% (p=0.000 n=15)
NatMul/100000-16   5.747Mi ± 7%   4.197Mi ± 0%  -26.97% (p=0.000 n=15)
geomean            41.42Ki        20.63Ki       -50.18%
¹ all samples are equal

                 │     old     │                 new                  │
                 │  allocs/op  │  allocs/op   vs base                 │
NatMul/10-16       1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100-16      1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/1000-16     1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/10000-16    1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100000-16   7.000 ± 14%   7.000 ± 14%       ~ (p=0.668 n=15)
geomean            1.476         1.476        +0.00%
¹ all samples are equal

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-88               15.84n ± 1%   13.12n ± 0%  -17.17% (p=0.000 n=15)
Div/40/20-88               15.88n ± 1%   13.12n ± 0%  -17.38% (p=0.000 n=15)
Div/100/50-88              26.42n ± 0%   25.47n ± 0%   -3.60% (p=0.000 n=15)
Div/200/100-88             132.4n ± 0%   114.9n ± 0%  -13.22% (p=0.000 n=15)
Div/400/200-88             150.1n ± 0%   135.6n ± 0%   -9.66% (p=0.000 n=15)
Div/1000/500-88            275.5n ± 0%   264.1n ± 0%   -4.14% (p=0.000 n=15)
Div/2000/1000-88           586.5n ± 0%   581.1n ± 0%   -0.92% (p=0.000 n=15)
Div/20000/10000-88         25.87µ ± 0%   25.72µ ± 0%   -0.59% (p=0.000 n=15)
Div/200000/100000-88       772.2µ ± 0%   779.0µ ± 0%   +0.88% (p=0.000 n=15)
Div/2000000/1000000-88     33.36m ± 0%   33.63m ± 0%   +0.80% (p=0.000 n=15)
Div/20000000/10000000-88    1.307 ± 0%    1.320 ± 0%   +1.03% (p=0.000 n=15)
NatMul/10-88               140.4n ± 0%   148.8n ± 4%   +5.98% (p=0.000 n=15)
NatMul/100-88              4.663µ ± 1%   4.388µ ± 1%   -5.90% (p=0.000 n=15)
NatMul/1000-88             207.7µ ± 0%   205.8µ ± 0%   -0.89% (p=0.000 n=15)
NatMul/10000-88            8.456m ± 0%   8.468m ± 0%   +0.14% (p=0.021 n=15)
NatMul/100000-88           295.1m ± 0%   297.9m ± 0%   +0.94% (p=0.000 n=15)
geomean                    14.96µ        14.33µ        -4.23%

                 │     old      │                   new                   │
                 │     B/op     │     B/op       vs base                  │
NatMul/10-88         192.0 ± 0%     192.0 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/100-88      4.750Ki ± 0%   1.758Ki ±  0%  -62.99% (p=0.000 n=15)
NatMul/1000-88     48.44Ki ± 0%   16.08Ki ±  0%  -66.80% (p=0.000 n=15)
NatMul/10000-88    489.7Ki ± 1%   166.1Ki ±  3%  -66.08% (p=0.000 n=15)
NatMul/100000-88   5.546Mi ± 0%   3.819Mi ± 60%  -31.15% (p=0.000 n=15)
geomean            41.29Ki        20.30Ki        -50.85%
¹ all samples are equal

                 │     old     │                 new                  │
                 │  allocs/op  │  allocs/op   vs base                 │
NatMul/10-88       1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100-88      1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/1000-88     1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/10000-88    1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100000-88   5.000 ± 20%   6.000 ± 67%       ~ (p=0.672 n=15)
geomean            1.380         1.431        +3.71%
¹ all samples are equal

goos: linux
goarch: arm64
pkg: math/big
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-16               15.85n ± 0%   15.23n ± 0%   -3.91% (p=0.000 n=15)
Div/40/20-16               15.88n ± 0%   15.22n ± 0%   -4.16% (p=0.000 n=15)
Div/100/50-16              29.69n ± 0%   26.39n ± 0%  -11.11% (p=0.000 n=15)
Div/200/100-16             149.2n ± 0%   123.3n ± 0%  -17.36% (p=0.000 n=15)
Div/400/200-16             160.3n ± 0%   139.2n ± 0%  -13.16% (p=0.000 n=15)
Div/1000/500-16            271.0n ± 0%   256.1n ± 0%   -5.50% (p=0.000 n=15)
Div/2000/1000-16           545.3n ± 0%   527.0n ± 0%   -3.36% (p=0.000 n=15)
Div/20000/10000-16         22.60µ ± 0%   22.20µ ± 0%   -1.77% (p=0.000 n=15)
Div/200000/100000-16       889.0µ ± 0%   892.2µ ± 0%   +0.35% (p=0.000 n=15)
Div/2000000/1000000-16     38.01m ± 0%   38.12m ± 0%   +0.30% (p=0.000 n=15)
Div/20000000/10000000-16    1.437 ± 0%    1.444 ± 0%   +0.50% (p=0.000 n=15)
NatMul/10-16               166.4n ± 2%   169.5n ± 1%   +1.86% (p=0.000 n=15)
NatMul/100-16              5.733µ ± 1%   5.570µ ± 1%   -2.84% (p=0.000 n=15)
NatMul/1000-16             232.6µ ± 1%   229.8µ ± 0%   -1.22% (p=0.000 n=15)
NatMul/10000-16            9.039m ± 1%   8.969m ± 0%   -0.77% (p=0.000 n=15)
NatMul/100000-16           367.0m ± 0%   368.8m ± 0%   +0.48% (p=0.000 n=15)
geomean                    16.15µ        15.50µ        -4.01%

                 │     old      │                  new                   │
                 │     B/op     │     B/op      vs base                  │
NatMul/10-16         192.0 ± 0%     192.0 ± 0%        ~ (p=1.000 n=15) ¹
NatMul/100-16      4.750Ki ± 0%   1.751Ki ± 0%  -63.14% (p=0.000 n=15)
NatMul/1000-16     48.33Ki ± 0%   16.02Ki ± 0%  -66.85% (p=0.000 n=15)
NatMul/10000-16    536.5Ki ± 1%   165.7Ki ± 3%  -69.12% (p=0.000 n=15)
NatMul/100000-16   6.078Mi ± 6%   4.197Mi ± 0%  -30.94% (p=0.000 n=15)
geomean            42.81Ki        20.64Ki       -51.78%
¹ all samples are equal

                 │     old     │                  new                  │
                 │  allocs/op  │  allocs/op   vs base                  │
NatMul/10-16       1.000 ±  0%   1.000 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/100-16      1.000 ±  0%   1.000 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/1000-16     1.000 ±  0%   1.000 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/10000-16    2.000 ± 50%   1.000 ±  0%  -50.00% (p=0.001 n=15)
NatMul/100000-16   9.000 ± 11%   8.000 ± 12%  -11.11% (p=0.001 n=15)
geomean            1.783         1.516        -14.97%
¹ all samples are equal

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                         │     old      │                new                 │
                         │    sec/op    │   sec/op     vs base               │
Div/20/10-12                9.850n ± 1%   9.405n ± 1%  -4.52% (p=0.000 n=15)
Div/40/20-12                9.858n ± 0%   9.403n ± 1%  -4.62% (p=0.000 n=15)
Div/100/50-12               16.40n ± 1%   14.81n ± 0%  -9.70% (p=0.000 n=15)
Div/200/100-12              88.48n ± 2%   80.88n ± 0%  -8.59% (p=0.000 n=15)
Div/400/200-12             107.90n ± 1%   99.28n ± 1%  -7.99% (p=0.000 n=15)
Div/1000/500-12             188.8n ± 1%   178.6n ± 1%  -5.40% (p=0.000 n=15)
Div/2000/1000-12            399.9n ± 0%   389.1n ± 0%  -2.70% (p=0.000 n=15)
Div/20000/10000-12          13.94µ ± 2%   13.81µ ± 1%       ~ (p=0.574 n=15)
Div/200000/100000-12        523.8µ ± 0%   521.7µ ± 0%  -0.40% (p=0.000 n=15)
Div/2000000/1000000-12      21.46m ± 0%   21.48m ± 0%       ~ (p=0.067 n=15)
Div/20000000/10000000-12    812.5m ± 0%   812.9m ± 0%       ~ (p=0.061 n=15)
NatMul/10-12                77.14n ± 0%   78.35n ± 1%  +1.57% (p=0.000 n=15)
NatMul/100-12               2.999µ ± 0%   2.871µ ± 1%  -4.27% (p=0.000 n=15)
NatMul/1000-12              126.2µ ± 0%   126.8µ ± 0%  +0.51% (p=0.011 n=15)
NatMul/10000-12             5.099m ± 0%   5.125m ± 0%  +0.51% (p=0.000 n=15)
NatMul/100000-12            206.7m ± 0%   208.4m ± 0%  +0.80% (p=0.000 n=15)
geomean                     9.512µ        9.236µ       -2.91%

                 │     old      │                   new                    │
                 │     B/op     │      B/op       vs base                  │
NatMul/10-12         192.0 ± 0%     192.0 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/100-12      4.750Ki ± 0%   1.750Ki ±   0%  -63.16% (p=0.000 n=15)
NatMul/1000-12     48.13Ki ± 0%   16.01Ki ±   0%  -66.73% (p=0.000 n=15)
NatMul/10000-12    483.5Ki ± 1%   163.2Ki ±   2%  -66.24% (p=0.000 n=15)
NatMul/100000-12   5.480Mi ± 4%   1.532Mi ± 104%  -72.05% (p=0.000 n=15)
geomean            41.03Ki        16.82Ki         -59.01%
¹ all samples are equal

                 │    old     │                  new                   │
                 │ allocs/op  │  allocs/op    vs base                  │
NatMul/10-12       1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/100-12      1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/1000-12     1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/10000-12    1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/100000-12   5.000 ± 0%   1.000 ± 400%  -80.00% (p=0.007 n=15)
geomean            1.380        1.000         -27.52%
¹ all samples are equal

Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011
Reviewed-on: https://go-review.googlesource.com/c/go/+/650638
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-02-27 05:58:09 -08:00
Russ Cox
763766e9ab math/big: report allocs in BenchmarkNatMul, BenchmarkNatSqr
Change-Id: I112f55c0e3ee3b75e615a06b27552de164565c04
Reviewed-on: https://go-review.googlesource.com/c/go/+/650637
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 05:50:45 -08:00
Russ Cox
b66d83ccea math/big: clean up GCD a little
The GCD code was setting one *Int to the value of another
by smashing one struct on top of the other, instead of using Set.
That was safe in this one case, but it's not idiomatic in math/big
nor safe in general, so rewrite the code not to do that.
(In one case, by swapping variables around; in another, by calling Set.)

The added Set call does slow down GCDs by a small amount,
since the answer has to be copied out. To compensate for that,
optimize a bit: remove the s, t temporaries entirely and handle
vector x word multiplication directly. The net result is that almost
all GCDs are faster, except for small ones, which are a few
nanoseconds slower.

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                              │ bench.before │             bench.after             │
                              │    sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-12            23.80n ± 1%   31.71n ± 1%  +33.24% (p=0.000 n=10)
GCD10x10/WithXY-12              100.40n ± 0%   92.14n ± 1%   -8.22% (p=0.000 n=10)
GCD10x100/WithoutXY-12           63.70n ± 0%   70.73n ± 0%  +11.05% (p=0.000 n=10)
GCD10x100/WithXY-12              278.6n ± 0%   233.1n ± 1%  -16.35% (p=0.000 n=10)
GCD10x1000/WithoutXY-12          153.4n ± 0%   162.2n ± 1%   +5.74% (p=0.000 n=10)
GCD10x1000/WithXY-12             456.0n ± 0%   411.8n ± 1%   -9.69% (p=0.000 n=10)
GCD10x10000/WithoutXY-12         1.002µ ± 1%   1.036µ ± 0%   +3.39% (p=0.000 n=10)
GCD10x10000/WithXY-12            2.330µ ± 1%   2.210µ ± 0%   -5.13% (p=0.000 n=10)
GCD10x100000/WithoutXY-12        8.894µ ± 0%   8.889µ ± 1%        ~ (p=0.754 n=10)
GCD10x100000/WithXY-12           20.84µ ± 0%   20.24µ ± 0%   -2.84% (p=0.000 n=10)
GCD100x100/WithoutXY-12          373.3n ± 3%   314.4n ± 0%  -15.76% (p=0.000 n=10)
GCD100x100/WithXY-12             662.5n ± 0%   572.4n ± 1%  -13.59% (p=0.000 n=10)
GCD100x1000/WithoutXY-12         641.8n ± 0%   598.1n ± 1%   -6.81% (p=0.000 n=10)
GCD100x1000/WithXY-12            1.123µ ± 0%   1.019µ ± 1%   -9.26% (p=0.000 n=10)
GCD100x10000/WithoutXY-12        2.870µ ± 0%   2.831µ ± 0%   -1.38% (p=0.000 n=10)
GCD100x10000/WithXY-12           4.930µ ± 1%   4.675µ ± 0%   -5.16% (p=0.000 n=10)
GCD100x100000/WithoutXY-12       24.08µ ± 0%   23.97µ ± 0%   -0.48% (p=0.007 n=10)
GCD100x100000/WithXY-12          43.66µ ± 0%   42.52µ ± 0%   -2.61% (p=0.001 n=10)
GCD1000x1000/WithoutXY-12        3.999µ ± 0%   3.569µ ± 1%  -10.75% (p=0.000 n=10)
GCD1000x1000/WithXY-12           6.397µ ± 0%   5.534µ ± 0%  -13.49% (p=0.000 n=10)
GCD1000x10000/WithoutXY-12       6.875µ ± 0%   6.450µ ± 0%   -6.18% (p=0.000 n=10)
GCD1000x10000/WithXY-12          20.75µ ± 1%   19.17µ ± 1%   -7.64% (p=0.000 n=10)
GCD1000x100000/WithoutXY-12      36.38µ ± 0%   35.60µ ± 1%   -2.13% (p=0.000 n=10)
GCD1000x100000/WithXY-12         172.1µ ± 0%   174.4µ ± 3%        ~ (p=0.052 n=10)
GCD10000x10000/WithoutXY-12      79.89µ ± 1%   75.16µ ± 2%   -5.92% (p=0.000 n=10)
GCD10000x10000/WithXY-12         160.1µ ± 0%   150.0µ ± 0%   -6.33% (p=0.000 n=10)
GCD10000x100000/WithoutXY-12     213.2µ ± 1%   209.0µ ± 1%   -1.98% (p=0.000 n=10)
GCD10000x100000/WithXY-12        1.399m ± 0%   1.342m ± 3%   -4.08% (p=0.002 n=10)
GCD100000x100000/WithoutXY-12    5.463m ± 1%   5.504m ± 2%        ~ (p=0.190 n=10)
GCD100000x100000/WithXY-12       11.36m ± 0%   11.46m ± 1%   +0.86% (p=0.000 n=10)
geomean                          6.953µ        6.695µ        -3.71%

goos: linux
goarch: amd64
pkg: math/big
cpu: AMD Ryzen 9 7950X 16-Core Processor
                              │ bench.before │             bench.after             │
                              │    sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-32           39.66n ±  4%   44.34n ± 4%  +11.77% (p=0.000 n=10)
GCD10x10/WithXY-32              156.7n ± 12%   130.8n ± 2%  -16.53% (p=0.000 n=10)
GCD10x100/WithoutXY-32          115.8n ±  5%   120.2n ± 2%   +3.89% (p=0.000 n=10)
GCD10x100/WithXY-32             465.3n ±  3%   368.1n ± 2%  -20.91% (p=0.000 n=10)
GCD10x1000/WithoutXY-32         201.1n ±  1%   210.8n ± 2%   +4.82% (p=0.000 n=10)
GCD10x1000/WithXY-32            652.9n ±  4%   605.0n ± 1%   -7.32% (p=0.002 n=10)
GCD10x10000/WithoutXY-32        1.046µ ±  2%   1.143µ ± 1%   +9.33% (p=0.000 n=10)
GCD10x10000/WithXY-32           3.360µ ±  1%   3.258µ ± 1%   -3.04% (p=0.000 n=10)
GCD10x100000/WithoutXY-32       9.391µ ±  3%   9.997µ ± 1%   +6.46% (p=0.000 n=10)
GCD10x100000/WithXY-32          27.92µ ±  1%   28.21µ ± 0%   +1.04% (p=0.043 n=10)
GCD100x100/WithoutXY-32         443.7n ±  5%   320.0n ± 2%  -27.88% (p=0.000 n=10)
GCD100x100/WithXY-32            789.9n ±  2%   690.4n ± 1%  -12.60% (p=0.000 n=10)
GCD100x1000/WithoutXY-32        718.4n ±  3%   600.0n ± 1%  -16.48% (p=0.000 n=10)
GCD100x1000/WithXY-32           1.388µ ±  4%   1.175µ ± 1%  -15.28% (p=0.000 n=10)
GCD100x10000/WithoutXY-32       2.750µ ±  1%   2.668µ ± 1%   -2.96% (p=0.000 n=10)
GCD100x10000/WithXY-32          6.016µ ±  1%   5.590µ ± 1%   -7.09% (p=0.000 n=10)
GCD100x100000/WithoutXY-32      21.40µ ±  1%   22.30µ ± 1%   +4.21% (p=0.000 n=10)
GCD100x100000/WithXY-32         47.02µ ±  4%   48.80µ ± 0%   +3.78% (p=0.015 n=10)
GCD1000x1000/WithoutXY-32       3.417µ ±  4%   3.020µ ± 1%  -11.65% (p=0.000 n=10)
GCD1000x1000/WithXY-32          5.752µ ±  0%   5.418µ ± 2%   -5.81% (p=0.000 n=10)
GCD1000x10000/WithoutXY-32      6.150µ ±  0%   6.246µ ± 1%   +1.55% (p=0.000 n=10)
GCD1000x10000/WithXY-32         24.68µ ±  3%   25.07µ ± 1%        ~ (p=0.051 n=10)
GCD1000x100000/WithoutXY-32     34.60µ ±  2%   36.85µ ± 1%   +6.51% (p=0.000 n=10)
GCD1000x100000/WithXY-32        209.5µ ±  4%   227.4µ ± 0%   +8.56% (p=0.000 n=10)
GCD10000x10000/WithoutXY-32     90.69µ ±  0%   88.48µ ± 0%   -2.44% (p=0.000 n=10)
GCD10000x10000/WithXY-32        197.1µ ±  0%   200.5µ ± 0%   +1.73% (p=0.000 n=10)
GCD10000x100000/WithoutXY-32    239.1µ ±  0%   242.5µ ± 0%   +1.42% (p=0.000 n=10)
GCD10000x100000/WithXY-32       1.963m ±  3%   2.028m ± 0%   +3.28% (p=0.000 n=10)
GCD100000x100000/WithoutXY-32   7.466m ±  0%   7.412m ± 0%   -0.71% (p=0.000 n=10)
GCD100000x100000/WithXY-32      16.10m ±  2%   16.47m ± 0%   +2.25% (p=0.000 n=10)
geomean                         8.388µ         8.127µ        -3.12%

Change-Id: I161dc409bad11bcc553bc8116449905ae5b06742
Reviewed-on: https://go-review.googlesource.com/c/go/+/650636
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 05:49:50 -08:00
qmuntal
f707e53fd5 all: surround -test.run arguments with ^$
If the -test.run value is not surrounded by ^$ then any test that
matches the -test.run value will be run. This is normally not the
desired behavior, as it can lead to unexpected tests being run.

Change-Id: I3447aaebad5156bbef7f263cdb9f6b8c32331324
Reviewed-on: https://go-review.googlesource.com/c/go/+/651956
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-25 10:58:29 -08:00
Eng Zer Jun
cc874072f3 math/big: use built-in max function
Change-Id: I65721039dab311762e55c6a60dd75b82f6b4622f
Reviewed-on: https://go-review.googlesource.com/c/go/+/642335
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2025-02-03 08:25:31 -08:00
Sean Liao
32ff485c7c math/bits: update reference to debruijn paper
The old link no longer works.

Fixes #70684

Change-Id: I8711ef7d5721bf20ef83f5192dd0d1f73dda6ce1
Reviewed-on: https://go-review.googlesource.com/c/go/+/633775
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-12-04 22:15:36 +00:00
Jorropo
4c3aa5d324 math/rand/v2: replace <= 0 with == 0 for Uint function docs
This harmonize the docs with (*Rand).Uint* functions.
And it make it clearer, I wasn't sure if it would try to interpret
the uint as a signed number somehow, it does not pull any surprises
make that clear.

Change-Id: I5a87a0a5563dbabfc31e536e40ee69b11f5cb6cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/633535
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2024-12-04 16:34:07 +00:00
Russ Cox
a947912d8a internal/byteorder: use canonical Go casing in names
If Be and Le stand for big-endian and little-endian,
then they should be BE and LE.

Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/627376
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-20 20:59:28 +00:00
Adam
f7a14ae0cd math/big: properly linkify a reference
Change-Id: Ie7649060db25f1573eeaadd534a600bb24d30572
GitHub-Last-Rev: c617848a4e
GitHub-Pull-Request: golang/go#70134
Reviewed-on: https://go-review.googlesource.com/c/go/+/623757
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
2024-10-31 19:23:59 +00:00
Xiaolin Zhao
b521ebb55a math: implement arch{Floor, Ceil, Trunc} in hardware on loong64
benchmark:

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
        │  bench.old   │              bench.new              │
        │    sec/op    │   sec/op     vs base                │
Ceil      10.810n ± 0%   2.578n ± 0%  -76.15% (p=0.000 n=20)
Floor     10.810n ± 0%   2.531n ± 0%  -76.59% (p=0.000 n=20)
Trunc      9.606n ± 0%   2.530n ± 0%  -73.67% (p=0.000 n=20)
geomean    10.39n        2.546n       -75.50%

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
        │  bench.old   │              bench.new              │
        │    sec/op    │   sec/op     vs base                │
Ceil      13.220n ± 0%   7.703n ± 8%  -41.73% (p=0.000 n=20)
Floor     12.410n ± 0%   7.248n ± 2%  -41.59% (p=0.000 n=20)
Trunc     11.210n ± 0%   7.757n ± 4%  -30.80% (p=0.000 n=20)
geomean    12.25n        7.566n       -38.25%

Change-Id: I3af51e9852e9cf5f965fed895d68945a2e8675f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/612615
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-12 03:24:22 +00:00