abseil.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	PR #1662: Replace shift with addition in crc multiply	Pavel P	2024-05-07	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Imported from GitHub PR https://github.com/abseil/abseil-cpp/pull/1662 Merge 4b2c6c909b573d31a1cccba7cb72d4d8badeef8b into cba31a956209e68e4d4049e8a9bc03b1fd67320a Merging this change closes #1662 COPYBARA_INTEGRATE_REVIEW=https://github.com/abseil/abseil-cpp/pull/1662 from pps83:crc-add 4b2c6c909b573d31a1cccba7cb72d4d8badeef8b PiperOrigin-RevId: 631470883 Change-Id: I4a72be643ed341ddf0e0007418ab4a613a03db4b
*	Optimize crc32 V128_From2x64 on Arm	Connal de Souza	2024-04-04	1	-5/+5
\| \| \| \| \| \| \|	This removes redundant vector-vector moves and results in Extend being up to 3% faster. PiperOrigin-RevId: 621948170 Change-Id: Id82816aa6e294d34140ff591103cb20feac79d9a
*	Add entries for Neoverse N2,V1, and V2 into CRC dynamic dispatch table.	Connal de Souza	2023-10-06	1	-0/+5
\| \| \| \| \|	PiperOrigin-RevId: 571430428 Change-Id: I4777c37c5287d26a75f37fe059324ac218878f0e
*	Optimize CRC32 for Ampere Siryn	Connal de Souza	2023-09-26	1	-0/+3
\| \| \| \| \| \| \|	Siryn's crc32 instruction seems to have latency 3 and throughput 1, which makes the optimal ratio of pmull and crc streams close to that of tested x86 machines. Up to +120% faster for large inputs. PiperOrigin-RevId: 568645559 Change-Id: I86b85b1b2a5d4fb3680c516c4c9044238b20fe61
*	Include what you spell	Dmitri Gribenko	2023-08-08	1	-2/+2
\| \| \| \| \|	PiperOrigin-RevId: 554936252 Change-Id: Idb2ffbbc11aa6c98414fdd1ec38873d4687ab5e7
*	Fix spelling mistakes	Vertexwahn	2023-05-02	1	-1/+1
\|
*	Replace absl::base_internal::Prefetch* calls with absl::Prefetch* calls	Martijn Vels	2023-01-27	1	-7/+7
\| \| \| \| \|	PiperOrigin-RevId: 505184961 Change-Id: I64482558a76abda6896bec4b2d323833b6cd7edf
*	Add prefetch to crc32	Ilya Tokar	2022-12-13	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already prefetch in case of large inputs, do the same for medium sized inputs as well. This is mostly neutral for performance in most cases, so this also adds a new bench with working size >> cache size to ensure that we are seeing performance benefits of prefetch. Main benefits are on AMD with hardware prefetchers turned off: AMD prefetchers on: name old time/op new time/op delta BM_Calculate/0 2.43ns ± 1% 2.43ns ± 1% ~ (p=0.814 n=40+40) BM_Calculate/1 2.50ns ± 2% 2.50ns ± 2% ~ (p=0.745 n=39+39) BM_Calculate/100 9.17ns ± 1% 9.17ns ± 2% ~ (p=0.747 n=40+40) BM_Calculate/10000 474ns ± 1% 474ns ± 2% ~ (p=0.749 n=40+40) BM_Calculate/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.298 n=39+40) BM_Extend/0 1.38ns ± 1% 1.38ns ± 1% ~ (p=0.651 n=40+40) BM_Extend/1 1.53ns ± 2% 1.53ns ± 1% ~ (p=0.957 n=40+39) BM_Extend/100 9.48ns ± 1% 9.48ns ± 2% ~ (p=1.000 n=40+40) BM_Extend/10000 474ns ± 2% 474ns ± 1% ~ (p=0.928 n=40+40) BM_Extend/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.331 n=40+40) BM_Extend/100000000 4.79ms ± 1% 4.79ms ± 1% ~ (p=0.753 n=38+38) BM_ExtendCacheMiss/10 25.5ms ± 2% 25.5ms ± 2% ~ (p=0.988 n=38+40) BM_ExtendCacheMiss/100 23.1ms ± 2% 23.1ms ± 2% ~ (p=0.792 n=40+40) BM_ExtendCacheMiss/1000 37.2ms ± 1% 28.6ms ± 2% -23.00% (p=0.000 n=38+40) BM_ExtendCacheMiss/100000 7.77ms ± 2% 7.74ms ± 2% -0.45% (p=0.006 n=40+40) AMD prefetchers off: name old time/op new time/op delta BM_Calculate/0 2.43ns ± 2% 2.43ns ± 2% ~ (p=0.351 n=40+39) BM_Calculate/1 2.51ns ± 2% 2.51ns ± 1% ~ (p=0.535 n=40+40) BM_Calculate/100 9.18ns ± 2% 9.15ns ± 2% ~ (p=0.120 n=38+39) BM_Calculate/10000 475ns ± 2% 475ns ± 2% ~ (p=0.852 n=40+40) BM_Calculate/500000 22.9µs ± 2% 22.8µs ± 2% ~ (p=0.396 n=40+40) BM_Extend/0 1.38ns ± 2% 1.38ns ± 2% ~ (p=0.466 n=40+40) BM_Extend/1 1.53ns ± 2% 1.53ns ± 2% ~ (p=0.914 n=40+39) BM_Extend/100 9.49ns ± 2% 9.49ns ± 2% ~ (p=0.802 n=40+40) BM_Extend/10000 475ns ± 2% 474ns ± 1% ~ (p=0.589 n=40+40) BM_Extend/500000 22.8µs ± 2% 22.8µs ± 2% ~ (p=0.872 n=39+40) BM_Extend/100000000 10.0ms ± 3% 10.0ms ± 4% ~ (p=0.355 n=40+40) BM_ExtendCacheMiss/10 196ms ± 2% 196ms ± 2% ~ (p=0.698 n=40+40) BM_ExtendCacheMiss/100 129ms ± 1% 129ms ± 1% ~ (p=0.602 n=36+37) BM_ExtendCacheMiss/1000 88.6ms ± 1% 57.2ms ± 1% -35.49% (p=0.000 n=36+38) BM_ExtendCacheMiss/100000 14.9ms ± 1% 14.9ms ± 1% ~ (p=0.888 n=39+40) Intel skylake: BM_Calculate/0 2.49ns ± 2% 2.44ns ± 4% -2.15% (p=0.001 n=31+34) BM_Calculate/1 3.04ns ± 2% 2.98ns ± 9% -1.95% (p=0.003 n=31+35) BM_Calculate/100 8.64ns ± 3% 8.53ns ± 5% ~ (p=0.065 n=31+35) BM_Calculate/10000 290ns ± 3% 285ns ± 7% -1.80% (p=0.004 n=28+34) BM_Calculate/500000 11.8µs ± 2% 11.6µs ± 8% -1.59% (p=0.003 n=26+34) BM_Extend/0 1.56ns ± 1% 1.52ns ± 3% -2.44% (p=0.000 n=26+35) BM_Extend/1 1.88ns ± 3% 1.83ns ± 6% -2.17% (p=0.001 n=27+35) BM_Extend/100 9.31ns ± 3% 9.13ns ± 7% -1.92% (p=0.000 n=33+38) BM_Extend/10000 290ns ± 3% 283ns ± 3% -2.45% (p=0.000 n=32+38) BM_Extend/500000 11.8µs ± 2% 11.5µs ± 8% -1.80% (p=0.001 n=35+37) BM_Extend/100000000 6.39ms ±10% 6.11ms ± 8% -4.34% (p=0.000 n=40+40) BM_ExtendCacheMiss/10 36.2ms ± 7% 35.8ms ±14% ~ (p=0.281 n=33+37) BM_ExtendCacheMiss/100 26.9ms ±15% 25.9ms ±12% -3.93% (p=0.000 n=40+40) BM_ExtendCacheMiss/1000 23.8ms ± 5% 23.4ms ± 5% -1.68% (p=0.001 n=39+40) BM_ExtendCacheMiss/100000 10.1ms ± 5% 10.0ms ± 4% ~ (p=0.051 n=39+39) PiperOrigin-RevId: 495119444 Change-Id: I67bcf3b0282b5e1c43122de2837a24c16b8aded7
*	Allow Cord to store chunked checksums	Derek Mauro	2022-12-11	1	-7/+2
\| \| \| \| \|	PiperOrigin-RevId: 494587777 Change-Id: I41504edca6fcf750d52602fa84a33bc7fe5fbb48
*	Fixes many compilation issues that come from having no external CI	Derek Mauro	2022-11-30	1	-181/+211
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	coverage of the accelerated CRC implementation and some differences bewteen the internal and external implementation. This change adds CI coverage to the linux_clang-latest_libstdcxx_bazel.sh script assuming this script always runs on machines of at least the Intel Haswell generation. Fixes include: * Remove the use of the deprecated xor operator on crc32c_t * Remove #pragma unroll_completely, which isn't known by GCC or Clang: https://godbolt.org/z/97j4vbacs * Fixes for -Wsign-compare, -Wsign-conversion and -Wshorten-64-to-32 PiperOrigin-RevId: 491965029 Change-Id: Ic5e1f3a20f69fcd35fe81ebef63443ad26bf7931
*	CRC: Get CPU detection and hardware acceleration working on MSVC x86(_64)	Derek Mauro	2022-11-23	1	-0/+3
\| \| \| \| \| \| \|	Using /arch:AVX on MSVC now uses the accelerated implementation PiperOrigin-RevId: 490550573 Change-Id: I924259845f38ee41d15f23f95ad085ad664642b5
*	Release the CRC library	Derek Mauro	2022-11-09	1	-0/+691
	This implementation can advantage of hardware acceleration available on common CPUs when using GCC and Clang. A future update may enable this on MSVC as well. PiperOrigin-RevId: 487327024 Change-Id: I99a8f1bcbdf25297e776537e23bd0a902e0818a1