diff options
author | Abseil Team <absl-team@google.com> | 2022-05-26 08:32:42 -0700 |
---|---|---|
committer | Copybara-Service <copybara-worker@google.com> | 2022-05-26 08:33:47 -0700 |
commit | 591a2cdacb728f7c4f29782c31d7b694bb3a0782 (patch) | |
tree | 8ee2578ab7b20ac855778565503be39ef68fddf7 /absl/container/internal/raw_hash_set_test.cc | |
parent | 4bb6a0e045c211576c7066e7862ae212e8333639 (diff) | |
download | abseil-591a2cdacb728f7c4f29782c31d7b694bb3a0782.tar.gz abseil-591a2cdacb728f7c4f29782c31d7b694bb3a0782.tar.bz2 abseil-591a2cdacb728f7c4f29782c31d7b694bb3a0782.zip |
Optimize SwissMap iteration for aarch64 by 5-6%
Benchmarks: https://pastebin.com/tZ7dr67W. Works well especially on smaller ranges.
After a week on spending optimizing NEON SIMD where I almost managed to make hash tables work with NEON SIMD without performance hits (still 1 cycle to optimize and I gave up a little), I found an interesting optimization for aarch64 to use cls instruction (count leading sign bits).
The loop has a property that ctrl_ group is not matched against count when the first slot is empty or deleted.
```
void skip_empty_or_deleted() {
while (IsEmptyOrDeleted(*ctrl_)) {
uint32_t shift = Group{ctrl_}.CountLeadingEmptyOrDeleted();
ctrl_ += shift;
slot_ += shift;
}
...
}
```
However, `kEmpty` and `kDeleted` have format of `1xxxxxx0` and `~ctrl & (ctrl >> 7)` always sets the lowest bit to 1.
In naive implementation, it does +1 to start counting zero bits, however, in aarch64 we may start counting one bits immediately. This saves 1 cycle and 5% of iteration performance.
Then it becomes hard to find a supported and sustainable C++ version of it.
`__clsll` is not supported by GCC and was supported only since clang 8, `__builtin_clrsb` is not producing optimal codegen for clang. `__rbit` is not supported by GCC and there is no intrinsic to do that, however, in clang we have `__builtin_bitreverse{32,64}`. For now I decided to enable this only for clang, only if they have appropriate builtins.
PiperOrigin-RevId: 451168570
Change-Id: I7e9256a60aecdc88ced4e6eb15ebc257281b6664
Diffstat (limited to 'absl/container/internal/raw_hash_set_test.cc')
-rw-r--r-- | absl/container/internal/raw_hash_set_test.cc | 6 |
1 files changed, 5 insertions, 1 deletions
diff --git a/absl/container/internal/raw_hash_set_test.cc b/absl/container/internal/raw_hash_set_test.cc index 914ec0c9..c79f8641 100644 --- a/absl/container/internal/raw_hash_set_test.cc +++ b/absl/container/internal/raw_hash_set_test.cc @@ -262,16 +262,20 @@ TEST(Group, CountLeadingEmptyOrDeleted) { for (ctrl_t empty : empty_examples) { std::vector<ctrl_t> e(Group::kWidth, empty); + EXPECT_TRUE(IsEmptyOrDeleted(e[0])); EXPECT_EQ(Group::kWidth, Group{e.data()}.CountLeadingEmptyOrDeleted()); for (ctrl_t full : full_examples) { - for (size_t i = 0; i != Group::kWidth; ++i) { + // First is always kEmpty or kDeleted. + for (size_t i = 1; i != Group::kWidth; ++i) { std::vector<ctrl_t> f(Group::kWidth, empty); f[i] = full; + EXPECT_TRUE(IsEmptyOrDeleted(f[0])); EXPECT_EQ(i, Group{f.data()}.CountLeadingEmptyOrDeleted()); } std::vector<ctrl_t> f(Group::kWidth, empty); f[Group::kWidth * 2 / 3] = full; f[Group::kWidth / 2] = full; + EXPECT_TRUE(IsEmptyOrDeleted(f[0])); EXPECT_EQ( Group::kWidth / 2, Group{f.data()}.CountLeadingEmptyOrDeleted()); } |