Cookie-Erklärung

Diese Webseite verwendet Cookies, um die technische Funktionalität sicherzustellen, das Benutzererlebnis zu verbessern und uns dabei zu helfen, die Webseite für unser Besucher zu optimieren. Erfahren Sie mehr in unserer Datenschutzerklärung

Notwendige Cookies

Diese Cookies sind für die Funktionen der Webseite erforderlich und erleichtern deren Bedienung. Ohne diese Cookies könnten wir Ihnen z.B. Dienste wie Kontaktformulare und Regionsauswahl nicht anbieten. Diese Cookies sammeln anonymisierte Informationen, sie können nicht Ihre Bewegungen auf anderen Websites verfolgen.

Cookies anzeigenCookies ausblenden

Name

Domain

__ga

censhare.com

__gid

censhare.com

allowCookies

censhare.com

prefGeo

censhare.com

Performance Cookies

Diese Cookies sammeln Informationen darüber, wie Sie unsere Webseiten verwenden. Damit können wir erkennen, welche Teile unseres Internetangebots besonders populärer sind und auf diese Weise unser Angebot für Sie verbessern. Diese Cookies speichern keine Informationen, die eine Identifikation des Nutzers zulassen. Die gesammelten Informationen werden aggregiert und somit anonym ausgewertet.

Cookies anzeigenCookies ausblenden

Name

Domain

__hssc

censhare.com

__hssrc

censhare.com

__hstc

censhare.com

_vwo_uuid_v2

censhare.com

hubspotutk

hubspot.com

__gat_UA-8

censhare.com

Marketing Cookies

Cookies für Marketingzwecke werden genutzt, um gezielt relevante Werbeinhalte für den Nutzer auszuspielen, die Erscheinung von Anzeigen zu steuern und die Effektivität von Werbekampagnen zu messen.

Cookies anzeigenCookies ausblenden

Name

Domain

1P_JAR:

google.de

CONSENT

google.de

NID

google.de

globe icon

This website is also available in English.

Select TerritoryDE - Deutsch
Blog

Yet another compressed bitset with union, intersection and difference implementations

In the last post I introduced a data structure and algorithm to implement a concurrent memory efficient long set and long-to-long map. The long set can be regarded as some kind of compressed bitset. This post focuses on adding the classic bitset operations for union (or), intersection (and) and difference (and-not).

Two approaches are implemented:

  • OpNode (see class IDTrieOps): An expression tree like approach with iterators that work like a merge sort (think zip fastener) and return the next value for the result of an operator. So with (2, 5, 10) and (4, 5, 7, 10) the implementation of the and operator will walk through both lists until it finds matching pairs (5, 10). As a performance optimization it will skip values in one list, if the current value in the other list is ahead (and, and-not). The result is calculated on the fly without creating any temporary results.

  • SetOps (see class IDTrieSetOps): A more classic bitset approach where the leaf values (64 bits are stored in one long) are combined using bitwise and, or and and-not. Inner nodes are handled in a quite similar way. Each operation creates an effective result: a new IDSetTrie.

For details of the algorithms look at the source code that you can find here.

Now for the important part: performance. Of course without actual data the following measurements won't reflect a real use case, but at least an artificial test reveals some common patterns.

The test evaluates the expression "(a and b) or c" where each bitset has up to 10,000,000 entries using increments of 1 (dense population), 100 (sparse population) and 1000 (even more sparse population). All tests were performed on a MacBook Pro with a 2.3 GHz Intel i7 (4 cores / 8 threads). BitSet is the standard java.util.BitSet.

test with size: 10000000 (a and b) or c,  increments: 1, 1, 1
OpNode: 301422us, count: 10000000, checksum: 50000005000000
SetOps: 51874us, count: 10000000, checksum: 50000005000000
BitSet: 50682us, count: 10000000, checksum: 50000005000000

Java BitSet is the best, but SetOps follows closely, OpNode falls far behind because iterating through the node values takes its time.

test with size: 10000000 (a and b) or c,  increments: 100, 100, 100
OpNode: 5054us, count: 100000, checksum: 499995100000
SetOps: 2738us, count: 100000, checksum: 499995100000
BitSet: 1252us, count: 100000, checksum: 499995100000

If the set is sparsely populated, OpNode performance isn't that bad.

test with size: 10000000 (a and b) or c,  increments: 1000, 1000, 1000
OpNode: 589us, count: 10000, checksum: 49995010000
SetOps: 390us, count: 10000, checksum: 49995010000
BitSet: 986us, count: 10000, checksum: 49995010000

If even more sparsely populated, BitSet even falls behind OpNode and SetOps shines.

So it's not that easy to choose the best approach. And it even gets worse taking into account multiple parallel threads. Then it's the combination of CPU performance, available CPU-Cache (L1/L2/L3) and especially memory bandwidth that influence the results:

test with size: 10000000 (a and b) or c,  increments: 100, 100, 100
test with 4 threads
OpNode: 5912us, count: 100000, checksum: 499995100000
SetOps: 8224us, count: 100000, checksum: 499995100000
BitSet: 3399us, count: 100000, checksum: 499995100000
test with size: 10000000 (a and b) or c,  increments: 100, 100, 100
test with 8 threads
OpNode: 11841us, count: 100000, checksum: 499995100000
SetOps: 18833us, count: 100000, checksum: 499995100000
BitSet: 8389us, count: 100000, checksum: 499995100000
test with size: 10000000 (a and b) or c,  increments: 1000, 1000, 1000
test with 4 threads
OpNode: 911us, count: 10000, checksum: 49995010000
SetOps: 712us, count: 10000, checksum: 49995010000
BitSet: 2945us, count: 10000, checksum: 49995010000
test with size: 10000000 (a and b) or c,  increments: 1000, 1000, 1000
test with 8 threads
OpNode: 740us, count: 10000, checksum: 49995010000
SetOps: 949us, count: 10000, checksum: 49995010000
BitSet: 6495us, count: 10000, checksum: 49995010000

Ouch! Without memory efficient storage BitSet saturates the memory bandwidth and doesn't scale. So you definitely don't want to use BitSet on your expensive eight-core server class machine.

Over all the best solution could be to combine OpNode and SetOps. A simple but well working solution is to use OpNode as default but replace it with SetOps if the left and right arguments are of type IDSetTrie and the expected result sizes are large.

But your mileage may vary...

Read part I here: Efficient concurrent long set and map

Walter Bauer Walter Bauer

Walter Bauer ist einer der Gründer der censhare AG und leidenschaftlicher Software-Designer und -Programmierer mit einem Spleen für Performance-Optimierungen und Refactoring.

Kommentare