Inserting Using std::map/set::lower_bound
Inserting into a map or set has the same type of performance, because they keep their elements sorted, so I will treat them identically; inserting requires finding the insert position (
logn - logarithmic), and inserting.
Reference
Similarly, a find operation is
logn for both.
A task: add a list of elements, probably with many duplicates, to a set or a map. So the first question one might ask, as I did, without proper knowledge: is it faster to insert each, or to check if they exist before inserting?
12 | for each element in list:
insert into set;
|
or
123 | for each element in list:
if not exists in set:
insert into set;
|
The answer should be obvious: inserting is
logn, searching is logn, so checking for existence should be slower than just inserting, since that makes it
logn + (if exists logn, else 0), versus a simple
logn for insertion.
Benchmarks, however (with optimize flag 1/2/3, and without, in g++) showed that searching seems to be a bit faster. Odd. I had some guesses as to cachine...
But then a thought occurred: I can check if exists, and in so doing, obtain the pointer to where it should be. This would be great because inserting when you know the precise desired position is O(1). Of course, std::set::find() and std::map::find() don't work that way; non-existence returns std::set::end() and std::map::end() respectively. They don't return anything like std::pair.
Then I thought of a new way to google for the problem and found the blindingly obvious solution: lower_bound. That does exactly what I want:
12345 | for each element in list of elements to add:
iterator = set::lower_bound(element);
if iterator == set::end() OR
iterator points to an object not equal to element:
insert into set at iterator position;
|
Boom. While
find() is the obvious solution, it doesn't work that way;
lower_bound() is the correct solution for checking whether you should insert.
Reference
Benchmarks show 15-30% improvement in doing this (depending on the dataset and the -Ox flag). As always, 90% of the problem was figuring out what question to ask google, and how to ask it.
Ches Koblents
October 9, 2012