Inserting Using std::map/set::lower_bound

Inserting into a map or set has the same type of performance, because they keep their elements sorted, so I will treat them identically; inserting requires finding the insert position (`logn` - logarithmic), and inserting. Reference

Similarly, a find operation is`logn` for both.

Similarly, a find operation is

A task: add a list of elements, probably with many duplicates, to a set or a map. So the first question one might ask, as I did, without proper knowledge: is it faster to insert each, or to check if they exist before inserting?

12 | for each element in list: insert into set; |

or

123 | for each element in list: if not exists in set: insert into set; |

The answer should be obvious: inserting is `logn`, searching is logn, so checking for existence should be slower than just inserting, since that makes it `logn + (if exists logn, else 0)`, versus a simple `logn` for insertion.

Benchmarks, however (with optimize flag 1/2/3, and without, in g++) showed that searching seems to be a bit faster. Odd. I had some guesses as to cachine...

But then a thought occurred: I can check if exists, and in so doing, obtain the pointer to where it should be. This would be great because inserting when you know the precise desired position is `O(1)`. Of course, `std::set::find()` and `std::map::find()` don't work that way; non-existence returns `std::set::end()` and `std::map::end()` respectively. They don't return anything like `std::pair`.

Then I thought of a new way to google for the problem and found the blindingly obvious solution: **lower_bound**. That does exactly what I want:

12345 | for each element in list of elements to add: iterator = set::lower_bound(element); if iterator == set::end() OR iterator points to an object not equal to element: insert into set at iterator position; |

Boom. While `find()` is the obvious solution, it doesn't work that way; `lower_bound()` is the correct solution for checking whether you should insert. Reference

Benchmarks show 15-30% improvement in doing this (depending on the dataset and the -Ox flag). As always, 90% of the problem was figuring out what question to ask google, and how to ask it.

Ches Koblents

October 9, 2012

October 9, 2012