select is buggy

`select`, `poll`, and `epoll` provide the same functionality through different APIs. Of them, `select` is the oldest and the best known, and it is widely used. However, `select` and the similar `pselect` do not scale well with the number of files, sockets, and other kernel objects that have file descriptors. Even worse, `select` and `pselect` may crash programs. They should not be used in enterprise‑strength code. It is advisable to replace them at the first opportunity with `poll`, `ppoll`, or `epoll`, which are safe.

Preliminary Notes and Terminology

File descriptors (`fd`) are the Swiss Army knives of UNIX / Linux. They serve as identifiers for all types of operating system kernel‑level objects: open «usual» files, directories, devices, sockets, pipes, events created by `eventfd()`, signal acceptors created by `signalfd()`, and many more. There is a saying that «everything is a file» in UNIX. Or, in the more precise wording of Linus Torvalds, «everything is a file descriptor or a process».

As is customary in manpages and other documentation, I am going to use a shorthand. Instead of saying that a function works on an object with file descriptor `fd`, I will say that it works on the `fd` itself. For example, I'm going to say that code reads from `fd`, instead of «from a socket that has file descriptor `fd`».

Pretty much everything that I am going to say about `select` applies to `pselect` as well. Similarly, what I say about `poll` also applies to `ppoll`. For this reason, `pselect` and `ppoll` are mentioned seldom, if at all.

Symptoms and Diagnostics

An application crashes, likely with a `SIGSEGV` signal. Getting the usual call trace from a dump file does not work. To an experienced programmer, the stack looks suspiciously reminiscent of what happens after out-of-bounds writes to fixed-size arrays. The stack is corrupted.

Debug prints may show that the crash happens when your program calls the standard system call `select`, or while preparing arguments for the call, or soon after returning from it. Unfortunately, debug prints are not always available. Or their last lines may be lost. Or gibberish may be printed.

Unless you already suspect what is causing the crashes, you are unlikely to check the values of the `fd` monitored by `select`. If you do check them when crashes occur, e.g. by logging the `fd` in a debug print or by running the command

ls -l /proc/PID/fd | less

where PID denotes a process ID, you may see that the `fd` was bigger than 1023.
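The same check can also be done from inside the program, for example in a debug build. This is a minimal, Linux-specific sketch that assumes `/proc` is mounted. The function names `max_open_fd` and `fds_exceed_fd_setsize` are hypothetical and do not belong to any library.

```c
#include <dirent.h>
#include <stdlib.h>
#include <sys/select.h>   /* FD_SETSIZE */

/* Returns the highest file descriptor currently open in this process,
 * or -1 if /proc/self/fd cannot be read. Linux-specific.
 * The descriptor used by opendir() itself shows up in the listing; that
 * is acceptable for a diagnostic, since it hints how high the next new
 * fd may be. */
int max_open_fd(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int max_fd = -1;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')      /* skip "." and ".." */
            continue;
        int fd = atoi(entry->d_name);     /* entries are named "0", "1", ... */
        if (fd > max_fd)
            max_fd = fd;
    }
    closedir(dir);
    return max_fd;
}

/* Nonzero if at least one open descriptor is already too big for select().
 * Whether such an fd is actually watched by select() is a separate question. */
int fds_exceed_fd_setsize(void)
{
    return max_open_fd() >= FD_SETSIZE;
}
```

Logging the result of `fds_exceed_fd_setsize()` next to each `select` call replaces a mysterious stack corruption with an explicit hint.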

Great! You have probably found what crashes the program. Very likely your code is OK, except that it should not use `select`.

Why `select` should not be used

Few programmers would expect it from a standard facility, but `select` is buggy. The bug is in the API, and it is impossible to fix. Instead, it is documented as a limitation: `select` should not be used for monitoring `fd` that are equal to or bigger than the compile‑time constant `FD_SETSIZE`, defined in a header file (`<sys/select.h>`). On Linux, `FD_SETSIZE` is usually equal to 1024.
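The limit is visible in the headers. This small sketch prints `FD_SETSIZE` and the size of `fd_set`, the fixed-size bitmap that `select` uses. The helper names are hypothetical.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/select.h>   /* defines fd_set and FD_SETSIZE */

/* fd_set is a bitmap with one bit per descriptor 0 .. FD_SETSIZE - 1.
 * Its size is fixed when the program is compiled, not when it runs. */
int select_max_fd(void)
{
    return FD_SETSIZE - 1;
}

void print_select_limits(void)
{
    printf("FD_SETSIZE        = %d\n", FD_SETSIZE);
    printf("sizeof(fd_set)    = %zu bytes (%zu bits)\n",
           sizeof(fd_set), sizeof(fd_set) * CHAR_BIT);
    printf("highest usable fd = %d\n", select_max_fd());
}
```

On a typical Linux system with glibc or musl, this reports an `FD_SETSIZE` of 1024 and an `fd_set` of 128 bytes. That is exactly one bit per possible descriptor, with no room for more.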

What if you ignore the limitation and try to monitor a socket with a bigger `fd` using `select`?

The Linux documentation does not answer the question. The behavior is undefined, and anything may happen. In particular, the code may corrupt the stack. In practice, it will corrupt the stack when the `fd_set` is a local variable, as it usually is. Depending on the content and layout of the stack data, though, not all buggy programs crash as soon as `fd` exceeds 1023. They may crash only after it surpasses a higher threshold, like 3000 or 6000. For smaller values, `select` is «only» going to corrupt local data stored on the stack instead of crashing. Sarcasm intended.

Note: `select` will corrupt the stack and may crash an application even when only a single watched `fd` is equal to or exceeds `FD_SETSIZE`.

Suppose you need to monitor an absolutely legitimate `fd` that was assigned to a socket when the socket was created. Unfortunately, the `fd` is equal to or exceeds `FD_SETSIZE`. What can you do in such a situation other than print a diagnostic message and abort? So the limitation actually is a bug!
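If a program must keep using `select` for a while, the best it can do with such an `fd` is exactly that. It can detect the `fd` before `FD_SET` writes out of bounds, print a diagnostic, and abort. Here is a minimal sketch. `checked_fd_set` is a hypothetical helper, not a standard function.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>

/* Bounds-checked wrapper around FD_SET (hypothetical helper).
 * For an fd that select() cannot represent, it reports the problem and
 * aborts instead of silently writing past the end of the fd_set. */
void checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d cannot be used with select(): "
                        "FD_SETSIZE is %d; use poll() instead\n",
                fd, FD_SETSIZE);
        abort();
    }
    FD_SET(fd, set);
}
```

This turns silent memory corruption into a clean, diagnosable failure. It does not let `select` monitor the `fd`, which is why it is a stopgap rather than a fix.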

Informal explanation

In most doctors' offices, you must fill in a form describing your health. Some forms contain very long lists of health problems, which you must read thoroughly, checking the respective boxes. Other forms simply provide a few blank lines, which you fill in with the names of relevant illnesses. Alternatively, the staff just asks whether anything has changed since your previous visit. The APIs of `select`, `poll`, and `epoll` are similar to these forms / questions:

The arguments of `select` are long lists of checkboxes, one per `fd`. Most of the checkboxes are irrelevant and will be left empty.

`poll` takes a list of the `fd` to watch; it does not care about the irrelevant ones. You must supply the information on every call to both `select` and `poll`, much as you must fill in forms at every visit to some doctors' offices.

The `epoll` facility is more like a doctor's office that keeps your information on file and asks only for updates. The first time, the calling code tells `epoll` which `fd` must be added to the monitored list. Subsequently, it provides only updates: which new `fd` to add, which to delete, and so on.

It is pretty obvious that `poll` has a more efficient API than `select`, because the former does not read and discard the irrelevant checkboxes. The bigger issue with `select` is this: unlike paper forms processed manually, in programming you cannot write in the margins of fixed-size forms. If the size is fixed, it is fixed. Filling more checkboxes than the maximum provided means writing out of bounds of a fixed-size array. Assuming that the array is on the stack, the stack will be corrupted.
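The arithmetic behind that out-of-bounds write can be sketched in C. This sketch assumes the layout used by glibc and musl on Linux, where `fd_set` is an array of `long`-sized words and descriptor `fd` lives in word `fd / (bits per word)`. The helper names are hypothetical. The code only computes offsets and never calls `FD_SET` with a bad `fd`.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/select.h>

/* Assumption: fd_set is an array of long-sized words (true for glibc and
 * musl), and FD_SET(fd, set) sets one bit in word fd / BITS_PER_WORD. */
#define BITS_PER_WORD (CHAR_BIT * sizeof(long))

/* Number of words actually reserved for an fd_set. */
size_t fd_set_words(void)
{
    return sizeof(fd_set) / sizeof(long);
}

/* Index of the word that FD_SET(fd, ...) would modify. */
size_t word_index_for_fd(int fd)
{
    return (size_t)fd / BITS_PER_WORD;
}

/* Reports whether FD_SET(fd, ...) would stay inside the array. */
void explain_fd(int fd)
{
    size_t idx = word_index_for_fd(fd);
    size_t words = fd_set_words();

    if (idx < words)
        printf("fd %d -> word %zu of %zu: inside fd_set\n", fd, idx, words);
    else
        printf("fd %d -> word %zu of %zu: %zu bytes past the end\n",
               fd, idx, words, (idx - words) * sizeof(long));
}
```

On a typical 64-bit Linux system the array has 16 words. There, fd 1023 lands in the last word, while fd 3000 lands in word 46, far past the end of the array.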

Solution

Just replace all `select` calls in your code with `poll`. This is the easiest way to fix the problem. The `poll` API does not impose arbitrary limitations on the values of `fd`: `poll` can monitor any `fd`.

As a bonus, the `poll` API is usually more efficient. Some people say that this is the reason why `poll` is preferable to `select`. To me it sounds like a recommendation to stay away from the plague because the disease causes headaches: technically correct, but a serious understatement.

The difference is not abstract: it affects business decisions. Code that is prone to crash should be fixed as soon as possible, typically in the next maintenance release. On the other hand, it is seldom advisable to modify correct, stable, and proven production applications just to eliminate small to moderate inefficiencies.

Notes:

Sometimes it is more appropriate to use `epoll`. It is just somewhat less simple than replacing `select` with `poll`.

There are workarounds for the system limitation on `select`, but the workarounds are unreliable and/or much more complicated than replacing `select` with `poll`. And they are very inefficient.

Yuriy Koblents-Mishke