Modern operating systems support running multiple threads within the same process. In Windows, threads are the basic unit of scheduling, and processes are simply logical wrappers for one or more threads vying for CPU time.
When multiple threads are executing on a multicore system, the operating system tries to spread them across the available cores in an optimal manner. Suppose you have 4 computationally intensive threads running on a 4-core system. Ideally, each will run on its own core and all four will execute simultaneously. In this optimal case, the program will run roughly 4 times faster on the 4-core system than on a single-core system.
The Shared Data Problem
Unfortunately, most games and simulation programs have sophisticated relationships between their state data, and the multiple threads can't operate completely independently. There are times when more than one thread needs access to the same piece of data. Since the OS starts and stops threads at indeterminate times based on the current state of the system, a thread that begins modifying a piece of shared data may be stopped before the modification completes, and another thread using the same piece of data may then see it in a partially modified state.
To avoid this problem, a locking mechanism must be used around any code that accesses shared data. The OS provides these mechanisms, examples of which include Critical Sections, Mutexes, and Semaphores.
It turns out that when a program has even a few busy threads contending for the same shared data, the system can slow to a crawl, as each thread spends most of its time waiting for access to that data. Even when running on a multi-core chip, effectively only one core is utilized.
So, programmers of multi-threaded systems must be very intentional about which threads exist and which data they depend on, and must try to spread this load in a logical manner that will best utilize a multi-core system. This is a demanding intellectual challenge, and very difficult to get right. In games, usually only a few very well defined threads are created in an effort to minimize lock waiting. Software complexity goes way up, along with all the associated testing and maintenance costs.
Another performance issue with threads is the cost of switching between them. A thread is a fairly heavyweight OS construct with its own execution state. Switching between threads requires saving this state for the thread being paused and reloading the state for the thread about to start, which can be an expensive operation.
So, even if we have no shared data dependencies in an application, we aren't free to create numerous threads (say, 100,000) without incurring great cost. We still have only a few cores, so the OS has to time-share the threads across them, and every time it switches out the currently running thread there is a costly context switch.
But threads still have their uses
There are two areas in which threads are extremely useful, and they are used in these capacities within Gaffer.
- As stated previously, the OS will spread the execution of multiple threads across available cores. So, if we want to utilize all available cores, we must create at least as many threads as there are cores.
- Handling IO intensive operations (disk, network, etc.) with separate threads allows other threads to continue operating. If we're reading a mesh file from disk, for example, doing this in a separate thread allows us to continue other activities on the CPU while we wait. This is called Asynchronous IO.