Basic Threading in C
Often when we are programming there are situations where we need to perform the same operation over and over again. The for loop is the usual choice. However, when the order of the iterations does not matter, we can do better: we can split the work between multiple threads, letting several processor cores run simultaneously and thereby speeding up execution.
For example, consider the simple unthreaded code, initial.c:
#include <stdio.h>

#define ARRAYSIZE 17

int main(void) {
    int array[ARRAYSIZE];
    int i;

    /* puts i^2 into array positions i=0 to ARRAYSIZE-1 */
    for (i=0; i<ARRAYSIZE; i++) {
        array[i]=i*i;
    }

    /* Display Result */
    for (i=0; i<ARRAYSIZE; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");

    return 0;
}
We compile it with:

gcc initial.c -o initial -Wall

The command line should be similar for your compiler. When we run the program, it gives the following output:
0 1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256
initial.c is a simple program: it defines an array of length 17, calculates the squares of 0-16, stores them in the array, and then displays the contents of the array. Note that there is no advantage to calculating the square of 5 before calculating the square of 14; order does not matter. Parallelizing such a for loop can always be done with the following steps:
Step 1: Make a function that only does the loop
The new function that we create will take parameters for the beginning and ending points of the loop as well as any variables needed by the loop. In our case, the loop needs access to the array. Put a new for loop where the old one used to be that repeatedly calls the new function with different start and stop values. The following code does exactly this, and calls the new function a total of NUMTHREADS times. For now, the calls take place one after another, but ultimately we will not wait for each function call to finish before starting the next. We now have step1.c:
#include <stdio.h>

#define ARRAYSIZE 17
#define NUMTHREADS 4

/* puts i^2 into array positions i=start to stop-1 */
void* squarer(int start, int stop, int* array) {
    int i;

    for (i=start; i<stop; i++) {
        array[i]=i*i;
    }
    return NULL;
}

int main(void) {
    int array[ARRAYSIZE];
    int i, start, stop;

    /*
    this has the effect of rounding up the number of tasks
    per thread, which is useful in case ARRAYSIZE does not
    divide evenly by NUMTHREADS.
    */
    int tasksPerThread=(ARRAYSIZE+NUMTHREADS-1)/NUMTHREADS;

    /* Do the loop in NUMTHREADS pieces */
    for (i=0; i<NUMTHREADS; i++) {
        start=i*tasksPerThread;
        stop=(i+1)*tasksPerThread;

        /* if the tasks do not divide evenly, stop could be too big.
           Check and fix. */
        if (stop>ARRAYSIZE) {
            stop=ARRAYSIZE;
        }
        squarer(start, stop, array);
    }

    /* Display Result */
    for (i=0; i<ARRAYSIZE; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");

    return 0;
}
Step 2: Place all function inputs in a struct
pthread_create passes exactly one argument, of type void*, to your thread function, and if you followed step 1, your function currently takes at least two parameters. We get around this limitation by placing all of the parameters in a struct and passing a pointer to that struct; the function casts the pointer back to the struct type and unpacks it. While we are at it, we go ahead and prepare all the structs we will need before the for loop that calls the new function. This brings us to step2.c:
#include <stdio.h>

#define ARRAYSIZE 17
#define NUMTHREADS 4

struct ThreadData {
    int start, stop;
    int* array;
};

/* puts i^2 into array positions i=start to stop-1 */
void* squarer(void* td) {
    struct ThreadData* data=(struct ThreadData*) td;
    int start=data->start;
    int stop=data->stop;
    int* array=data->array;
    int i;

    for (i=start; i<stop; i++) {
        array[i]=i*i;
    }
    return NULL;
}

int main(void) {
    int array[ARRAYSIZE];
    struct ThreadData data[NUMTHREADS];
    int i;

    /*
    this has the effect of rounding up the number of tasks
    per thread, which is useful in case ARRAYSIZE does not
    divide evenly by NUMTHREADS.
    */
    int tasksPerThread=(ARRAYSIZE+NUMTHREADS-1)/NUMTHREADS;

    /* Divide work for threads, prepare parameters */
    for (i=0; i<NUMTHREADS; i++) {
        data[i].start=i*tasksPerThread;
        data[i].stop=(i+1)*tasksPerThread;
        data[i].array=array;
    }
    /* the last calculation must not go past the end of the array */
    data[NUMTHREADS-1].stop=ARRAYSIZE;

    /* Do each part */
    for (i=0; i<NUMTHREADS; i++) {
        squarer(&data[i]);
    }

    /* Display Result */
    for (i=0; i<ARRAYSIZE; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");

    return 0;
}
Step 3: Include threading library
The standard threading library for C (and C++) on POSIX systems such as Linux and macOS is pthread, short for POSIX threads. We modify the previous code to include pthread.h and use its functions to launch squarer asynchronously in separate threads. We now have the final version of the code, threaded.c:
#include <pthread.h>
#include <stdio.h>

#define ARRAYSIZE 17
#define NUMTHREADS 4

struct ThreadData {
    int start, stop;
    int* array;
};

/* puts i^2 into array positions i=start to stop-1 */
void* squarer(void* td) {
    struct ThreadData* data=(struct ThreadData*) td;
    int start=data->start;
    int stop=data->stop;
    int* array=data->array;
    int i;

    for (i=start; i<stop; i++) {
        array[i]=i*i;
    }
    return NULL;
}

int main(void) {
    int array[ARRAYSIZE];
    pthread_t thread[NUMTHREADS];
    struct ThreadData data[NUMTHREADS];
    int i;

    /*
    this has the effect of rounding up the number of tasks
    per thread, which is useful in case ARRAYSIZE does not
    divide evenly by NUMTHREADS.
    */
    int tasksPerThread=(ARRAYSIZE+NUMTHREADS-1)/NUMTHREADS;

    /* Divide work for threads, prepare parameters */
    for (i=0; i<NUMTHREADS; i++) {
        data[i].start=i*tasksPerThread;
        data[i].stop=(i+1)*tasksPerThread;
        data[i].array=array;
    }
    /* the last thread must not go past the end of the array */
    data[NUMTHREADS-1].stop=ARRAYSIZE;

    /* Launch Threads */
    for (i=0; i<NUMTHREADS; i++) {
        pthread_create(&thread[i], NULL, squarer, &data[i]);
    }

    /* Wait for Threads to Finish */
    for (i=0; i<NUMTHREADS; i++) {
        pthread_join(thread[i], NULL);
    }

    /* Display Result */
    for (i=0; i<ARRAYSIZE; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");

    return 0;
}
Depending on your compiler, you may or may not have to tell it explicitly to link against the pthread library. If you use mex to make your C code work in MATLAB, no further action is necessary. gcc, on the other hand, requires you to modify the command line as follows:

gcc threaded.c -o threaded -Wall -lpthread

(With modern gcc, the single option -pthread is preferred: it links the library and also sets the compiler flags that threaded code needs.)
This tutorial only covered parallelizing computations that are inherently independent of each other. More care is needed if your threads interact in any way, for example by reading and writing the same data.
Threads only work when memory is shared. Code implemented as in this tutorial will work on multicore or multiprocessor systems, but it cannot take advantage of a cluster, where each node has its own memory. To distribute a calculation across a cluster, you will need a different technique, such as MPI.