Optimizing Performance Through Parallelism: A Primer


Table of Contents
Introduction
Problem Description
Serial Implementation
Multi-threaded Implementation
A Distributed Implementation
Performance
Resources

Introduction

In this tutorial we look at an example of how to turn a serial algorithm into one that performs better in symmetric multi-processing (shared-memory) environments as well as in distributed-memory environments. To do so, we will develop a simple application in three stages:

  1. A serial version.

  2. A multi-threaded version.

  3. A distributed multi-threaded version.

In addition to the theoretical aspects of parallel programming, we will discuss some of the practical problems encountered along the way. All of the examples are implemented in C++, using the POSIX threads (pthreads) library for symmetric multi-processing and the MPI library for distributed processing.
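To give a first feel for what pthreads code in C++ looks like, here is a minimal sketch that splits a summation across several threads. It is not the application developed in this tutorial: the names `SumTask`, `sum_slice`, and `parallel_sum` are hypothetical and chosen only for illustration, and the even split of the input into slices is an assumption of the sketch.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical work descriptor for this sketch: one slice of the input per thread.
struct SumTask {
    const int* data;    // start of this thread's slice
    std::size_t count;  // number of elements in the slice
    long long result;   // partial sum, written only by the owning worker
};

// Thread entry point. pthread_create expects a function of type void* (*)(void*).
extern "C" void* sum_slice(void* arg) {
    SumTask* task = static_cast<SumTask*>(arg);
    long long total = 0;
    for (std::size_t i = 0; i < task->count; ++i) {
        total += task->data[i];
    }
    task->result = total;
    return nullptr;
}

// Sums `values` using up to `num_threads` worker threads.
long long parallel_sum(const std::vector<int>& values, std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;

    std::vector<SumTask> tasks(num_threads);
    std::vector<pthread_t> threads(num_threads);
    std::vector<bool> started(num_threads, false);

    const std::size_t chunk = values.size() / num_threads;
    const std::size_t remainder = values.size() % num_threads;

    std::size_t offset = 0;
    for (std::size_t i = 0; i < num_threads; ++i) {
        // The first `remainder` slices take one extra element each.
        const std::size_t count = chunk + (i < remainder ? 1 : 0);
        tasks[i] = SumTask{values.data() + offset, count, 0};
        offset += count;

        if (pthread_create(&threads[i], nullptr, sum_slice, &tasks[i]) == 0) {
            started[i] = true;
        } else {
            // If the thread could not be created, do the work in the caller.
            sum_slice(&tasks[i]);
        }
    }

    // Join every worker before reading its result, then combine the partial sums.
    long long total = 0;
    for (std::size_t i = 0; i < num_threads; ++i) {
        if (started[i]) pthread_join(threads[i], nullptr);
        total += tasks[i].result;
    }
    return total;
}
```

Calling `parallel_sum(values, 4)` on a `std::vector<int>` returns the same total as a serial loop. Each thread writes only to its own `SumTask`, so the sketch needs no mutex; on most systems it is compiled with the `-pthread` flag.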