Concurrent Processes In Operating Systems The programming technique, to use interrupts to simulate the concurrent execution of several programs on Atlas computers was known as multiprogramming. It was pioneered by Tom Kilburn and David Howarth. Multiprogramming in early days was done using assembly level language. Slightest mistake in programs could make program unpredictable hence testing them was difficult also the assembly level language had no conceptual foundation. Operating systems designed using this multiprogramming techniques grew very huge and unpredictable their designers spoke about software crisis. This created an urgent research and development need for concurrent programming techniques. Computer scientists took the first step towards understanding the issues related to concurrent programming during mid 1960s, they discovered fundamental concepts, expressed them by programming notation, included them in programming languages and used these languages to write the model operating systems. These same concepts were then applied to any form of parallel computing. Introduction of Concurrent processes in operating systems Processes played a key role in shaping early operating systems. They were generally run in a strictly sequential order. Multiprogramming existed but the processes did not exactly run concurrently instead a time based mechanism was used in which a limited amount of time was given to each process. Even in those days the processors speed was fast enough to give and illusion that the multiple processes were running concurrently. They were called as timesharing or multiprogramming operating systems (November 1961, called CTSS Compatible Time-Sharing System also Multics the predecessors of UNIX developed by MIT) These type operating systems were very popular and were seen as a breakthrough during those times. The major drawback was complexity of the system design which made it difficult to make it more versatile and flexible so that a single all purpose OS could be built. Also the resource sharing done by these processes was primitive or inefficient and it only showed there was a lot of room for research and development. Work on these operating systems made way for concurrent processes. Most of the original concepts related to concurrency were developed during this period. These innovative ideas and concepts went on become the basic principles on which todays operating systems and concurrent applications are designed. (A major project undertaken by IBM in this direction was in 1964 the OS/360 for their new mainframes system 360) To build reliable concurrent processes understanding and developing basic concepts for concurrency was important let us talk about concurrency and some of its basic programming concepts. Concurrency In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. [Wikipedia] Let us consider a real life example a housing project such as the building of a house will require some work to go on in parallel with other works. In principle, a project like building a house does not require any concurrent activity, but a desirable feature of such a project is that the whole task can be completed in shorter time by allowing various sub tasks to be carried out concurrently. There is no reason any painter cannot paint the house from outside (weather permitting!), while the plasterer is busy in the upstairs rooms and the joiner is fitting the kitchen units downstairs. There are however some constraints on concurrency which is possible. The brick layer will normally have to wait until the foundation of the house had been layered before he could begin the task of building the walls. The various tasks involved in such a project can usually be regarded as independent of one another, but the scheduling of the tasks is constrained by notions of a task A must be completed b efore task B can begin A second example is that of a railway network. A number of trains making journeys within a railway network, and by contrast with the previous example, when they start and they end is generally independent of most of the other journeys. Where the journeys interact though is at places where routes cross or use common sections of track for parts of journeys. We can in this example regard the movement of trains as programs in execution, and the sections of track as the resources which these programs may or may not have to share with other programs. Hence the two trains run concurrently in case their routes interact sharing the same resources without interrupting each other similar to concurrent processes in operating systems. So as discussed earlier we understand that processes are important to implement concurrency so let us discuss the process as a concept which will introduce us to the most important concept for concurrency i.e. threads! Fundamental concepts Process A process is a running program; OS keeps track of running programs in form of processes and their data. A process is made of multiple threads. Threads The need to write concurrent applications introduced threads. In other words, threads are processes that share a single address space. Each thread has its own program counter and stack. Threads are often called lightweight processes as N threads have 1 page table, 1 address space and 1 PID while N processes have N page tables, N address spaces and N PIDs. Therefore, a sequence of executing instructions is called a thread that runs independently of other threads and yet can share data with other threads directly. A thread is contained inside a process. There can exist multiple threads within a process that share resources like memory, while different processes do not share these resources. A simple thread example There are two classes defined in this example namely SimpleThread which is a subclass of the Thread class and TwoThreads class. class SimpleThread extends Thread { public SimpleThread(String str) { super(str); } public void run() { for (int i = 0; i { System.out.println(i + + getName()); Try { sleep((int)(Math.random() * 1000)); } catch (InterruptedException e) {} } System.out.println(DONE! + getName()); } } The method SimpleThread() is a constructor which sets the Threads name used later in the program. The action takes place in the run() method which contains a for loop that iterates ten times that displays the iteration number and the name of the Thread, then sleeps for a random interval of up to a second. The TwoThreads class provides a main() method that creates two SimpleThread threads named London and NewYork. class TwoThreads { public static void main (String[] args) { new SimpleThread(London).start(); new SimpleThread(NewYork).start(); } } The main() method also starts each thread immediately following its construction by calling the start() method. Following concepts are mostly used at the thread level and also the issues discussed are encountered while implementing concurrency. Race condition A race condition occurs when multiple processes access and manipulate the same data concurrently, and the outcome of the execution depends on the particular order in which the access takes place.[http://www.topbits.com/race-condition.html] It is not so easy to detect race condition during program execution if it is observed that the value of shared variables is unpredictable, it may be caused because of race condition. In concurrent programming there are more than one legal possible thread executions hence order of thread execution cannot be predicted. Race condition may produce uncertain results. Outcome of race condition may occur after a long time. In order to prevent unpredictable results because of race condition, following methods are used- Mutual exclusion Mutual exclusion (often abbreviated to mutex) algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. (Wikipedia) -Critical Region (CR) A part of code that is always executed under mutual exclusion is called a critical region. Due to this, the compiler instead of the programmer is supposed to check that the resource is neither being used nor referred to outside its critical regions. While programming, critical section resides when semaphores are used. CRs are needed only if the data is writeable. It consists of two parts: Variables: These must be accessed under mutual exclusion. New language statement: It identifies a critical section that has access to variables. There are two processes namely A and B that contain critical regions i.e. the code where shared data is readable and writable. -Semaphores Semaphores are mechanisms which protect critical sections and can be used to implement condition synchronization. Semaphore encapsulates the shared variable and using semaphore, only allowed set of operations can be carried out. It can suspend or wake processes. The two operations performed using semaphores are wait and signal, also known as P and V respectively. When a process performs P operation it notifies semaphore that it wants to use the shared resource, if the semaphore is free the process gains access to the shared variable and semaphore is decremented by one else the process is delayed. If V operation is performed, then the process notifies the semaphore that it has finished using shared variable and semaphore value is incremented by one. By using semaphores, we attempt to avoid other multi-programming problem of Starvation. There are two kinds of Semaphores: Binary semaphores: Control access to a single resource, taking the value of 0 (resource is in use) or 1 (resource is available). Counting semaphores: Control access to multiple resources, thus assuming a range of nonnegative values. -Locks The most common way to implement mutex is using locks. A lock can be either locked or unlocked. The concept is analogues to locks we use in our doors; a person enters the room, locks the door and starts working and leaves the room after finishing the job, if another person wants to enter the room when one person is already inside, he has to wait until the door gets unlocked. Subtasks in a parallel program are often called threads. Smaller, lightweight versions of threads are known as fibres, which are used by some parallel computer architecture and bigger versions are called as processes. Many times threads need to change the value of shared variable, instruction interleaving between programs could be in any order For example, consider the following program: Thread A Thread B 1A -Read variable X 1B Read variable X 2A Increment value of X by 1 2B Increment value of X by 1 3A Write back to variable X 3B Write back to variable X As we can see in the example both the threads are carrying out same steps which are to read the shared variable, increment its value and write back its value to the same variable. It is clear how vital it is to execute these instructions in correct order, for instance if instruction 1A is executed between 1B and 3B it will generate an incorrect output. If locks are used by one thread, another thread cannot read, write the shared variable. Following example explains usage of locks: Thread A Thread B 1A Lock variable X 1B Lock variable X 2A Read variable X 2B Read variable X 3A Increment value of X by 1 3B Increment value of X by 1 4A Write back to variable X 4B Write back to variable X 5A Unlock variable X 5B Unlock variable X Whichever thread locks the variable first, uses that variable exclusively, any other thread will not be able to gain access to shared variable until it is unlocked again. Locks are useful for correct execution but on the other hand they slow down the program. -Monitors A monitor is a mutual exclusion enforcing synchronization construct. Monitors provide more structure than conditional critical regions and can be implemented as efficiently as semaphores. Monitors are supported by a programming language rather than by the operating system. They were introduced in Concurrent Pascal and are used as the synchronization mechanism in the Java language. A monitor consists of code and data. All of the data and some of the code can be private to the monitor, accessible only to the code that is part of the monitor. Monitor has a single lock that must be acquired by the task to execute monitor code i.e. mutual exclusion is provided by making sure that execution of procedures in the same monitor are not overlapped. Active task is the term used for the task which owns the monitor lock. There cannot be more than one active task in the monitor. The monitors lock can be acquired by a task through one of several monitor queues. It gives up the lock either by blocking a condition variable or by returning from a monitor method. A condition variable is a queue or event queue that is part of the monitor. Two monitor methods called as wait and notify can only be accessed by a condition variable queue. The behaviour of a monitor is known by the relative priorities and scheduling of various types of queues. The monitor locks are acquired by the processes in the monitor queues. The queues may be combined in some implementations. The tasks compete for the lock when the monitor lock becomes free. Condition Variable: In order to make sure that processes do not enter a busy waiting state, they should notify some events to each other; this facility is provided by Monitors with the help of condition variables. If a monitor function wants to proceed by making a condition true then it has to wait for the corresponding condition variable. When a process waits, it gives up the lock and is taken out from set of runnable processes. When a process makes condition true then it notifies a waiting process using condition variable. The methods mentioned above are used to prevent race condition but they might result into serious problems like deadlock and starvation let us have a look at these problems one at a time as we go further. Deadlock Deadlock refers to a specific condition where two or more processes are each waiting for each other to release a resource, or more than two processes are waiting for resources in a circular chain. Conditions for deadlock to occur 1] Mutual exclusion: Mutual exclusion means only one process can use a resource at a time. 2] Hold and wait: A process may hold a allocated resource while awaiting assignment of other resource. 3] No pre-emption: A resource can be released voluntarily by the process holding it. One process cannot use resource forcefully held by another process. A process that receives such resources cannot be interrupted until it is finished using the resource. 4] Circular wait: A closed chain of processes exists, such that each process holds a resource required by another process in the chain. Deadlock occurs only when circular wait condition is not resolvable and circular wait is not resolvable if first three conditions hold hence all four conditions taken together constitute necessary and sufficient condition for deadlock. In the diagram above we can see that process P1 holds resource R1 and requests for resource R2 held by process P2 , and process P2 is requesting for resource R1. Methods to handle Deadlock 1. Deadlock prevention Deadlock prevention is to ensure that one of the four necessary conditions for deadlock can never hold in following ways: I1. Mutual exclusion: allocate one resource to only one process at a time. 2. Hold and wait: It requires a process to request and be allocated its resources before it begins its execution, or allow process to request a resource only when process has none. This may lead to low resource utilization. It also may give rise to starvation problem, a process may be held for a long time waiting for all its required resources. The application need to be aware of all the resources it requires, if it needs additional resources it releases all the resources held and then requests for all those it needs. 3. No pre-emption: If a process is holding some resources and requests for another resource held by some other process that cannot be allocated to it, then it releases all the resources currently held. The state of pre-empted resource has to be saved and later restored. 4. Circular wait: To make this condition fail, we can impose a total ordering on all resources types. It is also required that each process requests resources in strict increasing order. Resources from the same resource type have to be requested together. 2. Deadlock avoidance In deadlock avoidance, the system checks if granting a request is safe or not . The system needs additional prior information regarding overall potential use of each resource for each process i.e. maximum requirement of each resource has to be stated in advance by each process. 3. Deadlock detection: It is important to know if there exists a deadlock situation in the system hence an algorithm is needed to periodically check existence deadlock. Recovery from deadlock To recover from deadlock, the process can be terminated or we can pre-empt the resource. In terminating processes method we can terminate all the processes at once or terminate one process and then again check for deadlock. Similarly there are mechanisms like fair scheduling that can be used to avoid starvation of resources. -Fair scheduling Fair scheduling is to allow multiple processes to fairly share the resources. The main idea is to ensure each thread gets equal CPU time and to minimize resource starvation. -First in first out (FIFO) FIFO or First Come, First Served (FCFS) is the simplest scheduling algorithm that queues processes in the order they arrive in the ready queue. Scheduling overhead is minimal because context switches occur only when process terminates and re-organization of the process queue is not required. In this scheme, completion of every process is possible, hence no starvation. -Shortest remaining time With this scheduling scheme, processes with least processing time are arranged as the next process in the queue. To achieve this, prior knowledge of completion time is required. Consider a scenario where a shorter process arrives when another process is running, in this case the current process is stopped and is divided into two parts. This results in additional context switching overhead. -Fixed priority pre-emptive scheduling The operating system gives a fixed priority rank to every process, and the processes are arranged in the ready queue based on their priority this results in higher priority processes interrupting lower priority processes. Waiting and response times are inversely proportional to priority of the process. If there are more high priority processes than low priority processes, it may result into starvation of the latter processes. -Round-robin scheduling In this scheduling algorithm, each process is allotted a fixed time unit. There could be extra overhead if time unit per process allotted is very small. Round robin has better average response time than rest of the scheduling algorithms. There cannot be starvation since processes are queued based on any priority. Also there are some desired Properties of Concurrent Programs; these properties will ensure a reliable concurrent program. There are some characteristics that a concurrent program must possess. They can be either a safety or a liveness property. Safety properties assert that nothing bad will ever happen during a program execution. Examples of safety property are: à ¢Ã¢â ¬Ã ¢ Mutual exclusion à ¢Ã¢â ¬Ã ¢ No deadlock à ¢Ã¢â ¬Ã ¢ Partial correctness A safety property is a condition that is true at all points in the execution of a program. Liveness properties assert that something good will eventually happen during a program execution. Examples include: à ¢Ã¢â ¬Ã ¢ Fairness (weak) à ¢Ã¢â ¬Ã ¢ Reliable communication à ¢Ã¢â ¬Ã ¢ Total correctness Communicating sequential process Communicating sequential process was introduced in a paper written by C. A. R. Hoare in 1978. In this paper he described how various sequential processes could run in parallel irrespective of the processor (i.e. it can be a single core or multi-core processor). CSP is an integration of two terms, Communication and Sequential process. A communication is an event that is described by a pair C, V, where C is the name of the channel on which communication takes place and V is the value of the message which passes through this channel by C .A. R. Hoare. In a Sequential Process new process cannot be started until the preceding process has completed. As CSP was more of a programming language so most of the syntax and notations were inherited from ALGOL 60 programming language. Most of the notations were single character instead of English words. For example,? and ! represents input and output respectively. CSP inherits the concept of Co routines over old programming structures such as subroutines. The structure of Co routines is comprised of COPY (copies character from output of one process to the input of second process), SQUASH is used to replace specified character with other characters, DISASSEMBLE, ASSEMBLE and REFORMAT. -OCCAM One of the renowned implementation of CSP is occam. It is named after William of Ockam. It is a strict procedural language. It was developed at INMOS. Occam2 programming language is used in most of the software developing companies across the world. It is an extension of occam1 which lacks multi-dimension arrays, functions and other data type support. Occam2 came into existence in 1987s. The latest version is occam2.1 which was developed in 1994. BYTESIN operator, fixed-length array returned from procedures, named data types etc. were some of the new features of occame2.1. the compiler designed for occam2.1 named KRoC (Kent Retargetable occam Compiler) is used to create machine code from different microprocessors. Occam-pi is the name of the new occam variant which is influenced by pi-calculus. It is implemented by newer versions of KRoC. JCSP Java programming language also implements the concept of CSP by JCSP. JCSP is a complete programming implementation of CSP i.e. it does not contain deep mathematical algebra. JCSP is used to avoid race condition, deadlock, live lock and starvation programmatically via java programs. The main advantage of JCSP is that most of the algebraic part is already developed and stored in libraries so the programmer does not require strong mathematical skills. To invoke a method he needs to import these inbuilt libraries. Concurrency Test Tools Design a concurrent application is very challenging task. Maintaining interaction between concurrently executing threads is very difficult task for programmer. It is very difficult to understand the nature of threads from one run of a program as they are nondeterministic. As result, it becomes very difficult for testing and debugging. So it is good idea to invest in techniques which can avoid this conditions aid in the process of development. We are exploring these ideas with tools for concurrency. CHESS This is one of the important tools, created by Microsoft Research, which is used to test multithreaded code systematically. CHESS facilitates both model checking and dynamic analysis. It has the potential to detect race conditions, livelocks, hangs, deadlocks and data corruption issues. Concurrency errors are detected by investigating thread schedules and interleaving and for this it chooses a specialized scheduler on which it repeatedly runs regular unit test. The specialized scheduler creates specific thread interleaving. CHESS controls state space explosion using iterative context bounding which puts a limitation on number of thread switching. This supports scientifically experimented concept that most of the concurrency bugs can be revealed with less number of thread switches. This concept is far better than traditional model checking. CHESS uses Goldilocks lockset algorithm to detect deadlock and race condition. For reporting a livelock, it is anticipated that programmes terminate and exhibit fairness for all threads. THE INTEL THREAD CHECKER Similar to CHESS, INTEL THREAD CHECKER is used for detecting problems in concurrency like data races and deadlock and it also finds out erroneous synchronization. The thread checker makes use of source code or the compiled binary for making memory references and to monitor WIN32 synchronization primitive. At the time of execution, information given by the compiled binary is used for constructing partial order of execution; this step is followed by happens before analysis of the partial order obtained. For improving efficiency and performance, it is better to remember latest access to shared variable than to remember all accesses. The disadvantage of this tool is it cannot find all bugs while analysing long-running applications. RACERX Unlike first two dynamic analysis tools we have discussed above, RACERX is a static analysis tool. It is not required to comment the entire source code rather user gives table which contains specification of APIs which are useful in gaining and releasing locks. Using such small sized tables proves to be advantageous because they lessen the overhead of annotating entire source code. The working of RACERX is carried out in several phases. In the first phase, RACERX builds a Control Flow Graph once it has iterated through each source code file. CFG consists of information about function calls, use of pointers, shared memory and other data. When building CFG is done, calls to these APIs are marked. This first phase is followed by analysis phase which involves checking race condition and deadlock. The last phase is post processing errors reported, the purpose is to prioritize errors by their significance and harmfulness. CHORD This tool is used for Java language, it is context sensitive static analysis tool. Its flow insensitive nature makes it more scalable than other static tools with the disadvantage of low accuracy. It also deals with the distinguishing synchronization primitives available in Java. ZING ZING, a pure model checker tool, verifies the design of multi threaded programs. It has the ability to model concurrent state machines using its own language that describes complex states and transition. It assures the design quality by verifying assumptions and confirming the presence or absence of some conditions. KISS Microsoft Research developed another model checker tool, named KISS (Keep It Simple and Sequential) for concurrent C programs. It converts a concurrent C program into a sequential program that features the operation of interleaving and controls non-determinism. Thereafter, the analysis is performed by a sequential model checker. While using this tool, the programmer is expected to justify the validation of concurrency assumptions. Introduction of multi-core processors increased the importance on concurrency by many folds. Concurrency and multicore processor Multi core processors The computer industry is undergoing a paradigm shift. Chip manufacturers are shifting development resources away from single-processor chips to a new generation of multi-processor chips known as multicores. Multiple processors are manufactured by placing them on the same die. Hence they share the same circuit. A die is a small block of semiconducting material, on which a given functional circuit is fabricated. A) Single Core B) Multi Core Why were they introduced? As we grow further in terms of processing power the hardware industry faces three main challenges Power Amount of power consumed by processors has been increasing as more and more powerful processors have been introduced to the market. The environment cost and the energy needs have compelled the manufacturer as well as organisations to reconsider their strategies to an extent where change in way the processors are manufactured or operate was inevitable. Processors can be overclocked or underclocked. Overclocking a processor increases the number of instructions it can execute but at the same time increases the power consumption; also overclocking a processor does not guarantee a performance improvement as there are many other factors to consider. Increasing the number of processors per core (quad or eight) will further improve the power to performance ratio. Memory clock Memory clock has not improved like the CPU clock hence adding a limitation on the processor performance. Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. So instead of building faster CPUs underclock it and have more number of cores with their own dedicated memories to have more instructions executed in the same given time. Also the clock speed in itself wont grow infinitely due to fundamental physics it has hit a wall. Chips melt above 5GHz of clock speed. Many possibilities are opened by placing two or more powerful computing cores on a single processor. True concurrent applications can be developed only on multicore processors. On single core processors concurrent applications can overload the processor degrading the performance of the application. On multi-core systems, since each core has its own cache, the operating system has sufficient resources to handle most compute intensive tasks in parallel. What are the effects of the hardware shift on concurrent programming? The free lunch of performance in terms of ever faster processors is over- Microsoft C++ guru Herb Sutter. For past five decades the ever increasing clock speed has carried the software industry through its progress but now the time has come for the software engineers to face the challenge staring directly at them which they have managed to ignore so far. Also as more and more cores are added to hardware the gap between the hardware potential and the s