Friday, December 7, 2012

Multithreading - Part 2: Synchronization


Consider the program which we mentioned in "Multithreading - Part 1: Basics"


Program:

#include <stdio.h>
#include <pthread.h>


int gvar;

void* t1_fun(void *msg)
{
    int i;
    for(i = 0; i < 10; i++)
    {   
        gvar++;
        /*printf("[%d:%s] : gvar = %d.\n", pthread_self(), (char*) msg, gvar);*/
        printf("[%d:%s] : gvar = %d.\n", (long int)syscall(224), (char*) msg, gvar);
    }   
}

int main()
{
    pthread_t t1, t2; 
    char *msg1 = "thread1";
    char *msg2 = "thread2";

    int rc1 = pthread_create(&t1, NULL, t1_fun, (void *) msg1);
    if( rc1 ) { printf("Error for thread 1.\n"); perror("pthread_create"); }
    int rc2 = pthread_create(&t2, NULL, t1_fun, (void *) msg2);
    if( rc2 ) { printf("Error for thread 1.\n"); perror("pthread_create"); }

    /* If the 2nd arg is NOT NULL, the return value of t1 and t2 will be stored there.
     * This pthread_join will suspend the main thread until the t1/t2 completes, terminates or by calling
     * pthread_exit. 
     */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);    
    printf("Threads execution over.\n");
    return 0;
}

NOTE that if there is no load in the system, this program might print the value of "gvar" as 20 as expected (each thread increments gvar 10 times; hence 20).
But, if you increase the load in the system or if you increase the count in the for loop (replace the count 10 with 50), you can see the threads will fight each other to increment the global variable "gvar". At the  end of the program, if you see, the "gvar" will not be "100" as expected; but rather less (see below).


[5164:thread1] : gvar = 95.
[5164:thread1] : gvar = 96.
...
...
[5165:thread2] : gvar = 95.
[5165:thread2] : gvar = 97.


This kind of situation is called "race condition" where two or more threads will race with each other to change the state of a variable without bothering about the other one.

To avoid this, we have to use mutex.

Mutex (ONLY BETWEEN THREADS IN A SINGLE PROCESS; NOT ACROSS PROCESSES LIKE SEMAPHORE):
Mutexes are used to prevent data inconsistencies due to operations by multiple threads upon the same memory area performed at the same time or to prevent race conditions where an order of operation upon the memory is expected. A contention or race condition often occurs when two or more threads need to perform operations on the same memory area, but the results of computations depends on the order in which these operations are performed. Mutexes are used for serializing shared resources such as memory. Anytime a global resource is accessed by more than one thread the resource should have a Mutex associated with it. Once can apply a mutex to protect a segment of memomry ("critical region") from other threads. Mutexes can be applied only to threads in a single process and do not work between processes as do semaphores.

Modified Program with Mutex:


#include <stdio.h>
#include <pthread.h>


/* Note scope of variable and mutex are the same */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

int gvar;
void* t1_fun(void *msg)
{
    int i;
    for(i = 0; i < 100; i++)
    {   
        pthread_mutex_lock( &mutex );
        gvar++;
        /*printf("[%d:%s] : gvar = %d.\n", pthread_self(), (char*) msg, gvar);*/
        printf("[%d:%s] : gvar = %d.\n", (long int)syscall(224), (char*) msg, gvar);
        pthread_mutex_unlock( &mutex );
    }   
}
...
...
Program Output:
$> gcc sample.c -lpthread
$> ls
a.out sample.c

$> ./a.out

[5164:thread1] : gvar = 99.
[5164:thread1] : gvar = 100.
...
...
[5165:thread2] : gvar = 199.
[5165:thread2] : gvar = 200.
Now, no matter how much ever you try executing this program, "gvar" at the end of the program will be always 200 (if you specify 100 as loop count in for loop).
NOTE THAT THE SCOPE OF THE VARIABLE AND THE MUTEX VARIABLE IS SAME. MEANS, GLOBAL VARIABLE PROTECTION SHOULD USE GLOBAL MUTEX VARIABLE!!

Multithreading - Part 1: Basics


Threads: Light-weight Processes (LWP); mostly useful if created in user-space.

Program:

#include <stdio.h>
#include <pthread.h>

int gvar;
void* t1_fun(void *msg)
{
    int i;
    for(i = 0; i < 10; i++)
    {   
        gvar++;
        /*printf("[%d:%s] : gvar = %d.\n", pthread_self(), (char*) msg, gvar);*/
        printf("[%d:%s] : gvar = %d.\n", (long int)syscall(224), (char*) msg, gvar);
    }   
}

int main()
{
    pthread_t t1, t2; 
    char *msg1 = "thread1";
    char *msg2 = "thread2";

    int rc1 = pthread_create(&t1, NULL, t1_fun, (void *) msg1);
    if( rc1 ) { printf("Error for thread 1.\n"); perror("pthread_create"); }
    int rc2 = pthread_create(&t2, NULL, t1_fun, (void *) msg2);
    if( rc2 ) { printf("Error for thread 1.\n"); perror("pthread_create"); }
    
    printf("Threads execution over.\n");
    return 0;
}

Points to Note:
1. The function prototype of the thread should be
 void * fun( void * args);
2. Single arguments (like above) can be directly passed after casting to "void *"; where as
multiple arguments can be passed in the form of structures.
3. To print the LWP (thread) ID, I am using the method
(long int) syscall (224) ==> This is the system call for "gettid" system call.
And note, pthread_self() returns "pthread_t"; and not the "thread id".
4. And for both the threads, I can give the same function (t1_fun).
5. Format of pthread_create: pthread_create(&t1, <thread_attr>, <function>, <function_args>)

Program Output:
$> gcc sample.c -lpthread
$> ls
a.out sample.c

$> ./a.out
Threads execution over.
[5164:thread1] : gvar = 1.
[5164:thread1] : gvar = 2.

Why such output?:
The reason is, we have "just triggered the threads and did not bother about them after that".
To make sure, we get all the printfs of both the threads, the main thread should "wait" for those two threads to "join" it.
Hence, if you provide "pthread_join" call for both the threads, you main thread will not exit UNTIL both the threads completes its execution.

...
...

    int rc2 = pthread_create(&t2, NULL, t1_fun, (void *) msg2);
    if( rc2 ) { printf("Error for thread 1.\n"); perror("pthread_create"); }
    
    /* If the 2nd arg is NOT NULL, the return value of t1 and t2 will be stored there.
     * This pthread_join will suspend the main thread until the t1/t2 completes, terminates or by calling
     * pthread_exit. 
     */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Threads execution over.\n");
    return 0;
}


Program Output:
$> gcc sample.c -lpthread
$> ls
a.out sample.c

$> ./a.out

[5164:thread1] : gvar = 1.
[5164:thread1] : gvar = 2.

[5164:thread1] : gvar = 3.
[5164:thread1] : gvar = 4.

[5164:thread1] : gvar = 5.
[5164:thread1] : gvar = 6.

...
...

[5165:thread2] : gvar = 19.
[5165:thread2] : gvar = 20.

Tuesday, December 4, 2012

Intel VS PPC Architecture















PPC Uses Load/Store architecture.
Means, it uses only the load/store instructions to access memory. For the operations, the operands are stored in the registers itself.
Eg. the operands of AND operation are stored in the registers itself.

Intel x86 uses "Register memory architecture" where in one of the operands of AND operation may be in memory and the other one in register.

RISC Vs CISC

Intel : "Hardware to bear more responsibility; software life should be easy"
Apple: "Software to bear major role; hardware life should be easy"

This derived to two widely deployed CPU Architectures

1. Intel formed CISC (Complex Instruction Set Computing)
2. Apple formed RISC (Reduced Instruction Set Computing)
[NOTE that the word "reduced" does not mean the "minimal number of instructions". It means easy and light-weight instructions that takes "minimal" or "reduced" time to execute]

Pros and cons of these two follows:

CPU Performance of a program is calculated as :-

                             seconds        Instructions            cycles           seconds     
  CPU Speed = ------------  =  ----------------  X  ---------------- X -----------
                            program           program         Instruction          cycle

                                              = A X B X C

CISC: Attempts to minimize the number of instructions per program; sacrificing number of cycles per instruction (CISC tries to reduce A at the cost of increased B)

RISC: Attempts to minimize the number of cycles per instruction; sacrificing number of instructions per program (RISC tries to reduce B at the cost of increased A)


Reference: http://www.engineersgarage.com/articles/risc-and-cisc-architecture

Important Kernel Structures and its sizes

NOTE: The following are found in 32-bit Ubuntu system (Kernel version:  3.2.0-29-generic-pae)

1. Process kernel stack = 8KB in 32 bit; 16KB in 64 bit
Basically, 2 pages as process kernel stack

2. task_struct - structure for each process
sizeof(task_struct) = 3236 bytes = 3.1 KB

3. thread_struct - structure of thread info in each task_struct
sizeof(thread_struct) = 128 bytes

/proc File system usages

NOTE: The following are found in 32-bit Ubuntu system (Kernel version:  3.2.0-29-generic-pae)

1. [PID_MAX]:  default - 32768 (short int)
"/proc/sys/kernel/pid_max"

2. [Loadable Modules]
"/proc/modules" Used by "lsmod"

3. [printk levels]: default 7
echo "/proc/sys/kernel/printk"