Friday, November 2, 2007

MidHPC - TODO LIST

1 - DNS entry for Primary Global Broker -> no problem for the first version
2 - Global Broker -> change the configuration file format to XML
3 - Local Broker -> change the configuration file format to XML
4 - Startapp MUST:
4.1 - connect to the Primary Global Broker
4.2 - the Primary Global Broker receives the startapp request and decides which is the best Global Broker to forward it to (based on the average scheduler load: the total load of its schedulers divided by the number of schedulers)
4.3 - the chosen Global Broker sends the request to its idlest Local Broker (a Global Broker can have many Local Brokers under its domain)
4.4 - the chosen Local Broker sends the startapp request to its idlest Scheduler (based on the getloadavg information)
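
The routing in item 4 can be sketched as below. This is only an illustration, not MidHPC code: the class names, the route_startapp function and the sample loads are hypothetical, and ranking a Local Broker by the average load of its schedulers is an assumption (the list does not say how "idlest" is measured for a broker). On a real node the scheduler load could come from getloadavg, for instance os.getloadavg() in Python.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Scheduler:
    name: str
    load: float  # hypothetical: e.g. the 1-minute value from getloadavg


@dataclass
class LocalBroker:
    name: str
    schedulers: List[Scheduler] = field(default_factory=list)


@dataclass
class GlobalBroker:
    name: str
    local_brokers: List[LocalBroker] = field(default_factory=list)


def average_load(schedulers: List[Scheduler]) -> float:
    """Total load divided by the number of schedulers (item 4.2)."""
    if not schedulers:
        return float("inf")  # a broker without schedulers is never preferred
    return sum(s.load for s in schedulers) / len(schedulers)


def global_average(gb: GlobalBroker) -> float:
    """Average load over every scheduler under a Global Broker."""
    return average_load([s for lb in gb.local_brokers for s in lb.schedulers])


def route_startapp(
    global_brokers: List[GlobalBroker],
) -> Tuple[GlobalBroker, LocalBroker, Scheduler]:
    # 4.2: the Primary Global Broker picks the Global Broker with the lowest average
    gb = min(global_brokers, key=global_average)
    # 4.3: assumption -- the "idlest" Local Broker has the lowest scheduler average
    lb = min(gb.local_brokers, key=lambda b: average_load(b.schedulers))
    # 4.4: the Local Broker picks its idlest Scheduler
    sched = min(lb.schedulers, key=lambda s: s.load)
    return gb, lb, sched


# Made-up sample topology and loads
brokers = [
    GlobalBroker("g1", [LocalBroker("l1", [Scheduler("s1", 2.0), Scheduler("s2", 4.0)])]),
    GlobalBroker("g2", [
        LocalBroker("l2", [Scheduler("s3", 1.0), Scheduler("s4", 1.5)]),
        LocalBroker("l3", [Scheduler("s5", 0.5), Scheduler("s6", 3.0)]),
    ]),
]

gb, lb, sched = route_startapp(brokers)
print(gb.name, lb.name, sched.name)
```

With these sample loads the request goes to g2, then l2, then s3, even though s5 is the idlest scheduler overall: each level only sees the averages of the level below it.
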
5 - In the Scheduler:
5.1 - when the Scheduler is started, the user provides a file name parameter; this file (in XML format) contains information about the computer capacity. If the file is filled out with such information, the Scheduler does not need to calculate CPU, memory and disk capacities; otherwise it does
5.2 - when the Scheduler receives the startapp command:
5.2.1 - it must launch a new shell and export LD_PRELOAD pointing to the library which is responsible for translating threads into processes
5.2.2 - inside the new shell, the Scheduler starts the new application using the command "nice -n 19 application", then immediately generates the checkpoints for all its processes and kills them all.
5.2.2.1 - After the migration of these processes, the Scheduler starts each one and gets its PID in order to launch the process monitor, which saves information into the PostgreSQL database
5.2.3 - Scheduler executes the IBL algorithm passing the following parameters: username, executable filename and executable parameters
5.2.4 - A program has to calculate the averages and standard deviations (based on the table process) and store them in the table application. First, the program sums up the Mips consumed by each process (total user time + total system time). If the application being evaluated has 4 processes, the program will have 4 Mips values (for instance 100, 102, 99 and 98), and it calculates the average and standard deviation of these 4 values. For other parameters, such as the number of bytes sent over the network, the program sums up that variable per process, again obtaining 4 values (for instance 201, 200, 199 and 189). Each of these 4 values is divided by the Mips consumption of its process (for instance 201/100, 200/102, 199/99 and 189/98), and the mean and standard deviation are calculated after this division. --> Notice that the means and standard deviations are over the processes of the application.
5.2.4.1 - The Scheduler will generate one configuration file for each parameter to be found, and the IBL will be executed with each of these configuration files. The IBL runs over the table application
5.2.5 - once the IBL has been executed for each configuration file, the Scheduler gets the following information to send to RouteGA: average process Mips (the same for all processes of the application being started); average communication cost (the same for all processes of the application being started); the load of all grid computers (all processes running in the Grid must be listed in the alloc.file -- this means that even the processes of applications other than the one being started must be listed there); process migration cost (based on the checkpoint size); computer capacities in Mips; and the network latency.
5.2.5.1 - The load of other processes is obtained from the table application in the PostgreSQL database (Mips consumption, Migration cost and Communication cost) -> for the first version you can write a program (maybe in Java to make it easier for you) which connects to the database and retrieves this information
5.2.5.2 - periodically (to be configured in the scheduler config file -- a timeout parameter in microseconds), the average program is executed to update the means and standard deviations in the table application (it must run with the lowest priority)
5.2.5.3 - When each process finishes, the average program is executed to calculate the means and standard deviations in the table application
5.2.6 - RouteGA is executed and returns where each process must be migrated to
5.2.7 - The Scheduler tells the other modules to take additional checkpoints (possibly other processes of the grid will migrate) and to start the migrations (of the new application's processes and of any other running process which, according to RouteGA, must be migrated)
5.2.8 - After migrating them, the Scheduler orders the resume of all transferred processes, gets their PIDs and carries on monitoring them (or starts monitoring them if they are part of the new application)
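
The calculation described in item 5.2.4 can be sketched with the example values given there. This is a minimal illustration: the function names are hypothetical, the values are taken from a list instead of the table process, and the population standard deviation (statistics.pstdev) is an assumption, since the item does not say whether the population or the sample formula is intended.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def summarize(values: List[float]) -> Tuple[float, float]:
    """Mean and (population) standard deviation over an application's processes."""
    return mean(values), pstdev(values)


def summarize_per_mips(totals: List[float], mips: List[float]) -> Tuple[float, float]:
    """Divide each process total by that process's Mips, then summarize the ratios."""
    ratios = [t / m for t, m in zip(totals, mips)]
    return summarize(ratios)


# Example values from item 5.2.4: one entry per process of the application
process_mips = [100, 102, 99, 98]
network_bytes = [201, 200, 199, 189]

mips_mean, mips_std = summarize(process_mips)
net_mean, net_std = summarize_per_mips(network_bytes, process_mips)

print("Mips:", mips_mean, mips_std)
print("Network bytes per Mips:", net_mean, net_std)
```

For the four Mips values the mean is 99.75. Swapping pstdev for statistics.stdev would give the sample standard deviation instead.
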
6 - Shell applications
6.1 - Startapp
6.2 - Kill
6.3 - Top
6.4 - Ps
7 - The first Basic Tests
7.1 - What happens when the Global or Local Broker fails?
7.1.1 - what if the Primary Global Broker fails?
7.2 - What happens when the Scheduler fails?
7.3 - Tests with multi-process applications using the DSM system
7.4 - Tests with multithreaded applications using the DSM system
8 - In a future version, the Scheduler can be called to optimize the loads (in this first version we only run RouteGA when a new application is started).

9 Comments:

At November 2, 2007 at 5:26 PM , Blogger Rodrigo Mello said...

Augusto is responsible for the following items, due next Friday (09/11):

- 2
- 3
- 4
- 5.1
- 5.2.1
- 5.2.2
- 5.2.4

They have to be completely finished.

 
At November 2, 2007 at 6:49 PM , Blogger Augusto Andrade said...

Two more things to do:

- Change all the file extensions
(from cpp to cc);

- Change the filename "sched.h" in the scheduler module.

 
At November 2, 2007 at 9:05 PM , Blogger Rodrigo Mello said...

The file sched.h cannot have this name because Linux confuses it with the sched.h file from the kernel

 
At November 3, 2007 at 10:36 AM , Blogger Rodrigo Mello said...

As I told you over MSN, I have already solved items 2 and 3 for you... you just have to integrate my code into yours.

 
At November 3, 2007 at 10:56 AM , Blogger Rodrigo Mello said...

With the XML problem solved, I expect to have items 2, 3 and 4 by Sunday night (Nov 4th 2007).

 
At November 3, 2007 at 1:09 PM , Blogger Augusto Andrade said...

items 2 and 3: Done!
Starting item 4 in 45min (lunch break)!

 
At November 3, 2007 at 2:42 PM , Blogger Rodrigo Mello said...

Augusto, please use the same XML approach for the startapp configuration file.

 
At November 3, 2007 at 8:41 PM , Blogger Augusto Andrade said...

Startapp now uses an XML config file.
Item 4 OK!

 
At November 4, 2007 at 12:03 AM , Blogger Rodrigo Mello said...

Dear Augusto,

Good news... I solved items:

5.2.1 and 5.2.2

I will send them to you tomorrow.

Work on the other items.

 
