Thursday, November 29, 2007

Progress

Item 5.2.6 is finished. I modified the routeGA to check whether it is necessary to change the allocation (initially, the new allocation needs to have a fitness at least 1% higher than the original).
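A minimal sketch of that check, in Python for brevity. It assumes that a higher fitness is better and that fitness values are positive; the function and parameter names are hypothetical and are not taken from the routeGA code.

```python
def should_reallocate(current_fitness, candidate_fitness, min_gain=0.01):
    """Return True when the candidate allocation is worth adopting.

    Assumption: higher fitness is better and both values are positive.
    min_gain is the relative improvement required (0.01 means 1%).
    """
    return candidate_fitness >= current_fitness * (1.0 + min_gain)


if __name__ == "__main__":
    # A 2% improvement passes the 1% threshold, a 0.5% improvement does not.
    print(should_reallocate(100.0, 102.0))
    print(should_reallocate(100.0, 100.5))
```

Keeping the threshold as a parameter makes it easy to tune later, since the 1% value is only a starting point.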

After some initial problems (with compilation and tests), I started the integration. For this part, I'm using three programs: a multiplication table (lots of processing, with the help of a modified delay), a matrix multiplication (reads from and writes to files) and the producer/consumer (with mutex).

When these three applications are running and generating checkpoints, I'll start the modifications to add the information to the database.

Monday, November 26, 2007

Next steps

According to the TODO from November 2, 2007 (and its addition on November 5, 2007), these are the things that need to be done:

5.2: Monitors and Thread Splitter. These were done by Rodrigo Mello; I'm studying them in order to perform the integration.
5.2.5.3: It only starts the average program when the application finishes (needs the application running).
5.2.6: Run the scheduling algorithm (currently routeGA). I'm finishing the conversion that makes the routeGA read from an XML file and write the output to another XML file.
5.2.7 and 5.2.8: Deal with the migration; they depend on 5.2 and 5.2.6.
6.2 and 6.3: Will be implemented after the applications (and monitors) are running. May depend on information that will be generated by the application.
7: Issues

The startapp (shell application, 6.1) and the average program (5.2.4) are implemented in C++. Since they do not present errors, they will be converted to Java later.

Friday, November 23, 2007

Tasks (in order)

1) Finalize the SchedInfo.xml generation
2) Finalize the alloc.file conversion (from SchedInfo.xml)
3) Implement the Kill
4) Implement the PS
5) Study the RouteGA

The alloc.file from RouteGA

After spending a day and a half working on the conversion of the SchedInfo.xml to the alloc.file, I decided to create the alloc.file directly (it is easier and faster).

However, filling in the alloc.file raises some issues that I'm working on:
1) The values of migration cost (mcost) and communication (comm): I have no idea how to compute them;
2) The computer on which a process is allocated (alloc) is based on the comp_id value in the table Process, which refers to the comp_key value in the table Computer. So far, no problems. However, the alloc.file doesn't have a place to store this value. For example: there are 24 computers alive in the environment and a process is running on the computer with comp_key 45. With this alloc.file, the routeGA will not be able to access this computer's information (cmips and rtt) and will produce an OutOfBounds error (the cmips vector will have an upper bound of 23, since there are 24 computers).
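One possible way around the second issue is to translate comp_key values into dense positions (0 to n-1) before writing the alloc.file, and to translate back when reading the routeGA output. The sketch below is in Python and only illustrates the idea; the function names are hypothetical and this is not how the Scheduler or the routeGA currently work.

```python
def build_index(comp_keys):
    """Map each live comp_key to a position in 0..n-1 (sorted for stable output)."""
    return {key: position for position, key in enumerate(sorted(comp_keys))}


def to_dense_alloc(process_comp_ids, index):
    """Translate the comp_id of each process into a position usable as a vector index."""
    return [index[comp_id] for comp_id in process_comp_ids]


def to_comp_keys(dense_alloc, index):
    """Translate positions (e.g. from the routeGA output) back to comp_key values."""
    reverse = {position: key for key, position in index.items()}
    return [reverse[position] for position in dense_alloc]


if __name__ == "__main__":
    index = build_index([45, 3, 12])          # three live computers
    dense = to_dense_alloc([45, 3], index)    # two processes
    print(dense, to_comp_keys(dense, index))
```

With this mapping, the computer with comp_key 45 gets a position inside the bounds of the cmips vector, no matter how large its key is.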

As soon as the alloc.file is generated, the routeGA can be executed, and with the information from the routeGA the migration can be carried out.
The monitors need to be executed to fill the process database, and the average program needs to be executed as well.

Wednesday, November 21, 2007

Updates

IBL is tested and functional; its data is being stored in the database.
The SchedInfo.xml is generated.
In the database, the RTT table is updated periodically. In the table Computer, the "comp_alive" is updated when the Scheduler stops running.

Current Task:
Converting the SchedInfo.xml into alloc.file (needed by RouteGA)

Monday, November 19, 2007

Updates

Major functions of the Scheduler are implemented. They are:
1) Benchmark the computer in order to obtain its characteristics;
2) Inclusion in the LBroker and in the database;
3) Send load updates to the LBroker (load is calculated with the Linux loadavg function).

The integration with IBL is almost ready. IBL configuration files are being generated with all the information needed (including the Train_Start and the Train_End) and the IBL program is executed. After that, the information generated by the IBL is stored in the database. However, it is not 100% tested.

Next steps are:
- Fully test the IBL.
- Create the SchedInfo XML file.
- Convert the SchedInfo file to the alloc.file (used by the RouteGA)

Sunday, November 18, 2007

Converting to Java

Due to the problems encountered while programming in C++ (lack of experience) the MidHPC is being reimplemented in Java.

This involves three major tasks:
1) Global Broker
2) Local Broker
3) Scheduler
3.1) IBL
3.2) RouteGA

These tasks are described in another post.
Tasks 1 (Global Broker) and 2 (Local Broker) are finished and task 3 (Scheduler) is being finalized along with the integration with the IBL (3.1) and RouteGA (3.2).

Things that need to be done are:
1) Convert the XML file generated by the Scheduler to the "alloc.file" needed by the RouteGA;
2) Reimplement the program that calculates the averages (it is working in C++);
3) Execute applications and start the monitors to gather information about them.

Saturday, November 10, 2007

New ToDo

All functions implemented so far must be tested. Also, some issues must be corrected before continuing. These are some that I've spotted so far:

1) All Modules
1.1) Parse the XML configuration file inside a function (this is needed due to the fact that the libxml2 functions have undesired effects in the message buffers);
1.2) Strings of fixed size (most of them of size BUFSIZE, 200);
1.3) Clean up the code (remove some useless functions);
2) Global Broker
2.1) Conflicting messages regarding the load update (from the Local Broker and Global Brokers);
3) Local Broker
3.1) Some hard-coded info (usually regarding the "username");
3.2) Function to check which Schedulers are alive;
4) Scheduler
4.1) Constructor has too many parameters (when it could have only one);
8) Database
8.1) Add the Global Broker information to the table Computer; this is used to compute the distance between two schedulers;

After this, add the documentation using doxygen (for the C++ parts) and javadoc. Add a Makefile, Readme and Changelog, and organize the directory structure.

Friday, November 9, 2007

IBL 1.0.0

Dear all,

IBL is finally fixed and everything is running ;)
Now you just have to follow the README file to test it.

Thursday, November 8, 2007

A problem in the Euclidean Distance

There is a small problem in the Euclidean Distance. This distance is used in the IBL to compute the distance between the points in the Knowledge base and the Query Point.
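For reference, this is what the distance computes, sketched in Python. The post does not say what the problem was, so this is only the textbook definition and not the IBL code; the function names are hypothetical.

```python
import math


def euclidean_distance(point_a, point_b):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(point_a) != len(point_b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))


def nearest(knowledge_base, query_point):
    """Return the knowledge-base point closest to the query point."""
    return min(knowledge_base, key=lambda point: euclidean_distance(point, query_point))


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))
    print(nearest([[0, 0], [10, 10]], [1, 1]))
```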

Correction - 5.2.5

The information you provide to the RouteGA is wrong in the post "Detailing 5.2.5".

Calculated by the program (means and stdevs)

mips_avg float,
bytes_in_avg float,
bytes_out_avg float,
blocks_avg float,
pagefaults_avg float,
pagereclaims_avg float,
mips_stdv float,
bytes_in_stdv float,
bytes_out_stdv float,
blocks_stdv float,
pagefaults_stdv float,
pagereclaims_stdv float,

IBL fields

mips_avg_ibl float,
mips_stdv_ibl float,
bytes_in_avg_ibl float,
bytes_in_stdv_ibl float,
bytes_out_avg_ibl float,
bytes_out_stdv_ibl float,
blocks_avg_ibl float,
blocks_stdv_ibl float,
pagefaults_avg_ibl float,
pagefaults_stdv_ibl float,
pagereclaims_avg_ibl float,
pagereclaims_stdv_ibl float,

For the new application being started, you must get the IBL values and put them into the XML file for RouteGA. For all the other applications running in the Grid, you have to compute the field spare_mips = (mips_avg_ibl - mips_avg) and write those spare_mips values in the XML file (for RouteGA). This is only necessary for the mips and not for any other field (it is not necessary for mips_stdv either).
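The rule above can be sketched as follows (Python, for illustration only). The dictionary keys reuse the column names listed in this post; the function names and the is_new flag are hypothetical.

```python
def spare_mips(mips_avg_ibl, mips_avg):
    """spare_mips = mips_avg_ibl - mips_avg, as defined in the post."""
    return mips_avg_ibl - mips_avg


def mips_for_routega(application, is_new):
    """Pick the mips value to write in the XML file for RouteGA.

    New application: the IBL value itself.
    Application already running: the spare mips.
    """
    if is_new:
        return application["mips_avg_ibl"]
    return spare_mips(application["mips_avg_ibl"], application["mips_avg"])


if __name__ == "__main__":
    app = {"mips_avg_ibl": 120.0, "mips_avg": 100.0}
    print(mips_for_routega(app, is_new=True))
    print(mips_for_routega(app, is_new=False))
```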

Small trick in the use of atof

To use the atof function correctly, <stdlib.h> MUST be included before <stdio.h>.

Don't ask me why this is like this...

Benchmark results

Just before lunch I started a benchmark on my notebook (Athlon XP 2600+, 512 MB RAM and 512 MB swap, 40 GB HD at 4200 RPM). These are the results that the scheduler found:

Mips: 2476.70
Main Memory: 465.73 MB
Swap Memory: 486.30 MB
Hd Read: 50.49 MB/s
Hd Write: 40.25 MB/s
Main Memory Performance: 0.0022 x - 0.0266
Swap Memory Performance: 0.000543185742572 x² - 0.422618210315704 x + 83.193984985351562

Additional info: 313 MB of main memory free and 169 MB of swap free.
These were obtained with the following parameters:
Memory Usage: 90%
Increment Step: 5%
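The two memory regressions above are a first-degree and a second-degree polynomial. A small Python sketch for evaluating them follows; the coefficients are the ones reported above, but the post does not state what x or the result measure, so the function names are hypothetical and no units are implied.

```python
def main_memory_performance(x, a=0.0022, b=-0.0266):
    """Linear model reported by the benchmark: a*x + b."""
    return a * x + b


def swap_memory_performance(
    x,
    a=0.000543185742572,
    b=-0.422618210315704,
    c=83.193984985351562,
):
    """Quadratic model reported by the benchmark: a*x^2 + b*x + c."""
    return a * x * x + b * x + c


if __name__ == "__main__":
    for x in (0, 100, 400):
        print(x, main_memory_performance(x), swap_memory_performance(x))
```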

It only took almost an hour to complete the tests... :)

You can check the memory regressions below:

Wednesday, November 7, 2007

Completed Tasks

This is the current status.
Item 2: Ok
Item 3: Ok
Item 4: Ok
Item 9: Ok
Item 5.1: Fixing bugs from the use of Memo (will be finished by 12h00, GMT-2)
Item 5.2.3: Ok
Item 5.2.4: Ok (may need correction due to modifications in 5.1, if so, will be finished by 12h30, GMT-2)
Item 5.2.5: Starting November 7th, afternoon

Detailing item 5.2.5

1 - Scheduler passes the username, program file (including path) and program parameters to IBL and executes IBL for each one of the means and standard deviations
2 - Scheduler grabs all the information from IBL executions and stores it in an XML file using the following format: <table-column-name></table-column-name> (for instance <bytes_in_avg>22.2</bytes_in_avg>) DON'T FORGET TO INCLUDE MEANS AND STANDARD DEVIATIONS
3 - Scheduler gets means and standard deviations from the table application for each process running in the environment (you can add a column in the table application with the application status: Running or Finished) and makes a select in the table process to count how many processes compose each running application. The Scheduler also gets where (computer_id from table computer) each process is running.
4 - Scheduler carries on generating the XML file using the information obtained from the tables application and process. The scheduler will generate something like (the same for the new application presented in the item 2 in this post):

<application gid="1">
<number_of_processes>4</number_of_processes>
<scheduled>
<process>0</process> <!-- notice the id inside this XML tag is the computer_id -->
<process>1</process> <!-- which defines where a certain process is running -->
<process>8</process>
<process>15</process>
</scheduled>
<bytes_in_avg>22.2</bytes_in_avg>
...
</application>

PS: The new application that has just started will also have the tag <scheduled> with <process> tags inside it, although its <process> tags will all hold the computer_id of the Scheduler which started the application.

5 - After generating all the means and standard deviations in the XML file (for each application), the Scheduler has to get the capacity of each computer in the environment. The table computer could have all the information about MIPS capacity, HD read-write throughput and memory latency coefficients (2 for the main memory usage and 3 for the swap memory usage).
Getting such information from the table computer, the Scheduler must fill the XML file in the following way:

<computers>
<computer>
<mips>1000</mips>
<hd_read_throughput>10.2</hd_read_throughput>
<hd_write_throughput>8.7</hd_write_throughput>
<main_memory_coeficient_a>10.2</main_memory_coeficient_a>
<main_memory_coeficient_b>10.2</main_memory_coeficient_b>
<swap_memory_coeficient_a>10.2</swap_memory_coeficient_a>
<swap_memory_coeficient_b>10.2</swap_memory_coeficient_b>
<swap_memory_coeficient_c>10.2</swap_memory_coeficient_c>
</computer>
<!-- ... other computers in here -->
</computers>

6 - Now you make a simple program to convert the XML to the alloc.file format used by RouteGA
7 - Scheduler launches the conversion program and, then, calls RouteGA
8 - The RouteGA output is used by the Scheduler
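Steps 2 to 5 above can be sketched with Python's standard xml.etree.ElementTree module. The tag names for applications and computers follow the examples in this post; the root tag sched_info and all function names are assumptions made for illustration, and the input values would really come from the IBL runs and from the database.

```python
import xml.etree.ElementTree as ET


def application_element(gid, computer_ids, stats):
    """Build one <application> element.

    computer_ids: the computer_id where each process runs (one entry per process).
    stats: mapping of table column name to value (means and standard deviations).
    """
    app = ET.Element("application", {"gid": str(gid)})
    ET.SubElement(app, "number_of_processes").text = str(len(computer_ids))
    scheduled = ET.SubElement(app, "scheduled")
    for computer_id in computer_ids:
        ET.SubElement(scheduled, "process").text = str(computer_id)
    for column, value in stats.items():
        ET.SubElement(app, column).text = str(value)
    return app


def computer_element(capacity):
    """Build one <computer> element from a mapping of tag name to value."""
    computer = ET.Element("computer")
    for tag, value in capacity.items():
        ET.SubElement(computer, tag).text = str(value)
    return computer


def build_sched_info(applications, computers):
    """Join everything under a root element (the root tag name is hypothetical)."""
    root = ET.Element("sched_info")
    for app in applications:
        root.append(app)
    computers_node = ET.SubElement(root, "computers")
    for computer in computers:
        computers_node.append(computer)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    app = application_element(1, [0, 1, 8, 15], {"bytes_in_avg": 22.2})
    comp = computer_element({"mips": 1000, "hd_read_throughput": 10.2})
    print(build_sched_info([app], [comp]))
```

Building the tree with a library instead of concatenating strings keeps the file well-formed even when a value contains characters such as < or &.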

Monday, November 5, 2007

New Task - Network latency

There is a new task:

9- Measure Network latency.
Mandatory: Global Brokers should exchange messages in order to calculate the network latency between them. This may be done by using a UDP request or by ping:
'ping -c $1 141.109.37.173 | tail -1 | cut -d = -f 2 | tr -d ms | tr -d " " | cut -d \/ -f 2'
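The same extraction can be sketched in Python. It assumes the Linux (iputils) ping, whose last output line looks like "rtt min/avg/max/mdev = 0.045/0.052/0.060/0.007 ms"; other ping implementations format the summary differently. The function names are hypothetical.

```python
import subprocess


def parse_avg_rtt(summary_line):
    """Extract the average RTT (in ms) from the ping summary line.

    Mirrors the shell pipeline: take what follows '=', drop the unit,
    and pick the second '/'-separated field (the average).
    """
    values = summary_line.split("=", 1)[1]
    values = values.replace("ms", "").strip()
    return float(values.split("/")[1])


def measure_latency(host, count=3):
    """Run ping and return the average RTT in ms (requires ping on the PATH)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=True,
    )
    last_line = result.stdout.strip().splitlines()[-1]
    return parse_avg_rtt(last_line)


if __name__ == "__main__":
    print(parse_avg_rtt("rtt min/avg/max/mdev = 0.045/0.052/0.060/0.007 ms"))
```

Parsing is kept separate from running the command, so the parsing can be tested without network access.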

Wish list:
Add bandwidth measurements between the Local Broker and the Global Broker and between the Local Broker and the Scheduler.
This way, it is possible to extract measurements that represent all computers in the Grid environment.

Friday, November 2, 2007

MidHPC - TODO LIST

1 - DNS entry for Primary Global Broker -> no problem for the first version
2 - Global Broker -> change the configuration file format to xml
3 - Local Broker -> change the configuration file format to xml
4 - Startapp MUST:
4.1 - connect to the Primary Global Broker
4.2 - Primary Global Broker receives the startapp request and evaluates which is the best Global Broker to send it (based on the total load of their schedulers divided by the number of schedulers -- consequently this is an average)
4.3 - the chosen Global Broker sends the request to its idlest Local Broker (a Global Broker can have many Local Brokers under its domain)
4.4 - the chosen Local Broker sends the startapp request to the idlest Scheduler (based on the getloadavg information)
5 - In the Scheduler:
5.1 - when the Scheduler is started the user provides a file name parameter which contains information about the computer capacity - if this file is filled out with such information, the scheduler does not need to calculate CPU, memory and disk capacities, otherwise it does (this file is in XML format)
5.2 - when the Scheduler receives the startapp command it must:
5.2.1 - it must launch a new shell and export LD_PRELOAD pointing to the library which is responsible for translating threads into processes
5.2.2 - inside the new shell, the Scheduler starts the new application using the command "nice -n 19 application", then immediately generates the checkpoints for all processes and kills them all.
5.2.2.1 - After the migration of such processes, the scheduler starts each one and gets the PID to launch the process monitor which saves information into the PostgreSQL database
5.2.3 - Scheduler executes the IBL algorithm passing the following parameters: username, executable filename and executable parameters
5.2.4 - A program has to calculate the averages and standard deviations (based on the table process) and store them in the table application. First of all, the program sums up all the mips consumed by each process (total user time + total system time). Consider that the application being evaluated has 4 processes; then the program will have 4 Mips values (for instance 100, 102, 99 and 98). The program calculates the average and standard deviation using these 4 values. For other parameters, such as the number of bytes sent over the network, the program will sum up that variable and, in this case, it will have 4 values (for instance 201, 200, 199, 189). Each of these 4 values is divided by the Mips consumption of its process (for instance 201/100, 200/102, 199/99 and 189/98); after this division, the mean and the standard deviation are calculated. --> Notice the means and standard deviations are for the processes of the application.
5.2.4.1 - The scheduler will generate configuration files for each parameter to be found. And the IBL will be executed using each one of the configuration files. The IBL runs over the table application
5.2.5 - the IBL is executed for each configuration file, and the Scheduler gets the following information to send to RouteGA: average process mips (the same for all processes of the application being started), average communication cost (the same for all processes of the application being started), the load of all grid computers (all processes running in the Grid must be listed in the alloc.file -- this means that even the processes of other applications, different from the one being started, must be listed here), process migration cost (based on the checkpoint size), computer capacities in Mips and the network latency.
5.2.5.1 - The load of other processes is obtained from the PostgreSQL database from the table application (Mips consumption, Migration cost and Communication cost) -> for the first version you can make a program (maybe in Java to make it easier for you) which connects to the database and helps to obtain such information
5.2.5.2 - periodically (to be configured in the scheduler config file -- a timeout parameter in microseconds), the average program is executed to update the means and standard deviations in the table application (it must run with the lowest priority)
5.2.5.3 - When each process finishes, the average program is executed to calculate the means and standard deviations in the table application
5.2.6 - RouteGA is executed and returns where each process must be migrated to
5.2.7 - Scheduler communicates to other modules to order additional checkpoints to be made (possibly other processes of the grid will migrate) and also start the migrations (of the new application processes and any other process which is running and according to RouteGA must be migrated)
5.2.8 - After migrating them, the Scheduler orders the resume of all transferred processes, gets their PIDs and carries on monitoring them (or starts monitoring them if they are part of the new application)
6 - Shell applications
6.1 - Startapp
6.2 - Kill
6.3 - Top
6.4 - Ps
7 - The first Basic Tests
7.1 - What happens when the Global or Local Broker fails?
7.1.1 - what if the Primary Global Broker fails?
7.2 - What happens when the Scheduler fails?
7.3 - Test with multiple process applications using the DSM system
7.4 - Tests with multithreaded applications using the DSM system
8 - For a future version, the scheduler can be called to optimize the loads (in this first version we only run the RouteGA when a new application is started).
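The calculation described in item 5.2.4 can be sketched in Python with the same example numbers. The post does not say whether the sample or the population standard deviation is wanted; the sample version (statistics.stdev) is used here as an assumption, and the function names are hypothetical.

```python
import statistics


def mips_stats(mips_per_process):
    """Mean and standard deviation of the Mips consumed by each process."""
    return statistics.mean(mips_per_process), statistics.stdev(mips_per_process)


def normalized_stats(totals_per_process, mips_per_process):
    """Mean and standard deviation of a parameter divided by each process's Mips.

    totals_per_process: e.g. bytes sent over the network, summed per process.
    Assumption: sample standard deviation (use statistics.pstdev for population).
    """
    ratios = [total / mips for total, mips in zip(totals_per_process, mips_per_process)]
    return statistics.mean(ratios), statistics.stdev(ratios)


if __name__ == "__main__":
    mips = [100, 102, 99, 98]
    bytes_out = [201, 200, 199, 189]
    print(mips_stats(mips))
    print(normalized_stats(bytes_out, mips))
```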

Thursday, November 1, 2007

Updates on Project Status

Just updating the tasks:
(Tasks marked with: "+" are ready; "-" are not ready; "+/-" partially ready;"*" is currently being implemented)

Shell Module
+ CONNECT: starts a connection with the Global Broker, which tries to find a Broker in the same network as the Shell.
+ STARTAPP: sends a request to the Primary Broker in order to start the execution of a program.
- Kill: sends a message to the Broker in order to stop an execution
- PS/TOP: requests the status of applications
Global Broker Module
+/- Answer user requests (+ STARTAPP; - CONNECT, KILL, PS, TOP)
+ Manage Global Broker List (addition and removal when asked)
+ Manage Local Broker List
+ Forward startapp requests to a Local Broker
+/- Primary Broker definition (based on dynamic DNS)
Local Broker Module
+ Manage Scheduler List
+ Forward startapp requests to an Idle Scheduler
Scheduler Module
+ Start a simple application redirecting stderr and stdout to files
+/- RouteGA integration
* IBL integration
+/- Start monitors
- Process migration
IBL (Instance-Based Learning)
+ Create database scripts
+ Configuration of the Query (based on a configuration mask file which contains some information)
* Integration with Scheduler Module
+ Prepare the data from the monitor to use with IBL
RouteGA (Load Balancing Algorithm)
+/- RouteGA execution file (alloc.file) definition (some bugs to correct)
+/- Integration with Scheduler Module