Monday, January 7, 2008

Issues, Tests and Experiments

MidHPC is in a pre-alpha stage, and some issues, tests and experiments are still pending.
General information about MidHPC (functions, modules, interactions etc.) can temporarily be found at http://antrax.icmc.usp.br/~augusto/midhpc (this address will change in the future).
Other issues and tests may be added as development continues.

Issues:
1) Update the process status in the Database
When a process ends (or is migrated), its status in the database table Process should be changed to "0", so that it is no longer considered a running process.
However, this update is not performed when the process ends, because of a bug that has not been found yet. As a workaround, the status is changed by an SQL update performed in the Process Monitor.
2) Messages
Make MidHPC messages clearer to ease debugging. Some actions show no messages at all, while others show too many.
3) Process monitor generating unusual results
Due to problems with the PTRACE functions, some events are triggered twice, and the result reported the first time an event is triggered is incorrect.
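A likely cause, assuming the Process Monitor resumes the traced process with PTRACE_SYSCALL: the kernel stops the tracee twice for every system call, once on entry and once on exit, and only the exit stop carries the real return value (on i386 the entry stop typically shows -ENOSYS in eax). A minimal sketch of per-process bookkeeping that accounts only for the exit stop; the struct and function names are hypothetical, not part of MidHPC:

```c
#include <stdbool.h>

/* Hypothetical per-tracee state: with PTRACE_SYSCALL the tracee stops once
 * when entering a system call and once when leaving it. */
struct tracee_state {
    bool in_syscall;  /* true between the entry stop and the exit stop */
};

/* Call at every syscall stop. Returns true only for the exit stop,
 * which is when the return register (eax on i386) holds the result. */
static bool on_syscall_stop(struct tracee_state *st)
{
    st->in_syscall = !st->in_syscall;
    return !st->in_syscall;
}
```

The toggle can get out of step if other stops (signals, exec) are mistaken for syscall stops; setting the PTRACE_O_TRACESYSGOOD option makes syscall stops report SIGTRAP | 0x80, which helps tell them apart.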

Tests:
1) Applications with more threads
Tests are being performed with a single-threaded application in order to identify problems in the execution. After that, applications with more threads will be used.
2) Application average updates.
The application averages used by the IBL are not being computed, since there are problems in the application execution that may lead to undesired results.
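The post does not say how these averages are computed. Assuming each application keeps a simple arithmetic mean of some per-execution measurement (for example, its run time), the mean can be updated incrementally as each execution finishes, without storing past samples. A minimal sketch with hypothetical names:

```c
/* Hypothetical per-application statistics record. */
struct app_average {
    unsigned long count;  /* completed executions seen so far */
    double mean;          /* running mean of the measured value */
};

/* Incremental mean: mean_n = mean_(n-1) + (x - mean_(n-1)) / n.
 * Gives the same result as summing all samples and dividing by n. */
static void app_average_update(struct app_average *avg, double sample)
{
    avg->count++;
    avg->mean += (sample - avg->mean) / (double)avg->count;
}
```

Because only the count and the current mean are stored, a bad sample from a faulty execution is hard to remove afterwards, which is one reason to wait until executions are reliable before updating.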

Experiments:
1) Several applications in the environment at the same time
2) Use an application that communicates through the DSM
3) Java applications (using Kaffe virtual machine)

Thursday, December 6, 2007

TODO

Corrections so far
==================

1. Show the Local Brokers in the MPS (list which Global Broker each Local Broker is under)
2. Fix the MPS IP parameter in Global Brokers (the machine name is shown instead of the IP address)
3. If a Local Broker dies or stops running, its Global Broker keeps it in the list (fix this)
4. Some classes (such as the load update class) contain sleeps; create parameters in the configuration files to set the duration of these sleeps.
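A minimal C sketch of item 4, assuming a plain "key = value" configuration format (the real MidHPC configuration files may differ, and the key names below are hypothetical): the sleep duration is looked up in the configuration text, with a fallback when the key is missing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan "key = value" lines in `text` and return the
 * integer value for `key`, or `fallback` if the key is not present.
 * Lines that do not match the pattern (e.g. "# comments") are skipped. */
static long config_get_long(const char *text, const char *key, long fallback)
{
    char name[64];
    long value;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (sscanf(line, " %63[^= \t\n] = %ld", name, &value) == 2 &&
            strcmp(name, key) == 0) {
            return value;
        }
        line = strchr(line, '\n');  /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return fallback;
}
```

The load update loop could then sleep for the value read at start-up instead of a hard-coded constant.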

Others:
=======

1. Set up an experimental structure containing:
- 2 Global Brokers (total 2)
- 2 Local Brokers under each Global Broker (total 4)
- 2 Schedulers under each Local Broker (total 8)
2. Run experiments using the programs:
- tabuada (multiplication table)
- mutex (modified: remove the while (1))
3. Document all the code: every method and function of every program, even those you did not write, such as IBL, RouteGA and the Monitors. Whatever you don't know, you can ask me or Evgueni.
4. Generate documentation for everything with DOXYGEN and make it available at MidHPC @ sourceforge.net
5. Try to solve the problem of the Scheduler being killed because of Cryopid (probably caused by the way I asked you to run it, using Bash)
6. The Process Monitor problem will be solved later *** (keep it in the TODO)
7. Set up a README, Changelog, Makefile, TODO and INSTALL for each of the tools (even the monitors, IBL etc.) and write as much as you can in each tool's TODO and INSTALL (INSTALL? Won't the tools already be installed by default? Yes... but it is worth describing a bit how each one works and interacts with the other programs)
8. Write a general README, TODO, INSTALL and Makefile for all the tools. When this Makefile is run, it builds all the others :)
9. Build an HTML page in English containing details such as the ones listed below. Use this page to link to the documentation generated with DOXYGEN. Figures work really well here: for example, a figure of the whole environment with its elements, where the user can click to learn more about each module, such as the Process Monitor, the System Monitor, the Scheduler and so on. I think a network layout would be nice for this, with layers on each machine showing the names of the programs or modules that run there. Don't forget that this must be the index.html that links to the other documentation. Put it on MidHPC @ Sourceforge.net before our meeting:
- Who communicates with whom? Which program calls which program? How does one program use the outputs of another? (For example: how does the Scheduler use the outputs of the IBL? And so on.)
- The message flow to start an application (going through the primary Global Broker, etc., until it is executed: IBL, RouteGA, checkpoint etc.)
- What are the inputs and outputs of each program in MidHPC? Examples of the inputs and outputs (e.g. the XML files)
10. After completing all these steps, schedule a meeting (with the whole environment ready, i.e. all the previous items done) so I can check the execution. This is therefore a wrap-up meeting. Further corrections to the code or documentation may come up.


Sunday, December 2, 2007

Status update :)

The migration is implemented and the checkpoints are being generated.
Now only the PS and KILL shell tools remain to be implemented. The rest is OK!

But there is a problem:
The Process Monitor is generating some unusual values. According to the monitor, my laptop read 4,294,966,784 bytes (bytes_in field). I checked the source code, and I suspect there is a problem in the call to the function print_usage (in common.c).

This is the function header:
print_usage(int id, void *in, unsigned long size_in, void *out, unsigned long size_out, char *descr, int pkey)

and it is being called in grid_box.c like this:
print_usage(regs.ebx, (unsigned long *)regs.ecx, regs.eax, NULL, 0, "read", pkey);

Well, I guess that there is a problem in this call... or not... :)
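The reported value is not random: 4,294,966,784 is 2^32 - 512, i.e. -512 stored in a 32-bit unsigned variable. One plausible explanation (an assumption, not verified against the MidHPC code) is that regs.eax was read when the read() failed or was interrupted: Linux returns -errno in eax, and 512 is the kernel-internal ERESTARTSYS code, which a tracer can observe when a system call is interrupted by a signal. Passing regs.eax straight into the unsigned long size_in parameter turns such an error into a huge byte count. A minimal sketch of a guard, with hypothetical helper names, using the Linux convention that values from -4095 to -1 signal an error:

```c
#include <stdint.h>

/* Linux reports a failed system call as -errno, a value from -4095 to -1.
 * Read as a 32-bit unsigned register, that range is 0xFFFFF001..0xFFFFFFFF. */
#define MAX_ERRNO 4095u

/* Hypothetical helper: nonzero if the raw 32-bit return value is an error code. */
static int syscall_failed(uint32_t raw_eax)
{
    return raw_eax > UINT32_MAX - MAX_ERRNO;
}

/* Hypothetical helper: bytes actually moved by read()/write(); any error
 * (for example -512, ERESTARTSYS) counts as zero bytes instead of ~4 GB. */
static uint32_t bytes_transferred(uint32_t raw_eax)
{
    return syscall_failed(raw_eax) ? 0u : raw_eax;
}
```

Such a guard would be applied to regs.eax before it reaches print_usage, and only at the syscall-exit stop, since at the entry stop eax does not hold the result yet.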

Saturday, December 1, 2007

The Migration

The process migration is currently under construction.
It is taking a little longer than I expected because some of the data I needed was not available. So I had to make some modifications to the Database, SchedInfo (XML generator), RouteGA and, of course, the Scheduler.

I'll run some tests later today (as soon as I can get it to work).

Thursday, November 29, 2007

Progress

Item 5.2.6 is finished. I modified routeGA to check whether the allocation needs to be changed (for now, the new allocation needs a fitness at least 1% higher than the original).
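A minimal sketch of this acceptance rule, assuming higher fitness is better and fitness values are positive; the function name and the threshold parameter are hypothetical, with 0.01 standing for the 1% margin:

```c
#include <stdbool.h>

/* Hypothetical check: accept the new allocation only if its fitness is at
 * least `threshold` (0.01 = 1%) higher than the current allocation's. */
static bool should_reallocate(double current_fitness, double new_fitness,
                              double threshold)
{
    return new_fitness >= current_fitness * (1.0 + threshold);
}
```

Keeping the margin as a parameter makes it easy to tune: a larger margin avoids migrations whose small gain may not pay for the checkpoint and transfer.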

After some initial problems (with compilation and tests), I started the integration. For this part, I'm using three programs: a multiplication table (CPU-intensive, with the help of a modified delay), a matrix multiplication (reads from and writes to files) and a producer/consumer (with a mutex).
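The producer/consumer test program itself is not shown here. The following minimal C sketch, with hypothetical names, illustrates the pattern: a mutex plus two condition variables protecting a small bounded buffer, and a fixed item count instead of a while (1) loop so that each run terminates.

```c
#include <pthread.h>

#define BUF_SIZE 4  /* small bounded buffer, so both sides must wait */

/* Hypothetical shared state for a bounded-buffer producer/consumer. */
struct pc_state {
    int buf[BUF_SIZE];
    int head, tail, count;  /* ring-buffer indices and fill level */
    int n_items;            /* items to transfer (replaces while (1)) */
    long consumed_sum;      /* checksum of consumed values */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
};

static void *producer(void *arg)
{
    struct pc_state *s = arg;
    for (int i = 0; i < s->n_items; i++) {
        pthread_mutex_lock(&s->lock);
        while (s->count == BUF_SIZE)          /* wait for free space */
            pthread_cond_wait(&s->not_full, &s->lock);
        s->buf[s->tail] = i;
        s->tail = (s->tail + 1) % BUF_SIZE;
        s->count++;
        pthread_cond_signal(&s->not_empty);
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    struct pc_state *s = arg;
    for (int i = 0; i < s->n_items; i++) {
        pthread_mutex_lock(&s->lock);
        while (s->count == 0)                 /* wait for an item */
            pthread_cond_wait(&s->not_empty, &s->lock);
        s->consumed_sum += s->buf[s->head];
        s->head = (s->head + 1) % BUF_SIZE;
        s->count--;
        pthread_cond_signal(&s->not_full);
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Runs a producer thread while the calling thread consumes; returns the sum
 * of consumed values (0 + 1 + ... + n_items - 1), or -1 if the thread fails. */
static long run_producer_consumer(int n_items)
{
    struct pc_state s = { .n_items = n_items };
    pthread_t prod;
    long result = -1;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.not_full, NULL);
    pthread_cond_init(&s.not_empty, NULL);
    if (pthread_create(&prod, NULL, producer, &s) == 0) {
        consumer(&s);                         /* this thread is the consumer */
        pthread_join(prod, NULL);
        result = s.consumed_sum;
    }
    pthread_cond_destroy(&s.not_empty);
    pthread_cond_destroy(&s.not_full);
    pthread_mutex_destroy(&s.lock);
    return result;
}
```

With a fixed item count the expected checksum is known in advance, which makes it easy to check that a run survived checkpointing without losing or duplicating items. Compile with -pthread.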

Once these three applications are running and generating checkpoints, I'll start the modifications to add this information to the database.

Monday, November 26, 2007

Next steps

According to the TODO from November 2, 2007 (and its addendum of November 5, 2007), these are the things that still need to be done:

5.2: Monitors and Thread Splitter. These were written by Rodrigo Mello; I'm studying them in order to perform the integration.
5.2.5.3: Start the average program only when the application finishes (requires the application to be running).
5.2.6: Run the scheduling algorithm (currently routeGA). I'm finishing the conversion that makes routeGA read from an XML file and write its output to another XML file.
5.2.7 and 5.2.8: Deal with the migration; they depend on 5.2 and 5.2.6.
6.2 and 6.3: Will be implemented once the applications (and monitors) are running. They may depend on information generated by the application.
7: Issues

The startapp (shell application, 6.1) and the average program (5.2.4) are implemented in C++. Since they work without errors, they will be converted to Java later.

Friday, November 23, 2007

Tasks (in order)

1) Finalize the SchedInfo.xml generation
2) Finalize the alloc.file conversion (from SchedInfo.xml)
3) Implement the Kill
4) Implement the PS
5) Study the RouteGA