QProcess and mpi not finishing



Spifffers
19th March 2009, 16:06
I know this may be more of an MPI question, but I'm wondering if this type of behaviour has been seen anywhere else. mpiexec.exe launches several child processes and monitors them. Once the child processes have all finished, the normal behaviour is for mpiexec to finish and exit.

If I use a QProcess to start mpiexec.exe in attached mode (QProcess->start()), mpiexec.exe never finishes and stays in my task manager even after all of its child processes have completed and exited properly. The QProcess state goes to QProcess::Running and stays there. The QProcess emits the started and stateChanged signals (and all of the read signals) but never emits the finished signal.

So essentially everything goes properly except that mpiexec.exe will not actually 'finish' and close.

If I launch mpiexec.exe in detached mode (QProcess->startDetached()), or if I run mpiexec from the command line, it finishes and exits properly.

Does anyone have any ideas on this? I need to monitor when the process finishes (either via the finished signal or by periodically calling state()), so I need to use the attached start() mode.

EDIT: I should mention that starting, stopping, reading from, writing to, and everything else to do with QProcess works for me as long as I'm not launching mpiexec.exe. Any other 'normal' executable works fine. I know this is MPI related, but I was hoping someone has seen this behaviour and has an idea.

Using Windows, Qt 4.3.

wysota
19th March 2009, 16:26
I would say this is caused by mpiexec. Maybe it detects that it is a slave of another process and waits for you to send it some signal.

lni
19th March 2009, 18:38
I think MPI's signal handling is hijacking QProcess's signal handling, so that QProcess loses its connection to the system. Check MPI's documentation on signal handling and see if you can find something there.

Spifffers
19th March 2009, 19:53
Thanks for the opinions.

Sounds more like Qt signals are somehow hijacking the MPI functionality, since Qt is still monitoring the process. If I manually kill mpiexec, the QProcess does send the finished() signal.

mpiexec must monitor how many other things are 'connected' to it and only exit once all of them disconnect. Usually only its own children are connected. Since the QProcess maintains a connection, it always thinks it needs to stay alive and therefore never exits. Darn.

I'll look into MPI documentation.

lni
19th March 2009, 21:19
When two processes in the program try to handle the same signals, you can't know how they will interfere with one another.

But how can QProcess submit a job to a different node? I thought QProcess could only launch a program on the local node... If your MPI job requires more CPUs or ranks than your machine has, it would fail, I think...

wysota
19th March 2009, 21:25
"Sounds more like Qt signals are somehow hijacking the MPI functionality."
This is very unlikely. A parent process can't intercept its child's signals. Especially since QProcess is cross-platform, it is very unlikely that it uses Unix signals at all.


"If I manually kill mpiexec, the QProcess does send the finished() signal."
This also suggests mpiexec just doesn't want to die on its own. If Qt hijacked its signals, it would also have hijacked the one you sent.


"Since the QProcess maintains a connection, it always thinks it needs to stay alive and therefore never exits."
No, because if that were true, mpiexec would never die when run from a terminal, as the shell maintains a similar connection to its file descriptors as Qt does. It's the parent process that creates descriptors for its child before calling execve() or equivalent, and it usually "attaches" those descriptors to itself. mpiexec must be detecting that it's not the owner of its own process group and so doesn't terminate. It probably sends some signal to the group leader; at least that could be the case on Unix. I don't know how processes are grouped in Windows.

vdwgeert
16th June 2009, 10:33
I'm currently facing the same problem with mpiexec. Did you find a solution in the end?

DjayDjay
8th January 2014, 15:23
Exactly the same problem: still no solution (6 years after!)

Working with Qt 4.8, Windows

ChrisW67
8th January 2014, 20:33
Rather than posting "Me too!", perhaps you could put some effort into writing a small, self-contained example that demonstrates the problem. The problem almost certainly has nothing to do with Qt, but you never know what the process of distilling and sharing the problem might show you.

You could also capture any standard output/error output from mpiexec, which I am sure would be useful in conjunction with any other MPI logs.

BTW: Mar 2009 to Jan 2014 is a long way short of 6 years.

DjayDjay
9th January 2014, 09:21
OK, I was a bit frustrated... and you are right, 2009 to 2014 is a bit less than 6 years!

The solution I found is to use the system() call rather than QProcess.

My first idea was to use QProcess like this:
QProcess process;
process.setWorkingDirectory( "xxxx" );
process.setProcessChannelMode(QProcess::MergedChannels);
process.setStandardOutputFile("logfile");
process.start(commandProcess,argumentsProcess);
process.waitForFinished(-1);

But as I mentioned, the QProcess never emits the finished signal (why is the question; maybe it's an mpiexec problem?).

What I did instead (using system()):
QString command = commandProcess+" "+argumentsProcess.join(" ")+ " > "+logProcessFilePath;
command = QDir::toNativeSeparators(command);
command.replace(":\\",":\\\\");
int returnCode = system(command.toStdString().c_str());
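
Editor's note: the snippet above joins the arguments with plain spaces, so a path or argument containing a space would be split by the shell. A minimal sketch of one way to guard against that is shown here, using only the C++ standard library. The helper names quoteIfNeeded and buildCommand are hypothetical (they are not from the post or from Qt), and the sketch assumes that no argument contains a double quote. Shell quoting rules differ between cmd.exe and POSIX shells, so treat this as a starting point rather than a complete solution.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: wrap an argument in double quotes when it is empty
// or contains spaces/tabs.
// Assumption: the argument itself contains no double quotes; escaping those
// follows platform-specific rules that this sketch does not attempt.
std::string quoteIfNeeded(const std::string& arg)
{
    if (!arg.empty() && arg.find_first_of(" \t") == std::string::npos)
        return arg;
    return "\"" + arg + "\"";
}

// Hypothetical helper: build "program arg1 arg2 ... > logFile" as a single
// string that can be passed to std::system().
std::string buildCommand(const std::string& program,
                         const std::vector<std::string>& args,
                         const std::string& logFile)
{
    assert(!program.empty());  // a command line needs a program name
    std::string command = quoteIfNeeded(program);
    for (const std::string& arg : args)
        command += " " + quoteIfNeeded(arg);
    command += " > " + quoteIfNeeded(logFile);
    return command;
}
```

For example, buildCommand("mpiexec", {"-n", "4", "my solver.exe"}, "run.log") yields the string mpiexec -n 4 "my solver.exe" > run.log, which could then be passed to std::system() via c_str().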

matpen
2nd April 2016, 18:40
I am sorry to post in a very old thread, but I figured that it might help others looking for an answer in a search engine...

I just had the very same problem, which might be due to this bug (https://www.open-mpi.org/community/lists/users/2012/08/20053.php), at least according to my research.
According to that message, OpenMPI is catching the SIGCHLD signal, which QProcess needs in order for its finished signal (and thus QProcess::waitForFinished()) to work correctly.

The message says that this was changed in OpenMPI series 1.6 and 1.7.
As of today, the latest Ubuntu 15.10 ships with OpenMPI 1.6.5, which apparently is still affected by this (mis-)behaviour.

Upgrading to a later OpenMPI version should therefore fix the problem.
I did this by installing OpenMPI 1.10.2 from source, which is very easily done by following these instructions (https://www.open-mpi.org/faq/?category=building#easy-build), and I can confirm that the problem is solved.

Hope this helps...