SWarp threads deadlock?
05-02-2016, 07:46 (This post was last modified: 05-02-2016 07:54 by luvaul.)
Post: #1
SWarp threads deadlock?
Hi all, I'm a non-astronomer trying to get SWarp to run successfully, but I keep running into this problem: SWarp runs at 100% CPU (or >100% CPU if using more than one thread) for a time, presumably doing its thing, but then enters an indefinite period of 0% CPU use. I have left it in this state for six days with no change, so I suspect a deadlock.

strace reveals:
>strace -p 61462 -f
Process 61462 attached with 2 threads
[pid 48375] futex(0x697d074, FUTEX_WAIT_PRIVATE, 575173, NULL <unfinished ...>
[pid 61462] futex(0x697daf4, FUTEX_WAIT_PRIVATE, 1, NULL

This looks like the following Linux kernel futex bug: https://groups.google.com/forum/#!topic/...bmpZxp6C64

However, I think we have the patch that fixes this problem installed, so I'm not sure this is the cause:
>sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref
- [kernel] futex: Mention key referencing differences between shared and private futexes (Larry Woodman) [1205862]
- [kernel] futex: Ensure get_futex_key_refs() always implies a barrier (Larry Woodman) [1205862]

Has anyone else had these (or similar) problems? Any help at all would be appreciated! (BTW, I'm using SWarp version 2.38.0.)

Many thanks,
Lance
05-03-2016, 15:13
Post: #2
RE: SWarp threads deadlock?
Solved it.

I built a single-threaded copy of SWarp to bypass the thread deadlock, only to find that it died of a segmentation fault at another point in the code. A bit of debugging with gdb revealed that the code was using a signed int to offset into the coadd buffer. Because I had requested a large image size and a large number of overlapping frames, the offset counter wrapped into negative values, so memory addresses outside the buffer were being referenced. This of course caused the segmentation fault.
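
To illustrate the wrap-around described above, here is a minimal sketch. It is not SWarp code: the helper names and the example sizes are hypothetical, and it assumes a 32-bit two's-complement int. The offset is computed exactly in 64 bits, then reduced to what a 32-bit signed int would hold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: exact element offset of a line, computed in 64 bits. */
int64_t exact_offset(int64_t line, int64_t width, int64_t nomax) {
    assert(line >= 0 && width >= 0 && nomax >= 0);
    return line * width * nomax;
}

/* Hypothetical helper: the value a 32-bit two's-complement int would hold
 * if the same offset were stored in it. Only well-defined unsigned
 * arithmetic is used, so the wrap is simulated rather than triggered. */
int64_t wrapped_int32(int64_t value) {
    uint32_t low = (uint32_t)value;   /* keep the low 32 bits (modulo 2^32) */
    return low >= 0x80000000u ? (int64_t)low - 4294967296LL : (int64_t)low;
}
```

With made-up sizes such as line 3, a width of 40000 pixels and 20000 overlapping frames, `exact_offset` gives 2400000000, which is above INT_MAX (2147483647), and `wrapped_int32` turns it into a negative number, i.e. an address before the start of the buffer.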

I "fixed" this by casting the offset to unsigned during the calculation of the base line addresses in coadd_line(), before they're used in any further pointer arithmetic. This is not a general solution: I can imagine someone out there wanting even more frames or a larger image size than me, which would trigger the problem again once the unsigned int isn't wide enough. Perhaps using unsigned long, uint64_t or size_t would be better?

Anyway, I re-enabled multi-threading afterwards and found that my fix prevented the thread deadlock as well!

In case anyone is having the same issues, the "fix" is to cast the offsets on lines 1108 to 1114 of src/coadd.c (v2.38.0) to unsigned, so it looks like this:
inpix = multibuf+(unsigned)(l*coadd_width*coadd_nomax);
inwpix = multiwbuf+(unsigned)(l*coadd_width*coadd_nomax);
inn = multinbuf+(unsigned)(l*coadd_width);
outpix = outbuf+(unsigned)(l*coadd_width);
outwpix = outwbuf+(unsigned)(l*coadd_width);
pixstack = coadd_pixstack+(unsigned)(l*coadd_nomax);
pixfstack = coadd_pixfstack+(unsigned)(l*coadd_nomax);
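
A possible wider variant of the same idea, as a sketch only: it assumes the line index and the sizes are plain int (which the wrap described above suggests), and the helper names below are hypothetical, not part of SWarp. Each operand is converted to size_t before multiplying, so the product itself is computed in the wide type rather than being cast after an int multiplication.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element offset of a line, computed in size_t.
 * Casting each operand first means the multiplication never happens in int. */
size_t line_offset(int line, int width, int per_pixel) {
    assert(line >= 0 && width >= 0 && per_pixel >= 0);
    return (size_t)line * (size_t)width * (size_t)per_pixel;
}

/* Hypothetical helper: pointer to the first element of a line inside buf. */
float *line_base(float *buf, int line, int width, int per_pixel) {
    return buf + line_offset(line, width, per_pixel);
}
```

Under these assumptions a line such as `inpix = multibuf+(unsigned)(l*coadd_width*coadd_nomax);` would become `inpix = line_base(multibuf, l, coadd_width, coadd_nomax);`, with `per_pixel` set to 1 for the buffers that hold one value per pixel. On a 64-bit system this pushes the limit far beyond what an unsigned int can address.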

Cheers,
Lance
05-10-2016, 13:43
Post: #3
RE: SWarp threads deadlock?
Oh, I just had a similar problem and this helped me too. Thanks a lot!