BOINC: Recovering from Failures

The BOINC interface works as follows:

  1. The PyMW worker Task gets packaged as a BOINC “work unit” (or job)
  2. These work units get sent off to volunteer compute nodes
  3. The results get “assimilated”, creating an output file for PyMW to consume
  4. Finally, the results get turned into completed PyMW Tasks

In this process, a work unit can fail for many reasons (execution problems, computer crashes, a user hitting “abort”, etc.), and BOINC will automatically reprocess the work unit. After a user-specified number of failures, BOINC gives up and flags the work unit with the error code “too many errors”. At this point the PyMW assimilator still assimilates the work unit, but instead of sending output files back to the PyMW application, it produces an error file specifying exactly why the work unit failed.
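The retry-then-give-up behavior can be modeled in a few lines. This is a toy simulation, not BOINC’s actual scheduler: `WorkUnit`, `process`, and `max_errors` are hypothetical names chosen for illustration, and in a real deployment the failure limit is part of BOINC’s own work unit configuration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TOO_MANY_ERRORS = "too many errors"

@dataclass
class WorkUnit:
    name: str
    max_errors: int                # hypothetical name for the user-specified failure limit
    error_count: int = 0
    outcome: Optional[str] = None  # "success" or TOO_MANY_ERRORS once finished

def process(wu: WorkUnit, run_once: Callable[[WorkUnit], None]) -> WorkUnit:
    """Re-issue a work unit until it succeeds or reaches its failure limit."""
    while wu.outcome is None:
        try:
            run_once(wu)           # one attempt on some volunteer host
            wu.outcome = "success"
        except Exception:
            wu.error_count += 1    # any failure: crash, user abort, etc.
            if wu.error_count >= wu.max_errors:
                wu.outcome = TOO_MANY_ERRORS
    return wu

# Usage: a host that always crashes exhausts the limit.
def always_crash(wu: WorkUnit) -> None:
    raise RuntimeError("volunteer host crashed")

failed = process(WorkUnit("wu_0", max_errors=3), always_crash)
print(failed.outcome, failed.error_count)  # too many errors 3

# Usage: two failures followed by a success stays under a limit of 5.
attempts = []
def flaky(wu: WorkUnit) -> None:
    attempts.append(wu.name)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")

ok = process(WorkUnit("wu_1", max_errors=5), flaky)
print(ok.outcome, ok.error_count)  # success 2
```

Only the “too many errors” outcome leads to the error file that PyMW later turns into an exception.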

The PyMW BOINC interface will automatically turn these failure files into Python exceptions and attach them to the Task that failed. The interface then deletes the failure file so that future runs don’t get confused and think they failed as soon as they begin.
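A minimal sketch of that conversion, assuming the error file is plain text holding the failure reason. `Task`, `TaskFailure`, and `collect_failure` are hypothetical stand-ins, not PyMW’s real classes or file format; the point is the order of operations: read the reason, attach an exception to the Task, then delete the file.

```python
import os
import tempfile
from typing import Optional

class TaskFailure(Exception):
    """Hypothetical exception for a work unit that BOINC gave up on."""

class Task:
    """Hypothetical stand-in for a PyMW Task."""
    def __init__(self, name: str):
        self.name = name
        self.exception: Optional[Exception] = None

def collect_failure(task: Task, error_path: str) -> bool:
    """Attach the error file's reason to the task as an exception, then delete the file."""
    if not os.path.exists(error_path):
        return False
    with open(error_path) as f:
        reason = f.read().strip()
    task.exception = TaskFailure(f"{task.name}: {reason}")
    # Delete the file so a later run does not mistake it for its own failure.
    os.remove(error_path)
    return True

# Usage: simulate the assimilator writing an error file for task_0.
workdir = tempfile.mkdtemp()
err = os.path.join(workdir, "task_0.error")
with open(err, "w") as f:
    f.write("too many errors\n")

t = Task("task_0")
found = collect_failure(t, err)
print(found, t.exception)   # True task_0: too many errors
print(os.path.exists(err))  # False
```

Because the file name here is derived only from the task name, a second run would look for the same path; that is exactly the naming problem described next.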

This exposes two rough patches in the BOINC interface that I hope to fix soon:

  1. All runs of an application produce files with the same names. BOINC doesn’t like this, and the PyMW BOINC interface gets confused by it as well (as noted above).
  2. If the task exception isn’t handled correctly in the PyMW application (your application running on top of PyMW), BOINC will be left in an unpredictable state. This will most likely result in many output files piling up in the task directory. These files will then be treated as results the next time the application is run, which is wrong.

Fixing the first problem is fairly straightforward (give each file a unique name), but the second is harder. Automatically canceling all existing work units may be the wrong thing to do for some applications, so that isn’t really an option. On the other hand, leaving it up to the developer to handle exceptions and to wait for all work units to complete (or to cancel them) isn’t ideal either.
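One way to give each file a unique name is to prefix every file with a per-run identifier, so files left over from an earlier run can be recognized and ignored. This is a sketch under that assumption; `make_run_id`, `output_filename`, and `belongs_to_run` are hypothetical helpers, not part of PyMW.

```python
import uuid

def make_run_id() -> str:
    """A fresh identifier for one run of the application (fixed-length hex)."""
    return uuid.uuid4().hex

def output_filename(run_id: str, task_name: str, suffix: str = "out") -> str:
    """Prefix the task's file with the run id so runs never share names."""
    return f"{run_id}_{task_name}.{suffix}"

def belongs_to_run(filename: str, run_id: str) -> bool:
    """True only for files written by the given run."""
    return filename.startswith(run_id + "_")

# Usage: the same task name in two runs yields two distinct files, and the
# second run only treats its own file as a result.
run_a, run_b = make_run_id(), make_run_id()
stale = output_filename(run_a, "task_0")
fresh = output_filename(run_b, "task_0")
current = [name for name in (stale, fresh) if belongs_to_run(name, run_b)]
```

A random identifier avoids coordinating between runs; a timestamp or counter would work too, as long as it cannot repeat.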

The current plan is to provide some sort of sane default behavior that can be overridden or changed by client applications as needed.
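The post doesn’t say what the default will be. As a sketch only, an overridable default could be a policy function that the interface calls when a Task fails and that decides what happens to the still-outstanding work units. `BoincInterface`, `drain_policy`, and `cancel_policy` are hypothetical names, and draining (let outstanding units finish, then discard their outputs) is just one plausible default.

```python
from typing import Callable, Dict, List

# A policy maps the still-outstanding work units to a disposal plan.
Policy = Callable[[List[str]], Dict[str, List[str]]]

def drain_policy(pending: List[str]) -> Dict[str, List[str]]:
    """Hypothetical default: let outstanding units finish, then discard their outputs."""
    return {"cancel": [], "drain": list(pending)}

def cancel_policy(pending: List[str]) -> Dict[str, List[str]]:
    """Alternative some applications may prefer: cancel everything still queued."""
    return {"cancel": list(pending), "drain": []}

class BoincInterface:
    """Hypothetical stand-in for the PyMW BOINC interface."""
    def __init__(self, on_failure: Policy = drain_policy):
        self.on_failure = on_failure
        self.pending: List[str] = []

    def task_failed(self, exc: Exception) -> Dict[str, List[str]]:
        plan = self.on_failure(self.pending)
        # Stop tracking these units here; the plan says how to dispose of them.
        self.pending = []
        return plan

# Usage: the default drains; an application can swap in its own policy.
default = BoincInterface()
default.pending = ["wu_1", "wu_2"]
plan_a = default.task_failed(RuntimeError("task_0 failed"))

custom = BoincInterface(on_failure=cancel_policy)
custom.pending = ["wu_3"]
plan_b = custom.task_failed(RuntimeError("task_0 failed"))
```

Passing the policy as a plain function keeps the override to one argument; a subclass hook would work equally well if the policy needed more state.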

Posted by Jeremy
