Future Technologies Group
Berkeley Lab Computing Sciences


Checkpoint/Restart Publications

Items are listed chronologically. Since BLCR is an ongoing research project, information in later publications may supersede earlier ones and some aspects of BLCR may not be accurately reflected by any of these publications. In all cases the documentation that accompanies a given source distribution of BLCR should be considered more authoritative than any document here. If in doubt, ask.

Papers and Technical Reports

  • Duell, J., Hargrove, P., and Roman, E. Requirements for Linux Checkpoint/Restart. Berkeley Lab Technical Report (publication LBNL-49659), May 2002. (PDF)

  • Duell, J., Hargrove, P., and Roman, E. The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart. Berkeley Lab Technical Report (publication LBNL-54941), December 2002. (PDF)

  • Roman, E. A Survey of Checkpoint/Restart Implementations. Berkeley Lab Technical Report (publication LBNL-54942), July 2002. (PDF)

  • Sriram Sankaran, Jeffrey M. Squyres, Brian Barrett, Andrew Lumsdaine, Jason Duell, Paul Hargrove, and Eric Roman. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. In LACSI Symposium, October 2003. (publication LBNL-53808 Proc.) (PDF)

  • Paul H. Hargrove and Jason C. Duell. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters. In Proceedings of SciDAC 2006, June 2006. (publication LBNL-60520) (PDF)

Invited Presentations

  • Duell, J., Hargrove, P., and Roman, E. An Overview of Berkeley Lab's Linux Checkpoint/Restart. Presented January 2004 at LLNL. (PPT)

  • Paul H. Hargrove and Jason C. Duell. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters. Poster given at SciDAC 2006, June 2006. (PDF)

  • Paul Hargrove, Eric Roman, and Jason Duell. Job Preemption with BLCR. Urgent Computing Workshop, April 25-26, 2007, Argonne, IL. (PDF)

  • Paul Hargrove, Jason Duell, and Eric Roman. An Overview of Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters. Presentation to the UC Berkeley CS Dept ParLab group, March 2008. (PDF)

  • Paul Hargrove, Eric Roman, and Jason Duell. Advanced Checkpoint Fault Tolerance Solutions for HPC. Workshop on Trends, Technologies and Collaborative Opportunities in High Performance and Grid Computing, June 9-10, 2008, Bangkok, Thailand and June 12, 2008, Phuket, Thailand. (PDF)

All Files

File:                           Size:  Posted:      Comment:
LBNL-49659.pdf                 112384  15 Nov 2003  Requirements for Linux Checkpoint/Restart
blcr.pdf                       293677  15 Nov 2003  The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart
checkpointSurvey-020724b.pdf    73571  15 Nov 2003  A Survey of Checkpoint/Restart Implementations
lacsi-2003.pdf                 125493  19 Nov 2003  The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing
BLCR-HP.ppt                    622080  13 Jan 2004  An Overview of Berkeley Lab's Linux Checkpoint/Restart
LBNL-60520.pdf                  41803  12 Feb 2007  Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters (paper)
SciDAC06Poster.pdf             542830  12 Feb 2007  Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters (poster)
Urgent07_BLCR.pdf               61256   5 May 2008  Job Preemption with BLCR
BLCR_ParLab_Mar_2008.pdf      1949087   5 May 2008  An Overview of Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters
WTTC2008-BKK.pdf               248439  18 Jun 2008  Advanced Checkpoint Fault Tolerance Solutions for HPC