



News Article
Dr. Jiang Publishes His First Book
Jun 10, 2010

Dr. Hai Jiang, Computer Science, recently published his first book, "Computation Checkpointing and Migration" (Nova Publishers, 2010). The book is based on his PhD dissertation and was co-written with his dissertation advisor, Dr. Vipin Chaudhary of the University at Buffalo, State University of New York, and Dr. John Paul N. Walters. It addresses the issue of fault tolerance via checkpointing.

The authors discuss existing strategies for providing rollback recovery to applications, both via MPI at the user level and through application-level techniques. Checkpointing itself has been studied extensively in the literature, including in the authors' own work. Here they give a general overview of checkpointing and how it is implemented. More importantly, they describe strategies to improve the performance of checkpointing, particularly in distributed systems.

Computational clusters have long provided a mechanism for the acceleration of high performance computing (HPC) applications. With today's supercomputers now exceeding the petaflop scale, however, they are also exhibiting an increase in heterogeneity. This heterogeneity spans a range of technologies, from multiple operating systems to hardware accelerators and novel architectures.

Because of the exceptional acceleration some of these heterogeneous architectures provide, they are being embraced as viable tools for HPC applications. Given the scale of today's supercomputers, it is clear that scientists must consider building fault tolerance into their applications.

This is particularly true as computational clusters with hundreds or thousands of processors become ubiquitous in large-scale scientific computing, leading to lower mean times to failure. Systems must therefore deal effectively with the possibility of arbitrary and unexpected node failure.
