Towards management of energy consumption in HPC systems with fault tolerance

Morán, Marina

Title:
Towards management of energy consumption in HPC systems with fault tolerance
Author:
Morán, Marina
Contributors:
Balladini, Javier Aldo; Rexachs, Dolores Isabel; Rucci, Enzo
Subjects:
HIGH-PERFORMANCE COMPUTING - HPC
In:
IEEE Congreso Bienal de Argentina (ARGENCON) (2020 : Resistencia, Chaco)
Abstract:
High-performance computing continues to increase its computing power and energy efficiency. However, energy consumption also continues to rise, and finding ways to limit or reduce it is a crucial point in current research. For high-performance MPI applications, there are rollback-recovery-based fault tolerance methods, such as uncoordinated checkpoints. These methods allow only some processes to roll back when a failure occurs, while the rest of the processes continue to run. In this article, we focus on the processes that continue execution and propose a series of strategies to manage energy consumption when a failure occurs and uncoordinated checkpoints are used. We present an energy model to evaluate the strategies, and through simulation we analyze the behavior of an application under different configurations and failure times. As a result, we show the feasibility of improving energy efficiency in HPC systems in the presence of a failure.
URL/DOI:
https://doi.org/10.1109/ARGENCON49523.2020.9505498
Keywords:
energy consumption
Medium:
Electronic resource
Document type:
Article
Physical description:
1 file (506.8 kB)
Language:
English
Publication:
2020

You can request this item more easily with: A1161
