Data Compression for Climate Data

  • Michael Kuhn Universität Hamburg, Hamburg
  • Julian Kunkel Deutsches Klimarechenzentrum, Hamburg
  • Thomas Ludwig Deutsches Klimarechenzentrum, Hamburg

Abstract

The different rates of increase for computational power and storage capabilities of supercomputers turn data storage into a technical and economic problem. Because storage capabilities are lagging behind, investments and operational costs for storage systems have increased to keep up with the supercomputers' I/O requirements. One promising approach is to reduce the amount of data that is stored. In this paper, we examine the impact of compression on the performance and costs of high-performance systems. To this end, we analyze the applicability of compression on all layers of the I/O stack, that is, main memory, network and storage. Based on the Mistral system of the German Climate Computing Center (Deutsches Klimarechenzentrum, DKRZ), we illustrate potential performance improvements and cost savings. Making use of compression on a large scale can decrease investments and operational costs by 50% without negatively impacting performance. Additionally, we present ongoing work on supporting enhanced adaptive compression in the parallel distributed file system Lustre and on application-specific compression.

Author Biographies

Michael Kuhn, Universität Hamburg, Hamburg
Michael Kuhn is a postdoctoral researcher in the Scientific Computing group at Universität Hamburg, where he also received his doctoral degree in computer science in 2015. He conducts research in the area of high performance I/O with a special focus on I/O interfaces and data reduction techniques. His other interests include file systems and high performance computing in general.
Julian Kunkel, Deutsches Klimarechenzentrum, Hamburg
Since 2006, Julian Kunkel has been working on tracing environments and tools for client and server-side I/O. In 2013, he defended his thesis on the monitoring and simulation of parallel programs at the application and system level. Dr. Kunkel is a member of several program committees.
Thomas Ludwig, Deutsches Klimarechenzentrum, Hamburg
Thomas Ludwig received his doctoral degree and the German habilitation degree at the Technische Universität München, where he conducted research on HPC from 1988 to 2001. From 2001 to 2009, he held a chair for parallel computing at the Universität Heidelberg. In 2009, he moved to Hamburg. He is now director of the German Climate Computing Center (DKRZ) and professor at the Universität Hamburg. His research activity is in the fields of high volume data storage, energy efficiency, and performance analysis concepts and tools for parallel systems. At DKRZ, Prof. Ludwig is responsible for accomplishing its mission: to provide high performance computing platforms, sophisticated and high capacity data management, and superior service for premium climate science.

Published
2016-06-20