Review-9
NASA
Review of HDF5 operational readiness:
NASA's Earth Science Data Systems Standards Process Group (SPG) is considering HDF5 for adoption as a community standard. This is the second review of HDF5, and it focuses on readiness for operational use. The questions below are provided to guide feedback from data systems, application providers, instrument teams and others. You only need to answer questions applicable to you. Please send comments to spg-rfc-007@lists.nasa.gov.
1. Describe in a sentence or two your overall experience related to HDF5 (e.g., science data provider, science data systems, software tools developer, and science data user, etc).
I manage the Orbiting Carbon Observatory (OCO) Ground Data System (GDS) software development team that generates data products in HDF5.
2. Do you currently use or plan to use HDF5 in a production setting? What types of applications do you use with HDF5? Is HDF5 applicable to your applications (e.g., Does it work well with the data types and data manipulations in your application?)
All of the major distributable products that the OCO mission generates will be in HDF5. Our developers implemented a standard interface into the OCO GDS code framework that reads and writes all products in HDF5. The standard interface handles HDF, as well as some additional product content standards that the OCO GDS chose to adopt.
3. Why do you choose to use HDF5 over other data formats for your applications?
The original mission proposal required the distribution of data products through the Distributed Active Archive Centers (DAACs). The team assumed that using HDF would ease implementation of archive and distribution at the DAACs. HDF would also be convenient for members of the science community who read products generated by one or several of the EOS missions. Since HDF5 is not backward compatible with HDF4, has a more general form than HDF4, and is more likely to be supported by NCSA for a longer time period, we decided to adopt HDF5 as our standard for major products.
4. Have you or your users encountered any difficulty when using some of the data access or visualization tools (e.g., IDL, GrADS) on HDF5 data files? If you have, please provide a brief description of your experience.
The same framework interface for HDF includes a library for IDL. We’ve constructed IDL tools that use the IDL readers for our HDF5 products. The library we constructed works well.
5. Does the performance of HDF5 you have experienced meet your requirements? (e.g., Can it handle the data types in your applications? Does it take a long time to read and write HDF5 files?)
OCO is still in the code development phase. At this time, we cannot attribute any performance problems to the use of HDF5. This situation may change as we further develop our code.
6. What operational challenges or limitations does HDF5 present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc.)
We've encountered no serious problems. Of course, most of our developers had worked with HDF4 or HDF-EOS. Even though the constructs are different, many of the overall concepts were similar enough that our developers adapted with little fuss.
7. What benefits does HDF5 present? Do the benefits of HDF5 outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)
The ability to build composite Datatypes in HDF5 makes the entire library far more flexible. The ability to assign user-defined Datatypes and Dataspaces to any Dataset adds even more to that flexibility. Of course, flexibility implies that developers need to make choices, which inevitably adds some complication to the design. All in all, when one considers the target audience of NASA products, I think this flexibility is desirable. Any flavor of HDF5 that EOS adopts ought not to restrict this flexibility.
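To illustrate what a composite Datatype looks like in practice, here is a minimal sketch. It assumes the h5py and NumPy Python packages, not the OCO GDS framework interface. The dataset name and all field names are hypothetical and are not taken from any OCO product specification.

```python
import numpy as np
import h5py

# Hypothetical record layout; the field names are illustrative only.
sounding_type = np.dtype([
    ("sounding_id", np.int64),
    ("latitude", np.float32),
    ("longitude", np.float32),
    ("quality_flag", np.uint8),
])

# Two made-up records that use the composite type.
records = np.array(
    [(1, 34.2, -118.1, 0), (2, 34.3, -118.0, 1)],
    dtype=sounding_type,
)

# driver="core" with backing_store=False keeps the file in memory only,
# so nothing is written to disk by this sketch.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # h5py maps the NumPy structured dtype to an HDF5 compound Datatype.
    dset = f.create_dataset("soundings", data=records)
    stored_fields = dset.dtype.names
    first_id = int(dset[0]["sounding_id"])
    second_flag = int(dset["quality_flag"][1])
```

Each element of the Dataset is a whole record, so related values travel together. The cost is that the developer has to choose the record layout up front, which is the kind of design choice mentioned above.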
8. How much data do/will you provide or archive in HDF5? (number of distinct data products or data sets, total data volume, number of files.)
Below are the major products that the project will generate. The Level 1A, Level 1B and Level 2 Product volumes are based on solid design specifications. We have not yet designed our Level 3 and Level 4 Products, so we don’t have a firm handle on their ultimate volumes. Since the products will contain representative values over 4-degree by 5-degree regions of the Earth’s surface, each product will contain 3240 representative values for each element stored. If these products store 100 data elements, each product would have a volume of less than 500 Kbytes.
Estimates assume one version of each data granule. Ultimately, we know we’ll have more.
Product | Number of files | Total Volume
OCO Level 1A Product | 10,600 | 4.05 TBytes
OCO Level 1B Product | 10,600 | 7.35 TBytes
OCO Level 2 Product | 10,600 | 182 GBytes
OCO Level 3 Product | 45 | 22.5 MBytes
OCO Level 4 Product | 45 | 22.5 MBytes
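The Level 3 and Level 4 figures can be checked with a short calculation. The sketch below makes two assumptions: the grid covers the full globe (180 degrees of latitude, 360 degrees of longitude), and 1 MByte is counted as 1000 Kbytes. The text does not say which axis uses the 4-degree step, so both assignments are computed.

```python
# Assumption: full-globe coverage, 180 degrees of latitude by 360 of longitude.
LAT_RANGE_DEG = 180
LON_RANGE_DEG = 360

# The text does not say which axis uses 4 degrees and which uses 5,
# so compute the cell count for both assignments.
cells_lat4_lon5 = (LAT_RANGE_DEG // 4) * (LON_RANGE_DEG // 5)  # 45 * 72
cells_lat5_lon4 = (LAT_RANGE_DEG // 5) * (LON_RANGE_DEG // 4)  # 36 * 90

# Values per product if 100 data elements are stored per grid cell.
elements_per_product = 100
values_per_product = cells_lat4_lon5 * elements_per_product

# Table check: 45 files at the quoted upper bound of 500 Kbytes each,
# assuming 1 MByte = 1000 Kbytes.
files = 45
max_kbytes_per_file = 500
total_mbytes = files * max_kbytes_per_file / 1000

print(cells_lat4_lon5, cells_lat5_lon4, values_per_product, total_mbytes)
```

Both grid orientations give 3240 cells, which matches the count quoted above. The 22.5 MBytes in the table corresponds to 45 files at the 500 Kbyte upper bound.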
The mission will generate other products as well, and some of those will be in HDF. The mission does not intend to retain most of those products or distribute them.
9. How many users do you have or expect to have for data in HDF5, and what is your expected user community?
Considering the interest in greenhouse gases and how they may contribute to worldwide climate change, our user community could be quite large. We figure that the science community that regularly requests our Level 2 Product will number in the hundreds. Our Level 3 and Level 4 Products could be considerably more popular.