Review-4
NASA Data System
Review of HDF5 operational readiness:
NASA's Earth Science Data Systems Standards Process Group (SPG) is considering HDF5 for adoption as a community standard. This is the second review of HDF5, and it focuses on readiness for operational use. The questions below are provided to guide feedback from data systems, application providers, instrument teams, and others. You only need to answer the questions applicable to you. Please send comments to ese-rfc-007@spg.gsfc.nasa.gov.
- Describe in a sentence or two your overall experience related to HDF5 (e.g., science data provider, science data systems, software tools developer, and science data user, etc).
The TES instrument (on the Aura platform) uses HDF5 and HDF-EOS5 for its Earth Science Data Records. We are the data providers for those records, which we provide at Level 1B and Level 2.
- Do you currently use or plan to use HDF5 in a production setting? What types of applications do you use with HDF5? Is HDF5 applicable to your applications (e.g., Does it work well with the data types and data manipulations in your application?)
We use HDF5 in a production setting. Our applications are scientific programs in atmospheric chemistry, written in C++. HDF5 is certainly applicable to our applications.
- Why do you choose to use HDF5 over other data formats for your applications?
We developed our Earth Science Data Records under the self-imposed Aura Standard. Early in the meetings that established the standard, it was decided to use HDF5, even though it was a relatively new release and incompatible with HDF4. It was believed that HDF5 would become more widespread and that support would be more readily available.
- Have you or your users encountered any difficulty when using some of the data access or visualization tools (e.g., IDL, GrADS, ..) on HDF-5 data files? If you have, please provide a brief description of your experience.
We use IDL extensively; no insurmountable problems have been encountered. We do provide IDL "reader" software to our users, which may hide issues that they would otherwise have encountered.
- Does the performance of HDF5 you have experienced meet your requirements? (e.g., Can it handle the data types in your applications? Does it take a long time to read and write HDF5 files?)
Performance is adequate; we have had no complaints from our users other than about the volume of our products.
- What operational challenges or limitations does HDF5 present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc)
HDF5/HDF-EOS5 certainly does have a learning curve. We had to devote a full-time person to become our expert and develop our HDF5 applications. Operationally, one challenge is the lack of tools for HDF5. The HDF5 difference utility h5diff halts if the number of records in the two files is not the same, rather than behaving as a smart diff tool. Another challenge is the lack of subsetting tools; HDF4 subsetting tools have been developed by various users, but equivalents are not widely available for HDF5. We are developing some of our own tools.
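To make the "smart diff" request above concrete, here is a minimal sketch of the behavior we have in mind: compare the records the two inputs have in common and report the count mismatch instead of halting. It is illustrative only. The function name `tolerant_diff` and its `tol` parameter are hypothetical, and the sketch works on plain Python sequences of numbers that are assumed to have already been read from the two files; it is not part of h5diff or of any HDF5 library.

```python
def tolerant_diff(records_a, records_b, tol=0.0):
    """Compare two numeric record sequences that may differ in length.

    Hypothetical helper: the inputs are assumed to be values already
    read from two files. Only the overlapping records are compared;
    extra trailing records are counted rather than treated as an error.
    """
    common = min(len(records_a), len(records_b))
    # Indices of overlapping records whose values differ by more than tol.
    mismatched = [
        i for i in range(common)
        if abs(records_a[i] - records_b[i]) > tol
    ]
    return {
        "count_a": len(records_a),
        "count_b": len(records_b),
        "mismatched_indices": mismatched,
        "unmatched_records": abs(len(records_a) - len(records_b)),
    }


report = tolerant_diff([1.0, 2.0, 3.0], [1.0, 2.5])
print(report)
```

In this example the two inputs have three and two records; the sketch flags the second record as different and reports one unmatched record, which is the kind of summary we would want from a diff tool.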
- What benefits does HDF5 present? Do the benefits of HDF5 outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)
I believe the benefits outweigh the disadvantages, although there certainly are other formats out there. The incompatibility between HDF4 and HDF5 is an issue; a user who wants to compare data between the two formats (e.g., between the Aura and Aqua platforms) has to develop two sets of access routines.
- How much data do/will you provide or archive in HDF5? (number of distinct data products or data sets, total data volume, number of files.)
We are a fairly large-volume data producer.
L1B - 4 files per orbit, 16 orbits every two days, roughly 2 GB
L2 - 8 files every two days, roughly 2 GB
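For the "number of files" part of the question, the figures above can be turned into rough file counts. The short calculation below is a sketch, not an official figure: it assumes uninterrupted operation and a 365-day year, and the constant and function names are ours. It does not estimate total volume.

```python
# Figures taken from the answer above.
L1B_FILES_PER_ORBIT = 4
ORBITS_PER_CYCLE = 16      # orbits processed every two days
L2_FILES_PER_CYCLE = 8     # L2 files produced every two days
CYCLE_DAYS = 2


def files_per_cycle():
    """Total L1B + L2 files produced in one two-day cycle."""
    return L1B_FILES_PER_ORBIT * ORBITS_PER_CYCLE + L2_FILES_PER_CYCLE


def files_per_year(days_per_year=365):
    """Rough yearly file count. Assumption: no gaps in operation."""
    return files_per_cycle() * days_per_year // CYCLE_DAYS


print(files_per_cycle())   # 64 L1B files + 8 L2 files
print(files_per_year())
```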
- How many users do you have or expect to have for data in HDF5, and what is your expected user community?
A fairly small community of scientific users in atmospheric chemistry, probably 50-100 individuals.