Review-3
Academic Institution
Review of HDF5 operational readiness:
NASA's Earth Science Data Systems Standards Process Group (SPG) is considering HDF5 for adoption as a community standard. This is the second review of HDF5, focusing on its readiness for operational use. The questions below are provided to guide feedback from data systems, application providers, instrument teams and others. You only need to answer questions applicable to you. Please send comments to spg-rfc-007@lists.nasa.gov.
- Describe in a sentence or two your overall experience related to HDF5 (e.g., science data provider, science data systems, software tools developer, and science data user, etc).
I am a science data user focusing on HDF-EOS. We also provide derived data products. HDF5 is relatively new to me because data sources in HDF5 are still limited.
- Do you currently use or plan to use HDF5 in a production setting? What types of applications do you use with HDF5? Is HDF5 applicable to your applications (e.g., Does it work well with the data types and data manipulations in your application?)
We are not currently using HDF5. As more data sources become available in HDF5-based formats, we plan to migrate to HDF5 for both input and output. The Center for Spatial Information Science and Systems (CSISS) at George Mason University mainly processes data and delivers products through open standards, chiefly Open Geospatial Consortium (OGC) specifications and ISO standards. Most of our products are open source or based on open-source software. Our WCS (Web Coverage Service) was the first to support HDF4-based HDF-EOS. The new version will fully support HDF5-based HDF-EOS. In our current tests, HDF5 works well with our existing and developing data systems.
- Why do you choose to use HDF5 over other data formats for your applications?
HDF5 supports hierarchical data well, which complements WCS 1.0, an early version of the specification for delivering raster data through Web services that supports delivery of only one layer. With this hierarchical file structure and open-source support for handling the data files, our WCS server can easily deliver data in a form more readily usable by Earth scientists. The standard HDF5 format and its compression also reduce the volume of data to be transported over the Web.
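To illustrate the one-layer limitation mentioned above, the following is a minimal sketch of a WCS 1.0.0 GetCoverage request in key-value-pair form, which names a single coverage per request. The server URL, the coverage name `NDVI`, and the format label `HDF-EOS` are hypothetical placeholders, not values from our service; the formats a real server accepts are the ones it advertises in its own coverage description.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getcoverage_url(base_url, coverage, bbox, width, height,
                          crs="EPSG:4326", fmt="HDF-EOS"):
    """Build a WCS 1.0.0 GetCoverage request URL (key-value-pair encoding).

    The request carries one COVERAGE identifier, so one request returns
    one layer. The default format label is an assumption for illustration.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,                      # a single coverage name
        "CRS": crs,                                # reference system of BBOX
        "BBOX": ",".join(str(v) for v in bbox),    # minx,miny,maxx,maxy
        "WIDTH": str(width),                       # output grid size in cells
        "HEIGHT": str(height),
        "FORMAT": fmt,                             # hypothetical format label
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and coverage name, used only to show the request shape.
url = build_getcoverage_url("http://example.org/wcs", "NDVI",
                            (-180, -90, 180, 90), 720, 360)
print(url)
```

Because each request names one coverage, a hierarchical output file is a convenient way to return related layers (for example, data plus quality flags) in a single response.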
- Have you or your users encountered any difficulty when using some of the data access or visualization tools (e.g., IDL, GrADS, etc.) on HDF5 data files? If you have, please provide a brief description of your experience.
No problems reported so far.
- Does the performance of HDF5 you have experienced meet your requirements? (e.g., Can it handle the data types in your applications? Does it take a long time to read and write HDF5 files?)
Yes. The best feature of HDF5 is its support for large files, which is very handy when dealing with large Earth science data sets.
- What operational challenges or limitations does HDF5 present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc)
One problem with HDF is that the library is written in C. Configuring it to work on different systems is quite complex. This problem is especially prominent when the delivery services run on a Java-based Web server. JNI calls into the C library make programs harder to debug and faults harder to handle. The server may stop because of unforeseen failures and memory leaks.
- What benefits does HDF5 present? Do the benefits of HDF5 outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)
HDF5 is our choice. Many new developments help meet the needs of Web-based services. The standard format is surely a great benefit. The XML output of a file's structure is very handy.
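As a rough illustration of why an XML description of the file structure is handy for a Web service, the sketch below walks a group/dataset hierarchy and lists the dataset paths. The sample XML is a simplified, hand-written stand-in with hypothetical element and dataset names; the XML actually produced by the HDF5 tools (for example by `h5dump --xml`) follows its own namespaced schema and carries more attributes.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written stand-in for an XML structure description.
# Element names and dataset names here are hypothetical.
SAMPLE_XML = """
<File>
  <Group Name="Grids">
    <Group Name="NDVI">
      <Dataset Name="data"/>
      <Dataset Name="quality"/>
    </Group>
  </Group>
  <Dataset Name="metadata"/>
</File>
"""


def list_datasets(element, prefix=""):
    """Return the full path of every Dataset found below `element`."""
    paths = []
    for child in element:
        path = prefix + "/" + child.get("Name", "")
        if child.tag == "Dataset":
            paths.append(path)
        elif child.tag == "Group":
            # Descend into nested groups, extending the path prefix.
            paths.extend(list_datasets(child, path))
    return paths


root = ET.fromstring(SAMPLE_XML)
dataset_paths = list_datasets(root)
print(dataset_paths)
```

A server could use such a listing to advertise the available layers without opening the binary file through the C library on every request.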
- How much data do/will you provide or archive in HDF5? (number of distinct data products or data sets, total data volume, number of files.)
We are currently hosting more than 8 terabytes of data, and the amount is growing. As required, all data may be deliverable in HDF5-based HDF-EOS format in the near future.
- How many users do you have or expect to have for data in HDF5, and what is your expected user community?
The user community is mainly in the education sector. Students and research scientists access the data from the server through standard Web services. Users are worldwide. More than a hundred users retrieve data each month. This number is expected to increase as the service expands.