raw2zarr

FAIR open radar data

Motivation

Radars are vital in meteorology: they detect severe weather early and enable timely warnings that save lives and reduce property damage. Beyond forecasting, radar data supports applications such as statistical analysis and climatology, which rely on time-series analysis. A radar volume scan, comprising data collected through multiple cone-like sweeps at various elevation angles, often amounts to several megabytes every 5 to 10 minutes and is usually stored as an individual file. Consequently, national weather radar networks accumulate vast numbers of non-interconnected files over several decades. This presents significant challenges in data storage and availability, particularly when treating radar data as a time-series dataset, and parallels the complexities of managing big data.

Traditionally, radar data storage involves proprietary formats that demand extensive input-output (I/O) operations, leading to prolonged computation times and high hardware requirements. In response, our study introduces a novel data model designed to address these challenges. Leveraging the FM301 hierarchical tree structure, which is based on the Climate and Forecast (CF) Conventions and endorsed by the World Meteorological Organization (WMO), together with Analysis-Ready Cloud-Optimized (ARCO; Abernathey et al. 2021) formats, we developed an open data model to arrange, manage, and store radar data efficiently in cloud-storage buckets. This approach uses a suite of Python libraries, including Xarray (Xarray-Datatree), Xradar, Wradlib, and Zarr, to implement a hierarchical tree-like data model. The model is designed to align with the new open data paradigm, emphasizing the FAIR principles (Findable, Accessible, Interoperable, Reusable).
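
The idea of a hierarchical, time-aware tree can be illustrated with a minimal sketch that uses only the Python standard library. It is not the raw2zarr API and does not use Xarray or Zarr; the `Node` class, the `add_volume` helper, the `mode_a` scan-mode label, and the `sweep_<n>` group names are all hypothetical and chosen only for illustration. The sketch shows how separate volume scans could be folded into one tree in which each sweep group grows along a time axis instead of living in isolated files.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Node:
    """A group in the tree: named children plus, for sweep leaves, a time axis."""
    name: str
    children: dict = field(default_factory=dict)
    times: list = field(default_factory=list)

    def child(self, name):
        # Create the child group on first access, then reuse it.
        return self.children.setdefault(name, Node(name))

    def paths(self, prefix=""):
        # Yield the path of every group below this node, depth first.
        for name in sorted(self.children):
            path = f"{prefix}/{name}"
            yield path
            yield from self.children[name].paths(path)


def add_volume(root, scan_mode, timestamp, n_sweeps):
    """Append one volume scan: each sweep group gains one entry along time."""
    mode = root.child(scan_mode)
    for i in range(n_sweeps):
        mode.child(f"sweep_{i}").times.append(timestamp)


root = Node("radar")
# Three hypothetical volume scans, 5 minutes apart, with 2 sweeps each.
for minute in (0, 5, 10):
    add_volume(root, "mode_a", datetime(2024, 1, 1, 0, minute), n_sweeps=2)

print(list(root.paths()))
print(len(root.children["mode_a"].children["sweep_0"].times))
```

In this toy layout, three input volumes produce a single tree with two sweep groups, each holding three time steps. A real implementation would store array data in each group and would presumably rely on the libraries named above rather than on plain Python objects.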

Authors

Alfonso Ladino-Rincon, Max Grover

Collaborators

[!WARNING] This project is under active development. Features may change frequently, and some parts of the library may be incomplete. Please proceed with caution.

Running on Your Own Machine

To run this material locally on your computer, follow this workflow:

Clone the “raw2zarr” repository

```
git clone https://github.com/aladinor/raw2zarr.git
```

Move into the raw2zarr directory
```
cd raw2zarr
```
Create and activate your conda environment from the environment.yml file
```
conda env create -f environment.yml
conda activate raw2zarr
```
Move into the notebooks directory and start JupyterLab
```
cd notebooks/
jupyter lab
```

References

R. P. Abernathey et al., “Cloud-Native Repositories for Big Scientific Data,” in Computing in Science & Engineering, vol. 23, no. 2, pp. 26-35, 1 March-April 2021, doi: 10.1109/MCSE.2021.3059437.
