R&D grants
Development of compact storage for analysis-ready geospatial data platforms
Reference:
PID2024-155657OB-C22
Funding institution:
Ministerio de Ciencia, Innovación e Universidades. 2024 call, Generación de Conocimiento. Co-funded by the EU
Budget amount:
104,700
Euros
Duration:
Sep 1, 2025 to Aug 31, 2028
Main researchers:
Involved researchers:
Description:
In the last decade, the proliferation of different types of sensors has generated a huge amount of valuable data that needs to be stored. Geospatial data platforms in general, and Analysis Ready Data (ARD) platforms in particular, may receive data that are generated continuously, such as atmospheric data, satellite observations, and measurements of different parameters from sensors in plantations. Storing these data poses serious problems, and the easy solution is to delete valuable past data to make room for new data. The alternative is to resort to compression. Classic compression has a very serious drawback: a compressed data file must be completely decompressed before it can be processed. This is very expensive in terms of time and computation, which clearly goes against the principles of an ARD platform.
Recently, compression systems have been developed that allow any piece of data in a compressed file to be decompressed individually, without having to decompress the rest. These systems are called Compact Data Structures (CDS). They allow data to be kept permanently compressed, because the data can be accessed and queried in their compact form. Furthermore, CDS enable a type of computing known as In-Memory Data management (IMD), in which data are always held in main memory, thus avoiding transfers between memory and disk. This speeds up data access by several orders of magnitude and therefore significantly reduces data processing time.
This subproject aims to provide a large geospatial data platform, and more specifically an ARD platform, with a new data storage scheme following the CDS paradigm. To this end, we will develop advances in the compressed storage of raster data and time series of raster data.
In addition, this coordinated project will explore the use of Discrete Global Grid Systems (DGGS) as the base elements for representing the Earth's surface, which goes one step beyond the traditional raster and for which there are still no compact data structures. Retaining as much historical data as possible is key to making analyses and predictions, and by developing CDS for these data we will provide geospatial data platforms with significantly greater storage capacity. At the same time, thanks to an in-memory computing strategy, heavy raster data analyses will also be sped up. In summary, the main goals of this subproject are to develop CDS, and query algorithms over them, that efficiently store, compress and index large collections of geospatial data and allow them to be processed and queried efficiently in compressed form, in different scenarios: large collections of raster geodata, including DGGS geodata, and time series of geodata.
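To make the idea of querying data in compact form more concrete, the following is a minimal sketch in Python of a k²-tree, a well-known compact data structure for sparse binary matrices. The description above does not name the structures the project will use, so this choice is only an illustration. The class name `K2Tree`, its `get` method, and the sample grid are hypothetical, and the sketch assumes a square binary raster whose side is a power of two (at least 2). A plain prefix-sum list stands in for the succinct rank structure that a real implementation would use.

```python
from collections import deque
from itertools import accumulate


class K2Tree:
    """Illustrative k^2-tree (k = 2) over a square binary raster.

    Assumes the side length is a power of two and at least 2.
    """

    def __init__(self, grid):
        self.n = len(grid)
        self.bits = []
        # Breadth-first traversal: each queue entry is a non-empty
        # submatrix (top row, left column, side) still to be split.
        queue = deque([(0, 0, self.n)])
        while queue:
            r0, c0, size = queue.popleft()
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    r, c = r0 + dr, c0 + dc
                    nonempty = any(
                        grid[i][j]
                        for i in range(r, r + half)
                        for j in range(c, c + half)
                    )
                    self.bits.append(1 if nonempty else 0)
                    # Empty quadrants are never expanded: this is where
                    # the space saving comes from.
                    if nonempty and half > 1:
                        queue.append((r, c, half))
        # rank[p] = number of 1 bits in bits[0..p]; a simple stand-in
        # for a succinct rank structure.
        self.rank = list(accumulate(self.bits))

    def get(self, row, col):
        """Return the value of one cell without rebuilding the raster."""
        size, start = self.n, 0
        while True:
            size //= 2
            pos = start + (row // size) * 2 + (col // size)
            if self.bits[pos] == 0:
                return 0  # the whole quadrant is empty
            if size == 1:
                return 1  # reached a single cell that is set
            row, col = row % size, col % size
            # The children of the j-th 1 bit start at position 4 * j.
            start = self.rank[pos] * 4


# Hypothetical 8x8 raster with three cells set.
grid = [[0] * 8 for _ in range(8)]
grid[0][0] = grid[0][1] = grid[7][7] = 1

tree = K2Tree(grid)
print(tree.get(7, 7), tree.get(3, 3), len(tree.bits))  # prints: 1 0 20
```

In this toy case the 64 cells are represented with 20 bits, and each `get` call visits only one bit per tree level, so a single cell is read without decompressing the rest of the raster.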





