R&D grants

Modeling, Discovering, Exploring and Analyzing Environmental Data Lakes

Reference:
PID2022-141027NB-C21
Number of subprojects:
2
Funding body:
Ministerio de Ciencia e Innovación. Convocatoria 2022 Generación de Conocimiento
Budget amount:
103,875 Euros
Duration:
Sep 1, 2023 to Aug 31, 2026
Description:
The EarthDL-UDC project aims to optimize the management of information stored in geospatial data lakes by integrating compact data structures, model-driven development methods, and variability management strategies. The goal is to create more efficient, user-friendly, and cost-effective information systems. The subproject of the UDC Database Laboratory (LBD-UDC) will focus specifically on defining software engineering techniques, based on model-driven development and variability management of product families, that minimize implementation work and streamline the process of building information systems. The subproject will also define compact data structures that enable efficient storage of geospatial information and enhance the performance of information analysis techniques.
Model-driven development (MDD) and variability management are techniques that reduce the effort of building information systems. MDD describes the system through models that can be easily reused and modified. Variability management handles variations in application functionality by identifying commonalities and differences and creating reusable abstractions. Using these techniques can lead to more efficient, flexible, and maintainable visualization applications and to better user satisfaction. The first objective of the project is to use MDD and variability management techniques to create visualization applications for geospatial data lakes. With MDD, the project aims to create a set of models from which the code for the visualization apps can be generated, which can help ensure consistency and reduce errors in the development process. With variability management, the project aims to create visualization apps that support different variants of the visualization, including different types of graphics such as heat maps, choropleth maps, 3D maps, histograms, line charts, scatter plots, and others. This will give users a variety of ways to visualize the geospatial data stored in the data lake, helping them explore it and identify patterns and trends.
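As a rough illustration of how a model and a feature selection could drive code generation, the following minimal Python sketch validates a product configuration against a small feature model and then generates one rendering function per selected chart type. Everything in it is hypothetical and not taken from the project: the feature names, the geometry constraint, and the `ProductConfig`, `validate`, and `generate_module` helpers are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical feature model: the chart types mentioned above as selectable features.
CHART_FEATURES = {
    "heat_map", "choropleth_map", "map_3d",
    "histogram", "line_chart", "scatter_plot",
}
# Hypothetical cross-tree constraint: map-based charts need a geometry column.
REQUIRES_GEOMETRY = {"heat_map", "choropleth_map", "map_3d"}


@dataclass
class ProductConfig:
    """One product of the family: a name plus the selected features."""
    name: str
    charts: set
    has_geometry: bool = True


def validate(config):
    """Return a list of constraint violations (empty if the product is valid)."""
    errors = []
    unknown = config.charts - CHART_FEATURES
    if unknown:
        errors.append(f"unknown features: {sorted(unknown)}")
    if not config.charts:
        errors.append("at least one chart type must be selected")
    if not config.has_geometry:
        blocked = config.charts & REQUIRES_GEOMETRY
        if blocked:
            errors.append(f"features need geometry: {sorted(blocked)}")
    return errors


def generate_module(config):
    """Generate Python source with one stub renderer per selected chart."""
    errors = validate(config)
    if errors:
        raise ValueError("; ".join(errors))
    lines = [f"# Generated for product '{config.name}'"]
    for chart in sorted(config.charts):
        lines.append(f"def render_{chart}(rows):")
        lines.append(f"    return {{'chart': '{chart}', 'rows': rows}}")
    return "\n".join(lines)


if __name__ == "__main__":
    config = ProductConfig("air_quality", {"heat_map", "histogram"})
    print(generate_module(config))
```

The common parts (validation, the generator) are written once, and each product differs only in its feature selection, which is the kind of reuse the two techniques are meant to provide.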
Compact data structures are specialized data structures that are designed to minimize the amount of memory and disk space needed to store data, while still allowing for efficient access and manipulation of that data. Even though many compact data structures have been defined, they have not been widely used for analysis in geospatial data lakes. This may be due to a lack of understanding of the specific advantages and limitations of these data structures, as well as the challenges involved in integrating them into existing data lake architectures. Additionally, geospatial data analysis often requires complex spatial queries and operations, which may not be well-suited to certain compact data structures. The second objective of the project is to research and evaluate the use of different types of compact data structures for efficient storage and analysis of geospatial data in a data lake. We plan to compare the performance of these compact data structures to traditional data structures to determine their effectiveness, and if the results are promising, integrate the chosen compact data structure into the data lake for analysis and visualization.
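To make the idea of a compact data structure concrete, here is a minimal Python sketch of one classic building block: a bitmap with rank support, used to store a sparse raster without materializing its empty cells. It is an illustrative assumption, not one of the structures the project will evaluate; the `RankBitmap` and `SparseRaster` names are hypothetical, and a real implementation would use a low-level language and packed machine words to obtain the actual space savings.

```python
class RankBitmap:
    """Bitmap with rank support: counts 1 bits before a position
    using one precomputed counter per 64-bit block."""

    BLOCK = 64

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []
        self.prefix = [0]  # prefix[w] = number of 1 bits before word w
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
            self.prefix.append(self.prefix[-1] + bin(word).count("1"))

    def access(self, i):
        """Return the bit at position i."""
        return (self.words[i // self.BLOCK] >> (i % self.BLOCK)) & 1

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        w, off = divmod(i, self.BLOCK)
        if off == 0:
            return self.prefix[w]
        mask = (1 << off) - 1
        return self.prefix[w] + bin(self.words[w] & mask).count("1")


class SparseRaster:
    """Raster that stores only non-empty cells; the bitmap marks which
    cells hold a value and rank1 maps a cell to its slot in `values`."""

    def __init__(self, grid, nodata=None):
        self.width = len(grid[0])
        self.nodata = nodata
        flat = [v for row in grid for v in row]
        self.mask = RankBitmap([v != nodata for v in flat])
        self.values = [v for v in flat if v != nodata]

    def get(self, row, col):
        i = row * self.width + col
        if not self.mask.access(i):
            return self.nodata
        return self.values[self.mask.rank1(i)]


if __name__ == "__main__":
    raster = SparseRaster([[None, 3, None], [None, None, 7]])
    print(raster.get(0, 1), raster.get(1, 2), raster.get(0, 0))
```

The point of the sketch is that cell lookups work directly on the compressed representation: the raster is never expanded back to a full grid, which is the property that makes such structures attractive for analysis as well as storage.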