EUXDAT will deploy a state-of-the-art data exploitation platform, combining big data with hybrid HPC/Cloud computing, on top of the partners' existing infrastructures. This EUXDAT e-Infrastructure will enable users with different profiles (agricultural scientists and practitioners, planners, decision makers) to benefit fully from the underlying high processing capacity. They will be able to explore new methods, build innovative services, and perform predictions and simulations on extremely large and heterogeneous datasets.
The EUXDAT objectives include:
- Objective 1: Develop a set of tools for managing extremely large datasets, taking into account storage requirements, different formats and management policies for reducing data movement latency and protecting the information.
- Objective 1.1: Analyse how the pilot applications use and store the different kinds of data (data streams, array databases, hyperspectral images, etc.), identifying the main bottlenecks, the required metadata and ways to keep track of data provenance;
- Objective 1.2: Provide a specific tool for managing the storage and movement of large datasets from their source to target e-Infrastructures based on HPC and/or Cloud resources. The tool will connect a wide variety of data sources through a plugin-like architecture and expose a common API for object-oriented data, databases and data streams;
- Objective 1.3: Define adaptable data management policies, integrated into the management tools, that take into account constraints and other non-functional requirements (data availability, security, locality, complexity).
- Objective 2: Adapt and evolve, as required, existing data processing tools by adding new features so that they can be provisioned as Large Data Analytics-as-a-Service. The main changes will focus on exploiting HPC capabilities with the new data management tools, improving the users' portal and adapting resource management.
- Objective 2.1: Perform joint research on a hybrid solution in which part of the tasks runs on the Cloud and the rest runs on HPC, as a way to optimise execution;
- Objective 2.2: EUXDAT aims to improve the usage of the infrastructure with a complete portal for stakeholders, which will not only facilitate application development and requests for improvements but also keep track of how applications are executed;
- Objective 2.3: Adapt and expose data processing tools so that their parallel execution scales up appropriately as datasets grow. These tools are the SparkinData engine and a parallel CEP engine, together with array databases.
- Objective 3: Carry out service activities based on an integrated e-Infrastructure, in which three data-intensive pilots from the Sustainable Development domain will validate the proposed solutions.
- Objective 3.1: EUXDAT will give scientists access to resources from different data sources (including e-Infrastructures such as Copernicus, meteorological and climate data, hyperspectral data from UAVs, field sensors and other open data sources) for implementing their pilots, as a way to validate the proposed solutions;
- Objective 3.2: The proposed pilots will exploit the full potential of the e-Infrastructure, covering topics related to land monitoring and sustainable management, energy efficiency in farms and 3D farming for soil protection;
- Objective 3.3: A public instance of the e-Infrastructure will be made available for a period of time so that stakeholders can experiment with the provided features.
- Objective 4: Carry out significant networking activities, especially in the Sustainable Development domain, to encourage adoption of the proposed tools by a wider European community.
- Objective 4.1: Build a network of organisations, scientists, industries and citizens who will cooperate on JRAs through P4A and CoO;
- Objective 4.2: Support the long-term sustainability of the platform by encouraging a critical mass of stakeholders to use the EUXDAT e-Infrastructure through several events and the open instance, in collaboration with computing providers such as PRACE.
The overall EUXDAT architecture is depicted in the image below.