Work Package 1: Federated Storage with a multi-site common directory structure
- Task Area 1: develop a multi-site dCache administrative instance
Within this task area, the partners will develop and penetration-test cryptographically secure data channels for the administrative data exchange between dCache instances. This is important for extending dCache systems beyond institute borders, a scenario not yet foreseen by the dCache developers. It will require techniques like ad-hoc VPNs, since satellite instances need to be deployable at any opportunistic or semi-permanent site. Examples could be userspace overlay VPN networks, as demonstrated by projects in the USA.
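As an illustration only, a mutually authenticated TLS channel is one way such a secure administrative link could be realized; the following is a minimal sketch using Python's standard ssl module, in which the certificate paths, hub hostname, and port are hypothetical placeholders, not project-defined endpoints.

```python
import socket
import ssl

HUB_HOST = "admin-hub.example.org"  # hypothetical central hub endpoint
HUB_PORT = 8443                     # hypothetical administrative port

# Client context for a satellite instance: verify the hub's certificate
# and present our own, so both ends are cryptographically authenticated.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                     cafile="/etc/fidium/ca.pem")
context.load_cert_chain(certfile="/etc/fidium/satellite.pem",
                        keyfile="/etc/fidium/satellite.key")
context.minimum_version = ssl.TLSVersion.TLSv1_3  # modern ciphers only

with socket.create_connection((HUB_HOST, HUB_PORT)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=HUB_HOST) as tls_sock:
        # Exchange administrative messages over the encrypted channel.
        tls_sock.sendall(b"PING\n")
        reply = tls_sock.recv(1024)
        print("hub replied:", reply.decode())
```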
DESY will develop and maintain a central administrative instance, which provides the necessary uniform user identification and authorization using established mechanisms like X.509 certificates or OAuth2 tokens (in the WLCG flavor). DESY will also adapt the staging workflows already in use to fit the needs of the remote caches. DESY and Wuppertal will develop, test, and deploy a scheme to create a uniform namespace spanning institute boundaries.
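To make the token-based variant concrete, the sketch below validates a WLCG-profile OAuth2 bearer token and checks its storage scopes before granting read access. It assumes the third-party PyJWT library; the issuer URL and the key handling are hypothetical, while the scope syntax follows the WLCG Common JWT Profile.

```python
import jwt  # PyJWT (third-party), assumed available

ISSUER = "https://iam.example.org/"           # hypothetical token issuer
AUDIENCE = "https://wlcg.cern.ch/jwt/v1/any"  # generic WLCG audience

def authorize_read(token: str, public_key: str, path: str) -> bool:
    """Return True if the token grants WLCG storage.read access to `path`."""
    claims = jwt.decode(token, public_key, algorithms=["RS256"],
                        audience=AUDIENCE, issuer=ISSUER)
    # The WLCG profile encodes capabilities as space-separated scopes,
    # e.g. "storage.read:/atlas storage.create:/atlas/user".
    for scope in claims.get("scope", "").split():
        if scope.startswith("storage.read:"):
            prefix = scope.split(":", 1)[1]
            if path.startswith(prefix):
                return True
    return False
```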
Once the central services have been developed, first prototypes of satellite site setups will be created. Mainz will provide an overlay for NHR sites to integrate experiment data and will use a two-tier caching approach to minimize the impact of opportunistic jobs on other applications. Experiment data will be distributed over both the backend parallel file system and the internal NVMe SSDs of the compute nodes. To this end, Mainz will extend the ad-hoc parallel file system GekkoFS, co-developed in FIDIUM 1, and optimize it for (mostly) read-only experiment data. Because GekkoFS provides parallel access to data, it ensures that data does not have to be replicated multiple times across worker nodes. In addition, Mainz will develop mechanisms to minimize the impact of remote I/O accesses so that data can be stored on, and accessed from, nodes not running FIDIUM applications.
The unavailability of parts of the cached data will be overcome by falling back to the backend storage. Application output will be stored locally on worker nodes and staged back to the storage backend when the application finishes.
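The two-tier read path described above amounts to a simple lookup cascade. The sketch below, with entirely hypothetical mount points and a placeholder for the site-specific backend transfer, first tries the node-local NVMe tier, then the parallel file system, and only then resorts to the storage backend.

```python
from pathlib import Path

# Hypothetical mount points for the two cache tiers.
NVME_TIER = Path("/nvme/gekkofs")        # node-local ad-hoc file system
PFS_TIER = Path("/lustre/fidium-cache")  # backend parallel file system

def stage_in_from_backend(name: str, dest: Path) -> None:
    """Placeholder for fetching a file from the central storage backend."""
    raise NotImplementedError("site-specific transfer, e.g. via a dCache door")

def open_experiment_file(name: str):
    """Resolve a read-only experiment file through the cache tiers."""
    for tier in (NVME_TIER, PFS_TIER):
        candidate = tier / name
        if candidate.exists():
            return candidate.open("rb")
    # Cache miss on both tiers: fall back to the backend (see text),
    # then serve the freshly staged copy from the parallel file system.
    target = PFS_TIER / name
    stage_in_from_backend(name, target)
    return target.open("rb")
```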
- Task Area 2: federated storage on sites with semi-permanent data
This task area will create the necessary tools and infrastructure to deploy the software products developed in Task Area 1 at semi-permanent sites (DESY, Wuppertal). The use case here is satellite sites that provide compute and storage resources but lack the experience or person power to operate more than the basic storage building blocks. The local storage will be integrated into the main data hub, which manages the namespace for the satellite data, access management, and integration into the wider experimental frameworks. Data are transferred directly to, and stored only at, the satellite site.
The integration into data management tools like Rucio will be performed first at one or more prototype sites with experienced personnel on site. Candidates are Aachen, Wuppertal, or Göttingen, with the central hub at DESY.
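One plausible shape of this integration, shown purely as an assumption since the text does not fix the tooling, is registering a satellite's storage endpoint as a Rucio Storage Element (RSE) via the Rucio Python client; the RSE name, hostname, port, and path prefix below are hypothetical placeholders.

```python
from rucio.client import Client  # Rucio Python client, assumed installed and configured

client = Client()

# Register the satellite site as a new RSE (name is hypothetical).
client.add_rse("WUPPERTAL_SATELLITE_CACHE")

# Attach a WebDAV door so Rucio can schedule transfers to the site;
# hostname, port, and prefix are placeholders for the real endpoint.
client.add_protocol("WUPPERTAL_SATELLITE_CACHE", {
    "scheme": "davs",
    "hostname": "dcache-door.satellite.example.org",
    "port": 2880,
    "prefix": "/pnfs/satellite/fidium/",
    "impl": "rucio.rse.protocols.gfal.Default",
    "domains": {
        "lan": {"read": 1, "write": 1, "delete": 1},
        "wan": {"read": 1, "write": 1, "delete": 1},
    },
})
```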
Adoption at other sites will require proper packaging. Because of differing limitations and prerequisites at the various locations, a multitude of packages will be required. Solutions will range from simple RPM packages to complex Helm charts for Kubernetes.
Possible customers are the Tier-3 clusters at the universities or similarly small sites offering storage resources to the scientific communities.
Independently of the packaging, this will lead to a centrally managed solution that streamlines local cluster administration and reduces the need for person power.
- Task Area 3: integration of opportunistic sites
In contrast to Task Area 2, this task area covers use cases where the satellite site offers opportunistic storage only. This means that only a local volatile cache is available to reduce unnecessary data transfers to HPC or cloud storage sites. This task area will also develop and provide methods to transfer data on demand to these satellites from a central hub, either by the scientific application itself or in advance by the analysis pipelines defined by the communities.
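A minimal sketch of the pipeline-driven variant, assuming an HTTP-accessible hub door and a hypothetical dataset manifest, could pre-place files in the volatile cache before the jobs start; the requests library, URLs, and paths are all assumptions rather than project-defined interfaces.

```python
from pathlib import Path

import requests  # third-party HTTP client, assumed available

HUB_URL = "https://hub.example.org/fidium"   # hypothetical central hub door
CACHE_DIR = Path("/scratch/volatile-cache")  # hypothetical local volatile cache

def prefetch(manifest: list[str], token: str) -> None:
    """Pre-place the files listed in a pipeline manifest into the local cache."""
    headers = {"Authorization": f"Bearer {token}"}
    for name in manifest:
        dest = CACHE_DIR / name
        if dest.exists():  # already cached, nothing to transfer
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        with requests.get(f"{HUB_URL}/{name}", headers=headers,
                          stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with dest.open("wb") as out:
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    out.write(chunk)
```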
For the scientific communities, these caches look and feel like sites with very small disks in front of classical tape, but with relatively low latency. This would also allow the integration of tape resources from sites whose storage cannot otherwise be integrated with the scientific communities' frameworks.
- Task Area 4: monitoring and self-healing
The focus of Task Area 4 is the reliability of operations, achieved by providing dependable monitoring for the components developed within Task Areas 1, 2, and 3. This includes, in particular, monitoring that facilitates the work of the satellite sites (Göttingen) and the development of self-healing storage using AI techniques for the central hub site and, to a lesser degree, for the satellite sites.
A federated storage system requires complex technical solutions and dedicated monitoring due to the involvement of multiple parties with varying focuses. Central operations will largely rely on existing solutions, with extensions for satellite cache pools. However, operators of satellite cache pools need tailored monitoring for their local systems, integrating data from both local pools and the central dCache site.
Performance monitoring evaluates satellite cache efficiency by tracking network usage, cache hit rates, and file access frequency. This aids operators in optimizing cache usage and addressing delays. After development, it will be implemented during TA2 and TA3 rollouts and adjusted as needed.
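As one possible realization (the text does not prescribe a monitoring stack), the cache metrics named above could be exported for scraping using the third-party prometheus_client library; the metric names and port below are hypothetical.

```python
from prometheus_client import Counter, Gauge, start_http_server

# Hypothetical metrics matching the quantities named in the text.
CACHE_HITS = Counter("satellite_cache_hits_total", "Files served from the local cache")
CACHE_MISSES = Counter("satellite_cache_misses_total", "Files fetched from the central hub")
WAN_BYTES = Counter("satellite_wan_bytes_total", "Bytes transferred over the WAN")
HIT_RATE = Gauge("satellite_cache_hit_rate", "Fraction of accesses served locally")

_hits = 0
_total = 0

def record_access(hit: bool, transferred_bytes: int = 0) -> None:
    """Update the counters for a single file access."""
    global _hits, _total
    _total += 1
    if hit:
        _hits += 1
        CACHE_HITS.inc()
    else:
        CACHE_MISSES.inc()
        WAN_BYTES.inc(transferred_bytes)
    HIT_RATE.set(_hits / _total)

if __name__ == "__main__":
    start_http_server(9100)  # expose the /metrics endpoint for scraping
```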
Availability monitoring provides a quick overview of connections between central and satellite sites over vulnerable wide area networks. It ensures satellite pools are registered, connected, and central dCache doors are accessible. After development, it will be implemented during TA2 and TA3 rollouts and adjusted as needed.
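Similarly, the availability checks could be as simple as periodic TCP probes of the central dCache doors and of each registered satellite pool; the hostnames and ports below are placeholders for illustration only.

```python
import socket

# Hypothetical endpoints: a central dCache door and a satellite pool.
ENDPOINTS = {
    "central-door-webdav": ("dcache-door.example.org", 2880),
    "satellite-pool-wuppertal": ("pool1.satellite.example.org", 1094),
}

def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, (host, port) in ENDPOINTS.items():
    status = "up" if probe(host, port) else "DOWN"
    print(f"{name}: {status}")
```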
Self-healing storage using artificial intelligence is crucial for the distributed federated structure. An initial pilot study [2] evaluated dCache adaptivity, enabling remote administration based on MAPE-K (monitor, analyze, plan, execute over a shared knowledge base).
Future work focuses on enhancing the self-healing capabilities of dCache and introducing self-managing dCache nodes. Through constant monitoring, runtime job (re-)scheduling will be triggered to ensure availability through data migration or duplication. Automated monitoring and interventions will leverage game-theoretic and welfare-based approaches. These will be developed and tested on a specially configured installation, later applied to the production installations, and taken into operation.
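To illustrate the MAPE-K pattern referenced above (the concrete policies are still to be developed), a control loop for a storage pool might look like the following sketch, in which the thresholds, the actions, and the pool interface are all hypothetical.

```python
import time

# Shared knowledge base (hypothetical goals).
KNOWLEDGE = {"min_replicas": 2, "max_fill": 0.9}

def monitor(pool) -> dict:
    """Collect raw observations from the managed storage pool."""
    return {"fill": pool.fill_level(), "replicas": pool.replica_counts()}

def analyze(obs: dict) -> list:
    """Detect violations of the goals recorded in the knowledge base."""
    issues = []
    if obs["fill"] > KNOWLEDGE["max_fill"]:
        issues.append(("migrate", None))
    issues += [("replicate", f) for f, n in obs["replicas"].items()
               if n < KNOWLEDGE["min_replicas"]]
    return issues

def plan(issues: list) -> list:
    """Order interventions; a welfare-based policy could rank them here."""
    return sorted(issues, key=lambda issue: issue[0] != "replicate")

def execute(pool, actions: list) -> None:
    """Apply the planned interventions through the pool's admin interface."""
    for action, arg in actions:
        pool.apply(action, arg)

def mape_k_loop(pool, interval: float = 60.0) -> None:
    """Run the autonomic loop against a (hypothetical) pool object."""
    while True:
        execute(pool, plan(analyze(monitor(pool))))
        time.sleep(interval)
```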