Towards FAIR4ML - A tutorial at KI2025 | A practical approach to documentation and metadata good practices for machine learning models

Co-located with the KI Conference on 16 September 2025, from 16:30 to 18:00, in Potsdam, Germany

Description

Data Science and Artificial Intelligence rely on data and software to create machine learning and deep learning models (abbreviated here as ML models). In 2016, the Findable, Accessible, Interoperable and Reusable (FAIR) principles were introduced with data as their main target. In 2022, a variation for research software (FAIR4RS) was published. FAIRness for machine learning has been discussed in different forums, one of them being the Research Data Alliance FAIR4ML Interest Group (RDA FAIR4ML IG). FAIRness relies heavily on good practices for documentation (human-oriented FAIRness) and metadata (machine-actionable FAIRness). As part of NFDI4DataScience, and together with the RDA FAIR4ML IG, we are creating a metadata layer to describe ML models (FAIR4ML). This 90-minute tutorial will introduce ML model cards (documentation for humans) and FAIR4ML (machine-actionable metadata), and showcase their use in the NFDI4DataScience MLentory, a registry for ML models. The goal is for attendees to become familiar with, and adopt, good practices for ML models, promoting a more reliable and impactful data science and artificial intelligence landscape.

Audience and requirements

This tutorial targets researchers from all domains working in Data Science and Artificial Intelligence, from beginners to experienced researchers, who are interested in improving documentation and metadata practices for ML models. As we will work on good practices to care for and share your own ML models, it would be good to have one at hand (it does not have to be public; it can still be a work in progress). Please bring your laptop.

Agenda

Time	Topic
10’	Welcoming and brief introduction to NFDI4DataScience
30’	Ice-breaker: Lost in a sea of ML models and related research artifacts?
	How do you find datasets, software, ML models related to your research?
	How do you share your own research artifacts?
	What impact do you think sharing your own research artifacts has on Open Science, FAIR, reproducibility, and other *ilities?
5’	Summary of the ice-breaker activity
30’	Presentation: ML model cards and FAIR4ML vocabulary
10’	Demo: MLentory, FAIR4ML in action
5’	Wrap-up

Related tutorials

Ravinder, R., & Castro, L. J. (2025a, February 21). An introduction to Signposting—Easy navigation to FDO layers. [Tutorial] https://doi.org/10.5281/zenodo.14882785
Ravinder, R., & Castro, L. J. (2025b, February 21). An introduction to (webby) FAIR Digital Objects—A machine-actionable approach to FAIR. https://doi.org/10.5281/zenodo.14882665
Castro, L. J. (Director). (2024). Machine-actionable Software Management Plans (SMP)—Leyla Jael Castro @AIKG-SD 2024 [Video recording]. NFDI for Data Science and Artificial Intelligence, Leuphana Universität Lüneburg. https://doi.org/10.5446/69660
Kraft, A., Castro, L. J., & Usbeck, R. (2024, September 3). Research Data Management in Data Science and AI: Avoiding a Replicability Crisis, RDM4AI 2024 [Tutorial]. https://sites.google.com/view/rdm4ai-2024/
Castro, L. J. (2024a, February 7). Adding Bioschemas Dataset and ComputationalTool markup to GitHub pages [Tutorial]. https://doi.org/10.5281/zenodo.10629453
Castro, L. J. (2024b, September 19). Introduction and tutorial on Metadata and SMPs for FAIR research software. [Tutorial]. https://doi.org/10.5281/zenodo.13799879

Co-organizers

Rohitha Ravinder (orcid:0009-0004-4484-6283). Data Scientist working on the evaluation of text-based embedding techniques for document similarity assessment, and an early-stage PhD student working towards combining Knowledge Graph Embeddings and LLMs to improve recommendation systems.
Nelson Quiñones (orcid:0000-0002-5037-0443). Research Software Engineer working on metadata extraction and harmonization for ML models using schema.org-based metadata and developing the frontend interface.
Dietrich Rebholz-Schuhmann (orcid:0000-0002-1018-0370). Scientific director at ZB MED Information Centre for Life Sciences, Cologne, Germany. Prof. D. Rebholz-Schuhmann is a medical doctor and a computer scientist. His research is positioned in semantic technologies in the biomedical domain. In his previous research he has established large-scale on-the-fly biomedical text mining solutions and has contributed to the semantic normalization in the biomedical domain.
Leyla Jael Castro (main facilitator, orcid:0000-0003-3986-0510). Semantic retrieval team leader at ZB MED Information Centre for Life Sciences in Cologne, Germany. She is a Computer Scientist currently working on Life Sciences research related to the semantic web, linked data, ontologies, and data science. She has participated in initiatives related to FAIRness for research software and recommendations for open research software.
