Behavioral Testing of Knowledge Graph Embedding Models for Link Prediction

Wiem Ben Rim, Carolin Lawrence, Kiril Gashteovski, Mathias Niepert, Naoaki Okazaki



We introduce behavioral testing for knowledge graph embedding models.
Knowledge graph embedding (KGE) models are often used to encode knowledge graphs in order to predict new links inside the graph. These models are typically evaluated by computing an averaged accuracy metric on a held-out test set. This approach, however, does not reveal where the models systematically fail or succeed. To address this challenge, we propose a new evaluation framework that builds on the idea of (black-box) behavioral testing, a software engineering principle that enables users to detect system failures before deployment. Behavioral tests let us target and evaluate KGE models on specific capabilities deemed important for a particular use case. To this end, we leverage existing knowledge graph schemas to design behavioral tests for the link prediction task. In an extensive set of experiments, we run and analyze these tests for several KGE models. For example, we find that a model ranked second to last on the original test set actually performs best when tested for a specific capability. Such insights allow users to better choose which KGE model is most suitable for a particular task. The framework is extensible to additional behavioral tests, and we hope to inspire fellow researchers to join us in collaboratively growing it.
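As a rough illustration of the contrast drawn above between a single averaged metric and capability-specific tests, the following minimal sketch computes Hits@k both overall and per capability. It is not the authors' framework: the capability names (`symmetry`, `hierarchy`), the function names, and all ranks are hypothetical values made up for this example.

```python
from collections import defaultdict


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def per_capability_hits(results, k=10):
    """Group (capability, rank) pairs by capability and compute Hits@k per group."""
    groups = defaultdict(list)
    for capability, rank in results:
        groups[capability].append(rank)
    return {cap: hits_at_k(ranks, k) for cap, ranks in groups.items()}


# Hypothetical (capability, rank) pairs for two made-up models.
# Each rank is the position of the correct entity in the model's prediction list.
model_a = [("symmetry", 1), ("symmetry", 3), ("hierarchy", 50), ("hierarchy", 2)]
model_b = [("symmetry", 20), ("symmetry", 15), ("hierarchy", 1), ("hierarchy", 4)]

# Averaged metric over the whole test set: model A looks better.
overall_a = hits_at_k([rank for _, rank in model_a])
overall_b = hits_at_k([rank for _, rank in model_b])

# Per-capability breakdown: model B is better on the "hierarchy" capability.
breakdown_a = per_capability_hits(model_a)
breakdown_b = per_capability_hits(model_b)

print(overall_a, overall_b)
print(breakdown_a)
print(breakdown_b)
```

In this toy data the model with the lower averaged score is the stronger one on one capability, which is the kind of pattern an averaged metric alone would hide.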


Venue: 3rd Conference on Automated Knowledge Base Construction