Network data meets research validity: Can software help?
Speaker: Corinna Coupette
Abstract: Given the ubiquity and importance of network data, the intricate process of transforming real-world phenomena into graphs (i.e., data modeling) has received remarkably little attention. This is, at least in part, due to the prevalent division of labor between domain scientists and method developers: While the former tend to mold their data to suit their favorite network library, the latter like to grab prepackaged datasets off the digital shelves to conveniently validate their novel techniques. In practice, the resulting choices provoke incompatibilities between network data and network methods, threatening the validity of domain-scientific and methodological network research. In this talk, I will discuss how some widely used software inadvertently aggravates the problem, and I will share some ideas on how we could fix it.
References: The content of this talk has developed alongside the following papers:
- Corinna Coupette, Jeremy Wayland, Emily Simons, and Bastian Rieck. No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets. International Conference on Machine Learning (ICML), to appear. https://arxiv.org/abs/2502.02379
- Corinna Coupette, Jilles Vreeken, and Bastian Rieck. All the World’s a (Hyper)Graph: A Data Drama. Digital Scholarship in the Humanities, pp. 74–96. https://arxiv.org/abs/2206.08225 (https://hyperbard.net/)