Evaluation
Measuring Actual Privacy of Obfuscated Queries in Information Retrieval | Francesco Luigi De Faveri, Guglielmo Faggioli and Nicola Ferro |
Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation | Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng and Ana Peleteiro Ramallo |
Context Example Selection For LLM Generated Relevance Assessments | Jack McKechnie, Graham McDonald and Craig Macdonald |
Corpus Subsampling: Estimating the Effectiveness of Neural Retrieval Models on Large Corpora | Maik Fröbe, Andrew Parry, Harrisen Scells, Shuai Wang, Shengyao Zhuang, Guido Zuccon, Martin Potthast and Matthias Hagen |
PEIR: Modeling Performance in Neural Information Retrieval | Pooya Khandel, Andrew Yates, Ana Lucia Varbanescu, Maarten de Rijke and Andy Pimentel |
Towards Reliable Testing for Multiple Information Retrieval System Comparisons | David Otero, Javier Parapar and Alvaro Barreiro |
Domain-specific tasks and specific user groups
The Impact of Mainstream-Driven Algorithms on Recommendations For Children | Robin Ungruh, Alejandro Bellogín and Maria Soledad Pera |
Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval | Jesus Lovon-Melgarejo, Martin Mouysset, Jo Oleiwan, Jose G Moreno, Christine Damase-Michel and Lynda Tamine |
exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem | Sajad Ebrahimi, Sara Salamat, Negar Arabzadeh, Mahdi Bashari and Ebrahim Bagheri |
Leveraging Query Terms for Efficient Legal Document Recommendation | André Rolim, Leandro Marinho, Edleno Moura, Marcos Domingues and Ricardo Oliveira |
Advancing Math Formula Search Using Diverse Structural and Symbolic Representations | Sumedh Vemuganti, Ayu Seiya and Nickvash Kani |
From facts and fairness to adversaries
LIBRA: Measuring Bias of Large Language Model from a Local Context | Bo Pang, Tingrui Qiao, Caroline Walker, Chris Cunningham and Yun Sing Koh |
Opt-in Transparent Fairness for Recommender Systems | Bjørnar Vassøy, Benjamin Kille and Helge Langseth |
Enhancing FEVER-Style Claim Fact-Checking Against Wikipedia: A Diagnostic Taxonomy and Generative Framework | Anton Chernyavskiy, Dmitry Ilvovsky and Preslav Nakov |
News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation | Andreea Iana, Fabian David Schmidt, Goran Glavaš and Heiko Paulheim |
Towards Efficient and Explainable Hate Speech Detection via Model Distillation | Paloma Piot and Javier Parapar |
Enhancing Utility in Differentially Private Recommendation Data Release via Exponential Mechanism | Antonio Ferrara, Angela Di Fazio, Alberto Carlo Maria Mancino, Tommaso Di Noia and Eugenio Di Sciascio |
Graphs & RAG
Graph Representation of Tables+Text and Compact Subgraph Retrieval for QA Tasks | Vishwajeet Kumar, Jaydeep Sen, Bhawna Chelani and Soumen Chakrabarti |
Higher Order Knowledge Graph Embeddings | Giuseppe Pirrò |
Graph-Convolutional Networks: Named Entity Recognition and Large Language Model Embedding in Document Clustering | Imed Keraghel and Mohamed Nadif |
Leveraging Retrieval-Augmented Generation for Keyphrase Synonym Suggestion | Jorge Gabín and Javier Parapar |
Is Relevance ‘Lost in Transmission’ from Retriever to Generator? | Fangzheng Tian, Debasis Ganguly and Craig Macdonald |
Recommenders
Repeat-bias-aware Optimization of Beyond-accuracy Metrics for Next Basket Recommendation | Yuanna Liu, Ming Li, Mohammad Aliannejadi and Maarten de Rijke |
CountNet: Utilising Repetition Counts in Sequential Recommendation | Aleksandr V. Petrov, Efi Karra Taniskidou and Sean Murphy |
Feature Attribution Explanations of Session-based Recommendations | Simone Borg Bruun, Maria Maistro and Christina Lioma |
Embedding Cultural Diversity in Prototype-based Recommender Systems | Armin Moradi, Nicola Neophytou, Florian Carichon and Golnoosh Farnadi |
Town Mice versus Country Mice: Urban Bias in Job Recommender Systems | Roan Schellingerhout, Francesco Barile and Nava Tintarev |
LLM is Knowledge Graph Reasoner: LLM’s Intuition-aware Knowledge Graph Reasoning for Cold-start Sequential Recommendation | Keigo Sakurai, Ren Togo, Takahiro Ogawa and Miki Haseyama |
Conversational and Robust IR
Zero-Shot and Efficient Clarification Need Prediction in Conversational Search | Lili Lu, Chuan Meng, Federico Ravenda, Mohammad Aliannejadi and Fabio Crestani |
Improving the Re-Usability of Conversational Search Test Collections | Zahra Abbasiantaeb, Chuan Meng, Leif Azzopardi and Mohammad Aliannejadi |
Malevolence Attacks Against Pretrained Dialogue Models | Pengjie Ren, Ruiqi Li, Zhaochun Ren, Zhumin Chen, Maarten de Rijke and Yangjun Zhang |
mFollowIR: a Multilingual Benchmark for Instruction Following in Information Retrieval | Orion Weller, Benjamin Chang, Eugene Yang, Mahsa Yarmohammadi, Sam Barham, Sean MacAvaney, Arman Cohan, Luca Soldaini, Benjamin Van Durme and Dawn Lawrie |
Query Performance Prediction using Dimension Importance Estimators | Guglielmo Faggioli, Nicola Ferro, Raffaele Perego and Nicola Tonellotto |
On the Robustness of Generative Information Retrieval Models: An Out-of-Distribution Perspective | Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Changjiang Zhou, Maarten de Rijke and Xueqi Cheng |
About rankers and rerankers
Guiding Retrieval using Large Language Models | Mandeep Rathee, Sean MacAvaney and Avishek Anand |
Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders | Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast and Matthias Hagen |
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models | Xinyu Zhang, Sebastian Hofstätter, Patrick Lewis, Raphael Tang and Jimmy Lin |
An Investigation of Prompt Variations for Zero-shot LLM-based Rankers | Shuoqi Sun, Shengyao Zhuang, Shuai Wang and Guido Zuccon |
Can Large Language Models Effectively Rerank News Articles for Background Linking? | Marwa Essam and Tamer Elsayed |
One size doesn’t fit all: Predicting the Number of Examples for In-Context Learning | Manish Chandra, Debasis Ganguly and Iadh Ounis |
Across modalities and languages
A Multi-modal Recipe for Improved Multi-domain Recommendation | Zixuan Yi and Iadh Ounis |
Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline | Nicola Messina, Lucia Vadicamo, Leo Maltese and Claudio Gennaro |
Patent Figure Classification using Large Vision-language Models | Sushil Awale, Eric Müller-Budack and Ralph Ewerth |
MVAM: Multi-View Attention Method for Fine-grained Image-Text Matching | Wanqing Cui, Rui Cheng, Jiafeng Guo and Xueqi Cheng |
Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval | Giacomo Pacini, Fabio Carrara, Nicola Messina, Nicola Tonellotto, Giuseppe Amato and Fabrizio Falchi |
Visual Latent Captioning – Towards Verbalizing Vision Transformer Encoders | Sogol Haghighat, Tim Daniel Metzler, Santosh Thoduka and Sebastian Houben |
Efficiency in IR and NLP
MURR: Model Updating with Regularized Replay for Searching a Document Stream | Eugene Yang, Nicola Tonellotto, Dawn Lawrie, Sean MacAvaney, James Mayfield, Douglas Oard and Scott Miller |
Token Pruning Optimization for Efficient Dense Retrieval with Multi-Vector Representations | Shanxiu He, Mutasem Al-Darabsah, Suraj Nair, Jonathan May, Tarun Agarwal, Tao Yang and Choon Hui Teo |
CUP: a Framework for Resource-Efficient Review-Based Recommenders | Ghazaleh Haratinezhad Torbati, Anna Tigunova, Gerhard Weikum and Andrew Yates |
LSTM-based Selective Dense Text Retrieval Guided by Sparse Lexical Retrieval | Yingrui Yang, Parker Carlson, Yifan Qiao, Wentai Xie, Shanxiu He and Tao Yang |
Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval | Aymen Berriche, Mehdi Zakaria Adjal and Riyadh Baghdadi |
Decoding the Hierarchy: A Hybrid Approach to Hierarchical Multi-Label Text Classification | Fatos Torba, Christophe Gravier, Charlotte Laclau, Abderrahmen Kammoun and Julien Subercaze |
Findings
Evaluating Auto-complete Ranking for Diversity and Relevance | Sonali Singh, Sachin Farfade and Prakash Mandayam Comar |
Semantically Proportioned nDCG for Explaining ColBERT’s Learning Process | Ariane Mueller and Craig Macdonald |
Exploring the relationship between listener receptivity and source of music recommendations | John Paul Vargheese, Marianne Wilson, Katherine Stephen, Rachel Salzano and David Brazier |
Uncertainty Estimation in the Real World: A study on Music Emotion Recognition | Karn N Watcharasupat, Yiwei Ding, Aleksandra T Ma, Pavan Seshadri and Alexander Lerch |
Unraveling the Impact of Visual Complexity on Search as Learning | Wolfgang Gritz, Anett Hoppe and Ralph Ewerth |
Semi-supervised image-based narrative extraction: A case study with historical photographic records | Fausto German, Brian Keith, Mauricio Matus, Diego Urrutia and Claudio Meneses |
FrameworkX: A Reusable RAG Framework and Baselines for TrackY | Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell and Jimmy Lin |
Lost but Not Only in the Middle: Positional Bias in Retrieval Augmented Generation | Jan Hutter, David Rau, Maarten Marx and Jaap Kamps |
Evaluating Sequential Recommendations in the Wild: A Case Study on Offline Accuracy, Click Rates, and Consumption | Anastasiia Klimashevskaia, Snorre Alvsvåg, Christoph Trattner, Alain D. Starke, Astrid Tessem and Dietmar Jannach |
Biased PromptORE: Enhancing Relation Extraction in Gendered Languages and Complex Texts – The Case of Spanish Documents from the XVI Century | Héctor López Hidalgo, Michel Boeglin, David Kahn, Josiane Mothe, Diego Ortiz and David Panzoli |
Efficient Session Retrieval Using Topical Index Shards | Gijs Hendriksen, Djoerd Hiemstra and Arjen de Vries |
CLEF & Repro Tracks 1
EXIST 2025: Learning with Disagreement for Sexism Identification and Characterization in Tweets, Memes, and TikTok Videos | Laura Plaza, Jorge Carrillo-De-Albornoz, Iván Arcos, Paolo Rosso, Damiano Spina, Enrique Amigó, Julio Gonzalo and Roser Morante |
Towards Reproducibility of Interactive Retrieval Experiments: Framework and Case Study | Jana Isabelle Friese and Norbert Fuhr |
Combining and Evaluating Query Performance Predictors: A Reproducibility Study | Sourav Saha, Suchana Datta, Dwaipayan Roy, Mandar Mitra and Derek Greene |
LongEval at CLEF 2025: Longitudinal Evaluation of IR Model Performance | Matteo Cancellieri, Alaa El-Ebshihy, Tobias Fink, Petra Galuščáková, Gabriela González Sáez, Lorraine Goeuriot, David Iommi, Jüri Keller, Petr Knoth, Philippe Mulhem, Florina Piroi, David Pride and Philipp Schaer |
On the Reproducibility of: Adapting Learned Sparse Retrieval for Long Documents | Emmanouil Georgios Lionis and Jia-Huei Ju |
Fact vs. Fiction: Are the Reportedly “Magical” LLM-Based Sequential Recommenders Reproducible? | Shirin Tahmasebi, Narjes Nikzad, Amir H. Payberah, Meysam Asgari-Chenaghlu and Mihhail Matskin |
BioASQ at CLEF2025: The thirteenth edition of the large-scale biomedical semantic indexing and question answering challenge | Anastasios Nentidis, Georgios Katsimpras, Anastasia Krithara, Martin Krallinger, Miguel Rodriguez Ortega, Natalia Loukachevitch, Andrey Sakhovskiy, Elena Tutubalina, Grigorios Tsoumakas, George Giannakoulas, Alexandra Bekiaridou, Athanasios Samaras, Giorgio Maria Di Nunzio, Nicola Ferro, Stefano Marchesin, Laura Menotti, Gianmaria Silvello and Georgios Paliouras |
A Reproducibility Study on Consistent LLM Reasoning for Natural Language Inference over Clinical Trials | Artur Guimarães, João Magalhães and Bruno Martins |
eRisk 2025: Contextual and Conversational Approaches for Depression Challenges | Javier Parapar, Anxo Perez, Xi Wang and Fabio Crestani |
LifeCLEF 2025 Teaser: Challenges on Species Presence Prediction and Identification, and Individual Animal Identification | Alexis Joly, Lukáš Picek, Stefan Kahl, Hervé Goëau, Lukas Adam, Christophe Botella, Diego Marcos, Maximilien Servajean, César Leblanc, Theo Larcher, Jiri Matas, Klara Janouskova, Vojtěch Čermák, Kostas Papafitsoros, Robert Planqué, Willem-Pier Vellinga, Holger Klinck, Tom Denton, Pierre Bonnet and Henning Müller |
ImageCLEF 2025: Multimedia Retrieval in Medical, Social Media and Content Recommendation Applications | Bogdan Ionescu, Henning Müller, Dan-Cristian Stanciu, Ahmad Idrissi-Yaghir, Ahmedkhan Radzhabov, Alba García Seco de Herrera, Alexandra Andrei, Andrea Storås, Asma Ben Abacha, Benjamin Bracke, Benjamin Lecouteux, Benno Stein, Cécile Macaire, Christoph Friedrich, Cynthia Sabrina Schmidt, Didier Schwab, Dimitar Dimitrov, Emmanuelle Esperança-Rodier, Gabriel Constantin, Hendrik Damm, Henning Schäfer, Ivan Rodkin, Johannes Kiesel, Johannes Rückert, Liviu-Daniel Stefan, Louise Bloch, Martin Potthast, Maximilian Heinrich, Helmut Becker, Ivan Koychev, Josep Malvehy, Michael Riegler, Mihai Dogariu, Noel Codella, Pål Halvorsen, Preslav Nakov, Raphael Brüngel, Roberto Andres Novoa, Rocktim Jyoti Das, Steven A. Hicks, Sushant Gautam, Tabea M. G. Pakull, Vajira Thambawita, Vassili Kovalev, Wen-Wai Yim and Zhuohan Xie |
CLEF & Repro Tracks 2
CLEF 2025 SimpleText Track: Simplify Scientific Text (and Nothing More) | Liana Ermakova and Jaap Kamps |
CLEF 2025 JOKER Lab: Humour in the Machine | Liana Ermakova, Anne-Gwenn Bosser, Tristan Miller and Ricardo Campos |
QuantumCLEF 2025 – The Second Edition of the Quantum Computing Lab at CLEF | Andrea Pasin, Maurizio Ferrari Dacrema, Paolo Cremonesi, Washington Cunha, Marcos Goncalves and Nicola Ferro |
Overview of PAN 2025: Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection | Janek Bevendorff, Daryna Dementieva, Maik Fröbe, Bela Gipp, André Greiner-Petter, Jussi Karlgren, Maximilian Mayerl, Preslav Nakov, Alexander Panchenko, Martin Potthast, Artem Shelmanov, Efstathios Stamatatos, Benno Stein, Yuxia Wang, Matti Wiegmann and Eva Zangerle |
The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Extraction & Normalization, and Retrieval | Firoj Alam, Julia Maria Struß, Tanmoy Chakraborty, Stefan Dietze, Salim Hafid, Katerina Korre, Arianna Muti, Preslav Nakov, Federico Ruggeri, Sebastian Schellhamm, Vinay Setty, Megha Sundriyal, Konstantin Todorov and Venktesh Viswanathan |
Revisiting Language Models in Neural News Recommender Systems: A Reproducibility Study | Yuyue Zhao, Jin Huang, David Vos and Maarten de Rijke |
Overview of Touché 2025: Argumentation Systems | Johannes Kiesel, Çağrı Çöltekin, Marcel Gohsen, Sebastian Heineking, Maximilian Heinrich, Maik Fröbe, Tim Hagen, Mohammad Aliannejadi, Tomaž Erjavec, Matthias Hagen, Matyáš Kopp, Nikola Ljubešić, Katja Meden, Nailia Mirzakhmedova, Vaidas Morkevičius, Harrisen Scells, Ines Zelch, Martin Potthast and Benno Stein |
Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval | Yongkang Li, Panagiotis Eustratiadis and Evangelos Kanoulas |
TalentCLEF at CLEF2025: Skill and Job Title Intelligence for Human Capital Management | Luis Gasco, Hermenegildo Fabregat, Laura García-Sardiña, Daniel Deniz, Alvaro Rodrigo and Rabih Zbib |
A Reproducibility Study for Joint Information Retrieval and Recommendation in Product Search | Simone Merlo, Guglielmo Faggioli and Nicola Ferro |
Are Representation Disentanglement and Interpretability Linked in Recommendation Models? A Critical Review and Reproducibility Study | Ervin Dervishaj, Tuukka Ruotsalo, Maria Maistro and Christina Lioma |
ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality, 2nd edition | Jussi Karlgren, Ekaterina Artemova, Ondřej Bojar, Timothee Mickus, Vladislav Mikhailov, Magnus Sahlgren, Erik Velldal and Lilja Øvrelid |