Dublin, 1 June 2023: ADAPT Natural Language Processing (NLP) researcher Filip Klubička (Technological University Dublin) recently presented a paper on idiomaticity in vector space at a workshop on Multiword Expressions at the European Chapter of the Association for Computational Linguistics (EACL) conference, which took place 2–6 May 2023. The paper, titled “Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space”, was co-authored by Prof. John Kelleher (Maynooth University) and Vasudevan Nedumpozhimana (Technological University Dublin).
The paper aims to uncover more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. The team repurposed an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and performed a comparative probing study of static (GloVe) and contextual (BERT) embeddings. The experiments indicate that both types of embedding encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm. The authors also identify the limitations of the dataset used and highlight important directions for future work in improving its suitability for a probing analysis.
The full talk can be viewed on YouTube.
Image: An illustrative example of a vector space model (source)