Browsing by Author "Ruas, Terry"


A domain-adaptive pre-training approach for language bias detection in news
(ACM, 2022)
Krieger, Jan-David; Spinde, Timo; Ruas, Terry; Kulshrestha, Juhi; Gipp, Bela
Aspect-based Document Similarity for Research Papers
(2020-10-13)
Ostendorff, Malte; Ruas, Terry; Blume, Till; Gipp, Bela; Rehm, Georg
Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show SciBERT as the best performing system. A qualitative examination validates our quantitative results. Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.
Aspect-based Document Similarity for Research Papers
(2020)
Ostendorff, Malte; Ruas, Terry; Blume, Till; Gipp, Bela; Rehm, Georg; Scott, Donia; Bel, Nuria; Zong, Chengqing
Identifying Machine-Paraphrased Plagiarism
(Springer, 2022)
Wahle, Jan Philip; Ruas, Terry; Foltýnek, Tomáš; Meuschke, Norman; Gipp, Bela; Smits, Malte
Incorporating Word Sense Disambiguation in Neural Language Models
(2021-06-15)
Wahle, Jan Philip; Ruas, Terry; Meuschke, Norman; Gipp, Bela
We present two supervised (pre-)training methods to incorporate gloss definitions from lexical resources into neural language models (LMs). The training improves our models' performance for Word Sense Disambiguation (WSD) but also benefits general language understanding tasks while adding almost no parameters. We evaluate our techniques with seven different neural LMs and find that XLNet is more suitable for WSD than BERT. Our best-performing methods exceed state-of-the-art WSD techniques on the SemCor 3.0 dataset by 0.5% F1 and increase BERT's performance on the GLUE benchmark by 1.1% on average.
Math-word embedding in math search and semantic extraction
(2020)
Greiner-Petter, André; Youssef, Abdou; Ruas, Terry; Miller, Bruce R.; Schubotz, Moritz; Aizawa, Akiko; Gipp, Bela
Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
(2020-03-22)
Ostendorff, Malte; Ruas, Terry; Schubotz, Moritz; Rehm, Georg; Gipp, Bela
Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish which relationship makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT as the best performing system with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task and motivate the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as first steps in the exploration of documents through SPARQL-like queries such that one could find documents that are similar in one aspect but dissimilar in another.
Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
(ACM Digital Library, 2020)
Ostendorff, Malte; Ruas, Terry; Schubotz, Moritz; Rehm, Georg; Gipp, Bela; Huang, Ruhua; Wu, Dan; Marchionini, Gary; He, Daqing; Cunningham, Sally Jo; Hansen, Preben
Specialized document embeddings for aspect-based similarity of research papers
(ACM, 2022)
Ostendorff, Malte; Blume, Till; Ruas, Terry; Gipp, Bela; Rehm, Georg
Why Machines Cannot Learn Mathematics, Yet
(2019-05-20)
Greiner-Petter, André; Ruas, Terry; Schubotz, Moritz; Aizawa, Akiko; Grosky, William; Gipp, Bela
Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in ML, it seems canonical to apply ML techniques to represent and retrieve mathematics semantically. In this work, we apply popular text embedding techniques to the arXiv collection of STEM documents and explore how these are unable to properly understand mathematics from that corpus. In addition, we also investigate the missing aspects that would allow mathematics to be learned by computers.