
Browsing by Author "Ruas, Terry"

  • A domain-adaptive pre-training approach for language bias detection in news
    (ACM, 2022)
    Krieger, Jan-David; Spinde, Timo; Ruas, Terry; Kulshrestha, Juhi; Gipp, Bela
  • Aspect-based Document Similarity for Research Papers
    (2020-10-13)
    Ostendorff, Malte; Ruas, Terry; Blume, Till; Gipp, Bela; Rehm, Georg
    Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show that SciBERT performs best. A qualitative examination validates our quantitative results. Our findings motivate future research on aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.
  • Aspect-based Document Similarity for Research Papers
    (2020)
    Ostendorff, Malte; Ruas, Terry; Blume, Till; Gipp, Bela; Rehm, Georg; Scott, Donia; Bel, Nuria; Zong, Chengqing
  • Identifying Machine-Paraphrased Plagiarism
    (Springer, 2022)
    Wahle, Jan Philip; Ruas, Terry; Foltýnek, Tomáš; Meuschke, Norman; Gipp, Bela; Smits, Malte
  • Incorporating Word Sense Disambiguation in Neural Language Models
    (2021-06-15)
    Wahle, Jan Philip; Ruas, Terry; Meuschke, Norman; Gipp, Bela
    We present two supervised (pre-)training methods to incorporate gloss definitions from lexical resources into neural language models (LMs). The training not only improves our models' performance on Word Sense Disambiguation (WSD) but also benefits general language understanding tasks, while adding almost no parameters. We evaluate our techniques with seven different neural LMs and find that XLNet is more suitable for WSD than BERT. Our best-performing methods exceed state-of-the-art WSD techniques on the SemCor 3.0 dataset by 0.5% F1 and increase BERT's performance on the GLUE benchmark by 1.1% on average.
  • Math-word embedding in math search and semantic extraction
    (2020)
    Greiner-Petter, André; Youssef, Abdou; Ruas, Terry; Miller, Bruce R.; Schubotz, Moritz; Aizawa, Akiko; Gipp, Bela
  • Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
    (2020-03-22)
    Ostendorff, Malte; Ruas, Terry; Schubotz, Moritz; Rehm, Georg; Gipp, Bela
    Many digital libraries recommend literature to their users based on the similarity between a query document and the documents in their repository. However, they often fail to distinguish which relationship makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show that vanilla BERT performs best, with an F1-score of 0.93, and we manually examine its output to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task and motivate the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as first steps in the exploration of documents through SPARQL-like queries, such that one could find documents that are similar in one aspect but dissimilar in another.
  • Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
    (ACM Digital Library, 2020)
    Ostendorff, Malte; Ruas, Terry; Schubotz, Moritz; Rehm, Georg; Gipp, Bela; Huang, Ruhua; Wu, Dan; Marchionini, Gary; He, Daqing; Cunningham, Sally Jo; Hansen, Preben
  • Specialized document embeddings for aspect-based similarity of research papers
    (ACM, 2022)
    Ostendorff, Malte; Blume, Till; Ruas, Terry; Gipp, Bela; Rehm, Georg
  • Why Machines Cannot Learn Mathematics, Yet
    (2019-05-20)
    Greiner-Petter, André; Ruas, Terry; Schubotz, Moritz; Aizawa, Akiko; Grosky, William; Gipp, Bela
    Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed through less accurate and imprecise descriptions, which contributes to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communicate their knowledge in an ambiguous, context-dependent, and non-formal language. Given recent advances in ML, it seems natural to apply ML techniques to represent and retrieve mathematics semantically. In this work, we apply popular text embedding techniques to the arXiv collection of STEM documents and explore why these techniques are unable to properly understand mathematics from that corpus. We also investigate the missing aspects that would allow mathematics to be learned by computers.
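
The aspect-based similarity entries above derive their labels from citation context: the section title in which a citation occurs labels the pair of citing and cited paper. The following is a minimal Python sketch of that labeling step, assuming citations are available as (citing_id, cited_id, section_title) tuples and that a pair cited in several sections receives all of those section labels. The record format, the function names `normalize_section` and `build_aspect_pairs`, and the lower-casing normalization are hypothetical; the papers' actual preprocessing may differ.

```python
from collections import defaultdict


def normalize_section(title: str) -> str:
    """Hypothetical normalization: strip, lower-case, and collapse whitespace."""
    return " ".join(title.strip().lower().split())


def build_aspect_pairs(citations):
    """Map each (citing_id, cited_id) pair to its sorted list of section labels.

    `citations` is an iterable of (citing_id, cited_id, section_title) tuples.
    Assumption: a pair cited in several sections gets several labels (multi-label).
    """
    labels = defaultdict(set)
    for citing_id, cited_id, section_title in citations:
        labels[(citing_id, cited_id)].add(normalize_section(section_title))
    return {pair: sorted(sections) for pair, sections in labels.items()}


if __name__ == "__main__":
    example = [
        ("paperA", "paperB", "Introduction"),
        ("paperA", "paperB", "Related  Work"),
        ("paperA", "paperC", "Experiments"),
    ]
    for pair, aspects in build_aspect_pairs(example).items():
        print(pair, aspects)
```

In practice, free-form section titles would likely need to be mapped onto a fixed label set before training a classifier; any such mapping is a further assumption not shown here.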
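
The abstract of "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles" mentions comparing vector concatenation schemes for feeding two document vectors to a classifier. As an illustration only, the sketch below shows two common ways to combine a pair of document embeddings u and v: plain concatenation [u; v], and the extended form [u; v; |u - v|; u * v] used, for example, in InferSent-style sentence-pair models. The scheme names and the function `combine_pair` are hypothetical; the abstract does not specify which schemes the authors compared.

```python
from typing import List, Sequence


def combine_pair(u: Sequence[float], v: Sequence[float], scheme: str = "concat") -> List[float]:
    """Combine two equal-length document vectors into one classifier input.

    Hypothetical schemes:
      "concat"   -> [u; v]                  (length 2d)
      "extended" -> [u; v; |u - v|; u * v]  (length 4d, element-wise ops)
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    if scheme == "concat":
        return list(u) + list(v)
    if scheme == "extended":
        diff = [abs(a - b) for a, b in zip(u, v)]  # element-wise absolute difference
        prod = [a * b for a, b in zip(u, v)]       # element-wise product
        return list(u) + list(v) + diff + prod
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    u = [1.0, 2.0]
    v = [3.0, 0.5]
    print(combine_pair(u, v))              # [1.0, 2.0, 3.0, 0.5]
    print(combine_pair(u, v, "extended"))  # [1.0, 2.0, 3.0, 0.5, 2.0, 1.5, 3.0, 1.0]
```

One design consideration: the |u - v| and u * v terms are symmetric in u and v, whereas the order of [u; v] is not, which can matter when the relation between two articles is directed.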


Göttingen Research Online

Göttingen Research Online bundles various services for Göttingen researchers:

GRO.data (research data repository)
GRO.plan (data management planning)
GRO.publications (publication data repository)

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 4.0 International license.