Abstract

The correct semantic interpretation of mathematical formulae in electronic mathematical documents is an important prerequisite for advanced tasks such as search, accessibility and computational processing. Especially in advanced maths, the meaning of characters and symbols is highly domain dependent, and only limited information can be gained from considering individual formulae and their structures. Although many approaches have been proposed for the semantic interpretation of mathematical formulae, most rely on the limited semantics available in maths representation languages, and very few use the surrounding context as a source of information. This thesis presents a novel approach for the principled extraction of semantic information about mathematical formulae from their context in documents. We utilised several supervised machine learning (SML) techniques, namely Linear-Chain Conditional Random Fields (CRF), Maximum Entropy (MaxEnt) and Maximum Entropy Markov Models (MEMM), combined with the Rprop- and Rprop+ optimisation algorithms, to detect definitions of simple and compound mathematical expressions, thereby deriving their meaning. The learning algorithms require an annotated corpus, whose development is one of the contributions of this thesis. The corpus was built using a novel approach to extracting the desired maths expressions and sub-formulae, and was manually annotated by two independent annotators, with agreement assessed using a standard inter-annotator agreement measure. The thesis further develops a new approach to feature representation, based on definition templates extracted from maths documents, to overcome the limitations of conventional window-based features. All contributions were evaluated using various techniques, including the common metrics of recall, precision and F-measure (their harmonic mean).

Year of Publication
2018
Academic Department
School of Computer Science, College of Engineering & Physical Sciences
Degree
Ph.D.
Date Published
12/2018
University
University of Birmingham
City
Birmingham
URL
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=B_gi7PMAAAAJ&citation_for_view=B_gi7PMAAAAJ:zYLM7Y9cAGgC

Context classification for improved semantic understanding of mathematical formulae

Citation: Almomen R. Context classification for improved semantic understanding of mathematical formulae. School of Computer Science, College of Engineering & Physical Sciences. 2018;Ph.D. https://scholar.google.com/citations?view_op=view_citation&hl=en&user=B_gi7PMAAAAJ&citation_for_view=B_gi7PMAAAAJ:zYLM7Y9cAGgC.

In: School of Computer Science, College of Engineering & Physical Sciences

Published by: University of Birmingham, 2018