Theses

Toni Kukurin
Klasifikacija tvrdnji i stavova na internetskim raspravama pomoću strojnog učenja
Claim and Stance Classification in Online Discussions Using Machine Learning
2017
Undergraduate
Jan Šnajder
Filip Boltužić
FER
FER2
5323
33
EN
Forumi i razne ostale web-stranice s mogućnošću korisničke interakcije često pružaju vrijedne informacije o širokom spektru zanimljivih tema te nam je iz tog razloga zanimljiva automatska analiza takvih izvora podataka. Polazeći iz te ideje, u ovom radu implementiramo sustav strojnog učenja za klasifikaciju stava i podjelu argumenata prema tipu za kratke online tekstove pisane engleskim jezikom. Koristimo pristup nadziranog učenja nad ručno označenim podacima, parafrazama argumentativnih segmenata korisničkih tekstova, kao i neprerađenim korisničkim tekstovima. Vrednujemo i analiziramo implementirane sustave, izlažemo naše zaključke o prikupljenim rezultatima te pružamo smjernice za daljnji rad unutar dane domene.
Online forums and various other types of question-answering websites often provide valuable information on a wide array of topics, which makes automatic analysis of these sources an interesting research task. Motivated by this, in this thesis we devise and implement a machine learning model for classifying stance and argument type in short online user comments written in English. We employ a supervised learning approach on manually extracted argumentative text segments and their paraphrases, as well as on the original user texts. We evaluate and analyze our models, draw conclusions from the obtained results, and offer suggestions for future work in this domain.
prirodna obrada jezika, strojno učenje, analiza argumentacije, klasifikacija stavova, klasifikacija tvrdnji
natural language processing, machine learning, argumentation mining, stance detection, claim classification
6.7.2017.
Online user comments are a valuable source of information for the analysis of opinions on events and their protagonists, political decisions, political subjects, ideological issues, contested topics, etc. Stance classification is a new research area at the intersection of sentiment analysis and argumentation mining, concerned with the automatic detection of stances expressed in text. The task is especially challenging in the context of user comments on the Internet, due to the informality and brevity of the texts. The topic of this thesis is the application of machine learning to claim and stance classification in online user comments in English. Do a literature study on short text classification, with an emphasis on stance classification. Devise and implement a model for classifying claims by their type within informal argumentation (fact, value, policy) and for classifying claim stance with respect to a given topic (for, against, neutral). For model development and testing, use the dataset from (Hasan and Ng, 2014), and carry out an additional annotation round at the level of individual claims. Carry out a detailed evaluation of the model, a comparison against a baseline, a statistical analysis of the results, and an error analysis. All references must be cited, and all source code, documentation, executables, and datasets must be provided with the thesis.