Theses

Student

Toni Kukurin

Title HR

Klasifikacija tvrdnji i stavova na internetskim raspravama pomoću strojnog učenja

Title EN

Claim and Stance Classification in Online Discussions Using Machine Learning

Year

2017

Level

Undergraduate

Supervisor

Jan Šnajder

Co-supervisor

Hands-on assistant

Filip Boltužić

Study Programme

FER

Programme

FER2

Thesis ID

5323

Number of pages

Language

Abstract HR

Forumi i razne ostale web-stranice s mogućnošću korisničke interakcije često pružaju vrijedne informacije o širokom spektru zanimljivih tema te nam je iz tog razloga zanimljiva automatska analiza takvih izvora podataka. Polazeći iz te ideje, u ovom radu implementiramo sustav strojnog učenja za klasifikaciju stava i podjelu argumenata prema tipu za kratke online tekstove pisane engleskim jezikom. Koristimo pristup nadziranog učenja nad ručno označenim podacima, parafrazama argumentativnih segmenata korisničkih tekstova, kao i neprerađenim korisničkim tekstovima. Vrednujemo i analiziramo implementirane sustave, izlažemo naše zaključke o prikupljenim rezultatima te pružamo smjernice za daljnji rad unutar dane domene.

Abstract EN

Online forums and other question-answering websites often provide valuable information on a wide array of topics, which makes automatic analysis of these sources an interesting research task. Motivated by this, in this thesis we devise and implement a machine learning model for classifying the stance and type of claims in short online user comments written in English. We use a supervised learning approach on manually extracted argumentative text segments and their paraphrases, as well as on the original user texts. We evaluate and analyze our models, draw conclusions from the obtained results, and offer suggestions for future work in this domain.

Keywords HR

prirodna obrada jezika, strojno učenje, analiza argumentacije, klasifikacija stavova, klasifikacija tvrdnji

Keywords EN

natural language processing, machine learning, argumentation mining, stance detection, claim classification

Defense date

6.7.2017.

Thesis task HR

Thesis task EN

Online user comments are a valuable source of information for the analysis of opinions on events and their protagonists, political decisions, political subjects, ideological issues, contested topics, etc. Stance classification is a new research area at the intersection of sentiment analysis and argumentation mining, concerned with the automatic detection of stances expressed in text. The task is especially challenging in the context of user comments on the Internet, due to the informality and brevity of the texts. The topic of this thesis is the application of machine learning to claim and stance classification in online user comments in English. Do a literature study on short text classification, with an emphasis on stance classification. Devise and implement a model for classifying claims by their type within informal argumentation (fact, value, policy) and for classifying claim stance with respect to a given topic (for, against, neutral). For model development and testing, use the dataset from (Hasan and Ng, 2014), and carry out an additional annotation round at the level of individual claims. Carry out a detailed evaluation of the model, a comparison against a baseline, a statistical analysis of the results, and an error analysis. All references must be cited, and all source code, documentation, executables, and datasets must be provided with the thesis.

Publicly available

Published paper(s)

File

TakeLab-ZR-2017-ToniKukurin.pdf