Theses

Student

Ivan Mršić

Title HR

Prepoznavanje i klasifikacija imenovanih entiteta za jezično sučelje bazi podataka

Title EN

Entity Recognition and Classification for a Natural Language Database Interface

Year

2017

Level

Undergraduate

Supervisor

Jan Šnajder

Co-supervisor

Hands-on assistant

Study Programme

FER

Programme

FER2

Thesis ID

5326

Number of pages

Language

Abstract HR

Zbog uvijek prisutne ljudske potrebe za razumijevanjem što većih količina podataka i rasta računalne snage baze grafova dobivaju na popularnosti. Još jedan korak k boljoj integraciji čovjeka i računala jest izrada jezičnog sučelja za komunikaciju s takvom bazom. U ovom radu razvijen je i ispitan sustav za izvlačenje entiteta iz upita u svrhu izgradnje takvog sučelja, pogotovo imenovanih entiteta te su u tu svrhu izgrađena četiri modela. Dobiveni rezultati su ohrabrujući, te pokazuju da se i na vrlo malom skupu podataka mogu naučiti modeli koji rade sa specijaliziranim upitima.

Abstract EN

Because of the ever-present human need to understand increasingly large quantities of data, and because of the growth in computing power, graph databases are gaining in popularity. A further step toward better integration of humans and computers is a natural language interface for communicating with such a database. This thesis describes the development and evaluation of a system for extracting entities from queries, intended as a component of such an interface. Special attention was given to named entity extraction, for which four models were built. The results are encouraging and show that models that handle specialized queries can be learned even from a very small dataset.

Keywords HR

obrada prirodnog jezika, izvlačenje imenovanih entiteta, stroj potpornih vektora, stabla odluke, CRF, jezično sučelje, graf baza podataka

Keywords EN

natural language processing, named entity extraction, support vector machine, decision trees, CRF, natural language interface, graph database

Defense date

6.7.2017.

Thesis task HR

Thesis task EN

A natural language database interface allows users to query a database in a controlled natural language. Such interfaces rely on natural language processing and machine learning for analyzing the query. One of the steps in query analysis is the extraction of key information from the query, such as named entities. The topic of this thesis is the automatic extraction of named entities from user queries over a database of famous people. Study the methods for named entity extraction, with an emphasis on machine learning methods for sequential labeling. Devise and implement a method for named entity extraction from user queries in English, which will also cover the classification of each entity as either a target or reference entity with respect to the query intent. Compile a suitable collection for model training and testing, which will include query examples with pre-labeled named entities. Carry out an experimental evaluation of the model, a comparison against a baseline, a statistical analysis of the results, and an error analysis. All references must be cited, and all source code, documentation, executables, and datasets must be provided with the thesis.

Publicly available

Published paper(s)

File

TakeLab-ZR-2017-IvanMrsic.pdf