General description of the course
This year the course will consider two language technology applications in depth: machine translation (from August to mid-October) and distributional semantics (from mid-October to the end of the semester).
Machine translation
The idea that computers could be used to translate from one human language to another is nearly as old as the computer itself. Since 1950 there has been active research on the topic, and substantial resources and human effort have been invested. Initially the results were rather disappointing, but during the last decade or so machine translation has come into practical use, thanks to the internet and hand-held devices.
In this part we will take a glimpse at the history of machine translation and at the techniques that have been used and are currently in use. We will consider in depth the ideas and techniques underlying so-called statistical machine translation (SMT), which is used, for example, by Google Translate. We will also consider why machine translation is hard and which problems remain to be solved.
Distributional Semantics: Extracting Meaning from Data
The second half of the course will be devoted to distributional semantic models, which have gained particular traction in the field of natural language processing in recent years. In brief, these are computational approaches that model the semantic similarity and dissimilarity of words from the typical contexts in which they occur in large volumes of natural text, with no need to define meanings manually. Such algorithms (often powered by artificial neural networks) today work under the hood of many real-world applications, including search engines like Google and dialogue agents like Apple's Siri.
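To make the idea a little more concrete, here is a minimal, purely illustrative sketch (not course material) of a count-based distributional model: word vectors are built from co-occurrence counts over a tiny invented corpus and compared with cosine similarity. The corpus, window size and word choices are assumptions made for the example.

from collections import Counter, defaultdict
from math import sqrt

# Toy corpus: real distributional models are trained on large volumes
# of natural text; these three sentences are only a placeholder.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a dog chased a cat".split(),
]

WINDOW = 2  # context window size (an arbitrary choice for this sketch)

# For every word, count how often each other word appears within the window.
contexts = defaultdict(Counter)
for sentence in corpus:
    for i, word in enumerate(sentence):
        for j in range(max(0, i - WINDOW), min(len(sentence), i + WINDOW + 1)):
            if i != j:
                contexts[word][sentence[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

# Words that occur in similar contexts end up with similar vectors.
print(cosine(contexts["cat"], contexts["dog"]))
print(cosine(contexts["cat"], contexts["on"]))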
We will cover this area of research thoroughly, starting from its linguistic foundations and moving up to the most recent trends and discoveries. The course will include implementing and training a working distributional model in Python.
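As an illustration of what such an implementation might look like (a sketch only, not the actual assignment; the course does not prescribe a particular toolkit), a neural distributional model can be trained in a few lines of Python with the gensim library. Exact parameter names may vary slightly between gensim versions.

from gensim.models import Word2Vec

# Toy training data: a real model would be trained on a large corpus
# of tokenized sentences; this list is only a placeholder.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a dog chased a cat".split(),
]

# Train a small word2vec model; min_count and window are arbitrary
# example values, not course requirements.
model = Word2Vec(sentences, min_count=1, window=2)

# Query the trained model for the words most similar to "cat".
print(model.wv.most_similar("cat"))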