Problem Set (3b)
When we first posted our final Problem Set (3b) a little more than a week ago, it contained three mistakes: (a) the treebank file was named ‘train.mrg’ in SVN, instead of ‘wsj.mrg’ as used in the exercise text; (b) the lexeme count in the grammar estimated from this file (i.e. the training part of the treebank) is around 44,000, rather than the 50,000 we originally gave; and, most importantly, (c) our skeleton code in ‘chart.lsp’ contained some tab characters where there should have been eight spaces each. We encourage everyone to run ‘svn update’ in their INF4820 directory, which fixes the file naming and removes the tab characters from ‘chart.lsp’.
In the lecture tomorrow, we will fill in the remaining theory for one-best Viterbi chart parsing with PCFGs and evaluation of statistical parsers. Also, we will consolidate our emerging understanding of syntactic analysis for natural language with a quiz towards the end of the lecture. The deadline for submitting our final problem set is Wednesday next week, November 19.