I have received some questions, and I also want to withdraw a hint from the exercise set, so here are some last-minute comments.
**kwargs
I have received some questions about **kwargs: how they work and how they can be put to use.
To illustrate, take the simple call [12] under section 1.3 as an example.
chunker_0 = Chunker(dev_train)
This is equivalent to
chunker_0 = Chunker(dev_train, features=npchunk_features_0)
With this call, the **kwargs parameter of Chunker.__init__() is bound to features=npchunk_features_0. We could in principle pass any number of keyword arguments here. Chunker.__init__() passes these arguments on to its tagger. In this case, that means that
self.tagger = tagger(tagged_sents, **kwargs)
gets instantiated as
self.tagger = ConsecutiveTagger(tagged_sents, features=npchunk_features_0)
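The forwarding mechanism can be tried out in isolation. The following is a minimal sketch, not the code from the exercise set: ToyTagger and ToyChunker are hypothetical stand-ins for ConsecutiveTagger and Chunker, the body of npchunk_features_0 is a placeholder, and I assume that the tagger has features=npchunk_features_0 as its default, which is what makes the two calls above equivalent.

```python
def npchunk_features_0(sentence, i, history):
    # Placeholder feature extractor; the real one is defined in the exercise set.
    return {"pos": sentence[i][1]}


class ToyTagger:
    # Hypothetical stand-in for ConsecutiveTagger.
    # Assumption: the default for features lives here, in the tagger.
    def __init__(self, tagged_sents, features=npchunk_features_0):
        self.features = features
        self.n_sents = len(tagged_sents)


class ToyChunker:
    # Hypothetical stand-in for Chunker.
    def __init__(self, chunked_sents, tagger=ToyTagger, **kwargs):
        # kwargs is an ordinary dict holding every keyword argument
        # that is not named explicitly in the signature.
        self.kwargs = dict(kwargs)
        # **kwargs unpacks the dict again into keyword arguments.
        self.tagger = tagger(chunked_sents, **kwargs)


dev_train = [[("the", "DT"), ("cat", "NN")]]  # toy data

chunker_a = ToyChunker(dev_train)
chunker_b = ToyChunker(dev_train, features=npchunk_features_0)

print(chunker_a.kwargs)  # empty dict: nothing to forward, the tagger default applies
print(chunker_b.kwargs)  # dict with the single key 'features'
print(chunker_a.tagger.features is chunker_b.tagger.features)
```

Both chunkers end up with the same feature function in their tagger. In the first call it comes from the tagger's default, in the second it travels through **kwargs.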
7 POS-tagged vs. untagged text
I withdraw the hint. First, it did not work as expected because of the way Chunker.evaluate() is implemented in NLTK. Second, it was not a principled solution.
There are several ways to solve this exercise. The fastest is probably to keep using the Chunker.evaluate() method. To get that to work, you must extract the words from dev_test and POS-tag them. Then you must replace the POS tags in dev_test with your own POS tags (but do not replace the chunk tags). You must also translate back and forth between the tree format and the CoNLL format.
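The tag replacement step can be sketched as follows. This is only an illustration under some assumptions: the sentence is already in the CoNLL format, i.e. a list of (word, pos, iob) triples, and toy_pos_tagger is a made-up stand-in for whatever POS tagger you use. For the conversion itself, NLTK provides nltk.chunk.tree2conlltags() (tree to triples) and nltk.chunk.conlltags2tree() (triples back to tree), which you would call before and after this step.

```python
def retag(conll_sent, pos_tagger):
    """Replace the POS tags of one sentence, keeping words and chunk tags.

    conll_sent: list of (word, pos, iob) triples.
    pos_tagger: function from a list of words to a list of (word, pos) pairs.
    """
    words = [word for word, _pos, _iob in conll_sent]
    new_pos = [pos for _word, pos in pos_tagger(words)]
    return [
        (word, pos, iob)
        for (word, _old_pos, iob), pos in zip(conll_sent, new_pos)
    ]


def toy_pos_tagger(words):
    # Hypothetical stand-in for a real POS tagger: it only knows "the".
    return [(w, "DT" if w.lower() == "the" else "NN") for w in words]


gold = [("The", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sleeps", "VBZ", "O")]
retagged = retag(gold, toy_pos_tagger)
print(retagged)
```

Note that the chunk tags in the third position are untouched. Only the POS tags change, here wrongly for "sleeps", which is exactly the kind of tagger error the exercise asks you to measure the effect of.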
11 Do it yourself
Several of you seem to have problems with this one. First, you must get a clear understanding of what constitutes an NP chunk when presented in the IOB format. Then you must identify the chunks in both columns. Finally, compare the two lists of chunks to identify the ones they have in common, i.e. the true positives (TPs), as well as the false positives (FPs) and the false negatives (FNs).
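To make the three steps concrete, here is one possible sketch, not a reference solution. It assumes that a chunk starts at a B-NP tag and extends over the I-NP tags that follow it, and that an I-NP with no chunk open also starts a chunk (a common convention, but check it against your own understanding of the format). A chunk is represented by its (start, end) token positions within one sentence; the function name np_chunks and the toy tag sequences are made up for the example.

```python
def np_chunks(iob_tags):
    """Return the set of NP chunks in one sentence as (start, end) spans.

    end is exclusive, so (0, 2) covers tokens 0 and 1.
    """
    chunks = set()
    start = None  # start position of the chunk currently open, if any
    for i, tag in enumerate(iob_tags):
        if tag == "B-NP":
            if start is not None:
                chunks.add((start, i))  # a new B-NP closes the open chunk
            start = i
        elif tag == "I-NP":
            if start is None:
                start = i  # assumption: a stray I-NP opens a chunk
        else:
            if start is not None:
                chunks.add((start, i))  # any other tag closes the open chunk
                start = None
    if start is not None:
        chunks.add((start, len(iob_tags)))  # chunk running to the sentence end
    return chunks


# Toy gold and predicted columns for one six-token sentence.
gold = ["B-NP", "I-NP", "O", "B-NP", "B-NP", "I-NP"]
pred = ["B-NP", "I-NP", "O", "B-NP", "I-NP", "I-NP"]

gold_chunks = np_chunks(gold)
pred_chunks = np_chunks(pred)

tp = len(gold_chunks & pred_chunks)  # chunks found in both columns
fp = len(pred_chunks - gold_chunks)  # predicted, but not in gold
fn = len(gold_chunks - pred_chunks)  # in gold, but not predicted
print(tp, fp, fn)  # 1 1 2
```

Note that the prediction gets five of the six tags right, yet only one of three gold chunks counts as found. A chunk is a true positive only if both of its boundaries match. When you run this over a whole test set, keep the sentences apart (or add a sentence index to each span) so that chunks from different sentences are not confused.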