Traditional taggers always leave a small percentage of words (2-3%) wrongly tagged. To improve on this, we combine the usual HMM approach with syntactic information, that is, with probabilistic parsing techniques, so that much longer dependencies can be taken into account. The aim of the present work is to design methods for this combination.
The general idea is as follows. For parseable sentences, we simply take the tags in the leaves of the most probable syntactic tree. For non-parseable sentences, we consider all the analyses of all substrings of the input sentence and design strategies for combining those syntactic constraints in order to tag the sentence. We evaluate the performance of these new disambiguation techniques and compare them with the traditional ones.
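The parseable case can be illustrated with a minimal sketch. It assumes a toy probabilistic context-free grammar in Chomsky normal form. The grammar, its probabilities and the function names (`viterbi_cky`, `leaf_tags`, `tag_sentence`) are hypothetical and are not the authors' implementation. A Viterbi CKY parser finds the most probable tree, and the tags are then read off its preterminal leaves. The chart also holds the best analysis of every substring, which is the kind of information the non-parseable case has to combine. The combination strategies themselves are not sketched here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy PCFG in Chomsky normal form (illustrative assumption only).
# Lexical rules P(word | tag): "saw" is ambiguous between noun and verb.
LEXICAL: Dict[Tuple[str, str], float] = {
    ("Det", "the"): 1.0,
    ("N", "man"): 0.4,
    ("N", "dog"): 0.4,
    ("N", "saw"): 0.2,
    ("V", "saw"): 1.0,
}
# Binary rules: (parent, left child, right child, probability).
BINARY: List[Tuple[str, str, str, float]] = [
    ("S", "NP", "VP", 1.0),
    ("NP", "Det", "N", 1.0),
    ("VP", "V", "NP", 1.0),
]

Chart = List[List[Dict[str, Tuple[float, object]]]]


def viterbi_cky(words: List[str]) -> Chart:
    """Fill a chart where best[i][j][A] = (max probability, backpointer) for A over words[i:j]."""
    n = len(words)
    best: Chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (tag, word), p in LEXICAL.items():
            if word == w:
                best[i][i + 1][tag] = (p, w)  # a leaf backpointer is the word itself
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            cell = best[i][j]
            for k in range(i + 1, j):  # split point
                for parent, left, right, p in BINARY:
                    if left in best[i][k] and right in best[k][j]:
                        prob = p * best[i][k][left][0] * best[k][j][right][0]
                        if prob > cell.get(parent, (0.0, None))[0]:
                            cell[parent] = (prob, (k, left, right))
    return best


def leaf_tags(best: Chart, i: int, j: int, symbol: str) -> List[Tuple[str, str]]:
    """Follow backpointers of the best tree and return (word, tag) pairs at its leaves."""
    _, back = best[i][j][symbol]
    if isinstance(back, str):
        return [(back, symbol)]
    k, left, right = back
    return leaf_tags(best, i, k, left) + leaf_tags(best, k, j, right)


def tag_sentence(words: List[str], start: str = "S") -> Optional[List[Tuple[str, str]]]:
    """Tag a parseable sentence from its most probable tree; return None if it has no parse."""
    best = viterbi_cky(words)
    n = len(words)
    if start not in best[0][n]:
        return None  # non-parseable: only the substring analyses in `best` remain
    return leaf_tags(best, 0, n, start)
```

On "the man saw the dog", the lexicon allows "saw" to be a noun or a verb, but only the verb reading yields a complete parse. The leaves of the most probable tree therefore tag it `V`. A purely local tagger could not enforce this constraint.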
This work shows that the syntactic constraints imposed by a grammar can be very useful for disambiguating natural language text. Although many interesting experiments remain to be performed, the results obtained so far seem to point in the right direction.