Controlling Bidirectional Parsing for Efficient Text Analysis

Lavelli, Alberto
1995-01-01

Abstract

Traditional models of parsing, as used in interfaces, have proven weak and ineffective in complex tasks such as the processing of naturally occurring texts. Broad-coverage parsers break down when confronted with extended inputs without sufficient information to control the interpretation process. It is well known, for example, that parsing can become a combinatorially explosive task. Effective control strategies are necessary to overcome these shortcomings. Extra-linguistic criteria (e.g. statistics, semantic or goal-driven heuristics) can be employed to reduce the combinatorics of parsing and to avoid misdirected effort, focusing the analysis on the most promising solutions. In this paper we present an approach to text parsing in which an agenda-based bidirectional chart parser is coupled with a set of extra-linguistic control strategies. These strategies affect agenda management, favoring some tasks and delaying or pruning others. The currently implemented strategies integrate both general and domain-specific criteria, such as grammar rule scores and goal-driven (i.e. template-based) information. A preprocessing phase analyzes the text to collect information for parsing control: text segmentation produces hints about the surface structure of the text, and statistical classification provides the templates to fill. The linguistic analyzer then processes the text in two steps: segment parsing and segment combination. The control strategies are mainly applied during the latter step. The two-step parser is able to cope gradually with the linguistic complexity of the input, pursuing a complete analysis only when it is feasible; bidirectionality makes it possible to maximize coverage on chunks of the input. Some preliminary results suggest an improvement in parsing efficiency.
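
The idea of a control strategy that favors, delays, or prunes agenda tasks can be illustrated with a minimal sketch. This is not the implementation described in the paper: the `Task` and `Agenda` classes, the `RULE_SCORES` table, the two thresholds, and the grammar rules are all hypothetical, chosen only to show how a rule-score strategy might reorder work on an agenda.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical parsing task: apply `rule` over the input span [start, end)."""
    rule: str
    start: int
    end: int


# A strategy maps a task to a score, or to None when the task should be pruned.
Strategy = Callable[[Task], Optional[float]]


class Agenda:
    """Agenda whose ordering is decided by an external control strategy."""

    def __init__(self, strategy: Strategy, delay_below: float = 0.5) -> None:
        self._strategy = strategy
        self._delay_below = delay_below          # assumed threshold for delaying
        self._heap: List[Tuple[float, int, Task]] = []
        self._delayed: List[Tuple[float, Task]] = []
        self._tie = count()                      # keeps insertion order on equal scores
        self.pruned: List[Task] = []

    def add(self, task: Task) -> None:
        score = self._strategy(task)
        if score is None:
            self.pruned.append(task)             # pruned: never executed
        elif score < self._delay_below:
            self._delayed.append((score, task))  # delayed: runs only when nothing better is left
        else:
            # heapq is a min-heap, so negate the score to pop the best task first.
            heapq.heappush(self._heap, (-score, next(self._tie), task))

    def pop(self) -> Optional[Task]:
        if not self._heap and self._delayed:
            # No favored task remains: promote the delayed ones.
            for score, task in self._delayed:
                heapq.heappush(self._heap, (-score, next(self._tie), task))
            self._delayed.clear()
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# Hypothetical grammar rule scores (not taken from the paper).
RULE_SCORES = {
    "NP -> Det N": 0.9,
    "S -> NP VP": 0.8,
    "PP -> P NP": 0.3,
    "X -> X X": 0.05,
}


def rule_score_strategy(task: Task) -> Optional[float]:
    """Score a task by its grammar rule; prune rules scoring under 0.1."""
    score = RULE_SCORES.get(task.rule, 0.0)
    return None if score < 0.1 else score


def drain(agenda: Agenda) -> List[str]:
    """Pop every task and return the rules in execution order."""
    order = []
    while (task := agenda.pop()) is not None:
        order.append(task.rule)
    return order
```

In this sketch, high-scoring tasks are executed first, low-scoring ones wait until the favored tasks are exhausted, and tasks under the pruning threshold are recorded but never run. A goal-driven (template-based) criterion could be plugged in the same way, as another function of type `Strategy`.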

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/1132