TIES 1.2 User Manual
2003-01-01
Abstract
TIES is a trainable information extraction system written in Java with an object-oriented design. The application package supplies a set of interfaces and classes for training, testing and running an extraction task in both traditional (natural text) and wrapper (machine-generated or rigidly structured text) domains. TIES is based on a reimplementation of the Boosted Wrapper Induction (BWI) algorithm devised by Dayne Freitag and Nicholas Kushmerick [1]. The system architecture is strongly based on boosting and wrapper induction techniques, but it is flexible enough to let programmers, if necessary, develop their own weak learner implementation to bootstrap, as well as add new validation strategies. The default implementation exploits only simple features, which map an individual token to an arbitrary set of wildcards (e.g., capitalized, lower-case, punctuation), but more complex features (e.g., morpho-syntactic ones), if available, can be provided to the algorithm; in that case a different feature extraction method must be supplied. The system comes with a default implementation of every interface it defines, so the application can also be used without programming experience. The remaining sections of the tutorial give step-by-step instructions for installing, configuring and performing common tasks with the TIES software. You will benefit most from this tutorial if you complete these sections in order. You will be performing these tasks in the actual TIES environment, not a simulation.
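As a rough illustration of the simple features mentioned in the abstract, the following minimal Java sketch maps a single token to a set of wildcards. The class name `WildcardFeatures`, the `Wildcard` enum and its members, and the `extract` method are all hypothetical names chosen for this example; they are not taken from the TIES API, and the wildcard set actually used by TIES may differ.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: NOT the TIES API. It only illustrates the idea of
// mapping one token to a set of wildcard features.
public class WildcardFeatures {

    // Assumed wildcard inventory, loosely following the examples in the abstract.
    public enum Wildcard { CAPITALIZED, LOWER_CASE, NUMERIC, PUNCTUATION, ANY }

    // Returns every wildcard that matches the given token.
    public static Set<Wildcard> extract(String token) {
        Set<Wildcard> result = EnumSet.of(Wildcard.ANY); // ANY matches every token
        if (token.isEmpty()) {
            return result;
        }
        boolean allLetters = token.chars().allMatch(Character::isLetter);
        if (allLetters && Character.isUpperCase(token.charAt(0))) {
            result.add(Wildcard.CAPITALIZED); // alphabetic, starts with an upper-case letter
        }
        if (allLetters && token.chars().allMatch(Character::isLowerCase)) {
            result.add(Wildcard.LOWER_CASE); // alphabetic, entirely lower-case
        }
        if (token.chars().allMatch(Character::isDigit)) {
            result.add(Wildcard.NUMERIC); // digits only
        }
        if (token.chars().noneMatch(Character::isLetterOrDigit)) {
            result.add(Wildcard.PUNCTUATION); // no letters or digits at all
        }
        return result;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"Trento", "seminar", "2003", ","}) {
            System.out.println(token + " -> " + extract(token));
        }
    }
}
```

A learner built on such features can generalize from a literal token to any token sharing a wildcard, which is the kind of generalization the abstract attributes to the default feature set; richer features (for instance morpho-syntactic tags) would need a different extraction method.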