Quiew Requirements
Triolo, Enrico
2005-01-01
Abstract
In this project we propose methods for caching and reorganizing Web content, centered on the notion of "quiews": views of the Web on specific topics, with an emphasis on quality. Information is extracted on given topics by focused crawling, while indexing and searching are carried out with both content-based and collaborative filtering methods. Unlike general-purpose search engines, which aim to index the entire Web, quiews make it reasonable to adopt advanced categorization models based on sophisticated taxonomies and to build indexes enriched with linguistic features, which help resolve the ambiguities typically associated with simple keyword-based queries. In addition, unlike today's search engines, the search platforms in the proposed model are expected to benefit from users' relevance feedback and to learn page rank, since they operate in a controlled environment. To limit unnecessary duplication of quiews, federations of search engines operating at different sites and organizations communicate through a distributed scheme.