IRIS Institutional Research Information System

Semantic Image Interpretation is the task of extracting a structured semantic description from images. This requires the detection of visual relationships: triples 〈subject, relation, object〉 describing a semantic relation between a subject and an object. A pure supervised approach to visual relationship detection requires a complete and balanced training set for all the possible combinations of 〈subject, relation, object〉. However, such training sets are not available and would require a prohibitive human effort. This implies the ability of predicting triples which do not appear in the training set. This problem is called zero-shot learning. State-of-the-art approaches to zero-shot learning exploit similarities among relationships in the training set or external linguistic knowledge. In this paper, we perform zero-shot learning by using Logic Tensor Networks, a novel Statistical Relational Learning framework that exploits both the similarities with other seen relationships and background knowledge, expressed with logical constraints between subjects, relations and objects. The experiments on the Visual Relationship Dataset show that the use of logical constraints outperforms the current methods. This implies that background knowledge can be used to alleviate the incompleteness of training sets.

Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation

Donadello, Ivan;Serafini, Luciano

2019-01-01

Abstract

Semantic Image Interpretation is the task of extracting a structured semantic description from images. This requires the detection of visual relationships: triples 〈subject, relation, object〉 describing a semantic relation between a subject and an object. A pure supervised approach to visual relationship detection requires a complete and balanced training set for all the possible combinations of 〈subject, relation, object〉. However, such training sets are not available and would require a prohibitive human effort. This implies the ability of predicting triples which do not appear in the training set. This problem is called zero-shot learning. State-of-the-art approaches to zero-shot learning exploit similarities among relationships in the training set or external linguistic knowledge. In this paper, we perform zero-shot learning by using Logic Tensor Networks, a novel Statistical Relational Learning framework that exploits both the similarities with other seen relationships and background knowledge, expressed with logical constraints between subjects, relations and objects. The experiments on the Visual Relationship Dataset show that the use of logical constraints outperforms the current methods. This implies that background knowledge can be used to alleviate the incompleteness of training sets.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2019
			
	Codice ISBN
	
				978-1-7281-1985-4
			
	Appare nelle tipologie:
	
				4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
Compensating_Supervision_Incompleteness_Prior_Knowledge_Semantic_Image_Interpretation.pdf solo utenti autorizzati Tipologia: Documento in Pre-print Licenza: Creative commons Dimensione 314.51 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	314.51 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/319868

Citazioni

ND

social impact