Cities' visual appearance plays a central role in shaping human perception and response to the surrounding urban environment. For example, the visual qualities of urban spaces affect the psychological states of their inhabitants and can induce negative social outcomes. Hence, it becomes critically important to understand people's perceptions and evaluations of urban spaces. Previous works have demonstrated that algorithms can be used to predict high level attributes of urban scenes (e.g. safety, attractiveness, uniqueness), accurately emulating human perception. In this paper we propose a novel approach for predicting the perceived safety of a scene from Google Street View Images. Opposite to previous works, we formulate the problem of learning to predict high level judgments as a ranking task and we employ a Convolutional Neural Network (CNN), significantly improving the accuracy of predictions over previous methods. Interestingly, the proposed CNN architecture relies on a novel pooling layer, which permits to automatically discover the most important areas of the images for predicting the concept of perceived safety. An extensive experimental evaluation, conducted on the publicly available Place Pulse dataset, demonstrates the advantages of the proposed approach over state-of-the-art methods.
Predicting and understanding urban perception with convolutional neural networks
Porzi, Lorenzo;Rota Bulò, Samuel;Lepri, Bruno;Ricci, Elisa
2015-01-01
Abstract
Cities' visual appearance plays a central role in shaping human perception and response to the surrounding urban environment. For example, the visual qualities of urban spaces affect the psychological states of their inhabitants and can induce negative social outcomes. Hence, it becomes critically important to understand people's perceptions and evaluations of urban spaces. Previous works have demonstrated that algorithms can be used to predict high level attributes of urban scenes (e.g. safety, attractiveness, uniqueness), accurately emulating human perception. In this paper we propose a novel approach for predicting the perceived safety of a scene from Google Street View Images. Opposite to previous works, we formulate the problem of learning to predict high level judgments as a ranking task and we employ a Convolutional Neural Network (CNN), significantly improving the accuracy of predictions over previous methods. Interestingly, the proposed CNN architecture relies on a novel pooling layer, which permits to automatically discover the most important areas of the images for predicting the concept of perceived safety. An extensive experimental evaluation, conducted on the publicly available Place Pulse dataset, demonstrates the advantages of the proposed approach over state-of-the-art methods.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.