Multimodal Classification of Activities of Daily Living inside Smart Homes
Mana, Nadia; Pianesi, Fabio; Chippendale, Paul Ian; Lanz, Oswald
2009-01-01
Abstract
Smart homes for the aging population have recently started attracting the attention of the research community. One of the problems of interest is that of monitoring the activities of daily living (ADLs) of the elderly, aiming at their protection and well-being. In this work, we present our initial efforts to automatically recognize ADLs using multimodal input from audio-visual sensors. For this purpose, and as part of the Integrated Project Netcarity, far-field microphones and cameras have been installed inside an apartment and used to collect a corpus of ADLs, acted by multiple subjects. The resulting data streams are processed to generate perception-based acoustic features, as well as human location coordinates that are employed as visual features. The extracted features are then presented to Gaussian mixture models for their classification into a set of predefined ADLs. Our experimental results show that both acoustic and visual features are useful in ADL classification, but performance of the latter deteriorates when subject tracking becomes inaccurate. Furthermore, joint audio-visual classification by simple concatenative feature fusion significantly outperforms both unimodal classifiers.
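Since the record contains only the abstract, the following is a minimal, hypothetical sketch of the classification scheme it describes: one Gaussian mixture model per ADL class, trained and evaluated on concatenated (feature-fused) acoustic and visual feature vectors. The library choice (scikit-learn), the component count, and all function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_adl_gmms(features_by_class, n_components=8):
    """Fit one GMM per ADL class on its training feature vectors.

    features_by_class: dict mapping an ADL label to an array of shape
    (n_frames, feature_dim) of already-fused audio-visual features.
    """
    models = {}
    for adl, feats in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(feats)
        models[adl] = gmm
    return models

def classify_segment(models, acoustic_feats, visual_feats):
    """Concatenative fusion: stack acoustic and visual features frame by frame,
    then pick the ADL whose GMM yields the highest average log-likelihood."""
    fused = np.hstack([acoustic_feats, visual_feats])
    scores = {adl: gmm.score(fused) for adl, gmm in models.items()}
    return max(scores, key=scores.get)
```

The unimodal baselines mentioned in the abstract would correspond to passing only the acoustic or only the visual feature block through the same per-class GMM scoring.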
File | Type | License | Size | Format | Access
---|---|---|---|---|---
LibRamMan_IWAAL_2009.pdf | Post-print document | DRM not defined | 288.26 kB | Adobe PDF | Authorized users only (request a copy)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.