Video analysis and understanding is an important research area whose interest has grown over the last decade, fostering a range of applications, each characterized by different goals.
In general, when describing a signal, the model should ideally mimic the underlying process thought to generate it. The advantage is that inferences drawn from such a model are easily interpretable and can be applied and generalized to other signals. This is especially true when the signal is a video sequence representing human activities: in this case, a good model aims to encode meaningful gestures, words, actions, and the like as parameters, performing classification and recognition tasks in an intuitive way.
Graphical generative models have proven to be well suited to this task: they offer an elegant mathematical framework for combining observations of the activities to be modeled (bottom-up) with complex behavioral priors (top-down), providing expectations about the underlying processes while properly handling uncertainty.
In this talk, I will present a hierarchical video-analysis framework that exploits, through a connection among generative models, the most relevant characteristic of a video sequence: its visual information. Roughly speaking, each module of the hierarchy deals with a basic visual entity of the video sequence (a pixel, a region of pixels), producing descriptions at different levels of detail. The proposed framework performs clustering and classification of video sequences effectively.
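To make the idea of a pixel-level generative module concrete, the following is a minimal toy sketch (not the talk's actual model): each pixel's intensity is modeled as a Gaussian, and a pixel in a new frame is labeled foreground when it is improbable under that per-pixel model. All class and variable names here are illustrative assumptions.

```python
import numpy as np

class PixelBackgroundModel:
    """Toy per-pixel Gaussian background model (illustrative only)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mu = first_frame.astype(float)     # per-pixel mean
        self.var = np.full_like(self.mu, 25.0)  # per-pixel variance (initial guess)
        self.alpha = alpha                      # learning rate for the running update
        self.k = k                              # threshold in standard deviations

    def update(self, frame):
        frame = frame.astype(float)
        # Foreground = pixels more than k standard deviations from the mean.
        fg = np.abs(frame - self.mu) > self.k * np.sqrt(self.var)
        # Update the generative parameters using background pixels only.
        bg = ~fg
        self.mu[bg] += self.alpha * (frame[bg] - self.mu[bg])
        self.var[bg] += self.alpha * ((frame[bg] - self.mu[bg]) ** 2 - self.var[bg])
        return fg

# Synthetic 4x4 "video": a static background plus one bright pixel.
rng = np.random.default_rng(0)
background = 100 + rng.normal(0, 2, size=(4, 4))
model = PixelBackgroundModel(background)
frame = background.copy()
frame[1, 2] = 200                               # an anomalous (foreground) pixel
mask = model.update(frame)
print(mask[1, 2], int(mask.sum()))              # → True 1
```

Higher modules of such a hierarchy would then aggregate these per-pixel decisions into region-level descriptions, consistent with the multi-level scheme described above.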