"Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing" S Surya*, R K Sarvadevabhatla* and V Babu R. (*equal contribution) AAAI Conference on Artificial Intelligence, 2018 (AAAI '18, poster).
Project Page | PDF
"Switching Convolutional Networks for Crowd Counting" S Surya*, D B Sam* and V Babu R. (*equal contribution) 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '17, poster).
Project Page | PDF
"SwiDeN: Convolutional Neural Networks For Depiction Invariant Object Recognition" S Surya*, R K Sarvadevabhatla*, SSS Kruthiventi, V Babu R. (*equal contribution) Proceedings of the 2016 ACM International Conference on Multimedia (ACMMM '16, poster).
pdf | arXiv | bibtex | code | poster
"TraCount: A Deep Convolutional Neural Network for Highly Overlapping Vehicle Counting" S Surya and V Babu R.
Proceedings of the 2016 Indian Conference on Computer Vision, Graphics and Image Processing, oral presentation (ICVGIP 2016).
bibtex
"Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing" R K Sarvadevabhatla*, S Surya*, Trisha Mittal, V Babu R. (*equal contribution) AAAI Conference on Artificial Intelligence, 2018 (AAAI '18, poster).
Abstract:
The ability of machine-based agents to play games in human-like fashion is considered a benchmark of progress in AI. In this paper, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, an elementary version of the Visual Question Answering task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Notably, Sketch-QA involves asking a fixed question ("What object is being drawn?") and gathering open-ended guess-words from human guessers. To mimic Pictionary-style guessing, we propose a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches. Our model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate our model on the large-scale guess-word dataset generated via the Sketch-QA task and compare it with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and by our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.
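The core idea of the abstract above can be illustrated with a toy numpy sketch (this is not the paper's architecture; the recurrent cell, feature sizes, weights, and vocabulary below are all made-up stand-ins): a recurrent model consumes one sketch-stroke feature vector at a time and emits a guess-word after every stroke, so its guess can change as the drawing accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["cat", "house", "tree", "car"]   # toy guess-word vocabulary
D_IN, D_H = 5, 8                          # stroke-feature / hidden sizes

# Random untrained weights, purely for illustration.
W_xh = rng.normal(size=(D_IN, D_H)) * 0.1
W_hh = rng.normal(size=(D_H, D_H)) * 0.1
W_hy = rng.normal(size=(D_H, len(VOCAB))) * 0.1

def guess_per_stroke(strokes):
    """Return the top guess-word after each incrementally accumulated stroke."""
    h = np.zeros(D_H)
    guesses = []
    for x in strokes:                     # strokes arrive one at a time
        h = np.tanh(x @ W_xh + h @ W_hh)  # recurrent state update
        logits = h @ W_hy                 # score every vocabulary word
        guesses.append(VOCAB[int(np.argmax(logits))])
    return guesses

strokes = rng.normal(size=(6, D_IN))      # six fake stroke-feature vectors
print(guess_per_stroke(strokes))          # one guess per accumulated stroke
```

The point of the sketch is the interface, not the weights: the guesser is queried after every stroke rather than once on the finished drawing.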
"TraCount: A Deep Convolutional Neural Network for Highly Overlapping Vehicle Counting" S Surya and V Babu R.
Proceedings of the 2016 Indian Conference on Computer Vision, Graphics and Image Processing, oral presentation (ICVGIP 2016).
Abstract:
We propose a novel deep framework, TraCount, for highly overlapping vehicle counting in congested traffic scenes. TraCount uses multiple fully convolutional (FC) sub-networks to predict the density map for a given static image of a traffic scene. The different FC sub-networks provide a range in size of receptive fields that enables us to count vehicles whose perspective effect varies significantly in a scene due to the large visual field of surveillance cameras. The predictions of the different FC sub-networks are fused by weighted averaging to obtain a final density map. We show that TraCount outperforms the state-of-the-art methods on the challenging TRANCOS dataset, which has a total of 46796 vehicles annotated across 1244 images.
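The fusion step described above can be sketched with toy numpy code (a hypothetical illustration, not the paper's networks: each FC sub-network is stubbed as a box filter with a different receptive-field size, and the fusion weights are invented): several density-map predictions are combined by weighted averaging, and the vehicle count is the sum over the fused map.

```python
import numpy as np

def box_filter_density(img, k):
    """Stub 'sub-network': local mean over a (2k+1) x (2k+1) receptive field."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = img[max(0, i - k):i + k + 1,
                            max(0, j - k):j + k + 1].mean()
    return out

def tracount_fuse(img, ks=(1, 2, 3), weights=(0.5, 0.3, 0.2)):
    """Weighted average of per-sub-network density maps; count = map sum."""
    maps = [box_filter_density(img, k) for k in ks]      # one map per receptive field
    fused = sum(w * m for w, m in zip(weights, maps))    # weighted averaging
    return fused, fused.sum()                            # final map and count
```

Varying `ks` is the toy analogue of the range of receptive-field sizes that lets the model handle strong perspective variation across a surveillance view.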
"SwiDeN: Convolutional Neural Networks For Depiction Invariant Object Recognition" R K Sarvadevabhatla*, S Surya*, SSS Kruthiventi, V Babu R. (*equal contribution) Proceedings of the 2016 ACM International Conference on Multimedia (ACMMM '16, poster).
Abstract:
Current state-of-the-art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g. photos only, sketches only). In this paper, we present SwiDeN: our Convolutional Neural Network (CNN) architecture which recognizes objects regardless of how they are visually depicted (line drawing, realistic shaded drawing, photograph, etc.). In SwiDeN, we utilize a novel 'deep' depictive style-based switching mechanism which appropriately addresses the depiction-specific and depiction-invariant aspects of the problem. We compare SwiDeN with alternative architectures and prior work on a 50-category Photo-Art dataset containing objects depicted in multiple styles. Experimental results show that SwiDeN outperforms other approaches for the depiction-invariant object recognition problem.
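The switching idea above can be sketched in toy numpy form (a hypothetical stand-in, not SwiDeN itself: the style detector, weights, and sizes are invented): a style-based switch routes an input through one of several depiction-specific transforms, after which a single shared head handles the depiction-invariant classification.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N_CLASSES = 16, 50                      # toy feature size; 50-way, like Photo-Art

style_layers = {                           # depiction-specific branches (random stubs)
    "photo":  rng.normal(size=(D, D)) * 0.1,
    "sketch": rng.normal(size=(D, D)) * 0.1,
}
W_shared = rng.normal(size=(D, N_CLASSES)) * 0.1   # shared, depiction-invariant head

def detect_style(x):
    """Toy switch: treat high-variance inputs as 'photo', flat ones as 'sketch'."""
    return "photo" if x.std() > 1.0 else "sketch"

def swiden_forward(x):
    style = detect_style(x)                          # the switching mechanism (stubbed)
    h = np.maximum(x @ style_layers[style], 0.0)     # style-specific layer + ReLU
    return style, int(np.argmax(h @ W_shared))       # shared classifier
```

The design point the sketch preserves: only the early, style-specific part of the network is switched; the classifier weights are shared across all depiction styles.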
"Switching Convolutional Networks for Crowd Counting" S Surya*, D B Sam*, V Babu R. (*equal contribution) 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '17, poster).
Abstract:
We propose a novel crowd counting model that maps a given crowd scene to its density. Crowd analysis is compounded by a myriad of factors, such as inter-occlusion between people due to extreme crowding, high similarity of appearance between people and background elements, and large variability of camera viewpoints. Current state-of-the-art approaches tackle these factors by using multi-scale CNN architectures, recurrent networks, and late fusion of features from multi-column CNNs with different receptive fields. We propose a switching convolutional neural network that leverages the variation of crowd density within an image to improve the accuracy and localization of the predicted crowd count. Patches from a grid within a crowd scene are relayed to independent CNN regressors based on the crowd count prediction quality of each CNN established during training. The independent CNN regressors are designed to have different receptive fields, and a switch classifier is trained to relay each crowd scene patch to the best CNN regressor. We perform extensive experiments on all major crowd counting datasets and demonstrate on-par or better performance compared to current state-of-the-art methods. We provide interpretable representations of the multichotomy of the space of crowd scene patches inferred from the switch. We observe that the switch relays an image patch to a particular CNN column based on the density of the crowd in that patch.
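The patch-routing idea above can be sketched with toy numpy code (a hypothetical illustration, not the paper's CNNs: the two regressors and the switch classifier below are invented stand-ins): the scene is divided into a grid of patches, a switch picks one regressor per patch based on a density proxy, and the per-patch counts are summed.

```python
import numpy as np

def regressor_sparse(patch):
    """Imagined regressor for low-density patches (toy rule, not a CNN)."""
    return patch.sum()

def regressor_dense(patch):
    """Imagined regressor for high-density patches (deliberately different rule)."""
    return 1.1 * patch.sum()

REGRESSORS = [regressor_sparse, regressor_dense]

def switch(patch):
    """Toy switch classifier: route by mean intensity as a density proxy."""
    return 1 if patch.mean() > 0.5 else 0

def scnn_count(scene, grid=3):
    """Split the scene into a grid, route each patch, sum the patch counts."""
    h, w = scene.shape
    ph, pw = h // grid, w // grid
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            patch = scene[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            total += REGRESSORS[switch(patch)](patch)   # best regressor per patch
    return total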