Deep convolutional neural networks (DCNNs) perform on par with or better than humans on image classification. Efforts have therefore shifted to more challenging tasks such as object detection and classification in images, video, or RGBD data. Recently developed region CNNs (R-CNNs) such as Fast R-CNN address this detection task for images. This paper, in contrast, is concerned with video and also targets resource-limited systems. Newly proposed methods accelerate R-CNN by sharing convolutional layers among proposal generation, location regression, and labeling. When applied to video, these approaches are stateless: they process each frame individually. This suggests an alternate route: make R-CNN stateful and exploit temporal consistency. We extend Fast R-CNN to employ recursive Bayesian filtering and perform proposal propagation and reuse. We couple multi-target proposal/detection tracking (MTT) with R-CNN and perform detection-to-track association. We call this approach MRCNN, short for MTT + R-CNN. In MRCNN, region proposals that are vetted via classification and regression in the R-CNN are treated as observations in the MTT and propagated using assumed kinematics. Actual proposal generation (e.g., via Selective Search) need only be performed sporadically and/or periodically, and is replaced at all other times by MTT proposal predictions. Preliminary results show that MRCNN can economize on both proposal and classification computations, yielding up to a 10- to 30-fold decrease in the number of proposals generated, about an order-of-magnitude savings in proposal computation time, and nearly an order-of-magnitude improvement in overall computation time, for comparable localization and classification performance. The method can additionally help abate false alarms.
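The proposal-propagation idea can be illustrated with a minimal sketch. The abstract does not specify the filter or motion model, so the following assumes a standard Kalman prediction step under a constant-velocity model for a proposal's center; the state layout, `predict` function, and noise parameter `q` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch: each tracked proposal keeps a constant-velocity
# state [cx, cy, vx, vy] (center position and per-frame velocity).
# Between sporadic Selective Search runs, the next frame's proposal
# center is the recursive Bayesian (Kalman) prediction of this state.

F = np.array([[1.0, 0.0, 1.0, 0.0],   # cx' = cx + vx
              [0.0, 1.0, 0.0, 1.0],   # cy' = cy + vy
              [0.0, 0.0, 1.0, 0.0],   # vx' = vx
              [0.0, 0.0, 0.0, 1.0]])  # vy' = vy

def predict(state, cov, q=1.0):
    """One Kalman prediction step under the constant-velocity model."""
    state = F @ state                      # propagate the mean
    cov = F @ cov @ F.T + q * np.eye(4)    # grow the uncertainty
    return state, cov

# A proposal centered at (100, 50), moving 5 px/frame to the right.
state = np.array([100.0, 50.0, 5.0, 0.0])
cov = np.eye(4)
state, cov = predict(state, cov)
print(state[:2])  # predicted center for the next frame: [105. 50.]
```

In this sketch, the predicted center (with the proposal's last known width and height) would stand in for a freshly generated region proposal, and a subsequent Kalman update would fold in the R-CNN's regressed box as the observation.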