@daiict.ac.in
Assistant Professor
DAIICT
I am currently working as an Assistant Professor at Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar. I have been a Post-Doctoral Fellow with the Image Processing and Computer Vision (IPCV) Lab, Department of Electrical Engineering, IIT Madras, under the supervision of Prof. A. N. Rajagopalan. I received my PhD from the School of Computing and Electrical Engineering, IIT Mandi, under the supervision of Dr. Anil K. Sao. The title of my PhD thesis is "Novel Approaches for Super Resolution of Intensity/Range Image using Sparse Representation".
Ph.D., B.Tech
Image Processing, Computer Vision, Machine Learning
Shradha Makhija, Srimanta Mandal, Utkarsh Pandya, Sanid Chirakkal, and Deepak Putrevu
Springer Nature Switzerland
Manali Bhavsar and Srimanta Mandal
Springer Nature Switzerland
Rushi Vachhani, Srimanta Mandal, and Bakul Gohel
IEEE
Face recognition systems have advanced significantly in recent times with the emergence of deep learning approaches. However, low-resolution face recognition is still a challenging task, especially when the images are captured in unconstrained conditions. Such conditions are common in surveillance applications, where the captured facial images often contain blur, non-uniform illumination, non-frontal pose, and occlusions. Further, the resolution of the images is often restricted due to different constraints. In this paper, we propose a network architecture consisting of a multi-stream CNN based on the Siamese framework for low-resolution face recognition. The multiple streams of the CNN extract complementary discriminative information from face images suitable for recognition. The proposed architecture is evaluated on the SCface, LFW, and YouTube Faces datasets.
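A minimal sketch of the multi-stream Siamese idea described above, not the published architecture: two CNN streams with different kernel sizes (an illustrative assumption) extract complementary features that are concatenated into a shared embedding, and the same weights process both images of a pair.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Stream(nn.Module):
    """One CNN stream; streams with different kernel sizes can pick up
    complementary cues from the same face (an assumption here)."""
    def __init__(self, k):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, k, padding=k // 2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, k, padding=k // 2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
    def forward(self, x):
        return self.net(x).flatten(1)

class MultiStreamSiamese(nn.Module):
    def __init__(self):
        super().__init__()
        self.streams = nn.ModuleList([Stream(3), Stream(5)])
        self.embed = nn.Linear(2 * 64, 128)

    def encode(self, x):
        # Concatenate complementary features from all streams.
        return self.embed(torch.cat([s(x) for s in self.streams], dim=1))

    def forward(self, a, b):
        # Shared weights on both branches: the Siamese constraint.
        return F.pairwise_distance(self.encode(a), self.encode(b))
```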
Akash Dhedhi, Srimanta Mandal, and Rajib Lochan Das
IEEE
The availability of dehazing datasets has enabled various deep learning techniques to perform effectively on hazy images. Most of the developed frameworks focus on removing homogeneous haze. However, homogeneous-centric methods produce sub-optimal results on non-homogeneous haze. The primary reason is that architectures devised to handle homogeneous haze fail to address the non-uniformity of haze in the non-homogeneous case. The secondary reason is the unavailability of enough data for the non-homogeneous scenario. Although many works cite the lack of data as a primary concern for poor performance, we find that the results are sub-standard even if the homogeneous-centric networks are trained with non-homogeneous data. Hence, there is a requirement for a network architecture that can handle non-homogeneous haze in a better way. In this work, we propose to use multiple attention mechanisms in parallel along with pre-trained ConvNeXt blocks. Specifically, we use pixel, channel, and residual channel attention mechanisms. Pixel attention can complement channel attention in dealing with space-variant haze when connected in parallel. On the other hand, residual channel attention fetches hazy-image-related features and caters to better information flow toward the output. By concatenating the attention-based features, the proposed method yields better results than existing approaches.
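A hedged sketch of the parallel-attention idea: pixel and channel attention operate on the same feature map, and their outputs are concatenated and fused. Channel counts, the reduction ratio, and the fusion layer are assumptions; the pre-trained ConvNeXt blocks and residual channel attention are omitted for brevity.

```python
import torch
import torch.nn as nn

class PixelAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.pa = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.pa(x)   # per-pixel weights for space-variant haze

class ChannelAttention(nn.Module):
    def __init__(self, c, r=8):
        super().__init__()
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // r, 1), nn.ReLU(),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.ca(x)   # per-channel reweighting

class ParallelAttention(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.pixel, self.channel = PixelAttention(c), ChannelAttention(c)
        self.fuse = nn.Conv2d(2 * c, c, 1)
    def forward(self, x):
        # Run both attentions on the same features and concatenate.
        return self.fuse(torch.cat([self.pixel(x), self.channel(x)], dim=1))
```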
Keval Rajyaguru, Srimanta Mandal, and Suman K. Mitra
IEEE
Facial expression transfer is an important problem in the computer vision community due to its applicability in human-computer interaction, facial animation, etc. Most of the existing approaches mainly deal with images; expression transfer to videos is yet to be explored. Extending existing image-based approaches to videos often leads to flickering artifacts in the result, due to the temporal inconsistency produced by processing frames independently. In this paper, we address this problem by proposing a combination of loss functions in association with a deep learning-based method. An image reconstruction loss takes care of spatial-domain consistency, while flow and warp losses address temporal consistency. By maintaining consistency both spatially and temporally, our method produces visually plausible results. The efficacy of our method is further explored in related problems such as video dehazing and video colorization.
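A sketch of the temporal-consistency term as we read it: warp the previous output frame to the current one using optical flow and penalize the difference (in the spirit of the flow and warp losses above). The flow estimator is assumed given; the warping follows the standard grid_sample convention.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (N,C,H,W) by `flow` (N,2,H,W) in pixels."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2,H,W)
    coords = grid.unsqueeze(0) + flow
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    coords_x = 2 * coords[:, 0] / (w - 1) - 1
    coords_y = 2 * coords[:, 1] / (h - 1) - 1
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)  # (N,H,W,2)
    return F.grid_sample(frame, grid_norm, align_corners=True)

def temporal_loss(out_t, out_prev, flow_prev_to_t):
    # Penalize flicker: the current output should agree with the
    # flow-warped previous output.
    return F.l1_loss(out_t, warp(out_prev, flow_prev_to_t))
```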
Urvi Oza, Arpit Pipara, Srimanta Mandal, and Pankaj Kumar
IEEE
Image colorization is a technique to add color values to a grayscale image by learning the mapping between input intensity and probable chrominance values. This paper proposes an end-to-end image colorization architecture based on an ensemble of deep convolutional neural networks (DCNN). The ensemble DCNN architecture for image colorization is our novel contribution. The architecture takes inspiration from the encoder-decoder design. The encoder comprises various pre-trained DCNN models, and the decoder consists of a series of convolution and up-sampling layers. The decoder enables the merging and propagation of multi-level features from the DCNN models. Further, we explore different fusion strategies to combine multi-level features from the DCNNs as part of the ensemble encoder. We have experimented with the DIV2K and CIFAR10 datasets. The performance of our proposed approach is evaluated in terms of subjective and reference-based image quality assessment metrics, which show that our results are highly competitive with existing approaches.
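An illustrative sketch of the ensemble-encoder design, not the paper's exact configuration: features from two pre-trained backbones are fused by concatenation and decoded into chrominance. The choice of ResNet-18 and VGG-16, the cut points, and the decoder depth are all assumptions; input height and width are assumed divisible by 8.

```python
import torch
import torch.nn as nn
from torchvision import models

class EnsembleColorizer(nn.Module):
    def __init__(self):
        super().__init__()
        r = models.resnet18(weights=None)   # load pre-trained weights in practice
        self.enc_a = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool,
                                   r.layer1, r.layer2)       # 128 ch at 1/8 scale
        self.enc_b = models.vgg16(weights=None).features[:16]  # 256 ch at 1/4 scale
        self.decoder = nn.Sequential(
            nn.Conv2d(128 + 256, 128, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),  # ab channels in [-1, 1]
        )

    def forward(self, gray):
        x = gray.repeat(1, 3, 1, 1)          # backbones expect 3 channels
        fa, fb = self.enc_a(x), self.enc_b(x)
        fb = nn.functional.interpolate(fb, size=fa.shape[-2:])
        return self.decoder(torch.cat([fa, fb], dim=1))  # concatenation fusion
```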
Shivangi Gajjar, Avik Hati, Shruti Bhilare, and Srimanta Mandal
IEEE
Deep neural network (DNN) models have gained popularity for most image classification problems. However, DNNs also have numerous vulnerable areas. These vulnerabilities can be exploited by an adversary to execute a successful adversarial attack, which is an algorithm to generate perturbed inputs that can fool a well-trained DNN. Among various existing adversarial attacks, DeepFool, a white-box untargeted attack, is considered one of the most reliable algorithms to compute adversarial perturbations. However, in some scenarios such as person recognition, an adversary might want to carry out a targeted attack such that the input gets misclassified into a specific target class. Moreover, studies show that defense against a targeted attack is tougher than against an untargeted one. Hence, generating a targeted adversarial example is desirable from an attacker's perspective. In this paper, we propose 'Targeted DeepFool', which is based on computing the minimal amount of perturbation required to reach the target hyperplane. The proposed algorithm produces a minimal amount of distortion on conventional image datasets: MNIST and CIFAR10. Further, Targeted DeepFool shows excellent performance in terms of adversarial success rate.
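A sketch of the targeted linearized step at the heart of such a method: move the input toward the hyperplane separating the current class from a chosen target class with the minimal L2 perturbation. One step is shown; in practice the step is iterated until the prediction flips. Names are our own, and a batch size of one is assumed.

```python
import torch

def targeted_deepfool_step(model, x, target):
    """One linearized step toward the target class (batch size 1)."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    orig = logits.argmax(dim=1).item()
    # f is the logit gap between target and current class; its zero
    # level set is the (linearized) decision boundary to cross.
    f = logits[0, target] - logits[0, orig]
    f.backward()
    w = x.grad                                   # gradient of the gap
    # Minimal L2 perturbation onto the hyperplane f(x) = 0:
    # r = |f| / ||w||^2 * w.
    r = (f.abs() / (w.norm() ** 2 + 1e-12)) * w
    return (x + r).detach(), r.detach()
```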
Meet Shah, Srimanta Mandal, Shruti Bhilare, and Avik Hati
IEEE
Despite high prediction accuracy, deep networks are vulnerable to adversarial attacks, designed by inducing human-indiscernible perturbations to clean images. Hence, adversarial samples can mislead already trained deep networks. The process of generating adversarial examples can assist us in investigating the robustness of different models. Many developed adversarial attacks often fail under challenging black-box settings. Hence, the transferability of adversarial attacks to an unknown model needs to be improved. In this respect, we propose to increase the rate of transferability by inducing linearity in a few intermediate layers of the architecture. The proposed design does not disturb the original architecture much. The design focuses on the significance of intermediate layers in generating feature maps suitable for a task. By analyzing the intermediate feature maps of the architecture, a particular layer can be perturbed more to improve transferability. The performance is further enhanced by considering diverse input patterns. Experimental results demonstrate the success of our proposition in increasing transferability.
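A small hedged sketch of measuring intermediate-feature distortion with a forward hook, the quantity one would maximize to push perturbations into a chosen layer; which layer to hook follows the feature-map analysis mentioned above and is an assumption here.

```python
import torch

def feature_distortion(model, layer, x_clean, x_adv):
    """L2 distortion of one intermediate layer's feature map."""
    feats = {}
    def hook(_, __, out):
        feats["f"] = out
    h = layer.register_forward_hook(hook)
    model(x_clean)
    f_clean = feats["f"].detach()
    model(x_adv)
    f_adv = feats["f"]
    h.remove()
    # Larger intermediate-feature distortion tends to transfer better.
    return (f_adv - f_clean).norm()
```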
Manan Gajjar and Srimanta Mandal
Springer International Publishing
Nilam Chaudhari, Suman K. Mitra, Srimanta Mandal, Sanid Chirakkal, Deepak Putrevu, and Arundhati Misra
Springer International Publishing
Nilam Chaudhari, Suman K. Mitra, Srimanta Mandal, Sanid Chirakkal, Deepak Putrevu, and Arundhati Misra
Purva Mhasakar, Prapti Trivedi, Srimanta Mandal, and Suman K. Mitra
Springer Science and Business Media LLC
Nilam Chaudhari, Suman K. Mitra, Sanid Chirakkal, Srimanta Mandal, Deepak Putrevu, and Arundhati Misra
SPIE-Intl Soc Optical Eng
Discrimination of crop varieties spanning heterogeneous agricultural land is a vital application of polarimetric SAR images for agriculture monitoring and assessment. The covariance matrix of polarimetric SAR images is observed to follow a complex Wishart distribution for most classification tasks. This holds for homogeneous regions, but for heterogeneous regions, the covariance matrix follows a mixture of multiple Wishart distributions. We aim to improve the classification accuracy when the terrain under observation is heterogeneous. For this purpose, a Wishart mixture model is employed along with the expectation-maximization (EM) algorithm for parameter estimation. The elbow method helps us determine the number of mixture components. The convergence of the EM algorithm depends on the choice of initial points, so to improve the robustness of the model, different initialization approaches, such as random, K-means, and global K-means, are embedded in the EM algorithm. Further, the degrees of freedom is one of the crucial parameters of the Wishart distribution; therefore, its impact on classification accuracy is analyzed. The method equipped with the best initialization technique and optimal degrees of freedom is assessed on three fully polarimetric SAR datasets of agricultural land. The first two are benchmark datasets of the Flevoland region, Netherlands, acquired by the AIRSAR sensor, and the third is our study area of Mysore, India, acquired by the RADARSAT-2 sensor.
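A schematic EM loop for the Wishart mixture, written as a sketch rather than the published estimator: the scaled complex Wishart log-density is used only up to terms that do not depend on the component, which suffices for the E-step responsibilities. The random hard-assignment initialization (K-means over features of C works better, as discussed above), the number of looks L, and the fixed iteration count are assumptions.

```python
import numpy as np

def em_wishart(C, K, L, n_iter=50):
    """C: (n, d, d) Hermitian sample covariance matrices; K components."""
    n, d, _ = C.shape
    z = np.random.randint(K, size=n)        # crude hard-assignment init
    pi = np.full(K, 1.0 / K)
    sigma = np.stack([C[z == k].mean(axis=0) for k in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities from the component-dependent part of
        # the scaled complex Wishart log-density: -L(ln|Sigma_k| + tr(Sigma_k^-1 C)).
        logr = np.empty((n, K))
        for k in range(K):
            inv = np.linalg.inv(sigma[k])
            _, logdet = np.linalg.slogdet(sigma[k])
            tr = np.einsum("ij,nji->n", inv, C).real
            logr[:, k] = np.log(pi[k]) - L * (logdet + tr)
        logr -= logr.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means update the scale matrices.
        nk = r.sum(axis=0)
        pi = nk / n
        sigma = np.einsum("nk,nij->kij", r, C) / nk[:, None, None]
    return pi, sigma, r
```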
Pious Pradhan, Alokendu Mazumder, Srimanta Mandal, and Badri N Subudhi
IEEE
Underwater image enhancement is considered one of the prime research areas due to its massive significance in underwater surveillance and the development of autonomous underwater robotics. Deep learning methods have been used for such image processing, where heavy models like GANs and very deep CNNs are deployed for the task. Due to their bulky nature, these models consume significant memory and are computationally expensive, making them inefficient to some degree for underwater exploration tasks. Moreover, these models are primarily trained on synthetically generated data, which makes them less applicable to real-world tasks. This paper proposes a deep network architecture that uses a series of convolutional blocks to fuse significant complementary features of two separately enhanced versions of the input image along with the input itself. Further, a combination of perceptual and structural similarity losses is used to compute the error. We benchmark our model on three underwater datasets, highlighting its generalization capability over a mix of real-world and synthetic data.
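A light sketch of the fusion idea: two pre-enhanced versions of the input (for example, white-balanced and contrast-stretched, both assumptions) are concatenated with the input itself and fused by a small stack of convolutional blocks, keeping the model far lighter than a GAN.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, c=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(9, c, 3, padding=1), nn.ReLU(),  # 3 images x 3 channels
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x, enh1, enh2):
        # Fuse complementary cues from both enhanced versions and the input.
        return self.net(torch.cat([x, enh1, enh2], dim=1))
```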
Arpit Pipara, Urvi Oza, and Srimanta Mandal
IEEE
Underwater image color correction has been gaining traction due to its use in marine biology and surveillance. Color-corrected images also help marine archaeologists locate objects. An underwater image suffers from various degradations depending on the depth at which it is taken. In this paper, we propose an alternative way to correct the color of underwater images: we address the problem of underwater image color correction as a colorization task. For this purpose, we propose a deep learning architecture that comprises an ensemble encoder and a decoder. The ensemble encoder uses pre-trained networks to extract multi-level features. These features are then fused together and used by the decoder to generate the color-corrected output. We evaluate the performance of our model using reference-based as well as no-reference metrics. The metrics indicate that the produced results are in line with the human perceptual system.
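A sketch of the colorization reformulation: keep only the luminance of the degraded underwater image and let a trained model re-predict the chrominance in Lab space. Here `model` is a placeholder for the ensemble encoder-decoder described above.

```python
import numpy as np
from skimage import color

def color_correct(rgb, model):
    """rgb: (H, W, 3) float image in [0, 1]; model predicts ab channels."""
    lab = color.rgb2lab(rgb)                 # degraded colors discarded below
    L = lab[..., :1]                         # keep luminance only
    ab = model(L)                            # predicted chrominance (H, W, 2)
    return color.lab2rgb(np.concatenate([L, ab], axis=-1))
```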
Purva Mhasakar, Srimanta Mandal, and Suman K. Mitra
Springer Singapore
Parita Chavda, Srimanta Mandal, and Suman K. Mitra
Springer Singapore
Kuldeep Purohit, Srimanta Mandal, and A.N. Rajagopalan
Elsevier BV
Efficiency of gradient propagation in the intermediate layers of convolutional neural networks is of key importance for the super-resolution task. To this end, we propose a deep architecture for single image super-resolution (SISR), which is built using efficient convolutional units we refer to as mixed-dense connection blocks (MDCB). The design of MDCB combines the strengths of both residual and dense connection strategies while overcoming their limitations. To enable super-resolution for multiple factors, we propose a scale-recurrent framework which reuses the filters learnt for lower scale factors recursively for higher factors. This leads to improved performance and promotes parametric efficiency for higher factors. We train two versions of our network to enhance complementary image qualities using different loss configurations. We further employ our network for the video super-resolution task, where it learns to aggregate information from multiple frames and maintain spatio-temporal consistency. The proposed networks lead to qualitative and quantitative improvements over state-of-the-art techniques on image and video super-resolution benchmarks.
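An illustrative mixed-dense connection block in the spirit of the MDCB description: dense concatenation inside the block plus a residual shortcut around it. Growth rate, depth, and channel counts are assumptions, not the published values.

```python
import torch
import torch.nn as nn

class MDCB(nn.Module):
    def __init__(self, c=64, growth=32, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(c + i * growth, growth, 3, padding=1)
             for i in range(layers)])
        self.fuse = nn.Conv2d(c + layers * growth, c, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            # Dense connectivity: each layer sees all previous features.
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        # Residual connectivity: shortcut around the dense stack.
        return x + self.fuse(torch.cat(feats, dim=1))
```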
Bhavya Shah, Krutarth Bhatt, Srimanta Mandal, and Suman K. Mitra
Springer International Publishing
Facial emotion recognition plays an important role in day-to-day activities. To address this, we propose a novel encoder/decoder network, EmotionCaps, which models facial images using matrix capsules, where hierarchical pose relationships between facial parts are built into the internal representations. An optimal number of capsules and their dimension are chosen, as these hyper-parameters play an important role in capturing the complex facial pose relationships. Further, a batch normalization layer is introduced to expedite convergence. To show the effectiveness of our network, EmotionCaps is evaluated on seven basic emotions over a wide range of head orientations. Additionally, our method can analyze facial images quite accurately even in the presence of noise and blur.
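A minimal sketch of forming primary matrix capsules from a convolutional feature map: each capsule carries a 4x4 pose matrix plus an activation probability, and the number of capsule types is exactly the kind of hyper-parameter discussed above. EM routing between capsule layers is omitted, and the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class PrimaryMatrixCaps(nn.Module):
    def __init__(self, in_ch=64, caps=8, pose=4):
        super().__init__()
        self.caps, self.pose = caps, pose
        self.pose_conv = nn.Conv2d(in_ch, caps * pose * pose, 1)
        self.act_conv = nn.Sequential(nn.Conv2d(in_ch, caps, 1), nn.Sigmoid())

    def forward(self, x):
        n, _, h, w = x.shape
        # Pose matrices encode part-whole geometric relationships.
        poses = self.pose_conv(x).view(n, self.caps, self.pose, self.pose, h, w)
        acts = self.act_conv(x)   # capsule activation probabilities
        return poses, acts
```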
Srimanta Mandal and A. N. Rajagopalan
IEEE
The atmospheric medium often constrains the visibility of outdoor scenes due to the scattering of light rays. This causes attenuation in the irradiance reaching the imaging device, along with an additive component that renders a hazy effect in the image. The visibility is further reduced for poorly illuminated scenes. The attenuation becomes wavelength dependent in the underwater scenario, causing an undesired color cast along with the hazy effect. In order to suppress the effect of different atmospheric/underwater conditions such as haze and to enhance the contrast of such images, we reformulate local haziness in a generalized manner. The parameters are estimated by harnessing the similarity of patches within a local neighborhood. Unlike existing methods, our approach is developed on the assumption that, for outdoor scenes, the depth of patches changes gradually in a local neighborhood surrounding the patch. This change in depth can be approximated by patch similarity in that neighborhood. As the attenuation in the irradiance of an image in the presence of an atmospheric medium relies on the depth of the scene, the coefficients related to the attenuation are estimated from the weights of patch similarity. The additive haze effect is deduced using the non-local mean of the patch. Our experimental results demonstrate the effectiveness of our approach in reducing the haze component as well as in enhancing the image under different conditions of haze (daytime, nighttime, and underwater).
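A compact sketch of the patch-similarity machinery: Gaussian weights computed from patch distances within a local search window yield both the non-local mean (for the additive haze term) and the weights from which attenuation-related coefficients can be estimated. A grayscale image, an interior pixel, and the window size, patch size, and bandwidth h are all assumptions.

```python
import numpy as np

def nonlocal_weights(img, y, x, patch=3, window=7, h=0.1):
    """img: 2-D float array; (y, x) must be an interior pixel."""
    p, r = patch // 2, window // 2
    ref = img[y - p:y + p + 1, x - p:x + p + 1]
    weights, centers = [], []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            cand = img[yy - p:yy + p + 1, xx - p:xx + p + 1]
            d2 = np.mean((ref - cand) ** 2)
            # Gradual-depth assumption: nearby similar patches get
            # high weight, dissimilar ones are suppressed.
            weights.append(np.exp(-d2 / h ** 2))
            centers.append(img[yy, xx])
    w = np.array(weights)
    w /= w.sum()
    return w, float(np.dot(w, np.array(centers)))  # weights, non-local mean
```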
Kuldeep Purohit, Srimanta Mandal, and A. N. Rajagopalan
The Optical Society
Attenuation and scattering of light are responsible for haziness in images of underwater scenes. To reduce this effect, we propose an approach for single-image dehazing by multilevel weighted enhancement of the image. The underlying principle is that enhancement at different levels of detail can undo the degradation caused by underwater haze. The depth information is captured implicitly while going through different levels of detail, owing to the depth-variant nature of haze. Hence, we judiciously assign weights to different levels of image detail and reveal that their linear combination, along with the coarsest information, can successfully restore the image. Results demonstrate the efficacy of our approach as compared to state-of-the-art underwater dehazing methods.
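A sketch of the multilevel scheme as we read it: split the image into detail layers by Gaussian smoothing at increasing scales and recombine the coarsest layer with weighted details. The number of levels and the weights are illustrative, not the published values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multilevel_enhance(img, sigmas=(1, 2, 4), weights=(1.5, 1.8, 2.0)):
    """img: 2-D float array in [0, 1]."""
    details, base = [], img.astype(float)
    for s in sigmas:
        smooth = gaussian_filter(base, sigma=s)
        details.append(base - smooth)   # detail layer at this scale
        base = smooth                   # pass the coarser image down
    out = base                          # coarsest information
    for d, w in zip(details, weights):
        out = out + w * d               # linear, level-wise weighting
    return np.clip(out, 0, 1)
```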
Kuldeep Purohit, Srimanta Mandal, and A. N. Rajagopalan
Springer International Publishing
Recent advances in the design of convolutional neural networks (CNN) have yielded significant improvements in the performance of image super-resolution (SR). The boost in performance can be attributed to the presence of residual or dense connections within the intermediate layers of these networks. The efficient combination of such connections can reduce the number of parameters drastically while maintaining the restoration quality. In this paper, we propose a scale-recurrent SR architecture built upon units containing a series of dense connections within a residual block (Residual Dense Blocks (RDBs)) that allow extraction of abundant local features from the image. Our scale-recurrent design delivers competitive performance for higher scale factors while being parametrically more efficient compared to current state-of-the-art approaches. To further improve the performance of our network, we employ multiple residual connections in intermediate layers (referred to as Multi-Residual Dense Blocks), which improves gradient propagation through existing layers. Recent works have discovered that conventional loss functions can guide a network to produce results which have high PSNRs but are perceptually inferior. We mitigate this issue by utilizing a Generative Adversarial Network (GAN) based framework and deep feature (VGG) losses to train our network. We experimentally demonstrate that different weighted combinations of the VGG loss and the adversarial loss enable our network outputs to traverse along the perception-distortion curve. The proposed networks perform favorably against existing methods, both perceptually and objectively (PSNR-based), with fewer parameters.
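A hedged sketch of the weighted loss that traverses the perception-distortion curve: a VGG-feature (perceptual) term plus an adversarial term, with alpha and beta as the trade-off weights. The VGG cut point and the weight values are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen VGG-19 feature extractor (load pre-trained weights in practice).
vgg_feat = models.vgg19(weights=None).features[:36].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def generator_loss(sr, hr, disc_logits, alpha=1.0, beta=5e-3):
    # Perceptual term: distance in deep feature space.
    perceptual = nn.functional.l1_loss(vgg_feat(sr), vgg_feat(hr))
    # Adversarial term: make the discriminator label outputs as real.
    adversarial = nn.functional.binary_cross_entropy_with_logits(
        disc_logits, torch.ones_like(disc_logits))
    # Varying (alpha, beta) moves along the perception-distortion curve.
    return alpha * perceptual + beta * adversarial
```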
Srimanta Mandal, Kuldeep Purohit, and A. N. Rajagopalan
ACM
In practice, images can contain different amounts of noise in different color channels, which is not acknowledged by existing super-resolution approaches. In this paper, we propose to super-resolve noisy color images by considering the color channels jointly. Noise statistics are blindly estimated from the input low-resolution image and are used to assign different weights to different color channels in the data cost. The implicit low-rank structure of visual data is enforced via nuclear norm minimization in association with adaptive weights, added as a regularization term to the cost. Additionally, multi-scale details of the image are added to the model through another regularization term that involves projection onto a PCA basis, constructed using similar patches extracted across different scales of the input image. The results demonstrate the super-resolving capability of the approach in real scenarios.
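The core numerical step, sketched: weighted singular-value soft thresholding, the proximal operator behind nuclear-norm-regularized low-rank recovery. The per-value weights realize the adaptive weighting mentioned above; the example weighting scheme below is an assumption.

```python
import numpy as np

def weighted_svt(M, weights):
    """Shrink each singular value of M by its own weight."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)   # soft threshold per value
    return (U * s_shrunk) @ Vt

# Example: shrink small (noise-dominated) singular values more strongly.
M = np.random.randn(16, 16)
s = np.linalg.svd(M, compute_uv=False)
w = 0.5 / (s + 1e-8)
w = w / w.max()
print(np.linalg.matrix_rank(weighted_svt(M, w)))  # rank is reduced
```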
Srimanta Mandal and A. N. Rajagopalan
Springer Singapore
Super-resolving a noisy image is a challenging problem that needs special care compared to conventional super-resolution approaches, especially when the noise power is unknown. In this scenario, we propose an approach to super-resolve a single noisy image by minimizing the nuclear norm in a virtual sparse domain that tunes to the noise power via parameter learning. The approach minimizes the nuclear norm to exploit the inherent low-rank structure of visual data, and is further augmented with coarse-to-fine information by adaptively re-aligning the data along the principal components of a dictionary in the virtual sparse domain. The experimental results demonstrate the robustness of our approach across different noise powers.
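A sketch of the re-alignment idea as we read it: project patch data onto the principal components of a dictionary before low-rank shrinkage (e.g., the weighted_svt above). The random dictionary and the centering choice are purely for illustration.

```python
import numpy as np

def align_to_dictionary(patches, dictionary):
    """patches: (n, d) vectorized patches; dictionary: (m, d) atoms.
    Returns coefficients in the dictionary's PCA basis plus the basis
    itself, so the projection can be inverted after shrinkage."""
    atoms = dictionary - dictionary.mean(axis=0)
    _, _, Vt = np.linalg.svd(atoms, full_matrices=False)
    return patches @ Vt.T, Vt   # re-aligned data, principal directions

# Illustration with a random dictionary (assumption, not a learned one).
D = np.random.randn(256, 64)
P = np.random.randn(100, 64)
coeffs, basis = align_to_dictionary(P, D)
print(np.allclose(coeffs @ basis, P))   # lossless when the basis is full
```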
Seema Kumari, Srimanta Mandal, and Arnav Bhavsar
Springer Singapore
For the image denoising task, the prior information obtained by grouping similar non-local patches has been shown to serve as an effective regularizer. Nevertheless, noise may create ambiguity in grouping similar patches and hence may degrade the results. However, most non-local similarity based approaches do not address this issue of noisy grouping. Hence, we propose to denoise an image by mitigating the issue of grouping non-local similar patches in the presence of noise, using sparsity and edge-preserving constraints in the transform domain. The effectiveness of transform-domain grouping of patches is utilized for learning dictionaries, and is further extended to achieve an initial approximation of the sparse coefficient vector for the clean image patches. We demonstrate the results of effective grouping of similar patches in denoising both intensity and range images.
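A sketch of transform-domain grouping: patches are compared after a 2-D DCT with small coefficients hard-thresholded away, so noise perturbs the distance less than in the pixel domain. Patch size, threshold, and the group size k are assumptions.

```python
import numpy as np
from scipy.fft import dctn

def dct_signature(patch, thresh=0.1):
    """2-D DCT of a patch with noise-dominated coefficients zeroed."""
    coeff = dctn(patch, norm="ortho")
    coeff[np.abs(coeff) < thresh] = 0.0
    return coeff

def group_similar(patches, ref_idx, k=8):
    """Return indices of the k patches closest to patches[ref_idx],
    measured in the thresholded transform domain."""
    ref = dct_signature(patches[ref_idx])
    dists = [np.sum((dct_signature(p) - ref) ** 2) for p in patches]
    return np.argsort(dists)[:k]
```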