@uaeu.ac.ae
United Arab Emirates University
Mubarak Albarka Umar, Najah AbuAli, Khaled Shuaib, and Ali Ismail Awad
Elsevier BV
Mubarak Albarka Umar, Zhanfang Chen, Khaled Shuaib, and Yan Liu
Elsevier BV
Mubarak Albarka Umar and Khaled Shuaib
IEEE
The advancement of smart grids has addressed many challenges of traditional power grids, but it has also introduced new vulnerabilities to cyber-attacks that can disrupt power delivery and cause severe socio-economic impacts such as blackouts and grid disturbances. Although numerous supervised machine learning methods have been proposed to detect cyber-attacks in smart grids, they require a large dataset of both normal and attack instances for training. However, gathering sufficient samples of diverse attack scenarios, especially zero-day attacks, is challenging. In this paper, we develop a semi-supervised model to detect grid attacks using phasor measurement unit (PMU) data from a power system dataset. Principal component analysis (PCA) is applied to select the optimal components, and the model is trained on a large number of normal event instances, which enables it to identify new, unknown attack patterns. We also developed a supervised model for comparison and evaluated both using key metrics. Results demonstrate that the semi-supervised model is more effective in detecting attack events, with 91.2% precision versus 90.7% for the supervised approach, although its accuracy is slightly lower (90% versus 91.8%).
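The abstract does not say how the normal-only model scores new samples. One common way to build a semi-supervised detector on top of PCA is sketched below, assuming a reconstruction-error rule: fit PCA on normal PMU feature vectors only, keep enough components to reach a variance threshold, and flag samples whose reconstruction error exceeds a percentile of the training errors. The function names, the 95% variance threshold and the 99th-percentile cut-off are illustrative assumptions, not the paper's method.

```python
import numpy as np


def fit_pca(X_normal, var_threshold=0.95):
    """Fit PCA on normal-only data; keep the fewest components whose
    cumulative explained variance reaches var_threshold (assumed value)."""
    mean = X_normal.mean(axis=0)
    _, s, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))  # guard against floating-point round-off
    return mean, vt[:k]


def reconstruction_error(X, mean, components):
    """Squared distance between each sample and its projection onto the
    retained principal subspace; used here as the anomaly score."""
    centered = X - mean
    reconstructed = (centered @ components.T) @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


def fit_threshold(train_errors, percentile=99.0):
    """Hypothetical rule: anything above this percentile of the errors
    seen on normal training data is treated as an attack."""
    return float(np.percentile(train_errors, percentile))


def detect(X, mean, components, threshold):
    """Return a boolean array, True where a sample is flagged as an attack."""
    return reconstruction_error(X, mean, components) > threshold
```

Because the model never sees attack samples, any pattern that leaves the "normal" subspace raises the score, which is what lets this style of detector react to unseen attacks. The percentile rule trades roughly 1% false alarms on normal data for sensitivity and would need tuning on real PMU data.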
Mubarak Albarka Umar
IEEE
Cloud computing and virtualization are fundamental to modern computer system design. As cloud computing adoption grows across organizations, evaluating its performance becomes essential. This study simulates and analyzes a cloud datacenter's performance using queuing models. Specifically, an Mt/M/1/K queuing system is employed, with arrival parameters estimated from real CPU utilization data from the Bitbrains datacenter and modeled as discrete homogeneous Poisson processes (HPPs). The simulation results reveal a minimal average waiting time and efficient task processing with low delays. The study also highlights the significant impact of service rates on the average response times of tasks arriving as discrete HPPs. These findings offer valuable insights into cloud datacenter performance and can inform decisions on service upgrades and optimal resource utilization.
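To make the queuing model concrete, the sketch below computes the standard steady-state metrics of an M/M/1/K queue and applies them interval by interval, treating each interval's arrivals as a separate homogeneous Poisson process. This quasi-steady-state treatment of the Mt arrivals is an assumption for illustration (it is reasonable only when intervals are long compared with service times); the study itself reports simulation results, and the helper names here are hypothetical.

```python
import math


def mm1k_metrics(lam, mu, K):
    """Steady-state metrics of an M/M/1/K queue: one server, system
    capacity K (including the task in service), arrival rate lam and
    service rate mu in the same time unit."""
    if lam <= 0 or mu <= 0 or K < 1:
        raise ValueError("need lam > 0, mu > 0 and K >= 1")
    rho = lam / mu
    if math.isclose(rho, 1.0):
        probs = [1.0 / (K + 1)] * (K + 1)  # states equally likely when rho == 1
    else:
        p0 = (1 - rho) / (1 - rho ** (K + 1))
        probs = [p0 * rho**n for n in range(K + 1)]
    L = sum(n * p for n, p in enumerate(probs))  # mean number of tasks in system
    p_block = probs[K]  # tasks arriving to a full system are rejected
    lam_eff = lam * (1 - p_block)  # rate of accepted tasks
    W = L / lam_eff  # mean response time, by Little's law
    Wq = W - 1 / mu  # mean waiting time before service starts
    return {"p_block": p_block, "L": L, "W": W, "Wq": Wq}


def piecewise_metrics(rates, mu, K):
    """Approximate Mt/M/1/K by evaluating each interval's arrival rate
    (e.g. one estimated per CPU-utilization window) as its own HPP."""
    return [mm1k_metrics(lam, mu, K) for lam in rates]
```

For example, `mm1k_metrics(1.0, 2.0, 1)` gives a response time `W` of 0.5 (pure service time, no waiting). Raising `mu` while holding the arrival rates fixed lowers `W` in every interval, which mirrors the study's observation that service rates strongly affect response times.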
Ali Nawaz, Mubarak Albarka Umar, Khaled Shuaib, Amir Ahmad, and Abdelkader Nasreddine Belkacem
IEEE
Cardiovascular disease (CVD) is the leading cause of death globally, claiming millions of lives annually, and about 80% of these deaths are due to arrhythmia. Electrocardiogram (ECG) signals are important for arrhythmia diagnosis, and researchers have used various ECG datasets to build arrhythmia detection systems that automate the time-consuming manual diagnostic process. However, existing datasets suffer from class imbalance, and traditional oversampling and undersampling techniques prove ineffective at handling it. To address this, we propose a novel approach that treats arrhythmia detection as an anomaly detection problem. We first use Generative Adversarial Networks (GANs) to synthetically generate normal training instances from the MIT-BIH arrhythmia dataset, and then use only the synthetic normal data to build the anomaly model with an autoencoder (AE); employing the AE for unsupervised anomaly detection helps overcome GAN convergence issues. We evaluate the model on test data comprising both normal and abnormal samples not used by the GAN and compare its performance with other state-of-the-art works. The model achieved improved arrhythmia detection, with an AUC-ROC of 0.6768 and an AUC-PR of 0.8537. Besides tackling data scarcity and imbalance, this work offers valuable perspectives for enhancing arrhythmia detection systems and provides a foundation for more reliable and adaptable solutions in healthcare.
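Since an AE-based detector outputs a continuous anomaly score (for example, reconstruction error) rather than a label, AUC-ROC summarizes how well that score separates normal and abnormal beats across all thresholds. A minimal sketch follows, assuming abnormal beats are the positive class: it computes AUC-ROC as the Mann-Whitney probability that a random abnormal beat scores higher than a random normal one. The function name is hypothetical, and AUC-PR is not covered here.

```python
def auc_roc(normal_scores, abnormal_scores):
    """AUC-ROC via the Mann-Whitney U statistic: the fraction of
    (abnormal, normal) pairs in which the abnormal sample has the higher
    anomaly score. Tied pairs count as half a win."""
    if not normal_scores or not abnormal_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(abnormal_scores))
```

The pairwise loop is O(n·m), which is fine for illustration; for a full test set, a rank-based formula or a library routine would be faster. A value of 0.5 means the scores carry no ranking information, and 1.0 means every abnormal beat outscores every normal beat.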
Mubarak Albarka Umar, Ali Nawaz, and Tariq Qayyum
IEEE
Cancer accounts for over 10 million deaths worldwide and is the second leading cause of death after cardiovascular disease. It also significantly affects a family's socioeconomic status, and several studies have examined the link between socioeconomic status and cancer. First, this work explores the relationship between socioeconomic status and cancer mortality rate using statistical analysis of disparate open-source data. The data initially contain 34 features, which are reduced to the 13 most relevant features using backward selection. Second, based on the cancer data, we build a linear regression model to predict the cancer death rate. Several candidate models were first built, and linear regression diagnostics were performed on each to check for assumption violations; the most appropriate model was then selected and fine-tuned to optimize results. Model performance is evaluated using R² and RMSE, and the model achieved an R² of 81.12% and an RMSE of 12.23 on test data. Our work also highlights the importance of checking regression assumptions in linear regression modeling.
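The pipeline described above, backward feature selection followed by a linear regression scored with R² and RMSE, can be sketched with NumPy alone. The abstract does not state the elimination criterion (p-values are common); this sketch assumes AIC instead, dropping at each step the feature whose removal lowers AIC most. The function names are hypothetical and the regression diagnostics are not shown.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r2_score(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def aic(X, y):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*(#coefs)."""
    coef = fit_ols(X, y)
    rss = np.sum((y - predict(coef, X)) ** 2)
    n = len(y)
    return n * np.log(rss / n) + 2 * len(coef)


def backward_eliminate(X, y):
    """Repeatedly drop the feature whose removal lowers AIC the most;
    stop when no removal helps. Returns indices of the kept columns."""
    cols = list(range(X.shape[1]))
    current = aic(X[:, cols], y)
    while len(cols) > 1:
        trials = [(aic(X[:, [c for c in cols if c != d]], y), d) for d in cols]
        best_aic, drop = min(trials)
        if best_aic >= current:
            break
        cols.remove(drop)
        current = best_aic
    return cols
```

In practice, the selection would run on the training split only, and R² and RMSE would be reported on held-out test data, as the abstract does.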
Muhammad Danish Waseem, Ali Nawaz, Uzair Rasheed, Abir Raza, and Mubarak Omar Albarka
IEEE
Dengue is a viral disease spread by the mosquito species Aedes aegypti. According to the WHO, an estimated 100 to 400 million dengue infections occur worldwide each year. The dengue mosquito inhabits tropical regions and proliferates in wet climatic conditions. Since it is impossible to completely rid these regions of the mosquito, analyzing the relationship between climatic factors and dengue spread is important for forecasting the number of cases, so that precautionary measures can be taken beforehand to minimize the spread of the disease. Specifically, to predict the spread, we employed two prominent time series models, SARIMA and SARIMAX, on the publicly available DengAI dataset. The models were evaluated using Mean Absolute Error (MAE), achieving MAE scores of 27.39 and 25.52 for SARIMA and SARIMAX, respectively, which shows that our proposed methodology outperformed other existing machine learning methods.
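Two pieces of this setup are simple enough to show directly: the seasonal differencing step that the "S" and "I" parts of SARIMA apply (with seasonal difference order D = 1), and the MAE metric used to compare the models. The sketch below is illustrative only. The season length `s` is a parameter, and the choice of s = 52 for weekly data with yearly seasonality is an assumption, not a value taken from the paper. The full SARIMA/SARIMAX fit itself is left to a library.

```python
def seasonal_difference(y, s):
    """Seasonal differencing (D = 1): z_t = y_t - y_{t-s}."""
    return [y[t] - y[t - s] for t in range(s, len(y))]


def invert_seasonal_difference(history, diffs, s):
    """Undo seasonal differencing: rebuild y_t = z_t + y_{t-s}, starting
    from the first s observed values (e.g. to turn forecasts made on the
    differenced scale back into case counts)."""
    out = list(history[:s])
    for z in diffs:
        out.append(z + out[-s])
    return out


def mae(actual, predicted):
    """Mean Absolute Error between observed and forecast case counts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

The difference between SARIMA and SARIMAX is that SARIMAX also conditions on exogenous regressors, here climatic variables. statsmodels, for instance, provides a `SARIMAX` class that covers both cases.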
Mubarak Albarka Umar, Chen Zhanfang, and Yan Liu
ACM
One of the key challenges of machine learning (ML) based intrusion detection systems (IDS) is expensive computation time, largely caused by the redundant, incomplete, and irrelevant features contained in IDS datasets. To overcome such challenges and build efficient and more accurate IDS models, many researchers use preprocessing techniques such as normalization and feature selection, typically within a hybrid modeling approach. In this work, we propose a hybrid IDS modeling approach with one algorithm for feature selection (FS) and another for building the IDS. The FS method is a wrapper-based FS with a decision tree as the feature evaluator. Five selected ML algorithms are each combined with the proposed FS method to build five IDS models using the UNSW-NB15 dataset. As a baseline, five more IDS models are built in a single modeling approach using the full feature set of the dataset. We evaluate the effectiveness of our proposed method by comparing it with the baseline models and with state-of-the-art works. Our method achieves the best detection rate (DR) of 97.95% and proves quite effective compared to state-of-the-art works. We therefore recommend its use, especially for IDS modeling with the UNSW-NB15 dataset.
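A wrapper FS method scores candidate feature subsets by training a model on them, here a decision tree. The abstract does not specify the search strategy, so the sketch below assumes greedy forward selection: it keeps adding the feature that most improves the evaluator's score and stops when nothing helps. The search is written against a generic `score_fn`, and `decision_tree_scorer` shows one way to plug in a decision tree via scikit-learn's `cross_val_score` (imported only when that helper is called). Both function names and the choice of 3-fold accuracy are illustrative assumptions.

```python
def forward_select(n_features, score_fn, max_features=None, tol=0.0):
    """Greedy wrapper feature selection: at each step add the feature whose
    inclusion gives the highest score_fn(subset); stop when no candidate
    improves the best score by more than tol."""
    selected, remaining = [], list(range(n_features))
    best = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        score, feat = max((score_fn(selected + [f]), f) for f in remaining)
        if score <= best + tol:
            break  # no remaining feature improves the subset
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


def decision_tree_scorer(X, y, cv=3):
    """Return a score_fn that rates a feature subset by the mean
    cross-validated accuracy of a decision tree (needs scikit-learn and
    a NumPy array X)."""
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def score(cols):
        clf = DecisionTreeClassifier(random_state=0)
        return cross_val_score(clf, X[:, cols], y, cv=cv).mean()

    return score
```

Keeping the search separate from the evaluator makes the hybrid design explicit: the tree is used only to choose features, and the selected columns are then passed to whichever of the five IDS classifiers is being built. To optimize for detection rate rather than accuracy on a binary task, `cross_val_score` accepts `scoring="recall"`.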
Le Cui, Libo Cheng, Xiaoming Jiang, Zhanfang Chen, and Albarka
IOS Press