GVDeepNet: Unsupervised Deep Learning Techniques for Effective Genetic Variant Classification

  • Ghulam Muhammad, Department of Computer Science, Bahria University, Karachi, Pakistan
  • Umair Saeed, Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
  • Noman Islam, College of Computing and Information Sciences, PAF KIET, Karachi, Pakistan
  • Kamlesh Kumar, Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
  • Fahad Hussain, Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
  • Mansoor Ahmed Khurro, Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
  • Aftab Ahmed Shaikh, Department of Computer Science, Sindh Madressatul Islam University, Karachi, Pakistan
  • Iqra Ali, Department of Computer System Engineering, NED University of Engineering and Technology, Karachi, Pakistan
Keywords: Genetic Variant, DNA nucleotide sequence, Classification, Deep Learning, Autoencoder, Self-Organizing Map

Abstract

Many lives have been lost to genetic diseases and the inability to identify them. Genetic disorders arise mainly from alterations in the common DNA nucleotide sequence, and benign and pathogenic variants are common classes of these genetic variants. Deliberate gene mutations sometimes produce the required result, but they are also highly likely to produce unexpected outcomes. This paper proposes two unsupervised deep learning methods to classify these genetic changes: a Self-Organizing Map (SOM) and an Autoencoder. SOM is an unsupervised learning technique used to obtain a low-dimensional representation of the data; it was implemented using the MiniSom library. The Autoencoder comprises an encoder and a decoder component: the information encoded by the encoder is decoded by the decoder to obtain a representation as close to the input as possible. The analysis was performed on the publicly available ClinVar dataset, comprising 6 lakh (about 600,000) records. The data was first pre-processed to handle missing and duplicate values. The results showed good performance for both models: the Autoencoder reached an accuracy of 97% on the test data, and the SOM reached 96% on the test data. It is concluded that the unsupervised deep learning models, SOM and Autoencoder, retain enough predictive power to classify a variant and identify whether the underlying alteration in the gene produces positive changes or the contrary.
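
As an illustration of how a SOM can be used for classification, the following is a minimal sketch written with the Python standard library only. It is not the authors' MiniSom-based implementation. The grid size, learning-rate schedule, two-dimensional toy features, and the majority-vote step that assigns a class to each map node are all assumptions made for this example; the abstract does not specify them.

```python
import math
import random
from collections import Counter, defaultdict


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weight vector is closest to x."""
    return min(weights,
               key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights keyed by (row, col)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            decay = 1.0 - step / total_steps   # linear decay (assumed schedule)
            lr = lr0 * decay
            sigma = max(sigma0 * decay, 0.3)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def label_nodes(weights, data, labels):
    """Give each node the majority label of the training samples mapped to it."""
    votes = defaultdict(Counter)
    for x, y in zip(data, labels):
        votes[best_matching_unit(weights, x)][y] += 1
    return {node: c.most_common(1)[0][0] for node, c in votes.items()}


def predict(weights, node_labels, x):
    """Classify x by the label of the nearest labelled node."""
    labelled = {n: weights[n] for n in node_labels}
    return node_labels[best_matching_unit(labelled, x)]


# Hypothetical toy features standing in for pre-processed variant records.
toy_rng = random.Random(1)
benign = [[toy_rng.gauss(0.1, 0.03), toy_rng.gauss(0.1, 0.03)] for _ in range(20)]
pathogenic = [[toy_rng.gauss(0.9, 0.03), toy_rng.gauss(0.9, 0.03)] for _ in range(20)]
data = benign + pathogenic
labels = ["benign"] * 20 + ["pathogenic"] * 20

weights = train_som(data)
node_labels = label_nodes(weights, data, labels)
print(predict(weights, node_labels, [0.12, 0.08]))
print(predict(weights, node_labels, [0.88, 0.93]))
```

The map itself is trained without labels; labels are used only afterwards to name the nodes, which is one common way to turn an unsupervised SOM into a classifier.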

Published
2022-03-09
How to Cite
[1] G. Muhammad, “GVDeepNet: Unsupervised Deep Learning Techniques for Effective Genetic Variant Classification”, PakJET, vol. 5, no. 1, pp. 16-22, Mar. 2022.