dc.contributor.advisor: Joshi, Manjunath V.
dc.contributor.author: Shah, Jeni Snehal
dc.date.accessioned: 2019-03-19T09:30:51Z
dc.date.available: 2019-03-19T09:30:51Z
dc.date.issued: 2018
dc.identifier.citation: Shah, Jeni Snehal (2018). Imbalanced Bioassay Data Classification for Drug Discovery. Dhirubhai Ambani Institute of Information and Communication Technology, ix, 47 p. (Acc. No: T00699)
dc.identifier.uri: http://drsr.daiict.ac.in//handle/123456789/733
dc.description.abstract: Pattern recognition methods generally show inferior performance when the dataset presented to them is imbalanced, i.e., when the samples belonging to one class far outnumber the samples from the other class or classes. For this reason, imbalanced dataset classification has been an active area of research in machine learning. In this thesis, a novel approach to classifying imbalanced bioassay data is presented. Bioassay data classification is an important task in drug discovery. Bioassay data consists of feature descriptors of various compounds and a corresponding label denoting each compound's potency as a drug: active or inactive. This data is highly imbalanced, with the percentage of active compounds ranging from 0.1% to 1.4%, which leads to inaccurate classification of the minority class. The proposed approach trains separate models on different features derived by training stacked autoencoders (SAE). After the features are learned using SAEs, feed-forward neural networks (FNN), trained to minimize a class-sensitive cost function, are used for classification. Before the features are learned, data cleaning is performed by applying the Synthetic Minority Oversampling Technique (SMOTE) and removing Tomek links. Different levels of features can be obtained using SAE. While some active samples may not be correctly classified by a network trained on a certain feature space, they are assumed to be correctly classifiable in another feature space. This is the underlying assumption behind learning hierarchical feature vectors and learning a separate classifier for each feature space.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Pattern recognition
dc.subject: Deep learning
dc.subject: Machine learning
dc.classification.ddc: 005.74 SHA
dc.title: Imbalanced bioassay data classification for drug discovery
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 201611003
dc.accession.number: T00699
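
The abstract mentions training classifiers to minimize a class-sensitive cost function but does not give its form. As an illustration only, one common choice is a binary cross-entropy in which errors on the minority (active) class are weighted more heavily. The function name `class_weighted_bce` and the weights `w_active` and `w_inactive` below are hypothetical and are not taken from the thesis.

```python
import math


def class_weighted_bce(y_true, p_pred, w_active=100.0, w_inactive=1.0, eps=1e-12):
    """Mean binary cross-entropy with per-class weights.

    y_true: iterable of labels, 1 = active (minority), 0 = inactive.
    p_pred: iterable of predicted probabilities of the active class.
    w_active, w_inactive: hypothetical weights (assumption, not from the thesis).
    """
    y_true = list(y_true)
    p_pred = list(p_pred)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -w_active * math.log(p)          # penalty for missing an active
        else:
            total += -w_inactive * math.log(1.0 - p)  # penalty for a false alarm
    return total / len(y_true)


# The same confidence error costs far more on an active sample than on an inactive one.
loss_missed_active = class_weighted_bce([1], [0.1])
loss_false_alarm = class_weighted_bce([0], [0.9])
```

With weights like these, a network that ignores the rare active compounds is penalized heavily, which is the usual motivation for a cost-sensitive loss on data where actives make up only about 0.1% to 1.4% of samples.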

