Simultaneous and Single Gene Expression: Computational Analysis for Malaria Treatment Discovery
Description
The major aim of this work is to develop an efficient and effective k-means algorithm for
clustering malaria microarray data, so that functional relationships among genes can be
extracted for malaria treatment discovery. Microarray experiments generate huge volumes of
data, and biologists face the challenge of extracting useful information from them.
However, traditional k-means and most k-means variants are still computationally expensive
for large datasets such as microarray data, which also have a large dimension d. First, in
this work, we develop a novel k-means algorithm, which is simple but
more efficient than the traditional k-means and the recent enhanced k-means. The new
algorithm saves significant computation time at each iteration and thus achieves an
expected run time of O(nk^2). It is based on the recently established relationship between
principal component analysis and k-means clustering. We further prove theoretically that
the algorithm is correct. Results from testing the algorithm on three biological and three
non-biological datasets also indicate that it is empirically faster than other known
k-means algorithms. We
assessed the quality of the clusters produced by our algorithm against clusters of known
structure using the Hubert-Arabie Adjusted Rand Index (ARI_HA). We found that when k is
close to d, the quality is good (ARI_HA > 0.8), and when k is not close to d, the quality
is excellent (ARI_HA > 0.9). To perform a comparative functional classification of P.
falciparum genes and further validate the effectiveness of our algorithm, we compared the
clusters that three different k-means algorithms, including our novel Metric Matrics
k-means (MMk-means), produced from in vitro microarray data with the classification
obtained from in vivo microarray data. The distribution of the in vitro clusters against
the in vivo clusters is similar for all three algorithms, thereby validating our MMk-means
method and its effectiveness. Lastly, using clustering and R programming (with the
Wilcoxon statistical test), together with new microarray data for P. yoelii at the liver
stage and P. falciparum microarray data at the blood stages, we extracted twenty-nine (29)
viable P. falciparum and P. yoelii genes that can be used to design a Polymerase Chain
Reaction (PCR) primer experiment for detecting malaria at the liver stage. Owing to
intellectual property rights, we are unable to list these genes here.
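
As an illustration of the cluster-quality measure mentioned above, the following is a minimal Python sketch of the Hubert-Arabie Adjusted Rand Index computed from the contingency table of two labelings. It is not the code used in this work; the function name adjusted_rand_index and the toy label lists are assumptions made only for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie Adjusted Rand Index between two flat clusterings.

    Hypothetical helper written for illustration; both arguments are
    equal-length sequences of hashable cluster labels.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency table cells and its row/column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Numbers of point pairs placed together in both, or in each, clustering.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    total_pairs = comb(n, 2)
    expected = sum_rows * sum_cols / total_pairs  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Both partitions are trivial (all singletons or one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    known = [0, 0, 0, 1, 1, 1]      # toy "known structure" labels
    clustered = [1, 1, 1, 0, 0, 0]  # same partition, labels renamed
    print(adjusted_rand_index(known, clustered))
```

The index equals 1 when two partitions agree exactly (regardless of how the clusters are named) and is close to 0 for agreement at chance level, which is why thresholds such as 0.8 and 0.9 are read as good and excellent recovery of the known structure.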
Keywords
QA Mathematics, QA75 Electronic computers. Computer science