Case Study

AI Malware Detection

Published research on neural network-based malware detection using binary data analysis.

Python, Neural Networks, Binary Data Analysis, Research

Overview

This research project resulted in a published book chapter. The goal was to improve malware detection accuracy by applying neural network classification to raw binary executable data, moving beyond traditional signature-based detection methods.

Problem

  • Traditional antivirus tools rely on known signatures, which means they miss zero-day malware and polymorphic variants.
  • Behavioral analysis is effective but resource-intensive and slow to execute at scale.
  • Existing ML approaches often required extensive manual feature engineering, making them difficult to maintain as malware evolves.
  • The research question: can a neural network trained directly on binary data achieve competitive detection accuracy without hand-crafted feature extraction?

Approach

Data Collection and Preprocessing

  • Sourced a dataset of benign and malicious executables from established malware research repositories.
  • Converted raw binary files into fixed-length numerical representations suitable for neural network input.
  • Applied normalization and padding strategies to handle variable file sizes while preserving meaningful binary patterns.
  • Split the dataset into training, validation, and test sets with stratified sampling to ensure balanced class representation.
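The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the chapter: the 4096-byte input length, zero-padding, byte scaling to [0, 1], the 70/15/15 split, and the function names `bytes_to_vector` and `stratified_split` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def bytes_to_vector(data: bytes, length: int = 4096) -> list:
    """Truncate or zero-pad raw bytes to `length`, then scale each byte to [0, 1].

    The length and the zero-padding choice are illustrative assumptions.
    """
    clipped = data[:length]
    padded = clipped + b"\x00" * (length - len(clipped))
    return [b / 255.0 for b in padded]


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, splitting each class by `fractions`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy usage: a 4-byte "file" padded to 8 values, and 20 samples per class.
vec = bytes_to_vector(b"MZ\x90\x00", length=8)
labels = [0] * 20 + [1] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting per class before merging keeps the benign-to-malicious ratio the same in all three sets, which is the point of stratified sampling.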

Model Design

  • Designed a feedforward neural network architecture with multiple hidden layers, batch normalization, and dropout for regularization.
  • Experimented with different activation functions, layer depths, and learning rate schedules to find the best-performing configuration.
  • Used binary cross-entropy loss and evaluated with accuracy, precision, recall, and F1-score to get a complete picture of classification performance.
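For reference, the loss and metrics named above can be computed as in the sketch below. It uses plain Python rather than the framework used in the research, and the 0.5 decision threshold, the toy labels, and the function names are assumptions for illustration only.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1, with 1 (malicious) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
    }


# Toy usage with made-up probabilities and an assumed 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
loss = binary_cross_entropy(y_true, y_prob)
metrics = classification_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters here because a missed malicious file (false negative) and a flagged benign file (false positive) carry very different costs.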

Evaluation

  • Compared the neural network model against baseline classifiers (logistic regression, random forest) to validate that the added complexity was justified.
  • Analyzed confusion matrices to understand where the model struggled — identifying false negative patterns that could inform future improvements.
  • Tested generalization by evaluating on malware families not seen during training.
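One way to set up the unseen-family test and the false negative analysis is sketched below. The sample records, family names, and predictions are hypothetical placeholders, and the helper names `split_by_family` and `missed_by_family` are my own for this example, not taken from the chapter.

```python
from collections import Counter


def split_by_family(samples, held_out_families):
    """Split malware samples so that held-out families never appear in training."""
    held_out = set(held_out_families)
    seen = [s for s in samples if s["family"] not in held_out]
    unseen = [s for s in samples if s["family"] in held_out]
    return seen, unseen


def missed_by_family(samples, predictions):
    """Count false negatives (malware predicted as benign, i.e. 0) per family."""
    misses = Counter()
    for sample, pred in zip(samples, predictions):
        if pred == 0:
            misses[sample["family"]] += 1
    return misses


# Hypothetical malware records; real data would carry the byte vectors too.
malware = [
    {"id": "s1", "family": "family_a"},
    {"id": "s2", "family": "family_a"},
    {"id": "s3", "family": "family_b"},
    {"id": "s4", "family": "family_b"},
    {"id": "s5", "family": "family_c"},
    {"id": "s6", "family": "family_c"},
]

seen, unseen = split_by_family(malware, ["family_c"])
predictions = [1, 0]  # made-up model outputs for the two unseen samples
misses = missed_by_family(unseen, predictions)
```

Because the unseen set in this sketch contains only malware, it measures recall on new families; false positives still have to be checked against held-out benign files.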

Results

  • The neural network model achieved strong classification accuracy, outperforming the baseline models on the test set.
  • Precision and recall metrics showed the model was effective at catching malicious samples without excessive false positives.
  • The approach demonstrated that direct binary analysis is a viable alternative to manual feature engineering for malware classification.

Publication

Enhancing AI Malware Detection Using Neural Network with Binary Data Analysis
Book chapter, Atlantis Press (2024). DOI: 10.2991/978-94-6463-589-8_7

This work was peer-reviewed and accepted as part of a published proceedings volume. The paper contributed a practical demonstration that binary-level features, without hand-crafted feature engineering, can achieve competitive performance for malware classification — with implications for building more adaptive detection systems.

BibTeX:

@inbook{sufi2024malware,
  title     = {Enhancing AI Malware Detection Using Neural Network
               with Binary Data Analysis},
  booktitle = {Proceedings of Atlantis Press},
  publisher = {Atlantis Press},
  year      = {2024},
  doi       = {10.2991/978-94-6463-589-8_7},
  url       = {https://doi.org/10.2991/978-94-6463-589-8_7}
}

Lessons Learned

  • Research requires a different kind of rigor than production engineering — every claim must be backed by data, every comparison must be fair, and every limitation must be acknowledged.
  • Data preprocessing decisions (how you represent the binary data) had a larger impact on final accuracy than model architecture choices.
  • Writing for publication taught me to communicate complex technical work clearly and concisely — a skill that transfers directly to engineering documentation and technical writing.
  • The experience solidified my understanding of how to evaluate trade-offs systematically, which I now apply to production system design decisions.