Model Compression

This page showcases my research in model compression, focusing on efficient neural network architectures and optimization techniques. My work spans decentralized biomedical signal classification, structural pruning, and efficient vision transformers.

Research Overview

Model compression is crucial for deploying deep learning models on resource-constrained devices. My research focuses on three key areas:

Decentralized Classification: Using early exit mechanisms for efficient distributed inference
Structural Pruning: Removing whole network structures, such as channels, to obtain smaller and faster models while preserving performance
Efficient Vision Transformers: Token dropping to reduce the computational cost of Vision Transformers

Publications

1. DCentNet: Decentralized multistage biomedical signal classification using early exits

Authors: Xiaolin Li, Binhua Huang, Barry Cardiff, Deepu John
Journal: Biomedical Signal Processing and Control, Volume 104, 2025
DOI: 10.1016/j.bspc.2024.107468

Abstract: This work presents DCentNet, a novel approach for decentralized biomedical signal classification using early exit mechanisms. The method enables efficient distributed inference across multiple devices while maintaining high accuracy.

Key Contributions:

Decentralized multistage classification framework
Early exit mechanisms for computational efficiency
Real-time biomedical signal processing
Embedded system deployment
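To make the early-exit idea concrete, here is a minimal, generic sketch in Python. It is not the DCentNet implementation. It assumes that each stage (for example, one per device) returns both features for the next stage and class logits, and that a sample leaves the pipeline at the first stage whose top softmax probability reaches a confidence threshold. Only uncertain samples are passed on to later stages. The names `early_exit_inference` and `make_stage`, the toy stages, and the 0.9 threshold are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumed stage interface: features in -> (features for the next stage, class logits).
Stage = Callable[[List[float]], Tuple[List[float], List[float]]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_inference(
    x: List[float], stages: Sequence[Stage], threshold: float = 0.9
) -> Tuple[int, int, float]:
    """Run stages in order and stop at the first confident exit.

    Returns (predicted class, index of the exit taken, confidence).
    The final stage always returns a prediction, confident or not.
    """
    features = x
    for i, stage in enumerate(stages):
        features, logits = stage(features)
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold or i == len(stages) - 1:
            return probs.index(conf), i, conf
    raise ValueError("at least one stage is required")


def make_stage(scale: float) -> Stage:
    """Hypothetical toy stage: scales its input; logits are [s, -s] with s = sum of outputs."""
    def stage(features: List[float]) -> Tuple[List[float], List[float]]:
        out = [scale * f for f in features]
        s = sum(out)
        return out, [s, -s]
    return stage


if __name__ == "__main__":
    stages = [make_stage(1.0), make_stage(10.0), make_stage(10.0)]
    pred, exit_idx, conf = early_exit_inference([0.1, 0.1], stages, threshold=0.9)
    # Stage 0 is unsure (confidence ~0.60), so this sample exits at stage 1.
    print(pred, exit_idx, round(conf, 3))
```

The threshold trades accuracy for cost. Raising it sends more samples to later stages or devices, and lowering it lets more samples exit early.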

Code Repository: DCentNet GitHub Repository

2. Optimal Brain Connection: Towards Efficient Structural Pruning

Authors: S. Chen, W. Ma, Binhua Huang, Q. Wang, G. Wang, W. Sun, L. Huang, D. John
Venue: arXiv preprint, 2025
arXiv: 2508.05521

Abstract: This paper introduces a novel structural pruning method that identifies optimal brain connections in neural networks. The approach achieves significant model compression while preserving performance.

Key Contributions:

Advanced structural pruning techniques
Optimal connection identification
Performance-preserving compression
Comprehensive evaluation on multiple datasets
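The following sketch shows what makes pruning "structural": whole channels are removed, so the weight matrices actually shrink. This differs from unstructured pruning, which only zeroes individual weights. The ranking criterion here is a plain L1-norm (magnitude) baseline, chosen for simplicity. It is not the Optimal Brain Connection criterion. The dense-layer representation and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = output channels, columns = input channels


def l1_channel_scores(weight: Matrix) -> List[float]:
    """Score each output channel by the L1 norm of its weights."""
    return [sum(abs(w) for w in row) for row in weight]


def prune_channels(
    w1: Matrix, b1: List[float], w2: Matrix, keep_ratio: float
) -> Tuple[Matrix, List[float], Matrix, List[int]]:
    """Remove low-scoring output channels of layer 1 and the matching inputs of layer 2.

    Returns the pruned (w1, b1, w2) and the indices of the kept channels.
    """
    n = len(w1)
    k = max(1, round(n * keep_ratio))
    scores = l1_channel_scores(w1)
    # Keep the k highest-scoring channels, listed in their original order.
    keep = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    new_w1 = [w1[i] for i in keep]
    new_b1 = [b1[i] for i in keep]
    # Layer 2 must drop the input columns that fed from the removed channels.
    new_w2 = [[row[i] for i in keep] for row in w2]
    return new_w1, new_b1, new_w2, keep
```

Removing a channel in one layer requires removing the corresponding input in the next layer. This coupling is why structural pruning yields real speedups without sparse kernels.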

3. TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers

Authors: Guoxin Wang, Qingyuan Wang, Binhua Huang, Shaowu Chen, Deepu John
Venue: arXiv preprint, 2025
arXiv: 2509.03379

Abstract: TinyDrop presents a training-free token dropping framework for Vision Transformers, guided by lightweight vision models. The method reduces computational costs while maintaining accuracy.

Key Contributions:

Training-free token dropping framework
Lightweight guidance model integration
Significant FLOPs reduction (up to 80%)
Plug-and-play compatibility with diverse ViT architectures
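Guided token dropping involves two steps: choosing which patch tokens to keep from importance scores, and estimating how encoder cost scales with the number of tokens. The sketch below illustrates both generically. It is not TinyDrop's code. The scores stand in for whatever signal a lightweight guidance model provides, and token 0 is assumed to be the [CLS] token. The FLOP formula is the usual approximation for a standard ViT encoder layer and ignores softmax, LayerNorm, and the guidance model's own cost. `select_tokens` and `vit_layer_flops` are hypothetical names.

```python
from typing import List, Sequence


def select_tokens(scores: Sequence[float], keep_ratio: float, has_cls: bool = True) -> List[int]:
    """Return indices of tokens to keep, in their original order.

    scores[i] is an importance score for token i (assumed to come from a
    lightweight guidance model). If has_cls is True, token 0 is treated as
    the [CLS] token and is always kept.
    """
    start = 1 if has_cls else 0
    patch_ids = list(range(start, len(scores)))
    k = max(1, round(len(patch_ids) * keep_ratio))
    # Highest-scoring patch tokens survive; re-sort to preserve spatial order.
    top = sorted(patch_ids, key=lambda i: scores[i], reverse=True)[:k]
    return list(range(start)) + sorted(top)


def vit_layer_flops(n_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate FLOPs of one standard ViT encoder layer (1 multiply-add = 2 FLOPs).

    Linear layers: QKV (3*N*D^2) + output projection (N*D^2) + MLP (2*r*N*D^2) MACs.
    Attention: Q @ K^T (N^2*D) + attention @ V (N^2*D) MACs.
    """
    n, d = n_tokens, dim
    linear_macs = (4 + 2 * mlp_ratio) * n * d * d
    attention_macs = 2 * n * n * d
    return 2 * (linear_macs + attention_macs)


if __name__ == "__main__":
    # ViT-B/16 at 224x224: 196 patch tokens + 1 [CLS] token, width 768.
    full = vit_layer_flops(197, 768)
    kept = vit_layer_flops(1 + 49, 768)  # keep [CLS] + 25% of the patches
    print(f"per-layer FLOPs after dropping: {kept / full:.1%} of the original")
```

The linear layers scale with the token count N and attention scales with N². As a result, keeping a fraction of the tokens cuts the per-layer cost by at least that fraction under this approximation.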

Implementation Details

DCentNet Implementation

The DCentNet repository contains a complete implementation, including:

Experimental Code

Inference and Sender (BLE): Deep sleep mode, inference mode, connected mode, and broadcast mode implementations
Receiver (BLE): Connected and broadcast mode receivers
Embedded System Deployment: Arduino IDE integration with trained models

Training Code

Edge Impulse Integration: Simple training notebook with API key support
Customizable Networks: Flexible architecture for different use cases
Multiple Model Variants: CNN architectures with 2 to 5 layers

Deployment Guide

1. Launch the Arduino IDE (version 2.3.3).
2. Add the trained model library.
3. Install the ArduinoBLE library (version 1.3.7).
4. Include the inference file in your code.

Model Variants Available

EEPS-2LayerCNN.zip
EEPS-3LayerCNN.zip
EEPS-4LayerCNN.zip
EEPS-5LayerCNN.zip
2Layerswith96%Acc.zip

Research Impact

These works contribute to the field of model compression through:

Practical Deployment: DCentNet provides real-world embedded system solutions
Theoretical Advances: Optimal Brain Connection offers novel pruning methodologies
Efficiency Gains: TinyDrop demonstrates significant computational savings

Future Directions

Integration of early exit mechanisms with advanced pruning techniques
Cross-domain application of token dropping methods
Real-time optimization for edge computing scenarios

This page summarizes my contributions to model compression research, spanning practical embedded implementations and theoretical advances in neural network optimization.