Accelerating Embedded Neural Network Inference on FPGA Softcore Processors
Date
Sep 2024 - May 2025
Tools & Technologies
Artix-7 FPGA, Xilinx MicroBlaze, Vivado, Vitis IDE, C/C++, VHDL
In this project, I tackled one of the core challenges of embedded machine learning: achieving real-time performance without the bottlenecks of external communication overhead.
While embedded systems often rely on off-chip accelerators to speed up inference, the latency introduced by data transfers across system boundaries can be crippling for applications that demand immediacy, such as robotics, wearable health devices, or autonomous sensing platforms.
To address this, I designed a fully on-chip hardware-software co-design framework built around an FPGA-based softcore processor (MicroBlaze) and custom VHDL acceleration modules for neural network operations. By tightly integrating the accelerator within the FPGA fabric, I kept the entire inference pipeline on-chip and eliminated external communication overhead altogether.
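To make the co-design concrete, here is a minimal sketch of how softcore code typically drives an accelerator sitting in the same fabric: operands and a start bit are written to memory-mapped registers, and the result is read back without any off-chip transfer. The register names, offsets, and self-clearing start bit here are assumptions for illustration; the real layout comes from the Vivado address map of the custom IP.

```c
#include <stdint.h>

/* Hypothetical register map for the on-chip accelerator.
 * In a real design these offsets come from the Vivado address editor. */
enum {
    ACC_REG_OPA    = 0,  /* operand A                      */
    ACC_REG_OPB    = 1,  /* operand B                      */
    ACC_REG_CTRL   = 2,  /* write 1 to start (assumed)     */
    ACC_REG_RESULT = 3,  /* accumulated result             */
};

/* Kick off one operation on the fabric accelerator. */
static void acc_start(volatile uint32_t *base, uint32_t a, uint32_t b)
{
    base[ACC_REG_OPA]  = a;
    base[ACC_REG_OPB]  = b;
    base[ACC_REG_CTRL] = 1u;  /* assumed self-clearing start bit */
}

/* Read back the result once the fabric signals completion. */
static uint32_t acc_read_result(volatile uint32_t *base)
{
    return base[ACC_REG_RESULT];
}
```

Because the accelerator shares the fabric with the MicroBlaze, each of these accesses is a local bus transaction rather than a round trip to an external device, which is where the latency savings come from.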
One of the aspects I am most proud of in this work is the end-to-end custom acceleration pipeline I implemented. Rather than relying on pre-existing IP cores or black-box accelerators, I built VHDL-based modules tailored to neural network primitives, ensuring both flexibility and efficiency. This gave me hands-on experience in hardware description languages while deepening my understanding of how low-level design decisions ripple up to impact system-level performance.
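A common way to develop such primitives is to write a bit-exact software reference first and check the VHDL against it. The sketch below shows what such a reference might look like for one primitive, a Q1.15 fixed-point dot product with saturation; the Q15 format and rounding choices here are illustrative assumptions, not the project's actual specification.

```c
#include <stdint.h>
#include <stddef.h>

/* Software reference model for a Q1.15 fixed-point dot product,
 * the kind of neural-network primitive a fabric accelerator computes.
 * Partial products are Q30; the sum is scaled back to Q15 and
 * saturated to the int16_t range. (Illustrative format choice.) */
static int16_t dot_q15(const int16_t *x, const int16_t *w, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)w[i];  /* Q30 partial products */
    acc >>= 15;                                /* rescale to Q15 */
    if (acc >  32767) acc =  32767;            /* saturate high */
    if (acc < -32768) acc = -32768;            /* saturate low  */
    return (int16_t)acc;
}
```

Matching the hardware against a model like this makes low-level decisions visible: the accumulator width, the rescaling shift, and the saturation behavior all map directly onto register widths and logic in the VHDL.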
The experimental results were striking: my design achieved a 420× speedup over baseline softcore execution, demonstrating how carefully engineered hardware-software co-design can radically transform performance in constrained embedded environments. This work earned me the Technical Excellence Award in Computer Engineering.
This work reflects my broader interest in pushing the boundaries of embedded AI systems, particularly in contexts where every microsecond matters.