On-Chip vs. Off-Chip FPGA Acceleration for Embedded Neural Networks
Date
Sep 2024 - May 2025
Tools & Technologies
STM32 (Cortex-M), Artix-7 FPGA, Xilinx MicroBlaze, Vivado, Vitis IDE, STM32CubeIDE, C/C++, VHDL
Whereas my earlier work focused on eliminating communication bottlenecks through an on-chip acceleration strategy, this project zoomed out to ask a broader systems-level question: What is the actual trade-off between on-chip and off-chip FPGA acceleration strategies for embedded machine learning?
The motivation was clear. In the embedded systems community, both on-chip and off-chip accelerators are widely discussed, yet direct, systematic comparisons are surprisingly scarce. I wanted to provide a rigorous evaluation that could help both researchers and practitioners make informed design choices, depending on whether their priority was latency, flexibility, or scalability.
To this end, I constructed and benchmarked four distinct hardware configurations spanning both hardened microcontrollers (STM32 Cortex-M) and FPGA softcore processors (MicroBlaze), with accelerators placed either on-chip (directly in the FPGA fabric) or off-chip (accessed over external communication buses). By holding the neural network workloads constant and varying only the architectural setup, I was able to isolate the true impact of communication overhead on inference latency.
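To illustrate how such measurements can be kept comparable across configurations, here is a minimal sketch of a cycle-accurate timing harness for the Cortex-M side. The DWT cycle counter registers are the standard ones defined for Cortex-M3/M4 cores; run_inference() is a hypothetical stand-in for the actual dispatch to either the on-chip or off-chip accelerator path, not the project's real API.

```c
#include <stdint.h>

/* DWT cycle counter registers (architecturally defined on Cortex-M3/M4). */
#define DWT_CTRL    (*(volatile uint32_t *)0xE0001000)
#define DWT_CYCCNT  (*(volatile uint32_t *)0xE0001004)
#define DEMCR       (*(volatile uint32_t *)0xE000EDFC)

/* Hypothetical inference entry point; in practice this would dispatch to
   either the on-chip or the off-chip accelerator path. */
extern void run_inference(const int8_t *input, int8_t *output);

static void cycle_counter_init(void)
{
    DEMCR      |= (1u << 24);  /* TRCENA: enable the DWT unit      */
    DWT_CYCCNT  = 0;           /* reset the cycle counter           */
    DWT_CTRL   |= 1u;          /* CYCCNTENA: start counting cycles  */
}

/* Run the same workload N times and return the mean cycle count, so any
   latency gap between configurations reflects communication overhead
   rather than differences in the compute itself. */
static uint32_t benchmark_inference(const int8_t *input, int8_t *output,
                                    int runs)
{
    uint64_t total = 0;
    for (int i = 0; i < runs; i++) {
        uint32_t start = DWT_CYCCNT;
        run_inference(input, output);
        total += (uint32_t)(DWT_CYCCNT - start);  /* wrap-safe delta */
    }
    return (uint32_t)(total / (uint64_t)runs);
}
```

Counting raw cycles rather than wall-clock time makes results portable across clock frequencies: the same harness can time the softcore and hard-silicon paths, and the per-inference deltas expose the bus-transfer cost directly.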
For me personally, this project underscored the importance of asking the right systems-level questions. Sometimes the most impactful contribution is not building something faster, but creating a structured comparison that helps others make better design trade-offs. This mindset of combining deep technical implementation with broad architectural reasoning is one I hope to carry forward into future research.



