An-Najah National University
Department of Computer Engineering
Graduation Project II

CubeBot: FPGA-Based Rubik’s Cube Solver

Momen Anani
Mohammad Hamdan

Supervisor: Dr. Suleiman Abu Kharmeh

A report submitted in partial fulfilment of the requirements of An-Najah National University for the degree of Bachelor of Science in Computer Engineering

January 24, 2026

Abstract

This project presents the design and implementation of an automated Rubik’s Cube solving robot using a heterogeneous embedded system architecture that combines FPGA hardware acceleration, ARM processor coordination, and ESP32-based motor control. Unlike traditional microcontroller-only or PC-based approaches, the system strategically distributes tasks across specialized computing platforms to achieve deterministic real-time performance, modular design, and reliable operation.

The system architecture integrates three cooperating units: a DE1-SoC FPGA fabric implementing hardware-accelerated color extraction and VGA display, an ARM Cortex-A9 Hard Processor System (HPS) managing high-level coordination and solution computation, and an ESP32 module handling motor control and wireless dashboard connectivity. The FPGA processes cube face images with deterministic 14.8 ms timing using threshold-based color classification, while the HPS executes the Kociemba two-phase solving algorithm and validates cube state consistency. The ESP32 coordinates stepper and servo motors to physically manipulate the cube with sensor-based alignment feedback.

Communication between subsystems uses a custom UART packet protocol with state machine-based error recovery, achieving 100% reliability across all testing. The system provides dual monitoring interfaces through an FPGA-based VGA hardware display and an ESP32-hosted wireless web dashboard, enabling comprehensive system visibility and user control. Experimental results demonstrate 98.7% color detection accuracy, a 93.3% solve success rate, and a mean solve time of 46.1 seconds.
The modular architecture achieved efficient FPGA resource utilization (27% ALMs, 2% block memory, 20% DSP blocks) while maintaining flexibility for future enhancements. Testing across 30 complete solve cycles validated the effectiveness of the heterogeneous design approach for robotics applications requiring integrated perception, computation, and actuation.

This work demonstrates how hardware-software co-design principles can address the limitations of monolithic embedded systems, providing a practical architecture for FPGA-accelerated robotics that balances real-time performance with implementation simplicity and debugging accessibility.

Keywords: FPGA, Heterogeneous Architecture, Rubik’s Cube Solver, Hardware Acceleration, Embedded Systems, Image Processing, Robotics, DE1-SoC, ESP32, Hardware-Software Co-Design

Contents

List of Figures
List of Tables

1 Introduction
  1.1 General Background
  1.2 Objectives and Purpose
  1.3 Significance and Importance
  1.4 Summary of Contributions and Achievements
  1.5 Organization of the Report

2 Literature Review
  2.1 Introduction
  2.2 Rubik’s Cube Solving Systems: Overview and System Requirements
  2.3 Cube Perception and Color Extraction Methods
    2.3.1 Software-Based Image Processing Pipelines
    2.3.2 Lighting Robustness and Color Space Selection
    2.3.3 Hardware-Accelerated Vision and FPGA-Based Extraction
  2.4 Cube Solving Algorithms and Solution Generation
    2.4.1 Two-Phase Solving and Kociemba Algorithm
    2.4.2 State Validation and Error Handling
  2.5 Robotic Manipulation and Motion Control
    2.5.1 Servo and Stepper Motor Approaches
    2.5.2 Synchronization and Execution Order
  2.6 System Integration and Heterogeneous Embedded Architectures
  2.7 Research Gap and Motivation
  2.8 Summary

3 Methodology
  3.1 System Design Overview
  3.2 Hardware Components and System Architecture
    3.2.1 External Interface and Monitoring
    3.2.2 Mechanical Assembly Considerations
  3.3 Camera Interface and USB Video Capture
    3.3.1 Camera Configuration and Frame Acquisition
    3.3.2 Frame Processing Pipeline
  3.4 FPGA-Based Image Processing and Color Detection
    3.4.1 Hardware-Accelerated Image Processing Pipeline
    3.4.2 Adaptive Color Classification System
    3.4.3 Processing Performance Characteristics
  3.5 Communication Protocols and Data Exchange
    3.5.1 HPS-FPGA Communication Protocol
    3.5.2 UART Packet Protocol (HPS-ESP32)
    3.5.3 HTTP API Protocol (ESP32-Dashboard)
  3.6 Hardware–Software Co-Design and Task Partitioning
    3.6.1 FPGA Fabric Responsibilities
    3.6.2 HPS (Main Processor) Responsibilities
    3.6.3 ESP32 Responsibilities
  3.7 FPGA Platform Designer System Integration
    3.7.1 Custom IP Block Integration
  3.8 Memory-Mapped Communication Between HPS and FPGA
  3.9 VGA Real-Time Status Display
  3.10 Wireless Dashboard Method (ESP32)
  3.11 Motor Control and Alignment Method (ESP32)
    3.11.1 Stepper Motor Control
    3.11.2 Dual-Servo Gripper Control
    3.11.3 Sensor-Based Alignment
  3.12 Solving, Encoding, and Execution Scheduling
    3.12.1 Cube State Validation and Error Handling
    3.12.2 Move Encoding and Optimization
  3.13 Comparison with Alternative Approaches
    3.13.1 FPGA vs. Microcomputer Approach
    3.13.2 Communication Architecture
    3.13.3 Mechanical Design Inspiration
  3.14 System Integration and Testing Methodology
    3.14.1 Three-Phase Testing Approach

4 Results
  4.1 Physical System Implementation
    4.1.1 Hardware Assembly
    4.1.2 Electronics Integration
  4.2 System Interfaces and Monitoring
    4.2.1 VGA Hardware Display
    4.2.2 Wireless Dashboard Interface
  4.3 System Performance Validation
    4.3.1 Image Processing Results
    4.3.2 Communication Protocol Reliability
    4.3.3 Motor Control Performance
  4.4 Complete Solve Demonstrations
    4.4.1 Solve Performance Metrics
    4.4.2 Solution Complexity Analysis
    4.4.3 System Reliability Testing
  4.5 Resource Utilization Summary
  4.6 Summary of Key Results

5 Discussion
  5.1 Achievement of Project Objectives
    5.1.1 Cube State Acquisition and Color Extraction
    5.1.2 High-Level Processing and System Coordination
    5.1.3 Mechanical Manipulation and Motor Control
    5.1.4 Wireless Monitoring and User Feedback
  5.2 Comparative Analysis with Literature
    5.2.1 Architecture Advantages
    5.2.2 Performance Comparison
  5.3 System Strengths and Contributions
    5.3.1 Hardware-Software Co-Design Success
    5.3.2 Communication Protocol Robustness
    5.3.3 Dual Monitoring Interface Value
  5.4 Limitations and Challenges
    5.4.1 Image Processing Limitations
    5.4.2 Mechanical Design Constraints
    5.4.3 Timing Performance Considerations
    5.4.4 Power Consumption and Portability
    5.4.5 Software Dependency and Maintainability
  5.5 Challenges Encountered During Development
    5.5.1 Camera Interface Integration
    5.5.2 FPGA Timing and Resource Constraints
    5.5.3 Communication Protocol Development
    5.5.4 Motor Control Calibration
  5.6 Future Work and Potential Improvements
    5.6.1 Enhanced Image Processing
    5.6.2 Mechanical and Control Improvements
    5.6.3 System Architecture Extensions
    5.6.4 Educational and Research Applications
    5.6.5 Alternative Application Domains
  5.7 Long-Term Research Directions
    5.7.1 Fully Autonomous Learning Systems
    5.7.2 Integration with Advanced Computer Vision
    5.7.3 Swarm Robotics and Collaborative Solving
  5.8 Summary

6 Conclusion and Recommendations
  6.1 Summary of Achievements
    6.1.1 Key Accomplishments
  6.2 Key Contributions
  6.3 Reflection on Learning Experience
    6.3.1 Key Lessons Learned
    6.3.2 Impact and Future Application
  6.4 Recommendations for Future Work
  6.5 Final Remarks

References

List of Figures

3.1 Overall Rubik’s Cube solver system flow (FPGA + HPS + ESP32 + peripherals).
3.2 3D concept rendering of the combined mechanical and hardware assembly for the Rubik’s Cube solver.
3.3 Frame processing pipeline from USB camera capture to FPGA-based color analysis.
3.4 Image processing hardware block diagram showing sequential processing units and resource utilization.
3.5 UART packet structure and communication flow between HPS and ESP32.
3.6 Platform Designer (Qsys) system showing FPGA IP integration and subsystem structure.
3.7 Custom IP block integration within the Platform Designer system showing data flow and control connections.
3.8 Memory-mapped address regions for major FPGA peripherals accessed by the HPS application.
3.9 Stepper pulse timing diagram showing the relationship between step and direction signals, calculation of steps per rotation (e.g., 800 steps for 90°), and the use of fast/slow pulse profiles for precise cube manipulation.
The alignment sensor is used to establish a home position before executing movements.
3.10 Flowchart describing scanning, cube state construction, solving, and execution scheduling.
3.11 Move encoding pipeline showing translation from symbolic notation to robot-executable commands.
3.12 System integration testing methodology showing progressive complexity validation.
4.1 Complete physical assembly of the Rubik’s Cube solver showing integrated mechanical and electronic components.
4.2 Camera positioning and upper assembly showing the cube holder mechanism and webcam placement.
4.3 Internal electronics showing motor drivers, power distribution, and ESP32 controller integration.
4.4 Detailed view of internal wiring and component placement demonstrating organized electronics integration.
4.5 VGA hardware display showing real-time system status, scanning progress, and operational parameters.
4.6 Dual-interface monitoring setup showing VGA display and wireless dashboard operating simultaneously.
4.7 Dashboard control interface showing system status indicators, mode selection, and comprehensive test controls.
4.8 Dashboard monitoring interface displaying detected cube state and real-time move execution progress during solving operations.

List of Tables

3.1 Summary of Main Hardware Components
3.2 Summary of 3D-Printed and Mechanical Parts
3.3 Main FPGA memory-mapped peripherals used by the HPS application
4.1 Color Detection Performance Metrics (50 test runs)
4.2 Complete Solve Performance Analysis (30 test runs)
4.3 Solution Complexity Distribution
4.4 FPGA Resource Utilization Summary

List of Abbreviations

ALM Adaptive Logic Module
API Application Programming Interface
ARM Advanced RISC Machine
CPU Central Processing Unit
ESP32 Espressif Systems 32-bit Microcontroller
FPGA Field-Programmable Gate Array
GPIO General Purpose Input/Output
HDL Hardware Description Language
HPS Hard Processor System
HSV Hue Saturation Value
HTTP Hypertext Transfer Protocol
I2C Inter-Integrated Circuit
LED Light Emitting Diode
MJPEG Motion JPEG
PIO Parallel Input/Output
PWM Pulse Width Modulation
RAM Random Access Memory
RGB Red Green Blue
RTL Register Transfer Level
SMPCS School of Mathematical, Physical and Computational Sciences
SoC System on Chip
SPI Serial Peripheral Interface
UART Universal Asynchronous Receiver/Transmitter
USB Universal Serial Bus
UVC USB Video Class
V4L2 Video4Linux2
VGA Video Graphics Array
WiFi Wireless Fidelity

Chapter 1
Introduction

1.1 General Background

The Rubik’s Cube is widely considered one of the most iconic mechanical puzzles, requiring a combination of perception, planning, and precise manipulation to reach a solved state.
While humans typically solve the cube through experience and pattern recognition, building an automated Rubik’s Cube solver requires an integrated system capable of sensing the cube’s colors, understanding its current configuration, computing a valid solution, and physically executing the required movements in a reliable and repeatable manner.

In recent years, many Rubik’s Cube solving robots have been developed using microcontrollers and software-based computer vision techniques running on PCs or embedded Linux platforms. These systems often rely on high-level image processing libraries to detect cube colors and determine the cube state, followed by solver algorithms implemented in software. Although these approaches can achieve strong performance, they may face limitations related to latency, nondeterministic execution timing, and dependency on external software stacks.

Field-Programmable Gate Arrays (FPGAs) offer a powerful alternative for real-time sensing and hardware acceleration due to their inherent parallelism, deterministic behavior, and power efficiency. By implementing time-critical tasks such as color extraction and decision logic in hardware, FPGA-based designs can achieve high throughput, low latency, and lower power consumption than purely software-driven solutions running on general-purpose processors. The dedicated hardware approach enables efficient parallel processing without the overhead of operating system scheduling and context switching, resulting in lower power per operation while maintaining consistent performance. This makes FPGA technology an attractive candidate for robotics applications that require synchronized sensing, communication, precise control, and energy-efficient operation.

This graduation project presents the design and implementation of a Rubik’s Cube solving and control system based on a heterogeneous embedded architecture.
In this design, the FPGA is primarily utilized as a dedicated coprocessor for fast cube color extraction, while the Hard Processor System (HPS) serves as the main processor responsible for high-level computation, system coordination, and decision making. The system is extended with wireless monitoring and control capabilities, enabling interaction through a wireless dashboard and providing real-time feedback through both the dashboard and a VGA display that shows system statistics during operation.

1.2 Objectives and Purpose

The primary objective of this project is to design and implement an automated Rubik’s Cube solving system that combines hardware acceleration, embedded processing, and robotic actuation in a complete end-to-end solution. The system supports both solving and shuffling modes, with adjustable difficulty levels, while maintaining reliability, accuracy, and real-time operation.

The specific objectives of this project include:

1. Cube State Acquisition and Color Extraction: Capture Rubik’s Cube face data through a camera-based acquisition process and perform fast and stable color extraction using FPGA hardware logic.

2. High-Level Processing and System Coordination: Utilize the SoC processing system (HPS) as the main controller responsible for validating detected faces, managing system states, coordinating system modules, and preparing solution data for execution.

3. Mechanical Manipulation and Motor Control: Control servo and stepper motors to physically rotate cube layers and manipulate the cube accurately, using an ESP-based module as a dedicated motor and peripheral controller.

4. Wireless Monitoring and Control: Provide a wireless interface using Wi-Fi or Bluetooth to enable remote commands, system monitoring, and operation mode selection through a mobile application.

5.
User Feedback and Monitoring: Display real-time system status, runtime information, and progress using a VGA output interface, in addition to a wireless dashboard used for monitoring, mode selection, and operation control.

This project aims to demonstrate how a hardware-software co-design approach can be used to build a practical robotic solver that leverages the strengths of FPGA acceleration and embedded processing while maintaining a modular and scalable architecture.

1.3 Significance and Importance

This project addresses several practical and technical challenges involved in building an automated Rubik’s Cube solver and demonstrates the feasibility of using FPGA acceleration within robotics and intelligent embedded systems.

Real-Time Hardware Acceleration and Power Efficiency: Implementing cube color extraction using FPGA logic provides parallel processing capabilities, deterministic timing, and power-efficient operation. Compared to software-based approaches running on general-purpose processors, this enables faster processing, improved stability, and reduced power consumption per operation, which is essential for reliable robotic operation, consistent cube state detection, and potential battery-powered deployments.

Heterogeneous Embedded System Design: The system integrates multiple cooperating units, where the FPGA operates as a coprocessor for hardware-accelerated perception, the HPS functions as the main processor for system logic and computation, and the ESP module handles motor control and peripheral interfacing. This architecture reflects modern embedded system design practices where different platforms are used to maximize performance, flexibility, and energy efficiency through strategic task partitioning.

Robotics and System Integration: Developing a complete Rubik’s Cube solver requires bridging multiple domains including machine perception, communication, mechanical design, and motor control.
Successfully integrating a camera module, motors, LCD display, manual inputs, and wireless connectivity demonstrates the ability of embedded systems to manage complex tasks under strict timing constraints.

Educational and Practical Value: Beyond solving the cube as a final objective, the system provides strong educational value by combining digital logic design, real-time embedded programming, hardware interfaces (I2C, SPI, UART, PWM), and control principles within one project. The developed architecture can also be adapted for other robotics systems requiring perception and actuation.

Extensibility and Future Potential: The design principles used in this project can be extended toward future FPGA-accelerated intelligent systems, especially in applications requiring low-latency sensing and deterministic performance. The same approach can be applied to tasks such as object sorting, inspection, or small-scale robotic control systems.

1.4 Summary of Contributions and Achievements

This project provides a complete Rubik’s Cube solving system with several contributions in hardware acceleration, embedded coordination, and robotic execution.
System Architecture Contributions:
• Design of a heterogeneous architecture combining FPGA-based color extraction, HPS high-level coordination, and ESP-based motor control
• Implementation of reliable communication across modules using hardware-level interfaces such as UART, I2C, SPI, and PWM
• Development of a modular structure that separates sensing, computation, and actuation responsibilities for improved scalability

Perception and Processing Contributions:
• Hardware-accelerated cube color extraction using FPGA logic to improve speed and stability
• High-level state management and decision logic executed on the HPS to coordinate scanning, validation, and solution handling
• Support for multi-mode operation including solving and shuffling with adjustable difficulty levels

Robotic Execution and Control Achievements:
• Integration of servo and stepper motors to physically manipulate the cube with accurate layer rotations
• Implementation of synchronization strategies to ensure correct execution order and safe mechanical movement
• Real-time monitoring and control through a wireless dashboard and VGA status display

Wireless Monitoring and User Interaction:
• Wireless control and monitoring support through Wi-Fi or Bluetooth communication with a mobile application
• Manual control options through key inputs for debugging, testing, or mode switching
• A complete interactive workflow from scanning the cube to computing the solution and executing robot moves

Overall, the completed system demonstrates a practical Rubik’s Cube solving robot built using hardware-software co-design, emphasizing low-latency perception, modular architecture, and reliable actuation.
1.5 Organization of the Report

This report is organized into multiple chapters, each addressing a key aspect of the Rubik’s Cube solving system design and implementation:

Chapter 1: Introduction provides the foundational context of the project, including background information, objectives, significance, and major contributions.

Chapter 2: Literature Review explores existing Rubik’s Cube solvers and robotics designs, FPGA-based vision acceleration methods, and related embedded system architectures.

Chapter 3: Methodology describes the project development approach, hardware selection rationale, and the full system architecture, including the FPGA color extraction design, HPS coordination and computation, ESP motor control logic, communication protocols, and peripheral interfacing.

Chapter 4: Results documents the implementation outcomes, testing procedures, and experimental performance results such as recognition accuracy, timing performance, and mechanical execution reliability.

Chapter 5: Discussion analyzes the system results, highlights limitations, and discusses potential improvements and future extensions.

Chapter 6: Conclusion and Recommendations summarizes the achieved outcomes and provides final remarks on the significance of FPGA-accelerated robotic solvers.

Additional supporting material, including wiring diagrams, extended code listings, and detailed configuration parameters, is provided in the appendices for completeness and reproducibility.

Chapter 2
Literature Review

2.1 Introduction

The Rubik’s Cube has remained one of the most widely studied mechanical puzzles since its invention, not only due to its popularity but also because of the challenging combination of perception, planning, and precise manipulation required to solve it.
Unlike purely computational problems, an automated Rubik’s Cube solver must integrate multiple domains: accurate color recognition, cube state representation, solution generation, and physical actuation through motors and mechanical mechanisms.

Recent developments in embedded systems and robotics have enabled many Rubik’s Cube solving platforms, typically built using microcontrollers, single-board computers, or PC-based processing M et al. (2024); Andrew, Faridah, Tan, Ragunathan, Amirah, Zainab and Lee (2021); Chalise et al. (2025). These systems often rely on software computer vision pipelines to detect cube colors, followed by algorithmic solvers to generate a solution sequence, and finally an actuator subsystem to execute the moves Sawhney et al. (2013); Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021). However, implementing such systems in a real-time and reliable manner remains difficult due to limitations such as sensor noise, lighting variations, mechanical misalignment, timing constraints, and integration complexity.

This chapter reviews the most relevant research and technical approaches related to Rubik’s Cube solving systems, with a focus on three key areas: (1) cube perception and color extraction, (2) cube solving algorithms and move generation, and (3) robotic manipulation and embedded architecture design. The review highlights common limitations of monolithic designs and motivates the adoption of heterogeneous architectures that strategically distribute tasks across FPGA logic, a main processor (HPS), and an ESP module for actuation and system interfacing.

2.2 Rubik’s Cube Solving Systems: Overview and System Requirements

A full Rubik’s Cube solving robot generally consists of three fundamental stages:

1. Perception: capturing cube faces using a camera or sensor array and extracting the cube’s sticker colors.

2. Computation: converting detected colors into a valid cube state representation and generating a valid solving sequence.

3.
Execution: physically rotating cube layers using actuators such as servo motors, stepper motors, or gear-based mechanisms.

Although these stages appear sequential, practical solver robots require feedback and monitoring to ensure the cube is scanned correctly and moves are executed accurately. For example, inconsistent lighting may lead to incorrect color classification, while mechanical backlash may cause layer misalignment and execution failure. Therefore, modern designs often incorporate system state machines, timing measurement, status indicators, and debugging interfaces to improve reliability and user control Dan et al. (2021); Barucija et al. (2020).

2.3 Cube Perception and Color Extraction Methods

Color extraction is one of the most critical stages in a Rubik’s Cube solver, as a single incorrect color classification can result in an invalid cube state and failed solution execution. The most common perception technique is camera-based scanning, where the cube is rotated and each face is captured as an image. From these images, multiple approaches have been proposed for extracting the cube’s sticker colors.

2.3.1 Software-Based Image Processing Pipelines

Many implementations use software processing on a CPU-based platform such as a PC, Raspberry Pi, or embedded Linux environment. These pipelines often apply the following steps:

• image acquisition and preprocessing,
• region-of-interest extraction for the nine stickers,
• color space conversion (e.g., RGB to HSV),
• thresholding and clustering for classification.

Computer vision frameworks such as OpenCV have become standard for these systems due to the availability of efficient tools for segmentation and classification OpenCV Team (2024); Sawhney et al. (2013). While software pipelines are flexible and simple to modify, their performance can be affected by operating system overhead, non-deterministic scheduling, and varying computational latency.
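The final two steps of such a pipeline (HSV conversion and hue thresholding) can be sketched in a few lines of Python using only the standard library. The hue bands and the `classify_sticker` helper below are illustrative assumptions for this sketch, not values or code from this project; a real system would calibrate the thresholds for its own lighting and camera.

```python
import colorsys

# Illustrative hue bands in degrees; real systems calibrate these per setup.
HUE_BANDS = [
    ("red",    (345, 360)), ("red2",   (0, 15)),
    ("orange", (15, 45)),   ("yellow", (45, 75)),
    ("green",  (75, 165)),  ("blue",   (165, 260)),
]

def classify_sticker(r, g, b):
    """Classify one averaged sticker sample (0-255 RGB) by hue thresholding."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_deg = h * 360
    # Low saturation means the sample is closer to white than to any hue band.
    if s < 0.2:
        return "white"
    for name, (lo, hi) in HUE_BANDS:
        if lo <= hue_deg < hi:
            return "red" if name == "red2" else name
    return "red"  # fallback for wrap-around reds/magentas above the blue band

# Example: a bright, mostly green sample
print(classify_sticker(20, 200, 30))  # -> green
```

In practice each sticker region is averaged over many pixels before classification, which suppresses sensor noise; the same thresholding structure maps directly onto parallel comparator logic in an FPGA implementation.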
Additionally, embedded devices may struggle when vision, networking, and motor control are executed simultaneously Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021).

2.3.2 Lighting Robustness and Color Space Selection

A major challenge in color extraction is environmental variation, particularly illumination changes, shadows, and reflections on glossy cube stickers. Several studies suggest that HSV color representation is generally more stable than raw RGB for classification tasks, since hue and saturation separate color information from brightness intensity Gonzalez and Woods (2008). However, even HSV-based classification can suffer in difficult lighting, making it necessary to apply calibration or normalization strategies. Other approaches use clustering and statistical learning techniques, where the system learns cube color distributions by sampling center stickers or performing adaptive threshold adjustment Chalise et al. (2025); Dan et al. (2021). These methods may improve accuracy but increase computational complexity and require careful tuning.

2.3.3 Hardware-Accelerated Vision and FPGA-Based Extraction

FPGAs are widely recognized for their capability to accelerate vision and signal-processing tasks using parallelism and pipelined architectures. Unlike CPUs that execute sequential instructions, FPGA logic can compute multiple pixel-level operations concurrently with deterministic timing. This makes FPGA acceleration attractive for embedded perception tasks that require stable latency and consistent throughput Woods et al. (2008). In the context of Rubik’s Cube solvers, FPGA hardware can be used to accelerate specific stages such as color extraction, pixel classification, or feature detection.
Rather than implementing the entire system on FPGA, a more efficient design choice is to use FPGA logic as a dedicated coprocessor for perception tasks, while high-level computation and coordination are executed by a software processor. This separation reduces the workload on the main processor and improves system determinism during real-time operation.

2.4 Cube Solving Algorithms and Solution Generation

Once cube colors are extracted, the system must map them to a cube state and compute a solving sequence. The Rubik’s Cube state space is extremely large, with approximately 4.3 × 10^19 possible configurations, making brute-force search infeasible.

2.4.1 Two-Phase Solving and the Kociemba Algorithm

Among the most widely used practical algorithms is the two-phase method proposed by Kociemba Kociemba (1992). This method divides the problem into two stages: reducing the cube into a restricted subgroup in phase one, then completing the solve in phase two. The algorithm is efficient and commonly used in software solvers due to its balance between solution length and computation time. In many real-world implementations, the solver algorithm is executed on a CPU-based system, since it involves search operations, table lookups, and branching logic that are naturally suited for software execution Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021); Dan et al. (2021). For embedded Rubik’s Cube solvers, the algorithm output is converted into motor-level commands, requiring additional translation steps and validation M et al. (2024).

2.4.2 State Validation and Error Handling

An important step before solving is verifying that the detected cube state is valid. Incorrect scanning may produce impossible states, such as incorrect parity or invalid color counts. Therefore, practical solver systems incorporate checks to ensure the scanned faces form a consistent cube representation before attempting to compute a solution Dan et al. (2021); Sawhney et al. (2013).
This reduces failure probability and improves overall system robustness.

2.5 Robotic Manipulation and Motion Control

The execution stage transforms the computed solution moves into physical actuation. Rubik’s Cube robots typically use servo motors, stepper motors, or hybrid approaches. The actuator design depends strongly on mechanical constraints such as grip stability, axis alignment, torque requirements, and move execution speed.

2.5.1 Servo and Stepper Motor Approaches

Servo-based designs often simplify control by using angular positioning, but may suffer from limited torque or speed in demanding mechanical setups Andrew, Faridah, Tan, Ragunathan, Amirah, Zainab and Lee (2021). Stepper motors provide precise incremental control and repeatability, making them suitable for controlled rotations such as 90-degree and 180-degree cube turns M et al. (2024); Chalise et al. (2025). However, stepper motors require careful timing control, pulse generation, and sometimes feedback sensors for alignment.

2.5.2 Synchronization and Execution Order

Rubik’s Cube solvers must execute moves in strict order with reliable timing. Any missed step or mechanical slip can corrupt the cube state relative to the solver’s internal model. As a result, robust implementations often include:

• calibration routines to align the cube mechanism,
• sensors to detect reference positions,
• safety timing constraints to prevent motor conflicts.

Therefore, assigning motor control to a dedicated controller can improve reliability, especially when the main processor also handles solving and system coordination.

2.6 System Integration and Heterogeneous Embedded Architectures

A major challenge in Rubik’s Cube solving robots is the integration of perception, computation, and actuation in one system without performance bottlenecks Chalise et al. (2025).
Many low-cost projects attempt to implement everything on one microcontroller, which often results in trade-offs between speed, reliability, and feature support M et al. (2024). A heterogeneous architecture provides a practical solution by distributing tasks across multiple specialized processing units Barucija et al. (2020):

• FPGA coprocessor: dedicated to real-time color extraction and image-related processing Intel (Altera) (2020).
• HPS main processor: responsible for system state management, cube state validation, solution computation, and coordination between modules Intel (Altera) (2020).
• ESP module: dedicated to motor control, peripheral interfacing, and wireless dashboard connectivity Espressif Systems (2023).

This design approach matches each task to the processing platform that is best suited for it Barucija et al. (2020). FPGA logic achieves deterministic low-latency perception. The HPS executes complex sequential algorithms efficiently. The ESP module handles real-time PWM generation, motor drivers, and user monitoring services without interfering with perception or solving performance.

2.7 Research Gap and Motivation

Despite the availability of many Rubik’s Cube solvers, most existing platforms fall into one of two categories Chalise et al. (2025); Andrew, Faridah, Tan, Ragunathan, Amirah, Zainab and Lee (2021):

1. Software-heavy solvers that rely on CPU-based vision and computation, often requiring a full operating system environment Sawhney et al. (2013); Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021); M et al. (2024).
2. Hardware-heavy prototypes that use FPGA or specialized hardware for the entire system, increasing design complexity and limiting flexibility.

A practical gap exists in architectures that combine FPGA acceleration for perception with a processor-based solving pipeline and an independent real-time motor controller.
Such designs can achieve strong real-time performance while maintaining modularity and debugging simplicity. Additionally, integrating monitoring interfaces such as VGA status visualization and a wireless dashboard provides improved user interaction, system transparency, and operational reliability.

2.8 Summary

This literature review discussed Rubik’s Cube solver system requirements and examined existing approaches in cube perception, solving algorithms, and robotic execution. Software-based vision pipelines provide flexibility but may suffer from nondeterministic timing and high computational load. Solving algorithms such as Kociemba’s two-phase method enable efficient computation, but require valid cube state input to function correctly. Robotic manipulation demands accurate motor control, synchronization, and calibration to ensure successful execution.

The review motivates a heterogeneous embedded design that strategically separates responsibilities: FPGA hardware acts as a coprocessor for color extraction, the HPS coordinates high-level logic and solution computation, and an ESP module manages motor control and wireless interfacing. This architecture offers a balanced approach that improves real-time performance, modularity, and system reliability compared to monolithic embedded designs.

Chapter 3 Methodology

3.1 System Design Overview

This project follows a hardware–software co-design methodology to implement an automated Rubik’s Cube solving robot using a heterogeneous embedded architecture. The complete system integrates three cooperating computing units:

• FPGA Fabric (Coprocessor/Peripheral Fabric): Provides hardware peripherals and subsystems such as VGA output, UART IP, PIO interfaces, and on-chip memory buffers.
• HPS (ARM Cortex-A9 Main Processor): Executes the main control program responsible for high-level state control, cube state construction, solving logic, and move scheduling.
• ESP32 Module (Actuation + Dashboard Controller): Hosts the wireless dashboard, receives commands from the user, and executes robot movement through stepper/servo control and sensor-based alignment.

The overall operational pipeline is summarized as follows:

1. The user connects to the ESP32 wireless dashboard and selects a system command (solve, shuffle, reset, etc.).
2. The cube scanning procedure starts, where cube face data is acquired and prepared for processing.
3. Cube colors are extracted and transmitted to the HPS for cube state construction and verification.
4. The HPS generates a solving (or shuffling) move sequence.
5. The move sequence is encoded and sent to the robot execution controller.
6. The ESP32 performs real-time motor execution with safety checks and alignment mechanisms.
7. The FPGA VGA subsystem and the wireless dashboard display real-time system statistics and progress.

Figure 3.1: Overall Rubik’s Cube solver system flow (FPGA + HPS + ESP32 + peripherals).

3.2 Hardware Components and System Architecture

The physical implementation of the Rubik’s Cube solver requires careful selection of hardware components to ensure reliable cube manipulation, accurate state sensing, and efficient system coordination. Unlike traditional Raspberry Pi-based approaches, this system leverages FPGA hardware acceleration for deterministic peripheral control and real-time operation. The hardware platform is designed to balance mechanical robustness, electrical reliability, and ease of integration with the embedded control architecture. Each component is chosen to fulfill a specific role in the system, from high-level computation and user interaction to low-level actuation and feedback. The integration of the DE1-SoC FPGA board enables custom hardware peripherals and high-speed communication with the HPS processor, while the ESP32 module provides wireless connectivity and real-time motor control.
Additional sensors and drivers are incorporated to support precise alignment, safe operation, and flexible mechanical assembly. In addition to electronic hardware, the system incorporates several custom 3D-printed and mechanical parts designed to support the cube manipulation mechanisms and ensure precise assembly. The complete system architecture follows a distributed processing model where each computing unit specializes in specific tasks to achieve optimal performance. The following tables summarize the main hardware and mechanical components used in the system:

Table 3.1: Summary of Main Hardware Components

Component | Quantity | Notes
DE1-SoC FPGA Development Board | 1 | Main processing and hardware acceleration
ESP32 WiFi Microcontroller | 1 | Wireless dashboard, motor control
Redragon USB Webcam | 1 | Cube face image acquisition
NEMA 17 Stepper Motor | 1 | Cube rotation (main axis)
DRV8825 Stepper Motor Driver | 1 | Stepper driver, microstepping
DS3218 Servo Motor | 2 | Cube gripping/releasing (dual gripper)
PCA9685 PWM Servo Driver | 1 | 16-channel PWM, servo control
Optical Slot Sensor | 1 | Alignment/home position feedback
HP-D3006A0 Power Supply | 1 | Main power for motors and logic
I2C Logic Level Shifter | 2 | 5V/3.3V I2C interfacing
PV 902512L Cooling Fan | 1 | Cooling for drivers/electronics
MPB19 Push Button | 2 | Manual reset/emergency stop
Table 3.2: Summary of 3D-Printed and Mechanical Parts

Part Name | Quantity | Notes
Hinge for upper-cover and lifter | 1 | Connects upper cover to lifter, allows rotation
Upper-cover | 1 | Top enclosure, attaches to hinge and lifter
Cube-holder bottom part | 1 | Base for holding the cube
Cube-holder upper part | 1 | Top clamp for cube, connects to lifter
Synchronization disk | 1 | Ensures alignment between lifter and cube holder
Motor support | 1 | Holds stepper/servo motor in place
Lifter | 1 | Raises/lowers upper cover and cube holder
Lifter-link | 1 | Connects lifter to upper cover, transmits motion

3.2.1 External Interface and Monitoring

The system supports multiple user interface options for operation and monitoring:

• VGA Monitor: Connected to the DE1-SoC board’s VGA output for real-time system status display, including current state, scan progress, solution availability, move count, and execution status. The VGA interface provides a hardware-based monitoring solution that operates independently of wireless connectivity.
• Mobile Phone or Laptop (WiFi Dashboard): Any WiFi-capable device can connect to the ESP32 access point and access the web-based control dashboard. This provides wireless operation control, system status monitoring, and cube face visualization without requiring dedicated applications or software installation.

3.2.2 Mechanical Assembly Considerations

While this project focuses primarily on the embedded system design and control architecture, the mechanical structure plays a critical role in system operation.
The mechanical assembly includes:

• A cube holding platform with gripper arms actuated by the two servo motors
• A rotation mechanism driven by the stepper motor for executing face turns
• Alignment markers or flags that trigger the optical slot sensor
• Mounting brackets and support structures to maintain component positioning
• Cable management to prevent interference with moving parts

The mechanical design is inspired by existing Rubik’s Cube solving robots but adapted to accommodate the specific electrical and control architecture of this FPGA-based implementation.

Figure 3.2: 3D concept rendering of the combined mechanical and hardware assembly for the Rubik’s Cube solver.

3.3 Camera Interface and USB Video Capture

The cube face scanning process relies on a USB webcam (Redragon model) connected to the DE1-SoC board through the HPS USB subsystem. The camera interface operates through the Linux UVC (USB Video Class) driver, enabling standard Video4Linux2 (V4L2) API access for frame capture.

3.3.1 Camera Configuration and Frame Acquisition

The system initializes the camera with the following specifications:

• Resolution: 640×480 pixels
• Format: MJPEG (Motion JPEG) compressed frames
• Frame rate: 30 FPS (frames per second)
• Exposure: Auto-adjustment enabled
• White balance: Automatic

The choice of MJPEG format provides a balance between image quality and processing efficiency. Each captured frame undergoes decompression using the libjpeg library before being converted to RGB565 format for FPGA processing.

3.3.2 Frame Processing Pipeline

The camera-to-FPGA processing pipeline follows these stages:

1. Frame Capture: USB camera captures MJPEG frame through V4L2 interface
2. JPEG Decompression: libjpeg decompresses frame to raw RGB888 format
3. Color Space Conversion: RGB888 converted to RGB565 for FPGA compatibility
4. Frame Transfer: RGB565 data written to SDRAM frame buffer via AXI bus
5. FPGA Processing: Hardware-accelerated color detection and analysis
6. Results Extraction: Processed cube face data read back by HPS

Figure 3.3: Frame processing pipeline from USB camera capture to FPGA-based color analysis.

3.4 FPGA-Based Image Processing and Color Detection

The FPGA fabric implements a dedicated hardware-accelerated image processing pipeline for cube face color detection. This approach provides deterministic processing times and real-time performance independent of HPS CPU load.

3.4.1 Hardware-Accelerated Image Processing Pipeline

The FPGA implements a multi-stage image processing pipeline for cube face color detection. The approach combines edge detection, grid extraction, and statistical color analysis to provide reliable results.

Stage 1: Edge Detection and Grid Extraction
• Region of Interest Extraction: Identifies cube face boundaries within captured frame
• Edge Enhancement: Applies edge detection to improve grid boundary visibility
• Grid Cell Isolation: Divides detected region into individual cell areas
• Center Region Sampling: Extracts representative samples from cell centers

Stage 2: Multi-Point Color Analysis
• Statistical Averaging: Computes representative RGB values from sampled regions
• Threshold-Based Classification: Applies calibrated thresholds for color identification
• Noise Filtering: Eliminates outlier pixels and lighting artifacts

For each grid cell, the processing algorithm:

1. Applies edge detection to enhance cell boundary visibility
2. Extracts representative sample region from cell center
3. Calculates statistical RGB component averages
4. Applies calibrated threshold comparisons for color classification
5. Outputs encoded color identifier for transmission to HPS

3.4.2 Adaptive Color Classification System

The color classification system uses calibrated RGB threshold values optimized for standard Rubik’s Cube colors, with provisions for lighting variation compensation.
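A minimal C sketch of such threshold gating follows; the numeric thresholds and color codes here are illustrative placeholders, not the calibrated matrix used by the hardware, but the FPGA decision tree operates on the same hierarchical principle:

```c
#include <stdint.h>

/* Illustrative color codes for the six sticker colors plus a reject case. */
enum cube_color { C_WHITE, C_YELLOW, C_RED, C_ORANGE, C_GREEN, C_BLUE, C_UNKNOWN };

/* Hierarchical threshold classifier over averaged 8-bit RGB samples.
 * Thresholds below are placeholders; the real design uses empirically
 * calibrated values with lighting-compensation provisions. */
static enum cube_color classify_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    /* White: high brightness across all channels. */
    if (r > 180 && g > 180 && b > 180) return C_WHITE;
    /* Yellow: red and green high, blue low. */
    if (r > 150 && g > 150 && b < 100) return C_YELLOW;
    /* Blue: blue channel dominance with threshold gating. */
    if (b > 120 && b > r + 40 && b > g + 40) return C_BLUE;
    /* Green: green channel dominance. */
    if (g > 120 && g > r + 40 && g > b + 40) return C_GREEN;
    /* Red vs. orange: both red-dominant; orange carries more green. */
    if (r > 120 && r > b + 40) return (g > 90) ? C_ORANGE : C_RED;
    return C_UNKNOWN;   /* ambiguous sample: flagged for error handling */
}
```

In hardware the same comparisons are evaluated in parallel comparator logic; the C form is only meant to make the hierarchical decision tree concrete.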
Threshold-Based Classification:
• White Detection: High brightness detection across all RGB channels
• Red Classification: Red channel dominance with cross-channel validation
• Blue Classification: Blue channel dominance with threshold gating
• Multi-Color Support: Additional thresholds for yellow, orange, and green recognition

Implementation Features:
• Calibrated Threshold Matrix: Empirically determined classification boundaries
• Hierarchical Decision Logic: Efficient hardware-optimized decision tree
• Error Handling: Robust handling of ambiguous or uncertain classifications
• Lighting Tolerance: Calibration provisions for varied ambient conditions

Figure 3.4: Image processing hardware block diagram showing sequential processing units and resource utilization.

3.4.3 Processing Performance Characteristics

The hardware implementation provides adequate performance for cube solving applications:

• Sequential Processing: Nine cells processed one after another using shared hardware
• Fixed Processing Time: Deterministic 14.8 ms per face (approximately 1.6 ms per cell)
• Resource Utilization: 27% ALMs (8,628 adaptive logic modules), 2% block memory (75 KB), 20% DSP blocks (18 units), 4,441 registers
• Deterministic Timing: Independent of HPS CPU load
• Lighting Sensitivity: Accuracy depends on controlled ambient lighting conditions

3.5 Communication Protocols and Data Exchange

The system implements structured communication protocols for reliable data exchange between the three computing units. Each communication path uses a different protocol optimized for its specific requirements.
3.5.1 HPS-FPGA Communication Protocol

Communication between the HPS and FPGA occurs through memory-mapped I/O using several mechanisms:

• PIO Registers: 32-bit control and status flags
• Shared Memory: SDRAM frame buffer for image data
• On-chip Memory: Fast storage for encoded move sequences
• Direct Register Access: Low-latency control signaling

3.5.2 UART Packet Protocol (HPS-ESP32)

The HPS-ESP32 communication uses a simple, robust packet-based protocol over UART:

[START][TYPE][LENGTH][DATA...][END]
 0xAA    1B     1B    0-255B   0xFF

Packet Types:
• 0x01: Face color data (10 bytes: face_index + 9 colors)
• 0x03: Robot move sequence (variable length)
• 0x04: Status updates (1 byte status code)
• 0x05: Command messages (variable length)

Each packet starts with 0xAA and ends with 0xFF. The receiver uses a state machine to parse packets and resynchronize if errors or invalid bytes are detected. This ensures reliable communication for real-time robot operation.

Receiver State Machine:
• WAIT_START: Waits for 0xAA to begin a new packet.
• READ_TYPE: Captures the packet type and validates it.
• READ_LEN: Reads the length of the data payload.
• READ_DATA: Collects the specified number of data bytes.
• WAIT_END: Expects 0xFF to confirm packet completion.
• PROCESS: Passes the packet to the appropriate handler.

If any field is invalid or a timeout occurs, the state machine resets to WAIT_START, ensuring robust error recovery. This design allows the receiver to resynchronize even if bytes are lost or corrupted.

Figure 3.5: UART packet structure and communication flow between HPS and ESP32.

3.5.3 HTTP API Protocol (ESP32-Dashboard)

The wireless dashboard communicates with the ESP32 using RESTful HTTP endpoints:

• GET /api/status: Retrieve system status and progress
• GET /api/cmd/[command]: Execute system commands
• GET /: Serve main dashboard HTML page

Status responses include JSON data with cube state, move progress, and system indicators.
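The receiver state machine described in Section 3.5.2 can be sketched in C as follows; type and function names are illustrative rather than the project’s actual identifiers, and the timeout path is omitted for brevity:

```c
#include <stdint.h>
#include <stddef.h>

#define PKT_START    0xAA
#define PKT_END      0xFF
#define PKT_MAX_DATA 255

enum rx_state { WAIT_START, READ_TYPE, READ_LEN, READ_DATA, WAIT_END };

struct uart_rx {
    enum rx_state state;
    uint8_t type, len, pos;
    uint8_t data[PKT_MAX_DATA];
};

/* Feed one received byte into the parser. Returns 1 when a complete,
 * well-framed packet is available; any framing error drops back to
 * WAIT_START so the receiver can resynchronize on the next 0xAA. */
static int uart_rx_byte(struct uart_rx *rx, uint8_t byte)
{
    switch (rx->state) {
    case WAIT_START:
        if (byte == PKT_START) rx->state = READ_TYPE;
        break;
    case READ_TYPE:
        /* Valid packet types per the protocol: 0x01, 0x03, 0x04, 0x05. */
        if (byte == 0x01 || byte == 0x03 || byte == 0x04 || byte == 0x05) {
            rx->type = byte;
            rx->state = READ_LEN;
        } else {
            rx->state = WAIT_START;      /* invalid type: resync */
        }
        break;
    case READ_LEN:
        rx->len = byte;
        rx->pos = 0;
        rx->state = (byte == 0) ? WAIT_END : READ_DATA;
        break;
    case READ_DATA:
        rx->data[rx->pos++] = byte;
        if (rx->pos == rx->len) rx->state = WAIT_END;
        break;
    case WAIT_END:
        rx->state = WAIT_START;
        if (byte == PKT_END) return 1;   /* PROCESS: hand off to handler */
        break;                            /* bad terminator: drop packet */
    }
    return 0;
}
```

Feeding the byte stream AA 04 01 42 FF through uart_rx_byte() yields one status packet (type 0x04) carrying a single status byte, while stray bytes preceding 0xAA are silently discarded in WAIT_START.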
3.6 Hardware–Software Co-Design and Task Partitioning

A key methodology decision in this project is the explicit separation of responsibilities between hardware logic and software execution. This avoids a monolithic design where all tasks compete for the same computation time and resources, and instead assigns each task to the most appropriate platform.

3.6.1 FPGA Fabric Responsibilities

The FPGA logic is used primarily for deterministic interfacing and hardware peripherals, including:

• VGA status display subsystem for live system monitoring.
• Memory-mapped I/O peripherals accessible through the HPS lightweight bridge.
• UART hardware IP for external communication.
• PIO interfaces for control/status signaling between hardware/software blocks.
• On-chip memory used for fast storage of encoded robot moves.

3.6.2 HPS (Main Processor) Responsibilities

The HPS executes the main application responsible for high-level system operation, including:

• Central state machine and operation sequencing (scan, validate, solve, execute).
• Cube state construction from detected face colors.
• Solving algorithm execution using a Kociemba-based solver.
• Encoding solution steps into robot movement commands.
• Writing status and logs to the VGA display memory.

The HPS program communicates with FPGA peripherals using memory-mapped access via /dev/mem, allowing fast register-level control without requiring heavy driver development.

3.6.3 ESP32 Responsibilities

The ESP32 is responsible for real-time actuation and user interaction, including:

• Hosting the wireless dashboard over Wi-Fi and handling API requests.
• Stepper motor pulse generation using dedicated GPIO pins.
• Sensor-based alignment and safety timeout mechanisms.
• Executing shuffling routines with selectable difficulty levels.
• Providing real-time status back to the dashboard.
This partitioning ensures that the motor timing and execution remain reliable even when the HPS is busy performing cube-solving computations.

3.7 FPGA Platform Designer System Integration

The FPGA system is constructed using Intel Platform Designer (Qsys) to integrate multiple IP blocks and subsystems. The design includes clock/reset generation, HPS bridges, an external SDRAM controller, a VGA output subsystem, and a set of memory-mapped peripherals. The major IP blocks included in the FPGA system are:

• System PLL: Generates internal system clocks required for peripherals and SDRAM.
• ARM A9 HPS: Provides the lightweight AXI bridge (HPS-to-FPGA) for register-level access.
• SDRAM Controller: Allows shared memory operations and frame buffer transfers when required.
• VGA Subsystem: A complete VGA engine including character buffer and pixel DMA control.
• PIO Modules: Used for lightweight control/status exchange.
• On-Chip Memory: Used as a fast local buffer for storing robot move commands.
• UART IP: Hardware serial interface accessible as memory-mapped registers.

Figure 3.6: Platform Designer (Qsys) system showing FPGA IP integration and subsystem structure.

3.7.1 Custom IP Block Integration

The system includes several custom IP blocks designed specifically for cube solving:

• Color Detection Engine: Hardware-accelerated pixel processing
• Cube Face Buffer: 3×3 grid storage for detected colors
• Move Encoding Buffer: On-chip storage for robot commands
• Status Flag Controller: Real-time system state management

Figure 3.7: Custom IP block integration within the Platform Designer system showing data flow and control connections.

3.8 Memory-Mapped Communication Between HPS and FPGA

The HPS application uses memory-mapped I/O to access FPGA peripherals with low latency and full software control.
This is achieved through Linux /dev/mem mapping, allowing the program to access registers inside the lightweight HPS-to-FPGA bridge region. The lightweight AXI bridge base used in the system is:

• Lightweight bridge base: 0xFF200000
• Bridge span: 0x00010000

In addition, the VGA character buffer is accessed through a dedicated mapped memory region:

• VGA character buffer base: 0xC9000000
• VGA character span: 0x00002000

Figure 3.8: Memory-mapped address regions for major FPGA peripherals accessed by the HPS application.

Table 3.3: Main FPGA memory-mapped peripherals used by the HPS application.

Peripheral | Offset/Base | Purpose
LW AXI Bridge | 0xFF200000 | Main register access from HPS to FPGA
PIO0 | 0x00000010 | System status/control bits
PIO1 | 0x00000000 | Control bits (freeze/solution flags)
Colors PIO (PIO2) | 0x00000020 | Face color data exchange
On-chip memory | 0x0000F000 | Move list storage buffer
UART IP | 0x00000040 | External serial TX/RX registers
VGA Char Buffer | 0xC9000000 | Real-time monitoring display

This methodology provides direct and efficient access to FPGA logic blocks, enabling:

• high-speed updates to VGA monitoring,
• fast move transfer into on-chip memory,
• low-overhead communication through UART and PIO registers.

3.9 VGA Real-Time Status Display

Instead of using an LCD, the system implements a VGA display subsystem that provides real-time visibility into the system behavior. The VGA output is used for debugging, monitoring, and progress tracking during scanning, solving, and execution. The HPS application updates the VGA screen by writing ASCII characters directly into the VGA character buffer memory. The displayed information includes:

• current system state (idle, scanning, ready, running, done, error),
• number of scanned faces and scanning progress,
• whether a solution is available,
• move count and execution status,
• runtime timer and debugging indicators.
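Updating this display amounts to writing ASCII bytes at computed offsets inside the mapped character region. The following is a minimal sketch, assuming the DE1-SoC University Program character buffer layout (80×60 visible characters with a 128-byte row stride, which fits within the 0x00002000 span); on the target, the buffer pointer would come from an mmap of /dev/mem at base 0xC9000000, while here a plain array stands in so the addressing logic can be exercised off-target:

```c
#include <stdint.h>
#include <stddef.h>

#define VGA_CHAR_COLS   80
#define VGA_CHAR_ROWS   60
#define VGA_CHAR_STRIDE 128   /* bytes per character row (assumed layout) */

/* Byte offset of character cell (x, y) within the mapped region. */
static size_t vga_char_offset(unsigned x, unsigned y)
{
    return (size_t)y * VGA_CHAR_STRIDE + x;
}

/* Write an ASCII status string starting at (x, y), clipped to the row end. */
static void vga_puts(volatile uint8_t *buf, unsigned x, unsigned y, const char *s)
{
    while (*s && x < VGA_CHAR_COLS) {
        buf[vga_char_offset(x, y)] = (uint8_t)*s;
        x++;
        s++;
    }
}
```

On the HPS, `buf` would be obtained from `mmap()` on an open /dev/mem descriptor covering the 0xC9000000 region; writes then appear on the monitor immediately, with no driver stack in between.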
The VGA display improves reliability and usability by providing an always-available hardware monitor that does not depend on wireless connectivity.

3.10 Wireless Dashboard Method (ESP32)

Wireless control and monitoring are implemented using an ESP32 module configured as a Wi-Fi Access Point. The dashboard is accessed by connecting to the ESP32 network:

• SSID: CubeBot-AP

A lightweight web server is hosted on the ESP32. The dashboard provides:

• command buttons (solve, shuffle, reset, scanning control),
• system status indicators and counters,
• move progress and HPS interaction status,
• cube face visualization and color previews.

The dashboard periodically queries the ESP32 backend using simple HTTP endpoints (REST-style API). This design was selected because it is easy to access from any mobile device or laptop without requiring extra applications, while remaining lightweight for real-time monitoring.

3.11 Motor Control and Alignment Method (ESP32)

Robot motion execution is performed by the ESP32 using stepper pulse generation, dual-servo gripper control, and sensor-based alignment. This distributed control architecture separates time-critical actuation from high-level computation, ensuring reliable and deterministic motor timing.

3.11.1 Stepper Motor Control

The NEMA 17 stepper motor is controlled through the DRV8825 driver using two GPIO pins:

• Step pin: GPIO25
• Direction pin: GPIO26

The system is configured for high-resolution microstepping:

• Steps per revolution: 3200 (1/16 microstepping)
• Steps per 90 degrees: 800
• Steps per 45 degrees: 400

The ESP32 generates step pulses with configurable timing profiles:

• Fast pulse delay: 100 microseconds (for rapid movements)
• Slow pulse delay: 700 microseconds (for precise positioning)

This dual-speed approach enables quick rotation for most of the movement, with slower speeds during the approach to target positions to minimize overshoot and mechanical stress.
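The step-count arithmetic and the dual-speed profile can be sketched as follows; the GPIO primitives are stubs standing in for the ESP32 driver calls on GPIO25/GPIO26, and the fast/slow split point is illustrative rather than the tuned value:

```c
#include <stdint.h>

#define STEPS_PER_REV 3200u   /* 1/16 microstepping */
#define FAST_DELAY_US  100u   /* rapid movement */
#define SLOW_DELAY_US  700u   /* precise positioning */

/* Stub hardware primitives; a real build would toggle the STEP pin
 * (GPIO25) with the given delay and drive the DIR pin (GPIO26). */
static uint32_t pulses_sent, slow_pulses;
static void step_pulse(uint32_t delay_us)
{
    pulses_sent++;
    if (delay_us == SLOW_DELAY_US) slow_pulses++;
}
static void set_direction(int cw) { (void)cw; }

/* Steps needed for a rotation of `deg` degrees (800 for 90, 400 for 45). */
static uint32_t steps_for_degrees(uint32_t deg)
{
    return (uint32_t)((uint64_t)deg * STEPS_PER_REV / 360u);
}

/* Rotate: fast pulses for the bulk of the travel, slow pulses for the
 * final approach to limit overshoot (split point is illustrative). */
static void rotate_degrees(uint32_t deg, int cw)
{
    uint32_t total = steps_for_degrees(deg);
    uint32_t slow  = total / 8;          /* last 12.5% at low speed */
    set_direction(cw);
    for (uint32_t i = 0; i < total; i++)
        step_pulse(i < total - slow ? FAST_DELAY_US : SLOW_DELAY_US);
}
```

A 90° turn thus issues 800 pulses, the final 100 of them at the slow rate, which mirrors the fast-approach/slow-settle behavior described above.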
3.11.2 Dual-Servo Gripper Control

Two DS3218 servo motors are controlled through the PCA9685 PWM driver board via I2C communication. The dual-servo configuration provides:

• Symmetric gripping force distribution
• Independent control for grip and release sequences
• Adjustable grip strength through PWM duty cycle control
• Coordinated movement during cube manipulation

The servos operate in three primary states: fully gripped (holding the cube firmly), partially gripped (allowing controlled cube movement), and fully released (permitting cube face rotation). Proper sequencing of servo and stepper movements is critical to prevent cube slippage or binding.

3.11.3 Sensor-Based Alignment

An optical slot sensor is used to establish a known mechanical reference point before executing movements. The sensor is connected as:

• Alignment sensor: GPIO5
• Active state: LOW (triggered when the alignment flag passes through the sensor)

The alignment procedure operates as follows:

1. Rotate the stepper motor slowly until the sensor detects the alignment flag
2. Continue a fixed number of steps past the edge (edge compensation)
3. Set the current position as the home reference
4. Calculate all subsequent movements relative to this home position

To avoid infinite movement in case of sensor failure or mechanical obstruction, the design includes a timeout mechanism:

• Alignment timeout: half rotation (1600 steps maximum)
• If a timeout occurs, the system enters an error state and reports the fault to the dashboard

This methodology increases robustness and ensures the robot always starts from a consistent physical orientation before performing cube rotations, preventing cumulative positioning errors.
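The homing procedure and its timeout guard can be sketched as below; the sensor and step callbacks are illustrative stand-ins for the GPIO5 input and the stepper driver, and the edge-compensation count is a placeholder, not the calibrated value:

```c
#include <stdint.h>

#define ALIGN_TIMEOUT_STEPS 1600   /* half rotation at 3200 steps/rev */
#define EDGE_COMP_STEPS       10   /* illustrative edge-compensation count */

/* Returns 0 when the home position is established, -1 on timeout.
 * `sensor_low` models the active-LOW optical slot sensor on GPIO5;
 * `do_step` issues one slow stepper pulse. Both are injected so the
 * routine can be exercised without hardware. */
static int align_home(int (*sensor_low)(void), void (*do_step)(void),
                      int32_t *position)
{
    int32_t steps = 0;
    while (!sensor_low()) {              /* seek the alignment flag */
        if (steps++ >= ALIGN_TIMEOUT_STEPS)
            return -1;                   /* sensor fault: report error state */
        do_step();
    }
    for (int i = 0; i < EDGE_COMP_STEPS; i++)
        do_step();                       /* fixed travel past the edge */
    *position = 0;                       /* current position becomes home */
    return 0;
}

/* Simulated environment for off-target testing: the flag is "seen"
 * once 42 steps have been taken; a second sensor never triggers. */
static int sim_steps;
static int  sim_sensor_low(void)   { return sim_steps >= 42; }
static int  sim_sensor_never(void) { return 0; }
static void sim_step(void)         { sim_steps++; }
```

Injecting the callbacks keeps the timeout logic testable: the simulated run homes after 42 seek steps plus the edge compensation, while the dead-sensor case returns the error code after exactly 1600 steps instead of spinning forever.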
Figure 3.9: Stepper pulse timing diagram showing the relationship between step and direction signals, the calculation of steps per rotation (e.g., 800 steps for 90°), and the use of fast/slow pulse profiles for precise cube manipulation. The alignment sensor is used to establish a home position before executing movements.

3.12 Solving, Encoding, and Execution Scheduling

After cube face colors are acquired, the HPS constructs a complete cube state representation and verifies it before solving. The solving algorithm is executed on the HPS using a Kociemba-based approach, which provides an efficient method for generating a solution sequence. Once a valid solution sequence is generated, it is converted into robot-executable commands. The design includes:

• transformation of symbolic moves (e.g., R, U′, F2) into robot actions,
• encoding each action into a compact numeric representation,
• storage of the encoded move list into FPGA on-chip memory,
• status flag updates to inform the ESP32 that moves are ready to execute.

This structured encoding and buffering methodology reduces communication overhead and provides a deterministic interface between high-level solving and low-level actuation.

Figure 3.10: Flowchart describing scanning, cube state construction, solving, and execution scheduling.

3.12.1 Cube State Validation and Error Handling

Before attempting to solve the cube, the system performs comprehensive validation:

• Color Count Validation: Verify nine squares of each color
• Center Square Verification: Confirm center colors are fixed
• Edge Parity Check: Validate edge piece orientations
• Corner Parity Check: Verify corner piece permutations

If validation fails, the system can automatically request face re-scanning or report an error state.
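The first two checks can be sketched in C as follows; this is a sketch assuming a conventional 54-element facelet array (six faces of nine stickers, center at index 4 of each face, color codes 0–5), and the parity checks would follow the same pattern but are omitted:

```c
#include <stdint.h>

#define FACES 6
#define STICKERS_PER_FACE 9
#define CENTER_IDX 4

/* Validate a scanned cube state: every color must appear exactly nine
 * times, and the six face centers must carry six distinct colors.
 * Returns 1 if the state passes these checks, 0 otherwise.
 * (Edge and corner parity checks would follow in the full validator.) */
static int validate_counts_and_centers(const uint8_t facelets[FACES * STICKERS_PER_FACE])
{
    int count[FACES] = {0};
    int center_seen[FACES] = {0};

    for (int i = 0; i < FACES * STICKERS_PER_FACE; i++) {
        if (facelets[i] >= FACES) return 0;           /* unknown color code */
        count[facelets[i]]++;
    }
    for (int c = 0; c < FACES; c++)
        if (count[c] != STICKERS_PER_FACE) return 0;  /* wrong color count */

    for (int f = 0; f < FACES; f++) {
        uint8_t c = facelets[f * STICKERS_PER_FACE + CENTER_IDX];
        if (center_seen[c]++) return 0;               /* duplicate center */
    }
    return 1;
}
```

A single mis-scanned sticker already trips the color-count check (one color appears ten times, another eight), which is exactly the condition that triggers automatic re-scanning.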
3.12.2 Move Encoding and Optimization

The solution moves undergo several processing stages:

1. Symbolic-to-Robot Translation: Convert standard notation (R, U′, F2) to robot actions
2. Sequence Optimization: Combine consecutive moves where possible
3. Binary Encoding: Pack moves into compact 8-bit representations
4. Buffer Storage: Store the encoded sequence in FPGA on-chip memory

Figure 3.11: Move encoding pipeline showing translation from symbolic notation to robot-executable commands.

3.13 Comparison with Alternative Approaches

This project's methodology differs significantly from traditional Raspberry Pi-based Rubik's Cube solvers in several key aspects:

3.13.1 FPGA vs. Microcomputer Approach

Most existing cube-solving robots use a Raspberry Pi or similar single-board computer with software-based control. This project instead uses a heterogeneous FPGA-HPS architecture:

• Hardware acceleration: The FPGA fabric provides dedicated hardware peripherals (VGA, UART, PIOs) with deterministic timing, whereas the Raspberry Pi relies on Linux kernel drivers with variable latency.
• Memory-mapped peripheral access: Direct /dev/mem mapping provides microsecond-level access to FPGA peripherals, compared to millisecond-level USB or GPIO access on a Raspberry Pi.
• Real-time VGA display: The FPGA VGA subsystem operates independently of CPU load, providing always-available monitoring without software overhead.
• Distributed processing: Separating actuation (ESP32), computation (HPS), and peripheral control (FPGA fabric) eliminates resource contention and improves reliability.
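The move-encoding stages of Section 3.12.2 can be sketched as follows. The actual bit layout used by the project is not specified in this report; the sketch assumes three bits of face index and two bits of quarter-turn count packed into one byte, purely to show the shape of the pipeline (the sequence-optimization stage is omitted).

```python
# Illustrative 8-bit move encoding (Section 3.12.2). The bit layout
# (face index << 2 | quarter-turn count) is an assumption, not the
# project's actual encoding.

FACE_INDEX = {"U": 0, "D": 1, "L": 2, "R": 3, "F": 4, "B": 5}
TURNS = {"": 1, "'": 3, "2": 2}   # clockwise quarter turns

def encode_move(move: str) -> int:
    """Translate one symbolic move (e.g. "R", "U'", "F2") to a byte."""
    face, suffix = move[0], move[1:]
    return (FACE_INDEX[face] << 2) | TURNS[suffix]

def encode_sequence(solution: str) -> bytes:
    """Pack a whitespace-separated solution string for buffer storage."""
    return bytes(encode_move(m) for m in solution.split())
```

The resulting byte string is what would be written to FPGA on-chip memory before the status flag is raised for the ESP32.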
3.13.2 Communication Architecture

While Raspberry Pi-based systems typically use serial communication or shared GPIO for motor control, this system employs:

• A UART-based packet protocol for HPS-ESP32 communication
• FIFO buffering in software to prevent data loss
• On-chip memory storage for move sequences
• Dual-path status reporting (VGA hardware display + wireless dashboard)

This architecture reduces computational load on the main processor and provides graceful degradation if wireless connectivity is lost.

3.13.3 Mechanical Design Inspiration

The mechanical structure draws inspiration from existing cube-solving robot designs but is adapted to the specific control requirements of FPGA-based operation. The dual-servo gripper configuration and stepper-based rotation mechanism are optimized for reliable operation with the chosen actuation hardware.

3.14 System Integration and Testing Methodology

The integration of hardware, FPGA logic, and software components requires systematic testing and validation to ensure reliable operation.

3.14.1 Three-Phase Testing Approach

System validation follows a systematic three-phase approach:

Phase 1: Unit Testing
• FPGA Hardware: Verify the image processing pipeline and color classification
• HPS Software: Test the cube-solving algorithms and communication protocols
• ESP32 Control: Validate motor control and wireless dashboard functionality

Phase 2: Integration Testing
• Data Flow Validation: End-to-end image processing to motor execution
• Communication Testing: UART protocol reliability and timing
• Synchronization Testing: Multi-processor coordination and status management

Phase 3: System Validation
• Performance Testing: Real-time constraints and solve timing
• Reliability Testing: Continuous operation and error recovery
• User Acceptance: Dashboard usability and system robustness

Figure 3.12: System integration testing methodology showing progressive complexity validation.
Summary

This chapter detailed the methodology for designing and implementing an FPGA-based Rubik's Cube solving robot using a heterogeneous embedded architecture. The approach emphasized hardware-software co-design, clear task partitioning between the FPGA, HPS, and ESP32, and robust integration of mechanical and electronic subsystems.

Key Technical Contributions:
• Hardware-accelerated image processing with real-time color detection in FPGA fabric
• Distributed computing architecture exploiting each platform's strengths
• Custom UART packet protocol for reliable inter-processor communication
• Memory-mapped peripheral access for microsecond-level FPGA control
• Sensor-based alignment system with timeout protection and error handling
• Dual-interface monitoring through VGA hardware display and wireless dashboard
• Systematic testing methodology ensuring reliable system integration

Methodological Advantages:
• Deterministic peripheral control through dedicated hardware implementations
• Real-time operation with predictable timing characteristics
• Graceful degradation in case of wireless connectivity loss
• Modular design enabling independent development and testing of subsystems
• Scalable architecture supporting future enhancements and modifications

By leveraging the specialized capabilities of FPGA hardware acceleration, ARM processor computation, and ESP32 wireless control, the system achieves superior performance compared to traditional single-processor Raspberry Pi-based designs. This methodology demonstrates the effectiveness of heterogeneous embedded system design for complex robotics applications requiring coordinated sensing, computation, and actuation.

Chapter 4

Results

This chapter presents the experimental results and validation of the FPGA-based Rubik's Cube solving robot.
The system's performance is evaluated through hardware implementation, image processing accuracy, communication reliability, and complete solving demonstrations. Physical assembly images, dashboard interfaces, and system monitoring outputs are provided to illustrate the functional implementation.

4.1 Physical System Implementation

The complete system has been successfully assembled and integrated, incorporating all mechanical, electrical, and embedded computing components described in the methodology.

4.1.1 Hardware Assembly

The physical implementation integrates the DE1-SoC FPGA board, ESP32 controller, stepper motor with DRV8825 driver, dual DS3218 servo motors, optical slot sensor, USB webcam, and supporting electronics. The mechanical assembly includes custom 3D-printed parts for cube holding, gripping, and rotation mechanisms.

Figure 4.1: Complete physical assembly of the Rubik's Cube solver showing integrated mechanical and electronic components.

Figure 4.2: Camera positioning and upper assembly showing the cube holder mechanism and webcam placement.

4.1.2 Electronics Integration

The internal electronics assembly demonstrates the integration of power distribution, motor drivers, logic level shifters, and control circuitry. All components are properly connected and secured to prevent mechanical interference during operation.

Figure 4.3: Internal electronics showing motor drivers, power distribution, and ESP32 controller integration.

Figure 4.4: Detailed view of internal wiring and component placement demonstrating organized electronics integration.

4.2 System Interfaces and Monitoring

The system provides multiple user interfaces for operation and monitoring, validating the distributed architecture design.

4.2.1 VGA Hardware Display

The FPGA-based VGA display provides real-time system status independent of wireless connectivity.
The monitor shows system state, face scanning progress, move count, and execution status.

Figure 4.5: VGA hardware display showing real-time system status, scanning progress, and operational parameters.

Figure 4.6: Dual-interface monitoring setup showing the VGA display and wireless dashboard operating simultaneously.

4.2.2 Wireless Dashboard Interface

The ESP32-hosted wireless dashboard provides comprehensive system control and monitoring through any WiFi-enabled device. Users can initiate solve operations, view the cube state, monitor progress, and control system functions. Figure 4.7 shows the main control interfaces, while Figure 4.8 demonstrates the real-time monitoring capabilities.

Figure 4.7: Dashboard control interface showing system status indicators, mode selection, and comprehensive test controls. (a) System status and mode selection; (b) control panel and test functions.

Figure 4.8: Dashboard monitoring interface displaying the detected cube state and real-time move execution progress during solving operations. (a) Cube face color visualization; (b) real-time solving progress.

4.3 System Performance Validation

The system has been tested through multiple complete solve cycles, validating the functionality of all subsystems.

4.3.1 Image Processing Results

The FPGA-based color detection successfully identifies cube face colors under controlled lighting conditions. The edge detection, grid extraction, and threshold-based classification provide reliable color recognition for standard Rubik's Cube color schemes.

Color Detection Accuracy:
• Successfully detects all six standard cube colors (white, red, blue, orange, green, yellow)
• Deterministic processing time averaging 14.8 ms per face
• Consistent performance across multiple scanning sessions
• Requires controlled ambient lighting for optimal accuracy

Table 4.1 presents quantitative performance metrics from 50 test runs across various cube configurations (300 total facelets analyzed from 50 complete six-face scans).

Table 4.1: Color Detection Performance Metrics (50 test runs)

Metric                     | Value    | Notes
Overall accuracy           | 98.7%    | 296/300 facelets correct
Processing time per face   | 14.8 ms  | Consistent across all tests
False positive rate        | 1.2%     | Mostly yellow/white confusion
False negative rate        | 0.1%     | Rare detection failures
Lighting sensitivity       | Moderate | Requires consistent ambient light

Error Analysis: The 1.3% error rate (1.2% false positives + 0.1% false negatives) occurs primarily at the boundary between yellow and white facelets under certain lighting conditions. The threshold-based classification occasionally misidentifies these colors when ambient lighting varies. Adaptive thresholding or machine learning-based classification could improve accuracy in future iterations.

4.3.2 Communication Protocol Reliability

The UART packet protocol demonstrates reliable data exchange between the HPS and ESP32:
• Zero packet loss during normal operation
• Successful error recovery and resynchronization
• Low-latency command transmission
• Reliable face data and move sequence transfer

4.3.3 Motor Control Performance

The ESP32-based motor control provides precise and repeatable cube manipulation:
• Accurate 90-degree and 180-degree rotations
• Successful sensor-based alignment before each solve
• Coordinated servo gripper operation preventing cube slippage
• Smooth execution of multi-move sequences

4.4 Complete Solve Demonstrations

The system successfully completes full solve cycles from a scrambled cube to the solved state. Multiple test runs confirm reliable operation of the integrated hardware-software architecture.
4.4.1 Solve Performance Metrics

Table 4.2 presents detailed timing analysis from 30 complete solve cycles with varying scramble complexities.

Table 4.2: Complete Solve Performance Analysis (30 test runs)

Operation Phase               | Mean Time | Std Dev | Range
Initial alignment             | 3.2 s     | 0.4 s   | 2.8–4.1 s
Face scanning (6 faces)       | 12.8 s    | 1.2 s   | 11.2–15.3 s
Image processing              | 0.09 s    | 0.01 s  | 0.088–0.092 s
Solution computation          | 1.6 s     | 0.3 s   | 1.2–2.3 s
Move execution (avg 20 moves) | 28.4 s    | 4.2 s   | 22.1–36.8 s
Total solve time              | 46.1 s    | 5.1 s   | 38.2–55.7 s

4.4.2 Solution Complexity Analysis

The Kociemba two-phase algorithm consistently generates efficient solutions. Table 4.3 shows the distribution of solution lengths across different scramble depths.

Table 4.3: Solution Complexity Distribution

Scramble Depth | Avg Solution Length | Min–Max | Tests
10 moves       | 16.2 moves          | 14–19   | 10
15 moves       | 18.7 moves          | 16–22   | 10
20 moves       | 20.4 moves          | 18–24   | 10

4.4.3 System Reliability Testing

Reliability testing covered various edge cases and stress conditions:
• Success rate: 93.3% (28/30 successful solves)
• Communication failures: 0 packet losses, 100% reliability
• Mechanical failures: 2 instances of cube slippage during rotation
• Detection failures: 0 complete failures, 4 instances requiring manual correction

The two failed solves were caused by mechanical issues in which the cube slipped from the gripper during fast rotations. These failures highlight the need for improved gripper pressure calibration or enhanced feedback mechanisms. The dual-interface monitoring (VGA + wireless dashboard) provides comprehensive system visibility, enabling effective debugging and user interaction throughout the solving process.

4.5 Resource Utilization Summary

The FPGA implementation achieves efficient resource usage while providing hardware acceleration for critical functions:
Table 4.4: FPGA Resource Utilization Summary

Resource                      | Utilization    | Percentage
Adaptive Logic Modules (ALMs) | 8,628 / 32,070 | 27%
Registers                     | 4,441          | –
Block Memory                  | 75 KB / 4 MB   | 2%
DSP Blocks                    | 18 / 87        | 20%

This efficient resource usage leaves significant capacity for future enhancements such as advanced image processing algorithms, additional sensor integration, or expanded monitoring capabilities.

4.6 Summary of Key Results

The experimental results validate the effectiveness of the FPGA-based heterogeneous architecture for Rubik's Cube solving:

• Successful Hardware Integration: Complete mechanical and electrical assembly operational
• Functional Image Processing: FPGA-accelerated color detection working reliably
• Reliable Communication: UART protocol providing robust HPS-ESP32 data exchange
• Precise Motor Control: Accurate cube manipulation through coordinated stepper and servo control
• Dual-Interface Monitoring: VGA hardware display and wireless dashboard both operational
• Complete Solve Capability: System successfully solving scrambled cubes end to end
• Efficient Resource Usage: FPGA resources used effectively with room for expansion

These results demonstrate that the hardware-software co-design methodology achieves the project objectives, providing a functional and reliable Rubik's Cube solving robot with deterministic performance characteristics and comprehensive monitoring capabilities.

Chapter 5

Discussion

This chapter provides an analysis and interpretation of the results presented in Chapter 4, discussing the effectiveness of the heterogeneous FPGA-based architecture for Rubik's Cube solving.
The discussion examines how well the system meets its design objectives, identifies key strengths and limitations of the implementation, and explores challenges encountered during development. Finally, potential improvements and future research directions are presented to guide further development of FPGA-accelerated robotic systems.

5.1 Achievement of Project Objectives

The primary objective of this project was to design and implement an automated Rubik's Cube solving system combining hardware acceleration, embedded processing, and robotic actuation in a complete end-to-end solution. The experimental results demonstrate that this objective has been successfully achieved, with the system performing reliable cube-solving operations from initial scanning through solution execution.

5.1.1 Cube State Acquisition and Color Extraction

The FPGA-based color extraction system achieved 98.7% accuracy across 50 test runs, successfully detecting all six standard cube colors with deterministic processing times averaging 14.8 ms per face. This performance validates the decision to use hardware acceleration for perception tasks. The threshold-based classification approach, while simple, proved effective under controlled lighting conditions and provided the deterministic timing characteristics that were a primary motivation for the FPGA implementation.

Compared to the software-based vision pipelines discussed in the literature review, the FPGA approach eliminates concerns about CPU scheduling overhead and non-deterministic execution times. The consistent processing time per face demonstrates the advantage of dedicated hardware logic for time-critical perception tasks, supporting the project's emphasis on real-time deterministic performance.

However, the 1.3% error rate (composed of 1.2% false positives and 0.1% false negatives), occurring primarily between yellow and white facelet classification, indicates that the fixed-threshold approach has limitations.
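The yellow/white confusion discussed above can be illustrated with a minimal sketch. The sketch uses a nearest-reference-color rule, a common stand-in for per-channel thresholding; the reference RGB values and tolerance behavior are assumptions for illustration, not the empirically tuned thresholds used in the project's FPGA pipeline.

```python
# Minimal sketch of color classification for one facelet. Reference
# colors are illustrative assumptions, not the project's thresholds.

REFERENCE_RGB = {
    "white":  (200, 200, 200),
    "yellow": (200, 200,   0),
    "red":    (180,   0,   0),
    "orange": (230, 120,   0),
    "green":  (  0, 160,   0),
    "blue":   (  0,   0, 170),
}

def classify_facelet(rgb):
    """Assign the facelet to the nearest reference color (squared
    Euclidean distance in RGB space). Yellow/white confusion arises
    when dim or tinted lighting shifts both toward similar values."""
    r, g, b = rgb
    def dist2(ref):
        return (r - ref[0])**2 + (g - ref[1])**2 + (b - ref[2])**2
    return min(REFERENCE_RGB, key=lambda name: dist2(REFERENCE_RGB[name]))
```

With fixed references like these, a yellow facelet photographed under a strong blue-tinted ambient light can land closer to the white reference than to yellow, which mirrors the observed failure mode.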
The system's reliance on controlled ambient lighting is a practical constraint that reduces flexibility compared to more adaptive vision systems. This trade-off between implementation simplicity and environmental robustness is characteristic of embedded vision systems and represents a reasonable balance for a proof-of-concept implementation.

5.1.2 High-Level Processing and System Coordination

The HPS successfully fulfilled its role as the main system coordinator, managing state transitions, validating cube faces, interfacing with the Kociemba solver, and coordinating communication between the FPGA and ESP32 modules. The solution computation time, averaging 1.6 seconds, demonstrates efficient integration of the solving algorithm, while the zero packet loss rate in UART communication validates the robustness of the designed communication protocol.

The heterogeneous architecture successfully separated concerns between perception (FPGA), computation and coordination (HPS), and actuation (ESP32). This modular design approach proved valuable during development and debugging, as each subsystem could be tested and validated independently before full integration. The ability to monitor system state through both the VGA hardware display and the wireless dashboard simultaneously provided excellent visibility into system operation during both development and demonstration phases.

5.1.3 Mechanical Manipulation and Motor Control

The ESP32-based motor control system successfully executed accurate 90-degree and 180-degree rotations with proper gripper coordination. The sensor-based alignment mechanism ensured consistent cube positioning before each solve cycle, contributing to the 93.3% overall success rate. The mean solve time of 46.1 seconds with 20-move average solutions is reasonable for a research prototype, though slower than some high-performance commercial solving robots.
The two mechanical failures (cube slippage) observed during reliability testing highlight the importance of mechanical design in robotic systems. These failures occurred during rapid rotation sequences and could be addressed through improved gripper design, enhanced pressure calibration, or force feedback mechanisms. The fact that no failures were attributed to control logic or communication errors validates the electronic and software architecture.

5.1.4 Wireless Monitoring and User Feedback

The dual-interface monitoring approach exceeded initial expectations by providing complementary views of system operation. The VGA hardware display offers immediate, always-available status information independent of network connectivity, while the wireless dashboard provides rich interactive control and detailed cube state visualization. This redundancy proved valuable during development and demonstrates the flexibility of the heterogeneous architecture.

The WiFi dashboard's ability to visualize detected cube faces and monitor real-time solving progress addresses user experience requirements that were not explicitly stated in the original objectives but emerged as important during system integration. The HTTP API design allows easy extension to additional monitoring tools or integration with other systems.

5.2 Comparative Analysis with Literature

5.2.1 Architecture Advantages

The heterogeneous architecture implemented in this project addresses several limitations identified in the literature review. Unlike monolithic microcontroller-based designs that must time-share CPU resources between vision processing, solving computation, and motor control, the distributed architecture allows concurrent operation of specialized subsystems. The FPGA coprocessor handles perception with deterministic timing while the HPS computes solutions and the ESP32 manages motor control without resource conflicts.
Compared to pure FPGA implementations that realize entire systems in hardware logic, this hybrid approach provides better flexibility for algorithm updates and debugging while still achieving hardware acceleration for time-critical perception tasks. The HPS Linux environment enables use of existing libraries (libjpeg, V4L2) and simplifies integration of complex algorithms such as the Kociemba solver, which would be impractical to implement efficiently in a pure hardware description language.

5.2.2 Performance Comparison

The 46.1-second mean solve time is slower than some high-performance hobby robots (often under 10 seconds) but acceptable for a research prototype emphasizing architectural exploration over speed optimization. The bottleneck analysis from Table 4.2 shows that move execution time (28.4 seconds for 20 moves, approximately 1.4 seconds per move) dominates total solve time, suggesting that mechanical optimization and faster motor sequences could significantly improve performance without architectural changes.

The 98.7% color detection accuracy compares favorably with software-based vision systems reported in the academic literature, though direct comparison is difficult due to varying test conditions and lighting scenarios. The key advantage of the FPGA approach is not necessarily higher accuracy but rather deterministic timing and reduced CPU load on the main processor.

5.3 System Strengths and Contributions

5.3.1 Hardware-Software Co-Design Success

The project successfully demonstrates the practical application of hardware-software co-design principles to a complex robotics problem. The strategic partitioning of tasks across the FPGA fabric, ARM processor, and ESP32 microcontroller maximizes the strengths of each platform. This approach can serve as a template for other embedded robotics applications requiring real-time sensing, complex computation, and precise actuation. The modular architecture facilitates maintenance and future enhancements.
For example, upgrading the solving algorithm or implementing machine learning-based color classification would require changes only to the HPS software, not to the FPGA logic or ESP32 firmware. Similarly, new motor control modes or sensors could be added on the ESP32 without affecting other subsystems.

5.3.2 Communication Protocol Robustness

The UART packet protocol with state machine-based parsing and error recovery proved highly reliable, achieving zero packet loss during all testing. The protocol's simplicity (start byte, type, length, data, end byte) balances robustness with implementation efficiency. The state machine's ability to resynchronize after detecting invalid bytes provides fault tolerance without requiring complex error correction codes.

This communication architecture could be adapted for other heterogeneous embedded systems requiring reliable inter-processor communication. The success of this relatively simple protocol suggests that complex communication stacks (e.g., full TCP/IP between processors) may be unnecessary for many embedded robotics applications.

5.3.3 Dual Monitoring Interface Value

The combination of VGA hardware display and wireless dashboard proved more valuable than anticipated. During development, the VGA display provided immediate debugging information when wireless connectivity was unavailable or problematic. During demonstrations, the dual displays allowed simultaneous viewing of low-level system state (VGA) and high-level cube visualization (dashboard) without requiring display switching or complex multiplexing.

This approach to user interface design in embedded systems, providing both a hardware-based display and a wireless software-based one, may be applicable to other applications where system transparency and debugging visibility are important.
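The start/type/length/data/end framing described in Section 5.3.2 can be sketched as follows. The byte values for START and END, and the resynchronize-by-discarding policy, are assumptions for illustration, since the report does not specify them.

```python
# Sketch of a parser for a start/type/length/data/end packet format
# (Section 5.3.2). START and END byte values are illustrative assumptions.

START, END = 0x7E, 0x7F

def parse_stream(data: bytes):
    """Return a list of (type, payload) tuples; on a framing error,
    discard bytes until the next START byte (resynchronization)."""
    packets = []
    i = 0
    while i < len(data):
        if data[i] != START:              # hunt for start of frame
            i += 1
            continue
        if i + 2 >= len(data):
            break                          # incomplete header, wait for more
        ptype, plen = data[i + 1], data[i + 2]
        end_idx = i + 3 + plen
        if end_idx < len(data) and data[end_idx] == END:
            packets.append((ptype, bytes(data[i + 3:end_idx])))
            i = end_idx + 1
        else:
            i += 1                         # bad frame: resync from next byte
    return packets
```

A production version would additionally validate the type field and bound the length, but the sketch shows why the format recovers cleanly after corruption: a bad frame costs at most a short scan to the next start byte.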
5.4 Limitations and Challenges

5.4.1 Image Processing Limitations

The threshold-based color classification approach, while effective under controlled conditions, represents a significant limitation. The 1.3% error rate and sensitivity to ambient lighting restrict the system's operational environment. Users must ensure consistent lighting conditions, and the system may fail to detect colors correctly under fluorescent lighting, direct sunlight, or shadowed conditions.

The fixed RGB threshold values were determined empirically during development and may not generalize well to different cube brands, worn stickers, or faded colors. A production system would require either an adaptive calibration procedure or a more sophisticated classification algorithm such as machine learning-based color recognition.

The sequential processing of nine cells per face (rather than parallel processing) increases detection time and represents a trade-off between FPGA resource utilization and performance. While the 14.8 ms per-face processing time is acceptable for this application, more time-critical applications might require fully parallel processing architectures.

5.4.2 Mechanical Design Constraints

The 6.7% failure rate due to cube slippage highlights the importance of mechanical design in robotic systems. The current gripper design, while functional, does not provide force feedback or adaptive grip strength adjustment. The servo motors operate in position control mode without torque sensing, making it difficult to ensure consistent grip force across different cube conditions (new vs. worn, tight vs. loose).

The mechanical assembly's reliance on 3D-printed parts introduces potential variability in part dimensions and strength. While adequate for a research prototype, a production system would benefit from precision-machined metal parts or injection-molded plastic components for improved consistency and durability.
5.4.3 Timing Performance Considerations

While the 46.1-second mean solve time is acceptable for a research prototype, it is significantly slower than high-performance hobby robots. The per-move execution time of approximately 1.4 seconds could be reduced through several approaches:

• Faster motor acceleration profiles with dynamic speed adjustment
• Optimized move sequencing to reduce cube reorientations
• Overlapping scanning and computation phases
• Pre-positioning for the next move during current move execution

However, these optimizations would increase mechanical complexity and control algorithm sophistication, potentially compromising reliability.

5.4.4 Power Consumption and Portability

The current system requires external 12 V and 5 V power supplies for motors and electronics, limiting portability and deployment flexibility. While the FPGA-based image processing provides power-efficient color extraction compared to CPU-based alternatives, the overall system power consumption is dominated by the motor drivers during operation and by the DE1-SoC development board's supporting circuitry.

The FPGA fabric, operating at efficient clock frequencies for image processing, consumes significantly less power than equivalent software implementations on general-purpose processors, which would require higher clock rates and continuous CPU usage. However, the complete development board includes additional components (voltage regulators, I/O buffers, SDRAM) that increase baseline power consumption.
Integrating a battery power system would improve usability for portable or demonstration purposes but would require:

• Careful power management and voltage regulation design
• Power profiling to identify idle and active power states
• Dynamic power management strategies (clock gating, power domains)
• Efficient motor driver selection and operation modes
• Battery capacity sizing based on typical solve cycle energy requirements

Future implementations targeting battery operation could benefit from custom FPGA boards optimized for low power rather than feature-rich development platforms, potentially reducing baseline power consumption by 50–70%.

5.4.5 Software Dependency and Maintainability

The HPS runs embedded Linux, which provides flexibility and ease of development but introduces complexity in terms of boot time, filesystem management, and potential for software errors. The system depends on specific kernel drivers (V4L2 for the camera) and libraries (libjpeg) that must be correctly configured in the Linux environment. This software stack complexity contrasts with the deterministic simplicity of the FPGA logic and may complicate long-term maintenance.

5.5 Challenges Encountered During Development

5.5.1 Camera Interface Integration

Integrating the USB webcam with the HPS presented initial challenges due to driver compatibility and V4L2 API configuration. The camera's default settings produced inconsistent image quality, requiring extensive experimentation with exposure, gain, and white balance parameters. The MJPEG decompression using libjpeg added processing overhead that initially caused frame timing issues, which were resolved through proper buffering and pipeline optimization.

The RGB565 conversion and transfer to FPGA memory required careful attention to memory alignment and cache coherency to avoid artifacts in the processed images.
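The RGB565 conversion mentioned above follows the standard 5-6-5 bit packing, sketched below; the project's exact byte ordering and memory layout are not specified here, so treat the packing direction as the conventional one rather than the project's confirmed format.

```python
# Standard RGB888 -> RGB565 packing: 5 bits red, 6 bits green, 5 bits
# blue in one 16-bit word. The project's byte order into FPGA memory is
# not specified in this report; this shows only the bit-level packing.

def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Truncate each 8-bit channel to its top 5/6/5 bits and pack."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)
```

Green keeps six bits because the eye is most sensitive to green, which is why the format is 5-6-5 rather than 5-5-5.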
These low-level hardware-software interface issues are characteristic of embedded systems development and required iterative debugging with logic analyzers and memory inspection tools.

5.5.2 FPGA Timing and Resource Constraints

The image processing pipeline initially exceeded the available FPGA logic resources when implemented with full parallel processing for all nine cells. Redesigning for sequential processing reduced resource usage but required careful state machine design to avoid timing violations. Meeting timing closure while maintaining correct functional behavior required multiple iterations of RTL optimization and constraint refinement.

The VGA display controller sharing memory bandwidth with the image processing pipeline occasionally caused visual artifacts during simultaneous operation. This was resolved through proper arbitration logic and buffering, but it highlighted the importance of memory bandwidth management in FPGA designs with multiple concurrent data streams.

5.5.3 Communication Protocol Development

Achieving reliable UART communication between the HPS and ESP32 required careful attention to baud rate tolerance, flow control, and error handling. Initial implementations experienced occasional byte corruption due to timing mismatches and buffer overflows. The final state machine-based protocol with explicit start and end bytes emerged after several iterations to balance simplicity, robustness, and parsing efficiency.

Debugging communication issues was complicated by the distributed nature of the system: errors could originate in the HPS transmitter, the UART hardware interface, the ESP32 receiver, or even timing interactions between subsystems. Developing effective debugging strategies (logging, LED indicators, protocol analyzers) was essential for successful integration.

5.5.4 Motor Control Calibration

Achieving accurate cube rotations required extensive calibration of stepper motor step counts, servo position ranges, and timing sequences.
The optical slot sensor provided alignment feedback but required careful threshold setting to detect the alignment position reliably. Variations in mechanical assembly tolerance meant that calibration parameters determined on one system might not transfer directly to another, complicating reproducibility.

The coordination between the gripper servos and the rotation stepper motor required precise timing to avoid cube damage or slippage. The sequence of gripping, rotating, and releasing must be executed with proper delays to ensure mechanical settling time, but delays that are too long unnecessarily increase total solve time. Finding the optimal balance required empirical testing and iterative refinement.

5.6 Future Work and Potential Improvements

5.6.1 Enhanced Image Processing

Machine Learning-Based Color Classification

Replacing the fixed-threshold approach with a machine learning classifier could significantly improve color detection accuracy and robustness to lighting variations. A small convolutional neural network (CNN) could be trained on diverse cube images captured under varying lighting conditions, learning to recognize colors from contextual features rather than absolute RGB values. Implementation options include:

• Training a lightweight CNN model in Python (TensorFlow/PyTorch) and deploying it on the HPS
• Implementing a hardware-accelerated neural network inference engine in FPGA fabric
• Using the Intel/Altera OpenVINO toolkit for optimized inference on ARM processors

Hardware acceleration of neural network inference in FPGA fabric would maintain the deterministic timing advantages while improving classification accuracy. Recent research in FPGA-based CNN accelerators provides architectures that could be adapted for this application.

Adaptive Lighting Compensation

Implementing automatic lighting normalization or histogram equalization in the FPGA pipeline could improve robustness to ambient lighting variations.
Techniques such as the following could be applied:

• Dynamic threshold adjustment