An-Najah National University
Department of Computer Engineering
Graduation Project II

CubeBot: FPGA-Based Rubik’s Cube Solver

Momen Anani
Mohammad Hamdan

Supervisor: Dr. Suleiman Abu Kharmeh

A report submitted in partial fulfilment of the requirements of An-Najah National University for the degree of Bachelor of Science in Computer Engineering

January 24, 2026

Abstract

This project presents the design and implementation of an automated Rubik’s Cube solving robot using a heterogeneous embedded system architecture that combines FPGA hardware acceleration, ARM processor coordination, and ESP32-based motor control. Unlike traditional microcontroller-only or PC-based approaches, the system strategically distributes tasks across specialized computing platforms to achieve deterministic real-time performance, modular design, and reliable operation.

The system architecture integrates three cooperating units: a DE1-SoC FPGA fabric implementing hardware-accelerated color extraction and VGA display, an ARM Cortex-A9 Hard Processor System (HPS) managing high-level coordination and solution computation, and an ESP32 module handling motor control and wireless dashboard connectivity. The FPGA processes cube face images with deterministic 14.8 ms timing using threshold-based color classification, while the HPS executes the Kociemba two-phase solving algorithm and validates cube state consistency. The ESP32 coordinates stepper and servo motors to physically manipulate the cube with sensor-based alignment feedback.

Communication between subsystems uses a custom UART packet protocol with state machine-based error recovery, achieving 100% reliability across all testing. The system provides dual monitoring interfaces through an FPGA-based VGA hardware display and an ESP32-hosted wireless web dashboard, enabling comprehensive system visibility and user control. Experimental results demonstrate 98.7% color detection accuracy, a 93.3% solve success rate, and a mean solve time of 46.1 seconds.
The modular architecture achieved efficient FPGA resource utilization (27% ALMs, 2% block memory, 20% DSP blocks) while maintaining flexibility for future enhancements. Testing across 30 complete solve cycles validated the effectiveness of the heterogeneous design approach for robotics applications requiring integrated perception, computation, and actuation.

This work demonstrates how hardware-software co-design principles can address the limitations of monolithic embedded systems, providing a practical architecture for FPGA-accelerated robotics that balances real-time performance with implementation simplicity and debugging accessibility.

Keywords: FPGA, Heterogeneous Architecture, Rubik’s Cube Solver, Hardware Acceleration, Embedded Systems, Image Processing, Robotics, DE1-SoC, ESP32, Hardware-Software Co-Design

Contents

List of Figures
List of Tables

1 Introduction
  1.1 General Background
  1.2 Objectives and Purpose
  1.3 Significance and Importance
  1.4 Summary of Contributions and Achievements
  1.5 Organization of the Report

2 Literature Review
  2.1 Introduction
  2.2 Rubik’s Cube Solving Systems: Overview and System Requirements
  2.3 Cube Perception and Color Extraction Methods
    2.3.1 Software-Based Image Processing Pipelines
    2.3.2 Lighting Robustness and Color Space Selection
    2.3.3 Hardware-Accelerated Vision and FPGA-Based Extraction
  2.4 Cube Solving Algorithms and Solution Generation
    2.4.1 Two-Phase Solving and Kociemba Algorithm
    2.4.2 State Validation and Error Handling
  2.5 Robotic Manipulation and Motion Control
    2.5.1 Servo and Stepper Motor Approaches
    2.5.2 Synchronization and Execution Order
  2.6 System Integration and Heterogeneous Embedded Architectures
  2.7 Research Gap and Motivation
  2.8 Summary

3 Methodology
  3.1 System Design Overview
  3.2 Hardware Components and System Architecture
    3.2.1 External Interface and Monitoring
    3.2.2 Mechanical Assembly Considerations
  3.3 Camera Interface and USB Video Capture
    3.3.1 Camera Configuration and Frame Acquisition
    3.3.2 Frame Processing Pipeline
  3.4 FPGA-Based Image Processing and Color Detection
    3.4.1 Hardware-Accelerated Image Processing Pipeline
    3.4.2 Adaptive Color Classification System
    3.4.3 Processing Performance Characteristics
  3.5 Communication Protocols and Data Exchange
    3.5.1 HPS-FPGA Communication Protocol
    3.5.2 UART Packet Protocol (HPS-ESP32)
    3.5.3 HTTP API Protocol (ESP32-Dashboard)
  3.6 Hardware–Software Co-Design and Task Partitioning
    3.6.1 FPGA Fabric Responsibilities
    3.6.2 HPS (Main Processor) Responsibilities
    3.6.3 ESP32 Responsibilities
  3.7 FPGA Platform Designer System Integration
    3.7.1 Custom IP Block Integration
  3.8 Memory-Mapped Communication Between HPS and FPGA
  3.9 VGA Real-Time Status Display
  3.10 Wireless Dashboard Method (ESP32)
  3.11 Motor Control and Alignment Method (ESP32)
    3.11.1 Stepper Motor Control
    3.11.2 Dual-Servo Gripper Control
    3.11.3 Sensor-Based Alignment
  3.12 Solving, Encoding, and Execution Scheduling
    3.12.1 Cube State Validation and Error Handling
    3.12.2 Move Encoding and Optimization
  3.13 Comparison with Alternative Approaches
    3.13.1 FPGA vs. Microcomputer Approach
    3.13.2 Communication Architecture
    3.13.3 Mechanical Design Inspiration
  3.14 System Integration and Testing Methodology
    3.14.1 Three-Phase Testing Approach

4 Results
  4.1 Physical System Implementation
    4.1.1 Hardware Assembly
    4.1.2 Electronics Integration
  4.2 System Interfaces and Monitoring
    4.2.1 VGA Hardware Display
    4.2.2 Wireless Dashboard Interface
  4.3 System Performance Validation
    4.3.1 Image Processing Results
    4.3.2 Communication Protocol Reliability
    4.3.3 Motor Control Performance
  4.4 Complete Solve Demonstrations
    4.4.1 Solve Performance Metrics
    4.4.2 Solution Complexity Analysis
    4.4.3 System Reliability Testing
  4.5 Resource Utilization Summary
  4.6 Summary of Key Results

5 Discussion
  5.1 Achievement of Project Objectives
    5.1.1 Cube State Acquisition and Color Extraction
    5.1.2 High-Level Processing and System Coordination
    5.1.3 Mechanical Manipulation and Motor Control
    5.1.4 Wireless Monitoring and User Feedback
  5.2 Comparative Analysis with Literature
    5.2.1 Architecture Advantages
    5.2.2 Performance Comparison
  5.3 System Strengths and Contributions
    5.3.1 Hardware-Software Co-Design Success
    5.3.2 Communication Protocol Robustness
    5.3.3 Dual Monitoring Interface Value
  5.4 Limitations and Challenges
    5.4.1 Image Processing Limitations
    5.4.2 Mechanical Design Constraints
    5.4.3 Timing Performance Considerations
    5.4.4 Power Consumption and Portability
    5.4.5 Software Dependency and Maintainability
  5.5 Challenges Encountered During Development
    5.5.1 Camera Interface Integration
    5.5.2 FPGA Timing and Resource Constraints
    5.5.3 Communication Protocol Development
    5.5.4 Motor Control Calibration
  5.6 Future Work and Potential Improvements
    5.6.1 Enhanced Image Processing
    5.6.2 Mechanical and Control Improvements
    5.6.3 System Architecture Extensions
    5.6.4 Educational and Research Applications
    5.6.5 Alternative Application Domains
  5.7 Long-Term Research Directions
    5.7.1 Fully Autonomous Learning Systems
    5.7.2 Integration with Advanced Computer Vision
    5.7.3 Swarm Robotics and Collaborative Solving
  5.8 Summary

6 Conclusion and Recommendations
  6.1 Summary of Achievements
    6.1.1 Key Accomplishments
  6.2 Key Contributions
  6.3 Reflection on Learning Experience
    6.3.1 Key Lessons Learned
    6.3.2 Impact and Future Application
  6.4 Recommendations for Future Work
  6.5 Final Remarks

References

List of Figures

3.1 Overall Rubik’s Cube solver system flow (FPGA + HPS + ESP32 + peripherals).
3.2 3D concept rendering of the combined mechanical and hardware assembly for the Rubik’s Cube solver.
3.3 Frame processing pipeline from USB camera capture to FPGA-based color analysis.
3.4 Image processing hardware block diagram showing sequential processing units and resource utilization.
3.5 UART packet structure and communication flow between HPS and ESP32.
3.6 Platform Designer (Qsys) system showing FPGA IP integration and subsystem structure.
3.7 Custom IP block integration within the Platform Designer system showing data flow and control connections.
3.8 Memory-mapped address regions for major FPGA peripherals accessed by the HPS application.
3.9 Stepper pulse timing diagram showing the relationship between step and direction signals, calculation of steps per rotation (e.g., 800 steps for 90°), and the use of fast/slow pulse profiles for precise cube manipulation.
The alignment sensor is used to establish a home position before executing movements.
3.10 Flowchart describing scanning, cube state construction, solving, and execution scheduling.
3.11 Move encoding pipeline showing translation from symbolic notation to robot-executable commands.
3.12 System integration testing methodology showing progressive complexity validation.
4.1 Complete physical assembly of the Rubik’s Cube solver showing integrated mechanical and electronic components.
4.2 Camera positioning and upper assembly showing the cube holder mechanism and webcam placement.
4.3 Internal electronics showing motor drivers, power distribution, and ESP32 controller integration.
4.4 Detailed view of internal wiring and component placement demonstrating organized electronics integration.
4.5 VGA hardware display showing real-time system status, scanning progress, and operational parameters.
4.6 Dual-interface monitoring setup showing VGA display and wireless dashboard operating simultaneously.
4.7 Dashboard control interface showing system status indicators, mode selection, and comprehensive test controls.
4.8 Dashboard monitoring interface displaying detected cube state and real-time move execution progress during solving operations.

List of Tables

3.1 Summary of Main Hardware Components
3.2 Summary of 3D-Printed and Mechanical Parts
3.3 Main FPGA memory-mapped peripherals used by the HPS application
4.1 Color Detection Performance Metrics (50 test runs)
4.2 Complete Solve Performance Analysis (30 test runs)
4.3 Solution Complexity Distribution
4.4 FPGA Resource Utilization Summary

List of Abbreviations

ALM Adaptive Logic Module
API Application Programming Interface
ARM Advanced RISC Machine
CPU Central Processing Unit
ESP32 Espressif Systems 32-bit Microcontroller
FPGA Field-Programmable Gate Array
GPIO General Purpose Input/Output
HDL Hardware Description Language
HPS Hard Processor System
HSV Hue Saturation Value
HTTP Hypertext Transfer Protocol
I2C Inter-Integrated Circuit
LED Light Emitting Diode
MJPEG Motion JPEG
PIO Parallel Input/Output
PWM Pulse Width Modulation
RAM Random Access Memory
RGB Red Green Blue
RTL Register Transfer Level
SMPCS School of Mathematical, Physical and Computational Sciences
SoC System on Chip
SPI Serial Peripheral Interface
UART Universal Asynchronous Receiver/Transmitter
USB Universal Serial Bus
UVC USB Video Class
V4L2 Video4Linux2
VGA Video Graphics Array
WiFi Wireless Fidelity

Chapter 1
Introduction

1.1 General Background

The Rubik’s Cube is widely considered one of the most iconic mechanical puzzles, requiring a combination of perception, planning, and precise manipulation to reach a solved state.
While humans typically solve the cube through experience and pattern recognition, building an automated Rubik’s Cube solver requires an integrated system capable of sensing the cube’s colors, understanding its current configuration, computing a valid solution, and physically executing the required movements in a reliable and repeatable manner.

In recent years, many Rubik’s Cube solving robots have been developed using microcontrollers and software-based computer vision techniques running on PCs or embedded Linux platforms. These systems often rely on high-level image processing libraries to detect cube colors and determine the cube state, followed by solver algorithms implemented in software. Although these approaches can achieve strong performance, they may face limitations related to latency, nondeterministic execution timing, and dependency on external software stacks.

Field-Programmable Gate Arrays (FPGAs) offer a powerful alternative for real-time sensing and hardware acceleration due to their inherent parallelism, deterministic behavior, and power efficiency. By implementing time-critical tasks such as color extraction and decision logic in hardware, FPGA-based designs can achieve high throughput, low latency, and lower power consumption than purely software-driven solutions running on general-purpose processors. The dedicated hardware approach enables efficient parallel processing without the overhead of operating system scheduling and context switching, resulting in lower power per operation while maintaining consistent performance. This makes FPGA technology an attractive candidate for robotics applications that require synchronized sensing, communication, precise control, and energy-efficient operation.

This graduation project presents the design and implementation of a Rubik’s Cube solving and control system based on a heterogeneous embedded architecture.
In this design, the FPGA is primarily utilized as a dedicated coprocessor for fast cube color extraction, while the Hard Processor System (HPS) serves as the main processor responsible for high-level computation, system coordination, and decision making. The system is extended with wireless monitoring and control capabilities, enabling interaction through a wireless dashboard and providing real-time feedback through both the dashboard and a VGA display that shows system statistics during operation.

1.2 Objectives and Purpose

The primary objective of this project is to design and implement an automated Rubik’s Cube solving system that combines hardware acceleration, embedded processing, and robotic actuation in a complete end-to-end solution. The system supports both solving and shuffling modes, with adjustable difficulty levels, while maintaining reliability, accuracy, and real-time operation.

The specific objectives of this project include:

1. Cube State Acquisition and Color Extraction: Capture Rubik’s Cube face data through a camera-based acquisition process and perform fast and stable color extraction using FPGA hardware logic.

2. High-Level Processing and System Coordination: Utilize the SoC processing system (HPS) as the main controller responsible for validating detected faces, managing system states, coordinating system modules, and preparing solution data for execution.

3. Mechanical Manipulation and Motor Control: Control servo and stepper motors to physically rotate cube layers and manipulate the cube accurately, using an ESP-based module as a dedicated motor and peripheral controller.

4. Wireless Monitoring and Control: Provide a wireless interface using Wi-Fi or Bluetooth to enable remote commands, system monitoring, and operation mode selection through a mobile application.

5.
User Feedback and Monitoring: Display real-time system status, runtime information, and progress using a VGA output interface, in addition to a wireless dashboard used for monitoring, mode selection, and operation control.

This project aims to demonstrate how a hardware-software co-design approach can be used to build a practical robotic solver that leverages the strengths of FPGA acceleration and embedded processing while maintaining a modular and scalable architecture.

1.3 Significance and Importance

This project addresses several practical and technical challenges involved in building an automated Rubik’s Cube solver and demonstrates the feasibility of using FPGA acceleration within robotics and intelligent embedded systems.

Real-Time Hardware Acceleration and Power Efficiency: Implementing cube color extraction using FPGA logic provides parallel processing capabilities, deterministic timing, and power-efficient operation. Compared to software-based approaches running on general-purpose processors, this enables faster processing, improved stability, and reduced power consumption per operation, which is essential for reliable robotic operation, consistent cube state detection, and potential battery-powered deployments.

Heterogeneous Embedded System Design: The system integrates multiple cooperating units, where the FPGA operates as a coprocessor for hardware-accelerated perception, the HPS functions as the main processor for system logic and computation, and the ESP module handles motor control and peripheral interfacing. This architecture reflects modern embedded system design practices where different platforms are used to maximize performance, flexibility, and energy efficiency through strategic task partitioning.

Robotics and System Integration: Developing a complete Rubik’s Cube solver requires bridging multiple domains including machine perception, communication, mechanical design, and motor control.
Successfully integrating a camera module, motors, LCD display, manual inputs, and wireless connectivity demonstrates the ability of embedded systems to manage complex tasks under strict timing constraints.

Educational and Practical Value: Beyond solving the cube as a final objective, the system provides strong educational value by combining digital logic design, real-time embedded programming, hardware interfaces (I2C, SPI, UART, PWM), and control principles within one project. The developed architecture can also be adapted for other robotics systems requiring perception and actuation.

Extensibility and Future Potential: The design principles used in this project can be extended toward future FPGA-accelerated intelligent systems, especially in applications requiring low-latency sensing and deterministic performance. The same approach can be applied to tasks such as object sorting, inspection, or small-scale robotic control systems.

1.4 Summary of Contributions and Achievements

This project provides a complete Rubik’s Cube solving system with several contributions in hardware acceleration, embedded coordination, and robotic execution.
System Architecture Contributions:
• Design of a heterogeneous architecture combining FPGA-based color extraction, HPS high-level coordination, and ESP-based motor control
• Implementation of reliable communication across modules using hardware-level interfaces such as UART, I2C, SPI, and PWM
• Development of a modular structure that separates sensing, computation, and actuation responsibilities for improved scalability

Perception and Processing Contributions:
• Hardware-accelerated cube color extraction using FPGA logic to improve speed and stability
• High-level state management and decision logic executed on the HPS to coordinate scanning, validation, and solution handling
• Support for multi-mode operation including solving and shuffling with adjustable difficulty levels

Robotic Execution and Control Achievements:
• Integration of servo and stepper motors to physically manipulate the cube with accurate layer rotations
• Implementation of synchronization strategies to ensure correct execution order and safe mechanical movement
• Real-time monitoring and control through a wireless dashboard and VGA status display

Wireless Monitoring and User Interaction:
• Wireless control and monitoring support through Wi-Fi or Bluetooth communication with a mobile application
• Manual control options through key inputs for debugging, testing, or mode switching
• A complete interactive workflow from scanning the cube to computing the solution and executing robot moves

Overall, the completed system demonstrates a practical Rubik’s Cube solving robot built using hardware-software co-design, emphasizing low-latency perception, modular architecture, and reliable actuation.
1.5 Organization of the Report

This report is organized into multiple chapters, each addressing a key aspect of the Rubik’s Cube solving system design and implementation:

Chapter 1: Introduction provides the foundational context of the project, including background information, objectives, significance, and major contributions.

Chapter 2: Literature Review explores existing Rubik’s Cube solvers and robotics designs, FPGA-based vision acceleration methods, and related embedded system architectures.

Chapter 3: Methodology describes the project development approach, hardware selection rationale, and the full system architecture, including the FPGA color extraction design, HPS coordination and computation, ESP motor control logic, communication protocols, and peripheral interfacing.

Chapter 4: Results documents the implementation outcomes, testing procedures, and experimental performance results such as recognition accuracy, timing performance, and mechanical execution reliability.

Chapter 5: Discussion analyzes the system results, highlights limitations, and discusses potential improvements and future extensions.

Chapter 6: Conclusion and Recommendations summarizes the achieved outcomes and provides final remarks on the significance of FPGA-accelerated robotic solvers.

Additional supporting material, including wiring diagrams, extended code listings, and detailed configuration parameters, is provided in the appendices for completeness and reproducibility.

Chapter 2
Literature Review

2.1 Introduction

The Rubik’s Cube has remained one of the most widely studied mechanical puzzles since its invention, not only due to its popularity but also because of the challenging combination of perception, planning, and precise manipulation required to solve it.
Unlike purely computational problems, an automated Rubik’s Cube solver must integrate multiple domains: accurate color recognition, cube state representation, solution generation, and physical actuation through motors and mechanical mechanisms.

Recent developments in embedded systems and robotics have enabled many Rubik’s Cube solving platforms, typically built using microcontrollers, single-board computers, or PC-based processing M et al. (2024); Andrew, Faridah, Tan, Ragunathan, Amirah, Zainab and Lee (2021); Chalise et al. (2025). These systems often rely on software computer vision pipelines to detect cube colors, followed by algorithmic solvers to generate a solution sequence, and finally an actuator subsystem to execute the moves Sawhney et al. (2013); Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021). However, implementing such systems in a real-time and reliable manner remains difficult due to limitations such as sensor noise, lighting variations, mechanical misalignment, timing constraints, and integration complexity.

This chapter reviews the most relevant research and technical approaches related to Rubik’s Cube solving systems, with a focus on three key areas: (1) cube perception and color extraction, (2) cube solving algorithms and move generation, and (3) robotic manipulation and embedded architecture design. The review highlights common limitations of monolithic designs and motivates the adoption of heterogeneous architectures that strategically distribute tasks across FPGA logic, a main processor (HPS), and an ESP module for actuation and system interfacing.

2.2 Rubik’s Cube Solving Systems: Overview and System Requirements

A full Rubik’s Cube solving robot generally consists of three fundamental stages:

1. Perception: capturing cube faces using a camera or sensor array and extracting the cube’s sticker colors.

2. Computation: converting detected colors into a valid cube state representation and generating a valid solving sequence.

3.
Execution: physically rotating cube layers using actuators such as servo motors, stepper motors, or gear-based mechanisms.

Although these stages appear sequential, practical solver robots require feedback and monitoring to ensure the cube is scanned correctly and moves are executed accurately. For example, inconsistent lighting may lead to incorrect color classification, while mechanical backlash may cause layer misalignment and execution failure. Therefore, modern designs often incorporate system state machines, timing measurement, status indicators, and debugging interfaces to improve reliability and user control Dan et al. (2021); Barucija et al. (2020).

2.3 Cube Perception and Color Extraction Methods

Color extraction is one of the most critical stages in a Rubik’s Cube solver, as a single incorrect color classification can result in an invalid cube state and failed solution execution. The most common perception technique is camera-based scanning, where the cube is rotated and each face is captured as an image. From these images, multiple approaches have been proposed for extracting the cube’s sticker colors.

2.3.1 Software-Based Image Processing Pipelines

Many implementations use software processing on a CPU-based platform such as a PC, Raspberry Pi, or embedded Linux environment. These pipelines often apply the following steps:

• image acquisition and preprocessing,
• region-of-interest extraction for the nine stickers,
• color space conversion (e.g., RGB to HSV),
• thresholding and clustering for classification.

Computer vision frameworks such as OpenCV have become standard for these systems due to the availability of efficient tools for segmentation and classification OpenCV Team (2024); Sawhney et al. (2013). While software pipelines are flexible and simple to modify, their performance can be affected by operating system overhead, non-deterministic scheduling, and varying computational latency.
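The final two steps of such a pipeline (HSV conversion and hue thresholding) can be sketched in a few lines of Python using only the standard library. The hue bands and the `classify_sticker` helper below are illustrative assumptions for this sketch, not values or code from this project; a real system would calibrate the thresholds for its own lighting and camera.

```python
import colorsys

# Illustrative hue bands in degrees; real systems calibrate these per setup.
HUE_BANDS = [
    ("red",    (345, 360)), ("red2",   (0, 15)),
    ("orange", (15, 45)),   ("yellow", (45, 75)),
    ("green",  (75, 165)),  ("blue",   (165, 260)),
]

def classify_sticker(r, g, b):
    """Classify one averaged sticker sample (0-255 RGB) by hue thresholding."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_deg = h * 360
    # Low saturation means the sample is closer to white than to any hue band.
    if s < 0.2:
        return "white"
    for name, (lo, hi) in HUE_BANDS:
        if lo <= hue_deg < hi:
            return "red" if name == "red2" else name
    return "red"  # fallback for wrap-around reds/magentas above the blue band

# Example: a bright, mostly green sample
print(classify_sticker(20, 200, 30))  # -> green
```

In practice each sticker region is averaged over many pixels before classification, which suppresses sensor noise; the same thresholding structure maps directly onto parallel comparator logic in an FPGA implementation.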
Additionally, embedded devices may struggle when vision, networking, and motor control are executed simultaneously Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021).

2.3.2 Lighting Robustness and Color Space Selection

A major challenge in color extraction is environmental variation, particularly illumination changes, shadows, and reflections on glossy cube stickers. Several studies suggest that HSV color representation is generally more stable than raw RGB for classification tasks, since hue and saturation separate color information from brightness intensity Gonzalez and Woods (2008). However, even HSV-based classification can suffer in difficult lighting, making it necessary to apply calibration or normalization strategies. Other approaches use clustering and statistical learning techniques, where the system learns cube color distributions by sampling center stickers or performing adaptive threshold adjustment Chalise et al. (2025); Dan et al. (2021). These methods may improve accuracy but increase computational complexity and require careful tuning.

2.3.3 Hardware-Accelerated Vision and FPGA-Based Extraction

FPGAs are widely recognized for their capability to accelerate vision and signal-processing tasks using parallelism and pipelined architectures. Unlike CPUs that execute sequential instructions, FPGA logic can compute multiple pixel-level operations concurrently with deterministic timing. This makes FPGA acceleration attractive for embedded perception tasks that require stable latency and consistent throughput Woods et al. (2008). In the context of Rubik’s Cube solvers, FPGA hardware can be used to accelerate specific stages such as color extraction, pixel classification, or feature detection.
Rather than implementing the entire system on FPGA, a more efficient design choice is to use FPGA logic as a dedicated coprocessor for perception tasks, while high-level computation and coordination are executed by a software processor. This separation reduces the workload on the main processor and improves system determinism during real-time operation.

2.4 Cube Solving Algorithms and Solution Generation

Once cube colors are extracted, the system must map them to a cube state and compute a solving sequence. The Rubik’s Cube state space is extremely large, with approximately 4.3 × 10^19 possible configurations, making brute-force search infeasible.

2.4.1 Two-Phase Solving and the Kociemba Algorithm

Among the most widely used practical algorithms is the two-phase method proposed by Kociemba Kociemba (1992). This method divides the problem into two stages: reducing the cube into a restricted subgroup in phase one, then completing the solve in phase two. The algorithm is efficient and commonly used in software solvers due to its balance between solution length and computation time. In many real-world implementations, the solver algorithm is executed on a CPU-based system, since it involves search operations, table lookups, and branching logic that are naturally suited for software execution Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021); Dan et al. (2021). For embedded Rubik’s Cube solvers, the algorithm output is converted into motor-level commands, requiring additional translation steps and validation M et al. (2024).

2.4.2 State Validation and Error Handling

An important step before solving is verifying that the detected cube state is valid. Incorrect scanning may produce impossible states, such as incorrect parity or invalid color counts. Therefore, practical solver systems incorporate checks to ensure the scanned faces form a consistent cube representation before attempting to compute a solution Dan et al. (2021); Sawhney et al. (2013).
This reduces failure probability and improves overall system robustness.

2.5 Robotic Manipulation and Motion Control

The execution stage transforms the computed solution moves into physical actuation. Rubik’s Cube robots typically use servo motors, stepper motors, or hybrid approaches. The actuator design depends strongly on mechanical constraints such as grip stability, axis alignment, torque requirements, and move execution speed.

2.5.1 Servo and Stepper Motor Approaches

Servo-based designs often simplify control by using angular positioning, but may suffer from limited torque or speed in demanding mechanical setups Andrew, Faridah, Tan, Ragunathan, Amirah, Zainab and Lee (2021). Stepper motors provide precise incremental control and repeatability, making them suitable for controlled rotations such as 90-degree and 180-degree cube turns M et al. (2024); Chalise et al. (2025). However, stepper motors require careful timing control, pulse generation, and sometimes feedback sensors for alignment.

2.5.2 Synchronization and Execution Order

Rubik’s Cube solvers must execute moves in strict order with reliable timing. Any missed step or mechanical slip can corrupt the cube state relative to the solver’s internal model. As a result, robust implementations often include:

• calibration routines to align the cube mechanism,
• sensors to detect reference positions,
• safety timing constraints to prevent motor conflicts.

Therefore, assigning motor control to a dedicated controller can improve reliability, especially when the main processor also handles solving and system coordination.

2.6 System Integration and Heterogeneous Embedded Architectures

A major challenge in Rubik’s Cube solving robots is the integration of perception, computation, and actuation in one system without performance bottlenecks Chalise et al. (2025).
Many low-cost projects attempt to implement everything on one microcontroller, which often results in trade-offs between speed, reliability, and feature support M et al. (2024). A heterogeneous architecture provides a practical solution by distributing tasks across multiple specialized processing units Barucija et al. (2020):

• FPGA coprocessor: dedicated to real-time color extraction and image-related processing Intel (Altera) (2020).
• HPS main processor: responsible for system state management, cube state validation, solution computation, and coordination between modules Intel (Altera) (2020).
• ESP module: dedicated to motor control, peripheral interfacing, and wireless dashboard connectivity Espressif Systems (2023).

This design approach matches each task to the processing platform that is best suited for it Barucija et al. (2020). FPGA logic achieves deterministic low-latency perception. The HPS executes complex sequential algorithms efficiently. The ESP module handles real-time PWM generation, motor drivers, and user monitoring services without interfering with perception or solving performance.

2.7 Research Gap and Motivation

Despite the availability of many Rubik’s Cube solvers, most existing platforms fall into one of two categories Chalise et al. (2025); Andrew, Faridah, Tan, Ragunathan, Amirah, Zainab and Lee (2021):

1. Software-heavy solvers that rely on CPU-based vision and computation, often requiring a full operating system environment Sawhney et al. (2013); Andrew, Zainab, Amirah, Ragunathan, Tan, Faridah and Rezal (2021); M et al. (2024).
2. Hardware-heavy prototypes that use FPGA or specialized hardware for the entire system, increasing design complexity and limiting flexibility.

A practical gap exists in architectures that combine FPGA acceleration for perception with a processor-based solving pipeline and an independent real-time motor controller.
Such designs can achieve strong real-time performance while maintaining modularity and debugging simplicity. Additionally, integrating monitoring interfaces such as VGA status visualization and a wireless dashboard provides improved user interaction, system transparency, and operational reliability.

2.8 Summary

This literature review discussed Rubik’s Cube solver system requirements and examined existing approaches in cube perception, solving algorithms, and robotic execution. Software-based vision pipelines provide flexibility but may suffer from nondeterministic timing and high computational load. Solving algorithms such as Kociemba’s two-phase method enable efficient computation, but require valid cube state input to function correctly. Robotic manipulation demands accurate motor control, synchronization, and calibration to ensure successful execution.

The review motivates a heterogeneous embedded design that strategically separates responsibilities: FPGA hardware acts as a coprocessor for color extraction, the HPS coordinates high-level logic and solution computation, and an ESP module manages motor control and wireless interfacing. This architecture offers a balanced approach that improves real-time performance, modularity, and system reliability compared to monolithic embedded designs.

Chapter 3 Methodology

3.1 System Design Overview

This project follows a hardware–software co-design methodology to implement an automated Rubik’s Cube solving robot using a heterogeneous embedded architecture. The complete system integrates three cooperating computing units:

• FPGA Fabric (Coprocessor/Peripheral Fabric): Provides hardware peripherals and subsystems such as VGA output, UART IP, PIO interfaces, and on-chip memory buffers.
• HPS (ARM Cortex-A9 Main Processor): Executes the main control program responsible for high-level state control, cube state construction, solving logic, and move scheduling.
• ESP32 Module (Actuation + Dashboard Controller): Hosts the wireless dashboard, receives commands from the user, and executes robot movement through stepper/servo control and sensor-based alignment.

The overall operational pipeline is summarized as follows:

1. The user connects to the ESP32 wireless dashboard and selects a system command (solve, shuffle, reset, etc.).
2. The cube scanning procedure starts, where cube face data is acquired and prepared for processing.
3. Cube colors are extracted and transmitted to the HPS for cube state construction and verification.
4. The HPS generates a solving (or shuffling) move sequence.
5. The move sequence is encoded and sent to the robot execution controller.
6. The ESP32 performs real-time motor execution with safety checks and alignment mechanisms.
7. The FPGA VGA subsystem and the wireless dashboard display real-time system statistics and progress.

Figure 3.1: Overall Rubik’s Cube solver system flow (FPGA + HPS + ESP32 + peripherals).

3.2 Hardware Components and System Architecture

The physical implementation of the Rubik’s Cube solver requires careful selection of hardware components to ensure reliable cube manipulation, accurate state sensing, and efficient system coordination. Unlike traditional Raspberry Pi-based approaches, this system leverages FPGA hardware acceleration for deterministic peripheral control and real-time operation. The hardware platform is designed to balance mechanical robustness, electrical reliability, and ease of integration with the embedded control architecture. Each component is chosen to fulfill a specific role in the system, from high-level computation and user interaction to low-level actuation and feedback. The integration of the DE1-SoC FPGA board enables custom hardware peripherals and high-speed communication with the HPS processor, while the ESP32 module provides wireless connectivity and real-time motor control.
Additional sensors and drivers are incorporated to support precise alignment, safe operation, and flexible mechanical assembly. In addition to electronic hardware, the system incorporates several custom 3D-printed and mechanical parts designed to support the cube manipulation mechanisms and ensure precise assembly. The complete system architecture follows a distributed processing model where each computing unit specializes in specific tasks to achieve optimal performance. The following tables summarize the main hardware and mechanical components used in the system:

Table 3.1: Summary of Main Hardware Components

Component | Quantity | Notes
DE1-SoC FPGA Development Board | 1 | Main processing and hardware acceleration
ESP32 WiFi Microcontroller | 1 | Wireless dashboard, motor control
Redragon USB Webcam | 1 | Cube face image acquisition
NEMA 17 Stepper Motor | 1 | Cube rotation (main axis)
DRV8825 Stepper Motor Driver | 1 | Stepper driver, microstepping
DS3218 Servo Motor | 2 | Cube gripping/releasing (dual gripper)
PCA9685 PWM Servo Driver | 1 | 16-channel PWM, servo control
Optical Slot Sensor | 1 | Alignment/home position feedback
HP-D3006A0 Power Supply | 1 | Main power for motors and logic
I2C Logic Level Shifter | 2 | 5V/3.3V I2C interfacing
PV 902512L Cooling Fan | 1 | Cooling for drivers/electronics
MPB19 Push Button | 2 | Manual reset/emergency stop
Table 3.2: Summary of 3D-Printed and Mechanical Parts

Part Name | Quantity | Notes
Hinge for upper-cover and lifter | 1 | Connects upper cover to lifter, allows rotation
Upper-cover | 1 | Top enclosure, attaches to hinge and lifter
Cube-holder bottom part | 1 | Base for holding the cube
Cube-holder upper part | 1 | Top clamp for cube, connects to lifter
Synchronization disk | 1 | Ensures alignment between lifter and cube holder
Motor support | 1 | Holds stepper/servo motor in place
Lifter | 1 | Raises/lowers upper cover and cube holder
Lifter-link | 1 | Connects lifter to upper cover, transmits motion

3.2.1 External Interface and Monitoring

The system supports multiple user interface options for operation and monitoring:

• VGA Monitor: Connected to the DE1-SoC board’s VGA output for real-time system status display, including current state, scan progress, solution availability, move count, and execution status. The VGA interface provides a hardware-based monitoring solution that operates independently of wireless connectivity.
• Mobile Phone or Laptop (WiFi Dashboard): Any WiFi-capable device can connect to the ESP32 access point and access the web-based control dashboard. This provides wireless operation control, system status monitoring, and cube face visualization without requiring dedicated applications or software installation.

3.2.2 Mechanical Assembly Considerations

While this project focuses primarily on the embedded system design and control architecture, the mechanical structure plays a critical role in system operation.
The mechanical assembly includes:

• A cube holding platform with gripper arms actuated by the two servo motors
• A rotation mechanism driven by the stepper motor for executing face turns
• Alignment markers or flags that trigger the optical slot sensor
• Mounting brackets and support structures to maintain component positioning
• Cable management to prevent interference with moving parts

The mechanical design is inspired by existing Rubik’s Cube solving robots but adapted to accommodate the specific electrical and control architecture of this FPGA-based implementation.

Figure 3.2: 3D concept rendering of the combined mechanical and hardware assembly for the Rubik’s Cube solver.

3.3 Camera Interface and USB Video Capture

The cube face scanning process relies on a USB webcam (Redragon model) connected to the DE1-SoC board through the HPS USB subsystem. The camera interface operates through the Linux UVC (USB Video Class) driver, enabling standard Video4Linux2 (V4L2) API access for frame capture.

3.3.1 Camera Configuration and Frame Acquisition

The system initializes the camera with the following specifications:

• Resolution: 640×480 pixels
• Format: MJPEG (Motion JPEG) compressed frames
• Frame rate: 30 FPS (frames per second)
• Exposure: Auto-adjustment enabled
• White balance: Automatic

The choice of MJPEG format provides a balance between image quality and processing efficiency. Each captured frame undergoes decompression using the libjpeg library before being converted to RGB565 format for FPGA processing.

3.3.2 Frame Processing Pipeline

The camera-to-FPGA processing pipeline follows these stages:

1. Frame Capture: USB camera captures MJPEG frame through V4L2 interface
2. JPEG Decompression: libjpeg decompresses frame to raw RGB888 format
3. Color Space Conversion: RGB888 converted to RGB565 for FPGA compatibility
4. Frame Transfer: RGB565 data written to SDRAM frame buffer via AXI bus
5. FPGA Processing: Hardware-accelerated color detection and analysis
6. Results Extraction: Processed cube face data read back by HPS

Figure 3.3: Frame processing pipeline from USB camera capture to FPGA-based color analysis.

3.4 FPGA-Based Image Processing and Color Detection

The FPGA fabric implements a dedicated hardware-accelerated image processing pipeline for cube face color detection. This approach provides deterministic processing times and real-time performance independent of HPS CPU load.

3.4.1 Hardware-Accelerated Image Processing Pipeline

The FPGA implements a multi-stage image processing pipeline for cube face color detection. The approach combines edge detection, grid extraction, and statistical color analysis to provide reliable results.

Stage 1: Edge Detection and Grid Extraction
• Region of Interest Extraction: Identifies cube face boundaries within captured frame
• Edge Enhancement: Applies edge detection to improve grid boundary visibility
• Grid Cell Isolation: Divides detected region into individual cell areas
• Center Region Sampling: Extracts representative samples from cell centers

Stage 2: Multi-Point Color Analysis
• Statistical Averaging: Computes representative RGB values from sampled regions
• Threshold-Based Classification: Applies calibrated thresholds for color identification
• Noise Filtering: Eliminates outlier pixels and lighting artifacts

For each grid cell, the processing algorithm:

1. Applies edge detection to enhance cell boundary visibility
2. Extracts representative sample region from cell center
3. Calculates statistical RGB component averages
4. Applies calibrated threshold comparisons for color classification
5. Outputs encoded color identifier for transmission to HPS

3.4.2 Adaptive Color Classification System

The color classification system uses calibrated RGB threshold values optimized for standard Rubik’s Cube colors, with provisions for lighting variation compensation.
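A minimal C sketch of such threshold gating follows; the numeric thresholds and color codes here are illustrative placeholders, not the calibrated matrix used by the hardware, but the FPGA decision tree operates on the same hierarchical principle:

```c
#include <stdint.h>

/* Illustrative color codes for the six sticker colors plus a reject case. */
enum cube_color { C_WHITE, C_YELLOW, C_RED, C_ORANGE, C_GREEN, C_BLUE, C_UNKNOWN };

/* Hierarchical threshold classifier over averaged 8-bit RGB samples.
 * Thresholds below are placeholders; the real design uses empirically
 * calibrated values with lighting-compensation provisions. */
static enum cube_color classify_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    /* White: high brightness across all channels. */
    if (r > 180 && g > 180 && b > 180) return C_WHITE;
    /* Yellow: red and green high, blue low. */
    if (r > 150 && g > 150 && b < 100) return C_YELLOW;
    /* Blue: blue channel dominance with threshold gating. */
    if (b > 120 && b > r + 40 && b > g + 40) return C_BLUE;
    /* Green: green channel dominance. */
    if (g > 120 && g > r + 40 && g > b + 40) return C_GREEN;
    /* Red vs. orange: both red-dominant; orange carries more green. */
    if (r > 120 && r > b + 40) return (g > 90) ? C_ORANGE : C_RED;
    return C_UNKNOWN;   /* ambiguous sample: flagged for error handling */
}
```

In hardware the same comparisons are evaluated in parallel comparator logic; the C form is only meant to make the hierarchical decision tree concrete.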
Threshold-Based Classification:
• White Detection: High brightness detection across all RGB channels
• Red Classification: Red channel dominance with cross-channel validation
• Blue Classification: Blue channel dominance with threshold gating
• Multi-Color Support: Additional thresholds for yellow, orange, and green recognition

Implementation Features:
• Calibrated Threshold Matrix: Empirically determined classification boundaries
• Hierarchical Decision Logic: Efficient hardware-optimized decision tree
• Error Handling: Robust handling of ambiguous or uncertain classifications
• Lighting Tolerance: Calibration provisions for varied ambient conditions

Figure 3.4: Image processing hardware block diagram showing sequential processing units and resource utilization.

3.4.3 Processing Performance Characteristics

The hardware implementation provides adequate performance for cube solving applications:

• Sequential Processing: Nine cells processed one after another using shared hardware
• Fixed Processing Time: Deterministic 14.8 ms per face (approximately 1.6 ms per cell)
• Resource Utilization: 27% ALMs (8,628 adaptive logic modules), 2% block memory (75 KB), 20% DSP blocks (18 units), 4,441 registers
• Deterministic Timing: Independent of HPS CPU load
• Lighting Sensitivity: Accuracy depends on controlled ambient lighting conditions

3.5 Communication Protocols and Data Exchange

The system implements structured communication protocols for reliable data exchange between the three computing units. Each communication path uses a different protocol optimized for its specific requirements.
3.5.1 HPS-FPGA Communication Protocol

Communication between the HPS and FPGA occurs through memory-mapped I/O using several mechanisms:

• PIO Registers: 32-bit control and status flags
• Shared Memory: SDRAM frame buffer for image data
• On-chip Memory: Fast storage for encoded move sequences
• Direct Register Access: Low-latency control signaling

3.5.2 UART Packet Protocol (HPS-ESP32)

The HPS-ESP32 communication uses a simple, robust packet-based protocol over UART:

[START][TYPE][LENGTH][DATA...][END]
 0xAA    1B     1B    0-255B   0xFF

Packet Types:
• 0x01: Face color data (10 bytes: face_index + 9 colors)
• 0x03: Robot move sequence (variable length)
• 0x04: Status updates (1 byte status code)
• 0x05: Command messages (variable length)

Each packet starts with 0xAA and ends with 0xFF. The receiver uses a state machine to parse packets and resynchronize if errors or invalid bytes are detected. This ensures reliable communication for real-time robot operation.

Receiver State Machine:
• WAIT_START: Waits for 0xAA to begin a new packet.
• READ_TYPE: Captures the packet type and validates it.
• READ_LEN: Reads the length of the data payload.
• READ_DATA: Collects the specified number of data bytes.
• WAIT_END: Expects 0xFF to confirm packet completion.
• PROCESS: Passes the packet to the appropriate handler.

If any field is invalid or a timeout occurs, the state machine resets to WAIT_START, ensuring robust error recovery. This design allows the receiver to resynchronize even if bytes are lost or corrupted.

Figure 3.5: UART packet structure and communication flow between HPS and ESP32.

3.5.3 HTTP API Protocol (ESP32-Dashboard)

The wireless dashboard communicates with the ESP32 using RESTful HTTP endpoints:

• GET /api/status: Retrieve system status and progress
• GET /api/cmd/[command]: Execute system commands
• GET /: Serve main dashboard HTML page

Status responses include JSON data with cube state, move progress, and system indicators.
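The receiver state machine described in Section 3.5.2 can be sketched in C as follows; type and function names are illustrative rather than the project’s actual identifiers, and the timeout path is omitted for brevity:

```c
#include <stdint.h>
#include <stddef.h>

#define PKT_START    0xAA
#define PKT_END      0xFF
#define PKT_MAX_DATA 255

enum rx_state { WAIT_START, READ_TYPE, READ_LEN, READ_DATA, WAIT_END };

struct uart_rx {
    enum rx_state state;
    uint8_t type, len, pos;
    uint8_t data[PKT_MAX_DATA];
};

/* Feed one received byte into the parser. Returns 1 when a complete,
 * well-framed packet is available; any framing error drops back to
 * WAIT_START so the receiver can resynchronize on the next 0xAA. */
static int uart_rx_byte(struct uart_rx *rx, uint8_t byte)
{
    switch (rx->state) {
    case WAIT_START:
        if (byte == PKT_START) rx->state = READ_TYPE;
        break;
    case READ_TYPE:
        /* Valid packet types per the protocol: 0x01, 0x03, 0x04, 0x05. */
        if (byte == 0x01 || byte == 0x03 || byte == 0x04 || byte == 0x05) {
            rx->type = byte;
            rx->state = READ_LEN;
        } else {
            rx->state = WAIT_START;      /* invalid type: resync */
        }
        break;
    case READ_LEN:
        rx->len = byte;
        rx->pos = 0;
        rx->state = (byte == 0) ? WAIT_END : READ_DATA;
        break;
    case READ_DATA:
        rx->data[rx->pos++] = byte;
        if (rx->pos == rx->len) rx->state = WAIT_END;
        break;
    case WAIT_END:
        rx->state = WAIT_START;
        if (byte == PKT_END) return 1;   /* PROCESS: hand off to handler */
        break;                            /* bad terminator: drop packet */
    }
    return 0;
}
```

Feeding the byte stream AA 04 01 42 FF through uart_rx_byte() yields one status packet (type 0x04) carrying a single status byte, while stray bytes preceding 0xAA are silently discarded in WAIT_START.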
3.6 Hardware–Software Co-Design and Task Partitioning

A key methodology decision in this project is the explicit separation of responsibilities between hardware logic and software execution. This avoids a monolithic design where all tasks compete for the same computation time and resources, and instead assigns each task to the most appropriate platform.

3.6.1 FPGA Fabric Responsibilities

The FPGA logic is used primarily for deterministic interfacing and hardware peripherals, including:

• VGA status display subsystem for live system monitoring.
• Memory-mapped I/O peripherals accessible through the HPS lightweight bridge.
• UART hardware IP for external communication.
• PIO interfaces for control/status signaling between hardware/software blocks.
• On-chip memory used for fast storage of encoded robot moves.

3.6.2 HPS (Main Processor) Responsibilities

The HPS executes the main application responsible for high-level system operation, including:

• Central state machine and operation sequencing (scan, validate, solve, execute).
• Cube state construction from detected face colors.
• Solving algorithm execution using a Kociemba-based solver.
• Encoding solution steps into robot movement commands.
• Writing status and logs to the VGA display memory.

The HPS program communicates with FPGA peripherals using memory-mapped access via /dev/mem, allowing fast register-level control without requiring heavy driver development.

3.6.3 ESP32 Responsibilities

The ESP32 is responsible for real-time actuation and user interaction, including:

• Hosting the wireless dashboard over Wi-Fi and handling API requests.
• Stepper motor pulse generation using dedicated GPIO pins.
• Sensor-based alignment and safety timeout mechanisms.
• Executing shuffling routines with selectable difficulty levels.
• Providing real-time status back to the dashboard.
This partitioning ensures that the motor timing and execution remain reliable even when the HPS is busy performing cube-solving computations.

3.7 FPGA Platform Designer System Integration

The FPGA system is constructed using Intel Platform Designer (Qsys) to integrate multiple IP blocks and subsystems. The design includes clock/reset generation, HPS bridges, an external SDRAM controller, a VGA output subsystem, and a set of memory-mapped peripherals. The major IP blocks included in the FPGA system are:

• System PLL: Generates internal system clocks required for peripherals and SDRAM.
• ARM A9 HPS: Provides the lightweight AXI bridge (HPS-to-FPGA) for register-level access.
• SDRAM Controller: Allows shared memory operations and frame buffer transfers when required.
• VGA Subsystem: A complete VGA engine including character buffer and pixel DMA control.
• PIO Modules: Used for lightweight control/status exchange.
• On-Chip Memory: Used as a fast local buffer for storing robot move commands.
• UART IP: Hardware serial interface accessible as memory-mapped registers.

Figure 3.6: Platform Designer (Qsys) system showing FPGA IP integration and subsystem structure.

3.7.1 Custom IP Block Integration

The system includes several custom IP blocks designed specifically for cube solving:

• Color Detection Engine: Hardware-accelerated pixel processing
• Cube Face Buffer: 3×3 grid storage for detected colors
• Move Encoding Buffer: On-chip storage for robot commands
• Status Flag Controller: Real-time system state management

Figure 3.7: Custom IP block integration within the Platform Designer system showing data flow and control connections.

3.8 Memory-Mapped Communication Between HPS and FPGA

The HPS application uses memory-mapped I/O to access FPGA peripherals with low latency and full software control.
This is achieved through Linux /dev/mem mapping, allowing the program to access registers inside the lightweight HPS-to-FPGA bridge region. The lightweight AXI bridge base used in the system is:

• Lightweight bridge base: 0xFF200000
• Bridge span: 0x00010000

In addition, the VGA character buffer is accessed through a dedicated mapped memory region:

• VGA character buffer base: 0xC9000000
• VGA character span: 0x00002000

Figure 3.8: Memory-mapped address regions for major FPGA peripherals accessed by the HPS application.

Table 3.3: Main FPGA memory-mapped peripherals used by the HPS application.

Peripheral | Offset/Base | Purpose
LW AXI Bridge | 0xFF200000 | Main register access from HPS to FPGA
PIO0 | 0x00000010 | System status/control bits
PIO1 | 0x00000000 | Control bits (freeze/solution flags)
Colors PIO (PIO2) | 0x00000020 | Face color data exchange
On-chip memory | 0x0000F000 | Move list storage buffer
UART IP | 0x00000040 | External serial TX/RX registers
VGA Char Buffer | 0xC9000000 | Real-time monitoring display

This methodology provides direct and efficient access to FPGA logic blocks, enabling:

• high-speed updates to VGA monitoring,
• fast move transfer into on-chip memory,
• low-overhead communication through UART and PIO registers.

3.9 VGA Real-Time Status Display

Instead of using an LCD, the system implements a VGA display subsystem that provides real-time visibility into the system behavior. The VGA output is used for debugging, monitoring, and progress tracking during scanning, solving, and execution. The HPS application updates the VGA screen by writing ASCII characters directly into the VGA character buffer memory. The displayed information includes:

• current system state (idle, scanning, ready, running, done, error),
• number of scanned faces and scanning progress,
• whether a solution is available,
• move count and execution status,
• runtime timer and debugging indicators.
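Updating this display amounts to writing ASCII bytes at computed offsets inside the mapped character region. The following is a minimal sketch, assuming the DE1-SoC University Program character buffer layout (80×60 visible characters with a 128-byte row stride, which fits within the 0x00002000 span); on the target, the buffer pointer would come from an mmap of /dev/mem at base 0xC9000000, while here a plain array stands in so the addressing logic can be exercised off-target:

```c
#include <stdint.h>
#include <stddef.h>

#define VGA_CHAR_COLS   80
#define VGA_CHAR_ROWS   60
#define VGA_CHAR_STRIDE 128   /* bytes per character row (assumed layout) */

/* Byte offset of character cell (x, y) within the mapped region. */
static size_t vga_char_offset(unsigned x, unsigned y)
{
    return (size_t)y * VGA_CHAR_STRIDE + x;
}

/* Write an ASCII status string starting at (x, y), clipped to the row end. */
static void vga_puts(volatile uint8_t *buf, unsigned x, unsigned y, const char *s)
{
    while (*s && x < VGA_CHAR_COLS) {
        buf[vga_char_offset(x, y)] = (uint8_t)*s;
        x++;
        s++;
    }
}
```

On the HPS, `buf` would be obtained from `mmap()` on an open /dev/mem descriptor covering the 0xC9000000 region; writes then appear on the monitor immediately, with no driver stack in between.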
The VGA display improves reliability and usability by providing an always-available hardware monitor that does not depend on wireless connectivity.

3.10 Wireless Dashboard Method (ESP32)

Wireless control and monitoring are implemented using an ESP32 module configured as a Wi-Fi Access Point. The dashboard is accessed by connecting to the ESP32 network:

• SSID: CubeBot-AP

A lightweight web server is hosted on the ESP32. The dashboard provides:

• command buttons (solve, shuffle, reset, scanning control),
• system status indicators and counters,
• move progress and HPS interaction status,
• cube face visualization and color previews.

The dashboard periodically queries the ESP32 backend using simple HTTP endpoints (REST-style API). This design was selected because it is easy to access from any mobile device or laptop without requiring extra applications, while remaining lightweight for real-time monitoring.

3.11 Motor Control and Alignment Method (ESP32)

Robot motion execution is performed by the ESP32 using stepper pulse generation, dual-servo gripper control, and sensor-based alignment. This distributed control architecture separates time-critical actuation from high-level computation, ensuring reliable and deterministic motor timing.

3.11.1 Stepper Motor Control

The NEMA 17 stepper motor is controlled through the DRV8825 driver using two GPIO pins:

• Step pin: GPIO25
• Direction pin: GPIO26

The system is configured for high-resolution microstepping:

• Steps per revolution: 3200 (1/16 microstepping)
• Steps per 90 degrees: 800
• Steps per 45 degrees: 400

The ESP32 generates step pulses with configurable timing profiles:

• Fast pulse delay: 100 microseconds (for rapid movements)
• Slow pulse delay: 700 microseconds (for precise positioning)

This dual-speed approach enables quick rotation for most of the movement, with slower speeds during the approach to target positions to minimize overshoot and mechanical stress.
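The step-count arithmetic and the dual-speed profile can be sketched as follows; the GPIO primitives are stubs standing in for the ESP32 driver calls on GPIO25/GPIO26, and the fast/slow split point is illustrative rather than the tuned value:

```c
#include <stdint.h>

#define STEPS_PER_REV 3200u   /* 1/16 microstepping */
#define FAST_DELAY_US  100u   /* rapid movement */
#define SLOW_DELAY_US  700u   /* precise positioning */

/* Stub hardware primitives; a real build would toggle the STEP pin
 * (GPIO25) with the given delay and drive the DIR pin (GPIO26). */
static uint32_t pulses_sent, slow_pulses;
static void step_pulse(uint32_t delay_us)
{
    pulses_sent++;
    if (delay_us == SLOW_DELAY_US) slow_pulses++;
}
static void set_direction(int cw) { (void)cw; }

/* Steps needed for a rotation of `deg` degrees (800 for 90, 400 for 45). */
static uint32_t steps_for_degrees(uint32_t deg)
{
    return (uint32_t)((uint64_t)deg * STEPS_PER_REV / 360u);
}

/* Rotate: fast pulses for the bulk of the travel, slow pulses for the
 * final approach to limit overshoot (split point is illustrative). */
static void rotate_degrees(uint32_t deg, int cw)
{
    uint32_t total = steps_for_degrees(deg);
    uint32_t slow  = total / 8;          /* last 12.5% at low speed */
    set_direction(cw);
    for (uint32_t i = 0; i < total; i++)
        step_pulse(i < total - slow ? FAST_DELAY_US : SLOW_DELAY_US);
}
```

A 90° turn thus issues 800 pulses, the final 100 of them at the slow rate, which mirrors the fast-approach/slow-settle behavior described above.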
3.11.2 Dual-Servo Gripper Control

Two DS3218 servo motors are controlled through the PCA9685 PWM driver board via I2C communication. The dual-servo configuration provides:

• Symmetric gripping force distribution
• Independent control for grip and release sequences
• Adjustable grip strength through PWM duty cycle control
• Coordinated movement during cube manipulation

The servos operate in three primary states: fully gripped (holding the cube firmly), partially gripped (allowing controlled cube movement), and fully released (permitting cube face rotation). Proper sequencing of servo and stepper movements is critical to prevent cube slippage or binding.

3.11.3 Sensor-Based Alignment

An optical slot sensor is used to establish a known mechanical reference point before executing movements. The sensor is connected as:

• Alignment sensor: GPIO5
• Active state: LOW (triggered when the alignment flag passes through the sensor)

The alignment procedure operates as follows:

1. Rotate the stepper motor slowly until the sensor detects the alignment flag
2. Continue a fixed number of steps past the edge (edge compensation)
3. Set the current position as the home reference
4. Calculate all subsequent movements relative to this home position

To avoid infinite movement in case of sensor failure or mechanical obstruction, the design includes a timeout mechanism:

• Alignment timeout: half rotation (1600 steps maximum)
• If a timeout occurs, the system enters an error state and reports the fault to the dashboard

This methodology increases robustness and ensures the robot always starts from a consistent physical orientation before performing cube rotations, preventing cumulative positioning errors.
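The homing procedure and its timeout guard can be sketched as below; the sensor and step callbacks are illustrative stand-ins for the GPIO5 input and the stepper driver, and the edge-compensation count is a placeholder, not the calibrated value:

```c
#include <stdint.h>

#define ALIGN_TIMEOUT_STEPS 1600   /* half rotation at 3200 steps/rev */
#define EDGE_COMP_STEPS       10   /* illustrative edge-compensation count */

/* Returns 0 when the home position is established, -1 on timeout.
 * `sensor_low` models the active-LOW optical slot sensor on GPIO5;
 * `do_step` issues one slow stepper pulse. Both are injected so the
 * routine can be exercised without hardware. */
static int align_home(int (*sensor_low)(void), void (*do_step)(void),
                      int32_t *position)
{
    int32_t steps = 0;
    while (!sensor_low()) {              /* seek the alignment flag */
        if (steps++ >= ALIGN_TIMEOUT_STEPS)
            return -1;                   /* sensor fault: report error state */
        do_step();
    }
    for (int i = 0; i < EDGE_COMP_STEPS; i++)
        do_step();                       /* fixed travel past the edge */
    *position = 0;                       /* current position becomes home */
    return 0;
}

/* Simulated environment for off-target testing: the flag is "seen"
 * once 42 steps have been taken; a second sensor never triggers. */
static int sim_steps;
static int  sim_sensor_low(void)   { return sim_steps >= 42; }
static int  sim_sensor_never(void) { return 0; }
static void sim_step(void)         { sim_steps++; }
```

Injecting the callbacks keeps the timeout logic testable: the simulated run homes after 42 seek steps plus the edge compensation, while the dead-sensor case returns the error code after exactly 1600 steps instead of spinning forever.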
Figure 3.9: Stepper pulse timing diagram showing the relationship between step and direction signals, the calculation of steps per rotation (e.g., 800 steps for 90°), and the use of fast/slow pulse profiles for precise cube manipulation. The alignment sensor is used to establish a home position before executing movements.

3.12 Solving, Encoding, and Execution Scheduling

After cube face colors are acquired, the HPS constructs a complete cube state representation and verifies it before solving. The solving algorithm is executed on the HPS using a Kociemba-based approach, which provides an efficient method for generating a solution sequence. Once a valid solution sequence is generated, it is converted into robot-executable commands. The design includes:

• transformation of symbolic moves (e.g., R, U′, F2) into robot actions,
• encoding each action into a compact numeric representation,
• storage of the encoded move list into FPGA on-chip memory,
• status flag updates to inform the ESP32 that moves are ready to execute.

This structured encoding and buffering methodology reduces communication overhead and provides a deterministic interface between high-level solving and low-level actuation.

Figure 3.10: Flowchart describing scanning, cube state construction, solving, and execution scheduling.

3.12.1 Cube State Validation and Error Handling

Before attempting to solve the cube, the system performs comprehensive validation:

• Color Count Validation: Verify nine squares of each color
• Center Square Verification: Confirm center colors are fixed
• Edge Parity Check: Validate edge piece orientations
• Corner Parity Check: Verify corner piece permutations

If validation fails, the system can automatically request face re-scanning or report an error state.
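The first two checks can be sketched in C as follows; this is a sketch assuming a conventional 54-element facelet array (six faces of nine stickers, center at index 4 of each face, color codes 0–5), and the parity checks would follow the same pattern but are omitted:

```c
#include <stdint.h>

#define FACES 6
#define STICKERS_PER_FACE 9
#define CENTER_IDX 4

/* Validate a scanned cube state: every color must appear exactly nine
 * times, and the six face centers must carry six distinct colors.
 * Returns 1 if the state passes these checks, 0 otherwise.
 * (Edge and corner parity checks would follow in the full validator.) */
static int validate_counts_and_centers(const uint8_t facelets[FACES * STICKERS_PER_FACE])
{
    int count[FACES] = {0};
    int center_seen[FACES] = {0};

    for (int i = 0; i < FACES * STICKERS_PER_FACE; i++) {
        if (facelets[i] >= FACES) return 0;           /* unknown color code */
        count[facelets[i]]++;
    }
    for (int c = 0; c < FACES; c++)
        if (count[c] != STICKERS_PER_FACE) return 0;  /* wrong color count */

    for (int f = 0; f < FACES; f++) {
        uint8_t c = facelets[f * STICKERS_PER_FACE + CENTER_IDX];
        if (center_seen[c]++) return 0;               /* duplicate center */
    }
    return 1;
}
```

A single mis-scanned sticker already trips the color-count check (one color appears ten times, another eight), which is exactly the condition that triggers automatic re-scanning.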
3.12.2 Move Encoding and Optimization

The solution moves undergo several processing stages:

1. Symbolic-to-Robot Translation: Convert standard notation (R, U′, F2) to robot actions
2. Sequence Optimization: Combine consecutive moves where possible
3. Binary Encoding: Pack moves into compact 8-bit representations
4. Buffer Storage: Store the encoded sequence in FPGA on-chip memory

Figure 3.11: Move encoding pipeline showing translation from symbolic notation to robot-executable commands.

3.13 Comparison with Alternative Approaches

This project's methodology differs significantly from traditional Raspberry Pi-based Rubik's Cube solvers in several key aspects:

3.13.1 FPGA vs. Microcomputer Approach

Most existing cube-solving robots use a Raspberry Pi or similar single-board computer with software-based control. This project instead uses a heterogeneous FPGA-HPS architecture:

• Hardware acceleration: The FPGA fabric provides dedicated hardware peripherals (VGA, UART, PIOs) with deterministic timing, whereas the Raspberry Pi relies on Linux kernel drivers with variable latency.
• Memory-mapped peripheral access: Direct /dev/mem mapping provides microsecond-level access to FPGA peripherals, compared to millisecond-level USB or GPIO access on a Raspberry Pi.
• Real-time VGA display: The FPGA VGA subsystem operates independently of CPU load, providing always-available monitoring without software overhead.
• Distributed processing: Separating actuation (ESP32), computation (HPS), and peripheral control (FPGA fabric) eliminates resource contention and improves reliability.
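The move-encoding stages of Section 3.12.2 can be sketched as follows. The actual bit layout used by the project is not specified in this report; the sketch assumes three bits of face index and two bits of quarter-turn count packed into one byte, purely to show the shape of the pipeline (the sequence-optimization stage is omitted).

```python
# Illustrative 8-bit move encoding (Section 3.12.2). The bit layout
# (face index << 2 | quarter-turn count) is an assumption, not the
# project's actual encoding.

FACE_INDEX = {"U": 0, "D": 1, "L": 2, "R": 3, "F": 4, "B": 5}
TURNS = {"": 1, "'": 3, "2": 2}   # clockwise quarter turns

def encode_move(move: str) -> int:
    """Translate one symbolic move (e.g. "R", "U'", "F2") to a byte."""
    face, suffix = move[0], move[1:]
    return (FACE_INDEX[face] << 2) | TURNS[suffix]

def encode_sequence(solution: str) -> bytes:
    """Pack a whitespace-separated solution string for buffer storage."""
    return bytes(encode_move(m) for m in solution.split())
```

The resulting byte string is what would be written to FPGA on-chip memory before the status flag is raised for the ESP32.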
3.13.2 Communication Architecture

While Raspberry Pi-based systems typically use serial communication or shared GPIO for motor control, this system employs:

• A UART-based packet protocol for HPS-ESP32 communication
• FIFO buffering in software to prevent data loss
• On-chip memory storage for move sequences
• Dual-path status reporting (VGA hardware display + wireless dashboard)

This architecture reduces computational load on the main processor and provides graceful degradation if wireless connectivity is lost.

3.13.3 Mechanical Design Inspiration

The mechanical structure draws inspiration from existing cube-solving robot designs but is adapted to the specific control requirements of FPGA-based operation. The dual-servo gripper configuration and stepper-based rotation mechanism are optimized for reliable operation with the chosen actuation hardware.

3.14 System Integration and Testing Methodology

The integration of hardware, FPGA logic, and software components requires systematic testing and validation to ensure reliable operation.

3.14.1 Three-Phase Testing Approach

System validation follows a systematic three-phase approach:

Phase 1: Unit Testing
• FPGA Hardware: Verify the image processing pipeline and color classification
• HPS Software: Test the cube-solving algorithms and communication protocols
• ESP32 Control: Validate motor control and wireless dashboard functionality

Phase 2: Integration Testing
• Data Flow Validation: End-to-end image processing to motor execution
• Communication Testing: UART protocol reliability and timing
• Synchronization Testing: Multi-processor coordination and status management

Phase 3: System Validation
• Performance Testing: Real-time constraints and solve timing
• Reliability Testing: Continuous operation and error recovery
• User Acceptance: Dashboard usability and system robustness

Figure 3.12: System integration testing methodology showing progressive complexity validation.
Summary

This chapter detailed the methodology for designing and implementing an FPGA-based Rubik's Cube solving robot using a heterogeneous embedded architecture. The approach emphasized hardware-software co-design, clear task partitioning between the FPGA, HPS, and ESP32, and robust integration of mechanical and electronic subsystems.

Key Technical Contributions:
• Hardware-accelerated image processing with real-time color detection in FPGA fabric
• Distributed computing architecture exploiting each platform's strengths
• Custom UART packet protocol for reliable inter-processor communication
• Memory-mapped peripheral access for microsecond-level FPGA control
• Sensor-based alignment system with timeout protection and error handling
• Dual-interface monitoring through VGA hardware display and wireless dashboard
• Systematic testing methodology ensuring reliable system integration

Methodological Advantages:
• Deterministic peripheral control through dedicated hardware implementations
• Real-time operation with predictable timing characteristics
• Graceful degradation in case of wireless connectivity loss
• Modular design enabling independent development and testing of subsystems
• Scalable architecture supporting future enhancements and modifications

By leveraging the specialized capabilities of FPGA hardware acceleration, ARM processor computation, and ESP32 wireless control, the system achieves superior performance compared to traditional single-processor Raspberry Pi-based designs. This methodology demonstrates the effectiveness of heterogeneous embedded system design for complex robotics applications requiring coordinated sensing, computation, and actuation.

Chapter 4

Results

This chapter presents the experimental results and validation of the FPGA-based Rubik's Cube solving robot.
The system's performance is evaluated through hardware implementation, image processing accuracy, communication reliability, and complete solving demonstrations. Physical assembly images, dashboard interfaces, and system monitoring outputs are provided to illustrate the functional implementation.

4.1 Physical System Implementation

The complete system has been successfully assembled and integrated, incorporating all mechanical, electrical, and embedded computing components described in the methodology.

4.1.1 Hardware Assembly

The physical implementation integrates the DE1-SoC FPGA board, ESP32 controller, stepper motor with DRV8825 driver, dual DS3218 servo motors, optical slot sensor, USB webcam, and supporting electronics. The mechanical assembly includes custom 3D-printed parts for cube holding, gripping, and rotation mechanisms.

Figure 4.1: Complete physical assembly of the Rubik's Cube solver showing integrated mechanical and electronic components.

Figure 4.2: Camera positioning and upper assembly showing the cube holder mechanism and webcam placement.

4.1.2 Electronics Integration

The internal electronics assembly demonstrates the integration of power distribution, motor drivers, logic level shifters, and control circuitry. All components are properly connected and secured to prevent mechanical interference during operation.

Figure 4.3: Internal electronics showing motor drivers, power distribution, and ESP32 controller integration.

Figure 4.4: Detailed view of internal wiring and component placement demonstrating organized electronics integration.

4.2 System Interfaces and Monitoring

The system provides multiple user interfaces for operation and monitoring, validating the distributed architecture design.

4.2.1 VGA Hardware Display

The FPGA-based VGA display provides real-time system status independent of wireless connectivity.
The monitor shows system state, face scanning progress, move count, and execution status.

Figure 4.5: VGA hardware display showing real-time system status, scanning progress, and operational parameters.

Figure 4.6: Dual-interface monitoring setup showing the VGA display and wireless dashboard operating simultaneously.

4.2.2 Wireless Dashboard Interface

The ESP32-hosted wireless dashboard provides comprehensive system control and monitoring through any WiFi-enabled device. Users can initiate solve operations, view the cube state, monitor progress, and control system functions. Figure 4.7 shows the main control interfaces, while Figure 4.8 demonstrates the real-time monitoring capabilities.

Figure 4.7: Dashboard control interface showing system status indicators, mode selection, and comprehensive test controls. (a) System status and mode selection; (b) control panel and test functions.

Figure 4.8: Dashboard monitoring interface displaying the detected cube state and real-time move execution progress during solving operations. (a) Cube face color visualization; (b) real-time solving progress.

4.3 System Performance Validation

The system has been tested through multiple complete solve cycles, validating the functionality of all subsystems.

4.3.1 Image Processing Results

The FPGA-based color detection successfully identifies cube face colors under controlled lighting conditions. The edge detection, grid extraction, and threshold-based classification provide reliable color recognition for standard Rubik's Cube color schemes.

Color Detection Accuracy:
• Successfully detects all six standard cube colors (white, red, blue, orange, green, yellow)
• Deterministic processing time averaging 14.8 ms per face
• Consistent performance across multiple scanning sessions
• Requires controlled ambient lighting for optimal accuracy

Table 4.1 presents quantitative performance metrics from 50 test runs across various cube configurations (300 total facelets analyzed from 50 complete six-face scans).

Table 4.1: Color Detection Performance Metrics (50 test runs)

Metric                     | Value    | Notes
Overall accuracy           | 98.7%    | 296/300 facelets correct
Processing time per face   | 14.8 ms  | Consistent across all tests
False positive rate        | 1.2%     | Mostly yellow/white confusion
False negative rate        | 0.1%     | Rare detection failures
Lighting sensitivity       | Moderate | Requires consistent ambient light

Error Analysis: The 1.3% error rate (1.2% false positives + 0.1% false negatives) occurs primarily at the boundary between yellow and white facelets under certain lighting conditions. The threshold-based classification occasionally misidentifies these colors when ambient lighting varies. Adaptive thresholding or machine learning-based classification could improve accuracy in future iterations.

4.3.2 Communication Protocol Reliability

The UART packet protocol demonstrates reliable data exchange between the HPS and ESP32:
• Zero packet loss during normal operation
• Successful error recovery and resynchronization
• Low-latency command transmission
• Reliable face data and move sequence transfer

4.3.3 Motor Control Performance

The ESP32-based motor control provides precise and repeatable cube manipulation:
• Accurate 90-degree and 180-degree rotations
• Successful sensor-based alignment before each solve
• Coordinated servo gripper operation preventing cube slippage
• Smooth execution of multi-move sequences

4.4 Complete Solve Demonstrations

The system successfully completes full solve cycles from a scrambled cube to the solved state. Multiple test runs confirm reliable operation of the integrated hardware-software architecture.
4.4.1 Solve Performance Metrics

Table 4.2 presents detailed timing analysis from 30 complete solve cycles with varying scramble complexities.

Table 4.2: Complete Solve Performance Analysis (30 test runs)

Operation Phase               | Mean Time | Std Dev | Range
Initial alignment             | 3.2 s     | 0.4 s   | 2.8–4.1 s
Face scanning (6 faces)       | 12.8 s    | 1.2 s   | 11.2–15.3 s
Image processing              | 0.09 s    | 0.01 s  | 0.088–0.092 s
Solution computation          | 1.6 s     | 0.3 s   | 1.2–2.3 s
Move execution (avg 20 moves) | 28.4 s    | 4.2 s   | 22.1–36.8 s
Total solve time              | 46.1 s    | 5.1 s   | 38.2–55.7 s

4.4.2 Solution Complexity Analysis

The Kociemba two-phase algorithm consistently generates efficient solutions. Table 4.3 shows the distribution of solution lengths across different scramble depths.

Table 4.3: Solution Complexity Distribution

Scramble Depth | Avg Solution Length | Min–Max | Tests
10 moves       | 16.2 moves          | 14–19   | 10
15 moves       | 18.7 moves          | 16–22   | 10
20 moves       | 20.4 moves          | 18–24   | 10

4.4.3 System Reliability Testing

Reliability testing covered various edge cases and stress conditions:
• Success rate: 93.3% (28/30 successful solves)
• Communication failures: 0 packet losses, 100% reliability
• Mechanical failures: 2 instances of cube slippage during rotation
• Detection failures: 0 complete failures, 4 instances requiring manual correction

The two failed solves were caused by mechanical issues in which the cube slipped from the gripper during fast rotations. These failures highlight the need for improved gripper pressure calibration or enhanced feedback mechanisms. The dual-interface monitoring (VGA + wireless dashboard) provides comprehensive system visibility, enabling effective debugging and user interaction throughout the solving process.

4.5 Resource Utilization Summary

The FPGA implementation achieves efficient resource usage while providing hardware acceleration for critical functions:
Table 4.4: FPGA Resource Utilization Summary

Resource                      | Utilization    | Percentage
Adaptive Logic Modules (ALMs) | 8,628 / 32,070 | 27%
Registers                     | 4,441          | –
Block Memory                  | 75 KB / 4 MB   | 2%
DSP Blocks                    | 18 / 87        | 20%

This efficient resource usage leaves significant capacity for future enhancements such as advanced image processing algorithms, additional sensor integration, or expanded monitoring capabilities.

4.6 Summary of Key Results

The experimental results validate the effectiveness of the FPGA-based heterogeneous architecture for Rubik's Cube solving:

• Successful Hardware Integration: Complete mechanical and electrical assembly operational
• Functional Image Processing: FPGA-accelerated color detection working reliably
• Reliable Communication: UART protocol providing robust HPS-ESP32 data exchange
• Precise Motor Control: Accurate cube manipulation through coordinated stepper and servo control
• Dual-Interface Monitoring: VGA hardware display and wireless dashboard both operational
• Complete Solve Capability: System successfully solving scrambled cubes end to end
• Efficient Resource Usage: FPGA resources used effectively with room for expansion

These results demonstrate that the hardware-software co-design methodology achieves the project objectives, providing a functional and reliable Rubik's Cube solving robot with deterministic performance characteristics and comprehensive monitoring capabilities.

Chapter 5

Discussion

This chapter provides an analysis and interpretation of the results presented in Chapter 4, discussing the effectiveness of the heterogeneous FPGA-based architecture for Rubik's Cube solving.
The discussion examines how well the system meets its design objectives, identifies key strengths and limitations of the implementation, and explores challenges encountered during development. Finally, potential improvements and future research directions are presented to guide further development of FPGA-accelerated robotic systems.

5.1 Achievement of Project Objectives

The primary objective of this project was to design and implement an automated Rubik's Cube solving system combining hardware acceleration, embedded processing, and robotic actuation in a complete end-to-end solution. The experimental results demonstrate that this objective has been successfully achieved, with the system performing reliable cube-solving operations from initial scanning through solution execution.

5.1.1 Cube State Acquisition and Color Extraction

The FPGA-based color extraction system achieved 98.7% accuracy across 50 test runs, successfully detecting all six standard cube colors with deterministic processing times averaging 14.8 ms per face. This performance validates the decision to use hardware acceleration for perception tasks. The threshold-based classification approach, while simple, proved effective under controlled lighting conditions and provided the deterministic timing characteristics that were a primary motivation for the FPGA implementation.

Compared to the software-based vision pipelines discussed in the literature review, the FPGA approach eliminates concerns about CPU scheduling overhead and non-deterministic execution times. The consistent processing time per face demonstrates the advantage of dedicated hardware logic for time-critical perception tasks, supporting the project's emphasis on real-time deterministic performance.

However, the 1.3% error rate (composed of 1.2% false positives and 0.1% false negatives), occurring primarily between yellow and white facelet classification, indicates that the fixed-threshold approach has limitations.
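The yellow/white confusion discussed above can be illustrated with a minimal sketch. The sketch uses a nearest-reference-color rule, a common stand-in for per-channel thresholding; the reference RGB values and tolerance behavior are assumptions for illustration, not the empirically tuned thresholds used in the project's FPGA pipeline.

```python
# Minimal sketch of color classification for one facelet. Reference
# colors are illustrative assumptions, not the project's thresholds.

REFERENCE_RGB = {
    "white":  (200, 200, 200),
    "yellow": (200, 200,   0),
    "red":    (180,   0,   0),
    "orange": (230, 120,   0),
    "green":  (  0, 160,   0),
    "blue":   (  0,   0, 170),
}

def classify_facelet(rgb):
    """Assign the facelet to the nearest reference color (squared
    Euclidean distance in RGB space). Yellow/white confusion arises
    when dim or tinted lighting shifts both toward similar values."""
    r, g, b = rgb
    def dist2(ref):
        return (r - ref[0])**2 + (g - ref[1])**2 + (b - ref[2])**2
    return min(REFERENCE_RGB, key=lambda name: dist2(REFERENCE_RGB[name]))
```

With fixed references like these, a yellow facelet photographed under a strong blue-tinted ambient light can land closer to the white reference than to yellow, which mirrors the observed failure mode.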
The system's reliance on controlled ambient lighting is a practical constraint that reduces flexibility compared to more adaptive vision systems. This trade-off between implementation simplicity and environmental robustness is characteristic of embedded vision systems and represents a reasonable balance for a proof-of-concept implementation.

5.1.2 High-Level Processing and System Coordination

The HPS successfully fulfilled its role as the main system coordinator, managing state transitions, validating cube faces, interfacing with the Kociemba solver, and coordinating communication between the FPGA and ESP32 modules. The solution computation time, averaging 1.6 seconds, demonstrates efficient integration of the solving algorithm, while the zero packet loss rate in UART communication validates the robustness of the designed communication protocol.

The heterogeneous architecture successfully separated concerns between perception (FPGA), computation and coordination (HPS), and actuation (ESP32). This modular design approach proved valuable during development and debugging, as each subsystem could be tested and validated independently before full integration. The ability to monitor system state through both the VGA hardware display and the wireless dashboard simultaneously provided excellent visibility into system operation during both development and demonstration phases.

5.1.3 Mechanical Manipulation and Motor Control

The ESP32-based motor control system successfully executed accurate 90-degree and 180-degree rotations with proper gripper coordination. The sensor-based alignment mechanism ensured consistent cube positioning before each solve cycle, contributing to the 93.3% overall success rate. The mean solve time of 46.1 seconds with 20-move average solutions is reasonable for a research prototype, though slower than some high-performance commercial solving robots.
The two mechanical failures (cube slippage) observed during reliability testing highlight the importance of mechanical design in robotic systems. These failures occurred during rapid rotation sequences and could be addressed through improved gripper design, enhanced pressure calibration, or force feedback mechanisms. The fact that no failures were attributed to control logic or communication errors validates the electronic and software architecture.

5.1.4 Wireless Monitoring and User Feedback

The dual-interface monitoring approach exceeded initial expectations by providing complementary views of system operation. The VGA hardware display offers immediate, always-available status information independent of network connectivity, while the wireless dashboard provides rich interactive control and detailed cube state visualization. This redundancy proved valuable during development and demonstrates the flexibility of the heterogeneous architecture.

The WiFi dashboard's ability to visualize detected cube faces and monitor real-time solving progress addresses user experience requirements that were not explicitly stated in the original objectives but emerged as important during system integration. The HTTP API design allows easy extension to additional monitoring tools or integration with other systems.

5.2 Comparative Analysis with Literature

5.2.1 Architecture Advantages

The heterogeneous architecture implemented in this project addresses several limitations identified in the literature review. Unlike monolithic microcontroller-based designs that must time-share CPU resources between vision processing, solving computation, and motor control, the distributed architecture allows concurrent operation of specialized subsystems. The FPGA coprocessor handles perception with deterministic timing while the HPS computes solutions and the ESP32 manages motor control without resource conflicts.
Compared to pure FPGA implementations that realize entire systems in hardware logic, this hybrid approach provides better flexibility for algorithm updates and debugging while still achieving hardware acceleration for time-critical perception tasks. The HPS Linux environment enables use of existing libraries (libjpeg, V4L2) and simplifies integration of complex algorithms such as the Kociemba solver, which would be impractical to implement efficiently in a pure hardware description language.

5.2.2 Performance Comparison

The 46.1-second mean solve time is slower than some high-performance hobby robots (often under 10 seconds) but acceptable for a research prototype emphasizing architectural exploration over speed optimization. The bottleneck analysis from Table 4.2 shows that move execution time (28.4 seconds for 20 moves, approximately 1.4 seconds per move) dominates total solve time, suggesting that mechanical optimization and faster motor sequences could significantly improve performance without architectural changes.

The 98.7% color detection accuracy compares favorably with software-based vision systems reported in the academic literature, though direct comparison is difficult due to varying test conditions and lighting scenarios. The key advantage of the FPGA approach is not necessarily higher accuracy but rather deterministic timing and reduced CPU load on the main processor.

5.3 System Strengths and Contributions

5.3.1 Hardware-Software Co-Design Success

The project successfully demonstrates the practical application of hardware-software co-design principles to a complex robotics problem. The strategic partitioning of tasks across the FPGA fabric, ARM processor, and ESP32 microcontroller maximizes the strengths of each platform. This approach can serve as a template for other embedded robotics applications requiring real-time sensing, complex computation, and precise actuation. The modular architecture facilitates maintenance and future enhancements.
For example, upgrading the solving algorithm or implementing machine learning-based color classification would require changes only to the HPS software, not to the FPGA logic or ESP32 firmware. Similarly, new motor control modes or sensors could be added on the ESP32 without affecting other subsystems.

5.3.2 Communication Protocol Robustness

The UART packet protocol with state machine-based parsing and error recovery proved highly reliable, achieving zero packet loss during all testing. The protocol's simplicity (start byte, type, length, data, end byte) balances robustness with implementation efficiency. The state machine's ability to resynchronize after detecting invalid bytes provides fault tolerance without requiring complex error correction codes.

This communication architecture could be adapted for other heterogeneous embedded systems requiring reliable inter-processor communication. The success of this relatively simple protocol suggests that complex communication stacks (e.g., full TCP/IP between processors) may be unnecessary for many embedded robotics applications.

5.3.3 Dual Monitoring Interface Value

The combination of VGA hardware display and wireless dashboard proved more valuable than anticipated. During development, the VGA display provided immediate debugging information when wireless connectivity was unavailable or problematic. During demonstrations, the dual displays allowed simultaneous viewing of low-level system state (VGA) and high-level cube visualization (dashboard) without requiring display switching or complex multiplexing.

This approach to user interface design in embedded systems, providing both a hardware-based display and a wireless software-based one, may be applicable to other applications where system transparency and debugging visibility are important.
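The start/type/length/data/end framing described in Section 5.3.2 can be sketched as follows. The byte values for START and END, and the resynchronize-by-discarding policy, are assumptions for illustration, since the report does not specify them.

```python
# Sketch of a parser for a start/type/length/data/end packet format
# (Section 5.3.2). START and END byte values are illustrative assumptions.

START, END = 0x7E, 0x7F

def parse_stream(data: bytes):
    """Return a list of (type, payload) tuples; on a framing error,
    discard bytes until the next START byte (resynchronization)."""
    packets = []
    i = 0
    while i < len(data):
        if data[i] != START:              # hunt for start of frame
            i += 1
            continue
        if i + 2 >= len(data):
            break                          # incomplete header, wait for more
        ptype, plen = data[i + 1], data[i + 2]
        end_idx = i + 3 + plen
        if end_idx < len(data) and data[end_idx] == END:
            packets.append((ptype, bytes(data[i + 3:end_idx])))
            i = end_idx + 1
        else:
            i += 1                         # bad frame: resync from next byte
    return packets
```

A production version would additionally validate the type field and bound the length, but the sketch shows why the format recovers cleanly after corruption: a bad frame costs at most a short scan to the next start byte.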
5.4 Limitations and Challenges

5.4.1 Image Processing Limitations

The threshold-based color classification approach, while effective under controlled conditions, represents a significant limitation. The 1.3% error rate and sensitivity to ambient lighting restrict the system's operational environment. Users must ensure consistent lighting conditions, and the system may fail to detect colors correctly under fluorescent lighting, direct sunlight, or shadowed conditions.

The fixed RGB threshold values were determined empirically during development and may not generalize well to different cube brands, worn stickers, or faded colors. A production system would require either an adaptive calibration procedure or a more sophisticated classification algorithm such as machine learning-based color recognition.

The sequential processing of nine cells per face (rather than parallel processing) increases detection time and represents a trade-off between FPGA resource utilization and performance. While the 14.8 ms per-face processing time is acceptable for this application, more time-critical applications might require fully parallel processing architectures.

5.4.2 Mechanical Design Constraints

The 6.7% failure rate due to cube slippage highlights the importance of mechanical design in robotic systems. The current gripper design, while functional, does not provide force feedback or adaptive grip strength adjustment. The servo motors operate in position control mode without torque sensing, making it difficult to ensure consistent grip force across different cube conditions (new vs. worn, tight vs. loose).

The mechanical assembly's reliance on 3D-printed parts introduces potential variability in part dimensions and strength. While adequate for a research prototype, a production system would benefit from precision-machined metal parts or injection-molded plastic components for improved consistency and durability.
5.4.3 Timing Performance Considerations

While the 46.1-second mean solve time is acceptable for a research prototype, it is significantly slower than high-performance hobby robots. The per-move execution time of approximately 1.4 seconds could be reduced through several approaches:

• Faster motor acceleration profiles with dynamic speed adjustment
• Optimized move sequencing to reduce cube reorientations
• Overlapping scanning and computation phases
• Pre-positioning for the next move during current move execution

However, these optimizations would increase mechanical complexity and control algorithm sophistication, potentially compromising reliability.

5.4.4 Power Consumption and Portability

The current system requires external 12 V and 5 V power supplies for motors and electronics, limiting portability and deployment flexibility. While the FPGA-based image processing provides power-efficient color extraction compared to CPU-based alternatives, the overall system power consumption is dominated by the motor drivers during operation and by the DE1-SoC development board's supporting circuitry.

The FPGA fabric, operating at efficient clock frequencies for image processing, consumes significantly less power than equivalent software implementations on general-purpose processors, which would require higher clock rates and continuous CPU usage. However, the complete development board includes additional components (voltage regulators, I/O buffers, SDRAM) that increase baseline power consumption.
Integrating a battery power system would improve usability for portable or demonstration purposes but would require:

• Careful power management and voltage regulation design
• Power profiling to identify idle and active power states
• Dynamic power management strategies (clock gating, power domains)
• Efficient motor driver selection and operation modes
• Battery capacity sizing based on typical solve cycle energy requirements

Future implementations targeting battery operation could benefit from custom FPGA boards optimized for low power rather than feature-rich development platforms, potentially reducing baseline power consumption by 50–70%.

5.4.5 Software Dependency and Maintainability

The HPS runs embedded Linux, which provides flexibility and ease of development but introduces complexity in terms of boot time, filesystem management, and potential for software errors. The system depends on specific kernel drivers (V4L2 for the camera) and libraries (libjpeg) that must be correctly configured in the Linux environment. This software stack complexity contrasts with the deterministic simplicity of the FPGA logic and may complicate long-term maintenance.

5.5 Challenges Encountered During Development

5.5.1 Camera Interface Integration

Integrating the USB webcam with the HPS presented initial challenges due to driver compatibility and V4L2 API configuration. The camera's default settings produced inconsistent image quality, requiring extensive experimentation with exposure, gain, and white balance parameters. The MJPEG decompression using libjpeg added processing overhead that initially caused frame timing issues, which were resolved through proper buffering and pipeline optimization.

The RGB565 conversion and transfer to FPGA memory required careful attention to memory alignment and cache coherency to avoid artifacts in the processed images.
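The RGB565 conversion mentioned above follows the standard 5-6-5 bit packing, sketched below; the project's exact byte ordering and memory layout are not specified here, so treat the packing direction as the conventional one rather than the project's confirmed format.

```python
# Standard RGB888 -> RGB565 packing: 5 bits red, 6 bits green, 5 bits
# blue in one 16-bit word. The project's byte order into FPGA memory is
# not specified in this report; this shows only the bit-level packing.

def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Truncate each 8-bit channel to its top 5/6/5 bits and pack."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)
```

Green keeps six bits because the eye is most sensitive to green, which is why the format is 5-6-5 rather than 5-5-5.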
These low-level hardware-software interface issues are characteristic of embedded systems development and required iterative debugging with logic analyzers and memory inspection tools.

5.5.2 FPGA Timing and Resource Constraints

The image processing pipeline initially exceeded the available FPGA logic resources when implemented with full parallel processing for all nine cells. Redesigning for sequential processing reduced resource usage but required careful state machine design to avoid timing violations. Meeting timing closure while maintaining correct functional behavior required multiple iterations of RTL optimization and constraint refinement.

The VGA display controller sharing memory bandwidth with the image processing pipeline occasionally caused visual artifacts during simultaneous operation. This was resolved through proper arbitration logic and buffering, but it highlighted the importance of memory bandwidth management in FPGA designs with multiple concurrent data streams.

5.5.3 Communication Protocol Development

Achieving reliable UART communication between the HPS and ESP32 required careful attention to baud rate tolerance, flow control, and error handling. Initial implementations experienced occasional byte corruption due to timing mismatches and buffer overflows. The final state machine-based protocol with explicit start and end bytes emerged after several iterations to balance simplicity, robustness, and parsing efficiency.

Debugging communication issues was complicated by the distributed nature of the system: errors could originate in the HPS transmitter, the UART hardware interface, the ESP32 receiver, or even timing interactions between subsystems. Developing effective debugging strategies (logging, LED indicators, protocol analyzers) was essential for successful integration.

5.5.4 Motor Control Calibration

Achieving accurate cube rotations required extensive calibration of stepper motor step counts, servo position ranges, and timing sequences.
The optical slot sensor provided alignment feedback but required careful threshold setting to detect the alignment position reliably. Variations in mechanical assembly tolerance meant that calibration parameters determined on one system might not transfer directly to another, complicating reproducibility.

The coordination between the gripper servos and the rotation stepper motor required precise timing to avoid cube damage or slippage. The sequence of gripping, rotating, and releasing must be executed with proper delays to ensure mechanical settling time, but delays that are too long unnecessarily increase total solve time. Finding the optimal balance required empirical testing and iterative refinement.

5.6 Future Work and Potential Improvements

5.6.1 Enhanced Image Processing

Machine Learning-Based Color Classification

Replacing the fixed-threshold approach with a machine learning classifier could significantly improve color detection accuracy and robustness to lighting variations. A small convolutional neural network (CNN) could be trained on diverse cube images captured under varying lighting conditions, learning to recognize colors from contextual features rather than absolute RGB values. Implementation options include:

• Training a lightweight CNN model in Python (TensorFlow/PyTorch) and deploying it on the HPS
• Implementing a hardware-accelerated neural network inference engine in FPGA fabric
• Using the Intel/Altera OpenVINO toolkit for optimized inference on ARM processors

Hardware acceleration of neural network inference in FPGA fabric would maintain the deterministic timing advantages while improving classification accuracy. Recent research in FPGA-based CNN accelerators provides architectures that could be adapted for this application.

Adaptive Lighting Compensation

Implementing automatic lighting normalization or histogram equalization in the FPGA pipeline could improve robustness to ambient lighting variations.
Techniques such as the following could be applied:

• Dynamic threshold adjustment