INTRODUCTION: This project consisted of the design of several components which, when put together, constitute a complete computer system. The approach was modular, so that the components can be replicated to build a scalable system. Each component uses a uniform set of protocols, which allows the components to be connected in any arrangement and allows the system to be scalable and parallel.
This was done through a new RISC CPU architecture, complete with the compiler and assembler-linker required to run reasonably complex programs on the CPU. A boot loader was also designed to allow the CPU to be programmed and used in a variety of situations. Security was not a main focus, since it was beyond the scope of my project.
The CPU design is a new network driven architecture (NDMA) in which each CPU can network with others in a way that allows scalable multiprocessing, so that systems can be designed with parallelism in mind. The architecture was designed so that the network component can be used to drive the CPU itself. Many designs use a network by passing messages which are then interpreted by the CPU. This has the advantage of passing data quickly and seamlessly between different processors.
However, if a packet of data is sent to a CPU which doesn’t know what to do with it, the packet simply looks like noise. The design implemented in this project does not send data but rather sends instructions: all data must be encoded in pieces of code which the receiving CPU then executes, allowing increased flexibility in how parallel programs are designed.
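The idea of carrying data inside instructions can be illustrated with a minimal Python sketch. The instruction layout, the two opcodes, and the function names below are assumptions made purely for illustration; they are not the actual NDMA encoding, which is not described in this section. The sender wraps one 32 bit data word in two instructions, and the receiver recovers the word simply by executing them.

```python
# Hypothetical 32-bit instruction layout (NOT the real NDMA encoding):
#   bits 31..24  opcode
#   bits 19..16  destination register (16 registers assumed)
#   bits 15..0   immediate value
OP_LOAD_HI = 0x01  # rd = imm << 16
OP_OR_LO = 0x02    # rd = rd | imm


def encode(opcode, rd, imm):
    """Pack the three fields into one 32-bit instruction word."""
    return ((opcode & 0xFF) << 24) | ((rd & 0x0F) << 16) | (imm & 0xFFFF)


def data_to_instructions(value, rd):
    """Wrap one 32-bit data word in two instructions that rebuild it in rd."""
    return [
        encode(OP_LOAD_HI, rd, (value >> 16) & 0xFFFF),
        encode(OP_OR_LO, rd, value & 0xFFFF),
    ]


def execute(instruction, registers):
    """Receiver side: decode one instruction word and apply it."""
    opcode = (instruction >> 24) & 0xFF
    rd = (instruction >> 16) & 0x0F
    imm = instruction & 0xFFFF
    if opcode == OP_LOAD_HI:
        registers[rd] = imm << 16
    elif opcode == OP_OR_LO:
        registers[rd] |= imm
    else:
        raise ValueError(f"unknown opcode {opcode:#x}")


# The receiver never sees raw data, only code that produces the data.
registers = [0] * 16
packet = data_to_instructions(0xDEADBEEF, rd=3)
for word in packet:
    execute(word, registers)
```

Because every word on the network is executable, the receiver needs no separate message parser; the cost in this sketch is that one data word occupies two instruction words.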
Several other systems had to be designed in order to observe that the project actually works. These included the integration chain for the NDMA processor, which I called the NDMA Suite. The Suite consists of the NDMA compiler back end for LCC, NDMA Merge, which allows multiple assembly files to be integrated, the NDMA Assembler-Linker, and the NDMA Bootloader. Some APIs were also written for the GPU, the PS2 Input Buffer, and the network layer. Throughout the system, interfaces must support asynchronous operation, since many of the subsystems live in different clock domains. In fact, the network layer was designed to be self timed, while the data it receives is put through a controller which clocks the data so that it can be correctly passed to the CPU. This was possible through the use of the same handshaking protocol throughout the system.
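The section does not specify which handshaking protocol is used, so the following Python sketch assumes a four-phase request/acknowledge handshake, a common choice for crossing clock domains. The class names, the `simulate` helper, and the clock periods are all hypothetical. The sender and receiver are stepped at unrelated rates to mimic separate clock domains, and every word still arrives exactly once and in order.

```python
class Channel:
    """Shared wires between the two domains (assumed four-phase handshake)."""
    def __init__(self):
        self.req = False
        self.ack = False
        self.data = None


class Sender:
    def __init__(self, channel, words):
        self.ch = channel
        self.pending = list(words)

    def step(self):
        ch = self.ch
        if not ch.req and not ch.ack and self.pending:
            ch.data = self.pending.pop(0)  # data is valid before req rises
            ch.req = True                  # phase 1: raise request
        elif ch.req and ch.ack:
            ch.req = False                 # phase 3: drop request


class Receiver:
    def __init__(self, channel):
        self.ch = channel
        self.received = []

    def step(self):
        ch = self.ch
        if ch.req and not ch.ack:
            self.received.append(ch.data)  # latch data while req is high
            ch.ack = True                  # phase 2: raise acknowledge
        elif not ch.req and ch.ack:
            ch.ack = False                 # phase 4: drop acknowledge


def simulate(words, sender_period, receiver_period, ticks):
    """Step each side on its own clock period and return what was received."""
    ch = Channel()
    sender, receiver = Sender(ch, words), Receiver(ch)
    for t in range(ticks):
        if t % sender_period == 0:
            sender.step()
        if t % receiver_period == 0:
            receiver.step()
    return receiver.received


fast_to_slow = simulate([10, 20, 30], sender_period=1, receiver_period=3, ticks=200)
slow_to_fast = simulate([10, 20, 30], sender_period=5, receiver_period=2, ticks=200)
```

Neither side relies on the other's clock rate, only on the order of the `req` and `ack` transitions, which is what would let one protocol be reused between any pair of components.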
It can be noticed that the design tended to evolve over time. This resulted in some redundancy, but it was deemed negligible and, if needed, could be cleaned up very quickly. Cleaning up this redundancy would simplify the hardware design and make it quicker to implement on an FPGA, or simpler if this design ever went to layout.
Since most of this design project was original, there were not many alternatives to the design choices I made and their implementation details. On the other hand, it would have been best if I could have found an open CPU core that provided the integration chain as well (such as a bootloader, assembler, etc.). This was not easy to find, and I justified my original design by the fact that I am creating open source intellectual property through this project which is potentially useful for others.
With the above said, I did not write the compiler from scratch but rather used LCC, which is an open source retargetable ANSI C compiler. Since this was open source the decision was valid. The other option would have been to write my own compiler, and this, combined with all of the other work I had to do, was not a realistic goal even in the time that I had to work on this project.
Overall, looking back at the design decisions, I think that, considering the scope and size of this project, it was very well planned out and that all of the requirements and specifications I originally set out to accomplish were met. The hardware and simulations agree closely; the only thing that was not achieved in hardware was not anticipated in the original design and was only realized in simulation. This is explained further in the following report, specifically in the last part of the Results and Analysis.
Rationale and Motivations
The original idea behind this project came from the realization that current computer systems do not take much advantage of parallelism despite being parallel systems. A common computer system consists of a main CPU, but many of the other components and ASICs in the system also have a certain amount of computing power that is not utilized during normal operation of the machine.
One of the best examples of this is a theoretical system that consists of a video card, a sound card, and a central CPU. If the demands of the system are such that the sound card or the video card is used to its full potential only half of the time, then a similar system could be designed that shares these resources, so that when the system does not require the video card for its designated purpose, the resource can be shifted and used for a different purpose. For example, using this scheme, it would be possible to design audio cards that expose their DSP capabilities and video cards that expose their SIMD capabilities through some kind of interface which allows the system to reconfigure itself on the fly.
With this kind of system in mind I designed a CPU architecture that is network driven. This CPU is capable of executing its instruction memory as initialized by a standard boot loader, and it can also dispatch instructions to other CPUs and likewise receive instructions from other CPUs. The idea behind this is that a CPU can be “told” what to do and can then reply with actions as well. These capabilities were also built into the assembler so that they can be accessed through standard coding practice. Most of the APIs must be written in assembly, however, because the data sent through the pipes must be instruction based: the network layer and the CPU are completely transparent to one another, and the only way communication can occur is through full 32 bit instructions.
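The "tell and reply" behaviour described above can be sketched as a toy Python model of two nodes. Everything here is an assumption for illustration: the opcodes, the 8 bit opcode and 24 bit operand split, the `Node` class, and the rule that received instructions take priority over local instruction memory are not taken from the NDMA design. Node 0 runs a boot-loaded program that dispatches instructions to node 1, and node 1 replies by sending an instruction back.

```python
from collections import deque

# Hypothetical opcodes; each instruction is one 32-bit word:
# bits 31..24 opcode, bits 23..0 operand.
OP_ADD = 0x01       # acc += operand
OP_SET = 0x02       # acc = operand
OP_DISPATCH = 0x03  # send the next local instruction word to node <operand>
OP_REPORT = 0x04    # reply: send "SET acc" to node <operand>


def ins(opcode, operand=0):
    """Pack an opcode and operand into one 32-bit instruction word."""
    return ((opcode & 0xFF) << 24) | (operand & 0xFFFFFF)


class Node:
    def __init__(self, node_id, network, program=()):
        self.network = network     # dict: node id -> Node
        self.imem = list(program)  # filled by a boot loader in the real system
        self.pc = 0
        self.inbox = deque()       # instructions received from other nodes
        self.acc = 0
        network[node_id] = self

    def step(self):
        """Execute one instruction; return False when there is nothing to do."""
        if self.inbox:  # assumed policy: network instructions run first
            word = self.inbox.popleft()
        elif self.pc < len(self.imem):
            word = self.imem[self.pc]
            self.pc += 1
        else:
            return False
        opcode, operand = word >> 24, word & 0xFFFFFF
        if opcode == OP_ADD:
            self.acc = (self.acc + operand) & 0xFFFFFF
        elif opcode == OP_SET:
            self.acc = operand
        elif opcode == OP_DISPATCH:  # only used by local programs in this sketch
            self.network[operand].inbox.append(self.imem[self.pc])
            self.pc += 1
        elif opcode == OP_REPORT:
            self.network[operand].inbox.append(ins(OP_SET, self.acc))
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
        return True


def run(network):
    """Step every node in turn until all of them are idle."""
    while any([node.step() for node in network.values()]):
        pass


network = {}
node0 = Node(0, network, program=[
    ins(OP_DISPATCH, 1), ins(OP_ADD, 5),     # tell node 1 to add 5
    ins(OP_DISPATCH, 1), ins(OP_ADD, 7),     # tell node 1 to add 7
    ins(OP_DISPATCH, 1), ins(OP_REPORT, 0),  # tell node 1 to reply to node 0
])
node1 = Node(1, network)  # no local program: driven entirely by the network
run(network)
```

Node 1 has an empty instruction memory, yet it computes a sum and returns it, because both the request and the reply travel as ordinary 32 bit instructions.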