Saturday, October 14, 2017

Instruction Set and major decision change.

Good'ay Y'all!
The time has once again come for an update on White Blaze. First I have to get this out of the way. White Blaze will be big endian based, meaning that it will be written the way our normal number system works: the most significant digit comes first, so 10 is greater than 01, as opposed to the other way around, where 01 would be greater than 10. Funny thing happened. The best way I thought to organize the instruction set was through Excel. It excelled at it! ;)
Here's an example.

Num | OpCode | Short | Description | Format | Wasted Bytes
5 | 00000101 | CONST | Inserts Constant into Register | [OpCode, Constant, Unused, Address to] | 1

One catch I found, though, was that there were a lot of "Unused" or "Wasted" bytes. That started bugging me, and after thinking about it some more I decided to use a variable length instruction set. The main reason is that with an 8-bit machine and only 256 bytes shared between RAM and program memory, I am under memory constraints and need any storage savings I can get.
Based on that, this would be the updated pipeline cycle:

1. load OpCode
2. from OpCode, load N number of registers to a buffer
3. from buffer, execute command and repeat.

The pictures from the last post still apply as far as how the cycle will run. The only difference is that it will now load a maximum of 4 registers, but it will not be required to load that many.
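Here is a minimal Python sketch of that three-step cycle, just to show the idea. Only the CONST opcode (00000101) comes from the Excel example above. The HALT and ADD opcodes, the operand counts, and the shortened CONST layout [OpCode, Constant, Address to] are assumptions for illustration, not the final instruction set.

```python
# Hypothetical opcode table: opcode -> number of bytes to load after the opcode.
HALT = 0b00000000   # assumption, not from the real instruction set
CONST = 0b00000101  # from the example row above
ADD = 0b00000110    # assumption, not from the real instruction set

OPERAND_COUNT = {HALT: 0, CONST: 2, ADD: 3}
MAX_BUFFER = 4  # the buffer never holds more than 4 bytes


def run(memory, registers):
    """Run until HALT and return the number of program bytes fetched."""
    line = 0
    fetched = 0
    while True:
        # 1. load OpCode
        opcode = memory[line]
        n = OPERAND_COUNT[opcode]
        # 2. from OpCode, load N more bytes into the buffer
        buffer = [opcode] + list(memory[line + 1:line + 1 + n])
        assert len(buffer) <= MAX_BUFFER
        fetched += len(buffer)
        line += len(buffer)
        # 3. from buffer, execute command and repeat
        if opcode == HALT:
            return fetched
        if opcode == CONST:
            _, constant, dest = buffer
            registers[dest] = constant
        elif opcode == ADD:
            _, a, b, dest = buffer
            registers[dest] = (registers[a] + registers[b]) & 0xFF  # stay 8-bit


program = bytes([CONST, 7, 0, CONST, 5, 1, ADD, 0, 1, 2, HALT])
regs = [0, 0, 0, 0]
used = run(program, regs)
print(regs, used)
```

In this made-up program the four instructions take 11 bytes, where a fixed 4-byte format would need 16.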

Another unexpected design change happened when I was entering the instruction set. I realized that under the math category, both modulus (%) and division would require an extremely complicated gate arrangement, multiple clock cycles, or both. Trying to work out how to activate multiple cycles for one instruction was beyond what I wanted to do for AmberOS v.0. So for now it will be able to do addition, subtraction, and multiplication. I would like to implement modulus and division in a future version. In fact, I think it would be a good exercise later to write a division program to help get the process down.
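As a rough idea of what such a division program could look like, here is a Python sketch that uses only comparison, subtraction, and counting. Repeated subtraction is just one simple approach I am assuming for illustration, and the function name is made up. It gives both the quotient and the modulus in one go.

```python
def divmod_by_subtraction(dividend, divisor):
    """Return (quotient, remainder) for 8-bit unsigned values.

    Uses only compare, subtract and increment, the kind of operations
    the machine is planned to have.
    """
    dividend &= 0xFF  # keep both inputs in the 8-bit range
    divisor &= 0xFF
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    while dividend >= divisor:
        dividend -= divisor
        quotient += 1
    return quotient, dividend  # whatever is left over is the modulus


print(divmod_by_subtraction(200, 7))
```

The loop runs once per unit of the quotient, so it is slow for small divisors, but it needs no extra hardware.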

Like I said I would last post, I have been working for a while, and here are all the OpCodes that I have come up with so far. I might tweak or change this later. There are still a lot of unused OpCodes, but I believe that's a good thing: it leaves room for expansion. But for now, this is what I'm designing White Blaze around.


The instructions can be split into the 4 main categories shown on the right: math operations, data manipulation, program manipulation, and IO (input/output) operations.
Math operations are pretty self-explanatory. Data manipulation operations change data without being specifically logic or math. Program manipulation operations only update the line number. Input/output operations move data to and from registers and other memory locations around the architecture. Feel free to comment below if you believe I forgot an OpCode.

Another thing I wanted to touch on today is what will not be specifically made out of logic gates for emulation. These will mainly be the driving clock and memory.
As far as memory goes, the Arduino will store the flag bits responsible for keeping track of the pipeline stage, the 256 bytes of program memory, the 4-byte instruction buffer, the line number register, and a 3-byte error code register which I will most likely cover a little later. All of these, though stored on the Arduino, will be directly addressable by the logic gates, and in real hardware they can easily be swapped for flip-flops or memory banks.
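To make that list concrete, here is a small Python sketch of the state the Arduino would be holding. The field names are made up, and I am assuming the stage flags fit in one byte and the line number register is one byte wide (enough to address 256 bytes). Neither of those widths is settled yet.

```python
from dataclasses import dataclass, field


@dataclass
class EmulatedState:
    """State kept on the Arduino instead of in logic gates (sketch only)."""
    stage_flags: int = 0  # pipeline stage flag bits (assumed to fit in 1 byte)
    memory: bytearray = field(default_factory=lambda: bytearray(256))
    instruction_buffer: bytearray = field(default_factory=lambda: bytearray(4))
    line_number: int = 0  # assumed 8-bit, enough to address 256 bytes
    error_code: bytearray = field(default_factory=lambda: bytearray(3))

    def total_bytes(self):
        """Bytes of storage needed under the 1-byte assumptions above."""
        return (1 + len(self.memory) + len(self.instruction_buffer)
                + 1 + len(self.error_code))


state = EmulatedState()
print(state.total_bytes())
```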

Recently on my channel, I covered a different part of the project here in this video: 


I'm planning at some point in the project to also make a lighter weight simulator on the computer, probably in Python, that won't actually use logic gates but will simulate the machine code and keep track of the memory. (This will allow easier code debugging before loading it onto the Arduino.) I would also like it to let me write higher level Assembly that will be assembled down to machine code and put into memory through the serial port of the Arduino.
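The Assembly side of that could start out as small as the sketch below. The text syntax (mnemonic followed by numbers, ";" for comments), the ADD opcode, and the operand counts are all assumptions for illustration. Only the CONST opcode value comes from the table earlier in this post. Sending the bytes over the serial port is left out.

```python
# mnemonic -> (opcode, number of operand bytes)
MNEMONICS = {
    "CONST": (0b00000101, 2),  # opcode from this post; operands: constant, address to
    "ADD": (0b00000110, 3),    # hypothetical opcode and layout
}


def assemble(source):
    """Turn assembly text into the bytes that would be loaded into memory."""
    out = bytearray()
    for line_no, raw in enumerate(source.splitlines(), start=1):
        text = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        if not text:
            continue
        name, *args = text.split()
        opcode, count = MNEMONICS[name.upper()]
        if len(args) != count:
            raise ValueError(f"line {line_no}: {name} takes {count} operands")
        out.append(opcode)
        # int(a, 0) accepts decimal, 0x.. and 0b.. literals
        out.extend(int(a, 0) & 0xFF for a in args)
    if len(out) > 256:
        raise ValueError("program does not fit in 256 bytes")
    return bytes(out)


program = assemble("""
CONST 7 0      ; put 7 in register 0
CONST 0x05 1   ; put 5 in register 1
ADD 0 1 2      ; register 2 = register 0 + register 1
""")
print(list(program))
```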

I'm excited to keep this project going! It's teaching me a lot and I hope this is entertaining for you guys too. 
See you here in two weeks! Or next week on my YouTube Channel: BitStream.

Sunday, October 1, 2017

Information Flow and Timings

Instead of starting at the logic gate level and moving outward, I believe it's best to start with a general overview. It's almost like writing a novel: you have to step back and figure out a very generalized sense of the flow of things before you delve into the details. I've already outlined several points in the last post that I want to be requirements.
Now the next stage is figuring out how I would like the processor to process that information. Here comes the pipeline. One major decision I had to make was whether I wanted a linear or non-linear pipeline. One advantage of a non-linear pipeline is that the processor can have variable length instructions, so there can be more flexibility and potential memory savings. This is great, but it can be more complicated both to design and to program at the machine language level. For simplicity's sake, I have decided to go with a linear pipeline to allow easier design and programming.
Another thing I had to think about was instruction size. It seemed logical to me to make it a multiple of 8 bits because this is an 8-bit machine. There are at minimum 4 things a computer must know to fulfill an instruction. Think of language. One can make a sentence from 2 words: "James walks." This is a noun and a verb. In a general sense a computer manipulates data based on other data, so there must be 2 "nouns" and 1 "verb". Then once that data is finished processing it needs to be put somewhere, and the processor needs to know where. So that's 3 "nouns" and 1 "verb". In the end we have 4 pieces of data.
[Instruction, Data 1, Data 2, Result location]
Which in more technical terms is
[opcode, register/constant 1, register/constant 2, register for output]
Beautiful. In the end that is 4 fields of 8 bits each, so we have a 32-bit instruction. This may be re-ordered later, and depending on how many opcodes/instructions are laid out, there may be room to squeeze in some bits for different memory bank locations. But for now we'll put that to the side.
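Here is a small Python sketch of what that layout looks like as actual bytes. The helper names and the example values are made up for illustration, and the field order is just the one listed above, which may still change.

```python
def pack(opcode, data1, data2, result):
    """Pack [opcode, data 1, data 2, result location] into 4 bytes."""
    fields = (opcode, data1, data2, result)
    if any(not 0 <= f <= 0xFF for f in fields):
        raise ValueError("every field must fit in 8 bits")
    return bytes(fields)


def unpack(word):
    """Split a 4-byte instruction back into its four 8-bit fields."""
    if len(word) != 4:
        raise ValueError("an instruction is exactly 4 bytes")
    return tuple(word)


word = pack(0x05, 42, 0, 3)           # placeholder opcode and operands
as_int = int.from_bytes(word, "big")  # the same 32 bits viewed as one number
print(hex(as_int), unpack(word))
```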

Moving on though: I remember the day when those synapses connected and it suddenly clicked how the clocks work on a computer. This discovery came when I was trying to figure out how to make logic gates move from one operation to another and was really frustrated. Then I thought, "I need some kind of signal that changes regularly to activate the circuit and allow things to settle before changing again." Then all of a sudden, in a single burst, it came to me: that's what a computer clock does! I quickly drew a diagram like this.

When the signal from the clock goes high, it follows the top path and, based on the line number, determines the operation to be performed. Once it finishes its operation, it updates the line number for when the clock goes low. The clock then goes low and the cycle continues the same way along the other path.

One thing that limits a computer is bandwidth. I really wanted to design this computer around multi-port memory (memory that can be written and read by multiple channels at once). But alas, reality has finally settled in and I am forced to design to the constraints of available equipment. If multi-port memory were widely available, I could design the CPU to take a half clock pulse and in that time read from memory, manipulate the data, and output all in one shot. But again, that's not reality. This 8-bit processor must read 8 bits at a time and output 8 bits at a time.

I then had to come up with a realistic flow and update the diagram. At first I was thinking of a tree-like structure that uses flags to follow a path based on the bytes coming in, but I realized that this would get complicated real fast. That structure is better suited to a variable length instruction set. I then got the brilliant idea to load the chunks in first, then perform the operation in one swoop with the loaded bytes stored in temporary registers.
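In Python terms, the "load first, then execute" idea looks roughly like the sketch below. The function name and stage labels are made up, and it assumes the fixed 4-byte instruction described earlier, read one byte per step.

```python
def cycle_stages(memory, line):
    """Yield (stage, buffer) for one fixed-length 4-byte instruction."""
    buffer = []
    for offset in range(4):
        buffer.append(memory[line + offset])  # one 8-bit read per step
        yield f"load chunk {offset + 1}", tuple(buffer)
    # all four bytes now sit in temporary registers, so execute in one swoop
    yield "execute", tuple(buffer)


for stage, buffer in cycle_stages(bytes([5, 42, 0, 3]), 0):
    print(stage, buffer)
```

Each instruction takes four load steps and one execute step here, which is the price of reading only 8 bits at a time.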


Based on that flow, this is the updated diagram:


Where it says "load chunk", this is synonymous with loading a byte.

I'm excited to be at this point. From here I need to design an instruction set, lay out the memory more specifically, and get the finer details worked out for the White Blaze architecture.