Grainuum Architecture
This is the architectural overview of GrainuumUSB.
There are lots and lots of books and sites available that describe the top
part of this document, but not too many go into detail on the lower
part.
In this chart, the USB-PHY-LL is the only part written in assembler, and
could be replaced with something else if we wanted to port this project
to a different CPU or architecture. Everything else is C.
We'll start below the bottom of the stack there, with USB electrical work.
Then we'll move up to the USB low-level PHY, describe some of the USB
PHY, mention the State, and generally gloss over the User Code.
USB Electrical
Usually four wires (+5V, D+, D-, GND)
LS vs FS vs HS vs SS
LS has a pull-up resistor on D-
Data lines on LS are 3.3V (oops!)
“A USB transceiver is required to withstand a continuous short circuit of
D+ and/or D- to VBUS, GND, other data line, or the cable shield at the connector,
for a minimum of 24 hours without degradation.” (7.1.1)
Palawan USB Schematic
Palawan USB Traces
Detecting USB
If we fab out this board and plug it into a USB port without
loading any firmware on it, the pins float and that pullup
resistor takes effect. This configuration is enough to
convince the host that we're a real device, and it tries to
talk to us.
You can see the host detected this board as a USB Low Speed device.
It then gives Error -32 messages. If we look that up, we see
that it's EPIPE, which means a broken pipe.
Indeed, the host is trying to access the Endpoint 0 pipe, which
doesn't exist because there is no firmware yet. This error means
that the hardware is looking good, and it's time to start on the
firmware.
USB Signaling
Now that we know what's doing the signalling, we need to know
what signals are being sent across the wires.
States
Last part of a USB packet
This is the end part of a USB packet. Conveniently, it shows
us the three electrical states we'll encounter.
K-state
K-state is one of the two differential states, where D+ and D- are
opposites of one another. On a Low Speed link, K is D+ high and D- low.
J-state
J-state is the opposite of K-state. When programming the
USB stack, it's very easy to mix up D+ and D- because K-state
and J-state are complete opposites, and often it doesn't
matter which one is which.
SE0-state
SE0 is where both lines are driven low. It indicates the
end of a packet. It is the only time we need to consult
both the D+ and D- lines.
SE1-state?
There is no SE1 state. If SE1 occurs, it's due to a bug,
or more likely something shorting one of the data lines.
There's no scope trace because this state is never seen.
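The four states above can be summed up in a few lines of C. This is only a sketch, assuming Low Speed polarity (J is D- high, K is D+ high) and that D+ and D- have already been sampled as 0 or 1. The enum and function names are made up for illustration and are not part of Grainuum.

```c
#include <assert.h>

/* Hypothetical names for illustration; not part of the Grainuum API. */
enum line_state { LINE_SE0, LINE_J, LINE_K, LINE_SE1 };

/* Classify one sample of the bus, assuming Low Speed polarity:
 * J = D- high, D+ low (idle); K = D+ high, D- low. */
static enum line_state classify_ls(int dp, int dm)
{
    assert((dp == 0 || dp == 1) && (dm == 0 || dm == 1));
    if (!dp && !dm)
        return LINE_SE0;   /* both low: end of packet */
    if (dp && dm)
        return LINE_SE1;   /* both high: a bug or a short */
    return dp ? LINE_K : LINE_J;
}
```

Note that telling J from K only needs one of the two wires; both are needed only to spot SE0 (or the illegal SE1).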
Decoding States
Transition    Value
J → J         1
J → K         0
K → J         0
K → K         1
To decode the bit value, compare the state with the previous state.
This is much like an XNOR gate. We don't have an XNOR opcode,
so when the time comes to decode we'll need to NOT and then XOR.
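The NOT-then-XOR step might look like this in C. It is a sketch rather than the Grainuum decoder (which is hand-written assembler): it assumes one sample per bit taken from a single data line, and it follows USB's NRZI rule that an unchanged state decodes to 1 and a transition decodes to 0. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NRZI samples of one data line into bits.
 * samples[i] is 0 or 1; prev is the line level before samples[0].
 * Unchanged level -> 1, changed level -> 0 (XNOR = NOT of XOR). */
static void nrzi_decode(const uint8_t *samples, size_t n,
                        uint8_t prev, uint8_t *bits)
{
    assert(samples != NULL && bits != NULL);
    for (size_t i = 0; i < n; i++) {
        uint8_t cur = samples[i] & 1u;
        bits[i] = (uint8_t)(~(cur ^ prev) & 1u);   /* XOR, then NOT */
        prev = cur;
    }
}
```

Feeding it the sync pattern KJKJKJKK, starting from an idle J, gives seven 0 bits followed by a 1.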
8-bit preamble / sync at start of packet
Ages to respond to an interrupt
Data section
SE0 signals EOP
Bit stuffing
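Bit stuffing, the last item above, means the sender inserts a 0 after six consecutive 1 bits so the line is guaranteed to toggle, and the receiver has to throw that extra bit away. Here is a minimal sketch of the receiving side, assuming the bits have already been NRZI-decoded into one bit per array element. The function name and return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remove stuffed bits from a decoded bit stream.
 * in[] holds one bit (0 or 1) per element. Returns the number of bits
 * written to out[], or -1 if a seventh consecutive 1 is seen. */
static int bit_unstuff(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    int ones = 0;
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ones == 6) {
            if (in[i])
                return -1;       /* stuffed bit must be 0 */
            ones = 0;            /* drop the stuffed bit */
            continue;
        }
        out[count++] = in[i];
        ones = in[i] ? ones + 1 : 0;
    }
    return count;
}
```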
Phy: What's important?
Important:
Timing and sync
Bit stuffing
Errors: keepalive, framing, overflow
Not Important:
6 bits to read sync
Only check one wire
Miss up to 3 packets
What about signal integrity?
Long, uneven traces are fine
Integrity still really good [eye diagram]
Faster is not always better
I sent this photo to Bunnie, because it was failing the eye diagram
test. Turns out the outputs defaulted to "high slew rate", which
caused the signal to overshoot and head into the red parts. You can
see that on the top and bottom of this photo I took with my phone.
Changing it to "slow slew rate" got rid of the overshoot.
Tip: anything under about 15 MHz is "slow", and you should use a
slow slew rate for your output.
If you look very very closely at the output, you'll see that the
bits don't even come out at exactly the same time! This turns
out to be okay, because our skewed bits are still within spec (7.1.2.1).
Using a 32.768 kHz Crystal
32768 × 1464 = 47.972352 MHz
Close enough!
For one project, we were sourcing the clock from a 32.768 kHz watch crystal,
because it turns out affordable 32 kHz crystals are hard to find, whereas
everyone sells 32.768 kHz watch crystals.
This chip had a fixed multiplier available of 1464, which if you do the math
yields 47.972352 MHz.
Turns out that's close enough for LS to work.
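To put a number on "close enough", the error can be worked out in parts per million. The helper below is a hypothetical sketch, not Grainuum code. For comparison, the USB 2.0 specification allows a Low Speed data rate tolerance of ±1.5%, which is 15000 ppm.

```c
#include <assert.h>
#include <stdint.h>

/* Deviation of an actual clock from its target, in parts per million
 * (rounded toward zero). Negative means the clock runs slow. */
static int32_t clock_error_ppm(int64_t actual_hz, int64_t target_hz)
{
    assert(target_hz > 0);
    return (int32_t)(((actual_hz - target_hz) * 1000000) / target_hz);
}
```

With the crystal above, clock_error_ppm(32768LL * 1464, 48000000) comes out at -576 ppm, far inside that limit.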
Development System Setup
I'm going to take a break for a moment to describe the hardware
that I use to develop this project.
This is my development environment. It is a Raspberry Pi
running OpenOCD. There are three wires running to the device
being debugged: SWD, Clock, and a reset line. The reset line
is optional, but it makes it easier to program devices that
are brand-new and unprogrammed.
There aren't even any passive components; it's just a straight
connection between the two devices.
This is the cheapest solution we've found for doing development on
small embedded devices. Even better, if we want someone else to
help with development, the whole setup is cheap enough to mail to
them along with the hardware to develop on.
The Raspberry Pi contains enough horsepower to run a C compiler
in addition to OpenOCD, meaning all the person has to do to get
up to speed on developing for an embedded device is to plug in
a keyboard, mouse, and TV, and they're done. No drivers to set
up, no proprietary JTAG boxes, no download links to compilers.
If it works for me, it'll work for them.
Other Hardware
Low-Level API
int usbPhyReadI(struct GrainuumUSB *usb,
                uint8_t samples[11]);
void usbPhyWriteI(struct GrainuumUSB *usb,
                  const uint8_t buffer[11],
                  uint32_t count);
grainuum-phy-ll.s
void grainuumCaptureI(struct GrainuumUSB *usb,
                      uint8_t packet[12]);
grainuum-phy.c
This is the low-level API. It consists of two functions, one that writes
values out to the pins as a USB packet, and one that reads a USB packet
from pins.
These functions take care of packing and unpacking the K and J states, and
must be called with interrupts disabled because they're very timing-sensitive.
They could be replaced on other chips to get a portable solution.
Run from RAM to get cycle-accuracy
.section .ramtext
grainuum-phy-ll.s
.data :
{
    . = ALIGN(4);
    *(.data .data.*)
    . = ALIGN(4);
    *(.ramtext)
} > ram AT > flash
KL02Z32-app.ld
Use registers for storing data
mov wpaddr, wdpclrreg // 1 cycle
ldr wnaddr, [sp, #0] // 2 cycles
ldr rval, [rreg] // 1 cycle (Single-cycle port)
Hook the writer to the reader for testing
FPGA designers have test benches
Being able to test corner cases is handy
Bit stuffing, word boundaries
USB signal generator
Low-Level C API
void grainuumConnect(struct GrainuumUSB *usb);
void grainuumDisconnect(struct GrainuumUSB *usb);
void grainuumReceivePacket(struct GrainuumUSB *usb); // (weak)
void grainuumCaptureI(struct GrainuumUSB *usb,
                      uint8_t packet[12]);
grainuum-phy.c
USB Packets
PID
Handshake packet (NAK, ACK)
Token packets (IN, OUT, SETUP)
Data packets (DATA0, DATA1)
LS has a maximum of 8-byte packets
Some sort of CRC (CRC-16 or CRC-5)
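The PID byte checks itself: the low nibble is the packet ID and the high nibble is its bitwise complement. A sketch of checking that and sorting a PID into its group follows. The byte values are the ones defined by the USB 2.0 specification, written as a byte; the enum and function names are hypothetical and not taken from Grainuum.

```c
#include <assert.h>
#include <stdint.h>

/* PID bytes: low nibble is the PID, high nibble is its complement. */
enum usb_pid {
    PID_OUT = 0xE1, PID_IN = 0x69, PID_SETUP = 0x2D,
    PID_DATA0 = 0xC3, PID_DATA1 = 0x4B,
    PID_ACK = 0xD2, PID_NAK = 0x5A
};

enum pid_group {
    PID_GROUP_SPECIAL, PID_GROUP_TOKEN, PID_GROUP_HANDSHAKE, PID_GROUP_DATA
};

/* A PID is well-formed when the two nibbles are complements. */
static int pid_is_valid(uint8_t pid)
{
    return ((pid ^ (pid >> 4)) & 0x0Fu) == 0x0Fu;
}

/* The two lowest bits of the PID select the packet group. */
static enum pid_group pid_group_of(uint8_t pid)
{
    assert(pid_is_valid(pid));
    switch (pid & 0x03u) {
    case 0x01u: return PID_GROUP_TOKEN;
    case 0x02u: return PID_GROUP_HANDSHAKE;
    case 0x03u: return PID_GROUP_DATA;
    default:    return PID_GROUP_SPECIAL;
    }
}
```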
Reading: Responding
6.5 bit-times to respond (7.1.18.1)
Examine first byte of data:
IN? Send data, or send NAK if we have no data
OUT, SETUP, DATA0 or DATA1? Send ACK right away
Otherwise, we can worry about it later
Tip: Clever array [mis]-alignment
Use GRAINUUM_BUFFERs
PID is only useful in determining response
Actual data begins in byte 1
Actual data ends up on word boundary
CRC-16 ends up on word boundary
OUT/IN packet ends up on word boundary
GRAINUUM_BUFFER()
Frames: What's important?
Important:
Quick response
Frame sequence
Outgoing CRC-16
DATA0 vs DATA1
Not Important:
NAK allows for deferring
No need to generate CRC-5
We can ignore CRC-16
Can ignore "address"
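Since the outgoing CRC-16 is one of the things we do have to get right, here is a bit-at-a-time sketch of the USB data CRC. It uses the standard CRC-16/USB parameters (polynomial 0x8005 processed LSB-first as 0xA001, initial value 0xFFFF, result inverted). It is not the Grainuum implementation, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16 over the data bytes of a DATA0/DATA1 packet (PID excluded).
 * The result is sent on the wire low byte first. */
static uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return (uint16_t)~crc;
}
```

A zero-length data packet gives a CRC of 0x0000, which is why an empty DATA packet is just the PID followed by two zero bytes.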
USB State Machine
USB 2.0 spec figure 8-32, © 2000 Compaq Computer Corporation, Hewlett-Packard Company, Intel Corporation, Lucent Technologies Inc., Microsoft Corporation, NEC Corporation, Koninklijke Philips Electronics N.V.
USB State Machine API
int grainuumSendData(struct GrainuumUSB *usb,
                     int epnum,
                     const void *data,
                     int size);
void grainuumProcess(struct GrainuumUSB *usb,
                     const uint8_t packet[12]);
grainuum-state.c
User Code
struct GrainuumConfig {
    get_usb_descriptor_t getDescriptor;
    usb_set_config_num_t setConfigNum;
    usb_get_buffer_t getReceiveBuffer;
    usb_data_in_t receiveData;
    usb_data_out_start_t sendDataStarted;
    usb_data_out_finish_t sendDataFinished;
    void *data;
};
Important
Not Important
getDescriptor
getReceiveBuffer
receiveData
setConfigNum
sendDataStarted
sendDataFinished
*data
Descriptors, etc.
LS is limited to Interrupt and Control endpoint types
Only two additional EPs beyond EP0 (8.3.2.2)
Setup process
Annoyance: Breaking into the debugger sometimes kills USB
Tip: Global variables are handy
int loop_count;
void loop(void) {
    loop_count++;
}
(gdb) load
'minimal.elf' has changed; re-reading symbols.
Loading section .text, size 0x1b0 lma 0x5900
Loading section .data, size 0x3a8 lma 0x5ab0
Start address 0x5938, load size 1368
Transfer rate: 5 KB/sec, 684 bytes/write.
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0000141c in ?? ()
(gdb) p loop_count
$1 = 91
(gdb)
Tip: Easier debugging
(gdb) b chSysLock
Breakpoint 1 at 0x3c (49 locations)
(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y <MULTIPLE>
1.1 y 0x0000003c <_vectors+60>
1.2 y 0x00002030 in radioSend at ../os/ext/CMSIS/include/core_cmFunc.h:342
1.3 y 0x000021ca in usb_worker_thread at ../os/ext/CMSIS/include/core_cmFunc.h:342
1.4 y 0x00003018 in lockSystem at ../os/ext/CMSIS/include/core_cmFunc.h:342
1.5 y 0x0000309a in suspendThread at ../os/ext/CMSIS/include/core_cmFunc.h:342
1.6 y 0x000030da in suspendThreadTimeout at ../os/ext/CMSIS/include/core_cmFunc.h:342
...
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.
Cannot insert hardware breakpoint 1.
...
Cannot insert hardware breakpoint 1.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
(gdb)
Borrow from JavaScript
#define debugger asm("bkpt #0")
uint8_t USB_Available(uint8_t ep) {
    if (rx_buffer_tail == rx_buffer_head) {
        debugger;
        return 0;
    }
    if (ep != rx_buffer_eps[rx_buffer_tail]) {
        debugger;
        return 0;
    }
    return 1;
}