USB Protocol Specification

Introduction

This chapter describes how the data flow is organised between a computer equipped with a USB hub and the USB devices connected to it. USB is a serial bus, meaning that there is only one line transmitting signals, so only one bit can be sent at any instant. Therefore, in order to provide error checking and flow control and to synchronize the devices, information is organised into packets and frames. This, in turn, requires a standard header and tail for each packet, delimiting the data between them. Since all types of information are exchanged using packets, packets must be differentiated according to their function and the version of the protocol. Dividing each packet into fields whose meaning is defined by the specification allows easy identification and unambiguous interpretation.

Compatibility with previous standards is also one of the defining features of the protocol. The 2.0 standard increased the maximum transmission speed drastically, yet older devices still work with the new hubs because the protocol provides split and isochronous transmission and microframes eight times shorter than normal frames. Therefore, one protocol can handle all of the speed classes: low, full and high.

The concept of the polled bus

The features of the USB protocol follow mainly from its central design decision: the polled bus. The initiative to transfer data, configure a device, etc. always comes from the root hub controller and is communicated to the clients in packets. This also makes the bus easy and cheap to implement, since the root hub is responsible for negotiation and control and very little processing is left to the USB devices. Data are sent over the bus in little-endian order, i.e. the bits of each byte are written and read from the least significant bit (LSb) to the most significant bit (MSb). Therefore, the packet diagrams presented here should be read from the left-hand side, because this is the order in which the bits are sent. All packets also undergo two transparent encoding/decoding procedures: NRZI and bit stuffing.

Data Encoding/Decoding

The USB employs NRZI (Non Return to Zero Invert) data encoding when transmitting packets. In NRZI encoding, a 1 is represented by no change in level and a 0 is represented by a change in level. A string of zeros causes the NRZI data to toggle each bit time. A string of ones causes long periods with no transitions in the data.

NRZI Scheme
NRZI Data encoding scheme [2]
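
The rule above can be sketched in a few lines of Python. This is a minimal illustration and not taken from the specification: the line is modelled as an abstract level of 0 or 1 rather than the differential states of the real bus, and the function names and the `idle_level` parameter are assumptions made for this example.

```python
def nrzi_encode(bits, idle_level=1):
    """Return one line level per input bit: a 0 toggles the level, a 1 keeps it."""
    level = idle_level  # assumed level of the line before the first bit
    levels = []
    for bit in bits:
        if bit == 0:
            level ^= 1  # a zero is sent as a change in level
        levels.append(level)
    return levels


def nrzi_decode(levels, idle_level=1):
    """Recover the bits: no change in level means 1, a change means 0."""
    previous = idle_level
    bits = []
    for level in levels:
        bits.append(1 if level == previous else 0)
        previous = level
    return bits
```

Encoding a run of zeros with this sketch yields a level that toggles on every bit, while a run of ones yields a constant level, which is the behaviour that bit stuffing has to compensate for.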

Bit Stuffing

In order to ensure adequate signal transitions, bit stuffing is employed by the transmitting device when sending a packet on USB. A zero is inserted after every six consecutive ones in the data stream before the data is NRZI encoded, to force a transition in the NRZI data stream. This gives the receiver logic a data transition at least once every seven bit times to guarantee the data and clock lock. Bit stuffing is enabled beginning with the Sync Pattern. The data one that ends the Sync Pattern is counted as the first one in a sequence. Bit stuffing by the transmitter is always enforced, except during high-speed EOP. If required by the bit stuffing rules, a zero bit will be inserted even if it is the last bit before the End-of-Packet (EOP) signal. The receiver must decode the NRZI data, recognize the stuffed bits, and discard them.
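
A minimal Python sketch of the stuffing rule is given here for illustration. It works on a plain list of bits before NRZI encoding; the function names are hypothetical, and the counter is assumed to start at zero, so a caller that wants to honour the rule about the last bit of the Sync Pattern would pass a bit list that begins with that bit.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out = []
    ones = 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            out.append(0)  # stuffed bit, forces a transition after NRZI encoding
            ones = 0
    return out


def bit_unstuff(bits):
    """Remove the stuffed 0s; raise ValueError if six 1s are not followed by a 0."""
    out = []
    ones = 0
    stream = iter(bits)
    for bit in stream:
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            # The next bit must be the stuffed zero; consume and discard it.
            if next(stream, None) != 0:
                raise ValueError("bit stuffing violation")
            ones = 0
    return out
```

Note that the sketch stuffs a zero even when the sixth one is the last bit of the input, which mirrors the rule for the last bit before the EOP signal.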

Addressing structure

From the protocol's point of view, the structure of a USB system differs somewhat from its physical construction. A device connected to the hub (here called the "host") is referred to as a "function" (defined in the specification as: "Function - a USB device that provides a capability to the host, such as an ISDN connection, a digital microphone, or speakers.") [2]. A function can have several "endpoints" ("Device Endpoint - a uniquely addressable portion of a USB device that is the source or sink of information in a communication flow between the host and device.") [2]. A fuller description can be found at the end of this section in the chapter "Stages of transactions".

Packet fields

Some fields, like SYNC and PID, are common to all packets, whereas others are specific to a particular group of packets, e.g. the frame number in Start-of-Frame packets. Packet bit definitions are displayed in unencoded format. The effects of NRZI coding and bit stuffing have been removed for the sake of clarity. All packets have distinct Start- and End-of-Packet delimiters.

SYNC Field

All packets begin with a synchronization (SYNC) field, which is a coded sequence designed to provide a maximal transition density. It is used by the input circuitry to align incoming data with the local clock. A SYNC from an initial transmitter is defined to be eight bits in length for full/low-speed and 32 bits for high-speed. SYNC serves only as a synchronization mechanism and is not shown in the following packet diagrams. The last two bits in the SYNC field are a marker that is used to identify the end of the SYNC field and the start of the PID.

Packet Identifier Field

A packet identifier (PID) immediately follows the SYNC field of every USB packet. A PID consists of a four-bit packet type field followed by a four-bit check field as shown below. The PID indicates the type of packet and, by inference, the format of the packet and the type of error detection applied to the packet. The four-bit check field of the PID ensures reliable decoding of the PID so that the remainder of the packet is interpreted correctly. The PID check field is generated by performing a one's complement of the packet type field. A PID error exists if the four PID check bits are not complements of their respective packet identifier bits.

PID Field
PID Field [2]

PID Field
PIDs available in the USB 2.0 protocol [2]
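
The construction and checking of the PID byte can be sketched as follows. The four-bit type codes are copied from the PID table of the specification [2]; the function names are hypothetical, and the sketch assumes that the first PID bit on the wire is stored as bit 0 of the byte, in line with the LSb-first bit order used on the bus.

```python
# Four-bit packet type codes as listed in the PID table of the specification [2].
PID_TYPES = {
    "OUT": 0x1, "IN": 0x9, "SOF": 0x5, "SETUP": 0xD,        # token
    "DATA0": 0x3, "DATA1": 0xB, "DATA2": 0x7, "MDATA": 0xF,  # data
    "ACK": 0x2, "NAK": 0xA, "STALL": 0xE, "NYET": 0x6,       # handshake
    "ERR": 0xC, "SPLIT": 0x8, "PING": 0x4,                   # special
}


def make_pid_byte(pid_type):
    """Combine the type field (low nibble) with its one's complement (high nibble)."""
    return (pid_type & 0xF) | ((~pid_type & 0xF) << 4)


def parse_pid_byte(byte):
    """Return the four-bit type field, or raise ValueError on a PID error."""
    pid_type = byte & 0xF
    check = (byte >> 4) & 0xF
    if check != (~pid_type & 0xF):
        raise ValueError("PID check bits are not the complement of the type bits")
    return pid_type
```

For example, `make_pid_byte(PID_TYPES["SETUP"])` gives the byte that a receiver would pass to `parse_pid_byte` before deciding how to interpret the rest of the packet.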

Address Fields

Function endpoints are addressed using two fields: the function address field and the endpoint field. A function needs to fully decode both address and endpoint fields. Address or endpoint aliasing is not permitted, and a mismatch on either field must cause the token to be ignored. Accesses to non-initialized endpoints will also cause the token to be ignored.

Address Field. The function address (ADDR) field specifies the function, via its address, that is either the source or destination of a data packet, depending on the value of the token PID. As shown in the figure below, a total of 128 addresses are specified as ADDR<6:0>. The ADDR field is specified for IN, SETUP, and OUT tokens and the PING and SPLIT special tokens. By definition, each ADDR value defines a single function. Upon reset and power-up, a function's address defaults to a value of zero and must be programmed by the host during the enumeration process. Function address zero is reserved as the default address and may not be assigned to any other use.

Address field
Address field of a packet [2]

Endpoint Field. An additional four-bit endpoint (ENDP) field permits more flexible addressing of functions in which more than one endpoint is required. Except for endpoint address zero, endpoint numbers are function-specific. The endpoint field is defined for IN, SETUP, and OUT tokens and the PING special token. All functions must support a control pipe at endpoint number zero (the Default Control Pipe). Low-speed devices support a maximum of three pipes per function: a control pipe at endpoint number zero plus two additional pipes (either two control pipes, a control pipe and an interrupt endpoint, or two interrupt endpoints). Full-speed and high-speed functions may support up to a maximum of 16 IN and OUT endpoints.

Endpoint Field
Endpoint address field [2]

Frame Number Field

The frame number field is an 11-bit field that is incremented by the host on a per-frame basis. The frame number field rolls over upon reaching its maximum value of 7FFH and is sent only in Start-of-Frame tokens at the start of each (micro)frame. The framing in the 1.1 standard (frames) and in the 2.0 standard (microframes) is shown in the picture below.

Frames and Microframes
Comparison of normal- and microframes [2]
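
The rollover of the 11-bit frame number amounts to simple modular arithmetic, as the short sketch below shows. The constant and function names are made up for this example.

```python
FRAME_NUMBER_BITS = 11
FRAME_NUMBER_MASK = (1 << FRAME_NUMBER_BITS) - 1  # 0x7FF, the maximum value


def next_frame_number(current):
    """Increment the frame number, wrapping from 0x7FF back to 0."""
    return (current + 1) & FRAME_NUMBER_MASK
```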

Data Field

The data field may range from zero to 1,024 bytes and must be an integral number of bytes. The diagram below shows the format for multiple bytes. Data bits within each byte are shifted out LSb first. The data packet size varies with the transfer type, e.g. interrupt, control or isochronous transfer.

Data Field
Data field (multiple bytes) [2]

Cyclic Redundancy Checks

Cyclic redundancy checks (CRCs) are used to protect all non-PID fields in token and data packets. In this context, these fields are considered to be protected fields. The PID is not included in the CRC check of a packet containing a CRC. All CRCs are generated over their respective fields in the transmitter before bit stuffing is performed. Similarly, CRCs are decoded in the receiver after stuffed bits have been removed. Token and data packet CRCs provide 100% coverage for all single- and double-bit errors. A failed CRC is considered to indicate that one or more of the protected fields is corrupted and causes the receiver to ignore those fields and, in most cases, the entire packet. For CRC generation and checking, the shift registers in the generator and checker are seeded with an all-ones pattern. For each data bit sent or received, the high order bit of the current remainder is XORed with the data bit and then the remainder is shifted left one bit and the low-order bit set to zero. If the result of that XOR is one, then the remainder is XORed with the generator polynomial. When the last bit of the checked field is sent, the CRC in the generator is inverted and sent to the checker MSb first. When the last bit of the CRC is received by the checker and no errors have occurred, the remainder will be equal to the polynomial residual. A CRC error exists if the computed checksum remainder at the end of a packet reception does not match the residual. Bit stuffing requirements must be met for the CRC, and this includes the need to insert a zero at the end of a CRC if the preceding six bits were all ones.

Token CRCs. A five-bit CRC field is provided for tokens and covers the ADDR and ENDP fields of IN, SETUP, and OUT tokens or the time stamp field of an SOF token. The PING and SPLIT special tokens also include a five-bit CRC field. The generator polynomial is:

G(X) = X^5 + X^2 + 1
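
The shift-register procedure described above can be sketched for the five-bit token CRC as follows. This is an illustrative bit-serial version, not an optimised one; the function names are hypothetical, bits are passed as a list in the order they appear on the wire, and the residual value 01100 is quoted from the specification [2].

```python
CRC5_POLY = 0b00101      # X^5 + X^2 + 1 with the X^5 term left implicit
CRC5_RESIDUAL = 0b01100  # expected checker remainder, as given in the specification


def field_bits(value, width):
    """Return the bits of a field in wire order (least significant bit first)."""
    return [(value >> i) & 1 for i in range(width)]


def crc5_register(bits):
    """Run the shift register over the bits, starting from the all-ones seed."""
    reg = 0b11111
    for bit in bits:
        feedback = ((reg >> 4) & 1) ^ bit  # high-order bit XOR data bit
        reg = (reg << 1) & 0b11111         # shift left, low-order bit set to zero
        if feedback:
            reg ^= CRC5_POLY
    return reg


def crc5_usb(bits):
    """Return the inverted remainder that the transmitter sends MSb first."""
    return crc5_register(bits) ^ 0b11111


def crc5_check(bits, crc):
    """Feed the protected bits followed by the CRC (MSb first) and compare with the residual."""
    crc_bits = [(crc >> i) & 1 for i in range(4, -1, -1)]
    return crc5_register(bits + crc_bits) == CRC5_RESIDUAL
```

A token's protected bits would be built as `field_bits(addr, 7) + field_bits(endp, 4)` before being passed to `crc5_usb`.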

Data CRCs. The data CRC is a 16-bit polynomial applied over the data field of a data packet. The generating polynomial is:

G(X) = X^16 + X^15 + X^2 + 1
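
The same bit-serial procedure can be sketched for the 16-bit data CRC. As before, this is only an illustration under stated assumptions: the function names are hypothetical, the data field is passed as a Python `bytes` object whose bits are fed LSb first, and the residual 1000000000001101 is quoted from the specification [2].

```python
CRC16_POLY = 0x8005      # X^16 + X^15 + X^2 + 1 with the X^16 term left implicit
CRC16_RESIDUAL = 0x800D  # expected checker remainder, as given in the specification


def bytes_to_bits(data):
    """Expand bytes into wire-order bits (each byte LSb first)."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def crc16_register(bits):
    """Run the shift register over the bits, starting from the all-ones seed."""
    reg = 0xFFFF
    for bit in bits:
        feedback = ((reg >> 15) & 1) ^ bit  # high-order bit XOR data bit
        reg = (reg << 1) & 0xFFFF           # shift left, low-order bit set to zero
        if feedback:
            reg ^= CRC16_POLY
    return reg


def crc16_usb(data):
    """Return the inverted remainder that the transmitter sends MSb first."""
    return crc16_register(bytes_to_bits(data)) ^ 0xFFFF


def crc16_wire_bits(data):
    """Return the 16 CRC bits in the order they are transmitted (MSb first)."""
    crc = crc16_usb(data)
    return [(crc >> i) & 1 for i in range(15, -1, -1)]


def crc16_check(data, crc_bits):
    """Feed data bits followed by the received CRC bits and compare with the residual."""
    return crc16_register(bytes_to_bits(data) + crc_bits) == CRC16_RESIDUAL
```

Because the CRC is sent MSb first while data bytes are sent LSb first, a software implementation that stores the CRC in bytes has to reverse the bit order; the sketch sidesteps this by working on bit lists.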

Types of packets

Token Packets

The figure below shows the field formats for a token packet. A token consists of a PID, specifying either IN, OUT, or SETUP packet type and ADDR and ENDP fields. The PING special token packet also has the same fields as a token packet. For OUT and SETUP transactions, the address and endpoint fields uniquely identify the endpoint that will receive the subsequent Data packet. For IN transactions, these fields uniquely identify which endpoint should transmit a Data packet. For PING transactions, these fields uniquely identify which endpoint will respond with a handshake packet. Only the host can issue token packets. An IN PID defines a Data transaction from a function to the host. OUT and SETUP PIDs define Data transactions from the host to a function. A PING PID defines a handshake transaction from the function to the host. Token and SOF packets are delimited by an EOP after three bytes of packet field data. If a packet decodes as an otherwise valid token or SOF but does not terminate with an EOP after three bytes, it must be considered invalid and ignored by the receiver.

Token packet
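
To show how the fields fit into the three bytes of a token, the following Python sketch assembles a token packet without SYNC, EOP, NRZI encoding or bit stuffing. The function and helper names are hypothetical, the PID codes are taken from the PID table of the specification [2], and the first bit on the wire is assumed to be stored as bit 0 of each byte.

```python
TOKEN_PIDS = {"OUT": 0x1, "IN": 0x9, "SETUP": 0xD, "PING": 0x4}  # codes from [2]


def _lsb_first(value, width):
    """Bits of a field in wire order (least significant bit first)."""
    return [(value >> i) & 1 for i in range(width)]


def _crc5_bits(bits):
    """Five-bit token CRC over the protected bits, returned MSb first."""
    reg = 0b11111
    for bit in bits:
        feedback = ((reg >> 4) & 1) ^ bit
        reg = (reg << 1) & 0b11111
        if feedback:
            reg ^= 0b00101  # X^5 + X^2 + 1
    reg ^= 0b11111          # the transmitter sends the inverted remainder
    return [(reg >> i) & 1 for i in range(4, -1, -1)]


def _pack(bits):
    """Pack wire-order bits into bytes, first bit into bit 0 of each byte."""
    return bytes(
        sum(bit << pos for pos, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


def build_token(kind, addr, endp):
    """Return the three bytes PID, ADDR/ENDP and ENDP/CRC5 of a token packet."""
    if not 0 <= addr <= 127:
        raise ValueError("ADDR is a 7-bit field")
    if not 0 <= endp <= 15:
        raise ValueError("ENDP is a 4-bit field")
    pid = TOKEN_PIDS[kind]
    pid_bits = _lsb_first(pid, 4) + _lsb_first(~pid & 0xF, 4)
    protected = _lsb_first(addr, 7) + _lsb_first(endp, 4)  # covered by the CRC
    return _pack(pid_bits + protected + _crc5_bits(protected))
```

Calling `build_token("SETUP", 0, 0)` produces the token a host would send to the default address during enumeration; the result is always three bytes long, which matches the rule that a token is delimited by an EOP after three bytes.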

Handshake Packets

Handshake packets, as shown below, consist of only a PID. Handshake packets are used to report the status of a data transaction and can return values indicating successful reception of data, command acceptance or rejection, flow control, and halt conditions. Only transaction types that support flow control can return handshakes. Handshakes are always returned in the handshake phase of a transaction and may be returned, instead of data, in the data phase. Handshake packets are delimited by an EOP after one byte of packet field. If a packet decodes as an otherwise valid handshake but does not terminate with an EOP after one byte, it must be considered invalid and ignored by the receiver.

Handshake packet
Handshake packet [2]

Types of handshake packets
ACK indicates that the data packet was received without bit stuff or CRC errors over the data field and that the data PID was received correctly. ACK may be issued either when sequence bits match and the receiver can accept data or when sequence bits mismatch and the sender and receiver must resynchronize to each other (refer to Section 8.6 of the specification [2] for details). An ACK handshake is applicable only in transactions in which data has been transmitted and where a handshake is expected. ACK can be returned by the host for IN transactions and by a function for OUT, SETUP, or PING transactions.
NAK indicates that a function was unable to accept data from the host (OUT) or that a function has no data to transmit to the host (IN). NAK can only be returned by functions in the data phase of IN transactions or the handshake phase of OUT or PING transactions. The host can never issue NAK. NAK is used for flow control purposes to indicate that a function is temporarily unable to transmit or receive data, but will eventually be able to do so without need of host intervention.
STALL is returned by a function in response to an IN token or after the data phase of an OUT or in response to a PING transaction (see Figure 8-30 and Figure 8-38 of the specification [2]). STALL indicates that a function is unable to transmit or receive data, or that a control pipe request is not supported. The state of a function after returning a STALL (for any endpoint except the default endpoint) is undefined. The host is not permitted to return a STALL under any condition.
NYET is a high-speed only handshake that is returned in two circumstances. It is returned by a high-speed endpoint as part of the PING protocol described later in this chapter. NYET may also be returned by a hub in response to a split-transaction when the full-/low-speed transaction has not yet been completed or the hub is otherwise not able to handle the split-transaction. See Chapter 11 of the specification [2] for more details.
ERR is a high-speed only handshake that is returned to allow a high-speed hub to report an error on a full-/low-speed bus. It is only returned by a high-speed hub as part of the split transaction protocol. See Chapter 11 of the specification [2] for more details.

Start-of-Frame Packets

Start-of-Frame (SOF) packets are issued by the host at a nominal rate of once every 1.00 ms ±0.0005 ms for a full-speed bus and 125 µs ±0.0625 µs for a high-speed bus. SOF packets consist of a PID indicating packet type followed by an 11-bit frame number field as illustrated below.

SOF Packet
SOF Packet [2]

The SOF token comprises the token-only transaction that distributes an SOF marker and accompanying frame number at precisely timed intervals corresponding to the start of each frame. All high-speed and full-speed functions, including hubs, receive the SOF packet. The SOF token does not cause any receiving function to generate a return packet; therefore, SOF delivery to any given function cannot be guaranteed.

Data Packets

A data packet consists of a PID, a data field containing zero or more bytes of data, and a CRC as shown below. There are four types of data packets, identified by differing PIDs: DATA0, DATA1, DATA2 and MDATA. Two data packet PIDs (DATA0 and DATA1) are defined to support data toggle synchronization. All four data PIDs are used in data PID sequencing for high bandwidth high-speed isochronous endpoints. Three data PIDs (MDATA, DATA0, DATA1) are used in split transactions. Data must always be sent in integral numbers of bytes. The data CRC is computed over only the data field in the packet and does not include the PID, which has its own check field. The maximum data payload size allowed for low-speed devices is 8 bytes. The maximum data payload size for full-speed devices is 1023 bytes. The maximum data payload size for high-speed devices is 1024 bytes.

Data packet
Data Packet [2]
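
The payload limits quoted above lend themselves to a small validation helper. The sketch below is only an illustration: the dictionary keys and the function name are chosen for this example, and the payload is passed as a Python `bytes` object, which guarantees an integral number of bytes.

```python
# Maximum data payload per bus speed, in bytes, as stated in the text above.
MAX_DATA_PAYLOAD = {"low": 8, "full": 1023, "high": 1024}


def check_data_payload(speed, payload):
    """Return the payload length, or raise ValueError if it exceeds the limit."""
    limit = MAX_DATA_PAYLOAD[speed]
    if len(payload) > limit:
        raise ValueError(
            f"{len(payload)} bytes exceeds the {limit}-byte limit for {speed}-speed"
        )
    return len(payload)
```

A zero-length payload is accepted, since a data field may contain zero bytes.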

PING packets

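The PING packet is a special token packet used only with high-speed devices. The host uses it to ask an endpoint whether it is ready to accept data; the endpoint answers with a handshake packet, which lets the host pace its transactions.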

Stages of transactions

Handshake

Handshake procedures differ between transaction types: a host can send an IN or OUT query to a function. Depending on the state of the device, the response either allows the host to write data to the device's buffer or cancels the transaction. The same applies in the opposite direction. All possibilities are described by the following tables:

Data transactions 1
Data transactions 1 [2]

Data transactions 2
Data transactions 2 [2]

Data transactions 3
Data transactions 3 [2]
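
As a rough illustration of the kind of decision these tables encode, the sketch below chooses a function's response to OUT and IN transactions. It is a simplification based on the handshake descriptions earlier in this chapter and on the response tables of the specification [2] as understood here: it ignores the sequence-bit (data toggle) mismatch case and the high-speed NYET handshake, and the function and parameter names are invented for this example.

```python
def function_response_to_out(data_corrupted, endpoint_halted, can_accept_data):
    """Pick the handshake a function returns after the data phase of an OUT."""
    if data_corrupted:
        return None      # no handshake is sent; the host sees a timeout
    if endpoint_halted:
        return "STALL"   # the function cannot receive data at all
    if not can_accept_data:
        return "NAK"     # temporarily unable to accept data (flow control)
    return "ACK"         # data received without errors and accepted


def function_response_to_in(token_corrupted, endpoint_halted, has_data):
    """Pick what a function returns in the data phase of an IN transaction."""
    if token_corrupted:
        return None      # the token is ignored, nothing is returned
    if endpoint_halted:
        return "STALL"
    if not has_data:
        return "NAK"     # nothing to transmit at the moment
    return "DATA"        # a data packet is sent; the host then answers with ACK
```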

Error Correction

An error can be detected using the CRC fields of a packet (described in the sections on packet types), but some fields have their own error-checking methods. The PID field carries its type bits a second time in inverted form, and an error is also detected if the bit-stuffing rule is violated. The following table summarizes the procedures applied when an error is detected:

Error Correction procedures
Error checking responses [2]

Notice, however, that an isochronous transaction takes place in one direction only, so there is no room for sending NAK packets and retransmitting data. Therefore this kind of transmission is reserved for streaming devices where error control is not important (cameras, etc.).

Split transaction

Split transactions enhance the performance of a USB 2.0 host working with a compliant hub to which a collection of low-, full- and high-speed devices is connected. Using special token packets, the transmission can be split into a high-speed part and a full-/low-speed part while remaining transparent to the older devices. High-speed split transactions for interrupt and isochronous transfers must be allocated by the host from the 80% periodic portion of a microframe. A high-speed split transaction has two parts: a start-split and a complete-split. Split transactions are only defined to be used between the host controller and a hub. No other high-speed or full-/low-speed devices ever use split transactions. The scheme of a split IN transaction is shown in the picture below:

Split transaction
Split IN transaction scheme [2]

Abstract level of the protocol

The transmission of data between the host and an endpoint provided by a function is represented by "pipes". A pipe is a logical abstraction representing the association between an endpoint on a device and software on the host. A pipe has several attributes; for example, a pipe may transfer data as streams (stream pipe) or messages (message pipe). Pipe 0 (the Default Control Pipe at endpoint zero) is reserved and must be provided by every device.

Stream Pipe - a pipe that transfers data as a stream of samples with no defined USB structure.

Message Pipe - a bi-directional pipe that transfers data using a request/data/status paradigm. The data has an imposed structure that allows requests to be reliably identified and communicated.

It can be graphically represented in the following way:

[Protocol layers]
Protocol layers of the USB 2.0 standard [1]

References:

  1. USB.org Developers Resources [www.usb.org/developers]
  2. USB 2.0 Specification available at www.usb.org.org/developers/usb_20.zip