Thursday, May 8, 2014

All About NVMe

NVMe stands for Non-Volatile Memory Express. It is a host controller interface for SSDs attached over PCIe, designed for low-latency response.

The architecture of NVMe on Linux looks like this:

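The NVMe controller uses PCI BAR0 and BAR1 (together forming one 64-bit memory base address) to map its internal control registers.

The NVMe host controller interface (HCI) model is built on three concepts: the Submission Queue (SQ), the Completion Queue (CQ) and the Doorbell registers.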

There are 2 types of queues:
1) Admin Queues
2) I/O Queues

Host software creates the Admin Queues first (Admin Queue structure initialization etc.).

The host then uses Admin commands (submitted to the Admin Submission Queue) to create I/O queue pairs (a Submission Queue and a Completion Queue).

Below is the layout of the NVMe controller registers. The host writes the base addresses of the Admin SQ (offset 0x28) and the Admin CQ (offset 0x30) into these memory-mapped registers.

Important Registers
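Admin Queue Attributes (AQA, offset 0x24): holds the Admin SQ size and the Admin CQ size.

Before submitting any admin command, allocate memory for the ASQ and ACQ according to those sizes and write their base addresses to the ASQ and ACQ registers.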
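
As a rough illustration of the AQA setup, here is a minimal C sketch. The function and constant names (nvme_encode_aqa, NVME_REG_AQA and so on) are made up for this post and are not taken from any driver. It assumes the field layout from the NVMe specification: ASQS in bits 11:0 and ACQS in bits 27:16, both stored as "number of entries minus one", with queue base addresses aligned to 4 KiB.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets from the start of the BAR0/BAR1 mapping. */
enum {
    NVME_REG_AQA = 0x24, /* Admin Queue Attributes */
    NVME_REG_ASQ = 0x28, /* Admin Submission Queue base address (64-bit) */
    NVME_REG_ACQ = 0x30  /* Admin Completion Queue base address (64-bit) */
};

/*
 * Build the AQA value. Both size fields are zero-based: a queue with
 * N entries is encoded as N - 1. ASQS sits in bits 11:0, ACQS in bits 27:16.
 */
static inline uint32_t nvme_encode_aqa(uint32_t sq_entries, uint32_t cq_entries)
{
    assert(sq_entries >= 2 && sq_entries <= 4096);
    assert(cq_entries >= 2 && cq_entries <= 4096);
    return ((cq_entries - 1) << 16) | (sq_entries - 1);
}

/* ASQ/ACQ only hold bits 63:12 of the address, so the low 12 bits must be 0. */
static inline int nvme_admin_base_is_valid(uint64_t phys_addr)
{
    return (phys_addr & 0xFFFu) == 0;
}
```

For example, two 64-entry admin queues give an AQA value of 0x003F003F. A real driver would write this value and the two base addresses through its memory-mapped register accessors while the controller is still disabled.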

The host creates I/O Submission and Completion Queues by placing Admin commands in the newly created Admin Submission Queue.

Some of the Admin commands are:
1) Delete IO SQ
2) Create IO SQ
3) Create IO CQ
4) Delete IO CQ
5) Identify
6) Firmware Activate / Firmware Image Download
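
For reference, the commands listed above map to the following opcodes in the NVMe specification. The enum and helper names below are my own, chosen only for this sketch.

```c
#include <stdint.h>

/* Admin command opcodes (values as defined by the NVMe specification). */
enum nvme_admin_opcode {
    NVME_ADMIN_DELETE_IO_SQ = 0x00,
    NVME_ADMIN_CREATE_IO_SQ = 0x01,
    NVME_ADMIN_DELETE_IO_CQ = 0x04,
    NVME_ADMIN_CREATE_IO_CQ = 0x05,
    NVME_ADMIN_IDENTIFY     = 0x06,
    NVME_ADMIN_FW_ACTIVATE  = 0x10,
    NVME_ADMIN_FW_DOWNLOAD  = 0x11
};

/* Return a printable name for the opcodes above, useful when tracing. */
static inline const char *nvme_admin_opcode_name(uint8_t opcode)
{
    switch (opcode) {
    case NVME_ADMIN_DELETE_IO_SQ: return "Delete I/O SQ";
    case NVME_ADMIN_CREATE_IO_SQ: return "Create I/O SQ";
    case NVME_ADMIN_DELETE_IO_CQ: return "Delete I/O CQ";
    case NVME_ADMIN_CREATE_IO_CQ: return "Create I/O CQ";
    case NVME_ADMIN_IDENTIFY:     return "Identify";
    case NVME_ADMIN_FW_ACTIVATE:  return "Firmware Activate";
    case NVME_ADMIN_FW_DOWNLOAD:  return "Firmware Image Download";
    default:                      return "other";
    }
}
```

Note the ordering this implies during bring-up: a Completion Queue has to be created before the Submission Queue that posts to it, and on teardown the Submission Queue is deleted first.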

Multiple I/O Submission Queues are possible, which allows:
1) Load distribution across CPU cores
2) One CQ serving multiple SQs
3) Avoiding locking overhead (each core can own its own queue)
4) Queue priorities
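
The CQ sharing and the queue priority both show up as fields of the Create I/O Submission Queue command. The sketch below packs command dwords 10 and 11 under the layout I understand from the specification: CDW10 holds the queue ID (bits 15:0) and the zero-based queue size (bits 31:16); CDW11 holds the physically-contiguous flag (bit 0), the priority (bits 2:1) and the ID of the Completion Queue to use (bits 31:16). All names are illustrative, not from a real driver.

```c
#include <assert.h>
#include <stdint.h>

/* QPRIO encoding; only meaningful with weighted round robin arbitration. */
enum nvme_sq_priority {
    NVME_SQ_PRIO_URGENT = 0,
    NVME_SQ_PRIO_HIGH   = 1,
    NVME_SQ_PRIO_MEDIUM = 2,
    NVME_SQ_PRIO_LOW    = 3
};

/* CDW10 for Create I/O SQ/CQ: zero-based size in 31:16, queue ID in 15:0. */
static inline uint32_t nvme_create_q_cdw10(uint16_t qid, uint32_t entries)
{
    assert(qid != 0); /* queue ID 0 is reserved for the admin queues */
    assert(entries >= 2 && entries <= 65536);
    return ((entries - 1) << 16) | qid;
}

/* CDW11 for Create I/O SQ: CQ ID in 31:16, priority in 2:1, PC flag in bit 0. */
static inline uint32_t nvme_create_sq_cdw11(uint16_t cqid,
                                            enum nvme_sq_priority prio,
                                            int contiguous)
{
    return ((uint32_t)cqid << 16) | ((uint32_t)prio << 1) |
           (contiguous ? 1u : 0u);
}
```

Two Submission Queues created with the same cqid value post their completions to the same Completion Queue, which is how one CQ ends up serving multiple SQs.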

Once a Submission Queue is created, the host can submit I/O commands.

Supported I/O commands include:
1) Flush
2) Write
3) Read

To submit an I/O command, the host places the command (which carries the address of the data buffer) into the Submission Queue and then writes the new tail value to that queue's SQ Tail Doorbell register.

NVMe doorbells follow a producer/consumer model.
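
To find the doorbell for a given queue, the register offset can be computed from the queue ID. This is a small sketch with names of my own choosing, assuming the layout from the specification: doorbells start at offset 0x1000, each queue pair gets an SQ tail doorbell followed by a CQ head doorbell, and the spacing between doorbells is 4 << CAP.DSTRD bytes.

```c
#include <stdint.h>

#define NVME_DOORBELL_BASE 0x1000u

/* Offset of the Submission Queue y Tail Doorbell; dstrd is CAP.DSTRD. */
static inline uint32_t nvme_sq_tail_db_offset(uint16_t qid, uint8_t dstrd)
{
    return NVME_DOORBELL_BASE + (2u * qid) * (4u << dstrd);
}

/* Offset of the Completion Queue y Head Doorbell; dstrd is CAP.DSTRD. */
static inline uint32_t nvme_cq_head_db_offset(uint16_t qid, uint8_t dstrd)
{
    return NVME_DOORBELL_BASE + (2u * qid + 1u) * (4u << dstrd);
}
```

With a stride field of 0, the admin queue doorbells sit at 0x1000 and 0x1004, and those of I/O queue 1 at 0x1008 and 0x100C.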

Host acts as
1) Producer of commands -> updates SQ Tail Pointer
2) Consumer of completions -> updates CQ Head Pointer

Controller acts as
1) Consumer of commands -> updates SQ Head Pointer
2) Producer of completions -> updates CQ Tail Pointer

Let's consider a scenario.

Initial State

SQ1 = { empty }
CQ1 = { empty }
SQ1TailDB = {0}
SQ1HeadDB = {0}
CQ1TailDB = {0}
CQ1HeadDB = {0}

Host adds 3 commands

SQ1 = {CMD0, CMD1, CMD2, .....}
SQ1TailDB = {3}

Controller fetches 3 commands
SQ1HeadDB = {3}
SQ1 = {empty}  // all entries consumed

Controller posts completions (let's say it posts 2 completions)
CQ1 = {CMD0, CMD1, empty, .....}
CQ1TailDB = {2}

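Host is interrupted once the controller has posted the completions (CQ1TailDB updated)
Host reads CQ1 and updates CQ1HeadDB.

CQ1HeadDB = {2}
CQ1 = {empty}

Note: in hardware, only the SQ tail and the CQ head are doorbell registers written by the host. The SQ head and CQ tail are tracked by the controller; SQ1HeadDB and CQ1TailDB are used here just to name those pointers.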
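
The scenario above can be replayed with a tiny ring-buffer model. This is only a simulation of the head/tail arithmetic, not driver code: the queue depth of 8 and all names are assumptions for the example, and interrupts and the completion entry contents are left out.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_DEPTH 8u /* assumed depth for this example */

/* Head and tail indexes of one circular queue. */
struct ring {
    uint32_t head; /* next entry the consumer will read */
    uint32_t tail; /* next free slot for the producer */
};

/* Number of entries currently waiting in the queue. */
static inline uint32_t ring_count(const struct ring *r)
{
    return (r->tail + QUEUE_DEPTH - r->head) % QUEUE_DEPTH;
}

/* Producer adds n entries. One slot always stays unused so that
 * head == tail can only mean "empty". */
static inline void ring_produce(struct ring *r, uint32_t n)
{
    assert(n <= QUEUE_DEPTH - 1 - ring_count(r));
    r->tail = (r->tail + n) % QUEUE_DEPTH;
}

/* Consumer removes n entries. */
static inline void ring_consume(struct ring *r, uint32_t n)
{
    assert(n <= ring_count(r));
    r->head = (r->head + n) % QUEUE_DEPTH;
}

/* Replay the scenario; returns how many commands still await completion. */
static inline uint32_t replay_scenario(struct ring *sq, struct ring *cq,
                                       uint32_t submitted, uint32_t completed)
{
    assert(completed <= submitted);
    ring_produce(sq, submitted); /* host adds commands, rings SQ tail doorbell */
    ring_consume(sq, submitted); /* controller fetches, SQ head advances */
    ring_produce(cq, completed); /* controller posts completions, CQ tail advances */
    ring_consume(cq, completed); /* host reads them, rings CQ head doorbell */
    return submitted - completed;
}
```

Starting from two empty rings and calling replay_scenario(&sq, &cq, 3, 2) leaves the SQ with head = tail = 3, the CQ with head = tail = 2, and one command still outstanding.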

Each command submitted to an SQ is 64 bytes in size. Command DW0, NSID, Metadata Pointer, PRP Entry 1 and PRP Entry 2 have common definitions for all Admin commands and NVM commands.

The Command DW0 format is defined in the figure below.
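
In C terms, the 64-byte command and Command DW0 can be sketched as follows. The struct and function names are mine; the layout assumes what the specification describes: in DW0 the opcode sits in bits 7:0, the fused-operation field in bits 9:8 and the command identifier in bits 31:16, and the I/O opcodes are Flush 0x00, Write 0x01 and Read 0x02.

```c
#include <stddef.h>
#include <stdint.h>

/* NVM (I/O) command opcodes. */
enum nvme_io_opcode {
    NVME_CMD_FLUSH = 0x00,
    NVME_CMD_WRITE = 0x01,
    NVME_CMD_READ  = 0x02
};

/* One submission queue entry: 16 dwords, 64 bytes in total. */
struct nvme_command {
    uint32_t cdw0;  /* opcode, fused operation, command identifier */
    uint32_t nsid;  /* namespace identifier */
    uint32_t cdw2;  /* reserved */
    uint32_t cdw3;  /* reserved */
    uint64_t mptr;  /* metadata pointer */
    uint64_t prp1;  /* PRP entry 1: first data buffer page */
    uint64_t prp2;  /* PRP entry 2: second page or PRP list pointer */
    uint32_t cdw10; /* cdw10..cdw15 are command specific */
    uint32_t cdw11;
    uint32_t cdw12;
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};

_Static_assert(sizeof(struct nvme_command) == 64, "SQ entry must be 64 bytes");
_Static_assert(offsetof(struct nvme_command, prp1) == 24, "PRP1 at byte 24");

/* Pack Command DW0 from the opcode, the 2-bit fuse field and the command ID. */
static inline uint32_t nvme_make_cdw0(uint8_t opcode, uint8_t fuse, uint16_t cid)
{
    return ((uint32_t)cid << 16) | ((uint32_t)(fuse & 0x3u) << 8) | opcode;
}
```

For instance, a non-fused Read with command identifier 7 would carry 0x00070002 in DW0. The command identifier is what the host later uses to match a completion entry back to the command it submitted.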