# HD Video Data Transaction using Bus Mastering DMA through PCI express

# Santosh Kumar B<sup>1</sup>, Dr.E Krishna kumar<sup>2</sup>

<sup>1</sup>Electronics and Communication Engineering, East Point College of Engineering and Technology, Bengaluru, India <sup>2</sup>Electronics and Communication Engineering, East Point College of Engineering and Technology, Bengaluru, India

<sup>1</sup>santoshkumarcsenhce@gmail.com, <sup>2</sup>krishnakumar.epcet@gmail.com

Article History Received: 10 January 2021; Revised: 12 February 2021; Accepted: 27 March 2021; Published online: 28 April 2021

Abstract: PCI Express (PCIe) is a high-speed serial bus protocol that is widely used in the industry. This paper describes a bus mastering DMA implementation over PCI Express on a Xilinx FPGA for smooth HD video transactions [1]. The suite contains DMA controller hardware IPs, test benches, a Windows driver, and a user application for testing transactions. Experimental results on the Xilinx ML605 development board show that the bus mastering DMA design reaches transfer speeds of up to 1508 MB/s (write) and 930 MB/s (read). We hope that this project will support further PCI-related development. As devices become smarter and more compact, the speed and reliability of the PCI Express interface can play an important role in that development.

Keywords: PCI Express, Bus Mastering DMA, Xilinx FPGA, Simulation.

#### 1. Introduction

Real-time HD video processing usually produces large amounts of data [2]. This requires improved mechanisms for moving data into the processing units and peripheral devices. As transfer rates increase, peripheral devices can gain from becoming masters on the peripheral bus (in our case PCIe), saving crucial time by reducing protocol overhead and the extra mappings required. The peripheral device thereby reduces the load on the host DMA and at the same time operates more efficiently. This is the principle behind bus mastering DMA [3]. We have developed a communication suite that contains hardware IPs, test benches, a Windows driver, and a user application [4].

#### 2. PCI Express and Bus Mastering DMA

The PCI Express (PCIe) bus is the third generation of the PCI (Peripheral Component Interconnect) bus. It offers higher performance and reliability than the previous generations, PCI and PCI-X, while remaining software compatible with them. PCI and PCI-X are shared parallel buses, whereas PCIe is a serial bus that uses a point-to-point interconnect for communication between two devices. A serial interconnect eliminates the disadvantages of a parallel bus, especially the difficulty of synchronizing multiple data lines in the presence of unequal signal propagation delays (skew). The performance of the PCIe bus is scalable: each interconnect implements a variable number of communication lanes, chosen according to its performance requirements. The serial interconnect uses a packet-based communication protocol. Instead of dedicated signals for functions such as interrupt signaling, error handling, or power management, both data and commands are transmitted in packets. This reduces the pin count of devices and their cost [4].
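The lane scaling described above can be illustrated with a short calculation. This is a minimal sketch, not part of the presented design: the function name `link_bandwidth_mb_s` is hypothetical, and the result is the raw per-direction rate after line encoding only, before any packet overhead.

```python
# Per-lane signalling rate in GT/s and line-encoding efficiency for the
# first three PCIe generations.
GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_mb_s(generation: int, lanes: int) -> float:
    """Theoretical bandwidth per direction in MB/s (1 MB = 10**6 bytes).

    Hypothetical helper for illustration; packet headers, flow control
    and other protocol overhead are not included.
    """
    rate_gt_s, efficiency = GENERATIONS[generation]
    bits_per_second = rate_gt_s * 1e9 * efficiency * lanes
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    for gen in (1, 2, 3):
        for lanes in (1, 4, 8):
            print(f"Gen{gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes):.0f} MB/s")
```

Because the rate is linear in the lane count, doubling the lanes of an interconnect doubles its theoretical bandwidth. Achievable throughput is lower once packet overhead is taken into account.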

The term Bus Master, used in the context of PCI Express, indicates the ability of a PCIe Endpoint device to initiate PCIe transactions that move data to (Memory Writes) and from (Memory Reads) system memory. For large HD video data transfers, DMA implementations result in higher data throughput and offload the CPU from directly transferring the data, which improves overall system performance through lower CPU utilization [5].
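As an illustration of how a bus master breaks one large transfer into individual Memory Write requests, the sketch below splits a buffer into chunks. It is a simplified model rather than the logic of the presented design: the function name and the 128-byte default payload limit are assumptions. The rule that a memory request must not cross a 4 KB address boundary comes from the PCI Express specification.

```python
PAGE_SIZE = 4096  # memory requests must not cross a 4 KB address boundary


def split_into_write_requests(address: int, length: int, max_payload: int = 128):
    """Return a list of (address, size_in_bytes) Memory Write requests.

    Hypothetical helper: max_payload stands in for the negotiated maximum
    payload size, and 128 bytes is only an assumed example value.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    requests = []
    while length > 0:
        # Bytes remaining before the next 4 KB boundary.
        to_boundary = PAGE_SIZE - (address % PAGE_SIZE)
        size = min(max_payload, to_boundary, length)
        requests.append((address, size))
        address += size
        length -= size
    return requests


if __name__ == "__main__":
    for addr, size in split_into_write_requests(0x0FC0, 256):
        print(f"MWr addr=0x{addr:04X} size={size}")
```

A larger payload limit means fewer requests per buffer and therefore less header overhead per byte transferred.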

### 3. Implementation

Fig. 1 shows a typical system architecture that includes a host with system memory, a root complex, and an Integrated Endpoint block for PCI Express. A DMA transaction transfers data either from the buffer of the Integrated Endpoint block for PCI Express into system memory, or from system memory into that buffer.



The main loop of the control state machine runs down the left side of Fig. 2. A number of conditions are polled within the loop until one of them is triggered by an event. The first condition tests whether an interrupt has occurred. The next check is for pending interrupts; if there are any, they must be processed first to avoid a deadlock. A pending interrupt is one of three types: Memory Write, Memory Read, or Read Reply. For a Memory Write, the integrated endpoint block transfers HD video data from its buffer to host memory through the root complex. For a Memory Read, HD video data is transferred from system memory through the root complex, and a Read Reply is initiated as the acknowledgment for the Memory Read. For a Read Reply, the TAG is read to obtain the address, and the HD video data is transferred to the integrated endpoint buffer. The last two checks in the main loop generate new DMA transactions on the bus. They come last so that generating new DMA transactions does not stall DMA operations already in progress. DMA transactions in both directions are initiated by the integrated endpoint, which fills in the source or destination address for the transaction and the number of DWORDs to be transferred.
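The polling order described above can be sketched in software. This is a behavioral model only, not the hardware state machine of the design: all names (`Event`, `poll_once`, the action strings) are hypothetical, and checking a new DMA write before a new DMA read is an assumption, since the text only states that both checks come last.

```python
from collections import deque
from enum import Enum


class Event(Enum):
    MEMORY_WRITE = "memory_write"
    MEMORY_READ = "memory_read"
    READ_REPLY = "read_reply"


# Hypothetical action labels, used only to make the model observable.
ACTIONS = {
    Event.MEMORY_WRITE: "send buffer data to host memory",
    Event.MEMORY_READ: "issue read request to system memory",
    Event.READ_REPLY: "read tag and store data in endpoint buffer",
}


def poll_once(pending: deque, new_write: bool, new_read: bool) -> str:
    """Run one pass of the main loop and return the action taken."""
    # Pending events are serviced first to avoid a deadlock.
    if pending:
        return ACTIONS[pending.popleft()]
    # New DMA transactions are generated only when nothing is pending.
    if new_write:
        return "start new DMA write"
    if new_read:
        return "start new DMA read"
    return "idle"


if __name__ == "__main__":
    queue = deque([Event.READ_REPLY, Event.MEMORY_WRITE])
    for _ in range(3):
        print(poll_once(queue, new_write=True, new_read=False))
```

The model services both queued events before it starts the new write, which mirrors the priority rule in the text.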



To verify the functionality of the bus mastering DMA transactions through PCI Express, simulation was performed using the Xilinx simulator and ChipScope Pro. The simulation results are shown in Fig. 3.



Fig. 3. Simulation results.

After functional verification, the design was programmed onto the Xilinx ML605 FPGA development board, and the memory read/write speed was tested under the Windows operating system. The experiments show that our bus mastering DMA transaction method for PCI Express devices reaches a write speed of 1508 MB/s and a read speed of 930 MB/s, as shown in Fig. 4. The proposed design reasonably satisfies the demands of smooth HD video data transactions.



Fig. 4. GUI for DMA Transactions on Windows XP.
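As a rough plausibility check on the measured rates reported above, the sketch below compares them with the data rate of an uncompressed video stream. The video format is an assumption: the paper does not specify resolution, pixel depth, or frame rate, so 1920x1080 at 60 frames per second with 3 bytes per pixel is used purely as an example. The function names are hypothetical.

```python
MEASURED_WRITE_MB_S = 1508  # write speed reported in this paper
MEASURED_READ_MB_S = 930    # read speed reported in this paper


def video_rate_mb_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Data rate of an uncompressed video stream in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


def streams_supported(link_mb_s: float, stream_mb_s: float) -> int:
    """Number of whole streams that fit into the given transfer rate."""
    return int(link_mb_s // stream_mb_s)


if __name__ == "__main__":
    # Assumed example format: 1080p, 24-bit colour, 60 frames per second.
    rate = video_rate_mb_s(1920, 1080, 3, 60)
    print(f"stream rate: {rate:.1f} MB/s")
    print("streams (write):", streams_supported(MEASURED_WRITE_MB_S, rate))
    print("streams (read):", streams_supported(MEASURED_READ_MB_S, rate))
```

Under this assumed format a single stream needs roughly 373 MB/s, which is below both measured rates.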

## 5. Conclusion

This paper has presented an implementation that advances the study and application of PCI Express, using the Xilinx PCI Express core for smooth HD video data transactions. The proposed approach could lead to measured read and write rates that more closely approximate the theoretical limit of 1000 MByte/s for smooth HD video data transactions.

## References

- 1. A. F. Harvey, 1991, "DMA Fundamentals on Various PC Platforms", National Instruments Application note 011.
- 2. Bittner, Ray, Speedy Bus Mastering PCI Express, 22nd International Conference on Field Programmable Logic and Applications (FPL 2012).
- 3. Santosh Kumar B, "Review of Bus Mastering DMA through PCI express", Journal of Advanced Research in Dynamical and Control Systems JARDCS, 1943023X, Vol. 11, Special Issue-08, Page 2442-2445, 2019.
- 4. Hossein Kavianipour, Steffen Muschter and Christian Bohm, "High Performance FPGA based DMA Interface for PCIe", IEEE 2012 978-1-4673-1084-0/12.
- 5. Shanley, T., Anderson, D., PCI System Architecture, Fourth Edition, MindShare Inc., Addison-Wesley Developer's Press, 1999.
- 6. Jason Lawley, "Bus Master Performance Demonstration Reference Design for the Xilinx Endpoint PCI Express Solutions", Xilinx Application Note 1052 (V3.3) April 3, 2015.
- 7. NorthWest Logic [Online]. Available: http://nwlogic.com/products/pci-express-solution/.
- 8. Xilinx LogiCORE<sup>™</sup> IP Endpoint Block Plus v1.9 for PCI Express User Guide [J]. America, 2008.
- 9. PCI-SIG, "PCI Express Base Specification Revision 3.0", November 10, 2010.