The Cadence Tensilica Vision DSP

Vision DSPs for Imaging and Vision


IP for next generation image/video processing

Today’s applications processors are not equipped to handle the complex embedded imaging and vision digital signal processing functions in the mobile handset, drone, automotive, surveillance, and augmented reality (AR) / virtual reality (VR) markets. The Cadence® Tensilica® Vision digital signal processor (DSP) family offers a much-needed breakthrough in energy efficiency and performance, enabling applications never before possible in a programmable device.

The Tensilica Vision DSP family includes the following Vision products.

  • The Vision Q6 DSP is the latest DSP for embedded vision and AI built on a new, faster processor architecture. The fifth-generation Vision Q6 DSP offers 1.5X greater performance than its predecessor, the Vision P6 DSP, and 1.25X better power efficiency at the Vision P6 DSP’s peak performance.
  • The Vision P6 DSP, introduced in 2016, set a new standard in AI performance for a general-purpose embedded vision DSP by offering 4X the peak performance compared to the Vision P5 DSP.
  • The Vision P5 DSP, introduced in 2015, has been highly successful in the mobile market. It offers from 4X to 100X the performance of traditional mobile CPU+GPU systems at a fraction of the power/energy.

The Tensilica Vision DSP family covers a wide range of markets. It offers general-purpose imaging and vision products that were designed for the complex algorithms in imaging and computer vision, including innovative multi-frame noise reduction, video stabilization, high dynamic range (HDR) processing, object and face recognition and tracking, low-light image enhancement, digital zoom, and gesture recognition, plus many more. The Tensilica Vision DSP family also offers outstanding performance while running AI.

Offload the Host CPU for Intensive Vision and AI Apps

The Tensilica Vision DSP family offloads the host CPU for lower energy consumption running intensive imaging and vision apps. Multi-core host CPUs can’t handle these power-hungry, bandwidth-demanding applications, hardwired accelerators are restricted to a fixed set of functions, and GPUs offer pipelines that are not required or not efficient in image- and vision-processing applications. Now, the Tensilica Vision DSP family provides an imaging and vision specific programmable solution that is an ideal complement to the CPU/GPU. Imaging and vision algorithms can run on a DSP that’s specifically optimized for the imaging and vision functions required.

Vision DSP Family

Programmable and Customizable

The Tensilica Vision DSPs are synthesizable processors, with the configurability and extensibility that users have come to value from Cadence. The instruction set, memory system, and data types have all been optimized for high-throughput 8-, 16-, and 32-bit pixel processing for all Vision P5, P6, and Q6 DSPs. The Tensilica Vision DSP family is available as licensable, synthesizable IP with rich libraries and advanced software tools, allowing you to write your code in C/C++ with no assembly code required. The Vision DSP family was also architected for use in solutions that combine multiple Vision DSPs when higher performance is required.

Processor Optimization

Because the Vision DSP family is built on our proven Tensilica Optimization Platform, further optimizations can be made to target your specific application. Please see the Xtensa® section for all of the options available. All processors come with a complete hardware design and matching software tools, including a mature, world-class auto-vectorizing compiler, a cycle-accurate SystemC®-compatible instruction set simulator (ISS), and a full industry-standard GNU toolchain.

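Vision DSP Family

Feature | Vision P5 DSP | Vision P6 DSP | Vision Q6 DSP
Use Case | Vision | Vision and AI up to 256GMAC/sec | Vision and AI up to 384GMAC/sec
MACs, 8 x 8 (higher MAC = higher compute) | 64 | 256 | 256
MACs, 8 x 16 | 64 | 128 | 128
MACs, 16 x 16 | 32 | 64 | 64
Vector Floating Point Unit, 16b half precision | No | 32-way SIMD (optional) | 32-way SIMD (optional)
Vector Floating Point Unit, 32b single precision | 16-way SIMD (optional) | 16-way SIMD (optional) | 16-way SIMD (optional)
Max SIMD Width | 64-way 8-bit | 64-way 8-bit | 64-way 8-bit
SuperGather | Yes | Yes | Yes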

Data Rearrangement (efficient switch between vectorization schemes): Limited
Coefficient Decompression (saves memory bandwidth): No / Yes
AXI interface (more AXIs = less sharing and higher memory bandwidth): 2 AXIs, 128-bit bus bandwidth for instruction and data / 5 AXIs, 128-bit bus bandwidth for instruction and data

Vision Q6, P6, and P5 DSPs for Vision and AI

Vision Q6 DSP Features and Benefits

A deeper, 13-stage processor pipeline and system architecture designed for use with large local memories enable the Vision Q6 DSP to achieve 1.5GHz peak frequency and 1GHz typical frequency at 16nm, in the same floorplan area as the Vision P6 DSP. As a result, designers using the Vision Q6 DSP can develop high-performance products that meet increasing vision and AI demands and power-efficiency needs.

  • An enhanced DSP instruction set results in up to 20 percent fewer cycles than the Vision P6 DSP for embedded vision applications/kernels such as Optical Flow, Transpose, and warpAffine, and for commonly used filters such as Median and Sobel
  • 2X system data bandwidth with separate master/slave AXI interfaces for data/instructions and 2-channel DMA alleviates memory bandwidth challenges in vision and AI applications, and also reduces latency and overhead associated with task switching and DMA setup
  • Backwards compatibility with the Vision P6 DSP, so customers can preserve their software investment for an easy migration 
  • Optional vector floating-point unit (VFPU) also supports half precision (FP16)
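
The GMAC/sec figures quoted for the family are consistent with a simple product of MAC count and clock frequency. The sketch below checks that arithmetic. It assumes 256 8x8 MACs per cycle and the 1.5GHz peak clock stated above for the Vision Q6 DSP; the 1.0GHz clock used for the Vision P6 DSP is an assumption inferred from its 256GMAC/sec figure, not a number given on this page, and the function name is illustrative only.

```python
def peak_gmacs(macs_per_cycle: int, clock_ghz: float) -> float:
    """Peak multiply-accumulates per second, in GMAC/sec."""
    return macs_per_cycle * clock_ghz


# Vision Q6 DSP: 256 8x8 MACs per cycle at the 1.5GHz peak frequency.
q6 = peak_gmacs(256, 1.5)

# Vision P6 DSP: 256 8x8 MACs per cycle; the 1.0GHz clock is an assumption.
p6 = peak_gmacs(256, 1.0)

print(q6)       # GMAC/sec for the Q6 case
print(p6)       # GMAC/sec for the assumed P6 case
print(q6 / p6)  # ratio between the two
```

Under these assumptions the ratio matches the 1.5X performance gain quoted for the Vision Q6 DSP over the Vision P6 DSP, which suggests that gain comes from clock frequency rather than from additional MAC units.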

Vision Q6 DSP Block Diagram

Vision P6 DSP Features and Benefits

With new instructions, increased math throughput, and other enhancements, the Vision P6 DSP sets a new standard in imaging and computer vision benchmarks, increasing performance by up to 4X compared to the highly successful Vision P5 DSP. For AI applications, the Vision P6 DSP boosts performance by up to 4X by quadrupling the available multiply-accumulate (MAC) throughput, since MACs are the major computation block in AI applications. Compared to commercially available GPUs, the Vision P6 DSP can achieve twice the frame rate at much lower power consumption on a typical AI implementation. For a wide range of other key vision functions, such as convolution, FIR filters, and matrix multiplies, the Vision P6 DSP increases performance by up to 2X with its improved 8-bit and 16-bit arithmetic.

  • Processes 9728 bits per cycle 
  • Offers 256 MACs: 4X compared to Vision P5 DSP
  • Enhanced instruction set and instruction slotting
  • Fully software compatible with Vision P5 DSP
  • Optional VFPU with single-precision 32-bit and/or half-precision 16-bit floating-point support offers performance and flexibility for porting existing GPU code

 

Vision P6 DSP Block Diagram

Vision P5 DSP Features and Benefits

  • Offers up to 13X vision-processing performance improvement over the previous-generation Vision DSP
  • Processes 7168 bits per cycle
  • Optional vector floating-point unit (VFPU) with single-precision 32-bit floating-point support offers flexibility to provide high-precision math at a minimal area penalty

Vision P5 DSP Block Diagram

VFPU

The Vision Q6, P6, and P5 DSPs provide an optional VFPU for applications that need floating-point precision, or as a quick way to port existing code. The VFPU offers a significant performance improvement with very little area increase. The Vision P6 and Q6 DSPs offer optional support for a 32-way VFPU with half-precision (FP16) format.

Wide-Vector SIMD Data Processing for Superior Performance

VLIW issue of vector operations allows an almost arbitrary mix of loads, stores, multiplies, and ALU operations in each cycle, supporting a rich set of pixel computations. Up to 320 operations can be issued per cycle, and 256 of these can be ALU operations.

SuperGather

The Vision Q6, P6, and P5 DSPs integrate the highly sophisticated Tensilica SuperGather technology, which provides the ability to quickly and efficiently read/write from non-contiguous local memory locations. The SuperGather unit enables the full utilization of the available SIMD capabilities for algorithms such as warping, lens distortion correction, and canny edge tracing.
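
As a conceptual illustration of what gather and scatter mean, the sketch below models local memory as a Python list and reads or writes one element per SIMD lane at arbitrary addresses. It is not the SuperGather programming interface: the function names, the 4x4 image, and the `warp_table` lookup are hypothetical and exist only to show the access pattern.

```python
def gather(memory, indices):
    """Return memory[i] for each lane index i; addresses need not be contiguous."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Write values[k] to memory[indices[k]] for each lane k."""
    for i, v in zip(indices, values):
        memory[i] = v


WIDTH = 4
# 4x4 image stored row-major: pixel (row, col) lives at row * WIDTH + col.
image = list(range(16))

# Column read: lane k fetches pixel (k, 1), so lanes are WIDTH elements apart.
column_indices = [k * WIDTH + 1 for k in range(4)]
column = gather(image, column_indices)

# Warp-style read: each lane fetches a source pixel chosen by a
# hypothetical per-lane lookup table, as a warping kernel might.
warp_table = [5, 0, 15, 10]
warped = gather(image, warp_table)

# Scatter the gathered column into the first row of an output buffer.
output = [0] * 16
scatter(output, [0, 1, 2, 3], column)

print(column)
print(warped)
print(output[:4])
```

Without a gather facility, each of these lane reads would be a separate scalar load, which is why algorithms with data-dependent addressing tend to leave SIMD lanes idle.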

Imaging Instructions

The Vision Q6, P6, and P5 DSPs include many imaging-specific operations that accelerate 8-, 16-, and 32-bit pixel data types and video operation patterns. Some examples of these instructions are arithmetic operations (ADD, SUB, COMPARE, MUL, DIVIDE), bit manipulation operations, and data reorganization operations.

Feature | Vision P5 DSP | Vision Q6/P6 DSP
Number of bits processed per cycle | 7168 | 9728
MACs | 64 | 256
16-bit (FP16) VFPU support (optional) | No | Yes
32-bit (FP32) VFPU support (optional) | Yes | Yes

Highly Energy Efficient

The Vision Q6, P6, and P5 DSPs are highly energy efficient compared to CPUs or GPUs for all kinds of pixel operations.

High Performance

The Vision Q6, P6, and P5 DSPs offer a 5-way VLIW architecture, where each VLIW slot can perform 64-way SIMD 8-bit operations. The Vision family is designed to provide 320 operations per clock cycle.
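
The 320 operations per cycle follow directly from the slot and lane counts given above. The short sketch below reproduces that arithmetic and derives two related quantities. The vector width is simply lanes times bits per lane, and the 1.0GHz clock passed to `peak_gops` is a hypothetical value chosen for illustration, so the result is an upper bound under that assumption rather than a published figure.

```python
VLIW_SLOTS = 5        # 5-way VLIW issue
SIMD_LANES_8BIT = 64  # 64-way SIMD on 8-bit data in each slot

# Peak 8-bit operations issued per clock cycle.
ops_per_cycle = VLIW_SLOTS * SIMD_LANES_8BIT

# Vector width implied by 64 lanes of 8-bit data.
vector_width_bits = SIMD_LANES_8BIT * 8


def peak_gops(clock_ghz: float) -> float:
    """Upper bound on 8-bit operations per second, in GOPS, at a given clock."""
    return ops_per_cycle * clock_ghz


print(ops_per_cycle)
print(vector_width_bits)
print(peak_gops(1.0))  # 1.0GHz is an assumed clock, not a specification
```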

The Vision Q6 and P6 DSPs can achieve even higher efficiency with their wide SIMD multiply-accumulates, offering significantly enhanced performance for the pixel filtering and image-analysis features common in computer vision applications.

Libraries, Software and Third-Party Support

OpenCV-Like Library Support

The Vision Q6, P6, and P5 DSPs come with over 1700 OpenCV-like functions. These functions are highly optimized to achieve the best performance on these DSPs. While OpenCV has over 2500 functions, Cadence has chosen the most common 1700 functions to optimize. Cadence continues to add more functions with quarterly library updates.

OpenVX 1.1

The Vision Q6, P6, and P5 DSPs are the first imaging/vision DSPs to pass Khronos™ Group’s conformance tests for the OpenVX™ 1.1 specification. Application developers can now take advantage of Vision P5 and P6 DSP functionality without detailed knowledge of the hardware architecture and still achieve high performance. Cadence provides an application programming kit (APK) that supports all 40 library functions required by OpenVX 1.1. All of these functions are already fully optimized on the Vision P5, P6, and Q6 DSPs. Applications developed using the standard OpenVX 1.1 API can be compiled and run on Vision P5, P6, and Q6 DSPs without any code changes. Cadence's OpenVX framework automatically schedules and executes the appropriate DMA transfers for efficient memory access, and runs highly optimized DSP vision-processing kernels in parallel with the DMA transfers.
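
To see why running kernels in parallel with DMA transfers matters, consider an image processed tile by tile. The sketch below is a simplified timing model, not a description of Cadence's scheduler: it assumes two local buffers (so one tile can be loaded while the previous one is processed), counts only input transfers, and uses hypothetical per-tile costs in arbitrary time units.

```python
def serial_time(n_tiles: int, dma: float, compute: float) -> float:
    """Total time when each tile is transferred, then processed, one after another."""
    return n_tiles * (dma + compute)


def overlapped_time(n_tiles: int, dma: float, compute: float) -> float:
    """Total time when the next tile's DMA runs during the current tile's compute.

    The first transfer and the last compute cannot be hidden; every step in
    between is bounded by the slower of the two activities.
    """
    if n_tiles == 0:
        return 0
    return dma + (n_tiles - 1) * max(dma, compute) + compute


# Hypothetical workload: 8 tiles, 3 time units of DMA and 5 of compute per tile.
serial = serial_time(8, 3, 5)
overlapped = overlapped_time(8, 3, 5)

print(serial)
print(overlapped)
```

In this model the overlapped schedule approaches the cost of the slower activity alone as the tile count grows, so a compute-bound kernel effectively hides its memory traffic.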

For more information on OpenVX 1.1 please contact us.

AI Software Support

The Vision Q6 and P6 DSPs support AI applications developed in the Caffe, TensorFlow, and TensorFlow Lite frameworks through the Tensilica Neural Network Compiler. The Tensilica Neural Network Compiler maps neural networks into executable, highly optimized code for the target DSP, leveraging a comprehensive set of optimized neural network library functions. The Vision Q6 and P6 DSPs also support the Android Neural Networks API (NNAPI) for on-device AI acceleration in Android-powered devices.

Rich Third-Party Application Software Support

Along with math library support, Cadence also supports a very rich set of third-party applications targeting the Vision DSP family. Some of these third-party companies offer video WDR, image stabilization, super resolution, CNN, and various ADAS applications. These applications are ported and optimized on our DSPs for fast time to market. 

See our list on our Partners page

Comprehensive Hardware and Software Design Tools

Our Proven, Comprehensive Hardware and Software Design Environment

Processor design process

For Processor Designers

Cadence delivers patented, proven tools that automate the process of generating a custom processor or DSP along with matching software tools. These tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, Cadence has the tools you need to create successful products.

View the complete set of tools for processor designers.

Software development process

For Software Developers

When you need to develop application code for a Tensilica processor, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. Cadence's Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.

View the complete set of tools for software developers.

FPGA Platform

Cadence has developed a complete camera system, display system, and Vision P6 DSP on an FPGA platform. The FPGA platform can be used to develop various vision and imaging applications. It has a CMOS-sensor-based camera connected over a MIPI interface and an LCD panel connected over another MIPI interface. It also has an HDMI input and output, which makes it a highly flexible platform for developing imaging and vision applications. Cadence has already developed various applications on this FPGA platform, including face detection and people detection.

Vision DSP Family Literature and Other Resources

Press Releases

ArcSoft and Cadence Partner to Develop AI and Vision Applications

Cadence Boosts Vision and AI Performance with New Tensilica Vision Q6 DSP IP

Cadence Unveils Industry’s First Neural Network DSP IP for Automotive, Surveillance, Drone and Mobile Markets

Cadence Tensilica Vision P-Series DSPs are Industry’s First Imaging/Vision DSPs Certified by Khronos as OpenVX 1.1 Conformant

Related Topics

Learn more about Convolutional Neural Networks (CNN) and download presentations from our Embedded Neural Network Summit.

Read Blogs on Vision DSP

 A New Era Needs a New Architecture: The Tensilica Vision Q6 DSP

 The Road Ahead for Neural Networks in Embedded Systems

 Q&A: Drones, Robots, and the New Tensilica Imaging/Vision DSP

Articles

SemiWiki: CPU, GPU, H/W Accelerator or DSP to Best Address CNN Algorithms?

eeNews Europe: Neural Network processor boosts performance of radar, lidar, vision applications

Embedded Computing Design: Neural net DSP IP pushes the performance envelope

Electronic Design: DSP Takes on Deep Neural Networks

BDTi: Next-Gen Cadence Tensilica Processor Core Claims Big Performance, Energy Consumptions Gains

EEJournal Chalk Talk: Cadence Tensilica Vision P5

Watch Videos on Vision

In the following videos, Pulin Desai presents Cadence demonstrations. The videos are © 2018 or © 2017 Embedded Vision Alliance, as marked, and are used with permission. For more embedded vision information, please visit www.embedded-vision.com.

  • 360° Surround View (© 2018)
  • Deep Learning-based Image Classification Using XNNC-generated Code (© 2018)
  • YOLO Deep Learning-based People Detection (© 2018)
  • AlexNet Convolutional Neural Network (© 2017)
  • Intelligent Image Up-Resolution (© 2017)
  • Stereo Camera-based Depth Mapping (© 2017)