The Cadence Tensilica Vision DSP


Tensilica Vision DSPs for Imaging, Computer Vision, and Neural Networks

Built for Next-Generation Imaging/Vision/Neural Network Requirements

IP for next generation image/video processing

Today’s applications processors are not equipped to handle the complex imaging, computer vision, and neural network (NN) digital signal processing functions required in mobile handsets, tablets, DTVs, drones, automotive systems, video games, and high-end wearables. The Cadence® Tensilica® Vision digital signal processor (DSP) family offers a much-needed breakthrough in energy efficiency and performance that enables applications never before possible in a programmable device.

The Tensilica Vision DSP family offers three Vision products. 

The Vision P5 DSP, introduced in 2015, has been highly successful in the mobile market. It offers 4X to 100X the performance of traditional mobile CPU+GPU systems at a fraction of the power/energy.

The Vision P6 DSP, introduced in 2016, set a new standard in NN performance for a general-purpose imaging and computer vision DSP by offering 4X the peak performance compared to the Vision P5 DSP. 

The Vision C5 DSP, introduced in 2017, is the industry’s first standalone, self-contained NN DSP IP core optimized for vision, radar/lidar, and fused-sensor applications with high-availability NN computational needs. Targeted for the automotive, surveillance, drone, and mobile/wearable markets, the Vision C5 DSP offers 1 TMAC/s computational capacity to run all NN computational tasks.

The Tensilica Vision DSP family covers a wide range of markets. It offers general-purpose imaging and vision products that were designed for the complex algorithms in imaging and computer vision, including innovative multi-frame noise reduction, video stabilization, high dynamic range (HDR) processing, object and face recognition and tracking, low-light image enhancement, digital zoom, gesture recognition, plus many more. The Tensilica Vision DSP family also offers outstanding performance while running NNs, including the stand-alone Vision C5 DSP for always-on NN applications.

Offload the Host CPU for Intensive Imaging and Vision Apps

The Tensilica Vision DSP family offloads the host CPU for lower energy consumption running intensive imaging and vision apps. Multi-core host CPUs can’t handle these power-hungry, bandwidth-demanding applications, hardwired accelerators are restricted to a fixed set of functions, and GPUs offer pipelines that are not required or not efficient in image- and video-processing applications. Now, the Tensilica Vision DSP family provides an imaging-specific programmable solution that is an ideal complement to the CPU/GPU. Imaging and vision algorithms can run on a DSP that’s specifically optimized for the imaging and vision functions required.

Vision DSP Family

Programmable and Customizable

The Tensilica Vision DSPs are synthesizable processors, with the configurability and extensibility that users have come to value from Cadence. The instruction set, memory system, and data types have all been optimized for high-throughput 8- and 16-bit pixel processing on the Vision P5, P6, and C5 DSPs, and for 32-bit pixel processing on the Vision P5 and P6 DSPs. The Tensilica Vision DSP family is available as licensable, synthesizable IP with rich libraries and advanced software tools, allowing you to write your code in C/C++; no assembly code is required. The Vision DSP family was also architected for solutions that use multiple Vision DSPs when higher performance is required.

Processor Optimization

Because the Vision DSP family is built on our proven Tensilica Optimization Platform, further optimizations can be made to target your specific application. Please see the Xtensa section for all of the options available. All processors come with a complete hardware design and matching software tools, including a mature, world-class auto-vectorizing compiler, a cycle-accurate SystemC®-compatible instruction set simulator (ISS), and a full industry-standard GNU toolchain.

Vision DSP Family Comparison

Feature                        Vision P5 DSP            Vision P6 DSP            Vision C5 DSP
Use case                       Imaging                  Imaging and low-end NN   Mid- and high-end NN (always-on NN)
8 x 8 MACs                     64                       256                      1024
8 x 16 MACs                    64                       128                      not listed
16 x 16 MACs                   32                       64                       512
FP16 VFPU (half precision)     No                       32-way SIMD (optional)   No
FP32 VFPU (single precision)   16-way SIMD (optional)   16-way SIMD (optional)   No
Max SIMD width                 64-way 8-bit             64-way 8-bit             128-way 8-bit
SuperGather                    Yes                      Yes                      No
Coefficient decompression      No                       Yes                      Yes

MACs: a higher MAC count means higher compute. Coefficient decompression saves memory bandwidth.
Data rearrangement (efficient switching between vectorization schemes): ranges from Limited to Extensive across the family.
AXI interface: 2 AXIs with 128-bit bus bandwidth for instruction and data; more AXIs mean less sharing and higher memory bandwidth.
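The comparison lists coefficient decompression as a way to save memory bandwidth but does not describe the compression scheme. As a minimal sketch of the general idea only, assuming a hypothetical zero-run-length format for sparse 8-bit weights (not Cadence's actual format), the code below stores each nonzero coefficient with the count of zeros before it and expands the pairs again before use.

```python
MAX_RUN = 255  # hypothetical format: a zero-run count must fit in one byte


def compress_zero_runs(coeffs):
    """Encode int8 coefficients as (zero_run, value) pairs.

    Each pair means `zero_run` zeros followed by one nonzero `value`.
    A value of None means "zeros only" (used for full or trailing runs).
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
            if run == MAX_RUN:
                pairs.append((run, None))  # flush a full one-byte run
                run = 0
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, None))  # trailing zeros
    return pairs


def decompress_zero_runs(pairs):
    """Expand (zero_run, value) pairs back into the original coefficients."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        if value is not None:
            out.append(value)
    return out


def encoded_bytes(pairs):
    """Storage cost assuming one byte for the run and one for the value."""
    return 2 * len(pairs)
```

For example, the eight coefficients `[0, 0, 5, 0, 0, 0, -3, 0]` encode to three pairs (6 bytes instead of 8). Dense weights would grow under this format, so a scheme like this only pays off when many coefficients are zero.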

Vision C5 DSP for Neural Networks

Vision C5 DSP Features and Performance

The Vision C5 DSP offers class-leading NN performance in a self-contained engine:

  • 1 TMAC/s computational capacity (4X greater throughput than the Vision P6 DSP) in less than 1 mm² of silicon area, providing very high computation throughput on deep learning kernels
  • 1024 8-bit MACs or 512 16-bit MACs for exceptional performance at both 8-bit and 16-bit resolutions
  • VLIW SIMD architecture with 128-way, 8-bit SIMD or 64-way, 16-bit SIMD
  • Architected for multi-core designs, enabling a multi-TMAC solution in a small footprint
  • Integrated iDMA and AXI4 interface
  • Uses the same proven software toolset as the Vision P5 and P6 DSPs
  • Compared to commercially available GPUs, the Vision C5 DSP is up to 6X faster in the well-known AlexNet CNN performance benchmark and up to 9X faster in the Inception V3 CNN performance benchmark
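Peak MAC throughput is MACs per cycle multiplied by clock frequency (and by core count in a multi-core design); this page does not state the clock. The sketch below assumes only that relationship, with utilization ignored and 1 TMAC taken as 10^12 MACs, and back-computes the clock implied by 1024 8-bit MACs per cycle, the 4X ratio over the Vision P6 DSP's 256 MACs, and the aggregate of a hypothetical four-core configuration. The implied clock is an inference, not a published specification.

```python
def peak_macs_per_second(macs_per_cycle, clock_hz, cores=1):
    """Peak (100% utilization) MAC rate: MACs per cycle x clock x cores."""
    return macs_per_cycle * clock_hz * cores


def clock_for_target(macs_per_cycle, target_macs_per_second):
    """Clock frequency one core needs to reach a target peak MAC rate."""
    return target_macs_per_second / macs_per_cycle


C5_MACS_8BIT = 1024  # from the Vision C5 DSP feature list
P6_MACS_8BIT = 256   # from the Vision P6 DSP feature list

# Clock implied by 1024 MACs/cycle reaching 1e12 MACs/s (about 977 MHz).
needed_hz = clock_for_target(C5_MACS_8BIT, 1e12)

# "4X greater throughput than the Vision P6 DSP" at an equal clock.
ratio = C5_MACS_8BIT / P6_MACS_8BIT

# Hypothetical 4-core design at an assumed 1 GHz: 4.096e12 MACs/s peak.
multi_core = peak_macs_per_second(C5_MACS_8BIT, 1e9, cores=4)
```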

Vision C5 DSP for Neural Networks Block Diagram

NN DSP vs. a NN Accelerator

Camera-based vision systems in automobiles, drones, and security systems require two fundamental types of vision-optimized computation. First, the input from the camera is enhanced using traditional computational photography/imaging algorithms. Second, NN-based recognition algorithms perform object detection and recognition. Existing NN accelerator solutions are hardware accelerators attached to imaging DSPs, with the NN code split between running some network layers on the DSP and offloading convolutional layers to the accelerator. This combination is inefficient and consumes unnecessary power.

Architected as a dedicated NN-optimized DSP, the Vision C5 DSP accelerates all NN computational layers (convolution, fully connected, pooling and normalization), not just the convolution functions. This frees up the main vision/imaging DSP to run image enhancement applications independently while the Vision C5 DSP runs inference tasks. By eliminating extraneous data movement between the NN DSP and the main vision/imaging DSP, the Vision C5 DSP provides a lower power solution than competing neural network accelerators. It also offers a simple, single-processor programming model for NNs.
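To make the "all layers on one processor" point concrete, the toy integer pipeline below runs the four layer types named above (convolution, pooling, normalization, fully connected) back to back in one address space, so no intermediate feature map is handed off to a separate accelerator. It is plain Python for illustration, assuming single-channel data and a simple mean-subtraction normalization; it is not the Vision C5 library or its API.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2-D convolution ('valid' padding).

    Computed as cross-correlation (no kernel flip), as CNN frameworks do.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0  # integer multiply-accumulate
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out


def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]) - 1, 2)]
        for y in range(0, len(fmap) - 1, 2)
    ]


def normalize_mean(vec):
    """Simple normalization: subtract the (floored) integer mean."""
    mean = sum(vec) // len(vec)
    return [v - mean for v in vec]


def fully_connected(vec, weights):
    """One output per weight row: dot product of the row with the input."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def run_network(image, kernel, fc_weights):
    """Run every layer on the same processor with no intermediate hand-off."""
    fmap = max_pool_2x2(conv2d_valid(image, kernel))
    flat = [v for row in fmap for v in row]
    return fully_connected(normalize_mean(flat), fc_weights)
```

With a 5x5 ramp image and a 2x2 all-ones kernel, the pooled map is `[[36, 44], [76, 84]]` and the final layer sees the mean-subtracted vector `[-24, -16, 16, 24]`.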

Vision P5 and P6 DSPs

Vision P5 DSP Features and Benefits

  • Offers up to 13X vision-processing performance improvement over the previous-generation Vision DSP
  • Processes 7168 bits per cycle
  • Optional vector floating-point unit (VFPU) with single-precision 32-bit floating-point support offers flexibility to provide high-precision math at a minimal area penalty

Vision P5 DSP Block Diagram

Vision P6 DSP Features and Benefits

With new instructions, increased math throughput, and other enhancements, the Vision P6 DSP sets a new standard in imaging and computer vision benchmarks, increasing performance by up to 4X compared to the highly successful Vision P5 DSP. For NN applications, the Vision P6 DSP boosts performance by up to 4X by quadrupling the available MACs, the major computation block in NN workloads. Compared to commercially available GPUs, the Vision P6 DSP achieves twice the frame rate at much lower power consumption on a typical NN implementation. For a wide range of other key vision functions, such as convolution, FIR filters, and matrix multiplies, the Vision P6 DSP increases performance by up to 2X with its improved 8-bit and 16-bit arithmetic.
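FIR filters and convolutions on 8- and 16-bit data are built from exactly this kind of fixed-point multiply-accumulate. The sketch below is a generic fixed-point FIR, assuming integer samples, 16-bit coefficients in Q15 format, a wide accumulator, and round-then-saturate to 16 bits; it illustrates the arithmetic only and does not use Vision P6 intrinsics or library calls.

```python
Q15_ONE = 1 << 15  # 1.0 in Q15 fixed point


def fir_q15(samples, coeffs):
    """Fixed-point FIR filter over integer samples with Q15 coefficients.

    Produces one output per fully overlapped position ('valid' mode):
    y[i] = sum_k coeffs[k] * samples[i + n - 1 - k], then rounded back to
    integer scale and saturated to the signed 16-bit range.
    """
    n = len(coeffs)
    out = []
    for i in range(len(samples) - n + 1):
        acc = 0  # wide accumulator (a DSP keeps this in 32 or more bits)
        for k in range(n):
            acc += coeffs[k] * samples[i + n - 1 - k]
        acc = (acc + (1 << 14)) >> 15  # round half up, drop the Q15 fraction
        out.append(max(-32768, min(32767, acc)))  # saturate to int16
    return out
```

A 4-tap moving average uses `[Q15_ONE // 4] * 4` as coefficients; a constant input of 100 then filters to 100 at every output position.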

  • Processes 9728 bits per cycle 
  • Offers 256 MACs: 4X compared to Vision P5 DSP
  • Enhanced instruction set and instruction slotting
  • Fully software compatible with Vision P5 DSP
  • Optional VFPU with single-precision 32-bit and/or half-precision 16-bit floating-point support offers performance and flexibility for porting existing GPU code

 

Vision P6 DSP Block Diagram

VFPU

The Vision P5 and P6 DSPs also provide an optional VFPU for applications that need floating-point precision or as a quick way to port existing code. The VFPU offers a significant performance improvement with very little area increase. The Vision P6 DSP also offers optional support for a 32-way VFPU with half-precision (FP16) format.
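The VFPU lane counts follow from a fixed vector width. Assuming a 512-bit vector on the Vision P5 and P6 DSPs (inferred from their 64-way 8-bit SIMD; the register width is not stated on this page), the lane count halves each time the element width doubles, which reproduces the 32-way FP16 and 16-way FP32 figures. The same rule applied to the Vision C5 DSP's 128-way 8-bit SIMD gives its 64-way 16-bit mode.

```python
def simd_lanes(vector_bits, element_bits):
    """Number of SIMD lanes when a vector is split into equal-width elements."""
    if vector_bits % element_bits:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits


# Assumption: vector width = SIMD ways x 8 bits, from the quoted 8-bit widths.
P_SERIES_VECTOR_BITS = 64 * 8   # Vision P5/P6: 64-way 8-bit
C5_VECTOR_BITS = 128 * 8        # Vision C5: 128-way 8-bit
```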

Wide-Vector SIMD Data Processing for Superior Performance

The VLIW issue of vector operations gives an almost arbitrary mix of loads, stores, multiplies, and ALU operations, resulting in a rich set of pixel computations. Up to 320 operations can be issued per cycle and 256 of these can be ALU operations.
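The 320 and 256 figures are consistent with five VLIW slots of 64 SIMD lanes each, of which at most four slots issue ALU operations (256 / 64 = 4). The slot limits in the sketch below are that inference, not a documented issue rule, and the operation-kind labels are hypothetical; the code simply counts lane-level operations in one bundle.

```python
from collections import Counter

LANES = 64          # 64-way 8-bit SIMD per slot (stated for Vision P5/P6)
SLOTS = 5           # 5-way VLIW (stated for Vision P5/P6)
MAX_ALU_SLOTS = 4   # assumption: 256 ALU ops / 64 lanes = 4 slots


def bundle_ops(bundle):
    """Count lane-level operations issued by one VLIW bundle.

    `bundle` lists one operation kind per slot, e.g. ["load", "alu", "mul"].
    Raises ValueError if the bundle exceeds the assumed slot limits.
    """
    if len(bundle) > SLOTS:
        raise ValueError("too many slots in bundle")
    kinds = Counter(bundle)
    if kinds["alu"] > MAX_ALU_SLOTS:
        raise ValueError("too many ALU slots in bundle")
    return {kind: count * LANES for kind, count in kinds.items()}
```

A bundle of one load plus four ALU operations yields 64 + 256 = 320 operations in a single cycle.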

SuperGather

The Vision P5 and P6 DSPs also integrate the highly sophisticated Tensilica SuperGather™ technology, which provides the ability to quickly and efficiently read from and write to non-contiguous local memory locations. The SuperGather unit enables full utilization of the available SIMD capabilities for algorithms such as warping, lens distortion correction, and Canny edge tracing.
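Warping and lens distortion correction read each output pixel from a source location computed per pixel, so consecutive outputs come from scattered addresses. The plain-Python model below shows that gather access pattern for a nearest-neighbour remap of a row-major image; the function names are hypothetical, the coordinate maps would come from a distortion model that is not shown, and this is not the SuperGather programming interface.

```python
def gather(memory, addresses):
    """Vector gather: one element per address; addresses need not be contiguous."""
    return [memory[a] for a in addresses]


def remap_nearest(src, width, map_y, map_x, fill=0):
    """Nearest-neighbour warp of a row-major image stored in the flat list `src`.

    map_y[i], map_x[i] give the integer source coordinates for output pixel i.
    Coordinates outside the image produce `fill`.
    """
    height = len(src) // width
    addresses, valid = [], []
    for sy, sx in zip(map_y, map_x):
        ok = 0 <= sy < height and 0 <= sx < width
        valid.append(ok)
        addresses.append(sy * width + sx if ok else 0)  # safe dummy address
    values = gather(src, addresses)  # the non-contiguous reads
    return [v if ok else fill for v, ok in zip(values, valid)]
```

Mirroring the first row of a 3x3 image, for instance, is `remap_nearest(img, 3, [0, 0, 0], [2, 1, 0])`.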

Imaging Instructions

The Vision P5 and P6 DSPs include many imaging-specific operations that accelerate 8-, 16-, and 32-bit pixel data types and video operation patterns. Examples include arithmetic operations (ADD, SUB, COMPARE, MUL, DIVIDE), bit manipulation operations, and data reorganization operations.
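The exact semantics of the ADD and SUB operations listed above are not given here. Pixel arithmetic commonly saturates, clamping to the 0..255 range of an 8-bit pixel instead of wrapping, because a wrapped sum turns a bright pixel dark. The sketch below assumes unsigned 8-bit pixels and is plain Python, not the DSPs' instruction set or intrinsics.

```python
U8_MAX = 255


def add_sat_u8(a, b):
    """Element-wise 8-bit pixel add that clamps at 255 instead of wrapping."""
    return [min(U8_MAX, x + y) for x, y in zip(a, b)]


def sub_sat_u8(a, b):
    """Element-wise 8-bit pixel subtract that clamps at 0 instead of wrapping."""
    return [max(0, x - y) for x, y in zip(a, b)]


def add_wrap_u8(a, b):
    """Plain modulo-256 add, shown for contrast with the saturating version."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]
```

Adding 100 to a pixel value of 200 gives 255 with saturation but 44 with wrap-around.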

Feature                                 Vision P5 DSP   Vision P6 DSP
Number of bits processed per cycle      7168            9728
MACs                                    64              256
16-bit (FP16) VFPU support (optional)   No              Yes
32-bit (FP32) VFPU support (optional)   Yes             Yes

Highly Energy Efficient

The Vision P5 and P6 DSPs are highly energy efficient compared to CPUs or GPUs for all kinds of pixel operations.

High Performance

The Vision P5 and P6 DSPs offer a 5-way VLIW architecture, where each VLIW slot can perform 64-way SIMD 8-bit operations. Together, the five slots provide up to 320 operations per clock cycle.

The Vision P6 DSP can achieve even higher efficiency with its wide SIMD multiply-accumulates, offering significantly enhanced performance for the pixel filtering and image-analysis features common in computer vision applications.

Libraries, Software and Third-Party Support

OpenCV-Like Library Support

The Vision P5 and P6 DSPs come with over 1000 OpenCV-like functions, highly optimized to achieve the best performance on these DSPs. While OpenCV has over 2500 functions, Cadence has chosen to optimize the 1000 most common ones and continues to add more functions with quarterly library updates.

OpenVX 1.1

The Vision P5 and P6 DSPs are the first imaging/vision DSPs to pass Khronos™ Group’s conformance tests for the OpenVX™ 1.1 specification. Application developers can now take advantage of Vision P5 and P6 DSP functionality without detailed knowledge of the hardware architecture and still achieve high performance. Cadence provides an application programming kit (APK) that supports all 40 library functions required by OpenVX 1.1, and all of these functions are already fully optimized on the Vision P5 and Vision P6 DSPs. Applications developed using the standard OpenVX 1.1 API can be compiled and run on Vision P5 and P6 DSPs without any code changes. Cadence's OpenVX framework automatically schedules and executes the appropriate DMA transfers for efficient memory access, and runs highly optimized DSP vision-processing kernels in parallel with the DMA transfers.
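Running kernels in parallel with DMA transfers is typically done with ping-pong (double) buffering: while the DSP processes one tile in one local buffer, the DMA fills the other buffer with the next tile. The framework's internal scheduler is not documented here, so the sketch below only assumes that common scheme and computes the resulting schedule; the action names are hypothetical and write-back of results is omitted for brevity.

```python
def double_buffer_schedule(num_tiles):
    """Return the per-step actions of a ping-pong tile pipeline.

    Each action is (kind, tile, buffer). Step 0 only fetches tile 0. In each
    later step, the kernel computes tile i from buffer i % 2 while the DMA
    fetches tile i + 1 into the other buffer, so transfers overlap compute.
    """
    if num_tiles == 0:
        return []
    steps = [[("dma_in", 0, 0)]]
    for i in range(num_tiles):
        step = [("compute", i, i % 2)]
        if i + 1 < num_tiles:
            step.append(("dma_in", i + 1, (i + 1) % 2))
        steps.append(step)
    return steps
```

For three tiles the pipeline needs four steps instead of six, and within every step the compute and the transfer use different buffers.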

For more information on OpenVX 1.1 please contact us.

Neural Network Mapper Toolset

The Vision C5 and the Vision P6 DSPs also come with the Cadence neural network mapper toolset, which will map any neural network trained with tools such as Caffe and TensorFlow into executable and highly optimized code for the Vision C5 DSP, leveraging a comprehensive set of hand-optimized neural network library functions.
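Networks trained in frameworks such as Caffe and TensorFlow usually hold floating-point weights, while the DSPs' MAC arrays work on 8- and 16-bit integers. How the mapper converts between the two is not described here; one common technique is symmetric per-tensor int8 quantization, sketched below as an assumption rather than the mapper's actual algorithm.

```python
def quantize_symmetric_int8(weights):
    """Map float weights to int8 with a single per-tensor scale.

    q = round(w / scale), where scale maps the largest |w| to 127.
    Returns the quantized list and the scale needed to dequantize it.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the original float weights."""
    return [v * scale for v in q]
```

Each reconstructed weight is within half a quantization step (`scale / 2`) of the original, which is what lets 8-bit MACs approximate the trained network.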

Rich Third-Party Application Software Support

Along with math library support, Cadence also supports a very rich set of third-party applications targeting the Vision DSP family. Some of these third-party companies offer video WDR, image stabilization, super resolution, CNN, and various ADAS applications. These applications are ported and optimized on our DSPs for fast time to market. 

See our list on our Partners page

Comprehensive Hardware and Software Design Tools

Our Proven, Comprehensive Hardware and Software Design Environment


For Processor Designers

Cadence delivers patented, proven tools that automate the process of generating a custom processor or DSP along with matching software tools. These tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, Cadence has the tools you need to create successful products.

View the complete set of tools for processor designers.


For Software Developers

When you need to develop application code for a Tensilica processor, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. Cadence's Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.

View the complete set of tools for software developers.

FPGA Platform

Cadence has developed a complete camera system, display system, and Vision P6 DSP on an FPGA platform that can be used to develop various vision and imaging applications. It has a CMOS-sensor-based camera connected over a MIPI interface and an LCD panel connected over a second MIPI interface. HDMI input and output make it a highly flexible platform for developing imaging and vision applications. Cadence has already developed various applications, including face detection and people detection, on this FPGA platform.

Vision DSP Family Literature and Other Resources

Documentation and Literature

Product Literature

Vision DSP Family Product Brief

White Paper

Choosing the Right DSP for High-Resolution Imaging in Mobile and Wearable Applications

Please contact us for datasheets and more relevant documentation.

Hardware/Software Design Tools

Xtensa Processor Developer's Toolkit

Xtensa Software Developer's Toolkit

 

Press Releases

Cadence Unveils Industry’s First Neural Network DSP IP for Automotive, Surveillance, Drone and Mobile Markets - Read press release and contact us for more information.

Cadence Tensilica Vision P-Series DSPs are Industry’s First Imaging/Vision DSPs Certified by Khronos as OpenVX 1.1 Conformant - Read press release.

Related Topics

Learn more about Convolutional Neural Networks (CNN) and download presentations from our Embedded Neural Network Summit.

Read Blogs on Vision DSP

 Vision C5 DSP for Standalone Neural Network Processing

 The Road Ahead for Neural Networks in Embedded Systems

 Q&A: Drones, Robots, and the New Tensilica Imaging/Vision DSP

Articles

BDTi: Next-Gen Cadence Tensilica Processor Core Claims Big Performance, Energy Consumptions Gains

EEJournal Chalk Talk: Cadence Tensilica Vision P5

Watch Videos on Vision

In this video, Pulin Desai talks about Cadence Demonstration of the AlexNet Convolutional Neural Network. This video is © 2017 Embedded Vision Alliance and is used with permission. For more embedded vision information, please visit www.embedded-vision.com.
In this video, Pulin Desai talks about Cadence Demonstration of Intelligent Image Up-Resolution. This video is © 2017 Embedded Vision Alliance and is used with permission. For more embedded vision information, please visit www.embedded-vision.com.
In this video, Pulin Desai talks about Cadence Demonstration of Stereo Camera-based Depth Mapping. This video is © 2017 Embedded Vision Alliance and is used with permission. For more embedded vision information, please visit www.embedded-vision.com.
Chris Rowen, Chief Technical Officer at Cadence, presents Designing and Selecting Instruction Sets for Vision at the May 2015 Embedded Vision Summit. This video is © 2015 Embedded Vision Alliance and is used with permission. For more embedded vision information, please visit www.embedded-vision.com.