The Cadence Tensilica Vision DSP

Vision DSPs for Imaging and Vision



Today’s applications processors are not equipped to handle the complex embedded imaging and vision digital signal processing functions in mobile handsets, drone, automotive, robotics, surveillance, and augmented reality (AR) / virtual reality (VR) markets. The Cadence® Tensilica® Vision digital signal processor (DSP) family offers a much-needed breakthrough in terms of energy efficiency and performance that enables applications never before possible in a programmable device.

The Tensilica Vision DSP family offers three Vision products.

  • The Vision Q7 DSP, the sixth-generation Vision DSP, is the latest DSP for embedded vision and AI. Compared to the Vision Q6 DSP, it offers up to 1.7X higher TOPS in the same area and up to 2X greater vision and AI performance, including floating point. With SLAM-specific instructions and a dedicated SLAM hardware package, it also delivers up to 2X faster performance on SLAM kernels.
  • The fifth-generation Vision Q6 DSP is built on a new, faster processor architecture. It offers 1.5X greater performance than its predecessor, the Vision P6 DSP, and 1.25X better power efficiency at the Vision P6 DSP’s peak performance.
  • The Vision P6 DSP, introduced in 2016, set a new standard in AI performance for a general-purpose embedded vision DSP, offering 4X the peak performance of the Vision P5 DSP.

The Tensilica Vision DSP family covers a wide range of markets. Its general-purpose imaging and vision products are designed for complex imaging and computer vision algorithms such as multi-frame noise reduction, video stabilization, high dynamic range (HDR) processing, object and face recognition and tracking, low-light image enhancement, digital zoom, and gesture recognition. The family also delivers outstanding performance on AI workloads.

Offload the Host CPU for Intensive Vision and AI Apps

The Tensilica Vision DSP family offloads intensive imaging and vision apps from the host CPU, lowering energy consumption. Multi-core host CPUs can’t handle these power-hungry, bandwidth-demanding applications efficiently; hardwired accelerators are restricted to a fixed set of functions; and GPUs carry pipelines that image and vision processing either does not need or cannot use efficiently. The Tensilica Vision DSP family instead provides an imaging- and vision-specific programmable solution that is an ideal complement to the CPU and GPU, so imaging and vision algorithms can run on a DSP optimized for exactly the functions they require.

Vision DSP Family

Programmable and Customizable

The Tensilica Vision DSPs are synthesizable processors with the configurability and extensibility that users have come to value from Cadence. Across the Vision P6, Q6, and Q7 DSPs, the instruction set, memory system, and data types have all been optimized for high-throughput 8-, 16-, and 32-bit pixel processing. The Tensilica Vision DSP family is available as licensable, synthesizable IP with rich libraries and advanced software tools, so you can write your code in C/C++ with no assembly code required. The Vision DSP family was also architected for solutions that combine multiple Vision DSPs when higher performance is required.
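As a hedged illustration of the kind of plain C this allows, the sketch below converts interleaved 8-bit RGB pixels to 8-bit grayscale using 16-bit fixed-point weights and a 32-bit accumulator, so it exercises all three data widths. It is ordinary portable C with no Cadence intrinsics or library calls; the function name `rgb_to_gray_u8` and the BT.601-style weights (77, 150, 29, scaled by 256) are illustrative assumptions, and nothing here claims how a particular Xtensa compiler configuration will vectorize it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: interleaved RGB (8 bits per channel) to 8-bit gray.
   Weights approximate BT.601 luma (0.299, 0.587, 0.114) scaled by 256;
   they sum to 256, so a white pixel maps to 255. */
void rgb_to_gray_u8(const uint8_t *rgb, uint8_t *gray, size_t n_pixels)
{
    const uint16_t wr = 77, wg = 150, wb = 29;      /* 16-bit coefficients */
    for (size_t i = 0; i < n_pixels; ++i) {
        /* Widen 8-bit samples into a 32-bit accumulator. */
        uint32_t acc = (uint32_t)wr * rgb[3 * i]
                     + (uint32_t)wg * rgb[3 * i + 1]
                     + (uint32_t)wb * rgb[3 * i + 2];
        gray[i] = (uint8_t)((acc + 128u) >> 8);     /* round, then narrow to 8 bits */
    }
}
```

For example, a white pixel (255, 255, 255) maps to 255 and pure red (255, 0, 0) maps to 77. The loop has no branches and only unit-stride addressing, which keeps it simple for an auto-vectorizing compiler to analyze.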

Processor Optimization

Because the Vision DSP family is built on our proven Tensilica Optimization Platform, further optimizations can be made to target your specific application. Please see the Xtensa® section for all of the options available. All processors come with a complete hardware design and matching software tools, including a mature, world-class auto-vectorizing compiler, a cycle-accurate SystemC®-compatible instruction set simulator (ISS), and a full industry-standard GNU toolchain.

Vision DSP Family 

Vision P6 DSP Vision Q6 DSP Vision Q7 DSP
Use Case Vision and AI up to 256GMAC/sec Vision and AI up to 384GMAC/sec Vision and AI up to 768GMAC/sec

MACs

8 x 8 256 512
8 x 16 128
16 x 16 64 128
VFPU 16b half precision 32-way SIMD (optional) 2X 32-way SIMD
32b single precision 16-way SIMD (optional) 2X 16-way SIMD
Max SIMD Width 64-way 8-bit
SuperGather Yes
Coefficient Decompression - Saves memory bandwidth Yes

AXI interface - More AXIs = less sharing and higher memory bandwidth

3 AXIs 5 AXIs 3 AXIs
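The peak GMAC/sec figures can be estimated as the 8-bit MAC count multiplied by the clock frequency. The worked arithmetic below assumes the 1.5GHz peak frequency quoted on this page for the Vision Q6 and Q7 DSPs; the 1.0GHz figure for the Vision P6 DSP is an assumption inferred from 256GMAC/sec divided by 256 MACs, since its clock is not stated here.

```latex
\begin{aligned}
\text{Peak GMAC/sec} &= N_{\text{MAC}} \times f_{\text{clk}}\ [\text{GHz}] \\
\text{Vision P6:}\quad 256 \times 1.0 &= 256 \\
\text{Vision Q6:}\quad 256 \times 1.5 &= 384 \\
\text{Vision Q7:}\quad 512 \times 1.5 &= 768
\end{aligned}
```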

Vision P6, Q6, and Q7 DSPs for Vision and AI

Vision Q7 DSP Features and Benefits

A deeper, 13-stage processor pipeline and system architecture designed for use with large local memories enable the Tensilica Vision Q7 DSP to achieve 1.5GHz peak frequency. As a result, designers using the Vision Q7 DSP can develop high-performance products that meet increasing vision and AI demands and power-efficiency needs.

The Vision Q7 DSP is specifically optimized for simultaneous localization and mapping (SLAM), a technique commonly used in the robotics, drone, mobile, and automotive markets to automatically construct or update a map of an unknown environment, and in the AR/VR market for inside-out tracking. The Vision Q7 DSP delivers up to 1.82 tera operations per second (TOPS), 1.7X higher TOPS compared to the Vision Q6 DSP in the same area. To address the increasing computational requirements for embedded vision and AI applications, the sixth-generation Vision Q7 DSP provides up to 2X greater AI and floating-point performance in the same area compared to its predecessor, the Vision Q6 DSP.

  • An enhanced instruction set supporting 8/16/32-bit data types, plus optional VFPU support for single and half precision, enables up to 2X faster SLAM kernels compared to the Vision Q6 and Vision P6 DSPs
  • Delivers up to 2X improvement in floating-point operations per mm² (FLOPS/mm²) for both single precision (FP32) and half precision (FP16) compared to the Vision Q6 and Vision P6 DSPs
  • Up to 2X greater AI performance than the Vision Q6 DSP in the same area, i.e., up to 2X higher GMAC/mm²
  • Support for on-the-fly decompression of weights

For AI applications, the Vision Q7 DSP provides a flexible solution delivering 512 8-bit MACs, compared to 256 MACs for the Vision Q6 DSP. For greater AI performance, the Vision Q7 DSP can be paired with the Tensilica DNA 100 processor. In addition to computational performance, the Vision Q7 DSP boasts a number of iDMA enhancements including 3D DMA, compression, and a 256-bit AXI interface. The Vision Q7 DSP is a superset of the Vision Q6 DSP, which preserves customers’ existing software investment and enables an easy migration from the Vision Q6 or Vision P6 DSPs. 
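An 8-bit MAC multiplies two 8-bit operands and adds the product into a wider accumulator. The scalar sketch below shows that pattern for a quantized neural-network dot product (int8 weights and activations, int32 accumulation). It is illustrative only: `dot_s8` is a hypothetical name, and this is neither Tensilica Neural Network Compiler output nor a Cadence library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for an int8 x int8 dot product.
   Each iteration is one 8-bit multiply-accumulate; a wide SIMD MAC
   array performs many of these per cycle. */
int32_t dot_s8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;   /* |product| <= 16384, so no overflow for n < 131072 */
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

The 32-bit accumulator matters: a single int8 × int8 product can reach 16384, so a 16-bit accumulator could overflow after just two such products.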

 Vision Q7 DSP Block Diagram

Vision Q6 DSP Features and Benefits

A deeper, 13-stage processor pipeline and system architecture designed for use with large local memories enable the Vision Q6 DSP to achieve 1.5GHz peak frequency and 1GHz typical frequency at 16nm, in the same floorplan area as the Vision P6 DSP. As a result, designers using the Vision Q6 DSP can develop high-performance products that meet increasing vision and AI demands and power-efficiency needs.

  • An enhanced DSP instruction set results in up to 20 percent fewer cycles than the Vision P6 DSP for embedded vision applications/kernels such as Optical Flow, Transpose, and warpAffine, and for commonly used filters such as Median and Sobel
  • 2X system data bandwidth with separate master/slave AXI interfaces for data/instructions and 2-channel DMA alleviates memory bandwidth challenges in vision and AI applications, and also reduces latency and overhead associated with task switching and DMA setup
  • Backwards compatibility with the Vision P6 DSP, so customers can preserve their software investment for an easy migration 
  • Optional vector floating-point unit (VFPU) also supports half precision (FP16)
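Two DMA channels make it possible to overlap data movement with computation. A common way to exploit that is ping-pong (double) buffering: while the DSP processes one tile in local memory, the next tile is fetched into a second buffer. The sketch below shows only the buffer rotation. `fake_dma_copy` is a hypothetical stand-in that uses a synchronous `memcpy`, so nothing actually overlaps here; the tile size and the per-tile kernel are arbitrary assumptions, and Cadence's iDMA API is not used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE 64  /* hypothetical tile size in bytes */

/* Stand-in for a DMA transfer between external and local memory.
   Real hardware would issue an asynchronous request; this copies synchronously. */
static void fake_dma_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Example per-tile kernel: invert 8-bit pixels in place. */
static void process_tile(uint8_t *tile, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        tile[i] = (uint8_t)(255 - tile[i]);
}

/* Ping-pong buffering: tile k lives in local[k & 1] while tile k + 1
   is fetched into the other buffer. */
void process_image_pingpong(const uint8_t *src, uint8_t *dst, size_t len)
{
    uint8_t local[2][TILE];
    size_t n_tiles = (len + TILE - 1) / TILE;
    if (n_tiles == 0)
        return;

    fake_dma_copy(local[0], src, len < TILE ? len : TILE);   /* prime buffer 0 */

    for (size_t k = 0; k < n_tiles; ++k) {
        size_t off = k * TILE;
        size_t cur_len = (len - off < TILE) ? len - off : TILE;
        uint8_t *cur = local[k & 1];

        if (k + 1 < n_tiles) {                                /* start next fetch first */
            size_t next_off = off + TILE;
            size_t next_len = (len - next_off < TILE) ? len - next_off : TILE;
            fake_dma_copy(local[(k + 1) & 1], src + next_off, next_len);
        }
        process_tile(cur, cur_len);                           /* compute on current tile */
        fake_dma_copy(dst + off, cur, cur_len);               /* write results back */
    }
}
```

With a real asynchronous DMA engine, the fetch of tile k + 1 would be started before `process_tile` runs and waited on at the top of the next iteration; the `k & 1` indexing is what keeps the transfer and the computation on different buffers.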

Vision Q6 DSP Block Diagram

Vision P6 DSP Features and Benefits

With new instructions, increased math throughput, and other enhancements, the Vision P6 DSP sets a new standard in imaging and computer vision benchmarks, increasing performance by up to 4X compared to the highly successful Vision P5 DSP. For AI applications, the Vision P6 DSP boosts performance by up to 4X by quadrupling the available MACs, the major computation block in AI workloads. Compared to commercially available GPUs, the Vision P6 DSP achieves twice the frame rate at much lower power consumption on a typical AI implementation. For a wide range of other key vision functions, such as convolution, FIR filters, and matrix multiplies, the Vision P6 DSP increases performance by up to 2X with its improved 8-bit and 16-bit arithmetic.

  • Processes 9728 bits per cycle 
  • Offers 256 MACs: 4X compared to Vision P5 DSP
  • Enhanced instruction set and instruction slotting
  • Fully software compatible with Vision P5 DSP
  • Optional VFPU with single-precision 32-bit and/or half-precision 16-bit floating-point support offers performance and flexibility for porting existing GPU code
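FIR filters are a direct use of the mixed-width MACs listed in the table above (8-bit samples multiplied by 16-bit coefficients). A minimal scalar sketch, assuming unsigned 8-bit samples, signed 16-bit coefficients, and 32-bit accumulation with no output scaling; `fir_u8_s16` is a hypothetical name, not a Cadence library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIR filter over the "valid" region:
   y[i] = sum_t h[t] * x[i + taps - 1 - t], for i = 0 .. n - taps.
   Samples are unsigned 8-bit, coefficients signed 16-bit, and each
   product is accumulated in 32 bits (no rounding or output scaling). */
void fir_u8_s16(const uint8_t *x, size_t n,
                const int16_t *h, size_t taps, int32_t *y)
{
    if (taps == 0 || n < taps)
        return;                                   /* no complete window */
    for (size_t i = 0; i + taps <= n; ++i) {
        int32_t acc = 0;
        for (size_t t = 0; t < taps; ++t)
            acc += (int32_t)h[t] * (int32_t)x[i + taps - 1 - t];  /* one 8x16 MAC */
        y[i] = acc;
    }
}
```

Each output is a dot product of `taps` 8×16-bit MACs, which is the work a wide SIMD MAC array spreads across its lanes.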


Vision P6 DSP Block Diagram

VFPU

The Vision P6, Q6, and Q7 DSPs provide an optional VFPU for applications that need floating-point precision or as a quick way to port existing floating-point code. The VFPU offers a significant performance improvement for very little additional area. The Vision P6 and Q6 DSPs offer optional support for a 32-way VFPU in half-precision (FP16) format and a 16-way VFPU in single-precision (FP32) format. The Vision Q7 DSP doubles both single-precision and half-precision VFPU MAC operations compared to the Vision Q6 DSP.

Wide-Vector SIMD Data Processing for Superior Performance

Because vector operations are issued as VLIW bundles, each cycle can combine an almost arbitrary mix of loads, stores, multiplies, and ALU operations, supporting a rich set of pixel computations. Up to 320 operations can be issued per cycle, and up to 256 of these can be ALU operations.

SuperGather

The Vision P6, Q6, and Q7 DSPs integrate the highly sophisticated Tensilica SuperGather technology, which reads from and writes to non-contiguous local memory locations quickly and efficiently. The SuperGather unit enables full use of the available SIMD capability for algorithms such as warping, lens distortion correction, and Canny edge tracing.
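To make the access pattern concrete, the sketch below implements a lookup-table remap: each output pixel loads the source pixel at a precomputed offset, as a lens-distortion or warp map would specify. In portable C every `src[map[i]]` is an indexed, non-contiguous load, which is the gather pattern SuperGather is described as accelerating. The function names are hypothetical, the map is a simple transpose chosen so the result is easy to check, and no SuperGather intrinsics are used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gather-based remap: dst[i] = src[map[i]].
   Offsets in map are assumed to be valid indices into src. */
void remap_u8(const uint8_t *src, const uint32_t *map, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[map[i]];                 /* indexed, non-contiguous load */
}

/* Build a map that transposes a w x h row-major image into h x w.
   Consecutive outputs read source pixels w elements apart. */
void build_transpose_map(size_t w, size_t h, uint32_t *map)
{
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x)
            map[x * h + y] = (uint32_t)(y * w + x);
}
```

A real lens-correction map would hold arbitrary offsets (and usually fractional coordinates with interpolation); the transpose map simply makes the strided reads easy to see.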

Imaging Instructions

The Vision Q6, P6, and P5 DSPs include many imaging-specific operations that accelerate 8-, 16-, and 32-bit pixel data types and video operation patterns. Examples include arithmetic operations (ADD, SUB, COMPARE, MUL, DIVIDE), bit-manipulation operations, and data-reorganization operations.
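Scalar reference versions of three such operations are sketched below: a saturating add (arithmetic that clamps at 255 instead of wrapping), an absolute difference (compare, then subtract), and an even/odd deinterleave (data reorganization). These are plain C under hypothetical names for illustration; they are not the Vision DSP instruction mnemonics or intrinsics.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating 8-bit add: clamp at 255 instead of wrapping around. */
static inline uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Absolute difference of two 8-bit pixels: compare, then subtract. */
static inline uint8_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Data reorganization: split interleaved pairs (e.g. two channels)
   into separate even and odd planes. */
void deinterleave_u8(const uint8_t *in, uint8_t *even, uint8_t *odd, size_t pairs)
{
    for (size_t i = 0; i < pairs; ++i) {
        even[i] = in[2 * i];
        odd[i] = in[2 * i + 1];
    }
}
```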

Highly Energy Efficient

The Vision P6, Q6, and Q7 DSPs are highly energy efficient compared to CPUs or GPUs for all kinds of pixel operations.

High Performance

The Vision P6, Q6, and Q7 DSPs offer a 5-way VLIW architecture, where each VLIW slot can perform 64-way SIMD 8-bit operations. The Vision family is designed to provide 320 operations per clock cycle.
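The 320-operation figure is the product of the VLIW issue width and the SIMD width. The second line is an inference rather than a stated specification: 64 lanes of 8 bits imply 512-bit vector operations, which also matches the 32-way FP16 and 16-way FP32 VFPU configurations described above.

```latex
\begin{aligned}
5\ \text{VLIW slots} \times 64\ \text{lanes} &= 320\ \text{operations per cycle} \\
64 \times 8\ \text{bits} = 32 \times 16\ \text{bits} = 16 \times 32\ \text{bits} &= 512\ \text{bits per vector}
\end{aligned}
```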

The Vision P6, Q6, and Q7 DSPs achieve even higher efficiency with their wide SIMD multiply-accumulate operations, which significantly boost performance for the pixel filtering and image-analysis functions common in computer vision applications.
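Sobel, one of the filters named earlier for the Vision Q6 DSP, is a typical example: each output pixel is a pair of 3×3 multiply-accumulate sums over 8-bit neighbors. The sketch below is a plain scalar reference under stated assumptions (16-bit L1 magnitude |Gx| + |Gy|, border pixels set to 0, hypothetical name `sobel3x3_u8`); it is not the optimized Cadence library kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>   /* abs */

/* Hypothetical scalar reference: 3x3 Sobel on an 8-bit w x h image.
   Writes |Gx| + |Gy| (at most 2040, so it fits in 16 bits) for interior
   pixels and 0 for the one-pixel border. */
void sobel3x3_u8(const uint8_t *src, uint16_t *dst, size_t w, size_t h)
{
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    for (size_t i = 0; i < w * h; ++i)
        dst[i] = 0;                           /* border (and tiny images) stay 0 */
    if (w < 3 || h < 3)
        return;

    for (size_t y = 1; y + 1 < h; ++y) {
        for (size_t x = 1; x + 1 < w; ++x) {
            int gx = 0, gy = 0;
            for (size_t r = 0; r < 3; ++r) {
                for (size_t c = 0; c < 3; ++c) {
                    int p = src[(y + r - 1) * w + (x + c - 1)];
                    gx += kx[r][c] * p;       /* multiply-accumulate */
                    gy += ky[r][c] * p;
                }
            }
            dst[y * w + x] = (uint16_t)(abs(gx) + abs(gy));
        }
    }
}
```

A vertical edge (columns 0, 0, 255) gives Gx = 1020 and Gy = 0 at the center pixel, so the output there is 1020.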

Libraries, Software and Third-Party Support

OpenCV-Like Library Support

The Vision P6, Q6, and Q7 DSPs come with over 1700 OpenCV-like functions, highly optimized to achieve the best performance on these DSPs. Of OpenCV’s more than 2500 functions, Cadence has chosen the 1700 most commonly used ones to optimize, and continues to add more functions with quarterly library updates.

OpenVX 1.1

The Vision P6 and Q6 DSPs are the first imaging/vision DSPs to pass Khronos™ Group’s conformance tests for the OpenVX™ 1.1 specification. Application developers can now take advantage of Vision P6 and Q6 DSP functionality without detailed knowledge of the hardware architecture and still achieve high performance. Cadence provides an application programming kit (APK) that supports all 40 library functions required by OpenVX 1.1, all of them already fully optimized on the Vision P6 and Q6 DSPs. Applications developed using the standard OpenVX 1.1 API can be compiled and run on the Vision P6 and Q6 DSPs without any code changes. Cadence's OpenVX framework automatically schedules and executes the appropriate DMA transfers for efficient memory access and runs highly optimized DSP vision-processing kernels in parallel with those transfers.

For more information on OpenVX 1.1, please contact us.

AI Software Support

The Vision P6, Q6, and Q7 DSPs support AI applications developed in the Caffe, TensorFlow, and TensorFlow Lite frameworks through the Tensilica Neural Network Compiler, which maps neural networks into highly optimized executable code for the target DSP using a comprehensive set of optimized neural network library functions. The Vision P6, Q6, and Q7 DSPs also support the Android Neural Networks API (NNAPI) for on-device AI acceleration in Android-powered devices.

Rich Third-Party Application Software Support

Along with math library support, Cadence also supports a very rich set of third-party applications targeting the Vision DSP family. Some of these third-party companies offer video WDR, image stabilization, super resolution, CNN, and various ADAS applications. These applications are ported and optimized on our DSPs for fast time to market. 

See our list on our Partners page

Comprehensive Hardware and Software Design Tools

Our Proven, Comprehensive Hardware and Software Design Environment

Processor design process

For Processor Designers

Cadence delivers patented, proven tools that automate the process of generating a custom processor or DSP along with matching software tools. These tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, Cadence has the tools you need to create successful products.

View the complete set of tools for processor designers.

Software development process

For Software Developers

When you need to develop application code for a Tensilica processor, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. Cadence's Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.

View the complete set of tools for software developers.

FPGA Platform

Cadence has developed a complete camera system, display system, and Vision P6 DSP on an FPGA platform that can be used to develop various vision and imaging applications. The platform has a CMOS-sensor-based camera connected over a MIPI interface and an LCD panel connected over a second MIPI interface. It also has HDMI input and output, making it a highly flexible platform for developing imaging and vision applications. Cadence has already developed various applications on this FPGA platform, including face detection and people detection.

Vision DSP Family Literature and Other Resources

Press Releases

New Cadence Tensilica Vision Q7 DSP IP Doubles Vision and AI Performance for Automotive, AR/VR, Mobile and Surveillance Markets

ArcSoft and Cadence Partner to Develop AI and Vision Applications

Cadence Boosts Vision and AI Performance with New Tensilica Vision Q6 DSP IP

Cadence Unveils Industry’s First Neural Network DSP IP for Automotive, Surveillance, Drone and Mobile Markets

Cadence Tensilica Vision P-Series DSPs are Industry’s First Imaging/Vision DSPs Certified by Khronos as OpenVX 1.1 Conformant

Related Topics

Learn more about Convolutional Neural Networks (CNN) and download presentations from our Embedded Neural Network Summit.

Chalk Talk

Podcast

Listen to Pulin Desai, starting from 4:30 of the podcast, on why the Tensilica Vision Q7 DSP is excellent for processing SLAM algorithms.

Read Blogs on Vision DSP

 Vision Q7 DSP: Real-Time Vision and AI at the Edge

 A New Era Needs a New Architecture: The Tensilica Vision Q6 DSP

 The Road Ahead for Neural Networks in Embedded Systems

 Q&A: Drones, Robots, and the New Tensilica Imaging/Vision DSP

Articles

EETimes: Tensilica’s New Vision/AI DSP Guns for SLAM

AnandTech: Cadence Announces Tensilica Vision Q7 DSP

SemiWiki: CPU, GPU, H/W Accelerator or DSP to Best Address CNN Algorithms?

eeNews Europe: Neural Network processor boosts performance of radar, lidar, vision applications

Embedded Computing Design: Neural net DSP IP pushes the performance envelope

Electronic Design: DSP Takes on Deep Neural Networks

BDTi: Next-Gen Cadence Tensilica Processor Core Claims Big Performance, Energy Consumptions Gains

EEJournal Chalk Talk: Cadence Tensilica Vision P5

Watch Videos on Vision