
Tensilica DNA Processor Family for On-Device AI

Built for AI processing with industry-leading performance and power efficiency

Enabling On-Device AI Across a Wide Range of Inference from 0.5 to 100s of TMACs

Neural networks are now being developed and deployed in a wide range of markets, from IoT to surveillance to automotive. The computational, power, and memory requirements for processing this data continue to increase as new networks and new approaches to deep learning appear every day. Optimized for vision, audio, radar/lidar, and fused-sensor applications, the Cadence® Tensilica® DNA processor family offers a much-needed breakthrough in energy efficiency and performance to meet the requirements of on-device artificial intelligence (AI).

Our newly announced DNA 100 processor is well suited for on-device neural network inference applications spanning autonomous vehicles (AVs), ADAS, surveillance, robotics, drones, augmented reality (AR)/virtual reality (VR), smartphones, smart home, and IoT. The DNA 100 processor delivers up to 4.7X better performance and up to 2.3X more performance per watt than other solutions with similar multiplier-accumulator (MAC) array sizes.

Tensilica DNA Processor Family

Tensilica DNA 100 Processor


The Tensilica DNA 100 processor is an easily scalable processor comprising specialized hardware engines and a tightly coupled Tensilica DSP. Deep neural networks exhibit inherent sparsity (the presence of zeros) in both weights and activations. The DNA 100 processor's specialized hardware engines avoid loading, storing, and computing on these zeros, turning that sparsity into a performance boost through compute reduction, enhanced power efficiency, and reduced bandwidth. Retraining a neural network can further increase its sparsity and extract maximum performance from the DNA 100 processor's sparse compute engine. As a result, the DNA 100 processor delivers both high performance and power efficiency across a full range of compute, from 0.5 TeraMAC (TMAC) to 100s of TMACs.
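As a rough illustration of why zero-skipping pays off, the Python sketch below compares the MAC operations a dense engine and a sparsity-aware engine would spend on the same operand stream. It is a simplified counting model, not the DNA 100 microarchitecture, and the sparsity levels are illustrative assumptions:

    import numpy as np

    def dense_macs(weights, activations):
        # A dense engine spends one MAC on every operand pair, zeros included.
        return weights.size

    def sparse_macs(weights, activations):
        # A sparsity-aware engine skips any pair in which either operand is
        # zero, and never loads or stores the zeros.
        return int(np.count_nonzero((weights != 0) & (activations != 0)))

    rng = np.random.default_rng(0)
    n = 1 << 16
    # Assumed sparsity: ~60% zero weights (e.g., after pruning/retraining)
    # and ~50% zero activations (typical after ReLU).
    w = rng.random(n) * (rng.random(n) > 0.6)
    a = rng.random(n) * (rng.random(n) > 0.5)

    d, s = dense_macs(w, a), sparse_macs(w, a)
    print(f"dense: {d} MACs, sparse: {s} MACs, reduction: {d / s:.1f}x")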


Key Features

  • Specialized design that takes advantage of sparsity in weights and activations for compute and bandwidth reduction
  • Industry-leading performance and power efficiency
    • Up to 4.7X performance for similar array sizes
    • Up to 2.3X power efficiency for similar array sizes
  • Wide, configurable AXI bus width to sustain a variety of neural network tasks
  • Generic AI processor that runs all neural network layers
  • Architected to serve a wide range of compute requirements
    • A single DNA 100 processor scales from 0.5 to 12 effective 8-bit TMACs
    • Multiple DNA 100 processors can be stacked to achieve 100s of TMACs
  • Programmable and extensible
    • Flexible and future-proof
  • Complete AI software platform
    • Tensilica Neural Network Compiler for offline automated code generation
    • Android Neural Network API (ANN) support for dynamic automated network deployment


Performance

  • Industry-leading performance and power efficiency
    • 2550 FPS for ResNet-50 in a 4K MAC base array configuration
    • Up to 3.4 TMACs/W in 16nm
  • Architected to serve a wide range of compute requirements
    • Scalable from 0.5 to 100s of TMACs (the sketch below shows the arithmetic)
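For intuition about how sparsity stretches a physical MAC array into a higher effective rate, consider the simple model below. The 1GHz clock and the skipped fraction are illustrative assumptions, not published Cadence figures:

    def effective_tmacs(num_macs, clock_ghz, skipped_fraction):
        # Physical rate of the array: MACs per cycle times cycles per second.
        physical = num_macs * clock_ghz / 1000.0  # in TMACs
        # Skipping zero operand pairs lets the same array do the work of a
        # proportionally larger dense one.
        return physical / (1.0 - skipped_fraction)

    # A 4K MAC array at an assumed 1GHz is ~4.1 physical TMACs; if the engine
    # can skip roughly two-thirds of the operand pairs, the effective rate
    # approaches the quoted 12 TMACs.
    print(f"{effective_tmacs(4096, 1.0, 2.0 / 3.0):.1f} effective TMACs")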


Tensilica Vision C5 DSP


The Vision C5 DSP, introduced in 2017, is the industry's first standalone, self-contained AI DSP IP core optimized for vision, radar/lidar, and fused-sensor applications with high-availability AI computational needs. Targeted at the automotive, surveillance, drone, and mobile/wearable markets, the Vision C5 DSP offers 1 TMAC/s of computational capacity to run all AI computational tasks.

Key Features

  • 1 TMAC/s computational capacity in less than 1mm² of silicon area provides very high computational throughput on deep learning kernels (see the arithmetic sketch after this list)

  • 1024 8-bit MACs or 512 16-bit MACs for exceptional performance at both 8-bit and 16-bit resolutions

  • VLIW SIMD architecture with 128-way, 8-bit SIMD or 64-way, 16-bit SIMD

  • Architected for multi-core designs, enabling a multi-TMAC solution

  • Integrated iDMA and AXI4 interface

  • Uses the same proven software toolset
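The headline 1 TMAC/s figure follows directly from the MAC counts; the 1GHz clock assumed below is for illustration only:

    clock_hz = 1.0e9                     # assumed clock rate
    macs_8bit, macs_16bit = 1024, 512    # MACs per cycle, from the list above

    print(f"8-bit:  {macs_8bit * clock_hz / 1e12:.2f} TMAC/s")   # ~1.02
    print(f"16-bit: {macs_16bit * clock_hz / 1e12:.2f} TMAC/s")  # ~0.51
    # In this simple model, the 128-way, 8-bit SIMD datapath feeds the array:
    # 1024 MACs across 128 lanes works out to 8 multiply-accumulates per lane
    # per cycle.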


Software Ecosystem

The Tensilica DNA processor family is enabled with a complete AI software platform that simplifies and expedites the development and deployment of neural networks on DNA processors. Depending on the application, there is a need for both offline and online code generation tools. For pre-defined networks that require the most optimized solution, Tensilica provides an offline code generation tool, the Tensilica Neural Network Compiler. For dynamic app development and the most convenient porting path, Tensilica provides an online code generation tool, the Tensilica Android Neural Network (ANN) API.


Tensilica Neural Network Compiler

The Tensilica Neural Network Compiler is an offline code generator that automatically maps pre-trained neural networks into highly optimized, platform-specific executables. It delivers superior performance on our processors in minimal time, speeding development for our customers.


Key Features

  • Supports a wide range of frameworks, network types, and layers

  • Enables customer-specified layer definitions through custom layer support

  • Takes in pre-trained floating-point models and converts them into quantized fixed-point code (8-bit data and weights); a simplified sketch of this step follows the list

  • A custom quantization technique achieves negligible accuracy loss over a wide range of networks

  • Uses target-specific optimized NN functions for convolution and non-convolution layers

  • Includes performance-enhancement features such as optimal library function selection, kernel fusion, kernel rejection, and DMA and tile management
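The quantization step mentioned above can be pictured with a minimal sketch. Cadence's actual quantization technique is not public, so the symmetric per-tensor scheme below is only an illustrative stand-in:

    import numpy as np

    def quantize_int8(x):
        # Map a float tensor to int8 using a single per-tensor scale.
        scale = float(np.abs(x).max()) / 127.0
        if scale == 0.0:
            scale = 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(64, 64).astype(np.float32)  # stand-in for trained weights
    q, scale = quantize_int8(w)
    err = float(np.abs(dequantize(q, scale) - w).max())
    print(f"scale={scale:.5f}, max abs error={err:.5f}")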


Android Neural Network API

The Android Neural Network API (ANN) is a dynamic code generator that enables easy deployment of neural networks on Android platforms. ANN efficiently distributes the workload across all available devices in a system, letting app developers use higher-level frameworks or libraries directly to deploy trained models on device; a simplified sketch of this distribution follows the feature list below.


Key Features

  • Executes at graph, sub-graph, or layer granularity

  • Achieves the best runtime optimization using tile management, DMA management, and data rearrangement

  • A hand-optimized ML library enables the highest performance with the lowest power penalty
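What distributing a workload across available devices can look like is sketched below. The device names, supported-op sets, and per-layer granularity are invented for illustration and do not describe the actual ANN runtime:

    # Hypothetical op-support tables; a real runtime queries each driver.
    SUPPORTED = {
        "accelerator": {"conv", "depthwise_conv", "pool", "fc"},
        "cpu": {"conv", "pool", "fc", "softmax", "custom"},
    }

    def partition(layers):
        # Greedily place each layer on the accelerator when it supports the
        # op, falling back to the CPU otherwise. A real runtime schedules
        # whole sub-graphs to avoid ping-ponging between devices.
        return [(op, "accelerator" if op in SUPPORTED["accelerator"] else "cpu")
                for op in layers]

    for op, device in partition(["conv", "pool", "conv", "fc", "softmax"]):
        print(f"{op:>8} -> {device}")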


Our Proven, Comprehensive Hardware and Software Design Environment


For Processor Designers

Cadence delivers patented, proven tools that automate the process of generating a custom processor or DSP along with matching software tools. These tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, Cadence has the tools you need to create successful products.

View the complete set of tools for processor designers.


For Software Developers

When you need to develop application code for a Tensilica processor, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. Cadence's Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.

View the complete set of tools for software developers.

Literature and Other Resources 

Product Literature 

White Paper 

Hardware/Software Design Tools 

Press Releases 

Related Topics 

Learn more about Neural Networks (NN) and download presentations from our Embedded Neural Network Summit.


Read Blogs

Articles 

Electronic Design: DSP Takes on Deep Neural Networks