Xtensa Customizable Processors

Make a Dataplane Processor Uniquely Your Own

What are the WOW factors you need in your SoC design? For next-generation mobile devices and home entertainment products, you need efficient, high-performance functional blocks that are programmable to keep up with the latest standards. Use our proven, automated processor generator to customize a Cadence® Tensilica® Xtensa® DPU, and create more competitive and differentiated features with the lowest possible power.

  • Create a single product for multiple markets.
  • Reduce development time and cost by using pre-verified DPUs instead of custom logic.
  • Extend product life cycles. Change the software to add new functions without a re-spin.

See How Customizable Processors Can Help Offload Your Apps Processor

Features of a Great Architecture with the Widest Range of Customization Options

The Best Architecture and Widest Range of Customization Options

The Xtensa architecture is extremely flexible, by design. You can use an Xtensa DPU as anything from a small, low-power cache-less controller to a high-performance 16-way SIMD, 3-issue VLIW DSP core. Take advantage of these approaches to make Xtensa processors uniquely your own:

Configurability

Xtensa offers a menu of checkbox and drop-down options so you can pick just the features you need. Once you've determined the best implementation, our automated Xtensa Processor Generator creates, in a matter of minutes, pre-verified RTL and a complete matching software toolchain, including models for system integration and EDA scripts for production.

Extensibility

Add your own instructions, registers, register files, and much more using the Tensilica Instruction Extension (TIE) methodology. You can specify the functional behavior of the new data path elements in our Verilog-like TIE language, and the RTL and tool chain will be generated for you automatically.

The Most Flexible and Easy-to-Use Customization Options

Automatically Generated Hardware with Matching Software Tool Chain

Use our Eclipse-based integrated development environment (IDE) to create and test out your customizations. When you’re ready for production, the Xtensa Processor Generator automatically creates pre-verified RTL and a complete software tool chain, including a compiler, debugger, instruction set simulator, profiler, power estimator, system models, EDA tool scripts, and more. Your complete development toolchain is automatically adapted to all options and any custom extensions.

Xtensa ISA—Optimized for the Dataplane

The Xtensa instruction set architecture (ISA) is designed to meet the diverse requirements of dataplane processing. This 32-bit architecture features a compact 16- and 24-bit instruction set with modeless switching for maximum power efficiency and performance. The base architecture has 80 RISC instructions and includes a 32-bit ALU, up to 64 general-purpose 32-bit registers, and six special-purpose registers. Using this architecture, you can expect significant code size reductions, which mean higher code density and lower power dissipation.

Customize and Differentiate Your Design

You can start with a base Xtensa processor, less than 20K gates, and add what you need to customize and differentiate your design. Many high-level building blocks such as HiFi audio, ConnX DSPs, floating point, and Linux MMU are available as pre-designed blocks. Just click to add an option to your processor design. You can fine-tune performance, power, and area by simply selecting the size, type, width, and access latency of memories. You can also set load/store unit characteristics, select the number of general-purpose registers and the number and priority level of interrupts, and much more.

Our automated tools help you make smart decisions about what to change and what not to change in your design in order to meet all of your performance, power, and area requirements. Your changes can easily and immediately be tested so you can see the results—without all the guesswork.

Customization Using a Simple Verilog-Like Language

Using our Tensilica Instruction Extension (TIE) language, you can improve the performance of your application by creating one TIE instruction that does the work of multiple instructions of a general-purpose processor. Several techniques can be used to combine multiple operations into one. Using TIE you can add inputs and outputs, scratchpad memories, simple single- or multi-cycle instructions, SIMD for vectorization, or our Flexible Length Instruction Extensions (FLIX) for parallel operations.
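As a rough illustration of why combining operations helps, the sketch below counts the operations issued for a dot product in three hypothetical styles: separate multiply and add, a fused multiply-accumulate, and a 4-lane SIMD multiply-accumulate. This is plain Python, not TIE. The function names and the one-operation-per-step cost model are assumptions made for illustration, and loop, load, and store overhead are ignored.

```python
def dot_separate(a, b):
    """Multiply and add as two separate operations per element."""
    acc, ops = 0, 0
    for x, y in zip(a, b):
        product = x * y   # operation 1: multiply
        acc += product    # operation 2: add
        ops += 2
    return acc, ops


def dot_fused(a, b):
    """One hypothetical fused multiply-accumulate per element."""
    acc, ops = 0, 0
    for x, y in zip(a, b):
        acc += x * y      # counted as a single fused operation
        ops += 1
    return acc, ops


def dot_simd(a, b, lanes=4):
    """One hypothetical SIMD multiply-accumulate per group of `lanes` elements."""
    acc, ops = 0, 0
    for i in range(0, len(a), lanes):
        acc += sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        ops += 1
    return acc, ops


a = list(range(16))
b = [2] * 16
for name, fn in [("separate", dot_separate), ("fused", dot_fused), ("simd x4", dot_simd)]:
    result, ops = fn(a, b)
    print(f"{name:>8}: result={result} operations={ops}")
```

All three versions compute the same result. Only the operation count changes, which is the effect a custom instruction is meant to achieve on a real hot spot.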

Accelerate Hot Spots in Applications

You don't have to go to higher MHz to improve performance. By adding instructions in our Verilog-like language (TIE), it's possible to accelerate hot spots in your applications. You can pump data through our cores with up to two 512-bit-wide data load/stores per cycle, or bypass the bus entirely with our unique GPIO and FIFO queues.

Reduce Verification Time and Effort in the Dataplane

You can significantly reduce verification time and effort by using an Xtensa DPU to map the control FSM of new blocks to software on the processor instead of RTL. An Xtensa DPU delivers automatic RTL generation with fine-grained clock gating, saving you months of RTL design effort. And DPUs can be reprogrammed to accommodate algorithm upgrades and bug fixes, with no hardware change required. You can also create datapaths similar to hardwired logic using multi-cycle, complex functional units, and build custom, high-bandwidth data/control connections to other blocks with predictable latencies.

Preserve Backwards Compatibility

All Tensilica processors use a common base architecture that assures you of backward compatibility. This highly efficient 32-bit RISC/DSP architecture has a base configuration under 20K gates. Our base instruction set includes powerful branch instructions, such as compare-and-branch, as well as zero-overhead loops. For bit manipulation, funnel shift, bit-test-and-branch, and field-extract operations are available.

The Xtensa architecture is flexible by design. You can use Xtensa processors as anything from a small, low-power cache-less controller to a high-performance 64-MAC DSP core. Configurability of an Xtensa processor core never compromises the underlying base Xtensa instruction set, thereby ensuring availability of a robust ecosystem of third-party application software and development tools. All configurable, extensible Xtensa processors are always compatible with major operating systems, debug probes, and ICE solutions, and always come with an automatically generated, complete software development toolchain.

Complete with Matching Software Tool Chain

To ensure the availability of a robust ecosystem of third-party application software and development tools, Xtensa DPUs are always compatible with major operating systems, debug probes, and ICE solutions, and always come with an automatically generated, complete software development toolchain that matches all configuration options and added instructions.

Highest Code Density

Our 24/16-bit ISA is 25-50% smaller than 32/16-bit architectures, giving you an immediate head start with the best code density. When you decide to use our VLIW capabilities, you can use instructions up to 128 bits wide without the code bloat of conventional VLIW processors, because only those specified instructions are that wide.

Low Power

Xtensa DPUs consistently consume less power than other licensable embedded CPUs at equivalent gate counts. To reduce power consumption, DPUs employ techniques that are either built into the base hardware or into the configuration options, giving you more control over your system and memory resources.

Innovative I/O Bypasses the Bus for Maximum Speed

With Xtensa DPUs, you are no longer limited to the processing that can go through the system bus. An Xtensa DPU can quickly communicate control and status information or transfer streaming data without buffering. No load/store required. Our unique RTL-like ports act like GPIO and are wires that directly connect two Xtensa processors or an Xtensa processor to external RTL. Input and output queues act like FIFOs. With their high bandwidth and low control overhead, queues allow the Xtensa LX DPU to be used in applications with extreme data rates. If you need a high-speed interface to memory, check out our Lookup interfaces for connecting RAMs for table lookups or connecting long-latency hardware computation units without going through the bus.


FLIX for Parallel Execution

The FLIX architecture turns the Xtensa LX into a VLIW processor with 2 to 30 parallel execution units when needed. Wide 32/64/128-bit FLIX instruction formats are seamlessly intermixed with the base Xtensa 16/24-bit instructions, so there is no mode switch penalty.

With FLIX, the Xtensa LX processor can deliver the ultra-high performance characteristics of an ultra-wide instruction word processor without the negative code size implications typically found in VLIW or UVLIW processors. In fact, Xtensa LX processors with FLIX can deliver higher performance and smaller code size at the same time. This performance comes with very little overhead, adding only 2,000 gates to processor size for instruction decode and control.
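The code-size argument can be made concrete with a small calculation. The sketch compares a program in which only the hot loop uses wide bundles against one in which every instruction word is padded to a fixed width. The instruction counts are made-up illustrative numbers, not measured data, and the fixed-width case is a deliberately simple worst-case assumption that ignores how a real VLIW would pack several operations into one word.

```python
def code_bytes(instruction_counts):
    """Total code size in bytes for a {width_in_bits: count} mapping."""
    return sum(width * count for width, count in instruction_counts.items()) // 8


# Hypothetical program: mostly 16- and 24-bit base instructions,
# with 64-bit bundles used only in the hot loop.
mixed = {16: 400, 24: 500, 64: 100}

# Same number of instruction words, but every one padded to 64 bits
# (illustrative assumption for a fixed-width encoding).
total_words = sum(mixed.values())
fixed = {64: total_words}

mixed_size = code_bytes(mixed)
fixed_size = code_bytes(fixed)
print(f"mixed-width: {mixed_size} bytes, fixed 64-bit: {fixed_size} bytes")
```

Changing the counts in `mixed` shows how the gap depends on the share of code that actually needs wide instructions.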

 

Features Comparison

What are the differences between Xtensa 10 and Xtensa LX5?

Xtensa 10 and Xtensa LX5 Comparison Chart

Option | Xtensa 10 | Xtensa LX5

MAJOR ISA CONFIGURATION OPTIONS
MAC16 | Yes | Yes
MUL16/MUL32 | Yes | Yes
IEEE 754-compliant floating point | Yes | Yes
Vectra LX | Not Available | Yes
HiFi audio/voice DSPs | Not Available | Yes
ConnX D2 DSP engine | Not Available | Yes
ConnX BBE16, BBE32, and BBE64 | Not Available | Yes
Linux MMU | Yes | Yes

PIPELINE/ARCHITECTURE OPTIONS
Pipeline stages | 5 | 5/7
FLIX technology | Not Available | Yes
GPIO32 option (two 32-wire ports) | Yes | Yes
QIF32 option (two 32-bit queue interfaces) | Yes | Yes

PROCESSOR INTERFACE OPTIONS
PIF and XLMI | Yes | Yes
Load/store units | One | One or Two
Designer-defined ports and queues | Limited Config Options | Yes

OTHER OPTIONS
3-way 64-bit VLIW configuration | Not Available | Yes
5- or 7-stage pipeline depth | Not Available | Yes
Second load/store unit option | Not Available | Yes
Lookup interfaces to connect to RAMs | Not Available | Yes
Virtually unlimited bandwidth I/O options | Not Available | Yes
ARM® CoreSight™-compatible debug | Yes | Yes
Memory bank RAM support | Yes | Yes
Dual load/store with caches | Not Available | Yes
Multiple FLIX widths supported | Not Available | Yes
Performance counters | Yes | Yes
Dynamic and leakage power reduction features | Yes | Yes

 

Customize to Get Highest Performance, Lowest Power, Most Efficient Processor for Your Application

Discover How Easy It Is to Customize Xtensa Processors

Markets

You can use Xtensa DPUs as 32-bit RISC controllers with minimal customization for memories and interfaces. Or you can join other designers who are taking advantage of the incredible possibilities beyond simple customizations. See our Features page to explore the many options to unleash the power of Xtensa processors as DSPs or to enable other functions to match your requirements.

By selecting and configuring pre-defined elements of the architecture and by inventing completely new instructions and hardware execution units, your Xtensa DPU can deliver performance levels that are orders of magnitude more efficient than other 32-bit processors. And you can do this in a fraction of the time it takes to develop and verify an RTL-based solution.

The Cadence family of Tensilica processors is designed from the start to be basic building blocks in system-on-a-chip (SoC) designs.

Consume Less Power by Adding Custom Instructions

Tensilica DPUs can deliver performance comparable to an RTL accelerator block while running at low operating frequencies, thus consuming less power.

A focus on total energy consumption is key. A designer can add a few custom instructions, and that extension will increase the DPU’s size, which in turn increases the power dissipated per clock cycle (an increase in mW/MHz). However, if the custom instructions dramatically cut the total clock cycles required to perform a given workload (the target C-code application), then the total energy consumed (energy per cycle multiplied by total cycle count) can be substantially reduced.

Example: a 20% increase in power dissipated per clock cycle, offset by a 3X speed up in task execution, actually reduces energy consumed by 60%.
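The arithmetic behind this example can be checked with a short sketch. The function name and the normalized baseline of 1.0 are illustrative assumptions, not part of any Cadence tool.

```python
def relative_energy(power_increase: float, speedup: float) -> float:
    """Return energy after customization as a fraction of baseline energy.

    Energy = (energy per cycle) x (cycle count). With the baseline
    normalized to 1.0, the added instructions scale energy per cycle by
    (1 + power_increase) and divide the cycle count by `speedup`.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 + power_increase) / speedup


# The example from the text: 20% more power per cycle, 3x fewer cycles.
ratio = relative_energy(power_increase=0.20, speedup=3.0)
savings_percent = round((1.0 - ratio) * 100)
print(f"Energy is {ratio:.2f}x baseline, a {savings_percent}% reduction")
```

The same function shows where the break-even point lies: energy only drops when the speedup exceeds the factor by which power per cycle grew.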

Many Benefits of Using a Unique DPU Design

By using a unique DPU, you make it much harder for competitors to copy your ideas. You get a version of a processor that no one else can buy. No one else can get the matching software tool chain unless you provide it to them, so no one can program the processors in your ASIC unless you allow it. In addition, your optimized DPU will deliver better performance, operate at lower clock rates, and consume less energy than industry-standard, fixed-ISA microprocessor cores.

DPUs for DSP

Many designs use a standard 32-bit processor coupled with a separate core to accelerate digital signal processing (DSP). However, using two processors means that data must transfer between the processor and DSP core over some sort of interconnect, usually a standard bus, which slows performance.

Xtensa DPUs don’t require a separate DSP core because DSP functions can be built into the processor itself, eliminating inter-processor data transfers over a slow processor bus. See our Audio, Voice and Speech section and our Baseband and RF Signal Processing section for examples of how we've customized our Xtensa processors for these intensive DSP functions.

DPUs as RTL Alternatives

DPUs can be used as alternatives to hand-coded RTL blocks by adding the same datapath elements as implemented in RTL accelerator blocks. These datapath elements include deep pipelines, parallel execution units, task-specific state registers, and wide data buses to local and global memories. This allows DPUs to sustain the same high computation throughput and to support the same data interfaces as RTL hardware accelerators.

However, control of DPU datapaths is very different from control of their RTL counterparts. Cycle-by-cycle control of a DPU’s datapaths is not frozen in the hardware FSM’s state transitions. Instead, the FSM is implemented in firmware, which greatly reduces the effort needed to fix an algorithm bug or add new features. In a firmware-controlled FSM, control-flow decisions occur in branches, load and store operations implement memory accesses, and computations become explicit sequences of general-purpose and application-specific instructions.
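To illustrate the idea of moving a control FSM out of hardware state transitions and into firmware, here is a minimal sketch of a table-driven state machine. It is written in Python for readability rather than as real DPU firmware. The packet-framing states, the event names, and the `step` and `run` functions are all hypothetical.

```python
# Hypothetical framing FSM: the transition table is data, so fixing a
# bug or adding a state is a firmware change rather than a hardware one.
TRANSITIONS = {
    ("IDLE", "start"): "HEADER",
    ("HEADER", "byte"): "PAYLOAD",
    ("PAYLOAD", "byte"): "PAYLOAD",
    ("PAYLOAD", "end"): "IDLE",
}


def step(state, event):
    """Return the next state; unknown (state, event) pairs reset to IDLE."""
    return TRANSITIONS.get((state, event), "IDLE")


def run(events, state="IDLE"):
    """Feed a sequence of events through the FSM and return the visited states."""
    visited = [state]
    for event in events:
        state = step(state, event)  # a control-flow decision made in software
        visited.append(state)
    return visited


print(run(["start", "byte", "byte", "end"]))
```

Adding or correcting a transition means editing one entry in `TRANSITIONS`, which mirrors the point made above about fixing algorithm bugs without touching hardware.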

An Automated Development Process Speeds Customization

Cadence has fine-tuned the patented Tensilica processor customization process, making it as foolproof and secure as possible.

Automated Customization Processor Overview

Fully Automated Hardware and Software Tools Generation

Use our Eclipse-based integrated development environment (IDE) to create and test out your customizations. When you’re ready for production, our Xtensa Processor Generator automatically creates pre-verified RTL and a complete software tool chain, including a compiler, debugger, instruction set simulator, profiler, power estimator, system models, EDA tool scripts, and more.

Your complete development toolchain is automatically adapted to all options and any custom extensions.

Highly Automated Design Tools Speed Your Adoption and Integration Processes

The Industry's Most Powerful and Complete Design Environment



For Processor Designers

Cadence Tensilica tools automate the process of generating a custom Xtensa DPU along with matching software tools. These patented tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, we have the tools you need to create successful products.

View the complete set of tools for processor designers.

For Software Developers

When you're ready to develop application code for an Xtensa DPU, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. The Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.

View the complete set of tools for software developers.

Literature and Other Resources—Learn More About Xtensa Processors

If You're Considering Xtensa DPUs for Your Next SoC Design and Want to Learn More, Read On.

Explore these helpful resources:

Product Literature

Xtensa 11 Data Sheet
Xtensa LX6 Data Sheet

Hardware/Software Design Tools

Xtensa Processor Developer's Toolkit
Xtensa Software Developer's Toolkit

White Papers