
Benefits of Customization

How You Will Benefit by Customizing Your DPUs

More than 1000 different DPU designs have been put into production using the Cadence® Tensilica® automated processor-generation system. Here's what our customers have told us are the major benefits of customizing their DPUs:

Differentiation

You make the changes. The DPU production process is totally automated, so no one else (not even Tensilica employees) sees your option choices or the TIE instructions you add. You get a DPU that's yours and yours alone. It's not the same CPU or DSP your competitor just licensed. And it will be virtually impossible for anyone to copy, making your design very secure.

You also get a full software tool chain, totally matched to all of the optimizations you made to your DPU. No one else can get the matching software tool chain unless you provide it to them, so no one can program the processors in your SoC unless you allow it. This gives you both differentiation and product control.

Reduced Time to Market

You get to market much faster using Tensilica DPUs no matter how you compare them to other solutions.

Compared to RTL design: It only takes minutes to make simple option choices for your Tensilica DPU. Or you can spend more time checking the effects of different changes on your required performance, power, and area. Either way, customizing a Tensilica DPU is much faster than designing a new RTL block for the same function. On top of the shorter initial design time, verification time is cut even more, because every Tensilica DPU comes with pre-verified RTL. You only need to confirm that the functionality matches your specification.

Compared to standard CPUs and DSP cores: If your standard processor is not customized, you're probably not getting the best possible performance, power, and area. As a result, you need to offload certain functions to RTL blocks, and designing those blocks brings the same time challenges described above.

Flexibility

Instead of a hard-wired block, you have a programmable processor-based solution, so you can make changes in software, even after tapeout.

DPUs can be used instead of RTL blocks by adding the same datapath elements as implemented in RTL accelerator blocks. These datapath elements include deep pipelines, parallel execution units, task-specific state registers, and wide data buses to local and global memories. This allows DPUs to sustain the same high computational throughput and support the same data interfaces as RTL hardware accelerator blocks.

The big difference is in the control of the datapaths. With RTL, you freeze the control in the FSM (finite state machine). In a Tensilica DPU, the processor-based FSM is implemented in firmware, giving you maximum flexibility to add new features or make necessary adjustments.

Best Performance, Power, and Area

Optimizing your DPU enables a much more efficient implementation than standard CPUs and DSPs, often 10X or more. Designers can add precisely the computing resources they need to achieve the desired algorithmic performance, nothing more and nothing less. Cadence's Tensilica DPUs were designed for the fastest possible data processing: data can bypass the main system bus and stream right into the processor's execution units, so the performance increases can be dramatic.

Performance improvements have very beneficial effects on overall power consumption and area. Adding a few custom instructions marginally increases the core's size, which in turn marginally increases the average power dissipated per clock cycle. However, if those custom instructions dramatically cut the total clock cycles required to perform a given workload, then the total energy consumed (energy per cycle multiplied by the total number of cycles) can be substantially reduced.

Example: A 20% increase in power dissipated per clock cycle, offset by a 3X speed-up in task execution, actually reduces energy consumption by 60% (1.2 times the energy per cycle over one-third of the cycles is 0.4 times the original energy). The reduction in required task-execution cycles allows the system either to spend much more time in a low-power sleep state or to reduce the processor’s clock frequency and core operating voltage, leading to further reductions in both dynamic and leakage power.
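
If you want to try the same trade-off with your own numbers, here's a minimal sketch in Python. It assumes the clock frequency stays the same, so the speed-up is a straight reduction in cycle count and the per-cycle power increase maps directly to energy per cycle. The function name relative_energy and its parameters are our own illustration, not part of any Tensilica tool.

```python
def relative_energy(power_increase: float, speedup: float) -> float:
    """Return task energy relative to the baseline core (1.0 = unchanged).

    power_increase: fractional rise in power per clock cycle (0.20 = 20%).
    speedup: factor by which the cycle count drops (3.0 = 3X fewer cycles).
    Assumes an unchanged clock frequency (an assumption of this sketch).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    energy_per_cycle = 1.0 + power_increase  # baseline normalized to 1.0
    cycles = 1.0 / speedup                   # baseline normalized to 1.0
    return energy_per_cycle * cycles


# The example from the text: 20% more power per cycle, 3X fewer cycles.
ratio = relative_energy(power_increase=0.20, speedup=3.0)
savings_percent = round((1.0 - ratio) * 100)
print(f"relative energy: {ratio:.2f}, savings: {savings_percent}%")
```

This covers only the first-order effect. Any extra savings from sleep states or from lowering the clock frequency and core voltage come on top of it and aren't modeled here.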