The Cadence Tensilica ConnX D2 DSP provides approximately 20% higher performance than similar dual-MAC architectures. You benefit from the flexibility of C programming with assembly-level performance. It's an ideal solution for wireless communications, disk drives (including SSD), home entertainment devices, and computer peripherals—anything that requires a highly efficient 16-bit fixed-point DSP.
The ConnX D2 DSP adds dual 16-bit MAC units and a 40-bit register file to the base Tensilica processor. It utilizes two-way SIMD (single instruction, multiple data) instructions to provide high performance on vectorizable C code. It also delivers dual-MAC performance using 64-bit very long instruction word (VLIW) instructions for code that cannot be vectorized.
The ConnX D2 DSP is tightly integrated with advanced Tensilica XCC compiler technology. The XCC compiler efficiently maps C algorithms to the ConnX D2 ISA (instruction set architecture) from native C and C intrinsic code, removing the need for time-consuming assembly code optimization.
Many high-performance DSPs are large SIMD engines that run vector data through at maximum bandwidth. These DSPs rely upon compiler vectorization of C code to hit their peak performance levels. However, non-vectorizable code is commonplace, and when a loop cannot be vectorized, such a SIMD engine degenerates into a single-MAC DSP.
By contrast, the dual MACs in ConnX D2 DSPs can be fully saturated with either SIMD instructions or VLIW instructions, delivering maximum performance on all types of C code.
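To make the distinction concrete, here is a minimal sketch in plain, portable C. It is not ConnX-specific code, and the function names `dot_q15` and `iir1_q15` are hypothetical. The first loop has independent iterations, which is the kind of code a vectorizing compiler can typically map to SIMD. The second has a loop-carried dependency, which generally prevents vectorization and is the case where, per the description above, VLIW issue is meant to keep both MACs busy.

```c
#include <stddef.h>
#include <stdint.h>

/* Vectorizable: every product is independent; only the sum is a reduction. */
int64_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Hard to vectorize: y[i] depends on y[i-1] (loop-carried dependency).
 * First-order recursive filter, y[i] = x[i] + c * y[i-1], with c in Q15.
 * Assumes >> on a negative value is an arithmetic shift, as on mainstream
 * compilers. */
void iir1_q15(const int16_t *x, int32_t *y, size_t n, int16_t c)
{
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev = x[i] + (int32_t)(((int64_t)c * prev) >> 15);
        y[i] = prev;
    }
}
```

How well either loop actually maps to the dual MACs depends on the compiler and configuration, so treat this only as an illustration of the two code shapes.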
The ConnX D2 instruction set is specifically optimized for the demanding numeric computations required for DSP, with 275 DSP-specific optimizing instructions. ConnX D2 DSPs efficiently perform 16-, 32-, and 40-bit fixed-point additions, subtractions, and multiplies with rounding and saturation. They use seven DSP-centric addressing schemes and add data manipulation instructions, including shifting, swapping, and logical operations, to provide outstanding performance on DSP algorithms.
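For readers less familiar with fixed-point arithmetic, the following sketch shows in portable C what "multiply with rounding and saturation" and "40-bit accumulation" mean numerically. It is a reference model under stated assumptions, not the ConnX D2 instruction semantics: the Q15 format, the round-to-nearest rule, the saturating behavior of the accumulator, and all names (`sat16`, `mul_q15_rs`, `mac40`) are illustrative choices.

```c
#include <stdint.h>

#define ACC40_MAX INT64_C(0x7FFFFFFFFF)   /* 2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)        /* -2^39 */

/* Saturate a wide value to the signed 16-bit range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Q15 x Q15 -> Q15 with round-to-nearest and saturation.
 * Assumes >> on a negative value is an arithmetic shift. */
int16_t mul_q15_rs(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;          /* Q30 product */
    int32_t r = (p + (1 << 14)) >> 15;   /* add half an LSB, rescale to Q15 */
    return sat16(r);
}

/* Multiply-accumulate into a 40-bit accumulator held in an int64_t.
 * Saturating at the 40-bit limits is an assumption of this model. */
int64_t mac40(int64_t acc, int16_t a, int16_t b)
{
    acc += (int32_t)a * b;
    if (acc > ACC40_MAX) return ACC40_MAX;
    if (acc < ACC40_MIN) return ACC40_MIN;
    return acc;
}
```

The only case where the Q15 product itself saturates is -1.0 times -1.0, which would otherwise produce +1.0, a value Q15 cannot represent. The eight extra accumulator bits beyond 32 are what allow many products to be summed before overflow becomes a concern.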
In addition to supporting major DSP addressing modes, ConnX D2 DSPs add specific DSP acceleration instructions such as Add-Compare-Exchange (used with Viterbi algorithms), Add Modulo, and Add Subtract. Additional instructions perform vector base loads and stores to support multiple data widths and SIMD data register loading orders, which can be aligned or unaligned.
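As a rough illustration of what these acceleration instructions replace, here is a plain-C sketch of the textbook add-compare-select step from a Viterbi decoder and of a modulo add for circular-buffer indexing. It models the general operations only. The exact semantics of the ConnX D2 Add-Compare-Exchange and Add Modulo instructions are not described here, so the types, names, tie-breaking rule, and saturation below are assumptions.

```c
#include <stdint.h>

typedef struct {
    uint16_t metric;   /* surviving path metric */
    uint8_t decision;  /* 0 or 1: which predecessor path survived */
} acs_result;

/* Add each branch metric to its path metric, compare the two candidates,
 * and keep the smaller one. Ties go to path 0; metrics saturate at 16 bits. */
acs_result acs(uint16_t pm0, uint16_t bm0, uint16_t pm1, uint16_t bm1)
{
    uint32_t c0 = (uint32_t)pm0 + bm0;
    uint32_t c1 = (uint32_t)pm1 + bm1;
    uint32_t best = (c1 < c0) ? c1 : c0;
    acs_result r;
    r.decision = (uint8_t)(c1 < c0);
    r.metric = (best > UINT16_MAX) ? UINT16_MAX : (uint16_t)best;
    return r;
}

/* Modulo add for circular addressing.
 * Precondition: idx < len and step <= len. */
uint32_t add_mod(uint32_t idx, uint32_t step, uint32_t len)
{
    uint32_t s = idx + step;
    return (s >= len) ? s - len : s;
}
```

In scalar C each trellis state costs two adds, a compare, and a select per symbol, which is why a dedicated instruction for this step pays off in Viterbi-heavy workloads.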
ConnX D2 DSPs use dual-port write technology that allows two results to be written to the register files in one instruction. This can give a maximum of three writes to the register files per cycle within the VLIW implementation.
Designers can further customize and optimize the ConnX D2 DSP using the flexible Xtensa Processor Generator (XPG). You can easily add multi-cycle execution units, registers, register files, and much more. You can also choose from a wide range of pre-verified configuration options.
The ConnX BBE16 Baseband Engine combines an 8-way SIMD, 3-issue VLIW processing pipeline with a rich and extensible set of interfaces. This high-performance Cadence Tensilica DSP is built around a core vector pipeline made of 16 18bx18b MACs. These multipliers and associated adder and multiplexer trees enable operations such as FFT butterflies, parallel complex multiply operations, and signal filter structures. The results of these operations can be full precision or truncated/rounded/saturated and shifted to meet the needs of different algorithms and implementations.
The instruction set has been optimized for performance of DSP kernel operations such as FFT and FIR as well as matrix multiplies. Acceleration has been added for a wide range of key wireless functions to deliver very high performance in wireless applications.
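To give a sense of the kind of kernel involved, the sketch below writes out a radix-4 FFT butterfly (a 4-point DFT without twiddle factors) in plain C. It is a scalar reference model only. The type `cplx32` and the function names are hypothetical, and nothing here reflects how the ConnX BBE16 instruction set actually encodes or schedules the operation.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx32;

static cplx32 cadd(cplx32 a, cplx32 b) { return (cplx32){ a.re + b.re, a.im + b.im }; }
static cplx32 csub(cplx32 a, cplx32 b) { return (cplx32){ a.re - b.re, a.im - b.im }; }

/* Multiply by -j: (re + j*im) * (-j) = im - j*re. No real multiplier needed. */
static cplx32 mul_neg_j(cplx32 a) { return (cplx32){ a.im, -a.re }; }

/* Forward 4-point DFT: out[k] = sum over n of in[n] * (-j)^(n*k). */
void radix4_butterfly(const cplx32 in[4], cplx32 out[4])
{
    cplx32 s02 = cadd(in[0], in[2]);
    cplx32 d02 = csub(in[0], in[2]);
    cplx32 s13 = cadd(in[1], in[3]);
    cplx32 d13 = mul_neg_j(csub(in[1], in[3]));

    out[0] = cadd(s02, s13);
    out[1] = cadd(d02, d13);
    out[2] = csub(s02, s13);
    out[3] = csub(d02, d13);
}
```

The butterfly itself needs only additions, subtractions, and component swaps. In a full FFT the multipliers are spent on the twiddle-factor products applied between stages.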
The ConnX BBE16 DSP is optimized for wireless communication, particularly in LTE and 4G cellular radios and multi-standard broadcast receivers. The heavy computational demands of these applications call for innovative architectures with a high degree of parallelism and efficient I/O. The ConnX BBE16 DSP meets these needs by combining an 8-way SIMD, 3-issue VLIW processing pipeline with a rich and extensible set of interfaces.
Like all ConnX DSPs, the ConnX BBE16 DSP is fully programmable in C with a vectorizing compiler. Automatic vectorization of scalar C and full support for vector data types allow the development of algorithms without the need to program at the assembly level. Native C operator overloading is supported for natural programming with standard C operators on real and complex vector data types.
The ConnX BBE16 DSP is an option for the Xtensa LX processor. It adds a highly customized DSP and baseband instruction set.
A wide variety of Load/Store operations support nine different addressing modes with support for 16b/32b scalar and vector data types. Unaligned Load/Stores with masking deliver full bandwidth Loads and Stores for unaligned data. Vector data management is supported with data packing and shifting.
Multiply operations include complex and scalar 18bx18b multiply, multiply-round, multiply-add, and multiply-subtract functions. Complex-number functions include support for conjugate arithmetic and magnitude operations as well as full-precision arithmetic and saturated/rounded outputs. The ConnX BBE16 DSP is capable of performing up to 16 multiplies per operation. It includes extended precision with guard bits on all register data, full support for double-precision data, and 40-bit accumulation on all MAC operations without performance penalty. A wide variety of arithmetic, logical, and shift operations are supported for up to eight data words per cycle. There is full support for matrix multiplication with acceleration for OFDM matrix operations.
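The complex-number operations named above can be written out in plain C as a reference. This is a scalar sketch under stated assumptions, not BBE16 code: the types `cint16` and `cacc` and the function names are hypothetical, 16-bit inputs are used for simplicity, and the 64-bit result fields merely stand in for an accumulator with guard bits.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cint16;  /* complex sample */
typedef struct { int64_t re, im; } cacc;    /* wide result, models guard bits */

/* a * b: four real multiplies, one subtract, one add. */
cacc cmul(cint16 a, cint16 b)
{
    cacc r;
    r.re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    r.im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    return r;
}

/* a * conj(b): same four multiplies with the signs rearranged. */
cacc cmul_conj(cint16 a, cint16 b)
{
    cacc r;
    r.re = (int64_t)a.re * b.re + (int64_t)a.im * b.im;
    r.im = (int64_t)a.im * b.re - (int64_t)a.re * b.im;
    return r;
}

/* Squared magnitude |a|^2 = re^2 + im^2. */
int64_t cmag2(cint16 a)
{
    return (int64_t)a.re * a.re + (int64_t)a.im * a.im;
}
```

Since one complex multiply costs four real multiplies, 16 real multipliers correspond arithmetically to four complex multiplies at a time. How the hardware actually groups them is not stated here.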
Our ConnX BBE16 DSPs also support single-cycle radix-4 and radix-8 butterfly operations for efficient high-speed FFT implementations. Support for a single-cycle 4-tap FIR filter with complex taps and a single-cycle 16-tap FIR filter with real taps enables efficient filtering operations. Special instructions supporting radix-3/5 FFT are also provided. Symmetric filters on real and complex data run at double rate, e.g., 32 real taps per cycle.
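One common way a symmetric filter can reach twice the tap rate of a general one is to add each pair of samples that share a coefficient before multiplying, which halves the multiply count. The plain-C sketch below shows that textbook folding trick next to the direct form. Whether the ConnX BBE16 DSP implements its double-rate symmetric filters this way is an assumption, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct form: one multiply per tap over the window x[0..ntaps-1]. */
int64_t fir_direct(const int16_t *x, const int16_t *h, size_t ntaps)
{
    int64_t acc = 0;
    for (size_t k = 0; k < ntaps; k++) {
        acc += (int32_t)h[k] * x[k];
    }
    return acc;
}

/* Folded form for symmetric taps (h[k] == h[ntaps-1-k], ntaps even):
 * pre-add the two samples that share a coefficient, then multiply once.
 * Only the first ntaps/2 coefficients are read. */
int64_t fir_symmetric(const int16_t *x, const int16_t *h, size_t ntaps)
{
    int64_t acc = 0;
    for (size_t k = 0; k < ntaps / 2; k++) {
        int32_t pre = (int32_t)x[k] + x[ntaps - 1 - k];
        acc += (int64_t)h[k] * pre;
    }
    return acc;
}
```

With 16 multipliers, the folded form covers 32 symmetric real taps in the same number of multiplies that the direct form spends on 16, which matches the ratio between the 16-tap and 32-tap figures quoted above.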
ConnX BBE16 DSPs support custom Ports (general-purpose wire interfaces) and Queues (FIFO) for efficient connection to coprocessors. These custom interfaces can be defined to match the interfaces of existing RTL hardware blocks. Buffered communication between two ConnX BBE16 DSPs or between a ConnX BBE16 DSP and an RTL block can be automatically implemented using Queue interfaces and is fully supported in programming and modeling tools.
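The buffering behavior of a FIFO queue between a producer and a consumer can be pictured with a small software model. The sketch below is purely a behavioral illustration in plain C. It does not use the actual Queue interface or any Tensilica API, and the depth, type, and function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4u

typedef struct {
    uint32_t data[FIFO_DEPTH];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of valid entries */
} fifo;

/* Producer side: returns false when the queue is full, which is the
 * point where a hardware producer would have to stall. */
bool fifo_push(fifo *q, uint32_t v)
{
    if (q->count == FIFO_DEPTH) return false;
    q->data[(q->head + q->count) % FIFO_DEPTH] = v;
    q->count++;
    return true;
}

/* Consumer side: returns false when the queue is empty, which is the
 * point where a hardware consumer would have to stall. */
bool fifo_pop(fifo *q, uint32_t *out)
{
    if (q->count == 0) return false;
    *out = q->data[q->head];
    q->head = (q->head + 1) % FIFO_DEPTH;
    q->count--;
    return true;
}
```

A model like this is mainly useful for reasoning about queue depth: if the producer sees `false` often, the buffer between the two blocks is too shallow for their rate mismatch.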
Local memories can be connected directly to a ConnX BBE16 DSP using the Lookup interface, bypassing the processor memory bus. This allows efficient implementation of functions that require storage of multiple intermediate datasets.
ConnX BBE16 DSPs can also be modified and extended by defining new instructions, registers, and execution units to augment the existing instruction set. With Cadence, you can choose from a wide range of configuration options.
A complete set of tools is available to support ConnX BBE16 DSPs. A comprehensive instruction set simulator (ISS) allows developers to quickly simulate and evaluate performance. The fast, functional TurboSim™ simulator option achieves speeds that are 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC and C-based system modeling can aid in full-chip simulations.
The tool set includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline in ConnX BBE16 DSPs. This comprehensive tool set also includes the linker, assembler, debugger, profiler, and graphic visualization tools. All major EDA flows are supported. See our Knowledge Center for more details on the tools.