DSP IP Cores for Baseband and RF Signal Processing | Cadence IP

ConnX DSPs for Baseband/Communications

Tackling the Hard Tasks in the Dataplane

Baseband and RF Signal Processing IP

Cadence offers more ways to perform complex signal processing than any other IP company. Cadence offers a full range of processors and DSPs for the best combination of high performance, low power, and small area, exactly tailored to your application.

From the lightweight dual-MAC ConnX D2 to the super-high-performance 64-MAC ConnX BBE64EP, these designs are ready to run. Go with the industry's best performing and most compact low-power DSPs for applications from SmartGrid to 802.11ac modems and LTE-Advanced. Cadence also offers several special-function processors, so you can accelerate your design effort instead of designing these common functions yourself.

ConnX Family

No matter which Tensilica solution you choose, remember that it's based on our 32-bit Xtensa® RISC processor and toolset. Unlike a traditional fixed-configuration DSP core, all Tensilica DSPs and processors are fully:

  • Configurable—Select the pre-built functions you need with full C language, library, and verification support.
  • Extensible—Add custom instructions using our Verilog-like TIE language. The results are automatically integrated into the programming tools with full verification support. Custom ports allow your hardware accelerators to be directly integrated into the core, appearing to the programmer as a standard instruction.
  • Scalable—Configurable I/O ports and memory allow you to easily scale your performance from a simple single-core design to a sophisticated direct ported multi-core solution.

Whether your need is for a single core, a homogeneous multi-core solution, or a highly optimized heterogeneous mix of processors, DSPs, and hardware accelerator blocks, our ConnX family of DSPs is the ideal place to get started on your communications platform design.

ConnX D2 Dual-MAC, 16-bit Fixed-Point Communications DSP

A Flexible 2-MAC DSP, Programmable in C

The Cadence Tensilica ConnX D2 DSP, used with the Xtensa LX processor core, provides approximately 20% higher performance than similar dual-MAC architectures. You benefit from the flexibility of C programming with assembly-level performance. It's an ideal solution for wireless communications, disk drives, home entertainment devices, computer peripherals, and anything else that requires a highly efficient 16-bit fixed-point DSP.

The ConnX D2 option adds dual 16-bit MAC units and a 40-bit register file to the base Xtensa LX processor. It utilizes two-way SIMD (single instruction, multiple data) instructions to provide high performance on vectorizable C code. It also delivers dual-MAC performance using 64-bit very long instruction word (VLIW) instructions for code that cannot be vectorized.

Exceptional Out-of-the-Box Performance

The ConnX D2 DSP is tightly integrated with advanced Tensilica XCC compiler technology. The XCC compiler efficiently maps C algorithms to the ConnX D2 ISA (instruction set architecture) from native C and C intrinsic code, removing the need for time-consuming assembly code optimization. 

Consistent Performance—Even When Vectorization Is Not Possible

Many high-performance DSPs are large SIMD engines that run vector data through at maximum bandwidth. These DSPs rely upon compiler vectorization of C code to hit their peak performance levels.  However, if a loop isn't vectorizable, then the SIMD engine degenerates into a single-MAC DSP, and non-vectorizable code is commonplace.

By contrast, the dual MACs in ConnX D2 DSPs can be fully saturated with either SIMD instructions or VLIW instructions, delivering maximum performance on all types of C code.
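To make the distinction concrete, here is a minimal sketch in portable C. It is not ConnX-specific code, and the function names are hypothetical. The first loop has independent products, so a vectorizing compiler can pair them up for a two-way SIMD MAC. The second loop has a loop-carried dependency (each output needs the previous one), so it cannot be vectorized across samples, yet each iteration still contains two multiplies that a dual-MAC VLIW schedule could in principle issue side by side.

```c
#include <stdint.h>
#include <stddef.h>

/* Vectorizable: every product is independent; only the sum is carried.
   A 64-bit accumulator stands in for a wide DSP accumulator (assumption). */
int64_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Not vectorizable across samples: y[i] depends on y[i-1].
   Each iteration has two multiplies (b0 * x[i] and a1 * y_prev).
   Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but true of mainstream compilers. */
void iir1_q15(const int16_t *x, int16_t *y, size_t n, int16_t b0, int16_t a1)
{
    int16_t y_prev = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t acc = (int64_t)b0 * x[i] + (int64_t)a1 * y_prev; /* Q30 */
        acc >>= 15;                          /* back to Q15, truncating */
        if (acc > INT16_MAX) acc = INT16_MAX; /* saturate */
        if (acc < INT16_MIN) acc = INT16_MIN;
        y_prev = (int16_t)acc;
        y[i] = y_prev;
    }
}
```

How a given compiler actually schedules either loop depends on the target and its options; the sketch only shows why the two loop shapes differ.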

Advanced DSP Instruction Set

The ConnX D2 instruction set is specifically optimized for the demanding numeric computations required for DSP, with 275 DSP-specific optimizing instructions. ConnX D2 DSPs efficiently perform 16-, 32-, and 40-bit fixed-point additions, subtractions, and multiplies with rounding and saturation. They use seven DSP-centric addressing schemes and add data manipulation instructions including shifting, swapping, and logical operations to provide outstanding performance on DSP algorithms.
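For readers less familiar with fixed-point arithmetic, the following sketch shows what "with rounding and saturation" typically means for 16-bit Q15 data. It is generic portable C with hypothetical helper names, not the ConnX D2 intrinsics, and it assumes round-to-nearest by adding half an LSB before the shift.

```c
#include <stdint.h>

/* Clamp a 32-bit value into the 16-bit signed range. */
int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Saturating add: the sum is formed in 32 bits, then clamped. */
int16_t add_sat_q15(int16_t a, int16_t b)
{
    return sat16((int32_t)a + b);
}

/* Q15 multiply with rounding and saturation.
   Assumes >> on a negative value is an arithmetic shift. */
int16_t mul_round_q15(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;  /* Q30 product, magnitude at most 2^30 */
    p += 1 << 14;                /* add half an LSB of the Q15 result */
    return sat16(p >> 15);       /* -1.0 * -1.0 clamps to just under +1.0 */
}
```

Without saturation, `30000 + 10000` would wrap to a negative number in 16 bits; with it, the result clamps to 32767.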

In addition to supporting major DSP addressing modes, ConnX D2 DSPs add specific DSP acceleration instructions such as Add-Compare-Exchange (used with Viterbi algorithms), Add Modulo, and Add Subtract. Additional instructions perform vector base loads and stores to support multiple data widths and SIMD data register loading orders, which can be aligned or unaligned.
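The exact semantics of the ConnX D2 instruction are not given here, but the operation it accelerates is the inner step of a Viterbi decoder. A minimal sketch in plain C, with a hypothetical `acs` helper and the assumption that a smaller metric is better:

```c
#include <stdint.h>

typedef struct {
    uint16_t metric;   /* surviving path metric */
    uint8_t  decision; /* 0 = predecessor 0 survived, 1 = predecessor 1 */
} acs_result;

/* One add-compare-select step: extend two candidate paths by their
   branch metrics and keep the one with the smaller total. */
acs_result acs(uint16_t pm0, uint16_t bm0, uint16_t pm1, uint16_t bm1)
{
    uint32_t cand0 = (uint32_t)pm0 + bm0;  /* add */
    uint32_t cand1 = (uint32_t)pm1 + bm1;
    acs_result r;
    r.decision = (uint8_t)(cand1 < cand0); /* compare */
    uint32_t best = r.decision ? cand1 : cand0; /* select */
    r.metric = best > UINT16_MAX ? UINT16_MAX : (uint16_t)best; /* saturate */
    return r;
}
```

A decoder runs this step once per trellis state per received symbol, which is why collapsing it into a single instruction pays off.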

ConnX D2 DSPs use dual-port write technology that allows two results to be written to the register files in one instruction. This can give a maximum of three writes to the register files per cycle within the VLIW implementation.

Easily Further Customized

Because the ConnX D2 DSP engine is a click-box option for the Xtensa processor, designers can further customize the processor using the flexible Xtensa Processor Generator. You can easily add multi-cycle execution units, registers, register files, and much more. You can also choose from a wide range of configuration options.

ConnX BBE16 16-MAC VLIW Baseband DSP

A full-featured high-performance 16-MAC DSP, programmable in C

The ConnX BBE16 baseband DSP combines an 8-way SIMD, 3-issue VLIW processing pipeline with a rich and extensible set of interfaces. This high-performance Cadence Tensilica DSP is built around a vector pipeline made of 16 18-bit x 18-bit MACs. These multipliers and associated adder and multiplexer trees enable operations such as FFT butterflies, parallel complex multiply operations, and signal filter structures. The results of these operations can be full precision or truncated/rounded/saturated and shifted to meet the needs of different algorithms and implementations.

The instruction set has been optimized for performance of DSP kernel operations such as FFT and FIR as well as matrix multiplies. Acceleration has been added for a wide range of key wireless functions to deliver very high performance in wireless applications.

The ConnX BBE16 DSP is optimized for wireless communication, particularly in LTE and 4G cellular radios and multi-standard broadcast receivers. The high computation requirements of these applications require innovative architectures with a high degree of parallelism and efficient I/O. The ConnX BBE16 DSP meets these needs by combining an 8-way SIMD, 3-issue VLIW processing pipeline with a rich and extensible set of interfaces.

Like all ConnX DSPs, the ConnX BBE16 DSP is fully programmable in C with a vectorizing compiler. Automatic vectorization of scalar C and full support for vector data types allows the development of algorithms without the need to program at the assembly level. Native C operator overloading is supported for natural programming with standard C operators on real and complex vector data types.

Instruction set optimized for DSP

The ConnX BBE16 DSP is an option for the Xtensa LX processor. It adds a highly customized DSP and baseband instruction set.

A wide variety of Load/Store operations support nine different addressing modes with support for 16-bit/32-bit scalar and vector data types. Unaligned Load/Stores with masking deliver full bandwidth loads and stores for unaligned data. Vector data management is supported with data packing and shifting.

Multiply operations include complex and scalar 18-bit x 18-bit multiply, multiply-round, multiply-add, and multiply-subtract functions. Complex-number functions include support for conjugate arithmetic and magnitude operations as well as full precision arithmetic and saturated/ rounded outputs. The ConnX BBE16 DSP is capable of performing up to 16 multiplies per operation. BBE16 includes extended precision with guard bits on all register data and full support of double precision data, and 40-bit accumulation on all MAC operations without performance penalty. A wide variety of arithmetic, logical and shift operations are supported for up to eight data words per cycle. There is full support for matrix multiplication with acceleration for OFDM matrix operations.
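As an illustration of the conjugate arithmetic and wide accumulation described above, the sketch below computes a complex correlation sum in portable C. The type and function names are hypothetical, and 64-bit accumulators stand in for the 40-bit hardware accumulators. Each complex product needs four real multiplies, so 16 real multipliers would correspond to four such products at a time.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { int16_t re, im; } cq15;  /* complex 16-bit sample */
typedef struct { int64_t re, im; } cacc;  /* wide complex accumulator */

/* Accumulate sum of x[i] * conj(y[i]) at full precision.
   x * conj(y) = (xr*yr + xi*yi) + j(xi*yr - xr*yi) */
cacc cmac_conj(const cq15 *x, const cq15 *y, size_t n)
{
    cacc acc = {0, 0};
    for (size_t i = 0; i < n; i++) {
        acc.re += (int64_t)x[i].re * y[i].re + (int64_t)x[i].im * y[i].im;
        acc.im += (int64_t)x[i].im * y[i].re - (int64_t)x[i].re * y[i].im;
    }
    return acc;
}
```

Rounding, saturation, and shifting back to 16 bits would be applied once at the end, after the accumulation, so intermediate precision is not lost.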

Our ConnX BBE16 DSPs also support single-cycle radix-4 and radix-8 butterfly operations for efficient high-speed FFT implementations. Support for a single-cycle 4-tap FIR filter with complex taps and single-cycle 16-tap FIR filter with real taps enables efficient filtering operations. Special instructions supporting radix 3/5 FFT are also provided. Symmetric filters on real and complex data run at double rate, e.g., 32 real taps per cycle.
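One common reason symmetric filters can run at double rate is that coefficients which mirror each other can share a multiplier: the two matching samples are added first, then multiplied once. Whether the BBE16 does exactly this is an assumption; the sketch below, with hypothetical function names, only shows that the folded form gives the same result with half the multiplies.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Direct form: one multiply per tap over a window of ntaps samples. */
int64_t fir_direct(const int16_t *x, const int16_t *h, size_t ntaps)
{
    int64_t acc = 0;
    for (size_t k = 0; k < ntaps; k++)
        acc += (int64_t)h[k] * x[k];
    return acc;
}

/* Folded form for symmetric taps (h[k] == h[ntaps-1-k], ntaps even):
   pre-add the two samples that share a coefficient, multiply once. */
int64_t fir_symmetric(const int16_t *x, const int16_t *h, size_t ntaps)
{
    assert(ntaps % 2 == 0);
    int64_t acc = 0;
    for (size_t k = 0; k < ntaps / 2; k++) {
        int32_t pre = (int32_t)x[k] + x[ntaps - 1 - k]; /* needs 17 bits */
        acc += (int64_t)h[k] * pre;
    }
    return acc;
}
```

With 16 multipliers, the folded form covers 32 real taps, which matches the figure quoted above.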

Innovative I/O using Port, Queue, and Lookup interfaces

ConnX BBE16 DSPs support custom Ports (general-purpose wire interfaces) and Queues (FIFOs) for efficient connection to coprocessors. These custom interfaces can be defined to match the interfaces of existing RTL hardware blocks. Buffered communication between two ConnX BBE16 DSPs or between a ConnX BBE16 DSP and an RTL block can be automatically implemented using Queue interfaces and is fully supported in programming and modeling tools.

Local memories can be connected directly to a ConnX BBE16 DSP using the Lookup interface, bypassing the processor memory bus. This allows efficient implementation of functions that require storage of multiple intermediate datasets. 

Extensible—modify further for your requirements

ConnX BBE16 DSPs can also be modified and extended by defining new instructions, registers, and execution units to augment the existing instruction set. With Cadence, you can choose from a wide range of configuration options.

Supported by a complete set of hardware and software tools

A complete set of tools is available to support ConnX BBE16 DSPs. A comprehensive instruction set simulator (ISS) allows developers to quickly simulate and evaluate performance. The fast, functional TurboSim™ simulator option runs 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC and C-based system modeling can aid in full-chip simulations.

The tool set includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline in ConnX BBE16 DSPs. This comprehensive tool set also includes the linker, assembler, debugger, profiler, and graphic visualization tools. All major EDA flows are supported. See our Knowledge Center for more details on the tools.

ConnX BBE16EP, BBE32EP and BBE64EP DSPs for scalable baseband processing

16-, 32- and 64-MAC DSPs for demanding baseband processing

The latest additions to the product line, the Cadence Tensilica ConnX BBE16EP, BBE32EP and BBE64EP enhanced performance DSPs for baseband applications, are optimized for complex number processing. The 16-MAC BBE16EP, 32-MAC BBE32EP and 64-MAC BBE64EP offer significant improvements in maximum frequency and algorithmic performance while reducing both silicon area and power consumption versus earlier generations of DSPs. They provide unprecedented flexibility in implementing systems at power consumption levels that significantly reduce the need for hardware accelerators. With identical architectures and N-way programming model compatibility, the BBExxEP family of DSPs provides you with significant design flexibility and an easy upgrade path when needed.

The ConnX BBE EP DSPs are suitable for both infrastructure and user equipment applications. The ConnX BBE64EP is suited for multiple RF stream processing applications such as LTE-Advanced, 5G and other high-throughput MIMO systems such as 802.11ac. All of them can be easily optimized through check-box options. 

Instruction set features

  • 16-way (BBE16EP), 32-way (BBE32EP) or 64-way (BBE64EP) multiplier-accumulator (MAC), dual 8/16/32-way arithmetic logic unit (ALU) single instruction, multiple data (SIMD) engines
  • 5-issue very long instruction word (VLIW) for parallel load/store, MAC, and ALU ops
  • 32-bit scalar ALU
  • Advanced precision for matrix inversion and divide operations
  • Optimized instructions for complex arithmetic, polynomial evaluation, matrix multiplication, block floating point, bit-oriented operations, and vector compression and expansion
  • Predicated vector instructions
  • Wide memory bandwidth—128/256/512-bit load/store and 128/256/512-bit load units
  • 10-stage DSP pipeline
  • High-performance C/C++ compiler with automatic vectorization of scalar C and full support for vector data
  • TI intrinsic support, rich application libraries

The instruction set and architecture are tuned to meet the performance and computation requirements of advanced wireless systems. Compared with the typical user equipment DSP that offloads many of the computationally intense operations to hardware acceleration blocks, ConnX BBE EP DSPs offer a more complete instruction set plus options for accelerating key algorithms while still remaining fully programmable.

Load/store operations support five standard addressing modes and two specialized modes: bit reverse for FFTs and circular for functions like circular buffering. The addressing modes support a variety of data formats including scalar and vector, real, and complex data types.
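The two specialized modes are easy to picture in software. The sketch below shows, in portable C with hypothetical helper names, the index calculations that bit-reverse and circular addressing perform in hardware as part of the load or store; it is an illustration, not the ConnX implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of idx. A radix-2 FFT of size 2^bits
   reads or writes its data in this permuted order. */
uint32_t bit_reverse(uint32_t idx, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (idx & 1u);
        idx >>= 1;
    }
    return r;
}

/* Advance an index by `step` inside a buffer of `len` elements,
   wrapping back to the start when the end is passed. */
uint32_t circ_advance(uint32_t idx, uint32_t step, uint32_t len)
{
    assert(len > 0 && idx < len);
    return (uint32_t)(((uint64_t)idx + step) % len);
}
```

For an 8-point FFT, index 1 (binary 001) maps to index 4 (binary 100). For an 8-entry delay line, advancing from index 6 by 3 lands on index 1. Doing this in the address generator removes the extra instructions from the inner loop.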

Configurable, extensible, scalable

The ConnX BBE EP DSPs provide 13 pre-built vector options, which are included/excluded as checkboxes when defining a DSP from within the tools. These checkboxes result in seamless integration of a feature into the hardware, the compiler, the modeling tools, and the verification scripts. Using these capabilities, you can build a custom DSP without the large development schedule impact that a change in hardware design would normally involve.

Integrating an optimized FFT solution is as simple as checking a box when configuring a ConnX processor. All of the verification and tool support is provided automatically as part of the tool chain. The ConnX BBE EP DSPs can be extended to support custom ports (general-purpose wire interfaces) and queues (FIFOs) for efficient connection to offload accelerators. These custom interfaces can be defined to match the interfaces of existing third-party IP. Buffered communication between two ConnX DSPs or between a ConnX DSP and an offload accelerator can be automatically implemented using Queue interfaces and is fully supported in programming and modeling tools. These interfaces are dedicated to the offload accelerator and offer single-cycle access. Thus, ConnX BBE EP DSPs can access hardware offload accelerators in a single-cycle deterministic operation, greatly reducing power consumption without impacting the shared system bus.

Local memories can be connected directly to a ConnX DSP, bypassing the system memory bus and allowing efficient implementation of functions that require storage of multiple intermediate datasets.

Application space—LTE, WCDMA/HSPA+, Wi-Fi, and Beyond

ConnX BBE16EP, BBE32EP and BBE64EP DSP baseband engines are high-performance DSPs designed for next-generation communication systems such as LTE Advanced, 802.11ac, and DVB. Advanced precision options are specifically designed to meet the precision and performance requirements associated with advanced MIMO systems. In addition to vector-based filtering, FFT, and matrix capabilities, a fully-featured instruction set includes a full range of bit-oriented operations used in 3G systems such as UMTS, cdma2000, and 1xEV-DO. In this way, ConnX BBE EP DSPs excel at multi-standard physical layer processing, providing opportunities for hardware savings and a broader scope of applications than a dedicated fixed-hardware solution can provide.

As physical layer (PHY) system developers move to advanced standards such as LTE-Advanced, they face the need for dramatic increases in performance from their processing platforms. ConnX DSPs meet this challenge with highly parallel vector engines. When processing needs scale beyond that of a single DSP, the ConnX BBE EP family provides smooth support for multi-core solutions. Multi-core solutions may involve other DSPs from the ConnX BBE family or extend into other Tensilica DSP processors.

System designers also face considerable uncertainty as to the algorithmic implementation that will deliver the best performance. In fact, as systems become more diverse with wide-scale deployment of heterogeneous networks, a solution that works best for a microcell operating on a bullet train in Japan may be very different from one that will work best for a similar microcell operating in a subterranean pedestrian mall in Montreal. With a fully programmable software-based solution using the ConnX BBE64EP core, you could implement both solutions on a single platform, permitting it to evolve without going back for a re-spin of silicon for new functionality or bug fixes.

The configurability and extensibility of ConnX DSPs also allows you to optimize the hardware for specific algorithms without the typical development delays associated with an ASIC design.

Using ConnX BBE16EP/32EP/64EP DSPs, you can deliver a working solution in less time than with a traditional hardware or a hybrid hardware/DSP design. You can also take advantage of the hardware platform for a broader range of applications, over a longer period of time. Ultimately, this helps reduce design time and costs, helping you to finish faster and be more competitive in the marketplace.

Supported by a complete set of hardware and software tools

Our complete set of tools includes a comprehensive instruction set simulator (ISS), which allows developers to quickly simulate and evaluate performance. The fast, functional TurboSim™ simulator option runs 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC and C-based system modeling can aid in full-chip simulations.

The tool set includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline. This comprehensive tool set also includes the linker, assembler, debugger, profiler, and graphic visualization tools. All major EDA flows are supported. See our Knowledge Center for more information on our tools and the hardware/software development process.

Specialized Baseband Processors

Specialized Processors for Complex Baseband Functions

Complementing the ConnX BBE DSPs are special-function processors that target resource-intensive algorithms. These processors offer programmable, low-power, high-performance alternatives to hard-coded ASIC accelerators that otherwise would limit the overall flexibility of your system design. All are based on the proven Tensilica Xtensa architecture.

ConnX BSP3—Bit-Stream Processor

The ConnX BSP3 (Bit-Stream Processor) is designed for use in baseband PHY systems found in LTE and HSPA+ cellular radios and multi-standard broadcast receivers. It is specifically optimized for processing and manipulating bit streams, including operations for CRC, interleavers, scramblers, and more.
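To show the kind of bit-level work involved, here is a minimal bit-serial CRC in portable C. It uses the widely known CCITT polynomial x^16 + x^12 + x^5 + 1 with a zero initial value as an example; the polynomial choice and the function name are assumptions for illustration, not a description of the BSP3 instruction set.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-serial CRC-16, polynomial 0x1021, initial value 0, MSB first.
   A general-purpose core spends eight shift/test/XOR steps per byte. */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8); /* bring in next byte */
        for (int b = 0; b < 8; b++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u); /* shift, reduce */
            else
                crc = (uint16_t)(crc << 1);             /* shift only */
        }
    }
    return crc;
}
```

The inner loop is a chain of single-bit shifts, tests, and XORs, which is exactly the pattern that maps poorly onto word-oriented processors and well onto a processor with dedicated bit-stream operations.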

The ConnX BSP3 offers an architecture and optimized instruction set with the parallel execution of a 3-issue VLIW machine. The dual 32-bit wide data path combined with the 3-issue VLIW allows single-cycle load, compute, and store. Additionally, it can load four vectors in one cycle. This is all done in a small-size processor giving very high performance per area and power.

  • Dual 32b load/store supports up to 4MB addressable region
  • Optimized for 16-, 20-, 32-, and 40-bit vector operations
  • 128-bit-wide vector files, allowing the loading, computation, and storing of four 32-bit words, eight 16-bit words, or sixteen 8-bit words at a time
  • Very high performance for small area and low power for bit computation

ConnX SSP16—Soft-Stream Processor

The ConnX SSP16 (Soft-Stream Processor) is specifically optimized for processing streams of soft bits, which are 4- to 8-bit representations of transmitted bits. Soft bits are generated by the demodulator in the receive chain and used in HARQ pre-processing and header decoding. The ConnX SSP16 meets these needs by combining a 16-way SIMD, 3-slot VLIW processing pipeline optimized for 10-bit and 8-bit processing (10-bit supports the required precision for multiple operations on 8-bit data).
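The value of the extra two bits can be seen in a simple soft-combining example. The sketch below is generic portable C with hypothetical names, not SSP16 code: it sums 8-bit soft values from up to four transmissions of the same bit. Four 8-bit values always fit in a signed 10-bit intermediate (range -512 to 511), so saturation is only needed once, at the end.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Clamp an intermediate value back into the 8-bit soft-bit range. */
int8_t sat8(int v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Combine soft bits from ntx (at most 4) transmissions, element by element.
   tx[t][i] is the soft value of bit i in transmission t. */
void combine_soft(const int8_t *const tx[], size_t ntx,
                  int8_t *out, size_t nbits)
{
    assert(ntx <= 4);
    for (size_t i = 0; i < nbits; i++) {
        int acc = 0;
        for (size_t t = 0; t < ntx; t++)
            acc += tx[t][i];
        assert(acc >= -512 && acc <= 511); /* fits in 10 signed bits */
        out[i] = sat8(acc);                /* single saturation at the end */
    }
}
```

Saturating after every addition instead would lose information: 100 + 100 would clamp to 127 before a later negative value could pull the sum back into range.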

The dual 128-bit wide data path allows 16-way loading and operations for higher performance. The ConnX SSP16 also supports specialized functions such as the transpose memory module and the Viterbi accelerator module.

  • Supports 3-bit, 8-bit, and 16-bit scalar data types, and 8-bit vector data types that use 10-bit internal representation per element providing two guard bits
  • Extensible interfaces with custom-designed Port, Queue, and Lookup interfaces
  • Dual 128-bit load/store unit supports up to 4MB addressable region
  • Optimized for small size and low power

Build Your Own DSPs

Customize Your Signal Processing DSPs

See some interesting ideas, but want something slightly different? That's the beauty of the Cadence Tensilica approach to IP design. From the start, we designed our IP to be customizable. We used that same technology to create these innovative baseband IP cores.

Why Cadence?

  • Ultra-low power consumption and size—With optimized cores that reduce required system clock frequency
  • Flexibility—A scalable platform to fit all performance, power, and area budgets that can be further customized to meet your needs
  • Reduced development cost and development risk—All programmable in C, backed by a world-class development tool suite and multi-core support
  • Low-risk solution with a large ecosystem—Supports all Tensilica products

We recommend two approaches to get you quickly to the exact product you need:

  • Start with one of our standard ConnX products—Modifying an existing product will save you a lot of design work and effort
  • Start with our Xtensa processor—Starting with a clean slate means you can design everything just the way you'd like it

For digital signal processing (DSP) applications, with unique datapaths, processing requirements, algorithms, and memory requirements, this customization process is often essential to get the smallest, most energy-efficient core possible.

Either way, our automated tools will help you through the design process, making sure the design is correct by construction, and helping you make sure you get the right mix of power, performance, and area. And when you're done, our automated Xtensa processor generator will make sure you get not only the hardware for your new design, but also a complete matching software tool chain.

Accelerate Hot Spots in Applications

You don't have to go to higher MHz to get higher performance. By adding instructions in TIE, our Verilog-like language, you can accelerate hot spots in your applications. You can pump data through our cores with up to two 512-bit-wide data load/stores per cycle, or bypass the bus entirely with our unique GPIO and FIFO Queues. Here are some ways you can customize our DSPs:

Data paths

  • The width of data load/store, computation execution, and register files can all be tailored to your specific application

SIMD widths

  • Some applications may greatly benefit from vectorizing computation through a SIMD machine
  • The size of SIMD and vector "strides" can be customized to optimum performance per power/area for the application

Custom instructions

  • Create instructions that perform application-specific tasks
  • Deliver much higher performance for the application and reduce instruction memory footprint

Parallel instruction execution

  • VLIW architecture to enable parallel computation of instructions
  • Example: use one instruction to perform load, execute, store

Tools, Software, Libraries for DSPs

Tools, Software, Libraries—We Have What You Need to Complete Your Design Quickly

For digital signal processing (DSP) applications with unique datapaths, processing requirements, algorithms, and memory requirements, the Cadence customization process is often essential to get the smallest, most energy-efficient core possible. No matter what changes you make, you'll find our tools and software will help you be more efficient.

Hardware design 

For Processor Designers

Cadence delivers patented, proven tools that automate the process of generating a custom processor or DSP along with matching software tools. These tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, Cadence has the tools you need to create successful products.

View the complete set of tools for processor designers.

Software design

For Software Developers

When you need to develop your application software, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. The Cadence Tensilica Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.

View the complete set of tools for software developers.


Libraries and Existing DSP Code Base Support

We do everything we can to make it as easy as possible to port your existing DSP code to our DSPs. Our Xtensa C/C++ Compiler efficiently maps C algorithms to our DSPs, no assembly coding required.

We also provide a range of DSP libraries already tailored to our products, so you can speed your design process.

Literature and Other Resources

Learn More About Our Baseband Processors and DSPs 

Seriously considering using a ConnX DSP in your next SoC design but want to learn more? Here are some things you should explore:

Product Literature

ConnX BBE16 (Baseband Engine)
ConnX BBE32EP Data Sheet
ConnX D2 DSP Engine

Hardware/Software Design Tools

Xtensa Processor Developer's Toolkit
Xtensa Software Developer's Toolkit

White Papers

An Efficient, High-Performance DSP Architecture for W-CDMA Receivers