The leader in Dataplane processing

Baseband and RF Signal Processing

Tackling the Hard Tasks in the Dataplane

Baseband and RF Signal Processing IP

When you want to put that WOW factor into your SoC designs, look to Cadence® Tensilica® DPUs. We offer more ways to perform complex signal processing than any other company. Cadence offers a full range of DPUs and DSPs for the best combination of high performance, low power, and small area, exactly tailored to your application.

From the lightweight dual-MAC ConnX D2 to the super-high-performance 64-MAC ConnX BBE64EP, these designs are ready to run. Go with the industry's best performing and most compact low-power engines for applications from SmartGrid to 802.11ac modems and LTE-Advanced. Cadence also offers several special-function DPUs, so you don't have to design these common functions yourself and can accelerate your design effort.

Samsung and LG are using Cadence's Tensilica HiFi Audio in their DTVs.

ConnX Family

No matter which Tensilica solution you choose, remember that it's based on our 32-bit Xtensa® RISC processor and toolset. Unlike a traditional fixed-configuration DSP core, all Tensilica DSPs and DPUs are fully:

  • Configurable—Select the pre-built functions you need with full C language, library, and verification support.
  • Extensible—Add custom instructions using our Verilog-like TIE language. The results are automatically integrated into the programming tools with full verification support. Custom ports allow your hardware accelerators to be directly integrated into the core, appearing to the programmer as a standard instruction.
  • Scalable—Configurable I/O ports and memory allow you to easily scale your performance from a simple single-core design to a sophisticated direct ported multi-core solution.

Whether your need is for a single core, a homogeneous multi-core solution, or a highly optimized heterogeneous mix of DSPs, DPUs, and hardware accelerator blocks, our ConnX family of DSPs and DPUs supported by the Xtensa tool chain is the ideal place to get started on your baseband platform design.

ConnX D2 Dual-MAC, 16-bit Fixed-Point Communications DSP

A Flexible 2-MAC DSP, Programmable in C

The Cadence Tensilica ConnX D2 DSP engine, used with the Xtensa LX processor core, provides approximately 20% higher performance than similar dual-MAC architectures. You benefit from the flexibility of C programming with assembly-level performance. It's an ideal solution for wireless communications, disk drives, home entertainment devices, and computer peripherals: anything that requires a highly efficient 16-bit fixed-point DSP.

The ConnX D2 option adds dual 16-bit MAC units and a 40-bit register file to the base Xtensa LX processor. It utilizes two-way SIMD (single instruction, multiple data) instructions to provide high performance on vectorizable C code. It also delivers dual-MAC performance using 64-bit very long instruction word (VLIW) instructions for code that cannot be vectorized.

Exceptional Out-of-the-Box Performance

The ConnX D2 DSP engine is tightly integrated with advanced Tensilica XCC compiler technology. The XCC compiler efficiently maps C algorithms to the ConnX D2 ISA (instruction set architecture) from native C and C intrinsic code, removing the need for time-consuming assembly code optimization. 

Consistent Performance—Even When Vectorization Is Not Possible

Many high-performance DSPs are large SIMD engines that run vector data through at maximum bandwidth. These DSPs rely upon compiler vectorization of C code to hit their peak performance levels. However, non-vectorizable code is commonplace, and when a loop cannot be vectorized, the SIMD engine degenerates into a single-MAC DSP.

By contrast, the dual MACs in ConnX D2 DSPs can be fully saturated with either SIMD instructions or VLIW instructions, delivering maximum performance on all types of C code.
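To make the distinction concrete, here is a minimal sketch in plain portable C (not ConnX D2 intrinsics) contrasting a loop whose iterations are independent with one that carries a dependency from one iteration to the next. The function names `dot_q15` and `iir1_q15` are hypothetical, and the 64-bit accumulator merely stands in for a wide DSP accumulator. How the XCC compiler actually schedules either loop is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors. Every product is independent of the
 * others, so a vectorizing compiler can group iterations into SIMD MACs. */
int64_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0; /* wide accumulator, standing in for a 40-bit register */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* First-order recursive filter: y[i] = x[i] + coeff * y[i-1], coeff in Q15.
 * Each output needs the previous output, so the loop carries a dependency
 * and cannot be vectorized directly. */
void iir1_q15(const int16_t *x, int16_t *y, size_t n, int16_t coeff)
{
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v = x[i] + ((int32_t)coeff * prev) / 32768;
        if (v > INT16_MAX) v = INT16_MAX; /* saturate to 16 bits */
        if (v < INT16_MIN) v = INT16_MIN;
        y[i] = (int16_t)v;
        prev = v;
    }
}
```

In the second loop the multiply and the add of one iteration could still be issued side by side with the load and store of neighboring iterations, which is the kind of parallelism a VLIW bundle can exploit when SIMD cannot.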

Advanced DSP Instruction Set

The ConnX D2 instruction set is specifically optimized for the demanding numeric computations required for DSP, with 275 DSP-specific optimizing instructions. ConnX D2 DSPs efficiently perform 16-, 32-, and 40-bit fixed-point additions, subtractions, and multiplies with rounding and saturation. They use seven DSP-centric addressing schemes and add data manipulation instructions, including shifting, swapping, and logical operations, to provide outstanding performance on DSP algorithms.
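For readers less familiar with fixed-point arithmetic, the following sketch shows what "rounding and saturation" mean for 16-bit Q15 values, written in portable C. The helper names (`sat16`, `add_sat_q15`, `mul_round_q15`) are hypothetical and are not ConnX D2 instructions or intrinsics. The code assumes that `>>` on a negative signed value is an arithmetic shift, which holds on common compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Saturating 16-bit addition: overflow clamps instead of wrapping around. */
int16_t add_sat_q15(int16_t a, int16_t b)
{
    return sat16((int32_t)a + (int32_t)b);
}

/* Q15 multiply with round-to-nearest and saturation. */
int16_t mul_round_q15(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b; /* Q30 product */
    p += 1 << 14;                        /* add half of one Q15 LSB */
    return sat16(p >> 15);               /* assumes arithmetic right shift */
}
```

Saturation matters because a wrapped result flips sign and shows up as a large spike in the signal, whereas a clamped result is only slightly distorted.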

In addition to supporting major DSP addressing modes, ConnX D2 DSPs add specific DSP acceleration instructions such as Add-Compare-Exchange (used with Viterbi algorithms), Add Modulo, and Add Subtract. Additional instructions perform vector base loads and stores to support multiple data widths and SIMD data register loading orders, which can be aligned or unaligned.
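As background for the Viterbi-oriented instruction mentioned above, here is a sketch of the classic add-compare-select step that sits in the inner loop of a Viterbi decoder. This is an illustration of the general technique, not a description of how the ConnX D2 instruction is defined. The type `acs_result`, the function name, and the convention that a smaller metric is better are all assumptions.

```c
#include <stdint.h>

/* Result of one add-compare-select step (hypothetical type). */
typedef struct {
    int32_t metric;   /* path metric of the surviving path */
    uint8_t decision; /* 0 if predecessor 0 survived, 1 if predecessor 1 did */
} acs_result;

/* For one trellis state with two predecessors: add each branch metric to
 * its predecessor's path metric, compare the candidates, keep the smaller. */
acs_result add_compare_select(int32_t pm0, int32_t bm0,
                              int32_t pm1, int32_t bm1)
{
    int32_t cand0 = pm0 + bm0;
    int32_t cand1 = pm1 + bm1;
    acs_result r;
    if (cand1 < cand0) {
        r.metric = cand1;
        r.decision = 1;
    } else {
        r.metric = cand0;
        r.decision = 0;
    }
    return r;
}
```

A decoder runs this step once per state per received symbol, which is why folding the additions, the comparison, and the selection into a single instruction pays off.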

ConnX D2 DSPs use dual-port write technology that allows two results to be written to the register files in one instruction. This can give a maximum of three writes to the register files per cycle within the VLIW implementation.

Easily Further Customized

Because the ConnX D2 DSP engine is a click-box option for the Xtensa processor, designers can further customize the processor using the flexible Xtensa Processor Generator. You can easily add multi-cycle execution units, registers, register files, and much more. You can also choose from a wide range of configuration options.

ConnX BBE16 16-MAC VLIW Baseband DSP

A Full-Featured High-Performance 16-MAC DSP, Programmable in C

The ConnX BBE16 Baseband Engine combines an 8-way SIMD, 3-issue VLIW processing pipeline with a rich and extensible set of interfaces. This high-performance Cadence Tensilica DSP is built around a core vector pipeline made of 16 18bx18b MACs. These multipliers and associated adder and multiplexer trees enable operations such as FFT butterflies, parallel complex multiply operations, and signal filter structures. The results of these operations can be full precision or truncated/rounded/saturated and shifted to meet the needs of different algorithms and implementations.

The instruction set has been optimized for performance of DSP kernel operations such as FFT and FIR as well as matrix multiplies. Acceleration has been added for a wide range of key wireless functions to deliver very high performance in wireless applications.

The ConnX BBE16 DSP is optimized for wireless communication, particularly in LTE and 4G cellular radios and multi-standard broadcast receivers. The high computation requirements of these applications require innovative architectures with a high degree of parallelism and efficient I/O. The ConnX BBE16 DSP meets these needs by combining an 8-way SIMD, 3-issue VLIW processing pipeline with a rich and extensible set of interfaces.

Like all ConnX DSPs, the ConnX BBE16 DSP is fully programmable in C with a vectorizing compiler. Automatic vectorization of scalar C and full support for vector data types allows the development of algorithms without the need to program at the assembly level. Native C operator overloading is supported for natural programming with standard C operators on real and complex vector data types.

Instruction Set Optimized for DSP

The ConnX BBE16 DSP is an option for the Xtensa LX processor. It adds a highly customized DSP and baseband instruction set.

A wide variety of Load/Store operations support nine different addressing modes with support for 16b/32b scalar and vector data types. Unaligned Load/Stores with masking deliver full bandwidth Loads and Stores for unaligned data. Vector data management is supported with data packing and shifting.

Multiply operations include complex and scalar 18bx18b multiply, multiply-round, multiply-add, and multiply-subtract functions. Complex-number functions include support for conjugate arithmetic and magnitude operations as well as full precision arithmetic and saturated/rounded outputs. The ConnX BBE16 DSP is capable of performing up to 16 multiplies per operation. BBE16 includes extended precision with guard bits on all register data and full support of double precision data, and 40-bit accumulation on all MAC operations without performance penalty. A wide variety of arithmetic, logical and shift operations are supported for up to eight data words per cycle. There is full support for matrix multiplication with acceleration for OFDM matrix operations.

Our ConnX BBE16 DSPs also support single-cycle radix-4 and radix-8 butterfly operations for efficient high-speed FFT implementations. Support for a single-cycle 4-tap FIR filter with complex taps and single-cycle 16-tap FIR filter with real taps enables efficient filtering operations. Special instructions supporting radix 3/5 FFT are also provided. Symmetric filters on real and complex data run at double rate, e.g., 32 real taps/cycle.
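To show what a radix-4 butterfly computes, the sketch below implements one in portable integer C: a 4-point DFT with the twiddle factors omitted, using the forward-transform sign convention. The type `cint32` and all function names are hypothetical, and this scalar code says nothing about how the BBE16 packs the operation into a single cycle.

```c
#include <stdint.h>

/* Complex integer sample (hypothetical type for this sketch). */
typedef struct { int32_t re, im; } cint32;

static cint32 cadd(cint32 a, cint32 b)
{
    return (cint32){ a.re + b.re, a.im + b.im };
}

static cint32 csub(cint32 a, cint32 b)
{
    return (cint32){ a.re - b.re, a.im - b.im };
}

/* Multiply by -j: (re + j*im) * (-j) = im - j*re. No multiplier needed. */
static cint32 mul_neg_j(cint32 a)
{
    return (cint32){ a.im, -a.re };
}

/* Radix-4 butterfly without twiddles, i.e. a 4-point forward DFT. */
void radix4_butterfly(const cint32 in[4], cint32 out[4])
{
    cint32 s02 = cadd(in[0], in[2]);
    cint32 d02 = csub(in[0], in[2]);
    cint32 s13 = cadd(in[1], in[3]);
    cint32 d13 = mul_neg_j(csub(in[1], in[3])); /* -j * (x1 - x3) */

    out[0] = cadd(s02, s13); /* x0 + x1 + x2 + x3   */
    out[1] = cadd(d02, d13); /* x0 - j*x1 - x2 + j*x3 */
    out[2] = csub(s02, s13); /* x0 - x1 + x2 - x3   */
    out[3] = csub(d02, d13); /* x0 + j*x1 - x2 - j*x3 */
}
```

The butterfly itself needs only additions, subtractions, and swaps; the multipliers in a full FFT stage are spent on the twiddle factors applied between butterflies.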

Innovative I/O Using Port, Queue, and Lookup Interfaces

ConnX BBE16 DSPs support custom Ports (general-purpose wire interfaces) and Queues (FIFO) for efficient connection to coprocessors. These custom interfaces can be defined to match the interfaces of existing RTL hardware blocks. Buffered communication between two ConnX BBE16 DSPs or between a ConnX BBE16 DSP and an RTL block can be automatically implemented using Queue interfaces and is fully supported in programming and modeling tools.

Local memories can be connected directly to a ConnX BBE16 DSP using the Lookup interface, bypassing the processor memory bus. This allows efficient implementation of functions that require storage of multiple intermediate datasets. 

Extensible—Modify Further for Your Requirements

ConnX BBE16 DSPs can also be modified and extended by defining new instructions, registers, and execution units to augment the existing instruction set. With Cadence, you can choose from a wide range of configuration options.

Supported by a Complete Set of Hardware and Software Tools

A complete set of tools is available to support ConnX BBE16 DSPs. A comprehensive instruction set simulator (ISS) allows developers to quickly simulate and evaluate performance. The fast, functional TurboSim™ simulator option achieves speeds that are 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC and C-based system modeling can aid in full-chip simulations.

The tool set includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline in ConnX BBE16 DSPs. This comprehensive tool set also includes the linker, assembler, debugger, profiler, and graphic visualization tools. All major EDA flows are supported.

ConnX BBE32EP and BBE64EP DSPs for Baseband Processing

32- and 64-MAC DSPs for Demanding Baseband Processing

The latest additions to the product line, the Cadence Tensilica ConnX BBE32EP and BBE64EP enhanced performance DSP IP cores for baseband processing are optimized for complex number processing. Both the 32-MAC ConnX BBE32EP and the 64-MAC ConnX BBE64EP offer significant improvements in maximum frequency and algorithmic performance while reducing both silicon area and power consumption versus earlier generations of DSPs. Both provide unprecedented flexibility in implementing systems at power consumption levels that significantly reduce the need for hardware accelerators. With identical architectures, both cores provide you with significant design flexibility and an easy upgrade path when needed.

The ConnX BBE32EP is suitable for both infrastructure and user equipment applications. The ConnX BBE64EP is suited for multiple RF stream processing applications such as LTE-Advanced and other high-throughput MIMO systems such as 802.11ac. Both can be easily optimized through check-box options. 

Instruction Set Features

  • 32-way (BBE32EP) or 64-way (BBE64EP) multiplier-accumulator (MAC), dual 16/32-way arithmetic logic unit (ALU) single instruction, multiple data (SIMD) engines
  • 5-issue very long instruction word (VLIW) for parallel load/store, MAC, and ALU ops
  • 32-bit scalar ALU
  • Advanced Precision for matrix inversion and divide operations
  • Optimized instructions for complex arithmetic, polynomial evaluation, matrix multiplication, block floating point, bit-oriented operations, and vector compression and expansion
  • Predicated vector instructions
  • Wide memory bandwidth—256/512-bit load/store and 256/512-bit load units
  • 10-stage DSP pipeline
  • High-performance C/C++ compiler with automatic vectorization of scalar C and full support for vector data
  • TI intrinsic support, rich application libraries

The instruction set and architecture are tuned to meet the performance and computation requirements of advanced wireless systems. Compared with the typical user equipment DSP that offloads many of the computationally intensive operations to hardware acceleration blocks, ConnX BBE32EP and BBE64EP cores offer a more complete instruction set plus options for accelerating key algorithms while still remaining fully programmable.

Load/store operations support five standard addressing modes and two specialized modes: bit reverse for FFTs and circular for functions like circular buffering. The addressing modes support a variety of data formats including scalar and vector, real, and complex data types.
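The two specialized addressing modes are easy to state in software terms. The sketch below computes, in portable C, the index that bit-reversed addressing would visit and the wrapped index that circular addressing would produce. The function names are hypothetical; on the DSP these calculations are performed by the address-generation hardware rather than by explicit code.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the low `bits` bits of `index`, e.g. with 3 bits 0b001 becomes
 * 0b100. An N-point FFT with N = 2^bits reads or writes data in this order. */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1u);
        index >>= 1;
    }
    return r;
}

/* Advance a circular-buffer index by `step`, wrapping at `length`.
 * `length` must be nonzero. */
size_t circular_next(size_t index, size_t step, size_t length)
{
    return (index + step) % length;
}
```

For example, an 8-point FFT visits its data in the order 0, 4, 2, 6, 1, 5, 3, 7, and a delay line of length 8 that advances by 3 from index 6 lands on index 1.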

Configurable, Extensible, Scalable

ConnX BBE32EP and BBE64EP DSP cores provide 13 pre-built vector options, which are included/excluded as checkboxes when defining a core from within the tools. These checkboxes result in seamless integration of a feature into the hardware, the compiler, the modeling tools, and the verification scripts. Using these capabilities, you can build a custom core without the large development schedule impact that a change in hardware design would normally involve.

Integrating an optimized FFT solution is as simple as checking a box when configuring a ConnX processor. All of the verification and tool support is provided automatically as part of the tool chain. The ConnX BBE32EP core can be extended to support custom Ports (general-purpose wire interfaces) and Queues (FIFO) for efficient connection to offload accelerators. These custom interfaces can be defined to match the interfaces of existing third-party IP. Buffered communication between two ConnX cores or between a ConnX core and an offload accelerator can be automatically implemented using Queue interfaces and is fully supported in programming and modeling tools. These interfaces are dedicated to the offload accelerator and offer single-cycle access. Thus, ConnX cores can access hardware offload accelerators in a single-cycle deterministic operation, greatly reducing power consumption without impacting the shared system bus.

Local memories can be connected directly to a ConnX DSP, bypassing the system memory bus and allowing efficient implementation of functions that require storage of multiple intermediate datasets.

Application Space—LTE, WCDMA/HSPA+, Wi-Fi, and Beyond

ConnX BBE32EP and BBE64EP Baseband Engines are high-performance DSPs designed for next-generation communication systems such as LTE Advanced, 802.11ac, and DVB. Advanced Precision options are specifically designed to meet the precision and performance requirements associated with advanced MIMO systems. In addition to vector-based filtering, FFT, and matrix capabilities, a fully-featured instruction set includes a full range of bit-oriented operations used in 3G systems such as UMTS, cdma2000, and 1xEV-DO. In this way, ConnX cores excel at multi-standard physical layer processing, providing opportunities for hardware savings and a broader scope of applications than a dedicated fixed-hardware solution can provide.

As physical layer (PHY) system developers move to advanced standards such as LTE-Advanced, they face the need for dramatic increases in performance from their processing platforms. ConnX cores meet this challenge with highly parallel vector engines. When processing needs scale beyond that of a single DSP, the ConnX BBE32EP/BBE64EP family provides smooth support for multi-core solutions. Multi-core solutions may involve other cores from the ConnX BBE family or extend into specialized Tensilica processing engines such as the ConnX BSP3 Bit Stream Processor, the ConnX SSP16 Soft Stream Processor, or the ConnX Turbo16MS Turbo Decoder.

System designers also face considerable uncertainty as to the algorithmic implementation that will deliver the best performance. In fact, as systems become more diverse with wide-scale deployment of heterogeneous networks, a solution that works best for a microcell operating on a bullet train in Japan may be very different from one that will work best for a similar microcell operating in a subterranean pedestrian mall in Montreal. With a fully programmable software-based solution using the ConnX BBE64EP core, you could implement both solutions on a single platform, permitting it to evolve without going back for a re-spin of silicon for new functionality or bug fixes.

The configurability and extensibility of ConnX cores also allows you to optimize the hardware for specific algorithms without the typical development delays associated with an ASIC design.

Using ConnX BBE32EP and BBE64EP cores, you can deliver a working solution in less time than with a traditional hardware or hybrid hardware/DSP design. You can also take advantage of the hardware platform for a broader range of applications, over a longer period of time. Ultimately, this reduces design time and costs, so you can finish faster and be more competitive in the marketplace.

Supported by a Complete Set of Hardware and Software Tools

Our complete set of tools includes a comprehensive instruction set simulator (ISS), which allows developers to quickly simulate and evaluate performance. The fast, functional TurboSim™ simulator option achieves speeds that are 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC and C-based system modeling can aid in full-chip simulations.

The tool set includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline. This comprehensive tool set also includes the linker, assembler, debugger, profiler, and graphic visualization tools. All major EDA flows are supported.

Specialized DPUs for Baseband Processing

Specialized DSPs for Complex Baseband Functions

Complementing the ConnX BBE DSPs are special-function dataplane processor units (DPUs) that target algorithms that consume enough resources (often of a specialized type) to justify their own DPU engine. DPUs offer programmable, low-power, high-performance alternatives to hard-coded ASIC accelerators that otherwise would limit the overall flexibility of your system design.

ConnX BSP3—Bit-Stream Processor

The ConnX BSP3 (Bit-Stream Processor) is designed for use in baseband PHY systems found in LTE and HSPA+ cellular radios and multi-standard broadcast receivers. It is specifically optimized for processing and manipulating bit streams, including operations for CRC, interleavers, scramblers, and more.
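As an illustration of the kind of bit-level work involved, here is a plain C sketch of a bitwise CRC, processing one bit per inner-loop iteration. The parameters are assumptions chosen for the example: the widely used 16-bit generator x^16 + x^12 + x^5 + 1 (0x1021), a zero initial value, and most-significant-bit-first processing. The function name is hypothetical, and nothing here reflects the BSP3 instruction set itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with generator 0x1021, initial value 0, MSB first. */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8); /* bring in next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u); /* shift, subtract generator */
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

With these parameters the nine ASCII bytes "123456789" produce 0x31C3. The inner loop is a data-dependent shift and XOR on every bit, which maps poorly onto a general-purpose ALU and is the sort of operation a bit-stream processor is meant to collapse.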

The ConnX BSP3 offers an architecture and optimized instruction set with the parallel execution of a 3-issue VLIW machine. The dual 32-bit wide data path combined with the 3-issue VLIW allows single-cycle load, compute, and store. Additionally, it can load four vectors in one cycle. This is all done in a small-size processor giving very high performance per area and power.

  • Dual 32b load/store supports up to 4MB addressable region
  • Optimized for 16-, 20-, 32-, and 40-bit vector operations
  • 128-bit-wide vector files, allowing the loading, computation, and storing of four 32-bit words, eight 16-bit words, or sixteen 8-bit words at a time
  • Very high performance for small area and low power for bit computation

ConnX SSP16—Soft-Stream Processor

The ConnX SSP16 (Soft-Stream Processor) is specifically optimized for processing streams of soft bits, which are 4- to 8-bit representations of transmitted bits. Soft bits are generated by the demodulator in the receive chain and used in HARQ pre-processing and header decoding. The ConnX SSP16 meets these needs with a 16-way SIMD, 3-slot VLIW processing pipeline optimized for 10-bit and 8-bit processing (10-bit supports the required precision for multiple operations on 8-bit data).
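The value of extra internal precision can be seen in a small example. The sketch below assumes that soft bits are signed 8-bit values and that two received copies are combined by summing them, as in a simple HARQ combining scheme; both points are assumptions for illustration, and the function name is hypothetical. The sum of two 8-bit values needs 9 bits, so it is formed in a wider intermediate before being saturated back to 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine two streams of signed 8-bit soft bits element by element.
 * The sum is held in a wider intermediate (headroom, like guard bits)
 * and then clamped to the 8-bit range instead of wrapping. */
void combine_soft_bits(const int8_t *a, const int8_t *b, int8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int16_t sum = (int16_t)(a[i] + b[i]);
        if (sum > INT8_MAX) sum = INT8_MAX;
        if (sum < INT8_MIN) sum = INT8_MIN;
        out[i] = (int8_t)sum;
    }
}
```

Without the headroom, two strongly positive soft bits such as 100 and 100 would wrap to a negative value and turn a confident decision into a confident error.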

The dual 128-bit wide data path allows 16-way loading and operations for higher performance. The ConnX SSP16 also supports specialized functions such as the transpose memory module and the Viterbi accelerator module.

  • Supports 3-bit, 8-bit, and 16-bit scalar data types, and 8-bit vector data types that use 10-bit internal representation per element providing two guard bits
  • Extensible interfaces with custom-designed Port, Queue, and Lookup interfaces
  • Dual 128-bit load/store unit supports up to 4MB addressable region
  • Optimized for small size and low power

ConnX Turbo16MS Multistandard Turbo Processor

The ConnX Turbo16MS is a high-performance DPU specifically designed for decoding of LTE Turbo codes on data streams of up to 150 Mbps and HSPA+ data streams of up to 85 Mbps. This performance is required for 3.9G and 4G cellular radios and multi-standard broadcast receivers.

ConnX Turbo16MS has been optimized in two areas. First, a customized instruction set has been developed for LTE and HSPA+ turbo decoding. Second, it uses parallel execution for very high data bandwidth computation. This includes the 5-issue VLIW capability and the two load/store units that allow loading of dual memories in a single cycle. There are also 23 very tightly coupled scratch pad memories for storing a priori and state values that are accessed by instructions in parallel. This results in up to five memory accesses per cycle. Only this level of parallelism can give ConnX Turbo16MS the performance needed for multi-standard turbo decoding.

  • Dual 128-bit load/store units
  • LTE turbo decoding of up to 150 Mbps data streams with eight full iterations
  • HSPA+ turbo decoding of up to 85 Mbps data streams with eight full iterations
  • Small size and low power

The ConnX Turbo16MS provides multi-standard turbo decoding common to DTV, cdma2000, W-CDMA, and LTE. Typically, the system designer would be forced to implement this decoder in an ASIC, since it would consume or exceed the resources of a DSP. The ConnX Turbo16MS enables a fully programmable decoder in a smaller package at lower power than can be provided using a generalized programmable DSP.

System Solutions for Baseband Processing

System Solutions

Cadence is committed to helping you get your new design to market as quickly as possible. To that end, we have created the Atlas LTE reference design and we have partnered with mimoOn for LTE-Advanced PHY software.

The Atlas LTE Reference Architecture Jump Starts Your LTE Design

The Atlas LTE reference architecture implements the complete 3GPP Long Term Evolution (LTE) layer 1 PHY—including the computationally demanding Turbo decoder—in a completely processor-based, fully programmable DSP core reference architecture. The Atlas reference architecture implements a fully programmable SDR, all controlled by software. All of the processors involved use the same easy software development, debug, and simulation environment. You can easily partition your algorithm into these cores with simple synchronization.

The ConnX Atlas reference architecture is intended as a starting point for design teams implementing LTE baseband systems. A design team will integrate the Atlas components together with the Layer 2 design elements and system interconnect elements of the design team's choosing. The components of the Atlas architecture are modular. A designer may opt to deploy all or just some of these processors. Or a design team may opt to re-use pre-existing RTL blocks in lieu of one or more of the Atlas components.

Comprehensive LTE-Advanced Hardware/Software PHY IP Solution

We partnered with mimoOn for the only comprehensive licensable IP solution for LTE-Advanced chip designs. Cadence is now the exclusive DSP IP vendor for mimoOn's LTE UE and eNodeB PHY software products.

Build Your Own DSPs

Customize Your Signal Processing DPUs

See some interesting ideas, but want something slightly different? That's the beauty of the Cadence Tensilica approach to IP design. From the start, we designed our IP to be customizable. We used that same technology to create these innovative baseband IP cores.

Why Cadence?

  • Ultra-low power consumption and size—With optimized cores that reduce required system clock frequency
  • Flexibility—A scalable platform to fit all performance, power, and area budgets that can be further customized to meet your needs
  • Reduced development cost and development risk—All programmable in C, backed by a world-class development tool suite and multi-core support
  • Low-risk solution with a large ecosystem—Supports all Tensilica products

We recommend two approaches to get you quickly to the exact product you need:

  • Start with one of our standard ConnX products—Modifying an existing product will save you a lot of design work and effort
  • Start with our Xtensa LX processor—Starting with a clean slate means you can design everything just the way you'd like it

For digital signal processing (DSP) applications, with unique datapaths, processing requirements, algorithms, and memory requirements, this customization process is often essential to get the smallest, most energy-efficient core possible.

Either way, our automated tools will help you through the design process, making sure the design is correct by construction, and helping you make sure you get the right mix of power, performance, and area. And when you're done, our automated Xtensa processor generator will make sure you get not only the hardware for your new design, but also a complete matching software tool chain.

Accelerate Hot Spots in Applications

You don't have to go to higher MHz to get higher performance. By adding instructions in TIE, our Verilog-like language, you can accelerate hot spots in your applications. You can pump data through our cores with up to two 512-bit-wide data load/stores per cycle, or bypass the bus entirely with our unique GPIO and FIFO Queues. Here are some ways you can customize our DPUs:

Data paths

  • The width of data load/store, computation execution, and register files can all be tailored to your specific application

SIMD widths

  • Some applications may greatly benefit from vectorizing computation through a SIMD machine
  • The size of SIMD and vector "strides" can be customized to optimum performance per power/area for the application

Custom instructions

  • Create instructions that perform application-specific tasks
  • Deliver large performance gains for the application and reduce the instruction memory footprint

Parallel instruction execution

  • VLIW architecture to enable parallel computation of instructions
  • Example: use one instruction to perform load, execute, store

Tools, Software, Libraries for DSPs

Tools, Software, Libraries—We Have What You Need to Complete Your Design Quickly

For digital signal processing (DSP) applications with unique datapaths, processing requirements, algorithms, and memory requirements, the Cadence customization process is often essential to get the smallest, most energy-efficient core possible. No matter what changes you make, you'll find our tools and software will help you be more efficient.

Hardware design 

For Processor Designers

Cadence delivers patented, proven tools that automate the process of generating a custom DSP or DPU along with matching software tools. These tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, Cadence has the tools you need to create successful products.

View the complete set of tools for processor designers.

Software design

For Software Developers

When you need to develop your application software, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. The Cadence Tensilica Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.

View the complete set of tools for software developers.

Libraries

Libraries and Existing DSP Code Base Support

We do everything we can to make it as easy as possible to port your existing DSP code to our DPUs. Our Xtensa C/C++ Compiler efficiently maps C algorithms to our DPUs, with no assembly coding required.

We also provide a range of DSP libraries already tailored to our products, so you can speed your design process.

Literature and Other Resources

Learn More About Our Baseband DSPs and DPUs

Seriously considering using a ConnX DSP in your next SoC design but want to learn more? Here are some things you should explore:

Product Literature

ConnX BBE16 (Baseband Engine)
ConnX BBE32EP Data Sheet
ConnX BBE64EP Data Sheet
ConnX BSP Bit-Stream Processor
ConnX D2 DSP Engine

Hardware/Software Design Tools

Xtensa Processor Developer's Toolkit
Xtensa Software Developer's Toolkit

White Papers

An Efficient, High-Performance DSP Architecture for WCDMA Receivers
Microprocessor Report Reviews ConnX BBE64