Epson’s Breakthrough REALOID Printer SoC Powered by Multiple Xtensa Processors

Multiple Xtensa DPUs Provide More Flexibility and Programmability Over Previous-Generation Hardwired RTL Solution

Epson is a global leader in imaging products including printers, projectors with Epson’s patented 3LCD-technology, and small- and medium-sized LCD displays. With an innovative and creative culture, Epson is dedicated to exceeding the vision and expectations of customers worldwide with products known for their superior quality, functionality, compactness, and energy efficiency.

Epson’s IJP Design Department designed the REALOID printer SoC to provide a high-performance and flexible solution that leapfrogs the previous generation of printers. The REALOID SoC achieves 3X higher performance than the previous-generation SoC. This performance boost means a much shorter time to print a color photograph directly from sources such as a digital camera.

To achieve this, the image processing pipeline was redesigned as a highly parallel architecture that converts the source image data from the digital camera to a data format suitable for much faster printing. The higher performance enables REALOID-based printers to use PC-quality half-toning algorithms to print photographs with much more vibrant colors and smoother transitions. The REALOID SoC also brings out the full impact of the Advanced Multi-Size Dot Technology (MSDT) employed in the new generation of Epson printers. Advanced MSDT uses five different dot sizes to create smooth gradation in the image, considerably improving the quality of the printed photograph.

Epson’s Design Challenge

As Epson created the target specification for this next generation of printer SoCs, they were faced with the daunting challenge of enhancing the printing capability many-fold. These next-generation requirements meant that the design complexity and size of the SoC would also increase significantly. Furthermore, the increased design complexity would have an even larger impact on the verification effort. Clearly, silicon scaling would not be enough to achieve the performance targets.

"We were faced with the possibility of having to increase our design and verification team and potentially also increasing the development time to produce the next-generation SoC," said Katsuhiko Nishizawa, General Manager of the IJP Design Department of Epson’s Imaging Products Operations Division.

Epson wanted this architecture to form the basis of printers for several years. This meant that the architecture had to be flexible enough to be upgraded to new imaging algorithms – particularly half-toning algorithms that are being improved frequently. The Epson team also wanted the SoC to have some headroom to deploy more complex algorithms in the future.

At first, Epson considered designing a multi-million gate chip completely using hardwired RTL blocks. However, this solution would not give them the flexibility and programmability they needed and would require a very large verification effort. They then found that the Xtensa® processor offered them a design alternative to using hardwired RTL. Using a processor would give the flexibility they needed and the extensibility of the Xtensa architecture meant that they could still create a solution that matched the high performance of a hardwired architecture.

 

Figure 1: Architecture of Epson's REALOID Printer SoC with multiple Xtensa Processors communicating using FIFO interfaces (TIE Queues) and GPIO (TIE Ports)

The Solution: Multiple Xtensa Configurable Processors

"The Xtensa processor provided us a hardware implementation alternative to using hardwired RTL that was both flexible enough and high performance enough to meet and exceed our next-generation requirements," said Shuji Ohsutaka, Assistant Manager of the IJP Design Department of Epson’s Imaging Products Operation Division. The result was the REALOID printer SoC in which multiple Xtensa LX DPUs constitute the image processing pipeline. “A multi-core Xtensa solution also gives us the scalability that we need for future designs. If we want to increase the functionality or throughput in future designs, we can just add more Xtensa processors."

Figure 2: The TIE Compiler takes as input the TIE specification of the designer-defined datapath and instruction extensions and immediately generates an updated compiler toolchain, ISS, system simulation models, etc.

Implementing High-Performance Datapaths Using TIE Extensions

Epson realized early on that nearly any hardware architecture they could build in hardwired RTL could also be implemented using the Xtensa DPUs. A designer can specify complex multi-cycle execution units, register files, processor interfaces, etc. using the Cadence® Tensilica® Instruction Extension (TIE) language. For half-toning algorithms, for instance, it is possible to create a single complex instruction that computes the error that has to be diffused to all the adjacent pixels (for example, the adjacent four pixels in the classical Floyd-Steinberg error diffusion algorithm). Similarly, a designer can create a SIMD adder that operates on two 16-bit values in a single cycle for algorithms such as DCT. Designers can also specify register files with any number of registers and of any width. The Epson design team used all these powerful features of TIE to create an optimized task engine for each different task in the image processing pipeline.
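As a rough illustration of the work such a fused half-toning instruction would replace, the following C sketch computes the four Floyd-Steinberg error shares for one pixel and applies them across a small image. It is not Epson's implementation: the names fs_error_t, fs_quantize, and fs_dither, the binary (0 or 255) output levels, and the fixed threshold of 128 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Error shares for the four neighbours (hypothetical type name). */
typedef struct {
    int16_t right;        /* 7/16 of the quantization error */
    int16_t below_left;   /* 3/16 */
    int16_t below;        /* 5/16 */
    int16_t below_right;  /* remainder (about 1/16), so no error is lost */
} fs_error_t;

/* Quantize one pixel to 0 or 255 and compute the error to diffuse.
 * In a TIE-style design, this whole function is the kind of work a
 * single designer-defined instruction could perform. */
uint8_t fs_quantize(int16_t value, fs_error_t *err)
{
    assert(err != NULL);
    uint8_t out = (value >= 128) ? 255 : 0;
    int e = value - out;

    err->right       = (int16_t)(e * 7 / 16);
    err->below_left  = (int16_t)(e * 3 / 16);
    err->below       = (int16_t)(e * 5 / 16);
    err->below_right = (int16_t)(e - err->right - err->below_left - err->below);
    return out;
}

/* Dither an image in place. Pixels are stored as int16_t so that
 * accumulated error can push values outside 0..255 temporarily. */
void fs_dither(int16_t *img, int width, int height)
{
    assert(img != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            fs_error_t e;
            int16_t *p = &img[y * width + x];
            *p = fs_quantize(*p, &e);

            if (x + 1 < width)
                p[1] += e.right;
            if (y + 1 < height) {
                if (x > 0)
                    p[width - 1] += e.below_left;
                p[width] += e.below;
                if (x + 1 < width)
                    p[width + 1] += e.below_right;
            }
        }
    }
}
```

Giving the last neighbour the remainder rather than a truncated 1/16 share keeps the four shares summing exactly to the quantization error, which is a common choice in integer implementations.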

They were then able to compile the TIE descriptions of each of their new datapath elements using the TIE compiler, which updates the entire compiler toolchain and simulation models to make them aware of these new extensions (as shown in Figure 2). The new TIE instruction can be used in application code as a C/C++ intrinsic. The compiler not only recognizes the new instruction but also schedules it, which means that the Epson design team did not have to write any assembly code for the new instructions they specified. Similarly, they were able to use the updated debugger to debug their new instructions and to view the contents of the register files they specified. The instruction set simulator (ISS) is also updated so that it simulates the new instructions in a cycle-accurate manner.
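To give a feel for how a designer-defined instruction looks from application code, the sketch below uses a plain C function, add2x16, as a stand-in for an intrinsic. The function name and the packing of two 16-bit lanes into one 32-bit word are assumptions for illustration only; on the actual toolchain, the intrinsic's name and operand types would come from the designer's own TIE description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a SIMD add intrinsic: adds the two 16-bit
 * lanes of a and b independently, with wraparound inside each lane. */
uint32_t add2x16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;           /* low lane, carry discarded */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;   /* high lane, overflow shifted out */
    return hi | lo;
}

/* Application code calls the "intrinsic" like any other C function;
 * no assembly is involved. Each word holds two packed 16-bit samples. */
void add_packed(const uint32_t *x, const uint32_t *y, uint32_t *out, int n)
{
    assert(x != NULL && y != NULL && out != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = add2x16(x[i], y[i]);
}
```

With a real intrinsic, the loop body would stay the same while the compiler would emit and schedule the single-cycle instruction in place of the function call.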

Using FIFO Interfaces and GPIO to Communicate

"Xtensa gives us the unique ability to instantiate TIE Queue and TIE Port interfaces on the processors. These interfaces are exactly analogous to the FIFO interfaces and GPIO that we would have used if we had implemented the image processing pipeline using hardwired RTL. In traditional processors, we were limited by the bandwidth and throughput of the system bus. With Xtensa processors, there is virtually unlimited bandwidth to do much more natural communication between processors and other blocks in the SoC," said Shuji Ohtsuka.

The Xtensa processors in the REALOID chip communicate with memories and each other using a conventional system bus plus FIFOs implemented using Cadence's unique TIE Queue interfaces. The Xtensa processors also send control communications to each other using TIE Ports, which are equivalent to having GPIO on the processor core. These Queue and Port interfaces are accessed directly from the datapath by using instructions like "Pop_Queue" and "Write_Port". The bandwidth and throughput of the data that can flow through the Xtensa processors, therefore, is not limited by the load/store unit and the system bus.
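The queue-and-port style of communication can be modeled in ordinary C to show the data flow between two pipeline stages. The sketch below is a software model only, not the TIE Queue or TIE Port interface itself: fifo_t, fifo_push, fifo_pop, run_pipeline, the depth of 4, and the "multiply by two" producer stage are all hypothetical, and the done_port flag merely imitates a one-bit control signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u  /* assumed depth, chosen only for the example */

typedef struct {
    uint32_t data[QUEUE_DEPTH];
    unsigned head;   /* index of the next element to pop */
    unsigned tail;   /* index of the next free slot */
    unsigned count;  /* number of elements currently stored */
} fifo_t;

/* Push one word; returns false (and stores nothing) when the queue is full. */
bool fifo_push(fifo_t *q, uint32_t value)
{
    assert(q != NULL);
    if (q->count == QUEUE_DEPTH)
        return false;
    q->data[q->tail] = value;
    q->tail = (q->tail + 1u) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Pop one word; returns false when the queue is empty. */
bool fifo_pop(fifo_t *q, uint32_t *value)
{
    assert(q != NULL && value != NULL);
    if (q->count == 0u)
        return false;
    *value = q->data[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    return true;
}

/* Two stages connected by one queue: stage A doubles each input and
 * pushes it, stage B pops and accumulates. done_port plays the role of
 * a control signal telling stage B that no more data will arrive. */
uint32_t run_pipeline(const uint32_t *in, int n)
{
    fifo_t q = {0};
    bool done_port = false;
    int produced = 0;
    uint32_t sum = 0;

    while (!done_port || q.count > 0u) {
        /* Stage A: push until the queue is full or the input is exhausted. */
        while (produced < n && fifo_push(&q, in[produced] * 2u))
            produced++;
        if (produced == n)
            done_port = true;

        /* Stage B: consume one word per iteration. */
        uint32_t v;
        if (fifo_pop(&q, &v))
            sum += v;
    }
    return sum;
}
```

Because the producer stalls only when the queue is full and the consumer only when it is empty, neither stage touches a shared bus or memory to hand data over, which is the property the text attributes to the hardware queues.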

Lower Verification Effort Compared to RTL Design

By using TIE to implement complex datapaths instead of manually writing Verilog or VHDL for a hardwired RTL block, Epson was able to greatly reduce its verification effort. In fact, Epson wrote a C golden model of all their TIE extensions and was able to co-verify the C model with the TIE extensions.
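The idea of co-verifying an extension against a C golden model can be sketched as follows. This is an assumed setup, not Epson's actual test bench: golden_add2x16 is a deliberately simple lane-by-lane reference, dut_add2x16 is a differently written implementation standing in for the simulated instruction, and coverify drives both with pseudo-random operands from a small linear congruential generator and counts mismatches.

```c
#include <assert.h>
#include <stdint.h>

/* Golden model: the obvious lane-by-lane definition of a 2x16-bit add. */
uint32_t golden_add2x16(uint32_t a, uint32_t b)
{
    uint16_t lo = (uint16_t)((a & 0xFFFFu) + (b & 0xFFFFu));
    uint16_t hi = (uint16_t)((a >> 16) + (b >> 16));
    return ((uint32_t)hi << 16) | lo;
}

/* Stand-in for the design under test: the same operation written with a
 * carry-masking trick, so the two versions do not share a code path. */
uint32_t dut_add2x16(uint32_t a, uint32_t b)
{
    /* Add the low 15 bits of each lane, then fix up the top bit of each
     * lane with XOR so that no carry crosses the lane boundary. */
    uint32_t low_bits = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
    return low_bits ^ ((a ^ b) & 0x80008000u);
}

/* Simple linear congruential generator for reproducible test vectors. */
static uint32_t next_random(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Compare input-output behavior only; returns the number of mismatches. */
int coverify(uint32_t seed, int trials)
{
    assert(trials >= 0);
    uint32_t state = seed;
    int mismatches = 0;

    for (int i = 0; i < trials; i++) {
        uint32_t a = next_random(&state);
        uint32_t b = next_random(&state);
        if (golden_add2x16(a, b) != dut_add2x16(a, b))
            mismatches++;
    }
    return mismatches;
}
```

The comparison checks only inputs and outputs, which mirrors the point made in the text: the extension's functional behavior is what needs verifying, not the surrounding processor logic.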

Shuji Ohtsuka said, "Unlike manually writing Verilog or VHDL, with Xtensa processors we only had to specify the functionality of the execution unit we wanted. Tensilica’s tools automatically generated pre-verified RTL for the entire processor with our new instructions, execution units, and register files along with all the associated bypass and control logic. Our verification effort was reduced down to just verifying the input-output functional behavior of the extensions we specified in TIE. We did not have to increase the size of our verification team even though our design was more than twice as large as the previous generation chip."

The Epson design team also created a system model using the C-based XTMP modeling environment provided by Cadence. This system model was useful for early system exploration, performing trade-offs between different architectures, and doing software development before the SoC was actually manufactured. The team was able to quickly verify the hardware architecture and also get an early start in software development.

Summary

Epson set an ambitious goal to leapfrog the current generation of printer imaging technology and create a printer that is a class apart from its competitors. They achieved this goal by architecting a multi-core Xtensa processor-based design as an alternative to a hardwired RTL approach. Using this methodology, they met their high performance and flexibility targets while also shortening development time and reducing verification effort.