# A 3.6-Gb/s 340-mW 16:1 Pipe-Lined Multiplexer using 0.18 μm SOI-CMOS Technology

Toru Nakura, Member, IEEE, Kimio Ueda, Kazuo Kubo, Yoshio Matsuda, Koichiro Mashiko, Member, IEEE, and Tsutomu Yoshihara

Abstract—This paper describes a 16:1 multiplexer using 0.18 $\mu$m SOI-CMOS technology. To realize ultra-high-speed operation, the multiplexer adopts a pipeline structure and a phase shift technique together with a selector architecture. This architecture takes advantage of the small junction capacitances of the SOI-CMOS devices. The multiplexer achieves 3.6 Gb/s at a supply voltage of 2.0 V, while dissipating only 30 mW in the core circuit and 340 mW for the whole chip, which includes the I/O buffers.

Index Terms—Multiplexer (MUX), PECL, phase shift, pipeline, selector, SOI.

#### I. INTRODUCTION

THE FINER pitch LSI technologies make it possible to create system LSI's which integrate high-density logics with various functions such as analog circuits and memories on a chip. The system LSI's offer not only efficient data processing but also high-speed and low-power operations, because they can reduce the parasitic elements which exist between the chips in previous systems consisting of separate chips. The system LSI's would especially be suited for future telecommunication systems where large-scale data processing, analog signal processing and ultra-high-speed data transmission are required.

Multiplexers (MUX) are the key components in ultra-high-speed telecommunication systems such as synchronous optical network (SONET). Conventionally, only GaAs or bipolar devices have been used in this field because of their high-speed characteristics. However, GaAs devices are difficult to integrate with high-density logics. Although BiCMOS technology makes integration of high-density logics with other functions possible, it is difficult to optimize both bipolar transistors and CMOS transistors for high-speed, low-voltage/power operations. Therefore, CMOS technology is a well-qualified technology for ultra-high-speed MUX's to be used in future telecommunication systems.

As the CMOS technology has progressed into deep-submicron regions, CMOS devices have the ability to realize gigahertz operation chips [1], [2], especially for 2.488 Gb/s of OC-48 (STM-16) telecommunication systems. This paper describes a 16:1 MUX operating at 3.6 Gb/s, developed by using our 0.18 $\mu$m SOI-CMOS technology [3]. The focus of this paper is a MUX architecture to realize ultra-high-speed operations. In Section II, circuit design considerations are presented in detail. Section III describes the measurement results of the MUX. Section IV concludes this paper.

Publisher Item Identifier S 0018-9200(00)03017-1.

Fig. 1. The 4:1 selector circuit.

#### II. CIRCUIT DESIGN

#### A. The 16:1 MUX Architecture

The speed and power advantages of SOI-CMOS devices over bulk devices are much larger when the load capacitances of the circuits are mainly source–drain capacitances [4]. The reason is that the SOI structure dramatically reduces the source–drain capacitance of transistors due to the buried oxide isolation. According to our parameter extraction, the drain–substrate junction capacitance ($C_J$) of SOI is 0.08 fF/$\mu$m<sup>2</sup>, which is about 1/10 of the bulk $C_J$ of 0.8 fF/$\mu$m<sup>2</sup>. Therefore, selector type MUX architectures are especially suitable for SOI-CMOS devices, because the main loads of the selector circuits are the source–drain capacitances of the pass transistors.

Fig. 1 shows the 4 : 1 selector circuit that we used in our 16 : 1 MUX. A SPICE simulation indicated that the SOI-CMOS selector circuit, which has 1/10 of the junction capacitance, operates 49% faster than the bulk selector circuit.

Manuscript received July 26, 1999; revised December 2, 1999.

T. Nakura, K. Ueda, Y. Matsuda, and K. Mashiko are with the System LSI Development Center, Mitsubishi Electric Corporation, Hyogo, Japan.

K. Kubo is with the Information Technology Research and Development Center, Mitsubishi Electric Corporation, Kanagawa, Japan.

T. Yoshihara is with the Display Devices Business Division, Mitsubishi Electric Corporation, Kyoto, Japan.

Fig. 2. Block diagram of conventional 4:1 selector architecture.

Fig. 2 shows the conventional 4:1 selector type architecture, and Fig. 3 shows the configurations of the 1/4 divider and the timing generator circuits. In the 1/4 divider, the outputs to the timing generator have a delay of $2T_{\rm dff}$ from CLK through the two D-FF's. The outputs S1–S4 are additionally delayed by $T_{\rm nor}$ through the timing generator, where $T_{\rm dff}$ and $T_{\rm nor}$ are the delays of the D-FF and the NOR gate, respectively. Then S1–S4 select the data at the 4:1 selector circuit and the selected signal SOUT is latched by CLK. Hence, the delay from the clock input to the 4:1 selector output must be within one clock cycle. This delay limits the maximum operating frequency of the conventional MUX to

$$f_{\rm conv} \le 1/(2T_{\rm dff} + T_{\rm nor} + T_{\rm sel} + T_{\rm setup}) \tag{1}$$

where  $T_{sel}$  is the delay of the 4 : 1 selector and  $T_{setup}$  is the setup time of the D-FF.

In our proposed configuration, shown in Fig. 4, D-FF's are inserted between the 1/4 divider and the timing generator, and between the timing generator and the 4:1 selector. The resulting pipeline structure shortens the critical path in the MUX compared to the conventional configuration. The path delays of the 1/4 divider stage, the timing generator stage, and the 4:1 selector stage are " $2T_{dff2} + T_{setup}$ ," " $T_{dff1} + T_{nor} + T_{setup}$ ," and " $T_{dff2} + T_{setup}$ ," respectively. In our design, $T_{dff1} = 118$ ps, $T_{dff2} = 63$ ps, $T_{setup} = 49$ ps, $T_{sel} = 32$ ps, and $T_{nor} = 63$ ps, where $T_{dff1}$ is the delay of the D-FF before the NOR gate, which has a heavy load, and $T_{dff2}$ is the delay of the other D-FF's, which have small load capacitances. Thus the critical path in our MUX is the timing generator stage, and the maximum operating frequency is boosted from $f_{conv} = 3.1$ Gb/s in (1) to

$$f_{\rm pipeline} \le 1/(T_{\rm dff1} + T_{\rm nor} + T_{\rm setup}). \tag{2}$$

This is 4.3 Gb/s in our design.
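As a quick numerical check of (1) and (2), the following Python sketch evaluates the path delays with the values quoted above. It assumes that the $2T_{\rm dff}$ term of the conventional path is made up of $T_{dff1} + T_{dff2}$ (one heavily loaded and one lightly loaded D-FF); this split is not stated in the text and is chosen here only because it reproduces the quoted 3.1 Gb/s. All names in the code are illustrative.

```python
# Delay values quoted in the text, in picoseconds.
T_DFF1 = 118.0   # D-FF driving the heavily loaded NOR gate
T_DFF2 = 63.0    # other, lightly loaded D-FF's
T_SETUP = 49.0
T_SEL = 32.0
T_NOR = 63.0


def max_rate_gbps(path_delay_ps):
    """Maximum rate in Gb/s for a path that must fit in one clock cycle."""
    return 1000.0 / path_delay_ps


# Conventional path, eq. (1). Assumption: the 2*T_dff term is split as
# T_dff1 + T_dff2, since the text does not say which D-FF delay applies.
conventional_ps = T_DFF1 + T_DFF2 + T_NOR + T_SEL + T_SETUP

# Pipelined stages, using the stage expressions given in the text.
stages_ps = {
    "divider": 2 * T_DFF2 + T_SETUP,
    "timing_generator": T_DFF1 + T_NOR + T_SETUP,
    "selector": T_DFF2 + T_SETUP,
}
critical_stage = max(stages_ps, key=stages_ps.get)

f_conv = max_rate_gbps(conventional_ps)
f_pipeline = max_rate_gbps(stages_ps[critical_stage])

print(critical_stage, round(f_conv, 1), round(f_pipeline, 1))
```

Under these assumptions the slowest stage is the timing generator at 230 ps, against 325 ps for the unpipelined path.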

Fig. 5 shows the D-FF used in our MUX. The D-FF uses the dual-rail configuration consisting of pass transistors with reduced transistor count to take advantage of the SOI's small junction capacitances [5].

In addition to the pipeline structure, the phase shift technique is also employed in our MUX. Fig. 6(a) shows the conventional timing chart without the phase shift technique. All the parallel input data, D1–D4, are latched at the falling edge of the divided clock CLK/4. D1"–D4" are input to the selector circuit at the same timing. The select signals, S1–S4, convert the parallel input data, D1"–D4", to the serial output data SOUT at the selector circuit. However, when using this method, the timing margin between the input data and the select signal is reduced by the transition point of the input data, as shown in Fig. 6(a). To avoid this situation, several ideas have been proposed to shift the phase of the input data, for example, by using additional half latches [6], or by using additional D-FF's with a delayed trigger clock [7]. In our MUX, the phase shift is accomplished only by exchanging CLK/4 and CLKB/4 for the upper two D-FF's latching D1 and D2. This means D1 and D2 are latched at the rising edge of the divided clock to shift their phase by the half-period of the divided clock CLK/4, as shown in Fig. 6(b). Unlike the earlier ideas, this method does not need any additional circuits.

Fig. 3. Configurations of divider and timing generator circuits.

Fig. 4. Block diagram of proposed 4:1 selector architecture.

Fig. 5. Circuit configuration of D-FF.

Fig. 6. Timing charts of (a) conventional and (b) phase shifted circuits.

Fig. 7. Block diagram of two step 4:1 selector architecture.

Fig. 8. Input and output buffer configurations for PECL interface.

By employing these architectures in two stages, a 16:1 MUX is realized as shown in Fig. 7. Since the low-speed MUX's operate at a quarter of the speed of the high-speed MUX, the low-speed MUX's differ from the high-speed MUX in two respects. The first difference is that the D-FF's before the timing generator were removed to reduce the power dissipation. The second difference is that all the external input data are latched in the D-FF's at the same timing in order to increase the phase margin between the external input data and CLK/16. This wider phase margin is one of the important specifications of MUX chips for practical use.
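The two-stage arrangement can be illustrated with a purely behavioral Python sketch: four low-speed 4:1 selectors feed one high-speed 4:1 selector. The input wiring (low-speed MUX k receiving inputs k, k+4, k+8, and k+12) and the function names are assumptions made for this illustration; the actual bit ordering of the chip is not given in the text, and no timing, pipelining, or phase shifting is modeled.

```python
def mux4(inputs, select):
    """4:1 selector: pass the input chosen by a select index 0..3."""
    return inputs[select]


def serialize16(word):
    """Serialize one 16-bit parallel word through two 4:1 selector stages."""
    assert len(word) == 16
    # Assumed wiring: low-speed MUX k receives inputs k, k+4, k+8, k+12.
    low_speed_inputs = [[word[k + 4 * j] for j in range(4)] for k in range(4)]
    serial = []
    for slot in range(16):
        slow_sel = slot // 4   # low-speed select advances every 4 output bits
        fast_sel = slot % 4    # high-speed select advances every output bit
        low_outputs = [mux4(low_speed_inputs[k], slow_sel) for k in range(4)]
        serial.append(mux4(low_outputs, fast_sel))
    return serial


print(serialize16(list(range(16))))
```

With this wiring, each low-speed selector changes its output once for every four bits produced by the high-speed selector, which is why the low-speed stage can run at a quarter of the output rate.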

#### B. I/O Buffer Design

High-speed signals require pseudo ECL (PECL) I/O buffers matching 50 $\Omega$ transmission lines. The PECL levels for a supply voltage of 2.0 V are $V_{\rm OL} \leq 0.3$ V and $V_{\rm OH} \geq 1.1$ V.

Fig. 9. Chip micrograph of the 16:1 MUX (1.75 mm × 1.75 mm).

Fig. 8 shows the circuit configurations of the input and the output buffers. The input buffer consists of two stages of an NMOS current mirror circuit in order to amplify the small PECL signal to a full-swing signal for the internal circuit. The output buffer consists of just inverter gates but satisfies the PECL levels, because $V_{OL} = 0$ V is set by the NMOS pull-down transistor, and $V_{OH}$ is decided by the ratio of the PMOS on-resistance and the 50 $\Omega$ termination resistance. When the PMOS on-resistance is adjusted to 40 $\Omega$, the output high level is 1.1 V. The termination resistors are implemented on the chip at the high-speed input buffer (clock input) to keep signal reflections low. At the 16 low-speed input buffers (data input), the termination resistors are attached outside the chip to prevent thermal problems with the chip. The high-speed input buffer uses differential signals, while the low-speed input buffers are single-ended with an external reference voltage $V_{\rm BB}$.
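The quoted output high level follows from a simple resistive divider, sketched below in Python. The sketch assumes that the 50 $\Omega$ termination is returned to ground and that the PMOS behaves as a fixed 40 $\Omega$ resistor to Vdd; the function name is hypothetical.

```python
def pecl_output_high(vdd, r_pmos_on, r_term=50.0):
    """Output high level of a resistive divider.

    Assumed topology: PMOS on-resistance to Vdd in series with a
    termination resistor to ground.
    """
    return vdd * r_term / (r_pmos_on + r_term)


v_oh = pecl_output_high(vdd=2.0, r_pmos_on=40.0)
print(f"V_OH = {v_oh:.2f} V")
```

Under this model a larger PMOS on-resistance would pull $V_{OH}$ below the 1.1 V requirement, which is why the on-resistance has to be adjusted to about 40 $\Omega$.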

#### III. MEASUREMENT RESULTS

#### A. Process Technology

The MUX was designed and fabricated using our 0.18  $\mu$ m SOI-CMOS technology. The transistors operate in the partially depleted (PD) mode with the floating-body condition. A shallow trench isolation (STI) technology isolates the adjacent transistors. The thicknesses of the SOI layer and the buried oxide are 100 and 400 nm, respectively. A chip micrograph of the MUX is shown in Fig. 9. The MUX contains about 1500 transistors integrated on a chip size of 1.75 mm × 1.75 mm. The high-speed circuits were gathered and placed near the bonding pads to shorten the high-speed signal wires. Several large capacitors ( $\geq$ 100 pF) were inserted between Vdd and GND lines in order to suppress the switching noise, especially near the output buffers. The transistor characteristics are listed in Table I.

#### B. MUX Characteristics

Measurements were performed on-wafer using an RF-coaxial probing card connected to 50 $\Omega$ transmission lines. The 16:1 MUX operated up to 3.6 Gb/s, consuming only 30 mW in the core circuit and 340 mW including the I/O buffers, under a 2.0 V supply voltage at room temperature. Fig. 10 shows the 3.6 Gb/s operating waveforms of the multiplexed output data and the corresponding clock input. All the inputs were fixed to "1" or "0." The data output repeated "1011101010110010" during this input set.

The simulated maximum operation frequency of the core circuit is 4.3 Gb/s, as mentioned before. The discrepancy between the simulation and the measurement can be explained by the I/O buffer capability. As can be seen in Fig. 10, the output buffer is not optimized enough to fully exploit the performance of the core circuit.

TABLE I TRANSISTOR CHARACTERISTICS

|                                        | PMOS                | NMOS                |
|----------------------------------------|---------------------|---------------------|
| Gate Length                            | 0.18 $\mu$m         |                     |
| SOI layer thickness                    | 100 nm              |                     |
| BOX layer thickness                    | 400 nm              |                     |
| Supply Voltage                         | 2.0 V               |                     |
| $V_{th}$ (extrapolated, $V_d = 0.1$ V) | 0.49 V              | 0.43 V              |
| Drain Current                          | 220 $\mu$A/$\mu$m   | 580 $\mu$A/$\mu$m   |
| Mode                                   | PD, Floating Body   |                     |
| $T_{pd}$ (inverter F.O.=1)             | 26 ps               |                     |

Fig. 10. Operating waveforms of multiplexed output data and corresponding clock input at 3.6 Gb/s, 2 V supply voltage.

Fig. 11. Supply voltage dependence of maximum operating frequency and corresponding power dissipation.



Fig. 12. Operating frequency dependence of power dissipation at Vdd = 2.0 V.



Fig. 13. Output eye pattern and corresponding clock output at 2.488 Gb/s, 1.8 V power supply, in the plastic QFP.

Fig. 11 shows the supply voltage dependence of the maximum operating frequency and the corresponding power dissipation. Note that the power consumed by the 50 $\Omega$ termination resistors in the input buffer (CLK, CLKB input) is not included, but the power consumed by the termination resistors in the oscilloscope (DOUT, CLK/16, and CLK outputs) is included. Under a 2.0 V supply voltage, 3.6 Gb/s operation was achieved, and the corresponding power dissipation was 30 mW for the core circuit and 340 mW for the whole chip including the I/O buffers. Even at a low supply voltage of 1.0 V, the MUX operated up to 1.2 Gb/s, while dissipating only 2.2 mW without the I/O buffers.

Fig. 12 shows the operating frequency dependence of the power dissipation. The power dissipation slopes in the core and the whole circuits were 8.4 mW per Gb/s and 50 mW per Gb/s, respectively.
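The slopes can be related to the 3.6 Gb/s operating point with a few lines of Python. The sketch assumes a straight-line power model over the whole frequency range, so the rate-independent remainder it computes for the whole chip is an inference from that model, not a measured figure. The constant names are illustrative.

```python
CORE_SLOPE_MW_PER_GBPS = 8.4
CHIP_SLOPE_MW_PER_GBPS = 50.0
RATE_GBPS = 3.6
CHIP_POWER_MW = 340.0   # measured whole-chip power at 3.6 Gb/s

# Rate-proportional parts of the power at 3.6 Gb/s.
core_dynamic_mw = CORE_SLOPE_MW_PER_GBPS * RATE_GBPS
chip_dynamic_mw = CHIP_SLOPE_MW_PER_GBPS * RATE_GBPS

# Assumption: a straight-line model, so whatever is left of the measured
# whole-chip power does not depend on the data rate.
chip_static_mw = CHIP_POWER_MW - chip_dynamic_mw

print(round(core_dynamic_mw, 1), round(chip_dynamic_mw, 1), round(chip_static_mw, 1))
```

The core slope alone accounts for about 30 mW at 3.6 Gb/s, in line with the measured core power, whereas the whole-chip slope accounts for only about 180 mW of the measured 340 mW.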

Fig. 13 shows the output eye pattern and the corresponding clock output for 2.488 Gb/s operation at 1.8 V (a 10% Vdd drop condition) at room temperature. The eye pattern was obtained with the chip assembled in a 64-pin plastic quad flat package (QFP). Both $T_r$ and $T_f$ from 10% to 90% were about 100 ps. The eye diagram indicates the applicability of this chip to high-speed (OC-48, STM-16) communication systems.

#### C. Floating-Body Effect of SOI Transistors

We have to consider delay time instabilities when using floating-body SOI-CMOS transistors. The delay time of circuits decreases as the operating frequency lowers because of the lower body potential of the transistors at lower frequencies [8], [9]. During the operation of the MUX, the incoming data signals contain various frequency components due to their various data patterns, although the frequencies of the clock signals are constant. Therefore, the delay times of the data signal paths would change due to the floating-body effects, which could cause bit errors. However, our MUX operates synchronously with the clock signals and the data are latched at D-FF's. Also, when the data has continuous "H" or continuous "L" patterns, the timing margin between the data and the clock signal increases at the D-FF's due to the lower frequency components of the data. Thus, the delay time instability due to the floating-body configuration is not a problem in our MUX.

To confirm this, we measured the bit-error rate (BER) of the MUX using PRBS $2^{23} - 1$ data, which includes runs of 1 to 23 consecutive L or H bits. This means that the incoming signal can have a rate of data change 23 times lower than that of a strictly alternating L and H pattern. The BER was less than $10^{-12}$ at supply voltages of 1.6–2 V at a 2.488 Gb/s operating frequency, with the chip assembled in the plastic package, at room temperature.
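The run-length property of such test data can be illustrated with a small linear-feedback shift register model in Python. This is a sketch only: the paper does not state which generator polynomial the test equipment used, and the demonstration below runs a short order-7 sequence (period 127) so that it finishes instantly. The function names are hypothetical.

```python
def prbs_bits(order, tap, count, seed=1):
    """Generate `count` bits from a Fibonacci LFSR of the given order.

    The feedback bit is the XOR of register stages `order` and `tap`.
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(count):
        bits.append((state >> (order - 1)) & 1)
        feedback = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | feedback) & mask
    return bits


def longest_run(bits, value):
    """Length of the longest run of consecutive bits equal to `value`."""
    best = current = 0
    for b in bits:
        current = current + 1 if b == value else 0
        best = max(best, current)
    return best


# Short demonstration with order 7 (period 127); two periods are generated so
# that a run crossing the period boundary is still counted in full.
demo = prbs_bits(order=7, tap=6, count=2 * 127)
print(longest_run(demo, 1), longest_run(demo, 0))
```

For the $2^{23} - 1$ pattern the same function could be called with `order=23` and `tap=18`, assuming the commonly used $x^{23} + x^{18} + 1$ polynomial. A maximal-length sequence of order 23 contains a longest run of 23 bits of one level and 22 bits of the other, so the data path sees long stretches without any transition.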

#### **IV. CONCLUSIONS**

A high-speed and low-power 16:1 MUX fabricated in our 0.18 $\mu$m SOI-CMOS technology has been demonstrated. To take advantage of the small junction capacitances of SOI-CMOS transistors, a two-step 4:1 selector architecture was adopted. In addition, a multiple pipeline structure and a phase shift technique were employed to realize high-speed operation. The MUX achieved 3.6 Gb/s operation consuming only 340 and 30 mW, with and without the I/O buffers, respectively. This result indicates that SOI-CMOS devices can replace GaAs and bipolar devices at the 2.488 Gb/s rate of the OC-48 (STM-16) standard and can integrate various functions on a chip for future ultra-high-speed telecommunication systems.

#### REFERENCES

- [1] S. Yasuda, Y. Ohtomo, M. Ino, Y. Kado, and T. Tsuchiya, "3-Gb/s CMOS 1:4 MUX and DEMUX ICs," *IEICE Trans. Electron.*, vol. E78-C, pp. 1746–1753, Dec. 1995.
- [2] M. Kurisu, M. Kaneko, T. Suzaki, A. Tanabe, M. Togo, A. Furukawa, T. Tamura, K. Nakajima, and K. Yoshida, "2.8-Gb/s 176mW byte-interleaved and 3.0-Gb/s 118-mW bit-interleaved 8 : 1 multiplexers with a 0.15μm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 31, pp. 2024–2029, Dec. 1996.
- [3] T. Nakura, K. Ueda, K. Kubo, W. Fernandez, Y. Matsuda, and K. Mashiko, "A 3.6-Gb/s 340mW 16:1 pipe-lined multiplexer using SOI-CMOS technology," in *VLSI Circuits Symp. Dig. Tech. Papers*, June 1999, pp. 27–30.
- [4] K. Ueda, Y. Wada, T. Hirota, S. Maeda, K. Mashiko, and H. Hamano, "SOI/CMOS circuit design for high-speed communication LSI's," *IEICE Trans. Electron.*, vol. E80-C, pp. 886–892, July 1997.
- [5] T. Hirota, K. Ueda, Y. Wada, K. Mashiko, and H. Hamano, "0.5 V 320 MHz 8b multiplexer/demultiplexer chips based on a gate array with regular-structured DTMOS/SOI," in *ISSCC Dig. Tech. Papers*, Feb. 1998, pp. 188–189.

- [6] C. L. Stout and J. Doernberg, "10-Gb/s silicon bipolar 8:1 multiplexer and 1:8 demultiplexer," *IEEE J. Solid-State Circuits*, vol. 28, pp. 339–343, Mar. 1993.
- [7] M. Ouchi, T. Okamura, A. Sawairi, F. Kuniba, K. Matsumoto, T. Tashiro, S. Hatakeyama, and K. Okuyama, "A Si bipolar 5-Gb/s 8 : 1 multiplexer and 4.2-Gb/s 1 : 8 demultiplexer," *IEICE Trans. Electron.*, vol. E75-C, pp. 562–565, Apr. 1992.
- [8] I. J. Kim, H. O. Joachim, T. Iwamatsu, Y. Yamaguchi, Y. Inoue, K. Eikyu, K. Ishikawa, K. Ueda, H. Morishima, K. Mashiko, and H. Miyoshi, "Transient effects of SOI transistors in circuit operation," in *Proc. 7th Int. Symp. SOI Technology and Devices*, May 1996, pp. 397–405.
- [9] K. Ueda, H. Morinaka, Y. Yamaguchi, T. Iwamatsu, I. J. Kim, Y. Inoue, K. Mashiko, and T. Sumi, "Floating-body effects on propagation delay in SOI/CMOS LSI's," in *Proc. Int. SOI Conf.*, Oct. 1996, pp. 142–143.



**Toru Nakura** (M'99) was born in Fukuoka, Japan, in 1972. He received the B.S. and M.S. degrees in electronic engineering from the University of Tokyo, Tokyo, Japan, in 1995 and 1997, respectively.

He joined the System LSI Laboratory, Mitsubishi Electric Corporation, Hyogo, Japan, in 1997. He currently belongs to the Advanced Circuit Design Section and is engaged in the design of SOI/CMOS high-speed communication circuits.



**Kimio Ueda** was born in Shimane, Japan, in 1961. He received the B.S. and M.S. degrees in electric engineering from Nagaoka University of Technology, Niigata, Japan, in 1984 and 1986, respectively.

He joined the System LSI Laboratory, Mitsubishi Electric Corporation, Hyogo, Japan, in 1986. There he was engaged in the research and development of high-speed and low-power circuit technologies for CMOS, bipolar, and BiCMOS LSI's. He currently belongs to the Advanced Circuit Design Section and is engaged in the research and development of SOI-CMOS LSI's.

Mr. Ueda is a Member of the Institute of Electronics, Information and Communication Engineers of Japan and the Japan Society of Applied Physics.



**Kazuo Kubo** was born in Tokyo, Japan, in 1962. He received the B.S. and M.S. degrees in electrical engineering from Saitama University, Saitama, Japan, in 1985 and 1987, respectively.

He joined the Mitsubishi Electric Corporation, Kanagawa, Japan, in 1987, where he has been engaged in research and development of transport processing and optical communication systems. He currently belongs to the Lightwave Communication Department, Information Technology Research and Development Center, Kanagawa.



**Yoshio Matsuda** was born in Ehime, Japan, in 1954. He received the B.S. degree in physics and the M.S. and Ph.D. degrees in applied physics from Osaka University, Suita, Osaka, Japan, in 1977, 1979, and 1983, respectively.

He joined the LSI Laboratory, Mitsubishi Electric Corporation, Itami, Hyogo, Japan, in 1985. Since then, he has been engaged in the development of DRAM circuit design. He is currently working on the development of advanced logic circuit design.



**Koichiro Mashiko** (M'88) was born in Shizuoka, Japan, in 1952. He received the B.S. and M.S. degrees in physics from the University of Tokyo, Tokyo, Japan, in 1975 and 1977, respectively. He received the Ph.D. degree in electrical engineering from Osaka University, Osaka, Japan, in 1988.

He joined the LSI Laboratory, Mitsubishi Electric Corporation, Itami, Hyogo, Japan, in 1977. There he was engaged in the research and development of dynamic RAM's from 64 kbit to 4 Mbit, 256 kbit dual-port video RAM, cache controller chips, neural network chips, and so on. In 1990, he transferred to the Headquarters Research and Development, Mitsubishi Electric Corporation, Tokyo, Japan. From 1991 to 1993, he was with Mitsubishi Electric Research Laboratories, Inc., Cambridge, MA. In 1993, he transferred to the System LSI Laboratory, Mitsubishi Electric Corporation, Itami, where he was engaged in the research and development of high-speed logic circuits such as multipliers and Mux/Dmux chips, 1.9 GHz IF transceiver chips, and low-voltage/low-power circuit technologies including SOI devices. In 1998, he transferred to the System LSI Development Center via the ULSI Laboratory, both of Mitsubishi Electric Corporation, Itami. Currently, he is involved in the application of the high-speed, low-power, and low-voltage LSI circuit technologies.



**Tsutomu Yoshihara** was born in Takamatsu, Japan, in 1947. He received the B.S. and M.S. degrees in physics and the Ph.D. degree in electronic engineering, from Osaka University, Osaka, Japan, in 1969, 1971, and 1983, respectively.

He joined the Kita-Itami Works, Mitsubishi Electric Corporation, Itami, Japan, in 1971. From 1971 to 1995, he was engaged in the research and development of MOS LSI memories at ULSI Laboratory. In 1996, he transferred to the Display Devices Business Division, Kyoto, Japan, and is currently involved in the development of the plasma display panels.