# Nonvolatile and Energy-Efficient FeFET-Based Multiplier for Energy-Harvesting Devices 

Mengyuan Li$^{*}$[^0], Xunzhao Yin$^{\dagger}$, Xiaobo Sharon Hu$^{*}$, Cheng Zhuo$^{\dagger}$<br>$^{*}$Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556<br>$^{\dagger}$College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China 310027<br>{mli22, shu}@nd.edu, {xzyin1, czhuo}@zju.edu.cn


#### Abstract

Energy-harvesting internet-of-things devices must deal with unstable power input. Nonvolatile processors (NVPs) can offer an effective solution. Compact and low-energy arithmetic circuits that can efficiently switch between computation and backup operations are highly desirable for NVP design. This paper introduces a nonvolatile ferroelectric field-effect transistor (FeFET)-based sequential multiplier that can continue a calculation after a power outage, thus achieving zero backup overhead. We exploit the unique characteristics of FeFETs to construct key components of a sequential multiplier. The multiplier relies on a FeFET-based adder and a new FeFET-based latch to achieve compact area and low operating energy. Moreover, it uses the hysteretic characteristic of FeFETs to realize the storage capability, and hence is able to store, at no extra cost, the intermediate data of an operation in a nonvolatile manner. This property supports continued computation when the power supply is intermittent. Simulation results show that, assuming the same technology node, the proposed FeFET-based multiplier saves up to $21\%$ and $19\%$ area compared with conventional 4-bit and 8-bit CMOS-based sequential multipliers, respectively. It also occupies $32\%$ and $73\%$ less area than a CMOS-based array multiplier. Furthermore, the proposed design offers up to $32\%$/$23\%$ energy saving per operation compared with a 4/8-bit CMOS-based sequential multiplier.


## I. Introduction

With the rapid growth of Internet of Things (IoT) devices [1], [2], energy-harvesting (EH) systems are gaining much attention due to their advantages over battery-powered systems in many power-constrained environments such as body-area devices. EH IoT devices hold high promise for performing essential tasks in the huge IoT market. However, issues such as the low conversion efficiency of energy sources and the unstable input power to EH systems still severely limit their deployment. Nonvolatile processors (NVPs) are a very promising way to address these challenges and have gained significant attention in recent years. As a cross-layer approach, an NVP needs progress on multiple fronts, including the architecture and circuit levels. Although architecture-level designs have been studied in prior work [3], [4], circuit-level work that enables the NVP system to execute more efficiently and support continued execution remains insufficient.

The unique properties of EH systems put new demands on the circuit-level design of NVPs. On the one hand, NVPs should be energy efficient under the tight resource constraints of EH systems. On the other hand, the ability to back up data before a power outage and then continue execution after it is highly desirable because power is intermittent. These demands motivate the design of novel circuits that seamlessly merge computation and backup operations.

In this paper, we focus on a nonvolatile (NV) arithmetic component, the multiplier, that can naturally support continued calculation. Various NV adders based on different emerging devices have been proposed [5], [6], [7], but there are few studies on NV multipliers, which are an essential part of the arithmetic unit. Designing an NV multiplier is more challenging because multiplication is a more complex operation than addition.

Our NV multiplier is based on the ferroelectric field-effect transistor (FeFET) [8], [9], and leverages the unique characteristics and inherent non-volatility of FeFETs. A FeFET is an NV device with a ferroelectric (FE) layer integrated in the gate stack of a MOSFET. Owing to its hysteretic behavior, a FeFET can work as both an NV storage element and a switch. This three-terminal device exhibits (i) compatibility with CMOS technology [10], and (ii) a high $I_{ON}/I_{OFF}$ ratio with hysteresis [11]. These properties allow FeFETs to be combined with conventional CMOS to offer a promising solution that achieves both energy efficiency and non-volatility.

We propose a compact and energy-efficient FeFET-based NV multiplier by exploiting the characteristics of FeFETs. The major contributions of this work are summarized as follows:

- We analyze different multiplier design structures and propose a FeFET-based sequential multiplier which combines the advantages of both FeFET and CMOS to achieve non-volatility without performance degradation.
- We introduce a new FeFET-based latch with only six transistors, and enable intermediate data preservation on demand by utilizing the storage property of FeFETs.
- An NV full adder (FA) is redesigned based on the FeFET-based FA proposed in [5] to support the frequent shift-and-add operation in the proposed multiplication scheme, because the design in [5] lacks support for writing the FeFETs during operation.
- We develop multiple NV circuit structures in the proposed multiplier design to ensure its non-volatility, which is particularly beneficial to edge devices with intermittent power supply.

The proposed multiplier is evaluated via detailed SPICE simulations for various combinations of input widths and architectures. The simulation results show that, for 8-bit multiplication, our proposed FeFET-based multiplier reduces area by $19\%/73\%$ and energy consumption by $23\%/46\%$ compared with the conventional CMOS-based sequential and array multipliers, respectively.

Fig. 1. (a) Equivalent circuit and physical structure of a FeFET. (b) FeFET $I_{ds}$ vs. $V_{gs}$ curves showing hysteresis behavior. The $I_{ds}$ current varies according to $V_{ds}$ [5]. (c) FE material parameters.

## II. Background and Related Works

This section briefly reviews the FeFET basics, the simulation model used in this work, and related circuit design efforts.

## A. FeFET Device

With the discovery of FE behavior in thin films of SiO$_2$-doped hafnium oxide (HZO), research on integrating HZO in the gate stack of a MOSFET has become increasingly active. Fig. 1(a) depicts a possible device with such a gate stack as well as its equivalent circuit. Depending on the thickness of the FE layer, the device structure can exhibit one of two operating modes: (i) a steep switching mode promising sub-60 mV/decade switching, where the corresponding device is referred to as a negative capacitance field-effect transistor (NCFET), and (ii) a hysteretic switching mode offering an NV storage property, where the corresponding device is referred to as a FeFET. The coupling between the FE capacitance ($C_{FE}$) and the underlying transistor capacitance ($C_{MOS}$) causes the different behaviors.

In this paper, we are interested in exploiting the hysteretic switching mode, i.e., FeFETs. The hysteresis behavior arises from the polarization of the FE layer, in which one of two states (logic '0' or logic '1') is preserved in the device even without a power supply, and the device state can be switched by applying a suitable $V_{gs}$. Fig. 1(b) [5] shows the $I_{ds}$ vs. $V_{gs}$ characteristics of the device, in which the hysteresis loops are clearly seen.
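The retain-unless-overdriven behavior described above can be sketched as a two-state toy model. This is only an illustration, not the SPICE model used in this work: the class name `ToyFeFET`, the symmetric threshold `v_coercive` and its value, and the convention that a positive $V_{gs}$ writes logic '1' are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ToyFeFET:
    """Two-state hysteretic switch; a behavioral toy, not a device model."""
    v_coercive: float = 1.0   # hypothetical switching threshold in volts
    state: int = 0            # stored logic value, 0 or 1

    def apply_vgs(self, vgs: float) -> int:
        """Update and return the stored state for an applied V_gs."""
        if vgs >= self.v_coercive:
            self.state = 1        # assumed: large positive V_gs writes '1'
        elif vgs <= -self.v_coercive:
            self.state = 0        # assumed: large negative V_gs writes '0'
        # Any voltage in between, including 0 V (power off), retains the state.
        return self.state


fet = ToyFeFET()
trace = [fet.apply_vgs(v) for v in (0.0, 1.2, 0.0, -0.3, -1.2, 0.0)]
print(trace)  # [0, 1, 1, 1, 0, 0]
```

The third and fourth entries show the point of interest: once written, the state survives both zero bias and a small reverse bias.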

## B. Simulation Model

The simulations in this paper use a physics-based, circuit-compatible SPICE model for FeFETs based on the time-dependent Landau-Khalatnikov (LK) equations [12]. The model, which enables efficient design and analysis, has been widely employed in circuit design using FeFETs [13], [14], [15], [16]. It supports 45 nm, 22 nm, or 10 nm predictive technology models (PTMs) [17] for the baseline transistor. The coefficients of the FE layer in the LK equations [12], such as $\alpha$, $\beta$ and $\gamma$, are summarized in Fig. 1(c) and are calibrated to experimental data on HZO material.
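For orientation, a commonly used form of the time-dependent LK equation relates the voltage across the FE layer to its polarization. The expression below is a sketch of that standard form rather than a transcription of the model in [12]: the symbols $V_{FE}$ (voltage across the FE layer), $T_{FE}$ (FE layer thickness), $P$ (polarization) and $\rho$ (kinetic coefficient) are introduced here as assumptions, and some references absorb different numerical factors into $\alpha$, $\beta$ and $\gamma$.

$$
V_{FE} = T_{FE}\left(\alpha P + \beta P^{3} + \gamma P^{5} + \rho \frac{dP}{dt}\right)
$$

The polynomial terms give rise to the two stable polarization states, while the $\rho\, dP/dt$ term accounts for the finite time needed to switch between them.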


Fig. 2. (a) Schematic of a FeFET-based latch and (b) its function table [13].

## C. Related Work

Recent years have witnessed growing interest in using FeFETs to design a variety of NV memory cells (e.g., [18], [19], [11], [20], [21]). Moreover, many works have investigated memory-based logic circuit design in which the FeFET serves as both a switch and an NV storage element (e.g., [5], [14], [16], [22], [23]). Since the focus of this paper is closely related to arithmetic, logic and latch designs, we review these works in more detail.

Due to the inherent storage capability of FeFETs, NV latch and flip-flop (FF) designs are possible. Reference [13] proposes a FeFET-based NV latch by adding a few transistors to the conventional CMOS-based latch, as shown in Fig. 2(a) with its function table in Fig. 2(b). The circuit operates as a normal latch when power is on, yet retains the state in the FeFETs if the supply is removed.

Though FeFETs can be used as a drop-in replacement for MOSFETs in arithmetic/logic circuits, doing so is generally not beneficial in terms of performance, energy and area. Careful exploitation of the FeFET's unique properties can offer advantages that are costly to obtain with MOSFETs alone. In [5], [14], the authors introduce logic-in-memory (LiM) based designs of Boolean logic gates as well as FAs following the dynamic logic (DL) and dynamic current mode logic (DyCML) styles, as shown in Fig. 3. This work demonstrates the advantages of FeFETs in building NV logic circuits over other emerging devices such as MTJ and FTJ.

## III. FeFET-based Multiplier

This section discusses the details of our FeFET-based multiplier design. We first present the overall structure of the multiplier, and then elaborate on the design of the key components.

## A. Overview of Multiplier Architecture

An appropriate multiplier structure is crucial to satisfy the demands of the circuit-level design of NVPs. Many research efforts [24], [25], [26] have been devoted to improving the performance and power efficiency of CMOS-based multipliers. Parallel multipliers with tree structures [24] are widely used in high-speed applications but incur substantial area overhead. A radix multiplier [25] consumes less area but at the cost of higher power consumption. Unlike these two, sequential multipliers [26], e.g., the shift-and-add multiplier, target lower area and energy while operating at moderate speed. The shift registers in a sequential multiplier can be implemented with NV devices to achieve low-cost or even implicit backup operation. Thus, we propose a new FeFET-based NV multiplier using the sequential architecture and exploit the unique features of FeFETs to achieve lower area overhead and higher energy efficiency than traditional CMOS-based designs.

Fig. 3. Schematics of FeFET-based FA using (a) DL; (b) DyCML styles.

Fig. 4. Overview of the proposed FeFET-based multiplier.

The proposed sequential multiplier design aims to meet the following requirements: (i) ensuring non-volatility throughout the entire design, and (ii) optimizing for energy efficiency and design compactness. To support continued operation under intermittent power supply without the need for explicit backup, we leverage the NV property of FeFETs to design an NV adder and an NV latch. By utilizing the adder and the latch, the proposed multiplier realizes the accumulation and shift operations with fewer transistors than a CMOS-based design. The details of the accumulation and the shift operations are discussed in Sec. III-B and Sec. III-C.

Fig. 4 depicts the architecture of the proposed multiplier and its data flow for the case $N=M=4$. For an $(N \times M)$-bit multiplication, one operand ($N$-bit) is fed in parallel while the other one ($M$-bit) is fed sequentially. Thus we need $N$ AND gates, one $N$-bit adder ($N-1$ one-bit FAs and one one-bit half adder (HA)), and a shift unit to complete the operation. The operation has three stages: partial product (PP) generation, PP accumulation, and intermediate sum (IS) shift. According to the data flow, at a particular clock cycle $i$:

- PP Generation stage relies on the AND gates to produce the partial product of $X$ and $y_{i}$ (where $X$ represents the multiplicand and $y_{i}$ is the $i$-th bit of the multiplier $Y$);
- PP Accumulation stage then uses the $N$-bit PP as one operand of an $N$-bit FeFET-based FA and adds it to the intermediate sum preserved from the previous cycle. When the clock is high, the new ($N+1$)-bit intermediate sum is generated;
- IS Shift stage shifts the sum of each 1-bit FA to the $B$ input of the adder on its right, while the $B$ input of the leftmost (most significant bit) FA gets its own carry-out. The sum of the rightmost (least significant bit) HA is shifted to the register. The new $B$ values will be one of the operands in the next cycle.

Fig. 5. Operation of FeFET-based multiplier.

Fig. 6. Simulation waveforms of the proposed FeFET-based multiplier. The top five bits are shown in the waveforms.

The complete operation flow for the procedure above is also illustrated in Fig. 5. The circuit is initialized by the $RST$ signal. When $RST$ is asserted to '1' and $CLK$ is low, logic '0' is written into the FeFETs in the FAs. When $CLK$ goes high, the FAs are in the evaluation stage and the multiplier conducts accumulation. After that, when $CLK$ is low again, the multiplier shifts the PP and repeats the aforementioned procedure. Fig. 6 shows the simulation waveforms of the proposed multiplier design with a $(4 \times 4)$-bit example. Here $S_{k}$ denotes the sum $S$ of the $k$-th adder, and $C_{o, IV}$ denotes $C_{o}$ of FA-IV, as shown in Fig. 4. Two input cases are shown in Fig. 6: one with more non-zero carry-ins $(111 \times 1111)$ and the other with very few carry-ins $(1010 \times 0101)$. It can be readily seen that the multiplier operates correctly. In the next subsection, we present the details of the key components in the architecture.
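The three stages can be checked at the bit level with a short behavioral sketch. It models only the arithmetic data flow of the shift-and-add scheme, not the circuit or its timing; the function name `shift_and_add` and the variable names `b` and `low_bits` are hypothetical labels for the adder's $B$ operand and the register contents.

```python
def shift_and_add(x: int, y: int, n: int = 4, m: int = 4) -> int:
    """Multiply an n-bit x by an m-bit y using the three-stage loop."""
    assert 0 <= x < (1 << n) and 0 <= y < (1 << m)
    b = 0          # intermediate-sum operand held at the adder's B inputs
    low_bits = 0   # product bits already shifted out into the register
    for i in range(m):
        y_i = (y >> i) & 1
        pp = x if y_i else 0          # PP generation: N AND gates
        s = b + pp                    # PP accumulation: (N+1)-bit sum
        low_bits |= (s & 1) << i      # IS shift: the LSB goes to the register
        b = s >> 1                    # remaining N bits move one place right
    return (b << m) | low_bits


print(shift_and_add(0b1010, 0b0101))  # 50
```

Because the sum is shifted right after every cycle, `b` never exceeds $N$ bits, which is why a single $N$-bit adder suffices for an $(N \times M)$-bit product.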


## B. Circuits for Partial Product Accumulation

This subsection discusses the detailed circuit design for the PP accumulation stage, which relies on a FeFET-based FA for accumulation and a write module for frequent writing. A reset module is also introduced to reset the FeFETs at the beginning of computing.

1) FeFET-based NV-FA: References [5], [14] introduce two FeFET-based FA designs for compute-in-memory operations, as shown in Fig. 3. However, directly adopting the existing FA designs is not feasible for three reasons: (i) Different FA designs affect the performance of the proposed multiplier differently and require detailed comparison. (ii) The topology in Fig. 3(b) may induce a short circuit during writing and incur substantial leakage power, as shown in Fig. 7(a). (iii) In the multiplier design, both operands of the adder may change from one cycle to the next, but in the previous work the data is written into the FeFETs in advance, with no support for writing the FeFETs during operation. Thus, we need a robust scheme to write FeFETs during operation.

Regarding the first issue, for the sequential multiplier design, the clock speed is determined by the latency of the combinational logic as well as the time to write the FeFETs. The FA design can therefore strongly affect the performance of the proposed multiplier. Due to the FeFET's physical characteristics, substantial time is needed to write data into the FeFETs (500 ps to completely change the polarization state of the FE layer for the model used in this work and [14]). Therefore, the DL adder cannot stabilize its output signal until the FeFETs' polarization state is fully changed, which penalizes circuit performance. Fig. 7(b) depicts the simulated waveform for the DL FA in [5], where the third waveform shows the DL FA output when the clock cycle is shorter than required, resulting in incomplete polarization and an incorrect result. To achieve fast operation and energy efficiency, we choose the DyCML adder with two complementary branches and redesign it to address the aforementioned concerns.

Fig. 8 depicts the proposed FeFET-based DyCML NV-FA. Similar to the original DyCML design in Fig. 3, it consists of a clocked pull-up network, a logic network for the accumulation function and a pair of FeFETs distributed in the pull-down network for preserving complementary input data. The NV-FA operates in two phases: precharge and evaluation. In the precharge phase, $CLK$ is pulled low and the outputs are pulled up to $V_{dd}$. In the evaluation phase, with $CLK$ set to high, the branch with lower resistance is quickly pulled down to ground, thereby pulling the other branch up to logic '1'.
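The two-phase behavior can be summarized with a purely logical sketch of a one-bit adder with complementary outputs. It ignores all analog effects (branch resistance, timing, FeFET write time); the function name `dycml_fa` and the assumption that the carry output is also produced as a complementary pair are illustrative choices, not details taken from Fig. 8.

```python
def dycml_fa(clk: int, a: int, b: int, cin: int):
    """Return (S, S_b, Co, Co_b) for one clock phase of a behavioral FA."""
    if clk == 0:
        # Precharge: every output node is pulled up to Vdd.
        return (1, 1, 1, 1)
    # Evaluation: in each output pair one branch discharges to ground
    # and the other is held at logic '1', so the pair is complementary.
    s = a ^ b ^ cin
    co = (a & b) | (cin & (a ^ b))
    return (s, 1 - s, co, 1 - co)


print(dycml_fa(0, 1, 1, 0))  # (1, 1, 1, 1)
print(dycml_fa(1, 1, 1, 0))  # (0, 1, 1, 0)
```

The all-ones precharge output is what a downstream stage sees between evaluations, which matters for any circuit that senses $S$ and $S\_b$ directly.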

Compared with the DL-based FA in Fig. 3(a), the proposed DyCML FA stabilizes its output much faster by differentiating the signal strengths of the two branches. The bottom waveforms of Fig. 7(b) demonstrate the faster polarization of the proposed NV-FA. Since the FeFETs are written during the precharge phase, the four clocked transistors (the transistors in red on top of the FeFETs in Fig. 8) are inserted into the pull-down path to mitigate the short-circuit current. Fig. 7(a) shows the reduction in leakage for the proposed design.

2) Write Module: The redesigned NV-FA only partially solves the aforementioned problem of writing FeFETs. To ensure fast and stable write-in, we propose an additional write module for the NV-FA and combine the write module with the pull-down path to support frequent writing of the FeFETs. The write circuit, highlighted by the blue dotted box in Fig. 8, consists of two NOR gates. The inputs are a pair of complementary values and the clock signal. The outputs are connected to the gate and source terminals of each FeFET to write either the value itself or its complement. Table I shows the function of the circuit. When $CLK$ is '0' with the NV-FA in the precharge phase, the write circuit outputs a pair of complementary values to write the FeFETs. When $CLK$ is '1' with the NV-FA in the evaluation stage, both outputs are pulled down to ground. Such a write module is crucial to our multiplier design, which needs to write the FeFETs frequently. On the one hand, the design keeps the $V_{gs}$ of the FeFETs at '0', which ensures that the FeFETs retain their stored data. On the other hand, it serves as a part of the pull-down path during the evaluation stage.

Fig. 7. (a) Leakage current of the FeFET-based DyCML FA [5] (left) and the proposed NV-FA (right); (b) Waveforms for FeFET-based DL FA (middle) [5] and the proposed NV-FA (bottom) with incomplete polarization.

TABLE I
Function table of the proposed write module

| Input | Input_b | $CLK$ | $Y$ | $Y\_b$ | Operation | FA status |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 1 | 0 | 0 | 1 | Write '0' | Precharge |
| 0 | 1 | 1 | 0 | 0 | Retain | Evaluation |
| 1 | 0 | 0 | 1 | 0 | Write '1' | Precharge |
| 1 | 0 | 1 | 0 | 0 | Retain | Evaluation |
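One wiring of the two NOR gates that reproduces Table I can be checked in a few lines. The text states only that the module consists of two NOR gates fed by the complementary inputs and the clock, so the particular pairing below (`Y = NOR(Input_b, CLK)`, `Y_b = NOR(Input, CLK)`) is an assumption chosen to match the table, and `nor` and `write_module` are hypothetical helper names.

```python
def nor(p: int, q: int) -> int:
    """Two-input NOR on logic values 0/1."""
    return 0 if (p or q) else 1


def write_module(inp: int, inp_b: int, clk: int):
    """Return (Y, Y_b) for the assumed NOR wiring."""
    y = nor(inp_b, clk)    # follows Input while CLK = 0, forced to 0 when CLK = 1
    y_b = nor(inp, clk)    # follows Input_b while CLK = 0, forced to 0 when CLK = 1
    return y, y_b


# (Input, Input_b, CLK) -> (Y, Y_b), copied from Table I
TABLE_I = {
    (0, 1, 0): (0, 1),   # write '0' during precharge
    (0, 1, 1): (0, 0),   # retain during evaluation
    (1, 0, 0): (1, 0),   # write '1' during precharge
    (1, 0, 1): (0, 0),   # retain during evaluation
}
matches = all(write_module(*k) == v for k, v in TABLE_I.items())
print(matches)  # True
```

With both outputs at 0 during evaluation, the gate and source of each FeFET sit at the same potential, which is the retain condition described above.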

3) Reset Module: Due to the NV property of FeFETs, the devices preserve the last value written after each multiplication. It is therefore necessary to initialize the FeFETs before a new multiplication starts. We add a reset module in series with the write module. It is a NOR gate consisting of four transistors, controlled by the reset signal, $RST$. The module outputs logic '0' at the beginning of a multiplication.

## C. Circuit for IS Shift

As alluded to above, the latch in the proposed multiplier replaces the shift register in the traditional CMOS-based design to assist the shift operation. A CMOS-based latch can save the data temporarily but loses the data without a power supply. Though several FeFET-based NV latch designs exist, they are intended for general-purpose use. In this work, we introduce a special FeFET-based latch design that takes advantage of the hysteresis characteristic of the FeFET and the output scheme of the DyCML NV-FA. The new latch not only preserves the data in an NV fashion but also requires fewer transistors (only six transistors compared to twelve transistors in the latch design of [13]).

The schematic of our proposed latch is shown in Fig. 9(a). The module consists of an n-type MOSFET and an n-type FeFET. The gate and source terminals of the FeFET are connected to the inverted $S\_b$ and the inverted $S$ outputs of the DyCML NV-FA, respectively. The FeFET is connected in series with the MOSFET to drive the output node to the appropriate value depending on the data stored in the FeFET. Recall that the state of the FeFET is controlled by $V_{gs}$, which in this case is determined by the relative values of $S$ and $S\_b$.

Fig. 8. Schematic of the redesigned FeFET-based NV-FA.

| $S\_b$ | $S$ | Status | $S\_out$ |
| :---: | :---: | :---: | :---: |
| 0 | 1 | Positive polarization | 0 |
| 1 | 1 | Retain (Positive) | 0 |
| 1 | 0 | Negative polarization | 1 |
| 1 | 1 | Retain (Negative) | 1 |

Fig. 9. (a) Schematic of the proposed latch and (b) its function table.

In the evaluation phase, $S$ and $S\_b$ are complementary. Depending on the actual sum value, the following scenarios can occur. (i) If $S=1$, the gate voltage of the FeFET ($V_{g}$) is driven to $V_{dd}$ and the source voltage is '0'. Hence, the polarization state of the FeFET is positive, which puts the FeFET in the low resistance state. (ii) Conversely, if $S=0$, the FeFET goes into the high resistance state. (iii) When the adder goes into the precharge phase, both $S$ and $S\_b$ are pulled up to $V_{dd}$. Thus, the FeFET retains its polarization because $V_{gs}$ is '0'. If $S=1$ in the previous stage and the FeFET is in the low resistance state, $S\_out$ is pulled down to $GND$. If the FeFET is in the high resistance state, the node is driven close to $V_{dd}$ because the high resistance of the FeFET exceeds the resistance of the turned-off NMOS transistor. Thus, the data stored in the FeFET can be sensed at the output node.
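The three scenarios amount to a small state machine, sketched below and checked against the function table in Fig. 9(b). The class name `ToyNVLatch`, the Boolean `low_resistance` flag and its initial value are assumptions made for illustration; the sketch captures only the logical behavior, not the resistive divider that produces $S\_out$.

```python
class ToyNVLatch:
    """Behavioral model of the latch: the FeFET state decides S_out."""

    def __init__(self) -> None:
        self.low_resistance = False   # assumed initial polarization (negative)

    def step(self, s: int, s_b: int) -> int:
        """Apply the adder outputs (S, S_b) and return S_out."""
        if (s, s_b) == (1, 0):
            self.low_resistance = True     # scenario (i): positive polarization
        elif (s, s_b) == (0, 1):
            self.low_resistance = False    # scenario (ii): negative polarization
        # (1, 1) is the precharge phase: V_gs = 0, so the state is retained.
        return 0 if self.low_resistance else 1


latch = ToyNVLatch()
outs = [latch.step(1, 0), latch.step(1, 1), latch.step(0, 1), latch.step(1, 1)]
print(outs)  # [0, 0, 1, 1]
```

The four calls follow the four rows of the function table in order, so the printed list is its $S\_out$ column.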

Note that the NMOS transistor in the proposed latch is used to sense the data stored in the FeFET, with its gate terminal connected to the $CRL$ signal. In normal operation, $CRL$ is '0', which turns the NMOS off and ensures a large resistance between the power supply and the FeFET to avoid large leakage current during sensing. Upon power restoration, a pulse is applied to $CRL$ to recover the data stored in the FeFET. As for the two inverters in Fig. 9(a), during sensing, the inverted signal of $S$ is '0', providing a path to ground to ensure the latch functionality. The inverters also work as buffers to drive the latch and isolate the possible leakage current between the NV-FA and the proposed latch.

The simulation waveforms in Fig. 10 illustrate the operation of the proposed latch under both conditions, i.e., normal operation and sudden power failure. During normal operation, the circuit behaves as a typical latch. However, if there is a power failure, indicated by the shaded regions in Fig. 10, the output drops while the polarization of the FeFET is retained. As demonstrated in the waveforms, the output is recovered when power comes back.

Compared with traditional CMOS-based designs, such as the transmission-gate-based latch, the proposed latch has lower clock loads and provides non-volatility. Compared with the FeFET-based latch in [13], which also supports NV computing, our design has only half the number of transistors (6 vs. 12) and saves almost $40\%$ energy. When it is used as the NV latch in the proposed 4-bit multiplier, the average power is reduced from $36.50~\mu\mathrm{W}$ to $20.23~\mu\mathrm{W}$.


Fig. 10. Simulation waveforms of the proposed latch. The waveform of 'P' indicates "polarization" of the FeFET in the latch.

## D. Non-volatility

We now discuss the circuit-level NV support of the multiplier under intermittent power. As described in Sec. III-A, the unique structure and operation allow the intermediate values to be stored in the FeFETs in an NV fashion, thanks to the FeFET-based NV adders and NV latches. Specifically, when $CLK$ is high, the multiplier performs the accumulation operation. If the supply is removed at this time, the intermediate sum of the previous cycle is guaranteed to be preserved in the FeFETs of the adder. If the power failure happens when $CLK$ is low, the NV latches store the intermediate sum. Therefore, the design ensures that the temporary data is never lost regardless of when a power outage occurs. The NV multiplier can thus work with an unreliable power supply, preserving its state upon power failure and resuming the operation from where it stopped.
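The resume-from-where-it-stopped property can be illustrated by extending the shift-and-add data flow with an explicit state object that outlives a simulated outage. This is a sketch under stated assumptions: `NVMultiplierState`, `run` and `fail_at` are hypothetical names, and the sketch assumes that the register bits, the cycle index and the operands are also available after power returns, which the text above does not state (it only guarantees the intermediate sum).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NVMultiplierState:
    b: int = 0          # intermediate sum (adder FeFETs / NV latches)
    low_bits: int = 0   # product bits already shifted out (assumed preserved)
    cycle: int = 0      # next multiplier bit to process (assumed preserved)


def run(x: int, y: int, m: int, state: NVMultiplierState,
        fail_at: Optional[int] = None) -> Optional[int]:
    """Advance until done, or stop at the hypothetical outage cycle."""
    while state.cycle < m:
        if state.cycle == fail_at:
            return None                      # power lost; `state` survives
        y_i = (y >> state.cycle) & 1
        s = state.b + (x if y_i else 0)      # accumulation (CLK high)
        state.low_bits |= (s & 1) << state.cycle
        state.b = s >> 1                     # shift (CLK low)
        state.cycle += 1
    return (state.b << m) | state.low_bits


state = NVMultiplierState()
first = run(0b1010, 0b0101, 4, state, fail_at=2)   # outage before cycle 2
resumed = run(0b1010, 0b0101, 4, state)            # power restored
print(first, resumed)  # None 50
```

No cycle is repeated after the outage: the second call starts at cycle 2 with the intermediate sum left by cycle 1, which is the zero-backup-overhead behavior the design targets.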

## IV. Experimental Results

In this section, we investigate the performance and energy efficiency of the proposed FeFET-based multiplier design and compare it with two multipliers with different design preferences: (i) a CMOS sequential multiplier using dynamic current mode logic (SM-DyCML); (ii) a CMOS array (parallel) multiplier using static complementary logic (AM-SCL), which is a widely used array multiplier [25]. Both 4-bit and 8-bit implementations are considered. We use the 45 nm ASU PTMs [17] for both CMOS and FeFET-based designs with a 1 V nominal supply. The device parameters for the FeFET are the same as in Fig. 1(c). The simulations were conducted with HSPICE. The metrics for comparison include transistor count, maximum frequency, time per operation, average power and total energy consumption. Maximum frequency is determined by the critical path length. Time per operation is the total latency for one multiplication. We collect the average power from the SPICE simulation and then calculate the total energy consumed for one multiplication.

Table II presents comparisons for different multiplier architectures and configurations. For a 4-bit multiplier, the proposed multiplier reduces transistor count by $21\%$ and $32\%$, and consumes $32\%$ and $15\%$ less energy compared with CMOS SM-DyCML and AM-SCL, respectively. For the 8-bit case, the proposed design obtains $19\%$ and $73\%$ savings in transistor count, and $23\%$ and $46\%$ savings in energy consumption, respectively.

TABLE II
COMPARISON OF THE PROPOSED MULTIPLIER WITH CMOS BASED SM-DYCML AND AM-SCL MULTIPLIERS

| Device | # Bits | Architecture | # Transistors | Max. freq. (MHz) | Time per op. (ns) | $P_{\text{total}}$ ($\mu$W) | Energy (fJ) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| FeFET | 4 | SM-DyCML | 222 MOS + 20 FeFETs | 3300 | 1.2 | 20.23 | 24.28 |
| CMOS | 4 | SM-DyCML | 306 MOS | 3300 | 1.2 | 29.49 | 35.39 |
| CMOS | 4 | AM-SCL | 356 MOS | 2000 | 0.5 | 56.85 | 28.43 |
| FeFET | 8 | SM-DyCML | 458 MOS + 40 FeFETs | 3300 | 2.4 | 39.43 | 94.63 |
| CMOS | 8 | SM-DyCML | 610 MOS | 3300 | 2.4 | 50.60 | 121.44 |
| CMOS | 8 | AM-SCL | 1840 MOS | 500 | 2.0 | 86.67 | 173.34 |

Comparing the proposed FeFET-based multiplier with the CMOS-based SM-DyCML, one can see that the proposed multiplier reduces both transistor count and energy consumption. The advantages stem from the fact that FeFETs are used as both storage elements and switches in the proposed design, so the reset module and the FeFET-based shift module employ fewer transistors than those in the classic CMOS-based sequential multiplier.

We also compare our proposed design with the most commonly used CMOS-based AM-SCL. As shown in Table II, the proposed design has a smaller transistor count and lower energy consumption. It can be concluded that the sequential structure naturally outperforms the array multiplier in terms of transistor count and energy consumption, although it is much slower. Since the EH scenario is less delay sensitive but requires higher energy efficiency, the results further support our choice of the sequential structure rather than the array structure in the EH scenario.
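The energy column of Table II follows from the other two measured columns, since average power in $\mu$W multiplied by time per operation in ns gives energy in fJ. The sketch below recomputes the energies and the relative savings from the table entries; the row labels and the helper names `energy_fj` and `saving` are hypothetical, and transistor counts for the FeFET designs are taken as MOS plus FeFET devices.

```python
# (label, device count, time per op. in ns, average power in uW) from Table II
ROWS = [
    ("FeFET-4",   222 + 20, 1.2, 20.23),
    ("CMOS-SM-4", 306,      1.2, 29.49),
    ("CMOS-AM-4", 356,      0.5, 56.85),
    ("FeFET-8",   458 + 40, 2.4, 39.43),
    ("CMOS-SM-8", 610,      2.4, 50.60),
    ("CMOS-AM-8", 1840,     2.0, 86.67),
]


def energy_fj(power_uw: float, time_ns: float) -> float:
    """Energy per multiplication: uW * ns = fJ."""
    return power_uw * time_ns


def saving(ours: float, baseline: float) -> float:
    """Fractional reduction of `ours` relative to `baseline`."""
    return 1.0 - ours / baseline


energies = {label: energy_fj(p, t) for label, _, t, p in ROWS}
counts = {label: n for label, n, _, _ in ROWS}

for bits in ("4", "8"):
    for base in ("CMOS-SM-", "CMOS-AM-"):
        ours, ref = "FeFET-" + bits, base + bits
        print(ref,
              f"devices: {saving(counts[ours], counts[ref]):.1%}",
              f"energy: {saving(energies[ours], energies[ref]):.1%}")
```

The recomputed energies reproduce the last column of Table II to two decimals, and the recomputed savings agree with the percentages quoted in the text to within about one percentage point.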


Fig. 11. Energy consumption and average power of the proposed multiplier at different clock frequency for (a) 4-bit (b) 8-bit.

As discussed in Sec. III-B, the use of the DyCML logic style helps speed up execution. It is valuable to explore the relationship between energy consumption, power and operating frequency, as shown in Fig. 11(a) and (b) for 4-bit and 8-bit, respectively. The average power increases with the clock frequency, while the total energy consumption decreases. Though dynamic power is the main contributor to the total energy, we notice that static power consumption is not negligible during operation. It is therefore essential to reduce the cycle time for energy efficiency.
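The two opposite trends are consistent with a simple first-order model, given here as an illustration rather than as a fit to Fig. 11. It assumes that the dynamic energy per multiplication, $E_{dyn}$, does not depend on the clock frequency $f$, that the static power $P_{static}$ is constant, and that one multiplication takes $M$ clock cycles; these symbols are introduced only for this sketch.

$$
E_{op} \approx E_{dyn} + P_{static}\,\frac{M}{f},
\qquad
P_{avg} = \frac{E_{op}}{M/f} \approx \frac{E_{dyn}\,f}{M} + P_{static}
$$

Under these assumptions, raising $f$ increases $P_{avg}$ through the first term while shrinking the static contribution $P_{static}\,M/f$ to $E_{op}$, which is why a shorter cycle time improves energy per operation.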

## V. Conclusions

In view of EH systems with restricted energy, area as well as intermittent power supply, we propose an energy-efficient FeFET-based NV multiplier in this paper. The FeFET device with the hysteretic characteristic is exploited to construct key components in the multiplier to achieve a more compact structure and support NV computational operations at the circuit level. The proposed design consumes less area and energy than conventional sequential and array multipliers.

## Acknowledgement

This work was partially supported by NSFC with Grant No. 61974133, 61601406, Guangdong Province with Grant No. 2018B030338001, and Asian Research Grant from the University of Notre Dame.

## REFERENCES

[1] C. Zhuo, et al. Noise-aware dvfs for efficient transitions on battery-powered iot devices. IEEE TCAD, 2019.
[2] C. Zhuo, et al. From layout to system: Early stage power delivery and architecture co-exploration. IEEE TCAD, 2018.
[3] K. Ma, et al. Architecture exploration for ambient energy harvesting nonvolatile processors. In Proc. HPCA, pages 526-537, 2015.
[4] Y. Liu, et al. Ambient energy harvesting nonvolatile processors: From circuit to system. In Proc. DAC, pages 1-6, 2015.
[5] X. Yin, et al. Exploiting ferroelectric fets for low-power non-volatile logic-in-memory circuits. In Proc. ICCAD, pages 1-8, 2016.
[6] E. Deng, et al. High-frequency low-power magnetic full-adder based on magnetic tunnel junction with spin-hall assistance. IEEE Transactions on Magnetics, 51:1-4, 2015.
[7] E. Deng, et al. Low power magnetic full-adder based on spin transfer torque mram. IEEE Transactions on Magnetics, 49:4982-4987, 2013.
[8] S. Salahuddin, et al. Use of negative capacitance to provide voltage amplification for low power nanoscale devices. Nano Letters, 8:405-410, 2008.
[9] A. Aziz, et al. Computing with ferroelectric fets: Devices, models, systems, and applications. In Proc. DATE, pages 1289-1298. IEEE, 2018.
[10] M. Trentzsch, et al. A 28 nm hkmg super low power embedded nvm technology based on ferroelectric fets. In Proc. IEDM, pages 1-4, 2016.
[11] S. George, et al. Nonvolatile memory design based on ferroelectric fets. In Proc. DAC, pages 1-6, 2016.
[12] T. K. Song. Lk simulations for ferroelectric switching in ferroelectric random access memory application. JKPS, 46:5-9, 2005.
[13] X. Li, et al. Advancing nonvolatile computing with nonvolatile ncfet latches and flip-flops. IEEE TCAS-I, 64(11):2907-2919, 2017.
[14] X. Yin, et al. Ferroelectric fets-based nonvolatile logic-in-memory circuits. IEEE TVLSI, 27(1):159-172, 2018.
[15] X. Chen, et al. Design and optimization of fefet-based crossbars for binary convolution neural networks. In Proc. DATE, pages 1205-1210. IEEE, 2018.
[16] D. Reis, et al. Computing in memory with fefets. In Proc. ISLPED.
[17] R. Vattikonda, et al. Modeling and minimization of pmos nbti effect for robust nanometer design. In Proc. DAC, pages 1047-1052, 2006.
[18] K. Ni, et al. Write disturb in ferroelectric fets and its implication for 1t-fefet and memory arrays. IEEE EDL, 39(11):1656-1659, 2018.
[19] D. Reis, et al. Design and analysis of an ultra-dense, low-leakage and fast fefet-based random access memory array. IEEE JXCDC, 2019.
[20] X. Li, et al. Design of 2t/cell and 3t/cell nonvolatile memories with emerging ferroelectric fets. IEEE Design & Test, 36:39-45, 2019.
[21] J. Wu, et al. A 3t/cell practical embedded nonvolatile memory supporting symmetric read and write access based on ferroelectric fets. In Proc. DAC, page 82, 2019.
[22] X. Li, et al. Lowering area overheads for fefet-based energy-efficient nonvolatile flip-flops. IEEE Transactions on Electron Devices, 65(6):2670-2674, 2018.
[23] X. Yin, et al. An ultra-dense 2 fefet tcam design based on a multi-domain fefet model. IEEE TCAS II: Express Briefs, 2018.
[24] N. Okubo. A 4.4-ns cmos $54 \times 54$-b multiplier using pass-transistor multiplexer. Proc. CICC, pages 559-602, 1994.
[25] I. S. Abu-Khater, et al. Circuit techniques for cmos low-power high-performance multipliers. IEEE JSSC, 31(10):1535-1546, 1996.
[26] Gnanasekaran. On a bit-serial input and bit-serial output multiplier. IEEE TC, C-32:878-880, 1983.


[^0]:    $^{1}$ The work was done while the author was at Zhejiang University.

