

# Design and Implementation of Low Power Consumption 32 Bit ALU using FPGA

# Liril George<sup>1</sup> and Padmaja Bangde<sup>2</sup>

<sup>1</sup>Liril George, Electronics, PIIT, Mumbai, New Panvel India.

<sup>1</sup>liril\_cj@yahoo.co.in

<sup>2</sup>Padmaja Bangde, Electronics, PIIT, Mumbai, New Panvel India.

<sup>2</sup>pbangde@mes.ac.in

#### **ABSTRACT**

An arithmetic logic unit (ALU) is a major component of the central processing unit (CPU) of a computer system. It performs all the arithmetic and logic operations that need to be done on instruction words. As the operations become more complex, the ALU also becomes more complex, more expensive and takes up more space in the CPU; hence power consumption is a major issue. In this paper a 32 bit ALU is designed using VHDL. Lower power consumption is achieved using clock gating, and the results are compared with a 32 bit ALU without clock gating. The design is then implemented in a Xilinx Spartan 3E FPGA.

Keywords: ALU, Clock Gating, FPGA, Spartan 3E, CMOS, VHDL.

### 1. INTRODUCTION

With advances in technology, the number of transistors on a single CPU has increased. Integrating more transistors for greater computing power also affects power consumption, because adding more and more transistors increases the heat dissipated in the device [3]. Since most portable devices are battery driven, their power consumption must be low to improve battery life and reliability. For these reasons power management has become an important design constraint for most computationally intensive and sophisticated applications. The ALU is one of the most important units in a microprocessor and performs most of the computational operations in a CPU; hence power consumption is an important issue in an ALU.

The two main types of power dissipation occurring in a CMOS circuit are static power and dynamic power.

Static power is caused by leakage current. Dynamic power is caused by the charging and discharging of capacitance, that is, by the switching activity of the circuit. Dynamic power is given by the equation $P = n f C_L V_{dd}^2$, where $f$ is the switching frequency, $C_L$ is the load capacitance, $V_{dd}$ is the supply voltage and $n$ is the probability of switching [1]. Dynamic switching power is dissipated every time the logic state of a gate changes.
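The dynamic power relation can be explored numerically. The following is a minimal sketch; the operating point (1 MHz, 10 pF, 1.2 V) and the switching probabilities are hypothetical values chosen only for illustration and are not measurements from this design.

```python
def dynamic_power(n, f, c_load, vdd):
    """Dynamic power P = n * f * C_L * Vdd^2, in watts."""
    return n * f * c_load * vdd ** 2


# Hypothetical node: 1 MHz clock, 10 pF load, 1.2 V supply (assumed values).
always_clocked = dynamic_power(n=1.0, f=1e6, c_load=10e-12, vdd=1.2)

# The same node when it switches in only a quarter of the cycles.
rarely_switching = dynamic_power(n=0.25, f=1e6, c_load=10e-12, vdd=1.2)

print(f"always clocked:   {always_clocked:.3e} W")
print(f"rarely switching: {rarely_switching:.3e} W")
```

Because the power is linear in the switching probability, reducing how often a node toggles reduces its dynamic power in the same proportion.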

# 2. CLOCK GATING

Clock power is a major component of dynamic power dissipation [2]. In a synchronous circuit several modules are clocked at the same time. However, at any particular instant only a single module may be functional. Hence the unnecessary clocking of the other modules leads to a lot of power dissipation. In clock gating, the clock is selectively stopped for any portion of the circuit that is not performing active computation, which avoids the unnecessary charging and discharging of the unused circuitry. There are various clock gating techniques, such as gate based, flip-flop based and latch based clock gating [4].

### 3. 32 BIT ALU

The 32 bit ALU consists of an arithmetic unit, a logic unit, a shift unit, a clock gating unit and an output multiplexer.

#### 3.1. 32 Bit Arithmetic Unit

The arithmetic unit performs 7 operations: addition, addition with carry, subtraction, subtraction with borrow, increment, decrement and transfer. The circuit consists of a 32 bit parallel adder and thirty-two single-bit 4:1 multiplexers. A and B are the 32 bit inputs and the output is a 33 bit result. There are two common selection lines, S0 and S1; C\_in is the carry input of the parallel adder and Cout is the carry out.

| S1 | S0 | C_in | Result      | Operation               |
|----|----|------|-------------|-------------------------|
| 0  | 0  | 0    | A+B         | Addition                |
| 0  | 0  | 1    | A+B+1       | Addition with carry     |
| 0  | 1  | 0    | A+(not B)   | Subtraction with borrow |
| 0  | 1  | 1    | A+(not B)+1 | Subtraction             |
| 1  | 0  | 0    | A-1         | Decrement               |
| 1  | 0  | 1    | A           | Transfer                |
| 1  | 1  | 0    | A           | Transfer                |
| 1  | 1  | 1    | A+1         | Increment               |

Table-1: Operation performed on Arithmetic Unit
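The select encoding of Table-1 can be checked with a small behavioural model. This is a sketch in Python rather than the VHDL of the actual design; the function name `arithmetic_unit` and the assignment of the constant multiplexer inputs (all ones for S1 S0 = 10, all zeros for S1 S0 = 11) are assumptions inferred from the Result column. Note that in two's complement A+(not B)+1 equals A-B, and A+(not B) equals A-B-1.

```python
MASK32 = 0xFFFFFFFF  # 32 bit operand width


def arithmetic_unit(a, b, s1, s0, c_in):
    """Return (result, c_out) for the select encoding of Table-1."""
    a &= MASK32
    b &= MASK32
    # 4:1 multiplexer choosing the second adder operand Y (assumed wiring).
    y = {
        (0, 0): b,              # B
        (0, 1): ~b & MASK32,    # not B
        (1, 0): MASK32,         # all ones, i.e. -1 in two's complement
        (1, 1): 0,              # all zeros
    }[(s1, s0)]
    total = a + y + c_in        # 33 bit sum of the parallel adder
    return total & MASK32, total >> 32


# Walk through all eight rows of Table-1 for A = 7, B = 5.
for s1, s0, c_in in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
                     (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    result, c_out = arithmetic_unit(7, 5, s1, s0, c_in)
    print(s1, s0, c_in, result, c_out)
```

The carry out is kept separate from the 32 bit result so that the pair corresponds to the 33 bit output described above.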



Fig-1: 32 Bit Arithmetic Unit

# 3.2. 32 Bit Logic Unit

The logic unit performs the following operations: logical AND, logical OR, logical XOR and logical NOT (complement). The logic unit consists of four gates and a 4:1 multiplexer. The outputs of the gates are applied to the data inputs of the multiplexer. Using selection lines S0 and S1, one of the data inputs of the multiplexer is selected as the output.

| S0 | S1 | Result | Operation    |
|----|----|--------|--------------|
| 0  | 0  | A.B    | AND          |
| 0  | 1  | A+B    | OR           |
| 1  | 0  | A⊕B    | XOR          |
| 1  | 1  | Not A  | Complement A |

Table-2: Operation performed on Logic Unit
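The gate-plus-multiplexer structure of the logic unit can be sketched as follows. This is an illustrative Python model, not the VHDL source; `logic_unit` is a hypothetical name, and the select pair is passed in the column order used in Table-2, which is an assumption about how the select bits map to the operations.

```python
MASK32 = 0xFFFFFFFF  # 32 bit operand width


def logic_unit(a, b, sel):
    """sel is the select pair in Table-2 column order, e.g. (0, 1) for OR."""
    a &= MASK32
    b &= MASK32
    # All four gate outputs are computed; the multiplexer picks one of them.
    gate_outputs = {
        (0, 0): a & b,          # AND
        (0, 1): a | b,          # OR
        (1, 0): a ^ b,          # XOR
        (1, 1): ~a & MASK32,    # complement of A
    }
    return gate_outputs[sel]


print(bin(logic_unit(0b1100, 0b1010, (0, 0))))
print(bin(logic_unit(0b1100, 0b1010, (1, 0))))
```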



Fig-2: 32 Bit Logic Unit

# 3.3. 32 Bit Shift Unit

The shift unit is used to perform logical shift operations. Shifting one bit to the left gives a result equal to the original number multiplied by two; similarly, shifting n times to the left gives the original number multiplied by $2^n$. Shifting one bit to the right gives a result equal to the original number divided by two; similarly, shifting n times to the right gives the original number divided by $2^n$.

| S | Operation     |
|---|---------------|
| 0 | Right Shift A |
| 1 | Left Shift A  |

Table-3: Operation performed on Shift Unit
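The relation between shifting and multiplication or division by two can be illustrated with a short model. It is a sketch that assumes the unit shifts by a single bit per operation and treats A as an unsigned 32 bit value; `shift_unit` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF  # 32 bit operand width


def shift_unit(a, s):
    """One-bit logical shift of A: s = 0 shifts right, s = 1 shifts left."""
    a &= MASK32
    if s == 0:
        return a >> 1            # divide by two, remainder discarded
    return (a << 1) & MASK32     # multiply by two, bit 31 shifted out


print(shift_unit(12, 1))  # left shift of 12
print(shift_unit(12, 0))  # right shift of 12
```

For an odd operand the right shift discards the least significant bit, and a left shift loses bit 31, so the multiply and divide interpretation holds only while the result fits in 32 bits.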



Fig-3: 32 Bit Shift Unit

# 3.4 Clock Gating

In clock gating, the clock is selectively stopped for any portion of the circuit that is not performing active computation. The clock gating circuit takes in the clock input and generates a gated clock based on signals S2 and S3. When control signal S2 is 0 and S3 is 0, the clock is gated through the first AND gate to the arithmetic unit. When S2 is 1 and S3 is 0, the clock is gated through the second AND gate to the logic unit. When S2 is 0 and S3 is 1, the clock is gated through the third AND gate to the shift unit. Table-4 shows signals S2 and S3 activating the different units of the ALU.

| S2 | S3 | Activation      |
|----|----|-----------------|
| 0  | 0  | Arithmetic Unit |
| 1  | 0  | Logic Unit      |
| 0  | 1  | Shift Unit      |

Table-4: Signal S2 and S3 activating different units of ALU
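The decode in Table-4 amounts to three enable terms, each ANDed with the clock. The sketch below models this at the logic level in Python. The output names follow the gated clocks CLK_AU, CLK_LU and CLK_SU used in the paper, while the function name `gated_clocks` is hypothetical. The combination S2 = 1, S3 = 1 is not listed in Table-4, so keeping all three clocks stopped in that case is an assumption of this model.

```python
def gated_clocks(clk, s2, s3):
    """Return (CLK_AU, CLK_LU, CLK_SU) for a clock level and select bits."""
    en_au = (1 - s2) & (1 - s3)   # S2 = 0, S3 = 0: arithmetic unit
    en_lu = s2 & (1 - s3)         # S2 = 1, S3 = 0: logic unit
    en_su = (1 - s2) & s3         # S2 = 0, S3 = 1: shift unit
    # One AND gate per unit passes the clock only when that unit is enabled.
    return clk & en_au, clk & en_lu, clk & en_su


# With the clock high, at most one gated clock is high for any select pair.
for s2, s3 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(s2, s3, gated_clocks(1, s2, s3))
```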



Fig-4: AND Based Clock Gating.

# 3.5 Output Multiplexer

When the gated clock CLK\_AU is active, the computed output of the arithmetic unit is fed into the 4:1 output multiplexer. The same holds for the logic unit, whose gated clock is CLK\_LU, and the shift unit, whose gated clock is CLK\_SU. Thus only one gated clock is active at a time, and the computed outputs of the arithmetic, logic and shift units are fed into the 4:1 output multiplexer. The proper output is selected based on control signals S2 and S3.



Fig-5: 32 Bit ALU with clock gating and output multiplexer
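The selection performed by the output multiplexer can be sketched as below. The model assumes that the multiplexer select lines use the same S2 and S3 encoding as Table-4 and that the unused fourth input is tied to zero; both points, and the name `output_mux`, are assumptions made for illustration.

```python
def output_mux(s2, s3, au_out, lu_out, su_out):
    """4:1 output multiplexer selecting the result of the active unit."""
    inputs = {
        (0, 0): au_out,   # arithmetic unit result
        (1, 0): lu_out,   # logic unit result
        (0, 1): su_out,   # shift unit result
        (1, 1): 0,        # unused fourth input, tied to 0 (assumption)
    }
    return inputs[(s2, s3)]


# Example with placeholder unit outputs 11, 22 and 33.
print(output_mux(1, 0, 11, 22, 33))
```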

# 4. SIMULATIONS AND IMPLEMENTATION

The 32 bit ALU with clock gating is designed in VHDL using the Xilinx ISE 12.4 design suite. Simulation is done using the ISim simulator with a clock period of 1 µs. After the design is synthesized for a Spartan 3E device, it is implemented. The simulated results for the implemented logic are shown below.

#### 4.1 Simulation Waveforms



Fig-5: Simulated Waveform for Arithmetic Unit



Fig-6: Simulated Waveform for Logic Unit



Fig-7: Simulated Waveform for Shift Unit



### 4.2 RTL Schematic



Fig-8: Register Transfer Level of 32 Bit ALU

# 4.3 Technology Schematic

The technology schematic shows the design in terms of the logic elements of the target technology, in this case the FPGA. The circuit is represented using lookup tables, multiplexers and flip-flops [2]. The input and output pins are driven through input/output buffers and the clock is driven through a clock buffer.



Fig-9: Technology Schematic View of 32 Bit ALU

# 4.4 Experimental Results

After synthesis, the design is implemented in the Spartan 3E FPGA, which converts the logical design into a physical file format. Design implementation comprises the following steps:

**Translate** - merges the incoming net lists and constraints into a Xilinx® design file.

**Map** - fits the design into the available resources on the target device, and optionally, places the design.

**Place and Route** - places and routes the design to the timing constraints.

The device summary of the implemented 32 bit ALU with clock gating is shown in Table-5 below.

| Resource                               | Used | Available | Utilization |
|----------------------------------------|------|-----------|-------------|
| No. of Slice Latches                   | 68   | 29,504    | 1%          |
| No. of 4 input LUTs                    | 509  | 29,504    | 1%          |
| No. of occupied Slices                 | 321  | 14,752    | 2%          |
| No. of bonded IOBs                     | 108  | 250       | 43%         |
| No. of BUFGMUXs                        | 2    | 24        | 8%          |
| No. of Slices containing related logic | 321  | 321       | 100%        |

Table-5: Device Summary for the implemented ALU

# 4.5. Power Analysis

Xilinx XPower Analyzer is used to analyze the power consumed by the 32 bit ALU with clock gating and by the 32 bit ALU without clock gating.

|                                 | DYNAMIC POWER (W) | QUIESCENT POWER (W) |
|---------------------------------|-------------------|---------------------|
| 32 BIT ALU WITHOUT CLOCK GATING | 0.038             | 0.204               |
| 32 BIT ALU WITH CLOCK GATING    | 0.028             | 0.204               |

Table-6: Power Consumption of 32 Bit ALU with and without clock gating

# 5. CONCLUSION

The ALU is the most frequently accessed module in a CPU; hence power consumption is a major concern in an ALU. Clock gating is a well-known method to reduce power consumption. It is observed that clock gating reduces dynamic power consumption by 10 percent compared to the power consumed by a 32 bit ALU without clock gating. The designed ALU is capable of performing 7 arithmetic operations, 4 logical operations and 2 shift operations (multiplication and division by two).



# **REFERENCES**

- [1] Neelam R. Prakash, Akash, "Gated Clock Implementation of Arithmetic Logic Unit", Volume 4, Issue 3, ISSN, 2013.
- [2] Ankit Mitra, "Design and implementation of low power 16 bit ALU with clock gating", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 2, Issue 6, June 2013.
- [3] Anju S. Pillai, Isha T. B., "Factors causing power consumption in an Embedded processor", International Journal of Application and Innovation in Engineering and Management (IJAIEM), Volume 2, Issue 7, July 2013.
- [4] Dushyant Kumar Sharma, "Effects of Different Clock Gating Techniques on Design", International Journal of Scientific & Engineering Research, Volume 3, Issue 5, May 2012.
- [5] Bishwajeet Pandey, Manish Pattanaik, "Clock Gating Aware Low Power ALU Design and Implementation on FPGA", International Journal of Future Computer and Communication, Volume 2, No. 5, October 2013.
- [6] Priya Singh, Ravi Goel, "Clock Gating: A Comprehensive Power Optimizations Technique for Sequential Circuits", Volume 2, Issue 2, April-June 2014.
- [7] Douglas L. Perry, "VHDL Programming by Example", 4<sup>th</sup> ed., Tata McGraw-Hill Publishing Company Limited, New Delhi.
- [8] Spartan 3E FPGA User Guide.