

# International Journal of INTELLIGENT SYSTEMS AND APPLICATIONS IN ENGINEERING

ISSN:2147-6799 www.ijisae.org Original Research Paper

# Design and Implementation of Inexact Wallace Tree Multiplier Through Reversible Logic

Nayakanti Raviteja<sup>1</sup>, Durgam Mukesh<sup>2</sup>, Sandakonda Revathi<sup>3</sup>

**Submitted**: 02/06/2024 **Revised**: 20/07/2024 **Accepted**: 03/08/2024

**Abstract:** In this study, an inexact Baugh-Wooley Wallace tree multiplier built around a new inexact 4:2 compressor architecture is proposed, and its realization is optimized using reversible logic. Gate Count (GC), Quantum Cost (QC), Garbage Output (GO), and Ancilla Input (AI) are used to quantify the effectiveness of the reversible-logic realizations of the proposed multiplier and of the Baugh-Wooley Wallace tree multiplier. The implementation uses an $8 \times 8$ Baugh-Wooley Wallace tree multiplier together with the inexact 4:2 compressor. The suggested multiplier has the lowest error metrics (MED and MRED) of any known compressor-based multiplier design. Two applications of the suggested multiplier are considered: 1) image processing, namely one-level decomposition with a rationalized db6 wavelet filter bank and image smoothing; and 2) convolutional neural networks (CNNs). The Structural Similarity Index Measure (SSIM) is used to gauge the effectiveness of the suggested multiplier in image processing applications.

**Keywords:** Wallace Tree Multiplier, Ancilla Input (AI), Convolution Neural Networks (CNN), Structural Similarity Index Measure (SSIM), Ladner-Fischer adder.

<sup>1</sup> UG Scholar, Department of Electronics and Communication Engineering, G. Pullaiah College of Engineering and Technology (Autonomous), Kurnool, Andhra Pradesh, India - 518002

Email: tejha002@gmail.com

<sup>2</sup>UG Scholar, Department of Electronics and Communication Engineering, G. Pullaiah College of Engineering and Technology (Autonomous), Kurnool, Andhra Pradesh, India - 518002

Email: sunnymukesh379@gmail.com

<sup>3</sup>UG Scholar, Department of Electronics and Communication Engineering, G. Pullaiah College of Engineering and Technology (Autonomous), Kurnool, Andhra Pradesh, India - 518002

Email: sandakondarevathi@gmail.com

#### 1. Introduction

Convolution units are found in the majority of signal processing applications, and because they are computationally demanding, they often determine overall performance. Convolution units generally use multipliers and adders, and the multipliers have the major impact on the area, delay, and power of the unit. Real-world computing applications, such as multimedia processing and convolutional neural networks, therefore require high-speed multipliers optimized for area and power. A multiplication process has essentially three stages: partial product generation, partial product accumulation, and final addition. Among compressor topologies, the 4:2 compressor is the most widely used because it yields regularly structured designs more easily than alternatives such as 5:3 or 7:2 compressors. Driven by growing multimedia processing and other error-tolerant applications, multipliers have recently been investigated in the context of approximate computing, and optimized multiplier realizations based on CMOS, FPGA, pass-transistor, and FinFET technologies have been studied.

Akbari suggested four 4:2 compressors that can operate in accurate and approximate modes. Even when approximation is used, the additional hardware required to switch between the accurate and approximate modes causes an area overhead. Esposito presented an XOR-free architecture with a high error rate (ER), which makes it ineffective in image processing applications. Reddy and Edavoor proposed multiplexer-based compressors; these ideas are better suited to the traditional gate level, although they involve many gates and a large area. Strollo suggested a 4:2 compressor design obtained by altering the stacking circuit technique. The aforementioned designs target CMOS-based applications. Power dissipation is becoming more important because of the rapid scaling of electronics, techniques to reduce this power loss are being studied, and new technologies must be investigated to lower it. Reversible circuit and system design is a growing field of study that addresses this problem. Bennett's comparison of reversible and conventional irreversible systems showed that power dissipation can in principle be almost eliminated when a circuit or system is built on a reversible model. Reversible adder circuits have been proposed and evaluated in terms of GO, GC, and QC, and reductions in GO and GC have been reported. Norwin proposed a reversible-logic bidirectional barrel shifter for input widths that are integer powers of two. Khan and Rice presented a synthesis technique based on Min-Max algebras.

Khan proposed a reversible-logic synchronous sequential circuit in which pseudo-Reed-Muller expressions represent the state transitions and output functions; compared with earlier research, the method reduces GO in the tested reversible circuit designs. Gaur proposed a parity-preserving structure that allows an N-bit Arithmetic Logic Unit (ALU) to be scaled using reversible logic, and reported GC, QC, GO, and AI figures; for a 4-bit ALU design, this method improves all four. Datta proposed a net-list optimization technique for multiple-control Toffoli gates that repeatedly applies replacement and pairwise gate-merging procedures; tests on reversible benchmark circuits showed QC and GC improvements. For image processing and enhancement, Raveendran suggested reversible logic circuits for image kernels, using the QC, GO, AI, and GC metrics to gauge the circuit implementation and SSIM to gauge the quality of the processed images.

Multiplication is unquestionably a performance-critical function in DSP and AI applications. These applications call for high-speed parallel operations at suitable accuracy levels, so high-speed multiplier architectures are necessary. Approximation is used in multipliers to obtain acceptable accuracy at high speed with low power consumption and latency. Within a multiplication, partial product summation limits performance because of the propagation latency of the adder network. For a CNN-based application, effectiveness is assessed by examining the model's accuracy. Heat is an essential concern in VLSI circuits, and ideally reversible logic avoids the heat dissipation associated with erasing information. Thus, less energy-consuming designs, such as low-power complementary metal oxide semiconductor (CMOS) designs, play a significant role in nanotechnology.
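The abstract names a Baugh-Wooley Wallace tree multiplier. Baugh-Wooley is the usual way to turn signed (two's complement) operands into a partial-product array whose bits are all added, so that a Wallace tree can reduce it like an unsigned array. The sketch below follows the common modified Baugh-Wooley form (complemented sign-row terms plus correction constants at $2^n$ and $2^{2n-1}$); the paper does not give its exact array, so the function names and this particular formulation are assumptions for illustration, not the authors' design.

```python
def bit(x: int, i: int) -> int:
    """Bit i of x in two's complement (Python's >> is arithmetic for negative ints)."""
    return (x >> i) & 1


def baugh_wooley_terms(a: int, b: int, n: int = 8) -> list[tuple[int, int]]:
    """Return (bit value, column) pairs of a modified Baugh-Wooley partial-product array."""
    terms = []
    for i in range(n - 1):
        for j in range(n - 1):
            terms.append((bit(a, i) & bit(b, j), i + j))  # ordinary AND terms
    for i in range(n - 1):
        terms.append((1 - (bit(a, i) & bit(b, n - 1)), i + n - 1))  # complemented a_i b_{n-1}
        terms.append((1 - (bit(a, n - 1) & bit(b, i)), i + n - 1))  # complemented a_{n-1} b_i
    terms.append((bit(a, n - 1) & bit(b, n - 1), 2 * n - 2))  # sign x sign term
    terms.append((1, n))          # correction constant 2^n
    terms.append((1, 2 * n - 1))  # correction constant 2^(2n-1)
    return terms


def baugh_wooley_multiply(a: int, b: int, n: int = 8) -> int:
    """Add up the array modulo 2^(2n) and reinterpret the sum as a signed 2n-bit value."""
    total = sum(v << col for v, col in baugh_wooley_terms(a, b, n)) % (1 << (2 * n))
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


print(baugh_wooley_multiply(-5, 7))  # -35
```

Every entry in the array is a plain bit, which is what allows the reduction stage to treat signed and unsigned operands with the same adders and compressors.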

#### 2. Literature Survey

The number of reversible gates is an unreliable measure of optimization because the gates differ in kind and computational complexity, so their quantum cost and delay also differ. Furthermore, earlier publications on reversible sequential circuits did not treat latency, a significant metric, as a design parameter to be minimized. One study uses a Dadda multiplier to examine four distinct ways of using its proposed approximate compressors. Fast multimedia applications have opened a new field of approximate computing and fast error-tolerant circuits, which offer high performance at the expense of reduced precision. These approaches also reduce delay, power consumption, and architectural complexity; that research explores their design and analysis, and its multipliers are more accurate than other state-of-the-art approximate multipliers.

Because the propagation delay of the adder network limits speed, compressors are inserted to shorten it: at each level, compressors compute the sum and carry concurrently. A compressor design based on full adders leads to more area, more power, and higher delay. One work suggests two approximate compressors; that design significantly lowers MED and MRED without lowering the error rate. The same article also suggests a revised dual-stage compressor architecture that improves area, latency, and power without affecting the accuracy metrics.

In conventional digital systems, bits are erased during logic operations, which results in information loss and a significant loss of energy and power. Reversible computations retain the bits at the output, avoiding this information loss. Adders are a fundamental, performance-determining component of arithmetic and logic systems. One work combines a Fredkin gate with a Feynman gate to propose an energy-efficient, low-power reversible full adder that lowers ancilla inputs, garbage outputs, quantum cost, and transistor count compared with full adder architectures in the literature. A traditional full adder, which has three inputs and two outputs, does not satisfy the reversibility requirement; a reversible realization needs at least one ancilla input and two garbage outputs.

Existing architectures mostly present building techniques for reversible-logic full adders. Another study proposes a reversible full adder design approach that effectively reduces transistor count (TC), GO, AI, and QC. A further paper proposes reversible logic with optimized power dissipation, also presents a basic CMOS circuit analysis, and reports reductions in power consumption, transistor count, quantum cost, ancilla inputs, and garbage outputs.

The scientific literature shows a great deal of interest in approximate multipliers and proposes many circuits built from approximate 4-2 compressors. Because there are so many alternatives, a designer who wants to employ an approximate 4-2 compressor faces the challenge of choosing the appropriate topology. The circuits under investigation are used to build $8 \times 8$ and $16 \times 16$ multipliers, and both signed and unsigned multiplier configurations are examined, because the optimal solution depends on the required precision, the error metric considered, and the signedness of the multiplier. To aid topology selection, power-error tradeoff curves and image processing examples are published.

#### 3. Existing Method

As noted in the introduction, heat is an essential concern in VLSI circuits, and reversible logic addresses it. As transistor density and energy consumption increase, conventional technologies will be forced to confront these challenges. In conventional circuits, bits of information are erased during logic operations, which dissipates a significant amount of energy. Reversible logic can therefore be used to increase speed, reduce energy dissipation, and limit heat generation, which is why it is employed to reduce energy consumption and maximize speed.
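The energy argument above rests on Landauer's principle (Landauer, 1961, in the reference list): erasing one bit dissipates at least $kT\ln 2$. The short calculation below, a sketch with a hypothetical helper name, evaluates that bound at an assumed room temperature of 300 K. It is a theoretical floor, not a model of the circuits in this paper; a logically reversible circuit erases no bits, so the bound does not apply to it.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value since 2019


def landauer_limit(temperature_k: float = 300.0) -> float:
    """Minimum energy (J) dissipated per erased bit: k * T * ln 2."""
    return BOLTZMANN * temperature_k * math.log(2)


def erasure_energy(bits_erased: int, temperature_k: float = 300.0) -> float:
    """Lower bound (J) for erasing `bits_erased` bits; zero for a circuit that erases none."""
    return bits_erased * landauer_limit(temperature_k)


print(f"{landauer_limit():.3e} J per erased bit at 300 K")  # 2.871e-21 J per erased bit at 300 K
```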

Garbage Output (GO)

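Below are the expressions for HASUM and HACARRY, where HAIN1 and HAIN2 denote the two half-adder inputs:

$$\text{HASUM} = \text{HAIN1} \oplus \text{HAIN2}$$

$$\text{HACARRY} = \text{HAIN1} \cdot \text{HAIN2}$$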



Fig 4: Half adder
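Fig. 4 does not state which reversible gate is used. A common choice, assumed here, is the Peres gate $(A, B, C) \mapsto (A,\ A \oplus B,\ AB \oplus C)$ with $C$ tied to a constant-0 ancilla, which yields the half-adder sum and carry on its last two outputs with one ancilla input and one garbage output (its usual quantum cost is 4). The functions below are a behavioral sketch of that construction, not the authors' netlist.

```python
from itertools import product


def peres(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def reversible_half_adder(a: int, b: int) -> tuple[int, int, int]:
    """Half adder from one Peres gate with ancilla C = 0.

    Returns (garbage, HASUM, HACARRY); the pass-through output A is the garbage line.
    """
    garbage, s, carry = peres(a, b, 0)
    return garbage, s, carry


# The Peres gate maps the 8 input patterns to 8 distinct outputs, so it is reversible.
assert len({peres(*bits) for bits in product((0, 1), repeat=3)}) == 8

for a, b in product((0, 1), repeat=2):
    print(a, b, reversible_half_adder(a, b)[1:])  # (HASUM, HACARRY)
```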

## A. Full Adder:

A full adder computes the sum of three bits. It has three inputs (FAIN1, FAIN2, and FACIN) and two outputs (FASUM and FACARRY), which are expressed as follows.

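$$\text{FASUM} = \text{FAIN1} \oplus \text{FAIN2} \oplus \text{FACIN}$$

$$\text{FACARRY} = (\text{FAIN1} \oplus \text{FAIN2}) \cdot \text{FACIN} + \text{FAIN1} \cdot \text{FAIN2}$$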



Fig 5: Full adder using reversible logic
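The gates inside Fig. 5 are not listed in the text. One well-known construction, assumed here, cascades two Peres gates: the first produces $A \oplus B$ and $AB$, and the second adds FACIN. Because $(A \oplus B)\,C_{in}$ and $AB$ are never 1 at the same time, the XOR in the second gate equals the OR in the FACARRY equation. The sketch also counts garbage outputs: the pair (sum, carry) = (1, 0) is produced by three different input patterns, so any reversible full adder needs at least $\lceil \log_2 3 \rceil = 2$ extra output bits to tell them apart. Function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import ceil, log2


def peres(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def reversible_full_adder(a: int, b: int, cin: int) -> tuple[int, int, int, int]:
    """Full adder from two cascaded Peres gates and one constant-0 ancilla.

    Returns (FASUM, FACARRY, garbage1, garbage2).
    """
    g1, p, ab = peres(a, b, 0)        # p = A xor B, ab = A.B
    g2, s, carry = peres(p, cin, ab)  # s = p xor Cin, carry = p.Cin xor A.B
    return s, carry, g1, g2


def min_garbage_outputs() -> int:
    """Lower bound on garbage outputs for any reversible full adder."""
    counts = Counter(
        (a ^ b ^ c, (a & b) | ((a ^ b) & c)) for a, b, c in product((0, 1), repeat=3)
    )
    return ceil(log2(max(counts.values())))


print(min_garbage_outputs())  # 2
```

This construction uses one ancilla input and two garbage outputs, i.e. it meets the lower bound; in the usual accounting its quantum cost is 8 (two Peres gates of cost 4).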

## B. Exact 4:2 Compressor:

Figure 6 shows the reversible logic realization of an exact 4:2 compressor.



Fig 6: Reversible logic compressor design.
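Fig. 6 is given only as a figure. The conventional exact 4:2 compressor is two full adders in series, and the sketch below models that logic behavior (not the reversible gate netlist of Fig. 6). It checks the defining identity $x_1 + x_2 + x_3 + x_4 + C_{in} = \text{Sum} + 2(\text{Carry} + C_{out})$ and the property that $C_{out}$ does not depend on $C_{in}$, which is what keeps carries from rippling along a row of compressors. Function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def exact_compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Exact 4:2 compressor from two cascaded full adders; returns (Sum, Carry, Cout).

    Cout depends only on x1..x3, so it never waits for Cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


for bits in product((0, 1), repeat=5):
    s, carry, cout = exact_compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```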

## C. Proposed Inexact 4:2 Compressor:

Figure 7 shows the reversible logic gate realization of the suggested inexact 4:2 compressor.



Fig 7: Inexact 4:2 compressor

The 8x8 multiplier shown in Fig 8 is built from these reversible logic adders and compressors.



Fig 8: 8x8 multiplier
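As an illustration of how a Wallace tree carries out the three multiplication stages named in the introduction (partial product generation, accumulation, and final addition), the sketch below reduces a partial-product array column by column with full and half adders until at most two bits remain per column, then adds the two remaining rows. It deliberately simplifies the paper's design and is an assumption, not Fig. 8: it uses unsigned operands instead of Baugh-Wooley, plain 3:2 counters instead of 4:2 compressors, and Python's `+` in place of the Ladner-Fischer adder.

```python
from collections import defaultdict


def wallace_multiply(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Unsigned n x n Wallace-tree multiply; returns (product, number of reduction stages)."""
    # Stage 1, partial product generation: cols[k] holds the bits of weight 2**k.
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Stage 2, accumulation: reduce until every column holds at most two bits.
    stages = 0
    while max(len(bits) for bits in cols.values()) > 2:
        nxt = defaultdict(list)
        for k in sorted(cols):
            bits, idx = cols[k], 0
            while len(bits) - idx >= 3:  # full adder (3:2 counter)
                x, y, z = bits[idx:idx + 3]
                nxt[k].append(x ^ y ^ z)
                nxt[k + 1].append((x & y) | (z & (x ^ y)))
                idx += 3
            if len(bits) - idx == 2:     # half adder on a leftover pair
                x, y = bits[idx:idx + 2]
                nxt[k].append(x ^ y)
                nxt[k + 1].append(x & y)
            elif len(bits) - idx == 1:   # a single bit passes through
                nxt[k].append(bits[idx])
        cols = nxt
        stages += 1

    # Stage 3, final addition of the two remaining rows (stand-in for a prefix adder).
    row0 = sum(bits[0] << k for k, bits in cols.items() if len(bits) > 0)
    row1 = sum(bits[1] << k for k, bits in cols.items() if len(bits) > 1)
    return row0 + row1, stages


product_, stages = wallace_multiply(200, 123)
print(product_, stages)
```

Each full adder removes one bit from the array per use, which is why the column heights shrink by roughly a factor of 3/2 per stage and only a few stages are needed before the carry-propagate adder.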

#### 4. Proposed Method

In the proposed multiplier, the compressors at each level compute the sum and carry concurrently, and the resulting carry is added to the sum bit of the next higher significance in the subsequent stage. In addition, the efficient architecture modules need two OR gates, one AND gate, one XOR gate, and a MUX (Figure 6). In static CMOS logic, two-input OR and AND gates each require six transistors, so the design implements them with NOR and NAND gates, which need fewer transistors. Although the SUM and CARRY of the modified architecture are not exactly equal to those of the exact 4:2 compressor, the error rate is low. The suggested multiplier has the lowest error metrics (MED and MRED) of any known compressor-based multiplier design.
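MED and MRED are reported without definitions. The sketch below uses the usual ones: the error distance is $ED = |\tilde{P} - P|$, MED is its mean over all input pairs, and MRED is the mean of $ED/P$ (pairs with $P = 0$ are skipped here; papers differ on this detail). `truncated_mul` is a hypothetical stand-in that simply clears low-order product bits, in the spirit of approximating only the least significant part; it is not the proposed multiplier.

```python
from itertools import product


def error_metrics(approx_mul, n: int = 8) -> tuple[float, float, float]:
    """Exhaustively compare approx_mul with exact unsigned n-bit multiplication.

    Returns (ER, MED, MRED):
      ER   - fraction of input pairs with any error,
      MED  - mean of |approx - exact|,
      MRED - mean of |approx - exact| / exact over pairs with exact != 0.
    """
    errors, ed_sum, red_sum, nonzero = 0, 0, 0.0, 0
    for a, b in product(range(1 << n), repeat=2):
        exact = a * b
        ed = abs(approx_mul(a, b) - exact)
        errors += int(ed != 0)
        ed_sum += ed
        if exact:
            red_sum += ed / exact
            nonzero += 1
    total = 1 << (2 * n)
    return errors / total, ed_sum / total, red_sum / nonzero


def truncated_mul(a: int, b: int, k: int = 4) -> int:
    """Hypothetical approximate multiplier: clear the k least significant product bits."""
    return (a * b) >> k << k


er, med, mred = error_metrics(truncated_mul)
print(f"ER={er:.4f} MED={med:.3f} MRED={mred:.5f}")
```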



Fig 9: Dual-stage 4:2 compressor reconstruction

Fig. 9 displays the fundamental parts of the proposed modified dual-stage 4:2 compressor. Approximating the 4:2 compressor lowers its output count to two: COUT is eliminated, and this causes an error only for the input combination "1111". Because an energy-efficient multiplier can be quite important in low-power VLSI systems, the suggested method approximates only the least significant portion of the output. Approximating more than nine bits gives additional area, power, and delay reductions, but also a noticeable decline in quality.

Because approximate multipliers are fast and consume little power, and many applications are error-tolerant, there is an increasing need for efficient approximate multipliers. This paper proposes an approximate compressor to build an 8-bit multiplier.
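The paper gives no Boolean equations for its inexact compressor, so the function below is a hypothetical compressor that only matches the behavior described in the text: CIN and COUT removed, two outputs, and an error only for "1111". With two outputs of weight 1 and 2 the largest representable value is 3, so "1111" (value 4) is the one pattern a two-output design must get wrong if all others are exact. The sketch also computes how often that happens when each input is 1 with probability 1/4, as for AND-generated partial products of uniformly random operand bits.

```python
from itertools import product


def approx_compressor_4_2(x1: int, x2: int, x3: int, x4: int) -> tuple[int, int]:
    """Hypothetical approximate 4:2 compressor without Cin/Cout; returns (sum, carry).

    carry = 1 when at least two inputs are 1; sum = parity, forced to 1 for "1111",
    so sum + 2*carry saturates at 3.
    """
    carry = (x1 & x2) | (x1 & x3) | (x1 & x4) | (x2 & x3) | (x2 & x4) | (x3 & x4)
    parity = x1 ^ x2 ^ x3 ^ x4
    all_ones = x1 & x2 & x3 & x4
    return parity | all_ones, carry


def error_profile(p_one: float = 0.25) -> tuple[list[tuple[int, ...]], float]:
    """Erroneous input patterns and their total probability when each input is 1
    independently with probability p_one."""
    wrong, prob = [], 0.0
    for bits in product((0, 1), repeat=4):
        s, c = approx_compressor_4_2(*bits)
        if s + 2 * c != sum(bits):
            wrong.append(bits)
            ones = sum(bits)
            prob += p_one ** ones * (1 - p_one) ** (4 - ones)
    return wrong, prob


print(error_profile())  # ([(1, 1, 1, 1)], 0.00390625)
```

Under these assumptions the compressor errs on 1 of 16 patterns, but with probability only $(1/4)^4 = 1/256$ per use, and the error is always $-1$ in that column.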



Fig 10: 8-bit Multiplier

**Parallel prefix adder:** A parallel prefix adder performs binary addition efficiently and adapts well to accelerating it. Research on binary arithmetic components supports device development, and field programmable gate arrays (FPGAs) have recently become popular in part because they accelerate such operations. The architecture consists of three stages: pre-processing, carry generation, and post-processing. In the pre-processing stage, each bit position $i$ of the operands $A$ and $B$ produces a propagate signal $P_i$ with an XOR operation and a generate signal $G_i$ with an AND operation:

$$P_i = A_i \oplus B_i \qquad (1)$$

$$G_i = A_i \cdot B_i \qquad (2)$$

Carry propagate:

$$C_p = P_0 \cdot P_1 \qquad (3)$$

Carry generate:

$$C_g = G_1 + P_1 \cdot G_0 \qquad (4)$$

**Post-Processing Stage:**

The output sum, given by equation (6), is the last step of the Ladner-Fischer adder:

$$S_i = P_i \oplus C_{i-1} \qquad (6)$$

where $C_{i-1}$ is the carry into bit position $i$ produced by the carry generation stage ($C_{-1} = 0$ when there is no carry-in).

The input bits first go through pre-processing, which produces the propagate and generate signals. The carry generation stage then combines these signals into group propagate and generate terms, and each bit's carry goes through the post-processing stage, together with that bit's propagate signal, to yield the final sum. Fig. 11 depicts the step-by-step procedure of the Ladner-Fischer adder. The Ladner-Fischer adder is a fast, tree-structured parallel prefix adder designed at the gate level, and it offers high speed for arithmetic operations. Its design uses comparatively few gates, which reduces the latency and hardware usage of the architecture.



Fig 11: Ladner-Fischer Adder
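The three stages above can be sketched at bit level. The network below follows one common reading of the Ladner-Fischer structure, assumed here for illustration: pair up adjacent bits, run a Sklansky (divide-and-conquer) prefix tree over the odd positions, and fix up the even positions with one extra level. The `combine` operator is the usual black-cell merge of group generate and propagate. This is a behavioral model, not the authors' HDL, and the even bit width is a simplifying assumption.

```python
def combine(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Black-cell operator: merge the more significant group `hi` with `lo`.

    G = G_hi + P_hi . G_lo and P = P_hi . P_lo.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def ladner_fischer_add(a: int, b: int, n: int = 16) -> int:
    """Add two n-bit unsigned integers with a Ladner-Fischer style prefix network."""
    if n % 2:
        raise ValueError("this sketch assumes an even bit width")
    # Pre-processing: bitwise propagate (XOR) and generate (AND).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    pre = list(zip(g, p))  # pre[i] ends up as the group (G, P) of bits i..0

    # Carry generation, first level: every odd position absorbs its even neighbour.
    for i in range(1, n, 2):
        pre[i] = combine(pre[i], pre[i - 1])

    # Carry generation, middle levels: Sklansky tree over the odd positions only.
    odd = list(range(1, n, 2))
    d = 1
    while d < len(odd):
        for j in range(len(odd)):
            if (j // d) % 2 == 1:       # upper half of a 2d-wide block
                src = (j // d) * d - 1  # last element of the lower half
                pre[odd[j]] = combine(pre[odd[j]], pre[odd[src]])
        d *= 2

    # Carry generation, last level: even positions (except 0) take the prefix below them.
    for i in range(2, n, 2):
        pre[i] = combine(pre[i], pre[i - 1])

    # Post-processing: S_i = P_i xor C_{i-1}, with no carry-in.
    carries = [0] + [pre[i][0] for i in range(n - 1)]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total | (pre[n - 1][0] << n)  # carry-out becomes bit n


print(ladner_fischer_add(0xFFFF, 0x0001))  # 65536
```

The extra final level is what distinguishes this variant from a pure Sklansky tree: it halves the number of positions the tree must serve at the cost of one more logic level.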

#### 5. Results

#### **Simulation Results of Proposed method:**



Fig 12: Simulation Results of Proposed Design

## Area:

Selected Device: 7a100tcsg324-3

| Resource | Used | Available | Utilization |
|---|---|---|---|
| **Slice Logic Utilization:** | | | |
| Number of Slice LUTs | | | 0% |
| Number used as Logic | | | 0% |
| **Slice Logic Distribution:** | | | |
| Number of LUT Flip Flop pairs used | 88 | | |
| Number with an unused Flip Flop | 88 | 88 | 100% |
| Number with an unused LUT | 0 | 88 | 0% |
| Number of fully used LUT-FF pairs | 0 | 88 | 0% |
| Number of unique control sets | 0 | | |
| **IO Utilization:** | | | |
| Number of IOs | 33 | | |
| Number of bonded IOBs | 33 | 210 | 15% |

Fig 13: Area result screenshot of the proposed design

## Delay:

Delay: 5.944 ns (Levels of Logic = 11)
Source: a<3> (PAD)
Destination: y<13> (PAD)
Data Path: a<3> to y<13>

| Cell: in->out | Fanout | Gate Delay (ns) | Net Delay (ns) | Logical Name (Net Name) |
|---|---|---|---|---|
| IBUF:I->O | 11 | 0.001 | 0.425 | a_3_IBUF (a_3_IBUF) |
| LUT2:I0->O | 3 | 0.097 | 0.389 | n44/a_b_AND_1_o1 (w<43>) |
| | | | | n81/n4/Mxor_d_xo<0>1 (n81/f<3>) |
| LUT6:I1->O | 2 | 0.097 | 0.300 | n89/n6/a_b_OR_1_o1 (x<5>) |
| LUT6:I5->O | 4 | 0.097 | 0.697 | n90/n7/Mxor_d_xo<0>1 (w<87>) |
| LUT5:I0->O | 1 | 0.097 | 0.556 | pl/gc4/G4 (pl/gc4/G3) |
| LUT4:I0->O | 1 | 0.097 | 0.683 | pl/gc4/G5_SW0 (N10) |
| LUT6:I1->O | 3 | 0.097 | 0.521 | pl/gc4/G5 (c<31>) |
| LUT5:I2->O | 4 | 0.097 | 0.525 | n106/n3/Mxor_d_xo<0>11 (n106/n3/Mxor_d_xo<0>1) |
| LUT3:I0->O | 1 | 0.097 | 0.279 | n106/n3/Mxor_d_xo<0>2 (y_13_OBUF) |
| OBUF:I->O | | 0.000 | | y_13_OBUF (y<13>) |
| **Total** | | 5.944 ns | | 0.874 ns logic, 5.070 ns route (14.7% logic, 85.3% route) |

Fig 14: Delay result screenshot of the proposed design

#### **RTL Schematic:**



Fig 15: Technology Schematic 1



Fig 16: Technology Schematic 2

Table 1: Evaluation Table for Area and Delay

| Types | Area (LUTs) | Delay (ns) |
|----------|---------|-----------|
| Existing Method | 97 | 6.119 |
| Proposed Method | 88 | 5.944 |
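Table 1 gives absolute numbers only. The relative improvements follow directly from it, as the short calculation below shows; the area-delay product (ADP, LUTs × ns) is an additional combined figure of merit computed here for illustration and is not reported in the paper.

```python
def reduction_pct(old: float, new: float) -> float:
    """Percentage reduction going from old to new."""
    return 100.0 * (old - new) / old


# Values taken from Table 1.
existing = {"area_luts": 97, "delay_ns": 6.119}
proposed = {"area_luts": 88, "delay_ns": 5.944}

area_red = reduction_pct(existing["area_luts"], proposed["area_luts"])
delay_red = reduction_pct(existing["delay_ns"], proposed["delay_ns"])
# Area-delay product (LUTs x ns).
adp_red = reduction_pct(
    existing["area_luts"] * existing["delay_ns"],
    proposed["area_luts"] * proposed["delay_ns"],
)

print(f"area -{area_red:.1f}%, delay -{delay_red:.1f}%, ADP -{adp_red:.1f}%")
# area -9.3%, delay -2.9%, ADP -11.9%
```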

#### 6. Conclusion

We designed and implemented an efficient signed Wallace tree multiplier using reversible logic design. The existing method has a large design area and a long delay. The proposed system overcomes these disadvantages, with a low area of 88 LUTs and a high speed of 5.944 ns delay. Hardware complexity is decreased by employing a complementary gate-based compressor, and delay is cut by using a parallel prefix adder. The experimental results show that the suggested design, based on the Wallace tree algorithm, is efficient in terms of area and speed. The suggested multiplier has the lowest error metrics (MED and MRED) of any known compressor-based multiplier design, and the Ladner-Fischer adder, a fast tree-structured parallel prefix adder built from comparatively few gates, reduces the latency of the architecture.

#### REFERENCES

- [1] L. Dadda, "Some schemes for parallel multipliers," Alta Frequenza, vol. 34, no. 5, pp. 349–356, Mar. 1965.
- [2] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Trans. Electron. Comput., vol. EC-13, no. 1, pp. 14–17, Feb. 1964.
- [3] Z. Wang, G. A. Jullien, and W. C. Miller, "A new design technique for column compression multipliers," IEEE Trans. Comput., vol. 44, no. 8, pp. 962–970, Aug. 1995.
- [4] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, "Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 4, pp. 1352–1361, Apr. 2017.
- [5] D. Esposito, A. G. M. Strollo, E. Napoli, D. De Caro, and N. Petra, "Approximate multipliers based on new approximate compressors," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 12, pp. 4169–4182, Dec. 2018.
- [6] K. Manikantta Reddy, M. H. Vasantha, Y. B. Nithin Kumar, and D. Dwivedi, "Design and analysis of multiplier using approximate 4-2 compressor," AEU Int. J. Electron. Commun., vol. 107, pp. 89–97, Jul. 2019.
- [7] P. J. Edavoor, S. Raveendran, and A. D. Rahulkar, "Approximate multiplier design using novel dual-stage 4:2 compressors," IEEE Access, vol. 8, pp. 48337–48351, 2020.
- [8] A. Gorantla and P. Deepa, "Design of approximate compressors for multiplication," ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 3, p. 44, May 2017.
- [9] A. G. M. Strollo, E. Napoli, D. De Caro, N. Petra, and G. D. Meo, "Comparison and extension of approximate 4-2 compressors for low-power approximate multipliers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 3021–3034, Sep. 2020.
- [10] N. Van Toan and J.-G. Lee, "FPGA-based multilevel approximate multipliers for high-performance error-resilient applications," IEEE Access, vol. 8, pp. 25481–25497, 2020.
- [11] C.-H. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 10, pp. 1985–1997, Oct. 2004.
- [12] P. Zakian and R. N. Asli, "An efficient design of low-power and high-speed approximate compressor in FinFET technology," Comput. Electr. Eng., vol. 86, Sep. 2020, Art. no. 106651.
- [13] G. E. Moore, "Cramming more components onto integrated circuits," Electronics, vol. 38, no. 8, pp. 114–117, Apr. 1965.
- [14] R. Landauer, "Irreversibility and heat generation in the computing process," IBM J. Res. Develop., vol. 5, no. 3, pp. 183–191, Jul. 1961.
- [15] C. H. Bennett, "Logical reversibility of computation," IBM J. Res. Develop., vol. 17, no. 6, pp. 525–532, Nov. 1973.



Nayakanti Raviteja is pursuing a Bachelor of Technology in Electronics and Communication Engineering at G. Pullaiah College of Engineering and Technology (Autonomous), affiliated to Jawaharlal Nehru Technological University Anantapur, India.

Areas of interest: Communication/IoT.

E-mail: tejha002@gmail.com



Durgam Mukesh is pursuing a Bachelor of Technology in Electronics and Communication Engineering at G. Pullaiah College of Engineering and Technology (Autonomous), affiliated to Jawaharlal Nehru Technological University Anantapur, India.

Areas of interest: Communication/IoT.

E-mail: sunnymukesh379@gmail.com



Sandakonda Revathi is pursuing a Bachelor of Technology in Electronics and Communication Engineering at G. Pullaiah College of Engineering and Technology (Autonomous), affiliated to Jawaharlal Nehru Technological University Anantapur, India.

Areas of interest: Communication/IoT.

E-mail: sandakondarevathi@gmail.com