Circuit Optimization For Transmission Gate Master Slave Flip-Flops

This work shows that, when dealing with transmission-gate-based master-slave (TGMS) flip-flops (FFs), it is worthwhile to reconsider the classical approach to delay, power, and area minimization in order to improve performance in high-speed designs [1]. In particular, by splitting such FFs into two sections that are separately optimized and then reconciling the results, the resulting design always outperforms the one obtained by applying the classical Logical Effort (LE) procedure to the FF as a single continuous path [1]. Transistor-level simulations have been performed on several well-known TGMS FFs designed in 65-nm and 90-nm technologies with the Microwind3.1 CAD tool, and the results have been compared to validate the procedure and its underlying assumptions. Significant improvements in delay, power, and area have been found, showing that this approach correctly handles the actual path in such circuits and hence steers the design more effectively toward efficiency in the high-speed region [1].


INTRODUCTION:
Flip-flops (FFs) are basic building blocks of datapath structures. Indeed, they store the data processed by combinational circuits and synchronize operations at a given clock frequency [1]. Because of their multistage structure, their high clock switching activity, and the increasing portion of the clock period occupied by their timing latency, the speed and energy of FFs significantly affect the overall performance of a datapath [2], [3]. Optimal FF design strategies are usually based on automated algorithms embedded directly into simulators [1], [3], [4]. These algorithms are powerful methods to optimize targets such as speed, energy consumption, or energy-delay products, even for complicated FFs with several internal nodes. They also allow the joint optimization of FFs and clock networks to be accounted for, for instance, through a proper setting of the clock slope [5]. Naturally, the resulting design strategies depend on the specific FF topology and on the design target to be optimized.
The appropriate choice of FF topology is of fundamental importance in the design of VLSI integrated circuits and, in particular, of both high-speed and low-energy microprocessors. Indeed, FFs affect the clock frequency, since their delay occupies a significant fraction of the clock cycle, especially in fast microarchitectures with low logic depth [2]. Moreover, together with the circuits devoted to clock generation and distribution, FFs are part of the clock network, which is responsible for 30%-50% of the whole-chip energy budget. Latches and flip-flops are the basic sequential elements used to store logic values, and they are always associated with clocks and clocking networks.
In low-energy, constant-throughput systems, the supply voltage is often scaled down to minimize energy consumption. The clocking subsystem (register elements and clock distribution network) has to be resistant to noise and timing failures for robust circuit operation, and noise-robust designs are usually fully static or pseudo-static. Since FFs are part of the clock network, reducing the power they consume has a deep impact on the total power consumption. In addition, from a timing perspective, flip-flop latency consumes a larger portion of the cycle time as the operating frequency increases. Accordingly, the choice and design of flip-flops have a profound effect both on reducing power dissipation and on providing more slack time for easier time budgeting in high-performance systems. FFs can basically be split into two topological categories: pulsed FFs and master-slave FFs (MSFFs) [7]. The former feature an internally or externally generated time window during which the FF is transparent to the input data.
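The effect of FF latency on the cycle time can be made concrete with the usual single-cycle constraint T_clk >= t_clk-q + t_logic + t_setup. The sketch below is a simplified illustration, assuming that clock skew and jitter are ignored; the 50 ps clock-to-Q delay and 30 ps setup time are hypothetical values, not results from this work.

```python
def ff_timing_overhead(t_clk_ps, t_clk_q_ps, t_setup_ps):
    """Return (logic budget in ps, fraction of the cycle spent in FF overhead).

    Simplified constraint T_clk >= t_clk-q + t_logic + t_setup,
    ignoring clock skew and jitter (an assumption of this sketch).
    """
    overhead = t_clk_q_ps + t_setup_ps
    return t_clk_ps - overhead, overhead / t_clk_ps


# Hypothetical FF: 50 ps clock-to-Q delay, 30 ps setup time.
for t_clk in (1000, 500, 250):  # clock periods in ps (1, 2 and 4 GHz)
    budget, frac = ff_timing_overhead(t_clk, 50, 30)
    print(f"T_clk = {t_clk} ps: logic budget {budget} ps, FF overhead {frac:.0%}")
```

With a fixed FF overhead, the share of the cycle lost to the FF grows from 8% at 1 GHz to 32% at 4 GHz, which is why faster FFs free up slack for the logic.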
In this work, a number of master-slave and double-edge-triggered (DET) flip-flop topologies with lower power dissipation, delay, and energy have been proposed in 65-nm and 90-nm CMOS technologies.

TRADITIONAL METHOD:
Traditionally, Logical Effort (LE) optimization is carried out by treating the whole circuit as a single uninterrupted path [1], [6]. However, it has been shown that, for this specific class of circuits, delay minimization has to be approached from a different perspective, through a novel approach inspired by preliminary considerations in [8]. The LE basis is still exploited but, unlike in the traditional methodology, TGMS FFs are split into two overlapping sections, i.e., two different paths that are optimized separately. In particular, the paths considered are the first part of the path used in the traditional methodology and the clock-to-output path. As will be shown, breaking the data-to-output path instead of considering it as a whole leads to the actual delay minimum. Remarkably, the energy consumption and area occupation of the resulting designs are also always significantly lower than those obtained with the traditional LE method. This means that the actual path effort of TGMS FFs is handled more properly by the new approach, whereas the traditional one fails to capture it correctly. These considerations can be exploited in practice when sizing these circuits in the high-speed, energy-efficient design region, i.e., as a basis (or a starting point) when energy is also accounted for in the minimization of energy-delay products that weight delay much more heavily than energy.
The remainder of this paper is organized as follows. In Section II, the main definitions about FF timing are clarified and applied to the case of TGMS FFs. In Section III, the novel approach is discussed by revisiting the method of LE. Three TGMS FFs have been designed to exemplify the proposed approach, and comparisons with the traditional design strategy and with the results of a simulation-driven optimization have been carried out.
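For reference, the traditional single-path LE computation can be sketched as follows: path effort F = G·B·H, optimal stage effort F^(1/N), minimum delay N·F^(1/N) + P (in units of tau), and gate sizes obtained backward from the load. This is a minimal sketch of the textbook LE method [6], not the TGMS-specific procedure of [1]; the `Stage` class and `le_optimize` function are hypothetical names, and the example uses a plain inverter chain because modeling transmission gates in LE requires additional assumptions not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    g: float        # logical effort of the gate
    p: float        # parasitic delay, in units of tau
    b: float = 1.0  # branching effort at the gate output


def le_optimize(stages, c_in, c_load):
    """Classical LE sizing of a single uninterrupted path.

    Returns (optimal stage effort, minimum delay in tau, list of stage
    input capacitances sized backward from the load).
    """
    n = len(stages)
    G = math.prod(s.g for s in stages)  # path logical effort
    B = math.prod(s.b for s in stages)  # path branching effort
    H = c_load / c_in                   # path electrical effort
    F = G * B * H                       # path effort
    f_hat = F ** (1.0 / n)              # equal stage effort minimizes delay
    d_min = n * f_hat + sum(s.p for s in stages)

    caps = []
    c_next = c_load
    for s in reversed(stages):
        c_this = s.g * s.b * c_next / f_hat  # C_in,i = g_i * b_i * C_next / f_hat
        caps.append(c_this)
        c_next = c_this
    caps.reverse()
    return f_hat, d_min, caps


# Usage: three inverters (g = 1, p = 1) driving 64x their input capacitance.
f_hat, d_min, caps = le_optimize([Stage(1.0, 1.0)] * 3, c_in=1.0, c_load=64.0)
print(round(f_hat, 3), round(d_min, 3), [round(c, 3) for c in caps])
```

For this inverter chain the optimal stage effort is 4, the minimum delay is 15 tau, and the input capacitances come out as 1, 4, and 16; the first one equals `c_in`, which serves as a consistency check on the backward sizing.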

PROPOSED METHOD:
The approach proposed in this work breaks the optimization up into two steps [1]. In particular, two LE optimizations are carried out to minimize the delays from the data input to an internal node (path 1) and from the clock (enabling the slave TG) to the output (path 2), and the results are then reconciled [1]. Such an approach is intuitively justifiable, since signals coming from different inputs and traversing the same block experience different efforts according to the classic interpretation of LE [1]. Hence, although the blocks involved act nearly simultaneously when the timing condition is satisfied, two distinct overlapping paths can be identified. Such paths are not simply restricted to the master and slave sections [1]. Indeed, the first delay (up to the internal node) is influenced by the enabled block in the slave (and hence by the input capacitance of the gate that follows it), while the second delay is influenced by the resistance of the block feeding the slave stage [1]. According to this point of view, the overall path effort is more appropriately broken into two separate contributions, and the minimum delay is found by separately minimizing the delays of the two paths with the LE method, rather than by minimizing the delay of the whole data-to-output path [1]. Two sets of LE parameters have to be derived for paths 1 and 2, and condition (2) is applied to both paths [1]. Note that the input capacitance of the gate following the slave block is taken as the final load of path 1, while the slave blocks form the first stage of path 2 [1]. Further arrangements are necessary to properly define the LE equations according to the FF topology (the examples in the next section clarify many practical aspects). Nevertheless, by separately optimizing path 1 and path 2 and then reconciling the results, a unique size comes out for the blocks shared by the two paths (and hence for all of the other gates), just as in the traditional LE approach [1].
Oct 15, 2013
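In standard LE notation (G, B, H, N, P as in [6]), the two separate optimizations can be sketched as below. This is our own notation, assuming each path is an ordinary LE chain; the exact equations and condition (2) of [1] may differ in detail, and the reconciliation step is not captured. For path 1, the electrical effort is H_1 = C_L,1 / C_in,1, where C_L,1 is the input capacitance of the gate following the slave block; path 2 starts at the slave blocks and ends at the FF output.

```latex
% Path k in {1, 2}, each optimized separately with LE
F_k = G_k \, B_k \, H_k, \qquad
\hat{f}_k = F_k^{1/N_k}, \qquad
\hat{D}_k = N_k \, F_k^{1/N_k} + P_k, \qquad k \in \{1, 2\}

% Path 1 ends at the input of the gate following the slave block
H_1 = \frac{C_{L,1}}{C_{\mathrm{in},1}}
```

Every stage of path k then bears the same effort, the standard LE optimum, so each path has its own optimal stage effort instead of a single one for the whole data-to-output chain.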

Circuit Styles:
In master-slave FFs, the two stages are driven by complementary clock signals. The master-slave flip-flop comprises a master stage and a slave stage. The master-slave configuration has the advantage of being edge-triggered, which makes it easier to use in larger circuits, since the inputs to a flip-flop often depend on the state of its output. This eliminates the possibility of ambiguous outputs, which can occur in single-element flip-flops when, because of propagation delays, a change at the output reaches the input within the same clock pulse [1].
The three master-slave (MS) topologies considered are as follows.

Transmission Gate Flip-Flop:
A flip-flop can be designed as a latch pair, where one latch is transparent-high and the other is transparent-low. Master-slave flip-flops based on transmission gates are the best choice when energy is the main concern. The edge-triggered flip-flop is built from two D-type level-triggered latches enabled by opposite polarities of the clock signal: the second (slave) latch is controlled by the clock signal, while the first (master) latch is enabled by the negated clock. The transmission-gate flip-flop (TGFF) with input gate isolation is derived from the PowerPC 603 latch pair, where input gate isolation is added for better noise immunity. An additional inverter at the output of the TGFF provides non-inverting operation [1].
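The latch-pair operation described above can be illustrated with an idealized behavioral model. This is a sketch, not a transistor-level description: it assumes ideal latches with zero delay, ignores setup and hold times, and abstracts away the inversions of the real TGFF; the class name `MasterSlaveFF` is hypothetical.

```python
class MasterSlaveFF:
    """Idealized behavioral model of a positive-edge master-slave FF.

    The master latch is transparent while clk == 0 (enabled by the negated
    clock) and the slave latch is transparent while clk == 1. Delays,
    setup/hold times and the inversions of the real TGFF are ignored.
    """

    def __init__(self):
        self.master = 0  # value stored in the master latch
        self.q = 0       # value stored in (and output by) the slave latch

    def step(self, clk, d):
        if clk == 0:
            self.master = d       # master tracks D; slave holds Q
        else:
            self.q = self.master  # master holds; slave passes the stored value
        return self.q


ff = MasterSlaveFF()
# The change of D while clk == 1 (third sample) does not reach Q.
for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
    print(clk, d, ff.step(clk, d))
```

Because the two latches are never transparent at the same time, Q changes only when the clock rises, which is the edge-triggered behavior that prevents the input from racing through to the output.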
In this TGFF, transmission gates are used for both the master and the slave latches, as shown in Fig. 1. It is one of the fastest classical structures; its main advantages are the short direct path and the low-power feedback. However, the large load on the clock greatly affects the total power consumption of the flip-flop. The TGFF has a fully static master-slave structure, built by cascading two identical pass-gate latches, and provides a short clock-to-output latency. It has a long data-to-output latency because of its positive setup time. Moreover, it is sensitive to clock slopes and to data feedthrough, which is a further concern when using it [1].
Clocked capacitances should be minimized in order to reduce the clock load. The method of logical effort [6] is used to optimize transistor sizes. The path in the TGFF responsible for the CLK-Q delay is depicted. The off-path capacitance C_off-path is equal to the gate capacitance of two minimum-width feedback transistors, since the keeper transistors in the feedback of both master and slave latches are of minimum width. Minimum sizing of the master stage minimizes the energy consumption with little impact on the setup time [1].
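The role of the off-path capacitance can be seen through the standard LE branching effort b = (C_on-path + C_off-path) / C_on-path. The sketch below assumes normalized capacitances, with the gate capacitance of one minimum-width transistor set to 1 (a hypothetical unit), and takes C_off-path as two such transistors, as stated above.

```python
def branching_effort(c_on_path, c_off_path):
    """Branching effort at one node: b = (C_on + C_off) / C_on."""
    if c_on_path <= 0:
        raise ValueError("on-path capacitance must be positive")
    return (c_on_path + c_off_path) / c_on_path


C_MIN = 1.0        # hypothetical gate capacitance of one minimum-width transistor
c_off = 2 * C_MIN  # two minimum-width feedback (keeper) transistors

# Larger on-path gates make the fixed keeper load matter less and less.
for c_on in (2.0, 8.0, 32.0):
    print(c_on, branching_effort(c_on, c_off))
```

With minimum-width keepers the off-path load is fixed, so b drops from 2 to 1.25 and then to 1.0625 as the on-path gate grows, meaning the feedback adds little extra effort to the CLK-Q path.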

Write-Port Master-Slave Flip-Flop:
The write-port master-slave (WPMS) latch has a structure devoid of PMOS transistors in its pass gates. Despite this advantage, it shows worse performance and power consumption than the TGFF. This is partly due to the adoption of NMOS pass transistors and of the typically non-gated keepers they require, but also to the impact of longer internal wires [1].

Figure 2: Schematic of the WPMS FF [1]
In this circuit, only n-MOS clocked transistors are used to reduce energy consumption. On-path inverters that are interrupted by clock signals would slow down the flip-flop and increase the energy consumed; in the WPMS FF, instead, the on-path inverters do not receive a clock signal, so the operating speed is increased while the energy consumption is decreased. Removing the clock signal from the on-path inverters eliminates timing interruptions that would otherwise slow down the flip-flop, and it also reduces the energy required for the flip-flop to operate. Transmission-gate transistors provide very low energy consumption and relatively short delay. One transmission-gate transistor is connected in series between the flip-flop input D and the inverter, and another is connected between two inverters. Small n-MOS transistors are placed in parallel with each of the transmission-gate transistors to compensate for the voltage drop when a logic 1 is propagated through them [1].
Transistors are connected in series with each small n-MOS transistor to further reduce energy consumption by not pulling up voltage levels in the flip-flop. The master latch includes the first pair of parallel-connected inverters, the transmission-gate transistor, and the small n-MOS transistors. The slave latch includes the inverter, the transmission-gate transistor, the small n-MOS transistors, and the output inverter. When logic 1 is present on the clock signal, the master latch follows the logic level on input D while the slave latch is disabled. The slave latch is enabled and the master latch is disabled when a logic 0 is present on the clock signal. Once enabled, the slave drives the output to the inverted level of the value stored at the internal node. Overall, the WPMS has a worse minimum delay, energy-delay product, and minimum energy than the TGFF [1].

Clocked CMOS Flip-Flop (A Clock-Skew-Insensitive Approach):
An ingenious positive edge-triggered register based on a master-slave concept that is insensitive to clock overlap has been proposed, as shown in Fig. 3. This circuit, called the C²MOS (clocked CMOS) flip-flop, operates in two phases. When clk = 0, the first driver is turned on and the master stage is in evaluation mode, acting as an inverter that samples the inverted version of D onto the internal node X. When clk = 1, the master stage is in hold mode, while the second (slave) stage evaluates: the previously stored value is propagated to the output node through the slave stage, which acts as an inverter [1]. In this work, it has been shown that C²MOS circuits, thanks to their low power consumption and the ability to apply pipelining at a much finer level, can be used to build very high-throughput circuits with low power consumption.
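The two-phase operation of the C²MOS flip-flop can be sketched behaviorally, tracking the internal node X explicitly. This is an idealized model, assuming zero delays and perfectly non-overlapping clock phases, so it does not reproduce the clock-overlap insensitivity of the real circuit; the class name and the initial state of X are hypothetical.

```python
class C2MOSFlipFlop:
    """Idealized behavioral model of the C2MOS master-slave FF.

    Both stages are inverting: the master writes NOT D onto node X while
    clk == 0, and the slave drives Q = NOT X while clk == 1.
    """

    def __init__(self):
        self.x = 1  # internal node X (hypothetical initial state, consistent with Q = 0)
        self.q = 0  # output node

    def step(self, clk, d):
        if clk == 0:
            self.x = 1 - d       # master evaluates (X = NOT D); slave holds Q
        else:
            self.q = 1 - self.x  # master holds X; slave evaluates (Q = NOT X)
        return self.q


ff = C2MOSFlipFlop()
for clk, d in [(0, 1), (1, 0), (0, 0), (1, 1)]:
    print(clk, d, ff.step(clk, d), "X =", ff.x)
```

The two inversions cancel, so Q equals the value of D sampled just before the rising clock edge, while changes of D during clk = 1 leave both X and Q untouched.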

SIMULATION RESULTS:
The comparison

CONCLUSION:
A reconsideration of the classical approach to delay, power, and area minimization, aimed at improving performance in high-speed designs, has been carried out by splitting TGMS FFs into two sections that are separately optimized and then reconciling the results. Transistor-level simulations have been performed on several well-known TGMS FFs designed in 65-nm and 90-nm technologies using the Microwind3.1 CAD tool. Significant improvements in delay, power, and area have been found, showing that this approach yields high-performance designs.