The Decomposition of DSP's Control Logic Block

Borisav Jovanović, Milunka Damnjanović, Dejan Stevanović

Abstract – The paper considers the architecture and low-power design aspects of the digital signal processing block embedded in a three-phase integrated power meter IC. The power reduction techniques used focus on optimizing the control logic block. The operations that the control unit performs are described together with the power-optimization results.

Keywords - digital signal processing, power optimization.

I. INTRODUCTION

Nowadays, most circuits used for measuring power-line parameters embed digital signal processors (DSPs). This paper proposes a DSP circuit that achieves performance comparable to that of commercial DSP microprocessors while occupying less chip area and consuming less power.

The proposed DSP circuit is incorporated into an Integrated Power Meter (IPM) system-on-chip. The DSP receives 16-bit digital samples of the voltage, current and phase-shifted voltage signals from the AD converters [1] and digital filters [2] at a data rate of 4096 samples per second, and calculates the following power-line parameters:

- root mean square values for voltage and current,
- mean values for active power, reactive power, distortion and apparent power,
- active and reactive energy,
- power factor, and
- frequency.

The measurement range for the current signal is from 10 mA RMS to 100 A RMS, and for voltage it is up to 300 V RMS. The results are obtained for all three power-line phases.

The paper explains the operations performed by the DSP, including the novel digital filtering methods used for processing the instantaneous values of the current and voltage samples. In addition, a new circuit for distortion power measurement is presented. Since the control unit is one of the largest and most power-consuming parts of the DSP, the paper presents the power minimization techniques used, which focus mainly on optimizing the control logic block.

Borisav Jovanović and Milunka Damnjanović are with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: {borisav.jovanovic, milunka.damnjanovic} @elfak.ni.ac.rs.

Dejan Stevanović is with The Innovation Center, School of Electrical Engineering, University of Belgrade, d.o.o. (ICEF), Bul. Kralja Aleksandra 73, 11120 Belgrade, Serbia, E-mail: dejan.stevanovic@venus.elfak.ni.ac.rs.

II. DSP’S OPERATION

A. Controller/datapath architecture

The DSP [3, 4] uses a controller/datapath architecture and consists of several blocks:

- Block 1 - arithmetical units used for accumulating \(I^2, V^2, P, Q\) and for energy calculation,
- Block 2 - arithmetical operators used for calculating current and voltage RMS, power factor, and active, reactive, distortion and apparent power,
- Block 3 - the control unit that controls all other parts of the DSP,
- Block 4 - the frequency measurement circuit,
- Block 5 - the RAM memory block storing the measurement results.

The DSP's control unit (Block 3) is implemented as a finite state machine (FSM). During measurement, the control unit periodically executes a main state sequence that lasts 1024 clock periods [5] and is repeated 4096 times per second. The sequence is divided into four sub-sequences called R, S, T and E that last 256 clock periods each. The first three sub-sequences, R, S and T, control the calculations made for each phase of the three-phase energy system. The fourth sub-sequence, denoted E, manages the calculations that are repeated every second [5].
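As a rough illustration of this timing, the following Python sketch derives the clock frequency implied by the two figures above and maps a clock count to its sub-sequence. The names `CLOCK_HZ` and `subsequence_at` are hypothetical, and the clock frequency is an inference from 1024 x 4096 rather than a value stated in the paper.

```python
# Timing of the main state sequence, using only figures quoted in the text.
CLOCKS_PER_SEQUENCE = 1024      # length of the main state sequence
SEQUENCES_PER_SECOND = 4096     # one sequence per input sample
SUBSEQUENCE_LENGTH = 256        # R, S, T and E last 256 clock periods each
SUBSEQUENCES = ("R", "S", "T", "E")

# Implied clock frequency (an inference, not a figure quoted in the paper).
CLOCK_HZ = CLOCKS_PER_SEQUENCE * SEQUENCES_PER_SECOND


def subsequence_at(clock_count: int) -> str:
    """Return the sub-sequence active at a given clock count."""
    position = clock_count % CLOCKS_PER_SEQUENCE
    return SUBSEQUENCES[position // SUBSEQUENCE_LENGTH]
```

Under these assumptions the implied clock is 4 194 304 Hz, and clock counts 0 to 255 fall in R, 256 to 511 in S, 512 to 767 in T and 768 to 1023 in E.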

The control unit is composed of four smaller finite state machines named F0, F1, F2 and F3. The control unit is divided in this way to reduce power consumption significantly, as examined in the following sections. Two sub-FSMs, F1 and F2, perform arithmetical operations within Block 1 during the phases R, S and T, while sub-FSM F3 performs operations within Block 2 during the E period. F0 initializes the RAM memory and is active only at the beginning of chip operation, after the main reset state. The operations that F1 and F2 perform are described in detail below.

B. The operation of F1

The FSM F1 executes the state sequence during the phases R, S and T and consists of one hundred and two states.

At the beginning of the F1 operation sequence, the AC part of the instantaneous current sample \(m_{Iac}\) (stored in the RAM block) is squared in the multiplication unit within Block 1. The squared value \( m_{Iac}^2 \) is then passed through the digital Low Pass Filter (LPF) and afterwards accumulated into the accumulation register \( m_{AccIac}^2 \).

The LPF is implemented as an Infinite Impulse Response (IIR) digital filter and helps reduce the \( I_{RMS} \) calculation error. The error can arise because the one-second interval over which \( m_{Iac}^2 \) is accumulated is not always equal to an integer number of power-line half-periods. The LPF has a cut-off frequency of 10 Hz and its transfer function is given by Eq. 1.

\[
H_{LPF}(z) = \frac{2^{-6}}{1 - z^{-1}(1 - 2^{-6})}
\]

(1)

The filter transfer function can be transformed into the following equations, which are evaluated by the DSP:

\[
m_{FIac}^2 x64_{NEW} = m_{FIac}^2 x64(1 - \frac{1}{2^6}) + m_{Iac}^2
\]

(2)

\[
(m_{Iac}^2)_{DC} = (m_{FIac}^2 x64) / 64
\]

(3)
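Assuming the filter constant is 2^-6, as Eq. (2) implies, the recurrence of Eqs. (2) and (3) can be sketched in Python with shifts in place of the division by 64. The function names are hypothetical, truncating integer arithmetic is an assumption, and the cut-off estimate uses the small-angle approximation for a one-pole filter rather than anything stated in the paper.

```python
import math
from typing import Tuple

SAMPLE_RATE_HZ = 4096   # sample rate quoted in the text
SHIFT = 6               # the filter constant is 2**-6


def lpf_step(state_x64: int, sample_squared: int) -> Tuple[int, int]:
    """One update of Eq. (2) followed by the read-out of Eq. (3).

    state_x64 models the LPF register (filtered value times 64).
    Returns the new register value and the filtered output.
    """
    new_state = state_x64 - (state_x64 >> SHIFT) + sample_squared
    return new_state, new_state >> SHIFT


def approx_cutoff_hz() -> float:
    """Small-angle estimate of the -3 dB frequency of a one-pole filter."""
    return SAMPLE_RATE_HZ * 2.0 ** -SHIFT / (2.0 * math.pi)
```

With a 4096 Hz sample rate the estimate comes out close to the 10 Hz quoted above, and a constant non-negative input settles to the same value at the output, which corresponds to unity DC gain.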

All these operations are done by the arithmetical circuits within Block 1. The structure of Block 1 is given in Fig. 1 and includes one multiplication unit and one circuit for addition and subtraction. Only the inputs (the AC part of the current signal \( m_{Iac} \), the LPF register \( m_{FIac}^2 x64 \) and the accumulation register \( m_{AccIac}^2 \)) are stored in the RAM memory block. Data are transferred between Block 5 (RAM memory) and Block 1 over a 24-bit data bus. Intermediate results are temporarily stored in the registers RegA and RegB of Block 1 (Fig. 1).

The sequence of operations for the accumulation of squared current values is given in Fig. 2. The sequence consists of simple data transfer, shifting, multiplication and addition operations performed on the registers RegA and RegB.

The operations use the contents of the following RAM memory registers:

- \( m_{Iac} \), which contains the AC part of the instantaneous current sample,
- \( m_{FIac}^2 x64\_h \) and \( m_{FIac}^2 x64\_l \), the 24-bit MSB and 24-bit LSB parts of the 48-bit LPF register \( m_{FIac}^2 x64 \), which contains the DC value of \( I_{ac}^2 \) multiplied by the constant 64,
- \( m_{AccIac}^2 \), the 48-bit register for the accumulation of squared current samples.

\[
\begin{array}{l}
m_{Iac} \rightarrow RegA\_h,\ RegB\_h \\
RegA\_h \times RegB\_h \rightarrow RegA \\
m_{FIac}^2 x64 \rightarrow RegB\_l \\
RegA - (RegB >> 6) \rightarrow RegA \\
RegA + RegB \rightarrow RegA \\
RegA\_h \rightarrow m_{FIac}^2 x64\_h,\ RegB\_h \\
RegA\_l \rightarrow m_{FIac}^2 x64\_l,\ RegB\_l \\
m_{AccIac}^2\_h \rightarrow RegA\_h \\
m_{AccIac}^2\_l \rightarrow RegA\_l \\
RegA + (RegB >> 6) \rightarrow RegA \\
RegA\_h \rightarrow m_{AccIac}^2\_h \\
RegA\_l \rightarrow m_{AccIac}^2\_l
\end{array}
\]

Fig.2 The sequence of accumulation of squared current values controlled by F1
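The register transfers of Fig. 2 can be followed step by step in a short Python sketch. It is only a behavioural model under several assumptions: RegA and RegB are treated as unbounded integers, the split into 24-bit halves for the bus transfers is ignored, and the fourth transfer is taken to be the subtraction that Eq. (2) requires. The function name is hypothetical.

```python
def accumulate_squared_current(i_ac: int, filt_x64: int, acc: int):
    """Follow the Fig. 2 register transfers with plain Python integers.

    i_ac     -- AC part of the current sample (m_Iac)
    filt_x64 -- 48-bit LPF register (filtered square times 64)
    acc      -- 48-bit accumulation register
    Returns the updated (filt_x64, acc) pair.
    """
    reg_a = i_ac * i_ac              # RegA_h x RegB_h -> RegA
    reg_b = filt_x64                 # LPF register -> RegB
    reg_a = reg_a - (reg_b >> 6)     # RegA - (RegB >> 6) -> RegA (assumed)
    reg_a = reg_a + reg_b            # RegA + RegB -> RegA
    filt_x64 = reg_b = reg_a         # RegA -> LPF register and RegB
    reg_a = acc                      # accumulation register -> RegA
    reg_a = reg_a + (reg_b >> 6)     # RegA + (RegB >> 6) -> RegA
    return filt_x64, reg_a           # RegA -> accumulation register
```

For example, with a sample of 10 and a filter register already settled at 6400 (that is, 100 x 64), the filter register stays at 6400 and the accumulator grows by 100.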

A similar procedure is performed by Block 1 for processing \( V_{ac}^2 \) (necessary for obtaining \( V_{RMS} \)) and the instantaneous values of active and reactive power. The results are stored in the RAM registers \( m_{AccVac}^2 \), \( m_{AccP} \) and \( m_{AccQ} \). The difference lies in the multiplication operands: the voltage sample is multiplied by itself to obtain \( V_{RMS} \), the voltage sample by the current sample for active power, and the current sample by the phase-shifted voltage sample for reactive power.

C. The operation of F2

F2 is active during the phases R, S and T. It controls the generation of energy pulses for the measured active and reactive energy and consists of one hundred and ninety-three states. A pulse is generated when the measured energy exceeds a predetermined energy level. The default energy level is one Whr (Watt-hour) for active and VAR (Volt-Ampere reactive) for reactive energy.

The DSP has four outputs that produce narrow pulses: \( Ea\_pos \) for consumed active, \( Ea\_neg \) for generated active, \( Eq\_pos \) for inductive reactive, and \( Eq\_neg \) for capacitive reactive energy. The energy level is stored in the \( m_{Whr} \) register, which is part of the RAM memory block, and can be modified. The operations are carried out by Block 1 using the adder/subtractor and the registers RegA and RegB.

The sequence of operations is given in Fig. 3. At the beginning of each sequence, which is performed exactly 4096 times per second, the active power value $m_P$ is added to the 48-bit register $m_{AccEa}$. $m_{AccEa}$ consists of two parts, the MSB part $m_{AccEa\_h}$ and the LSB part $m_{AccEa\_l}$, both stored in RAM. After the addition, $m_P$ and the new value of $m_{AccEa}$ are compared with zero. If $m_P$ is positive and the new value of $m_{AccEa}$ is greater than the energy level equivalent (given by $m_{Whr}$), a pulse is generated on $Ea\_pos$ and $m_{Whr}$ is subtracted from $m_{AccEa}$. Otherwise, if both $m_P$ and $m_{AccEa}$ are negative, a pulse is generated on $Ea\_neg$ and $m_{Whr}$ is added to $m_{AccEa}$.

A similar procedure applies to reactive energy processing. The corresponding registers are $m_{AccEq\_h}$ and $m_{AccEq\_l}$.

Fig.3 The sequence of operations producing the energy pulses on $Ea\_neg$ and $Ea\_pos$ pins
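The pulse-generation rule for active energy described above can be sketched as a single Python function. This is a behavioural illustration only: the function name and the string labels are hypothetical, register widths are ignored, and a zero value of the power or the accumulator is assumed to produce no pulse.

```python
def energy_step(acc_ea: int, m_p: int, m_whr: int):
    """One F2 update of the active-energy accumulator.

    Returns (new_acc_ea, pulse), where pulse is "Ea_pos", "Ea_neg" or None.
    """
    acc_ea += m_p                              # add active power to the accumulator
    if m_p > 0 and acc_ea > m_whr:
        return acc_ea - m_whr, "Ea_pos"        # consumed energy: subtract the level
    if m_p < 0 and acc_ea < 0:
        return acc_ea + m_whr, "Ea_neg"        # generated energy: add the level
    return acc_ea, None                        # no pulse in this sequence
```

Calling the function once per main sequence with a constant positive power yields evenly spaced pulses on the positive output, while the remainder below the energy level stays in the accumulator for the next pulse.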

Besides handling energy pulses, F2 eliminates DC offsets from the instantaneous current and voltage signals delivered by the digital filters. This is necessary for the calculation of the current and voltage RMS values: a DC offset produces a DC component after the squaring operation, and since this component passes through the LPF, the offset introduces an error into the RMS values. The problem is avoided by introducing a high-pass filter (HPF) into the voltage and current signal processing chains. The HPF, applied to the instantaneous current and voltage signals, is implemented as an Infinite Impulse Response (IIR) digital filter with a cut-off frequency of 5 Hz and the transfer function given by Eq. 4:

$$H_{HPF}(z) = \frac{(1-2^{-10})(1-z^{-1})}{1-z^{-1}(1-2^{-9})}$$  \hspace{1cm} (4)

The HPF transfer function can be transformed into equations (5) and (6), which are evaluated by the DSP.

$$m_{\_FIx1024\_NEW} = m_{\_FIx1024}(1 - \frac{1}{2^9}) + (2^{10} - 1)(m_{\_I} - m_{\_I\_p})$$  \hspace{1cm} (5)

$$m_{\_I_{AC}} = m_{\_FIx1024}/1024$$  \hspace{1cm} (6)

The following register values are used in equations (5) and (6):

- $m_{\_I}$ and $m_{\_I\_p}$ are two consecutive current samples,
- $m_{\_FIx1024}$ is the 48-bit HPF register, which contains the AC value of I multiplied by the constant 1024. The register consists of two parts: the MSB part $m_{\_FIx1024\_h}$ and the LSB part $m_{\_FIx1024\_l}$,
- $m_{\_I_{AC}}$ is the AC part of the instantaneous current sample. It is the result of the filtering operation and is used further by FSM F1.

$$m_{\_I\_p} \rightarrow RegA\_l$$
$$m_{\_I} \rightarrow RegB\_l$$
$$RegA\_l \rightarrow RegA\_h$$
$$RegA\_h \rightarrow RegA\_l$$

The operation sequence for offset elimination, performed by F2, is given in Fig. 4. The operations are carried out by Block 1.

A similar procedure is used for processing \( m_{\text{Vac}} \) (necessary for obtaining \( V_{\text{RMS}} \)). The intermediate results are stored in the 24-bit RAM registers \( m_{\text{FVx1024\_h}} \) and \( m_{\text{FVx1024\_l}} \).
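The offset-removal recurrence of Eqs. (5) and (6) can be sketched in Python as follows. The function names are hypothetical, and several details are assumptions rather than facts from the paper: the register and the previous sample start at zero, the read-out uses the freshly updated register, and the divisions are modelled as arithmetic right shifts.

```python
def hpf_step(state_x1024: int, sample: int, prev_sample: int):
    """One update of Eq. (5) followed by the read-out of Eq. (6).

    state_x1024 models the HPF register (AC value times 1024).
    Returns the new register value and the AC part of the sample.
    """
    new_state = (state_x1024 - (state_x1024 >> 9)
                 + (2 ** 10 - 1) * (sample - prev_sample))
    return new_state, new_state >> 10


def remove_offset(samples):
    """Run the filter over a sequence of samples and return the AC parts."""
    state, prev, out = 0, 0, []
    for sample in samples:
        state, ac = hpf_step(state, sample, prev)
        prev = sample
        out.append(ac)
    return out
```

Feeding a constant input through `remove_offset` shows the intended behaviour: the first output is close to the input step, and the output then decays to zero as the offset is removed.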

D. The operation of F3

The fourth sub-sequence of the control unit, E, manages the calculations that are repeated every second. It is controlled by F3, which consists of three hundred and four states.

Based on the accumulated sums \( m_{\text{AccIac}^2} \), \( m_{\text{AccVac}^2} \), \( m_{\text{AccP}} \) and \( m_{\text{AccQ}} \), arithmetical operations are performed by Block 2 to generate the current and voltage root mean square values \( m_{\text{IRMS}} \) and \( m_{\text{VRMS}} \) and the mean active and reactive power values \( m_{\text{P}} \) and \( m_{\text{Q}} \). The sequence of operations is controlled by FSM F3.

The internal structure of Block 2 is given in Fig. 5. It consists of two registers named RegC and RegD and arithmetical units that implement square root, subtraction, multiplication and division.

Fig.5 The structure of Block 2

The sequence controlled by F3 that generates the current root mean square value \( m_{\text{IRMS}} \) is given in Fig. 6.

To generate \( m_{\text{IRMS}} \), the accumulated sum \( m_{\text{AccIac}^2} \) is loaded into RegC and divided by 4096. Next, the square root of the resulting mean value of the squared current is computed. Then the current offset \( m_{\text{IACoff}} \) is subtracted and the result is multiplied by the gain correction \( m_{\text{Igain}} \), which gives the root mean square value of the current (Fig. 6).

Similar processing steps are carried out for \( m_{\text{VRMS}} \). For the mean active and reactive power, the square root calculation is omitted. The apparent power \( m_{\text{S}} \) is obtained by multiplying \( m_{\text{IRMS}} \) and \( m_{\text{VRMS}} \), and the power factor \( m_{\text{CosF}} \) by dividing the active power \( m_{\text{P}} \) by the apparent power \( m_{\text{S}} \).
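The once-per-second calculations described above can be sketched in Python. The function names are hypothetical, and the number formats are assumptions: the accumulator is treated as a non-negative integer, the square root as an integer square root, and the gain correction as a floating-point factor, none of which is specified in the text.

```python
import math

SAMPLES_PER_SECOND = 4096   # number of accumulated samples per second


def rms_from_accumulator(acc_squares: int, offset: int, gain: float) -> float:
    """Mean of the accumulated squares, square root, then offset and gain."""
    mean_square = acc_squares // SAMPLES_PER_SECOND   # divide by 4096
    root = math.isqrt(mean_square)                    # integer square root
    return (root - offset) * gain                     # offset, then gain correction


def apparent_power_and_power_factor(i_rms: float, v_rms: float, p: float):
    """Return S = I_RMS * V_RMS and the power factor P / S (S must be non-zero)."""
    s = i_rms * v_rms
    return s, p / s
```

For instance, 4096 accumulated squares of a sample value of 100 give an RMS value of 100 before offset and gain correction, and an active power of 920 with an apparent power of 1150 gives a power factor of 0.8.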

In addition to the mean active (\( m_{\text{P}} \)), reactive (\( m_{\text{Q}} \)) and apparent power (\( m_{\text{S}} \)), the distortion power [6] is calculated and stored in the register \( m_{\text{D}} \). F3 controls the operations that produce \( m_{\text{D}} \). The arithmetical operators used to calculate \( m_{\text{D}} \) belong to Blocks 1 and 2, so the structure of Block 1 had to be modified slightly: a new input was added to RegB that connects it to the multiplication unit of Block 2, because the result of the multiplication performed in Block 2 has to be transferred to RegB in Block 1. The sequence is given in Fig. 7.

At the beginning, the register RegA is reset to zero, and the content of register \( m_{\text{S}} \) is copied to both registers RegC and RegD. The squaring operation is performed and the result is moved to RegA. Then the active power \( m_{\text{P}} \) is moved to RegC and RegD, and the multiplication is performed. The result is subtracted from RegA. The same operations are done with the value \( m_{\text{Q}} \). Afterwards, the content of RegA is moved to RegC and the square root operation is performed. Finally, the result is moved from RegD into \( m_{\text{D}} \), which is stored in the RAM memory.
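Read as arithmetic, the sequence above amounts to D = sqrt(S^2 - P^2 - Q^2). A minimal Python sketch of the same steps follows. The function name is hypothetical, the inputs are treated as integers, and clamping a negative difference to zero before the square root is an assumption added here, not something stated in the paper.

```python
import math


def distortion_power(s: int, p: int, q: int) -> int:
    """Follow the F3 sequence: RegA accumulates S^2 - P^2 - Q^2."""
    reg_a = 0                    # RegA is reset to zero
    reg_a += s * s               # S -> RegC, RegD; square; result to RegA
    reg_a -= p * p               # P -> RegC, RegD; multiply; subtract from RegA
    reg_a -= q * q               # the same operations with Q
    # RegA -> RegC; square root; result -> m_D (clamp is an assumption)
    return math.isqrt(max(reg_a, 0))
```

With S = 13, P = 12 and Q = 4 the sketch returns 3, and when S^2 equals P^2 + Q^2 the distortion power is zero.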

III. THE IMPLEMENTATION RESULTS

Most of the optimization effort was directed at the DSP's control unit. The control unit comprises over six hundred states, and this large number of states requires extensive combinational logic in the synthesized FSM. The implementation occupies a large portion of the DSP's area and is one of the largest power consumers among the DSP's blocks.

The following power minimization techniques were used: FSM decomposition [7, 8], clock gating and Gray code encoding [9]. The first technique divides the large control unit into several smaller state machines, which simplifies their combinational logic blocks and thereby reduces power dissipation. Clock gating disables the inactive parts of the FSM by stopping their clock signal, which reduces the switching activity within the combinational logic blocks. In addition, Gray codes are assigned to the FSM's states.
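To illustrate why Gray encoding reduces switching activity, the following Python sketch computes the binary-reflected Gray code of a state index and counts how many state-register bits toggle between two codes. The function names are hypothetical, and the example assumes states that are visited in index order, which is a simplification of the actual transition graphs.

```python
def gray_encode(index: int) -> int:
    """Binary-reflected Gray code of a state index."""
    return index ^ (index >> 1)


def bit_flips(a: int, b: int) -> int:
    """Number of state-register bits that toggle between two codes."""
    return bin(a ^ b).count("1")
```

With plain binary encoding, the step from state 3 to state 4 toggles three bits, whereas the Gray codes of any two consecutive indices differ in exactly one bit.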

The transition graph of the original FSM was considered first and then divided into four sub-graphs (F0, F1, F2 and F3) that jointly produce behaviour equivalent to that of the original FSM. The decomposition follows the datapath architecture: the states within one subset control the arithmetical operations performed by the same part of the DSP. As stated earlier, F1 and F2 perform the operations within Block 1, while F3 operates mainly within Block 2.

After the FSM decomposition, clock gating was introduced into the FSM's implementation. A new circuit was added to the control logic block that identifies the currently active sub-FSM and provides the clock input signals to the sub-FSMs. The clock signal is present only at the input of the active sub-FSM, and the other three sub-FSMs are blocked.
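The behaviour of the clock-distribution circuit can be modelled in a few lines of Python. This is a purely behavioural sketch with hypothetical names; it says nothing about how the gating is realized in the actual netlist, where a glitch-free gating cell would typically be used.

```python
SUB_FSMS = ("F0", "F1", "F2", "F3")


def gated_clocks(clk: int, active: str) -> dict:
    """Return the clock level seen by each sub-FSM for one clock level.

    Only the active sub-FSM receives the clock; the others are held low.
    """
    if active not in SUB_FSMS:
        raise ValueError(f"unknown sub-FSM: {active}")
    return {name: clk if name == active else 0 for name in SUB_FSMS}
```

At any moment at most one of the four gated clocks follows the source clock, so the registers of the three inactive sub-FSMs see no clock edges.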

After the design was verified by RTL simulation, the RTL descriptions were loaded into the logic synthesis program, Cadence's RTL Compiler, which generated a netlist of digital library cells. The netlist was loaded back into the Verilog simulator, and the simulation was performed using Cadence's NCsim tool.

SoC Encounter performed floorplanning, placement and routing, as well as clock and reset tree generation for the complete circuit (Fig. 8). At the end of the logical verification process, a Verilog file was extracted from the layout and brought back to the NCsim simulator, where the final check of the entire digital part of the IC was performed.

During the post-layout simulation, a switching activity file was obtained. SoC Encounter then produced the power consumption results, taking into account the parasitic capacitances from the layout and the switching activity file.

The estimation of the DSP's power consumption gave valuable information about the energy budget and identified all power-hungry components. Three power analyses were performed: (a) for the original design (before the power minimization techniques were applied), (b) for the DSP design optimized by clock gating and FSM decomposition, and (c) for the design in which all proposed techniques were applied: FSM decomposition, clock gating and Gray state encoding. Table 1 gives the simulated power consumption values of the different DSP cores, derived after layout generation. The power consumption of the non-optimized design was 1.82 mW. When all the techniques were applied, the total power fell to 1.043 mW, a power reduction of about 42% compared to the non-optimized implementation.
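The quoted saving can be checked with a one-line calculation. The helper name below is hypothetical; the two power figures are the ones given in the text.

```python
def relative_reduction(before_mw: float, after_mw: float) -> float:
    """Relative power saving as a percentage of the original value."""
    return 100.0 * (before_mw - after_mw) / before_mw


saving = relative_reduction(1.82, 1.043)   # figures quoted in the text
```

The result lies between 42% and 43%, in line with the reduction reported above.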

IV. CONCLUSION

The architecture and the low-power design aspects of the digital signal processing block embedded in a three-phase integrated power meter IC have been considered. The operations that the control unit performs are described together with the power-optimization results.

The power reduction techniques were successfully applied to the control logic block. The control unit of the DSP block, implemented as a finite state machine, was decomposed into four smaller state machines, clock gating was introduced and Gray state encoding was used. The result was a significant reduction in power consumption.

ACKNOWLEDGEMENT

This research was partially funded by the Ministry of Education and Science of the Republic of Serbia under contract No. TR32004.

REFERENCES


