Control-Flow Graph

Intermediate Representations

Keith D. Cooper, Linda Torczon, in Engineering a Compiler (Second Edition), 2012

5.2.2 Graphs

While trees provide a natural representation for the grammatical structure of the source code discovered by parsing, their rigid structure makes them less useful for representing other properties of programs. To model these aspects of program behavior, compilers often use more general graphs as irs. The dag introduced in the previous section is one example of a graph.

Control-Flow Graph

The simplest unit of control flow in a program is a basic block—a maximal-length sequence of straight-line, or branch-free, code. A basic block is a sequence of operations that always execute together, unless an operation raises an exception. Control always enters a basic block at its first operation and exits at its last operation.

Basic Block

a maximal-length sequence of branch-free code

It begins with a labelled operation and ends with a branch, jump, or predicated operation.

A control-flow graph (cfg) models the flow of control between the basic blocks in a program. A cfg is a directed graph, G = (N, E). Each node n ∈ N corresponds to a basic block. Each edge e = (ni, nj) ∈ E corresponds to a possible transfer of control from block ni to block nj.

Control-Flow Graph

A cfg has a node for every basic block and an edge for each possible control transfer between blocks.

We use the acronym cfg for both context-free grammar (see page 86) and control-flow graph. The meaning should be clear from context.

To simplify the discussion of program analysis in Chapters 8 and 9, we assume that each cfg has a unique entry node, n0, and a unique exit node, nf. In the cfg for a procedure, n0 corresponds to the procedure's entry point. If a procedure has multiple entries, the compiler can insert a unique n0 and add edges from n0 to each actual entry point. Similarly, nf corresponds to the procedure's exit. Multiple exits are more common than multiple entries, but the compiler can easily add a unique nf and connect each of the actual exits to it.
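A minimal sketch of that normalization, under an assumed set-based representation of the graph (the node names n0 and nf are illustrative):

# A minimal sketch: give a cfg a unique entry n0 and a unique exit nf
# by adding edges from n0 to each actual entry point and from each
# actual exit to nf. The set-based representation is an assumption.

def add_unique_entry_exit(nodes, edges, entries, exits):
    n0, nf = "n0", "nf"                      # hypothetical node names
    nodes = nodes | {n0, nf}
    edges = edges | {(n0, e) for e in entries} | {(x, nf) for x in exits}
    return nodes, edges, n0, nf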

The cfg provides a graphical representation of the possible runtime control-flow paths. The cfg differs from the syntax-oriented irs, such as an ast, in which the edges show grammatical structure. Consider the following cfg for a while loop:

The edge from stmt1 back to the loop header creates a cycle; the ast for this fragment would be acyclic. For an if-then-else construct, the cfg is acyclic:

It shows that control always flows from stmt1 and stmt2 to stmt3. In an ast, that connection is implicit, rather than explicit.

Compilers typically use a cfg in conjunction with another ir. The cfg represents the relationships among blocks, while the operations within a block are represented with another ir, such as an expression-level ast, a dag, or one of the linear irs. The resulting combination is a hybrid ir.

Some authors recommend building cfgs in which each node represents a shorter segment of code than a basic block. The most common alternative block is a single-statement block. Using single-statement blocks can simplify algorithms for analysis and optimization.

Single-Statement Blocks

a block of code that corresponds to a single source-level statement

The tradeoff between a cfg built with single-statement blocks and one built with basic blocks revolves around time and space. A cfg built on single-statement blocks has more nodes and edges than a cfg built with basic blocks. The single-statement version uses more memory and takes longer to traverse than the basic-block version of a cfg. More important, as the compiler annotates the nodes and edges in the cfg, the single-statement cfg has many more sets than the basic-block cfg. The time and space spent in constructing and using these annotations undoubtedly dwarfs the cost of cfg construction.

Many parts of the compiler rely on a cfg, either explicitly or implicitly. Analysis to support optimization generally begins with control-flow analysis and cfg construction (Chapter 9). Instruction scheduling needs a cfg to understand how the scheduled code for individual blocks flows together (Chapter 12). Global register allocation relies on a cfg to understand how often each operation might execute and where to insert loads and stores for spilled values (Chapter 13).
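To make the construction concrete, here is a minimal sketch that partitions a toy linear ir into basic blocks and adds the control-flow edges. The instruction forms ("label", "op", "jump", "branch") are assumptions for illustration, not the book's iloc:

# A minimal sketch of cfg construction from a toy linear ir, where each
# instruction is ("label", name), ("jump", target), ("branch", t1, t2),
# or ("op",). Blocks are keyed by the index of their first instruction.

def build_cfg(code):
    # Pass 1: find leaders -- the first operation, every labelled
    # operation, and every operation that follows a branch or jump.
    leaders = {0}
    for i, inst in enumerate(code):
        if inst[0] in ("jump", "branch") and i + 1 < len(code):
            leaders.add(i + 1)
        if inst[0] == "label":
            leaders.add(i)

    # Pass 2: carve the code into maximal-length blocks between leaders.
    starts = sorted(leaders)
    blocks = {s: code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])}
    label_to_block = {b[0][1]: s for s, b in blocks.items() if b[0][0] == "label"}

    # Pass 3: add an edge for each possible transfer of control.
    edges = []
    for s, b in blocks.items():
        last = b[-1]
        if last[0] == "jump":
            edges.append((s, label_to_block[last[1]]))
        elif last[0] == "branch":
            edges.extend((s, label_to_block[t]) for t in last[1:])
        elif s != starts[-1]:                   # fall through to next block
            edges.append((s, s + len(b)))
    return blocks, edges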

Dependence Graph

Compilers also use graphs to encode the flow of values from the point where a value is created, a definition, to any point where it is used, a use. A data-dependence graph embodies this relationship. Nodes in a data-dependence graph represent operations. Most operations contain both definitions and uses. An edge in a data-dependence graph connects two nodes, one that defines a value and another that uses it. We draw dependence graphs with edges that run from definition to use.

Data-Dependence Graph

a graph that models the flow of values from definitions to uses in a code fragment

To make this concrete, Figure 5.3 reproduces the example from Figure 1.3 and shows its data-dependence graph. The graph has a node for each statement in the block. Each edge shows the flow of a single value. For example, the edge from 3 to 7 reflects the definition of rb in statement 3 and its subsequent use in statement 7. rarp contains the starting address of the local data area. Uses of rarp refer to its implicit definition at the start of the procedure; they are shown with dashed lines.

Figure 5.3. An iloc Basic Block and Its Dependence Graph.

The edges in the graph represent real constraints on the sequencing of operations—a value cannot be used until it has been defined. However, the dependence graph does not fully capture the program's control flow. For example, the graph requires that 1 and 2 precede 6. Nothing, however, requires that 1 or 2 precedes 3. Many execution sequences preserve the dependences shown in the code, including 〈1, 2, 3, 4, 5, 6, 7, 8, 9, 10〉 and 〈2, 1, 6, 3, 7, 4, 8, 5, 9, 10〉. The freedom in this partial order is precisely what an "out-of-order" processor exploits.
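A minimal sketch of building such a graph for a single block and extracting one legal order follows. It tracks only definition-to-use (flow) dependences, and the (dest, srcs) operation format is an assumed representation:

# A minimal sketch of a data-dependence graph for one basic block,
# assuming each operation is (dest, (src1, src2, ...)). Edges run from
# the operation that defines a value to each operation that uses it.

from graphlib import TopologicalSorter

def dependence_edges(block):
    last_def = {}                       # name -> index of defining operation
    edges = set()
    for i, (dest, srcs) in enumerate(block):
        for s in srcs:
            if s in last_def:           # flow of a single value: def -> use
                edges.add((last_def[s], i))
        last_def[dest] = i
    return edges

def one_legal_schedule(block):
    deps = {i: set() for i in range(len(block))}
    for d, u in dependence_edges(block):
        deps[u].add(d)                  # operation u depends on operation d
    return list(TopologicalSorter(deps).static_order())

Any order returned by the topological sort respects the partial order that the edges impose, which is exactly the freedom the text describes.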

At a higher level, consider the code fragment shown in Figure 5.4. References to a[i] are shown deriving their values from a node representing prior definitions of a. This connects all uses of a together through a single node. Without sophisticated analysis of the subscript expressions, the compiler cannot differentiate between references to individual array elements.

Figure 5.4. Interaction between Control Flow and the Dependence Graph.

This dependence graph is more complex than the previous example. Nodes 5 and 6 both depend on themselves; they use values that they may have defined in a previous iteration. Node 6, for instance, can take the value of i from either 2 (in the initial iteration) or from itself (in any subsequent iteration). Nodes 4 and 5 also have two distinct sources for the value of i: nodes 2 and 6.

Data-dependence graphs are often used as a derivative ir—constructed from the definitive ir for a specific task, used, and then discarded. They play a central role in instruction scheduling (Chapter 12). They find application in a variety of optimizations, particularly transformations that reorder loops to expose parallelism and to improve memory behavior; these typically require sophisticated analysis of array subscripts to determine more precisely the patterns of access to arrays. In more sophisticated applications of the data-dependence graph, the compiler may perform extensive analysis of array subscript values to determine when references to the same array can overlap.

Call Graph

To address inefficiencies that arise across procedure boundaries, some compilers perform interprocedural analysis and optimization. To represent the runtime transfers of control between procedures, compilers use a call graph. A call graph has a node for each procedure and an edge for each distinct procedure call site. Thus, if the code calls q from three textually distinct sites in p, the call graph has three edges (p, q), one for each call site.

Interprocedural

Any technique that examines interactions across multiple procedures is called interprocedural.

Intraprocedural

Any technique that limits its attention to a single procedure is called intraprocedural.

Call Graph

a graph that represents the calling relationships among the procedures in a program

The call graph has a node for each procedure and an edge for each call site.

Both software-engineering practice and language features complicate the construction of a call graph.

Separate compilation, the practice of compiling small subsets of a program independently, limits the compiler's ability to build a call graph and to perform interprocedural analysis and optimization. Some compilers build partial call graphs for all of the procedures in a compilation unit and perform analysis and optimization across that set. To analyze and optimize the whole program in such a system, the programmer must present it all to the compiler at once.

Procedure-valued parameters, both as input parameters and as return values, complicate call-graph construction by introducing ambiguous call sites. If fee takes a procedure-valued argument and invokes it, that site has the potential to call a different procedure on each invocation of fee. The compiler must perform an interprocedural analysis to limit the set of edges that such a call induces in the call graph.

Object-oriented programs with inheritance routinely create ambiguous procedure calls that can only be resolved with additional type information. In some languages, interprocedural analysis of the class hierarchy can provide the information needed to disambiguate these calls. In other languages, that information cannot be known until runtime. Runtime resolution of ambiguous calls poses a serious problem for call graph construction; it also creates significant runtime overheads on the execution of the ambiguous calls.

Section 9.4 discusses practical techniques for call graph construction.
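As a small illustration of the one-edge-per-call-site convention, the sketch below builds a call multigraph from an assumed list of (caller, callee) pairs, one per textual call site:

# A minimal sketch of a call graph as a multigraph: one node per
# procedure and one edge per distinct call site, so three calls from
# p to q yield three (p, q) edges. call_sites is an assumed input.

from collections import defaultdict

def build_call_graph(procedures, call_sites):
    edges = defaultdict(list)            # caller -> one entry per call site
    for caller, callee in call_sites:
        edges[caller].append(callee)
    return {p: edges[p] for p in procedures}

g = build_call_graph(["p", "q"], [("p", "q"), ("p", "q"), ("p", "q")])
assert g["p"] == ["q", "q", "q"]          # three distinct edges (p, q)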

Section Review

Graphical irs present an abstract view of the code being compiled. They differ in the meaning imputed to each node and each edge.

In a parse tree, nodes represent syntactic elements in the source-language grammar, while the edges tie those elements together into a derivation.

In an abstract syntax tree or a dag, nodes represent concrete items from the source-language program, and edges tie those together in a way that indicates control-flow relationships and the flow of data.

In a control-flow graph, nodes represent blocks of code and edges represent transfers of control between blocks. The definition of a block may vary, from a single statement through a basic block.

In a dependence graph, the nodes represent computations and the edges represent the flow of values from definitions to uses; as such, edges also imply a partial order on the computations.

In a call graph, the nodes represent individual procedures and the edges represent individual call sites. Each call site has a distinct edge to provide a representation for call-site-specific knowledge, such as parameter bindings.

Graphical irs encode relationships that may be difficult to represent in a linear ir. A graphical ir can provide the compiler with an efficient way to move between logically connected points in the program, such as the definition of a variable and its use, or the source of a conditional branch and its target.

Review Questions

1.

Compare and contrast the difficulty of writing a prettyprinter for a parse tree, an ast, and a dag. What additional information would be needed to reproduce the original code's format precisely?

2.

How does the number of edges in a dependence graph grow as a function of the input program's size?

Prettyprinter

a program that walks a syntax tree and writes out the original code

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780120884780000050

Advances in Computers

Neil Walkinshaw , in Advances in Computers, 2013

3.2.1.1 Data Flow

The CFG in itself only conveys information about the possible order(s) in which statements can be executed. To add information about the possible flow of variable values from one statement to another, we start by defining two types of relationship. For a given node s in a CFG, the function def(s) identifies the set of variables that are assigned a value (i.e., defined) at s. The function use(s) identifies the set of variables that are used at s.

The above def and use relations can, along with the CFG, be used to compute the reaching definitions [2]. For a given definition of a variable in the code, the reaching definitions analysis computes the subsequent points in the code where that specific definition is used. To make it slightly more formal, each node s in the CFG is annotated with a set of variable-node pairs (v, n) for each v ∈ use(s). These are computed such that the value of v is defined at n, and there is a definition-free path with respect to v from n to s. These reaching definitions are visualized as dashed edges in the extended CFG, shown in Fig. 8.

Fig. 8. CFG with reaching definitions shown as dashed lines.

Intuitively, this graph captures the basic elements of program behavior. The complex interrelationships between data and control are summarized in a single graph. The intuitive value is clear; it is straightforward to visually trace how different variables affect each other, and how different sets of variables feature in particular paths through the code. In practice such graphs are rarely used as visual aids, but form the basis for more advanced analyses, as will be shown below.
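A minimal sketch of the reaching-definitions computation on single-statement nodes follows, assuming preds, defs, and uses maps as inputs (these names stand in for the CFG predecessors and the def(s)/use(s) functions above):

# A minimal sketch of iterative reaching definitions on a CFG whose
# nodes are single statements. out[s] holds (variable, node) pairs:
# the definitions of each variable that can reach the exit of s.

def reaching_definitions(nodes, preds, defs, uses):
    out = {s: set() for s in nodes}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for s in nodes:
            reach_in = set().union(*[out[p] for p in preds[s]])
            # s kills earlier definitions of the variables it defines ...
            new_out = {(v, n) for (v, n) in reach_in if v not in defs[s]}
            # ... and generates its own definitions.
            new_out |= {(v, s) for v in defs[s]}
            if new_out != out[s]:
                out[s], changed = new_out, True
    # annotate each node with the (v, n) pairs for the variables it uses
    return {s: {(v, n)
                for (v, n) in set().union(*[out[p] for p in preds[s]])
                if v in uses[s]}
            for s in nodes}

The returned annotations correspond to the dashed def-use edges drawn in Fig. 8.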

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B978012408089800001X

Data-Flow Analysis

Keith D. Cooper , Linda Torczon , in Engineering a Compiler (Second Edition), 2012

Computing Dominance Frontiers

To make ϕ-insertion efficient, we need to compute the dominance frontier for each node in the flow graph. We could formulate a data-flow problem to compute df(n) for each n in the graph. Using both the dominator tree and the cfg, we can formulate a simple and direct algorithm, shown in Figure 9.8. Since only nodes that are join points in the cfg can be members of a dominance frontier, we first identify all of the join points in the graph. For a join point j, we examine each of its cfg predecessors.

Figure 9.8. Algorithm for Computing Dominance Frontiers.

The algorithm is based on three observations. First, nodes in a df set must be join points in the graph. Second, for a join point j, each predecessor k of j must have j ∈ df(k), since k cannot dominate j if j has more than one predecessor. Finally, if j ∈ df(k) for some predecessor k, then j must also be in df(l) for each l ∈ Dom(k), unless l ∈ Dom(j).

The algorithm follows these observations. It locates nodes j that are join points in the cfg. Then, for each predecessor p of j, it walks up the dominator tree from p until it finds a node that dominates j. From the second and third observations in the preceding paragraph, j belongs in df(l) for each node l that the algorithm traverses in this dominator-tree walk, except for the last node in the walk, since that node dominates j. A small amount of bookkeeping is needed to ensure that any n is added to a node's dominance frontier only once.
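A minimal sketch of this algorithm, assuming preds and idom maps (the cfg predecessors and immediate dominators) are given as inputs:

# A minimal sketch of the dominance-frontier computation described
# above: for each join point j, walk up the dominator tree from each
# predecessor, adding j to df sets until reaching a dominator of j.

def dominance_frontiers(nodes, preds, idom):
    df = {n: set() for n in nodes}
    for j in nodes:
        if len(preds[j]) > 1:              # only join points matter
            for p in preds[j]:
                runner = p                 # walk up the dominator tree from p
                while runner != idom[j]:   # stop at the node that dominates j
                    df[runner].add(j)      # a set, so each j is added once
                    runner = idom[runner]
    return df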

To see how this works, consider again the example cfg and its dominator tree. The analyzer examines the nodes in some order, looking for nodes with multiple predecessors. Assuming that it takes the nodes in name order, it finds the join points as B1, then B3, then B7.

1.

B1 For cfg-predecessor B0, the algorithm finds that B0 is IDom(B1), so it never enters the while loop. For cfg-predecessor B3, it adds B1 to df(B3) and advances to B1. It adds B1 to df(B1) and advances to B0, where it halts.

2.

B3 For cfg-predecessor B2, it adds B3 to df(B2), advances to B1 which is IDom(B3), and halts. For cfg-predecessor B7, it adds B3 to df(B7) and advances to B5. It adds B3 to df(B5) and advances to B1, where it halts.

3.

B7 For cfg-predecessor B6, it adds B7 to df(B6), advances to B5 which is IDom(B7), and halts. For cfg-predecessor B8, it adds B7 to df(B8) and advances to B5, where it halts.

Accumulating these results, we obtain the following dominance frontiers:

      B0   B1     B2     B3     B4   B5     B6     B7     B8
df    ∅    {B1}   {B3}   {B1}   ∅    {B3}   {B7}   {B3}   {B7}

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780120884780000098

Introduction to Optimization

Keith D. Cooper , Linda Torczon , in Engineering a Compiler (Second Edition), 2012

Solving the Data-Flow Problem

To compute the LiveOut sets for a procedure and its cfg, the compiler can use a three-step algorithm.

1.

Build a cfg This step is conceptually simple, although language and architecture features can complicate the problem (see Section 5.3.4).

2.

Gather initial information The analyzer computes a UEVar and VarKill set for each block b in a simple walk, as shown in Figure 8.14a.

3.

Solve the equations to produce LiveOut(b) for each block b Figure 8.14b shows a simple iterative fixed-point algorithm that will solve the equations.

Figure 8.14. Iterative Live Analysis.

The following sections work through an example computation of LiveOut. Section 9.2 delves into data-flow computations in more depth.

Gathering Initial Information

To compute LiveOut, the analyzer needs UEVar and VarKill sets for each block. A single pass can compute both. For each block, the analyzer initializes these sets to ∅. Next, it walks the block, in order from top to bottom, and updates both UEVar and VarKill to reflect the impact of each operation. Figure 8.14a shows the details of this computation.
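A minimal sketch of that gathering pass, assuming each operation in a block is represented as a (defined-names, used-names) pair:

# A minimal sketch of the initial-information pass. A name is upward
# exposed (UEVar) if it is used before any redefinition in the block;
# VarKill collects every name the block defines.

def gather_initial(block):
    uevar, varkill = set(), set()
    for defined, used in block:            # walk the block top to bottom
        uevar |= {v for v in used if v not in varkill}
        varkill |= set(defined)
    return uevar, varkill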

Consider the cfg with a simple loop that contains an if-then construct, shown in Figure 8.15a. The code abstracts away many details. Figure 8.15b shows the corresponding UEVar and VarKill sets.

Figure 8.15. Example LiveOut Computation.

Solving the Equations for LiveOut

Given the UEVar and VarKill sets, the compiler applies the algorithm from Figure 8.14b to compute LiveOut sets for each node in the cfg. It initializes all of the LiveOut sets to ∅. Next, it computes the LiveOut set for each block, in order from B0 to B4. It repeats the process, computing LiveOut for each node in order until the LiveOut sets no longer change.

The table in Figure 8.15c shows the values of the LiveOut sets at each iteration of the solver. The row labelled Initial shows the initial values. The first iteration computes an initial approximation to the LiveOut sets. Because it processes the blocks in ascending order of their labels, B0, B1, and B2 receive values based solely on the UEVar sets of their cfg successors. When the algorithm reaches B3, it has already computed an approximation for LiveOut(B1), so the value that it computes for B3 reflects the contribution of the new value for LiveOut(B1). LiveOut(B4) is empty, as befits the exit block.

In the second iteration, the value s is added to LiveOut(B0) as a consequence of its presence in the approximation of LiveOut(B1). No other changes occur. The third iteration does not change the values of the LiveOut sets, and the algorithm halts.

The order in which the algorithm processes the blocks affects the values of the intermediate sets. If the algorithm visited the blocks in descending order of their labels, it would require one fewer pass. The final values of the LiveOut sets are independent of the evaluation order. The iterative solver in Figure 8.14 computes a fixed-point solution to the equations for LiveOut.

The algorithm will halt because the LiveOut sets are finite and the recomputation of the LiveOut set for a block can only increase the number of names in that set. The only mechanism in the equation for excluding a name is the intersection with the complement of VarKill. Since VarKill does not change during the computation, the update to each LiveOut set increases monotonically and, thus, the algorithm must eventually halt.
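A minimal sketch of the solver from Figure 8.14b, computing LiveOut(n) as the union over each successor m of UEVar(m) ∪ (LiveOut(m) ∩ complement of VarKill(m)); succ, uevar, and varkill are assumed dictionary inputs keyed by block:

# A minimal sketch of the iterative fixed-point solver for LiveOut.

def solve_liveout(blocks, succ, uevar, varkill):
    liveout = {b: set() for b in blocks}   # initialize every set to empty
    changed = True
    while changed:                         # repeat until a fixed point
        changed = False
        for b in blocks:
            new = set()
            for m in succ[b]:
                new |= uevar[m] | (liveout[m] - varkill[m])
            if new != liveout[b]:
                liveout[b], changed = new, True
    return liveout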

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780120884780000086

Scalar Optimizations

Keith D. Cooper, Linda Torczon, in Engineering a Compiler (Second Edition), 2012

10.2.2 Eliminating Useless Control Flow

Optimization can change the ir form of the program so that it has useless control flow. If the compiler includes optimizations that can produce useless control flow as a side effect, then it should include a pass that simplifies the cfg by eliminating useless control flow. This section presents a simple algorithm called Clean that handles this task.

Clean operates directly on the procedure's cfg. It uses four transformations, shown in the margin. They are applied in the following order:

1.

Fold a Redundant Branch If Clean finds a block that ends in a branch, and both sides of the branch target the same block, it replaces the branch with a jump to the target block. This situation arises as the result of other simplifications. For example, Bi might have had two successors, each with a jump to Bj. If another transformation had already emptied those blocks, then empty-block removal, discussed next, might produce the initial graph shown in the margin.

2.

Remove an Empty Block If Clean finds a block that contains only a jump, it can merge the block into its successor. This situation arises when other passes remove all of the operations from a block Bi. Consider the left graph of the pair shown in the margin. Since Bi has only one successor, Bj, the transformation retargets the edges that enter Bi to Bj and deletes Bi from Bj's set of predecessors. This simplifies the graph. It should also speed up execution. In the original graph, the paths through Bi needed two control-flow operations to reach Bj. In the transformed graph, those paths use one operation to reach Bj.

3.

Combine Blocks If Clean finds a block Bi that ends in a jump to Bj and Bj has only one predecessor, it can combine the two blocks, as shown in the margin. This situation can arise in several ways. Another transformation might eliminate other edges that entered Bj, or Bi and Bj might be the result of folding a redundant branch (described previously). In either case, the two blocks can be combined into a single block. This eliminates the jump at the end of Bi.

4.

Hoist a Branch If Clean finds a block Bi that ends with a jump to an empty block Bj and Bj ends with a branch, Clean can replace the block-ending jump in Bi with a copy of the branch from Bj. In effect, this hoists the branch into Bi, as shown in the margin. This situation arises when other passes eliminate the operations in Bj, leaving a jump to a branch. The transformed code achieves the same effect with only a branch. This adds an edge to the cfg. Observe that Bi cannot be empty, or else empty block removal would have eliminated it. Similarly, Bi cannot be Bj's sole predecessor, or else Clean would have combined the two blocks. (After hoisting, Bj still has at least one predecessor.)

Some bookkeeping is required to implement these transformations. Some of the modifications are trivial. To fold a redundant branch in a program represented with iloc and a graphical cfg, Clean simply overwrites the block-ending branch with a jump and adjusts the successor and predecessor lists of the blocks. Others are more difficult. Merging two blocks may involve allocating space for the merged block, copying the operations into the new block, adjusting the predecessor and successor lists of the new block and its neighbors in the cfg, and discarding the two original blocks.

Many compilers and assemblers have included an ad hoc pass that eliminates a jump to a jump or a jump to a branch. Clean achieves the same effect in a systematic way.

Clean applies these four transformations in a systematic fashion. It traverses the graph in postorder, so that Bi's successors are simplified before Bi, unless the successor lies along a back edge with respect to the postorder numbering. In that case, Clean will visit the predecessor before the successor. This is unavoidable in a cyclic graph. Simplifying successors before predecessors reduces the number of times that the implementation must move some edges.

In some situations, more than one of the transformations may apply. Careful analysis of the various cases leads to the order shown in Figure 10.2, which corresponds to the order in which they are presented in this section. The algorithm uses a series of if statements rather than an if-then-else to allow it to apply multiple transformations in a single visit to a block.

Figure 10.2. The Algorithm for Clean.

If the cfg contains back edges, then a pass of Clean may create additional opportunities—namely, unprocessed successors along the back edges. These, in turn, may create other opportunities. For this reason, Clean repeats the transformation sequence iteratively until the cfg stops changing. It must compute a new postorder numbering between calls to OnePass because each pass changes the underlying graph. Figure 10.2 shows pseudo-code for Clean.
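A runnable sketch of Clean over a toy cfg representation appears below. The dictionary format and helper names are assumptions, not the book's code, and this simplified one_pass applies at most one of transformations 2-4 per visit, relying on the iterative driver to catch the rest:

import copy

# Toy cfg: block name -> {"ops": [...], "end": e} where e is ("jump", t),
# ("branch", t1, t2), or ("exit",). This representation is an assumption.

def preds(cfg, b):
    return [p for p, blk in cfg.items() if b in blk["end"][1:]]

def postorder(cfg, entry):
    seen, order = set(), []
    def walk(b):
        if b in seen or b not in cfg:
            return
        seen.add(b)
        for s in cfg[b]["end"][1:]:
            walk(s)
        order.append(b)
    walk(entry)
    return order

def one_pass(cfg, i, entry):
    end = cfg[i]["end"]
    if end[0] == "branch" and end[1] == end[2]:
        cfg[i]["end"] = end = ("jump", end[1])   # 1. fold a redundant branch
    if end[0] != "jump":
        return
    j = end[1]
    if not cfg[i]["ops"] and i != entry and i != j:
        for p in preds(cfg, i):                  # 2. remove an empty block
            t = cfg[p]["end"]
            cfg[p]["end"] = tuple(j if x == i else x for x in t)
        del cfg[i]
    elif preds(cfg, j) == [i]:                   # 3. combine blocks
        cfg[i]["ops"] += cfg[j]["ops"]
        cfg[i]["end"] = cfg[j]["end"]
        del cfg[j]
    elif not cfg[j]["ops"] and cfg[j]["end"][0] == "branch":
        cfg[i]["end"] = cfg[j]["end"]            # 4. hoist a branch

def clean(cfg, entry):
    while True:
        before = copy.deepcopy(cfg)
        for b in postorder(cfg, entry):          # successors before predecessors
            if b in cfg:                         # a pass may delete blocks
                one_pass(cfg, b, entry)
        if cfg == before:                        # stop when the cfg stops changing
            return cfg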

Clean cannot, by itself, eliminate an empty loop. Consider the cfg shown in the margin. Assume that block B2 is empty. None of Clean's transformations can eliminate B2 because the branch that ends B2 is not redundant. B2 does not end with a jump, so Clean cannot combine it with B3. Its predecessor ends with a branch rather than a jump, so Clean can neither combine B2 with B1 nor fold its branch into B1.

However, cooperation between Clean and Dead can eliminate the empty loop. Dead used control dependence to mark useful branches. If B1 and B3 contain useful operations, but B2 does not, then the Mark pass in Dead will determine that the branch ending B2 is not useful because B2 ∉ rdf(B3). Because the branch is useless, the code that computes the branch condition is also useless. Thus, Dead eliminates all of the operations in B2 and converts the branch that ends it into a jump to its closest useful postdominator, B3. This eliminates the original loop and produces the cfg labelled "After Dead" in the margin.

In this form, Clean folds B2 into B1, to produce the cfg labelled "Remove B2" in the margin. This action also makes the branch at the end of B1 redundant. Clean rewrites it with a jump, producing the cfg labelled "Fold the Branch" in the margin. At this point, if B1 is B3's sole remaining predecessor, Clean coalesces the two blocks into a single block.

This cooperation is simpler and more effective than adding a transformation to Clean that handles empty loops. Such a transformation might recognize a branch from Bi to itself and, for an empty Bi, rewrite it with a jump to the branch's other target. The problem lies in determining when Bi is truly empty. If Bi contains no operations other than the branch, then the code that computes the branch condition must lie outside the loop. Thus, the transformation is safe only if the self-loop never executes. Reasoning about the number of executions of the self-loop requires knowledge about the runtime value of the comparison, a task that is, in general, beyond a compiler's ability. If the block contains operations, but only operations that control the branch, then the transformation would need to recognize the situation with pattern matching. In either case, this new transformation would be more complex than the four included in Clean. Relying on the combination of Dead and Clean achieves the appropriate result in a simpler, more modular fashion.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780120884780000104

Fault Tolerance in Computer Systems—From Circuits to Algorithms*

Shantanu Dutt , ... Fran Hanchek , in The Electrical Engineering Handbook, 2005

8.4.1 CFC with Assigned Signatures

The signatures labeling the nodes of the CFG are assigned arbitrarily using prime numbers or successive integers. These signatures are transferred to the WD explicitly by the checked processor. For this purpose, signature transfer statements are added into the source of the checked program.

Assigned signature-based methods check only that the nodes are executed in an allowed sequence (i.e., that the main processor traverses the CFG correctly), which is not necessarily the correct sequence of the program execution since the choice itself is unchecked. To check the contents of a node and the type and sequence of the instructions in it, derived signature methods are required. One of the first methods developed to perform CFC using assigned signatures is structural integrity checking (SIC) (Lu, 1982; Mahmood and McCluskey, 1988), which makes use of the control structure embedded in the syntax of the programming language.

To perform CFC, the source code to be executed by the checked system is preprocessed: the CFG is extracted, signatures are assigned to the nodes (blocks), the blocks are augmented with these signatures, and possibly the program of the main processor is modified by inserting the statements that transfer the signatures to the WD.

Methods differ in the way the signatures are assigned to the nodes and in how the reference information is represented. In SIC (Lu, 1982), the nodes are encoded (labeled) by selecting 8-bit numbers randomly and independently from a uniform distribution. The reference information for the WD is represented in the form of a reference program that has the same CFG as the checked program. In place of computations, it contains statements to receive and check the signatures from the main processor.

A newer control flow technique is the extended structural integrity checking (ESIC) (Mahmood and McCluskey, 1988), which extends the checking capabilities of SIC to check run-time computed procedure calls and interrupt handlers. The nodes of the CFG are encoded by successive numbers, and the reference information is extracted in the form of a stored reference signature database in which the valid successors of each signature are given in a sparse matrix format. The first and final nodes of a procedure are tagged by special flags, start of procedure (SOP) and end of procedure (EOP). If an SOP signature is received (procedure call), then the actual state (pointer to the previous signature) is pushed to a WD internal stack and the reference state corresponding to the called procedure is reached. After an EOP signature (return from a procedure call), the original state (pointer to the original reference signature) is popped from the stack.

With this technique, procedure calls that are nondeterministic in the preprocessing phase and interrupt handler procedures can be checked. Whether the correct procedure is called remains unchecked, but the return address is checked using the signature stack.
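A minimal sketch of this checking discipline, assuming valid_succ (the stored successor table), the sop/eop signature sets, and a start signature as inputs:

# A minimal sketch of ESIC-style checking in the WD. SOP signatures push
# the current state onto a stack; EOP signatures pop it, so the return
# point is checked even when the call target itself is not.

def check_stream(signatures, valid_succ, start, sop, eop):
    stack, current = [], start
    for sig in signatures:
        if sig in sop:                 # procedure call: push the actual state
            stack.append(current)
            current = sig
        elif sig in eop:               # procedure return: pop the saved state
            if not stack:
                return False           # unmatched return
            current = stack.pop()
        elif sig in valid_succ.get(current, set()):
            current = sig              # an allowed successor in the CFG
        else:
            return False               # control-flow error detected
    return True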

The relatively easy implementation of the preprocessor and the possibility of asynchronous checking are the advantages of the assigned signature-based methods. The disadvantages are the performance degradation of the checked processor due to explicit signature transfer, the need to recompile the source code, which prevents checking of existing executables, and the fact that the sequence of instructions in a node is unchecked.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780121709600500347

Towards a New Formal SDL Semantics based on Abstract State Machines

U. Glässer , ... A. Prinz , in SDL '99, 1999

Example Specification

The value of execmod(Process1) is defined to be Process1_Module. By setting the value of Mod to execmod(Process1), the final step of the initialization of processes, as stated by the InitProcess rule, effectively switches the two process agents resulting from the initialization phase to the execution phase.

By means of an appropriate labeling of control flow graph nodes in the graphical representation of Process1 (see Figure 1), we can directly relate the execution of individual operations to the resulting rules in the machine model.

PROCESS = {Process1}

execmod(Process1) = Process1_Module

LABEL = {l0,…,l3}, startlabel(Process1) = l0, inputlabels(Process1) = {l1}

Process1_Module:

ExecProcess ≡
  if label(Self) = l0 then
    AssignValue((x, Self), eval(5, Self))
    label(Self) := l1
  if label(Self) = l1 then
    ConsumeInputSignal({(A, 〈〉, l2)}, Ø)
  if label(Self) = l2 then
    AssignValue((x, Self), eval(x + 1, Self))
    label(Self) := l3
  if label(Self) = l3 then
    OutputSignal(A, 〈〉, Env, undef)
    label(Self) := l1
  if label(Self) ∈ inputlabels(Self) then
    HandleSignals

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780444502285500138

Performance Estimation of Embedded Software with Instruction Cache Modeling

YAU-TSUN STEVEN LI, ... ANDREW WOLFE, in Readings in Hardware/Software Co-Design, 2002

7.3 Performance Issues

The structural and cache constraints are derived from the CFG and CCGs, which are very similar to network flow graphs. We therefore expect that the ILP solver can solve the problem efficiently. Table IV shows, for each program, the number of variables and constraints, the number of branches in solving the ILP problem, and the CPU time required to solve the problem. Since each program may have more than one set of functionality constraints [Li and Malik 1995], a + symbol is used to separate the number of functionality constraints in each set. For a program having n sets of functionality constraints, the ILP is called n times. The + symbol is used again to separate the number of ILP branches and the CPU time for each ILP call.

Table IV. Performance Issues in Cache Analysis

             No. of Variables           No. of Constraints        ILP branches   Time (sec.)
Program      d's   f's    p's    x's    Struct.  Cache   Funct.
check_data    12     0      0     40       25       21   5+5        1+1            0+0
circle         8     1     81    100       24      186   1          1              0
des          174    11    728    560      342    1,059   16+16      13+13          171+197
dhry         102    21    503    504      289      777   24×4+26×4  1×8            0×3+2+0+1×2+4
djpeg        296    20  1,816    416      613    2,568   64         1              87
fdct           8     0     18     34       16       49   2          1              0
fft           27     0      0     80       46       46   11         1              0
line          31     2    264    231       73      450   2          1              3
matcnt        20     4      0    106       59       61   4          1              0
matcnt2       20     2      0     92       49       54   4          1              0
piksrt        12     0      0     42       22       26   4          1              0
sort          15     1      0     58       35       31   6          1              0
sort2         15     0      0     50       30       27   6          1              0
stats         28    13     75    180       99      203   4          1              0
stats2        28     7     41    144       75      158   4          1              0
whetstone     52     3    301    388      108      739   14         1              2

We found that even with thousands of variables and constraints, the branch and bound ILP solver could still find an integer solution within the first few calls to the linear programming solver. The time taken to solve the problem ranged from less than a second to a few minutes on an SGI Indigo2 workstation. With a commercial ILP solver, CPLEX, the CPU time reduced significantly to a few seconds.

In order to evaluate how the cache size affects solving time, we doubled the number of cache lines (and hence the cache size) from 32 lines to 64 lines and determined the CPU time needed to solve the ILP problems. Table V shows the results. From the table, we determined that the number of variables and constraints changed little when the number of cache lines is doubled. The time to solve the ILP problem is of the same order as before. The main reason is that although increasing the number of cache lines increases the number of CCGs, and hence more cache constraints are generated, each CCG has fewer nodes and edges. As a result, there are fewer cache constraints in each CCG. These two factors roughly cancel each other out.

Table V. Complexity of the ILP Problem: Number of Cache Lines Doubled to 64

             No. of variables           No. of constraints        ILP branches   Time (sec.)
Program      d's   f's    p's    x's    Struct.  Cache   Funct.
des          174    11    809    524      342    1,013   16+16      7+10           90+145
whetstone     52     3    232    306      108      559   14         1              1

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781558607026500156

Input-Sensitive Profiling

A. Alourani , ... M. Grechanik , in Advances in Computers, 2016

3.3.1 Summary

Traditional profilers link performance metrics to nodes and paths in control flow graphs (or call graphs) by gathering performance measurements (eg, execution time) for specific input values to help developers improve the performance of software applications by identifying which methods consume more resources (eg, CPU and memory usage). However, these profiling techniques do not pinpoint why these methods are responsible for intensive resource usage and do not identify how the resource consumption of the same method differs with the increasing size of the input (eg, the number of nodes in an input linked list or a tree, or a bigger input array). That is, when executing an application with different sizes of inputs, the same method of this application often consumes different resources. A primary problem with profiling techniques is that they do not explain how the cost that measures resource usage is affected individually by the size of the input, the algorithm (eg, recursions and loops), and the underlying implementation of algorithms (eg, traversing a data structure iteratively or recursively). Traditional profilers calculate resource usage by combining these factors and provide limited information by reporting the overall cost. It is increasingly important to find a way of identifying how individual factors, including the size of input, algorithm, and implementation, affect the cost, to uncover the relationship of the execution cost to the program input.

Zaparanuks and Hauswirth propose an automated profiling methodology to help developers detect algorithmic (eg, recursions and loops) inefficiencies by inferring a cost function of a program that relates the cost to the input size and to predict how the resource usage would scale with increasing the size of the input. AlgoProf was developed to automatically identify algorithms (eg, recursions and loops) in a program and infer the time complexity of each algorithm for a specific algorithmic step (eg, the total number of loop iterations) during performance testing. An important feature of this profiling technique is the ability to pinpoint why methods are responsible for intensive resource usage (eg, the CPU and memory) and execution times. Aside from detecting the root causes of scalability problems in a program, this technique can address the problem of measuring the size of the input automatically.

The authors introduced an algorithmic profiler for computing the cost function of a program by identifying algorithms (eg, loops and recursions) and their inputs to measure their sizes (eg, the number of nodes in a linked list) and costs (eg, the execution times of loop iterations), and generated performance plots that mapped input size to the cost, ie, they compute cost functions for individual algorithms (eg, loops and recursions). This technique allows developers to make an accurate estimate of the computational cost as a function of algorithms (eg, loops and recursions) based on multiple program runs. The profiling technique employs cost metrics based on repeated data structure accesses, such as the execution times of loop iterations, as compared with the execution times of the whole method that are used by traditional profiling techniques. These traditional techniques provide a single cost value, such as hotness (eg, longer execution times), whereas the algorithmic profiler provides much deeper insight into a function that maps the cost to the size and type of the input. Thus, the algorithmic complexity can be inferred more accurately by using cost functions to discover algorithmic inefficiencies. The algorithmic profiler enables developers to understand how the resource usage measurements are affected by the size of the input, the algorithm, and the type of underlying implementation individually. Aside from identifying the root causes of scalability problems in a program, this technique pinpoints why methods are responsible for intensive resource usage (eg, the CPU and memory) and execution times.

The effectiveness of AlgoProf in detecting algorithmic (eg, recursions and loops) inefficiencies was evaluated with a number of programs that implement different algorithms (eg, recursions and loops). Every program uses one data structure type, eg, an array, a linked list, a tree, or a graph. AlgoProf was able to estimate the algorithmic complexities of all data structures in the programs accurately, along with inferring their cost functions to discover algorithmic inefficiencies. Furthermore, the ability of AlgoProf to identify the root causes of scalability problems was evaluated using a Java program that requires the allocation of a larger array when the array runs out of space. AlgoProf shows a plot that links a cost (eg, execution times) with the growing array. If the array is grown by a single element at a time, the cost becomes quadratic or worse, ie, exponential. If the array is grown by doubling the size and changing a single line of the source code, the cost can be reduced to a linear function. To sum up, the proposed profiling technique proves to have the ability to help developers detect algorithmic inefficiencies and pinpoint why methods are responsible for intensive resource usage (eg, the CPU and memory) and execution time. Moreover, AlgoProf provided an effective performance profiling solution to estimate the time complexity and detect algorithmic inefficiencies of a method in a program.
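A minimal sketch of that cost contrast, counting only element copies under two assumed growth strategies (this is an illustration of the effect, not AlgoProf's code):

# Growing an array by one element copies O(n^2) elements in total,
# while doubling its capacity copies O(n) -- the cost-curve difference
# that a plot of input size versus cost makes visible.

def grow_by_one(n):
    copies, capacity = 0, 0
    for size in range(1, n + 1):
        copies += capacity              # reallocate and copy on every insert
        capacity = size
    return copies                       # ~ n*(n-1)/2, quadratic

def grow_by_doubling(n):
    copies, capacity = 0, 1
    for size in range(1, n + 1):
        if size > capacity:
            copies += capacity          # copy only when the array fills
            capacity *= 2
    return copies                       # < 2n, linear

assert grow_by_one(1000) > 100 * grow_by_doubling(1000)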

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/S0065245816300262

Interface Co-Synthesis Techniques for Embedded Systems*

Pai Chou , ... Gaetano Borriello , in Readings in Hardware/Software Co-Design, 2002

2.2 Main Algorithm

The main algorithm (Fig. 3) is called with four parameters: CFG SW, CFG HW, DeviceList, and ProcessorList. CFG SW is the set of control flow graphs to be implemented in software. CFG HW are the control flow graphs implemented in hardware and requiring I/O sequencers. DeviceList is the list of peripheral devices to be connected to the processors in ProcessorList. Each device is connected to one and only one processor and each processor may control multiple devices.

Fig. 3. Main algorithm for interface synthesis

The first step of the algorithm synthesizes hardware sequencers for CFG HW and their access routines from CFG SW. The next step allocates I/O resources for the devices controlled by direct I/O, including the newly synthesized sequencers. The algorithm first attempts to use I/O ports if the processor has them. If there are any unconnected device ports remaining, then the algorithm connects them using memory-mapped I/O. Finally, the algorithm generates the device drivers by binding device ports in the SEQs to the I/O resources of the processors. In the next section, we summarize the port allocation algorithm (described in detail in [1]) and present a preprocessing step called port-width partitioning. Memory-mapped I/O and sequencer synthesis are described in sections 4 and 5.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781558607026500326