Difference between revisions of "Examples"

From Mobius Wiki
Below is a list of example models built in Möbius. These examples come from the Möbius team and user community. '''Please consider sharing''' the models you have built in the past. If you decide to share your model, please see the [[Examples#Share_Your_Model|Share Your Model]] section for instructions on creating a new page and linking it here.

== Examples ==

* [[SCADA Case Study from QEST 2011]] - A case study that uses the ADVISE formalism to study the security of a SCADA and corporate network.
* [[ADVISE Bank Robbery Tutorial Model]] - An example using ADVISE, with full steps and the project included.
* [[Conveyor Belt]] - Incomplete
* [[Database2]] - Incomplete
* [[Ian2]] - Incomplete
* [[Faulty Proc2]] - Incomplete
* [[Multi-Proc]] - Incomplete

== Share Your Model ==

Here is [[Example Model Template|the template]] that people should use. If you are not familiar with MediaWiki, here are the steps involved in adding your own model.

# Log in to https://www.mobius.illinois.edu/wiki/.
# In the search box at the top right, search for the name of the page you want to create.
# If there are no matching results, you should be prompted to create a page with the name that you searched for.
# On the new page, create a submission by following [[Example Model Template|the template]]. A completed example has also been created for your reference and can be found [[Example Model Template|here]].
# To view the source code of a page, click on the edit tab at the top right of the page. Copying the source code gives you the format ready to fill in.
# Under the description, you can simply copy your paper's abstract to give viewers a brief idea of your paper. Please be sure to link your paper and website. If you are unsure about the syntax for creating a link, please refer to [https://www.mediawiki.org/wiki/Help:Links#External_links Links]. Also, please upload your project by clicking on the [[Special:Upload]] link at the bottom of your page.
# If you have any questions, feel free to contact Ken Keefe at [mailto:kjkeefe@illinois.edu kjkeefe@illinois.edu].

== <span style="font-size:120%">Fault-Tolerant Multiprocessor System</span> ==

This section presents an example of a system that can be modeled using Möbius. It starts with a description of the system, and then guides you through one way to build a model of the system and solve it using both simulation and numerical solution. The example is intended to take you step-by-step through the process of creating and solving a model in Möbius, and to exhibit many of the capabilities and features of the tool.

=== <span style="font-size:110%">System Description</span> ===

The system under consideration is a highly redundant fault-tolerant multiprocessor system adapted from <ref name=L:Fault:92>D. Lee, J. Abraham, D. Rennels, and G. Gilley. A numerical technique for the evaluation of large, closed fault-tolerant systems. In ''Dependable Computing for Critical Applications'', pages 95–114. Springer-Verlag, Wien, 1992.</ref> and shown in <xr id="fig:ex_multiproc" />. At the highest level, the system consists of multiple computers. Each computer is composed of 3 memory modules, of which 1 is a spare module; 3 CPU units, of which 1 is a spare unit; 2 I/O ports, of which 1 is a spare port; and 2 non-redundant error-handling chips.

<figure id="fig:ex_multiproc">
[[Image:multiproc.png|center]]
<br/>
<center><xr id="fig:ex_multiproc" nolink />: Fault-tolerant multiprocessor system.</center></figure>

Internally, each memory module consists of 41 RAM chips (2 of which are spare chips) and 2 interface chips. Each CPU unit and each I/O port consists of 6 non-redundant chips. The system is considered operational if at least 1 computer is operational. A computer is classified as operational if, of its components, at least 2 memory modules, at least 2 CPU units, at least 1 I/O port, and the 2 error-handling chips are functioning. A memory module is operational if at least 39 of its 41 RAM chips, and its 2 interface chips, are working.
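
The operational conditions above can be written as plain predicates. The following is a minimal sketch in standalone C++, not Möbius code. The struct and function names are hypothetical and chosen only for illustration, while the thresholds are the ones stated in the description.

```cpp
#include <cassert>

// Hypothetical helper types for illustration; they are not part of the Möbius model.
struct MemoryModule {
    int working_ram_chips;        // 0..41 (2 of the 41 are spares)
    int working_interface_chips;  // 0..2 (no spares)
};

struct Computer {
    int operational_memory_modules;  // 0..3 (1 spare)
    int operational_cpu_units;       // 0..3 (1 spare)
    int operational_io_ports;        // 0..2 (1 spare)
    int working_error_handlers;      // 0..2 (no spares)
};

// A memory module is operational if at least 39 of its 41 RAM chips
// and both of its interface chips are working.
bool memory_module_operational(const MemoryModule& m) {
    assert(m.working_ram_chips >= 0 && m.working_ram_chips <= 41);
    assert(m.working_interface_chips >= 0 && m.working_interface_chips <= 2);
    return m.working_ram_chips >= 39 && m.working_interface_chips == 2;
}

// A computer is operational if at least 2 memory modules, at least 2 CPU units,
// at least 1 I/O port, and both error-handling chips are functioning.
bool computer_operational(const Computer& c) {
    return c.operational_memory_modules >= 2 &&
           c.operational_cpu_units >= 2 &&
           c.operational_io_ports >= 1 &&
           c.working_error_handlers == 2;
}

// The system is operational if at least 1 computer is operational.
bool system_operational(int operational_computers) {
    return operational_computers >= 1;
}
```

Note that losing a single error-handling chip takes the whole computer down, whereas one memory module, one CPU unit, or one I/O port can fail without doing so, provided the failure is covered.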
 
 
 
Where there is redundancy (available spares) at any level of system hierarchy, there is a coverage factor associated with the component failure at that level. For example, following the parameter values used by Lee et al.<ref name=L:Fault:92 />, if one CPU unit fails, with probability 0.995 the failed unit will be replaced by the spare unit, if available, and the corresponding computer will continue to operate. On the other hand, there is also a 0.005 probability that the fault recovery mechanism will fail and the corresponding computer will cease to operate. <xr id="tab:ex_coverage" /> shows the redundant components and their associated fault coverage probability. Finally, the failure rate of every chip in the system, as in <ref name=L:Fault:92 />, is assumed to be 100 failures per billion hours<sup>1</sup>.
 
 
 
:: <span style="font-size:88%"><sup>1</sup> 0.0008766 failures per year.</span>
 
 
 
 
 
<figtable id="tab:ex_coverage">
 
{| border="1" cellspacing="0" cellpadding="5" align="center"
 
  |+ <xr id="tab:ex_coverage" nolink />: Coverage probabilities.
 
|-
 
! align=center| Redundant Component
 
! align=center| Fault Coverage Probability
 
|-
 
| align=center| RAM Chip
 
| align=center| 0.998
 
|-
 
| align=center| Memory Module
 
| align=center| 0.95
 
|- align=center
 
| CPU Unit
 
| 0.995
 
|- align=center
 
| I/O Port
 
| 0.99
 
|- align=center
 
| Computer
 
| 0.95
 
|} </figtable>
 
 
 
 
 
=== <span style="font-size:110%">Getting Started</span> ===
 
 
 
A model of the system in this example is included with the Möbius distribution. Refer to Section C.1 for instructions on installing the example models. You are encouraged to open the model and follow the detailed discussions of its various components in the sections below.
 
 
 
From the Möbius <span style="font-size:115%">Project Manager</span> window, click <span style="font-size:108%"><span style="font-variant:small-caps">Project<math>\to</math>Unarchive</span></span>. A dialog will present a list of archived projects in the project directory. Choose <span style="font-size:115%">Multiproc-Paper</span> and hit <span style="font-size:115%">Unarchive</span>. After the project has been successfully unarchived, you will be prompted to resave the project using <span style="font-size:108%"><span style="font-variant:small-caps">Project<math>\to</math>Resave</span></span>. At the dialog, choose <span style="font-size:115%">Multiproc-Paper</span> again, hit <span style="font-size:115%">Resave</span>, and wait until all components have been built. The <span style="font-size:115%">Multiproc-Paper</span> project editor will appear as shown in Figure 3.1.
 
 
 
 
 
=== <span style="font-size:110%">Atomic Models</span> ===
 
 
 
To build a model for an entire system, begin by defining SAN submodels to represent the failures of various components in the system.
 
 
 
The SAN submodel of the CPUs is called cpu_module and is shown in <xr id="fig:ex_sancpu" />. To open this model, click the <span style="font-size:115%">Atomic</span> tab in the project panel, and then double-click on cpu_module or right-click on it and select <span style="font-size:108%"><span style="font-variant:small-caps">Open</span></span>. The places named cpus and computer_failed represent the current state of the CPUs and the current state of the multiprocessor system, respectively. That is, the number of tokens in cpus represents the number of operational CPUs in a given computer. Likewise, the number of tokens in computer_failed indicates the number of computers that have failed in the system. To open any of these places, right-click on the place and select <span style="font-size:108%"><span style="font-variant:small-caps">Edit</span></span>. This will bring up the <span style="font-size:115%">Place Attributes</span> dialog, in which you can edit the <span style="font-size:115%">Name</span> of the place and the initial marking (number of tokens) of the place. Note that the <span style="font-size:115%">Tokens</span> field can be specified with either a constant or a global variable name. For example, the place cpus has been initialized with three tokens, as each computer consists of three CPU units.
 
 
 
 
 
<figure id="fig:ex_sancpu">
 
[[Image:ex_sancpu.png|center]]
 
<br/>
 
<center><xr id="fig:ex_sancpu" nolink />: SAN submodel of cpu_module.</center></figure>
 
 
 
 
 
To create a new place, either click the blue circle icon in the toolbar or select <span style="font-size:108%"><span style="font-variant:small-caps">Elements<math>\to</math>Place</span></span> from the menu. Then click where you would like the place to go in the editor. The <span style="font-size:115%">Place Attributes</span> dialog will appear, and you can edit the <span style="font-size:115%">Name</span> of the place as well as the initial marking of the place in the <span style="font-size:115%">Tokens</span> field, as described earlier. To delete a place, right-click on it and select <span style="font-size:108%"><span style="font-variant:small-caps">Delete</span></span>, and hit <span style="font-size:115%">OK</span> to confirm.
 
 
 
The places labeled ioports, errorhandlers, and memory_failed are also included in this model to aid in reducing the size of the state space for the overall system model by lumping as many failed states together as possible. Additional state lumping (beyond that provided by the reduced base model construction method) can be achieved because once a computer fails, there is no need to keep track of which component failure caused the computer failure. More specifically, because of the assumption that all internal components of the failed computer have failed, the states that represent a computer failure due to a failure of a CPU unit, a memory module, an I/O port, or an error-handling chip are combined into a single state. The marking of the combined state is reached by setting the number of tokens in each of the places cpus, ioports, and errorhandlers to zero, setting the number of tokens in memory_failed to 2, and incrementing the number of tokens in computer_failed.
 
 
 
The failure of a CPU unit corresponds to the completion of timed activity cpu_failure. To open this activity, right-click on it and select <span style="font-size:108%"><span style="font-variant:small-caps">Edit</span></span>. This will bring up the <span style="font-size:115%">Timed Activity Attributes</span> dialog. In this dialog, you can edit the name of the activity and the distribution of its firing delay in the <span style="font-size:115%">Time distribution function</span> field. For this activity, the <span style="font-size:115%">Exponential</span> distribution should be selected. The activity completion rate is shown in <xr id="tab:ex_cpuact" />. This rate corresponds to six<sup>2</sup> times the failure rate of a chip times the number of operational CPU units in the computer. If a spare CPU unit is available (i.e., <span style="font-size:125%"><font face=Courier>cpus-<span style="font-size:115%">></span>Mark() == 3</font></span>), three cases are associated with the activity completion, as designated in the <span style="font-size:115%">Case quantity</span> field. To define the case probabilities, click on the appropriate case number’s tab and type the expression in the box. The expression for the case probability can be a constant, a global variable, or a C++ statement returning a value as in this example. The first case represents a successful coverage of a CPU unit failure. If that case occurs, the failed CPU unit is replaced by the spare unit, and its corresponding computer continues to operate. The second case represents the situation in which a CPU unit failure occurs that is not covered, but the failure of its corresponding computer is covered. If that happens and a spare computer is available, the failed computer is replaced by the spare computer and the system continues to operate. However, if no spare computer is available, the multiprocessor system fails. 
The third case represents the situation in which neither the CPU failure nor the corresponding computer failure is covered, resulting in a total system failure.
 
 
 
:: <span style="font-size:88%"><sup>2</sup> Remember that each CPU unit consists of 6 non-redundant chips.</span>
 
 
 
 
 
<figtable id="tab:ex_cpuact">
 
{| border="1" cellspacing="0" cellpadding="5" align="center"
 
  |+ <xr id="tab:ex_cpuact" nolink />: cpu_module activity time distributions.
 
|-
 
! align=center| Activity
 
! align=center| Distribution
 
|-
 
| align=center| cpu_failure
 
| align=center| expon(0.0052596 * <span style="font-size:125%"><font face=Courier>cpus-<span style="font-size:115%">></span>Mark()</font></span>)
 
|} </figtable>
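
The constant 0.0052596 in the table can be reproduced from the figures given earlier. The sketch below is standalone C++ rather than Möbius code, and its constant and function names are hypothetical. It assumes a year of 8766 hours (365.25 days), which is the value implied by the footnote's conversion of 100 failures per billion hours to 0.0008766 failures per year.

```cpp
#include <cassert>

// Per-chip failure rate: 100 failures per 10^9 hours.
constexpr double kChipFailuresPerHour = 100.0 / 1e9;
// Assumption: 8766 hours per year (365.25 days), implied by the footnote.
constexpr double kHoursPerYear = 8766.0;
// Per-chip failure rate in failures per year (about 0.0008766).
constexpr double kChipFailuresPerYear = kChipFailuresPerHour * kHoursPerYear;

// Each CPU unit consists of 6 non-redundant chips.
constexpr int kChipsPerCpuUnit = 6;

// Completion rate of cpu_failure for a given number of operational CPU units,
// i.e. 6 * (chip failure rate) * cpus->Mark().
double cpu_failure_rate(int operational_cpus) {
    assert(operational_cpus >= 0);
    return kChipsPerCpuUnit * kChipFailuresPerYear * operational_cpus;
}
```

With one operational CPU unit the function returns approximately 0.0052596 failures per year, and the rate scales linearly with the marking of cpus.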
 
 
 
 
 
On the other hand, if no spare CPU is available (i.e., <span style="font-size:125%"><font face=Courier>cpus-<span style="font-size:115%">></span>Mark() == 2</font></span>), then a CPU unit failure causes a computer failure. In this marking, two possible outcomes may result from the completion of activity cpu_failure. In the first, a spare computer is available, so that the computer failure can be covered. In the second, no spare computer is available, and system failure results. <xr id="tab:ex_cpucaseprob" /> shows the case numbers and the probabilities associated with each case for the activity cpu_failure. It is clear that the case probabilities are marking-dependent, since the coverage factors depend on the state of the system.
 
 
 
 
 
<figtable id="tab:ex_cpucaseprob">
 
{| border="1" cellspacing="0" cellpadding="5" align="center"
 
  |+ <xr id="tab:ex_cpucaseprob" nolink />: cpu_module case probabilities for activities.
 
|-
 
! align=center| Case
 
! align=center| Probability
 
|- align=center
 
| colspan="2"| cpu_failure
 
|-
 
| align=center| 1
 
| <span style="font-size:125%"><font face=Courier>if (cpus-<span style="font-size:115%">></span>Mark() == 3) <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return(0.995);
 
else <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return(0.0);</font></span>
 
|-
 
| align=center| 2
 
| <span style="font-size:125%"><font face=Courier>if (cpus-<span style="font-size:115%">></span>Mark() == 3) <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return(0.00475);
 
else <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return(0.95);</font></span>
 
|-
 
| align=center| 3
 
| <span style="font-size:125%"><font face=Courier>if (cpus-<span style="font-size:115%">></span>Mark() == 3) <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return(0.00025);
 
else <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return(0.05);</font></span>
 
|} </figtable>
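
The case probabilities in the table follow from the CPU unit and computer coverage factors in <xr id="tab:ex_coverage" />. As a rough illustration, the standalone C++ sketch below derives them instead of hard-coding them. The function and constant names are hypothetical, and the function takes the marking of cpus as an ordinary integer rather than reading it through <span style="font-size:125%"><font face=Courier>cpus-<span style="font-size:115%">></span>Mark()</font></span>.

```cpp
#include <array>
#include <cassert>

constexpr double kCpuCoverage = 0.995;      // coverage of a CPU unit failure
constexpr double kComputerCoverage = 0.95;  // coverage of a computer failure

// Returns the probabilities of cases 1, 2, and 3 of cpu_failure.
// cpus_mark is the number of operational CPU units (2 or 3 when enabled).
std::array<double, 3> cpu_failure_case_probabilities(int cpus_mark) {
    assert(cpus_mark == 2 || cpus_mark == 3);
    if (cpus_mark == 3) {
        // A spare CPU unit exists, so the CPU failure itself may be covered.
        const double uncovered = 1.0 - kCpuCoverage;
        return {kCpuCoverage,                             // case 1: CPU failure covered
                uncovered * kComputerCoverage,            // case 2: computer failure covered
                uncovered * (1.0 - kComputerCoverage)};   // case 3: nothing covered
    }
    // No spare CPU unit: the computer fails, and only computer coverage matters.
    return {0.0, kComputerCoverage, 1.0 - kComputerCoverage};
}
```

For a marking of 3 this yields 0.995, 0.005 × 0.95 = 0.00475, and 0.005 × 0.05 = 0.00025. In both markings the three probabilities sum to 1.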
 
 
 
 
 
The input gate Input_Gate1 is used to determine whether the timed activity cpu_failure is enabled in the current marking, and hence can complete. The cpu_failure activity is enabled only if at least 2 working CPU units are available and their corresponding computer and the system have not failed. <xr id="tab:ex_cpuig1" /> shows the enabling predicate and function associated with this gate.
 
 
 
 
 
<figtable id="tab:ex_cpuig1">
 
{| border="1" cellspacing="0" cellpadding="5" align="center"
 
  |+ <xr id="tab:ex_cpuig1" nolink />: cpu_module input gate predicates and functions.
 
|-
 
! align=center| Gate
 
! align=left| Enabling Predicate
 
! align=center| Function
 
|-
 
| align=center| Input_Gate1
 
| align=left| <span style="font-size:125%"><font face=Courier>(cpus-<span style="font-size:115%">></span>Mark()<span style="font-size:115%">></span>1) && <br/>
 
(memory_failed-<span style="font-size:115%">></span>Mark()<span style="font-size:115%"><</span>2) && <br/>
 
(computer_failed-<span style="font-size:115%">></span>Mark()<span style="font-size:115%"><</span>num_comp)</font></span>
 
| align=center| identity
 
|} </figtable>
 
 
 
 
 
The output gates OG1, OG2, and OG3 are used to determine the next marking based on the current marking and the case chosen when cpu_failure completes. They correspond to the different situations that arise because of the coverage or non-coverage of system components. <xr id="tab:ex_cpuog" /> lists the output gates and the function of each gate.
 
 
 
 
 
<figtable id="tab:ex_cpuog">
 
{| border="1" cellspacing="0" cellpadding="5" align="center"
 
  |+ <xr id="tab:ex_cpuog" nolink />: cpu_module output gate functions.
 
|-
 
! align=center| Gate
 
! align=left| Function
 
|-
 
| align=center| OG1
 
| align=left| <span style="font-size:125%"><font face=Courier>if (cpus-<span style="font-size:115%">></span>Mark() == 3) <br/>&nbsp;&nbsp;&nbsp;cpus-<span style="font-size:115%">></span>Mark()--;</font></span>
 
|-
 
| align=center| OG2
 
| align=left| <span style="font-size:125%"><font face=Courier>cpus-<span style="font-size:115%">></span>Mark() = 0; <br/>ioports-<span style="font-size:115%">></span>Mark() = 0; <br/>errorhandlers-<span style="font-size:115%">></span>Mark() = 0; <br/>memory_failed-<span style="font-size:115%">></span>Mark() = 2; <br/>computer_failed-<span style="font-size:115%">></span>Mark()++;</font></span>
 
|-
 
| align=center| OG3
 
| align=left| <span style="font-size:125%"><font face=Courier>cpus-<span style="font-size:115%">></span>Mark() = 0; <br/>ioports-<span style="font-size:115%">></span>Mark() = 0; <br/>errorhandlers-<span style="font-size:115%">></span>Mark() = 0; <br/>memory_failed-<span style="font-size:115%">></span>Mark() = 2; <br/>computer_failed-<span style="font-size:115%">></span>Mark() = num_comp;</font></span>
 
|} </figtable>
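
To see how the input gate predicate and the three output gate functions interact, it can help to run them outside the tool. The sketch below is standalone C++ in which a plain struct stands in for the SAN marking, so each <span style="font-size:125%"><font face=Courier>-<span style="font-size:115%">></span>Mark()</font></span> access becomes a struct field. The struct, the function names, and the initial values of ioports and errorhandlers (taken from the system description, not from the model files) are assumptions made for illustration. num_comp is passed as a parameter rather than read as a global variable.

```cpp
#include <cassert>

// Plain-struct stand-in for the places of cpu_module (hypothetical type).
struct Marking {
    int cpus = 3;             // operational CPU units in this computer
    int ioports = 2;          // assumed initial value: 2 I/O ports
    int errorhandlers = 2;    // assumed initial value: 2 error-handling chips
    int memory_failed = 0;    // failed memory modules in this computer
    int computer_failed = 0;  // failed computers in the system
};

// Enabling predicate of Input_Gate1.
bool cpu_failure_enabled(const Marking& m, int num_comp) {
    assert(num_comp >= 1);
    return m.cpus > 1 && m.memory_failed < 2 && m.computer_failed < num_comp;
}

// OG1: the CPU failure is covered and the spare takes over.
void og1(Marking& m) {
    if (m.cpus == 3) m.cpus--;
}

// Shared part of OG2 and OG3: lump all failed-computer states together.
void mark_computer_components_failed(Marking& m) {
    m.cpus = 0;
    m.ioports = 0;
    m.errorhandlers = 0;
    m.memory_failed = 2;
}

// OG2: the computer fails, but the computer failure is covered.
void og2(Marking& m) {
    mark_computer_components_failed(m);
    m.computer_failed++;
}

// OG3: nothing is covered, so the whole system fails.
void og3(Marking& m, int num_comp) {
    mark_computer_components_failed(m);
    m.computer_failed = num_comp;
}
```

After OG2 or OG3 fires, cpus is 0 and memory_failed is 2, so the predicate is false and cpu_failure cannot complete again for that computer.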
 
 
 
 
 
In a SAN model, relationships between elements are designated by connecting lines or arcs. For example, places and input gates may be connected to an activity to indicate they are enabling conditions for the activity. An activity (or one of its cases) may be connected to a place or an output gate to indicate that upon completion of the activity, the marking of the place is affected or the output gate function is executed. It is not necessary to connect an output gate to a place whose marking the output gate function changes. Such a connection exists only to ease understanding of the model. To draw a connecting line or arc, choose either <span style="font-size:108%"><span style="font-variant:small-caps">Straight Connection</span></span>, <span style="font-size:108%"><span style="font-variant:small-caps">Connected Line</span></span>, or <span style="font-size:108%"><span style="font-variant:small-caps">Spline Curve</span></span> from the <span style="font-size:108%"><span style="font-variant:small-caps">Elements</span></span> menu. To connect two model elements using the first option, click on the first element and then click on the second element to draw a straight line between them. Using the second or third options, click on the first element, then click on one or more points between the two elements, and finally click on the second element. The <span style="font-size:108%"><span style="font-variant:small-caps">Connected Line</span></span> option will connect the two elements by linear interpolation of all user-defined points between them. The <span style="font-size:108%"><span style="font-variant:small-caps">Spline Curve</span></span> option is similar, but will connect the two elements with a smooth curve. The order in which the two elements are clicked is important, since the arcs, although drawn as undirected edges, are actually specified in a directed manner. 
For instance, to connect an input gate to an activity, the arc must be drawn ''from'' the input gate ''to'' the activity, and not vice versa. Also, there are some combinations of elements that cannot be connected, such as one place with another place or an input gate with an output gate.
 
 
 
Another way to model the failure of CPU modules would be to model the failure of a single CPU module as a SAN and replicate this model three times. However, since the failure of any chip inside the CPU module causes the CPU to fail, and each chip is assumed to have an exponentially distributed failure rate, the failure rate of one CPU module is just the sum of the failure rates of the 6 CPU chips. Therefore, modeling the failure of one CPU module, and then replicating this model three times, results in a model that is equivalent to the cpu_module submodel described above. Both approaches will generate the same number of states. In contrast, a significant state space reduction can be achieved by modeling one memory module as a SAN and replicating this model three times, instead of modeling the failure of the three memory modules in one SAN. The reason is that the failure of a single RAM chip does not cause the memory module to fail, so a memory module cannot be modeled as a single entity.
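
The claim that the chip failure rates simply add rests on a standard property of the exponential distribution. As a brief sketch, assume the 6 chip lifetimes <math>X_1,\dots,X_6</math> are independent (an assumption the text implies but does not state) and that each has rate <math>\lambda</math>. The CPU unit survives to time <math>t</math> only if every chip does:

<center><math>P(\min(X_1,\dots,X_6) > t) = \prod_{i=1}^{6} P(X_i > t) = \left(e^{-\lambda t}\right)^6 = e^{-6\lambda t}</math></center>

The unit's time to failure is therefore again exponential, with rate <math>6\lambda</math>. With <math>\lambda = 0.0008766</math> failures per year, this gives the 0.0052596 per operational CPU unit used in <xr id="tab:ex_cpuact" />.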
 
 
 
The SAN submodels of the I/O ports, the memory module, and the two error-handling chips are shown in <xr id="fig:ex_sanio" />, <xr id="fig:ex_sanmem" />, and <xr id="fig:ex_sanerror" />, respectively. The line of reasoning followed in modeling each of these components is similar to that followed in modeling the CPU modules. Note the similarity between the io_port_module and cpu_module SANs. A more detailed discussion of creating SAN models can be found in Section [[Building_Models#SAN|4.1]] of Building Models.
 
 
 
 
 
<figure id="fig:ex_sanio">
 
[[Image:ex_sanio.png|center]]
 
<br/>
 
<center><xr id="fig:ex_sanio" nolink />: SAN submodel of io_port_module.</center></figure>
 
 
 
<br/>
 
<br/>
 
 
 
<figure id="fig:ex_sanmem">
 
[[Image:ex_sanmem.png|center]]
 
<br/>
 
<center><xr id="fig:ex_sanmem" nolink />: SAN submodel of memory_module.</center></figure>
 
 
 
<br/>
 
<br/>
 
 
 
<figure id="fig:ex_sanerror">
 
[[Image:ex_sanerror.png|center]]
 
<br/>
 
<center><xr id="fig:ex_sanerror" nolink />: SAN submodel of the errorhandlers.</center></figure>
 
 
 
 
 
=== <span style="font-size:110%">Composed Model</span> ===
 
 
 
Now the replicate and join operations previously defined (see Section [[Building_Models#Replicate/Join|5.1]] of Building Models) are used to construct a complete composed model from the atomic models. <xr id="fig:ex_composed" /> shows the multi_proc composed model for the multiprocessor system. To open this model click the <span style="font-size:115%">Composed</span> tab in the project panel, and double-click on multi_proc or right-click on it and select <span style="font-size:108%"><span style="font-variant:small-caps">Open</span></span>.
 
 
 
 
 
<figure id="fig:ex_composed">
 
[[Image:ex_composed.png|center]]
 
<br/>
 
<center><xr id="fig:ex_composed" nolink />: Composed model multi_proc.</center></figure>
 
 
 
 
 
 
 
 
 
 
 
 
== References ==
 
<references />
 

Latest revision as of 19:41, 24 June 2015

Below are a list of example models built in Möbius. These examples come from the Mobius team and user community. Please consider sharing the models you have built in the past. If you do decide to share your model, please look at the Share Your Model section for instructions on creating a new page and linking it here.

Examples

Share Your Model

Here is the template that people should use. If you are not familiar with mediawiki here are steps involved in adding your own model.

  1. Log in to https://www.mobius.illinois.edu/wiki/.
  2. On the top right search for the name of the page you want to create.
  3. If there are no matching results it should prompt you to create a page with the name that you searched for.
  4. Under the page, create a submission by simply following the template of the the template. Also, one has been created as an example for your reference and can be found here.
  5. To view the source code of the page just click on the edit tab on the top right of the page. Copying the source code should have the format ready for you.
  6. Under description you can simply copy your paper's Abstract section to give viewers a brief idea about your paper. Please be sure to link your paper and website. If you are confused about the syntax to create a link please refer to Links. Also please upload your project by clicking on the Special:Upload link at the bottom of your page.
  7. If you have any questions feel free to contact Ken Keefe at kjkeefe@illinois.edu.

Fault-Tolerant Multiprocessor System[edit]

This section presents an example of a system that can be modeled using Möbius. It starts with a description of the system, and then guides you through one way to build a model of the system and solve it using both simulation and numerical solution. The example is intended to take you step-by-step through the process of creating and solving a model in Möbius, and to exhibit many of the capabilities and features of the tool.


System Description[edit]

The system under consideration is a highly redundant fault-tolerant multiprocessor system adapted from [1] and shown in <xr id="fig:ex_multiproc" />. At the highest level, the system consists of multiple computers. Each computer is composed of 3 memory modules, of which 1 is a spare module; 3 CPU units, of which 1 is a spare unit; 2 I/O ports, of which 1 is a spare port; and 2 non-redundant error-handling chips.


<figure id="fig:ex_multiproc">

Multiproc.png


<xr id="fig:ex_multiproc" nolink />: Fault-tolerant multiprocessor system.
</figure>


Internally, each memory module consists of 41 RAM chips (2 of which are spare chips) and 2 interface chips. Each CPU unit and each I/O port consists of 6 non-redundant chips. The system is considered operational if at least 1 computer is operational. A computer is classified as operational if, of its components, at least 2 memory modules, at least 2 CPU units, at least 1 I/O port, and the 2 error-handling chips are functioning. A memory module is operational if at least 39 of its 41 RAM chips, and its 2 interface chips, are working.

Where there is redundancy (available spares) at any level of system hierarchy, there is a coverage factor associated with the component failure at that level. For example, following the parameter values used by Lee et al.[1], if one CPU unit fails, with probability 0.995 the failed unit will be replaced by the spare unit, if available, and the corresponding computer will continue to operate. On the other hand, there is also a 0.005 probability that the fault recovery mechanism will fail and the corresponding computer will cease to operate. <xr id="tab:ex_coverage" /> shows the redundant components and their associated fault coverage probability. Finally, the failure rate of every chip in the system, as in [1], is assumed to be 100 failures per billion hours1.

1 0.0008766 failures per year.


<figtable id="tab:ex_coverage">

<xr id="tab:ex_coverage" nolink />: Coverage probabilities.
Redundant Component Fault Coverage Probability
RAM Chip 0.998
Memory Module 0.95
CPU Unit 0.995
I/O Port 0.99
Computer 0.95
</figtable>


Getting Started[edit]

A model of the system in this example is included with the Möbius distribution. Refer to Section C.1 for instructions on installing the example models. You are encouraged to open the model and follow the detailed discussions of its various components in the sections below.

From the Möbius Project Manager window, click Project\toUnarchive. A dialog will present a list of archived projects in the project directory. Choose Multiproc-Paper and hit Unarchive. After the project has been successfully unarchived, you will be prompted to resave the project using Project\toResave. At the dialog, choose Multiproc-Paper again, hit Resave, and wait until all components have been built. The Multiproc-Paper project editor will appear as shown in Figure 3.1.


Atomic Models[edit]

To build a model for an entire system, begin by defining SAN submodels to repre- sent the failures of various components in the system.

The SAN submodel of the CPUs is called cpu_module and is shown in <xr id="fig:ex_sancpu" />. To open this model, click the Atomic tab in the project panel, and then double-click on cpu_module or right-click on it and select Open. The places named cpus and computer_failed represent the current state of the CPUs and the current state of the multiprocessor system, respectively. That is, the number of tokens in cpus represents the number of operational CPUs in a given computer. Likewise, the number of tokens in computer_failed indicates the number of computers that have failed in the system. To open any of these places, right-click on the place and select Edit. This will bring up the Place Attributes dialog, in which you can edit the Name of the place and the initial marking (number of tokens) of the place. Note that the Tokens field can be specified with either a constant or a global variable name. For example, the place cpus has been initialized with three tokens, as each computer consists of three CPU units.


<figure id="fig:ex_sancpu">

Ex sancpu.png


<xr id="fig:ex_sancpu" nolink />: SAN submodel of cpu_module.
</figure>


To create a new place, either click the blue circle icon in the toolbar or select Elements\toPlace from the menu. Then click where you would like the place to go in the editor. The Place Attributes dialog will appear, and you can edit the Name of the place as well as the initial marking of the place in the Tokens field, as described earlier. To delete a place, right-click on it and select Delete, and hit OK to confirm.

The places labeled ioports, errorhandlers, and memory_failed are also included in this model to aid in reducing the size of the state space for the overall system model by lumping as many failed states together as possible. Additional state lumping (beyond that provided by the reduced base model construction method) can be achieved because once a computer fails, there is no need to keep track of which component failure caused the computer failure. More specifically, because of the assumption that all internal components of the failed computer have failed, the states that represent a computer failure due to a failure of a CPU unit, a memory module, an I/O port, or an error-handling chip are combined into a single state. The marking of the combined state is reached by setting the number of tokens in each of the places cpus, ioports, and errorhandlers to zero, setting the number of tokens in memory_failed to 2, and incrementing the number of tokens in computer_failed.

The failure of a CPU unit corresponds to the completion of the timed activity cpu_failure. To open this activity, right-click on it and select Edit. This will bring up the Timed Activity Attributes dialog, in which you can edit the name of the activity and the distribution of its firing delay in the Time distribution function field. For this activity, the Exponential distribution should be selected. The activity completion rate is shown in <xr id="tab:ex_cpuact" />. This rate corresponds to six times the failure rate of a chip (each CPU unit consists of 6 non-redundant chips) times the number of operational CPU units in the computer.

If a spare CPU unit is available (i.e., cpus->Mark() == 3), three cases are associated with the activity completion, as designated in the Case quantity field. To define the case probabilities, click on the appropriate case number’s tab and type the expression in the box. The expression for the case probability can be a constant, a global variable, or a C++ statement returning a value, as in this example.

The first case represents a successful coverage of a CPU unit failure. If that case occurs, the failed CPU unit is replaced by the spare unit, and its corresponding computer continues to operate. The second case represents the situation in which a CPU unit failure occurs that is not covered, but the failure of its corresponding computer is covered. If that happens and a spare computer is available, the failed computer is replaced by the spare computer and the system continues to operate. However, if no spare computer is available, the multiprocessor system fails. The third case represents the situation in which neither the CPU failure nor the corresponding computer failure is covered, resulting in a total system failure.


<figtable id="tab:ex_cpuact">

<xr id="tab:ex_cpuact" nolink />: cpu_module activity time distributions.
{| class="wikitable"
! Activity !! Distribution
|-
| cpu_failure || <code>expon(0.0052596 * cpus->Mark())</code>
|}
</figtable>
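The rate expression in the table can be read as a small calculation. The following is a minimal sketch in plain C++, outside of Möbius; the helper name <code>cpuFailureRate</code> and the constants are hypothetical, and the per-chip rate is derived here by dividing the tabulated value by 6 rather than quoted from the text.

```cpp
#include <cassert>

// Rate of one CPU unit, taken from the cpu_failure entry in the table.
constexpr double kCpuUnitFailureRate = 0.0052596;

// Each CPU unit consists of 6 non-redundant chips, so the implied
// per-chip rate is one sixth of the unit rate (derived, not quoted).
constexpr int kChipsPerCpu = 6;
constexpr double kChipFailureRate = kCpuUnitFailureRate / kChipsPerCpu;

// Hypothetical helper: completion rate of cpu_failure for a given
// number of operational CPU units (the marking of place cpus).
double cpuFailureRate(int operationalCpus) {
    assert(operationalCpus >= 0);
    return kChipsPerCpu * kChipFailureRate * operationalCpus;
}
```

With all three CPU units operational the rate is three times the unit rate; after the spare has been used, it drops to twice the unit rate.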


On the other hand, if no spare CPU is available (i.e., cpus->Mark() == 2), then a CPU unit failure causes a computer failure. In this marking, two possible outcomes may result from the completion of activity cpu_failure. In the first, a spare computer is available, so that the computer failure can be covered. In the second, no spare computer is available, and system failure results. <xr id="tab:ex_cpucaseprob" /> shows the case numbers and the probabilities associated with each case for the activity cpu_failure. It is clear that the case probabilities are marking-dependent, since the coverage factors depend on the state of the system.


<figtable id="tab:ex_cpucaseprob">

<xr id="tab:ex_cpucaseprob" nolink />: cpu_module case probabilities for activities.
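{| class="wikitable"
! Case !! Probability
|-
! colspan="2" | cpu_failure
|-
| 1
|
<pre>
if (cpus->Mark() == 3)
     return(0.995);
else
     return(0.0);
</pre>
|-
| 2
|
<pre>
if (cpus->Mark() == 3)
     return(0.00475);
else
     return(0.95);
</pre>
|-
| 3
|
<pre>
if (cpus->Mark() == 3)
     return(0.00025);
else
     return(0.05);
</pre>
|}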
</figtable>
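The tabulated values are consistent with two coverage factors: 0.995 for a CPU unit failure and 0.95 for a computer failure. The sketch below is plain C++ rather than Möbius code; the constant and function names are hypothetical, and the two factors are inferred from the table rather than stated in this section. It reproduces the three case probabilities for both markings.

```cpp
#include <array>
#include <cassert>

// Assumed coverage factors, inferred from the case probability table.
constexpr double kCpuCoverage = 0.995;      // CPU unit failure is covered
constexpr double kComputerCoverage = 0.95;  // computer failure is covered

// Hypothetical helper: returns {case 1, case 2, case 3} probabilities
// of cpu_failure for the marking of place cpus (2 or 3).
std::array<double, 3> cpuFailureCaseProbabilities(int cpus) {
    assert(cpus == 2 || cpus == 3);
    if (cpus == 3) {
        // A spare CPU exists: the CPU failure itself may be covered.
        const double uncovered = 1.0 - kCpuCoverage;
        return {kCpuCoverage,
                uncovered * kComputerCoverage,
                uncovered * (1.0 - kComputerCoverage)};
    }
    // No spare CPU: the computer fails, and only computer coverage matters.
    return {0.0, kComputerCoverage, 1.0 - kComputerCoverage};
}
```

In either marking the three probabilities sum to 1, as case probabilities of a single activity must.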


The input gate Input_Gate1 is used to determine whether the timed activity cpu_failure is enabled in the current marking, and hence can complete. The cpu_failure activity is enabled only if at least 2 working CPU units are available and their corresponding computer and the system have not failed. <xr id="tab:ex_cpuig1" /> shows the enabling predicate and function associated with this gate.


<figtable id="tab:ex_cpuig1">

<xr id="tab:ex_cpuig1" nolink />: cpu_module input gate predicates and functions.
{| class="wikitable"
! Gate !! Enabling Predicate !! Function
|-
| Input_Gate1
|
<pre>
(cpus->Mark()>1) &&
(memory_failed->Mark()<2) &&
(computer_failed->Mark()<num_comp)
</pre>
| identity
|}
</figtable>
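To see how the enabling predicate behaves, it can be evaluated on a few markings outside of Möbius. The sketch below is plain C++; the struct <code>CpuModuleMarking</code> and the function <code>inputGate1Enabled</code> are hypothetical stand-ins for the places and the gate, with plain integers in place of <code>Mark()</code> calls.

```cpp
// Hypothetical stand-in for the places read by Input_Gate1.
struct CpuModuleMarking {
    int cpus;             // working CPU units in this computer
    int memory_failed;    // failed memory modules in this computer
    int computer_failed;  // failed computers in the whole system
};

// Mirrors the enabling predicate of Input_Gate1: cpu_failure can only
// complete while the computer and the system are still operational.
bool inputGate1Enabled(const CpuModuleMarking& m, int num_comp) {
    return (m.cpus > 1) &&
           (m.memory_failed < 2) &&
           (m.computer_failed < num_comp);
}
```

A computer that has been marked as failed (cpus set to 0 and memory_failed set to 2) can therefore no longer enable cpu_failure.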


The output gates OG1, OG2, and OG3 are used to determine the next marking based on the current marking and the case chosen when cpu_failure completes. They correspond to the different situations that arise because of the coverage or non-coverage of system components. <xr id="tab:ex_cpuog" /> lists the output gates and the function of each gate.


<figtable id="tab:ex_cpuog">

<xr id="tab:ex_cpuog" nolink />: cpu_module output gate functions.
{| class="wikitable"
! Gate !! Function
|-
| OG1
|
<pre>
if (cpus->Mark() == 3)
   cpus->Mark()--;
</pre>
|-
| OG2
|
<pre>
cpus->Mark() = 0;
ioports->Mark() = 0;
errorhandlers->Mark() = 0;
memory_failed->Mark() = 2;
computer_failed->Mark()++;
</pre>
|-
| OG3
|
<pre>
cpus->Mark() = 0;
ioports->Mark() = 0;
errorhandlers->Mark() = 0;
memory_failed->Mark() = 2;
computer_failed->Mark() = num_comp;
</pre>
|}
</figtable>
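The effect of the three output gates on a marking can be traced with a small stand-alone sketch. This is plain C++, not generated Möbius code; the struct <code>Marking</code> and the functions <code>og1</code>, <code>og2</code>, and <code>og3</code> are hypothetical, and each function returns the next marking instead of modifying places in place.

```cpp
// Hypothetical stand-in for the places written by the output gates.
struct Marking {
    int cpus;
    int ioports;
    int errorhandlers;
    int memory_failed;
    int computer_failed;
};

// Shared step of OG2 and OG3: lump every computer failure into one
// marking by treating all internal components as failed.
Marking lumpFailedComputer(Marking m) {
    m.cpus = 0;
    m.ioports = 0;
    m.errorhandlers = 0;
    m.memory_failed = 2;
    return m;
}

// OG1: covered CPU failure, the spare CPU unit takes over.
Marking og1(Marking m) {
    if (m.cpus == 3) m.cpus--;
    return m;
}

// OG2: covered computer failure, one more computer has failed.
Marking og2(Marking m) {
    m = lumpFailedComputer(m);
    m.computer_failed++;
    return m;
}

// OG3: uncovered failure, the whole system is marked as failed.
Marking og3(Marking m, int num_comp) {
    m = lumpFailedComputer(m);
    m.computer_failed = num_comp;
    return m;
}
```

Because OG2 and OG3 write the same values to cpus, ioports, errorhandlers, and memory_failed, every failed computer ends up in the same local marking regardless of which component caused the failure.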


In a SAN model, relationships between elements are designated by connecting lines or arcs. For example, places and input gates may be connected to an activity to indicate they are enabling conditions for the activity. An activity (or one of its cases) may be connected to a place or an output gate to indicate that upon completion of the activity, the marking of the place is affected or the output gate function is executed. It is not necessary to connect an output gate to a place whose marking the output gate function changes. Such a connection exists only to ease understanding of the model.

To draw a connecting line or arc, choose Straight Connection, Connected Line, or Spline Curve from the Elements menu. To connect two model elements using the first option, click on the first element and then click on the second element to draw a straight line between them. Using the second or third option, click on the first element, then click on one or more points between the two elements, and finally click on the second element. The Connected Line option will connect the two elements by linear interpolation of all user-defined points between them. The Spline Curve option is similar, but will connect the two elements with a smooth curve.

The order in which the two elements are clicked is important, since the arcs, although drawn as undirected edges, are actually specified in a directed manner. For instance, to connect an input gate to an activity, the arc must be drawn from the input gate to the activity, and not vice versa. Also, there are some combinations of elements that cannot be connected, such as one place with another place or an input gate with an output gate.

Another way to model the failure of CPU modules would be to model the failure of a single CPU module as a SAN and replicate this model three times. However, since the failure of any chip inside the CPU module causes the CPU to fail, and each chip is assumed to have an exponentially distributed time to failure, the failure rate of one CPU module is just the sum of the failure rates of the 6 CPU chips. Therefore, modeling the failure of one CPU module, and then replicating this model three times, results in a model that is equivalent to the cpu_module submodel described above. Both approaches will generate the same number of states. In contrast, a significant state space reduction can be achieved by modeling one memory module as a SAN and replicating this model three times, instead of modeling the failure of the three memory modules in one SAN. The reason is that the failure of a single RAM chip does not cause the memory module to fail, so a memory module cannot be modeled as a single entity.
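The claim that the CPU module rate is the sum of the chip rates follows from a short calculation. The sketch below assumes, in addition to the exponential assumption above, that the 6 chips fail independently; the symbols <math>T_i</math> (time to failure of chip <math>i</math>), <math>T_{\mathrm{cpu}}</math>, and <math>\lambda</math> (per-chip failure rate) are introduced here for illustration only.

```latex
\begin{align*}
P(T_{\mathrm{cpu}} > t)
  &= P\Bigl(\min_{1 \le i \le 6} T_i > t\Bigr)
   = \prod_{i=1}^{6} P(T_i > t) \\
  &= \prod_{i=1}^{6} e^{-\lambda t}
   = e^{-6\lambda t}
\end{align*}
```

The time to failure of one CPU module is therefore again exponentially distributed, with rate <math>6\lambda</math>, which is why a single activity with rate <math>6\lambda</math> times the number of operational CPU units can represent all units at once.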

The SAN submodels of the I/O ports, the memory module, and the two error-handling chips are shown in <xr id="fig:ex_sanio" />, <xr id="fig:ex_sanmem" />, and <xr id="fig:ex_sanerror" />, respectively. The line of reasoning followed in modeling each of these components is similar to that followed in modeling the CPU modules. Note the similarity between the io_port_module and cpu_module SANs. A more detailed discussion of creating SAN models can be found in Section 4.1 of Building Models.


<figure id="fig:ex_sanio">

Ex sanio.png


<xr id="fig:ex_sanio" nolink />: SAN submodel of io_port_module.
</figure>



<figure id="fig:ex_sanmem">

Ex sanmem.png


<xr id="fig:ex_sanmem" nolink />: SAN submodel of memory_module.
</figure>



<figure id="fig:ex_sanerror">

Ex sanerror.png


<xr id="fig:ex_sanerror" nolink />: SAN submodel of the errorhandlers.
</figure>


Composed Model

Now the replicate and join operations previously defined (see Section 5.1 of Building Models) are used to construct a complete composed model from the atomic models. <xr id="fig:ex_composed" /> shows the multi_proc composed model for the multiprocessor system. To open this model, click the Composed tab in the project panel, and then either double-click on multi_proc or right-click on it and select Open.


<figure id="fig:ex_composed">

Ex composed.png


<xr id="fig:ex_composed" nolink />: Composed model multi_proc.
</figure>



References

  1. D. Lee, J. Abraham, D. Rennels, and G. Gilley. A numerical technique for the evaluation of large, closed fault-tolerant systems. In Dependable Computing for Critical Applications, pages 95–114. Springer-Verlag, Wien, 1992.