Symbolic State Space Generator

From Mobius Wiki
Jump to: navigation, search

Symbolic State Space Generator

The symbolic state space generator (symbolic SSG) is a component that was added to the Möbius tool for version 1.6.0. In this chapter, we first introduce this component, discuss its advantages and limitations, and describe how to use it in an efficient way. Then, we will explain how to fill up each of the input parameters of its corresponding GUI editor. The different phases of the state space generation will be briefly described. Finally, we will provide instructions on how to run the symbolic SSG from the command line. For more details on the inner workings of the symbolic SSG, please see [1] [2].

Introduction

In solving a model using numerical solution, a modeler must first run a state space generator on the model to build a representation of the stochastic process underlying the model, and then use a numerical solver to compute the measures (s)he is interested in.

A numerical solution algorithm needs data structures to represent two essential entities in the memory: the stochastic process (input) and one or more probability distribution vectors (output). The stochastic process is generated by a state space generator. In this chapter, we only consider stochastic processes that are continuous time Markov chains (CTMCs), and can therefore be represented by state transition rate matrices whose entries are the rates of exponentially distributed timed transitions from one state to another.

In many numerical solvers, sparse matrix representation and arrays are respectively used to represent the process and the probability vector(s). In such a design of data structures, the stochastic process represented by a sparse matrix almost always takes more memory space than the probability vectors.

A problem arises when the model under study is so large that the memory requirement of the two data structures exceeds the amount of memory available on a computer. For example, to analyze a model with ten million states for a specific point of time using the transient solver (TRS) and the data structures mentioned above, at least 10^7 \times 10 \times 8 bytes = 800 MB to 10^7 \times 20 \times 8 bytes = 1600 MB (depending on the model) of RAM are needed, assuming that each state leads to 10 to 20 other (successor) states and each floating-point number requires 8 bytes of storage.

That problem has stimulated a lot of research looking for more “practical” data structures, i.e., more compact (with smaller space requirements) data structures that do not incur too much time overhead. The most practical data structure that virtually all numerical solvers (including ones in Möbius) use to represent probability distribution vectors is simply an array. Efforts by researchers to design better ones have failed so far.

However, there has been a lot of successful research on design of data structures for the representation of the state transition matrix. In particular, symbolic data structures (hence the name symbolic SSG) are widely used nowadays to store state transition matrices in a very compact manner. As opposed to sparse matrix representation, symbolic data structures do not have space requirements proportional to the number of nonzero entries. In virtually all cases, they take up orders of magnitude less space than their sparse representation counterparts. This extremely low space requirement enables the modeler to numerically solve models that are much larger than was previously possible. For example, to do the same analysis we mentioned above, we only need about 100 MB of RAM using symbolic data structures. Because of the concept behind symbolic data structures and the way they are constructed, we see their benefits only in composed models, especially those consisting of atomic models of appropriate sizes; more information on this will be provided later in the section.

The major price a modeler has to pay is overhead in speed. Based on our measurements, when the symbolic SSG is used to generate the state space, numerical solution is expected to be about 3 to 10 times slower than it would be for the flat SSG. Notice that slowdown is not related to the state space generation itself. In fact, in virtually all cases, the symbolic SSG is faster than the flat SSG; in some cases it may be several times faster.

The symbolic SSG utilizes three types of symbolic data structures to represent the required entities. It uses MDDs (Multi-valued Decision Diagrams) for the set of reachable states, MxDs (Matrix Diagrams) for the state transition rate matrices, and MTMDDs (Multi-Terminal MDDs) for the values of the performance variables in each reachable state.

The symbolic SSG utilizes two types of lumping algorithm. Lumping is a technique in which states of a CTMC that are behaviorally “equivalent” are lumped together and replaced by a single state. Therefore, the number of states of a “lumped” CTMC is always equal to or smaller than the number of states of the original CTMC. The performance variables computed from a lumped CTMC are exactly the same as the performance variables computed from the original one. That means that we save on both time and space if we solve a smaller lumped CTMC rather than the original one. For example, models that consist of a number of equal parts or components have a potential for lumping that reduces the size of the state space significantly.

The first lumping technique is called model-level lumping, in which the symmetry of a Replicate/Join model is exploited as explained in Section 5.1.1 of Building Models to reduce the size of the state space. That lumping technique has been part of the symbolic SSG from the beginning.

The second lumping technique that has been added to symbolic SSG in version 1.8.0 is called compositional lumping. In compositional lumping, the lumping algorithm is applied to individual components (atomic models in the Möbius context) of a compositional model, whereas in state-level lumping the algorithm is applied to the overall CTMC generated from the model. In fact, that is the main advantage of the compositional lumping technique over state-level lumping; it is applied on CTMCs of components that are much smaller than the CTMCs of the whole model. That advantage means that, in general, compositional lumping algorithms require less time and memory space for their operation. Möbius supports ordinary compositional lumping. See [2] for more information.

To summarize, the symbolic SSG uses compact data structures to represent the state transition rate matrix of Markov chains such that it devotes most of the available RAM to the storage of the probability distribution vector(s). If possible, it uses model-level and compositional lumping algorithms to reduce the size of the state space of the CTMC that needs to be solved in order to compute the desired performance variables. That makes it possible for the modeler to numerically analyze models that are orders of magnitude larger than was previously possible using flat SSG and sparse matrix representation.

Limitations

As we mentioned above, speed is the major price we pay in using symbolic data structures. Moreover, there are a number of conditions that have to be satisfied before we can use the symbolic SSG:

  • The stochastic process underlying a model has to be Markov. Therefore, only exponential and immediate (i.e., zero-timed) transitions are allowed.
  • If the model under study is a composed one, immediate transitions must not affect or be affected by shared state variables.
  • The reward model must not have impulse reward definitions.
  • The starting state has to be stable (not vanishing), i.e., no immediate transition should be enabled in the starting state.

If any of the above conditions is not met for a model, the symbolic SSG is not usable, and its GUI shows a message indicating the unmet condition(s).

Finally, when using the iterative steady state solver, it is not possible to use the SOR (Successive Over Relaxation) variant with state spaces that are generated using symbolic SSG. Instead, the Jacobi variant should be used. The reason is that the symbolic data structure constructed by the SSG does not provide the elements of the state transition matrix in any specific order (see Section 4.3.3).

Tips on efficient use

The time and space efficiency of the symbolic SSG can vary widely depending on the way the model is built. We have observed up to one order of magnitude of speed difference among a number of composed models that result in the same underlying Markov chain. Theoretically speaking, even larger differences are possible. Here are two guidelines that will help modelers construct their models in a way that increases the performance of the symbolic SSG:

  • Do not use symbolic SSG for atomic models, especially large atomic models. Not only does it not give any better performance, it drastically reduces both time and space efficiency of state space generation and numerical analysis. For those types of models, simply use the flat state space generator.
  • Try to decompose the complete model into atomic models such that each of them has roughly 3 to 50 states. In the general case, it is impossible to estimate how many states an atomic model has. However, the modeler can sometimes make an estimation. Above 50 states, the larger the atomic models’ state spaces are, the more the state space generation performance will be negatively affected.

Parameters

The SSG Info tab (<xr id="fig:symssg_info" />) is presented when you open the interface, and allows you to specify options and edit input to the symbolic state space generator.


<figure id="fig:symssg_info">

Symssg input.png


<xr id="fig:symssg_info" nolink />: Symbolic state space generator input.
</figure>


  • The Study Name text box specifies the name of the child study. This box will show the name of the study you identified as the child when you created the symbolic SSG. You can change the child study by typing a new name in the box or clicking the Browse button and selecting the solver child from the list of available studies.
  • The Experiment List box displays the list of active experiments in the study, and will be updated in response to any changes made through the Experiment Activator button.
  • The Experiment Activator button can be used to activate or deactivate experiments defined in the child study, and provides the same functionality as the study editor Experiment Activator described in Section 7.3 of Building Models.
  • The Run Name text box specifies the name for a particular run of the symbolic state space generator. This field is used to name trace files and output files (see below), and defaults to “Results” when a new symbolic SSG is created. If no run name is given, the symbolic SSG name will be used.
  • The Build Type pull-down menu is used to choose whether the generator should be run in Optimize or Normal mode. In normal mode, the model is compiled without compiler optimizations, and you can specify various levels of output by changing the trace level, as explained below. This mode is useful for testing or debugging a model, as a text file named <Run Name>_Exp<x>_trace.txt will be created for each experiment and contain a trace of the generation. In optimize mode, the model is compiled with maximum compiler optimizations, and all trace output is disabled for maximum performance. Running the generator will write over any data from a previous run of the same name, so change the Run Name field if you wish to save output/traces from an earlier run.
  • The Trace Level pull-down menu is enabled only when Normal mode is selected as the Build Type. This feature is useful for debugging purposes, as it allows you to select the amount of detail the generator will produce in the trace. The two options for Trace Level are:
Level 0: None    The processing time for each phase of the generation is printed out (see the next section for more on generation phases). Moreover, the numbers of unlumped and lumped states are also shown. The ouput is the same as it would be if optimize mode were selected.
Level 1: Verbose    This includes everything printed in Level 0, plus detailed information about the symbolic data structures, such as the number of levels, the size of the nodes in each level, the final number of nodes, the maximum number of nodes during the runtime, and the number of firings of local and global actions.
The symbolic SSG does not produce a great deal of debugging information; therefore, you will not notice much difference between the execution times of the two different trace levels.
  • The numbers entered in the MDD Unique Table Size and MxD Unique Table Size text boxes are used by the MDD and MxD data structures. They are the sizes of the hash tables used to store all the unique nodes of the MDD and MxD data structures, respectively. Their default values are 50,000 and 10,000, and their minimum values are 5,000 and 1,000, respectively.
There is no clear-cut set of rules on how to set these numbers. The default values should be appropriate for most models. For extremely large models for which the state space generation takes much more than an hour, double or quadruple the default values. Increasing the values above a certain value deteriorates the performance of the SSG algorithm.
  • The Enable Compositional Lumping checkbox, when selected, activates the compositional lumping algorithm, which for some models (but not all of them) reduces the size of the CTMC that needs to be solved. Although enabling the algorithm increases the running time of the state space generation, in some cases, it leads to significant reduction in the numerical solution time.
  • The Place Comments in Output checkbox, when selected, will put the user-entered comments in the generator output/trace. If the box is not checked, no comments will appear, regardless of whether any have been entered.
  • The Edit Comments button allows you to enter your own comments for a run of the generator. Clicking this button brings up the window shown in <xr id="fig:ssg_commentbox" />. Type any helpful comments in the box and hit OK to save or Cancel to discard the comments. The Clear button will clear all the text from the comment box.

Generation

Clicking the Start State Space Generation button switches the view to the SSG Output tab and begins the process of generating the state space. Snapshots of the output window in different phases of the generation appear in <xr id="fig:symssg_output" />.

<figure id="fig:symssg_output">

<subfigure>Symssg output 1.png</subfigure> <subfigure>Symssg output 2.png</subfigure>
(a) Just after starting. (b) Performing phase 6 (compositional lumping is enabled).
</figure>
<xr id="fig:symssg_output" nolink />: Symbolic state space generator output.


As for the flat state space generator, the generation begins by compiling the models4 and building the generator. For each active experiment, the values of the global variables are printed.

4 Here models refers to the atomic/composed models, performance variable, and study.

The state space is then generated for the experiment in six phases if compositional lumping is enabled, and in five phases otherwise. Two progress bars at the bottom of the dialog show the amount of completed work. The Overall progress bar indicates which phase is in progress and the overall progress of the generation, and the Phase progress bar indicates the progress of each phase of the generation. The amount of time taken to complete each phase is printed after the phase is finished. At any time, you may use the Stop button to terminate the run. A text file named <Run Name>_output.txt will be created. It will mirror the data output in the GUI.

The 6 phases are:

  1. The unlumped state space is generated in the form of an MDD data structure in a number of iterations (see Section 5.1.1 of Building Models for definitions of lumped and unlumped state spaces). The total number of iterations depends on the model and is not known in advance. After each iteration is complete, the Phase progress bar shows the iteration number and the number of unlumped states that have been explored so far. When this phase is completed, the total number of explored unlumped states is printed. Phases 1, 3, and 6 are often the most time-consuming phases of the generation process.
  2. The MxD data structure representing the state transition rate matrix of the unlumped CTMC is built in this phase. This phase is usually completed very quickly.
  3. In this phase and the following one, lumping algorithms are applied on the state space to reduce the size of the CTMC that needs to be solved to compute the desired performance variables. In this phase, the compositional lumping algorithm is applied on the unlumped state space. The size of the state space lumped by the compositional lumping algorithm is printed out at the end of the step. This phase is performed in two subphases. In the first one, the initial partition for each atomic model is computed based on the values of the performance variables. In the second subphase, which is based on an efficient partition-refinement algorithm[2], the initial partition is refined repeatedly to compute the lumped state space for each atomic model. Virtually for all models, the second subphase is much faster than the first subphase.
  4. In this phase, the model-level lumping induced by Replicate/Join operators is performed. The size of the state space after compositional and model-level lumping is printed out at the end of this step. This phase usually takes the least amount of time of all the phases.
  5. For each of the performance variables defined on the model, one MTMDD symbolic data structure is built. This data structure gives the value of that performance variable for each state of the lumped state space.
  6. The “mapping” MDD, a special type of the MDD data structure, is constructed in this phase[1]. This phase has two subphases. In the first, the set of states of the mapping MDD is computed; during this process, the progress bars are not updated because the cardinality of the set is not known yet. In the second subphase, the mapping MDD construction actually starts. While the mapping MDD is being built, the Phase progress bar shows the number of states of the mapping MDD built so far, the total number of states of the mapping MDD, and the fraction of the phase that has been completed. The Overall progress bar is also updated accordingly.

Command line options

The symbolic state space generator is usually run from the GUI. However, since the models are compiled and linked with the generator libraries to form an executable, the generator can also be run from the command line (e.g., from a system shell/terminal window). The executable is found in the directory

<Möbius Project Root>/<Project Name>/Solver/<Solver Name>/

and is named <Solver Name>SymGen_<Arch>[_debug], where Arch is the system architecture (Windows, Linux, or Solaris) and _debug is appended if you build the symbolic state space generator in normal mode. The command line options and their arguments are as follows:

-P   experimentfilepath

Same as the flat state space generator. For more details, refer to Section 2.2.3.


-N   experimentname

Same as the flat state space generator. For more details, refer to Section 2.2.3.


-B   experimentnumber

Generate the state space for the experiment numbered experimentnumber, with the first experiment having number 0. For example, use “-B1” to run the second experiment (e.g., Experiment_2). Defaults to 0.


-c   

Enables the compositional lumping algorithm, which is disabled by default.


-e, -s   db_entity, db_macserver

These are internal options for interaction with the database. You can safely ignore them.


-l   trace_level[0-1]

Use trace_level as the trace level described above. Only applies when you are running the debug executable.


-t   tracefilename

Create a trace file named tracefilename as output. If this option is not specified, output is directed to the standard output stream.


-d   MDD_Utable_size, union_cache_size, local_exploration_cache_size, MxD_Utable_size, submat_cache_size, add_cache_size

These are the different parameters for the symbolic data structures. They should follow the -d option in the form of a comma-separated list of non-negative numbers. The numbers in parentheses are the default value and the minimum value enforced by the symbolic SSG.
  • MDD_Utable_size   The size of the unique table of the MDD (50,000 and 1,000)
  • union_cache_size   The cache size of the union operation of the MDD (10,000 and 500)
  • local_exploration_cache_size   The cache size of the local state space exploration operation of the MDD (10,000 and 500)
  • MxD_Utable_size   The size of the unique table of the MD (10,000 and 500)
  • submat_cache_size   The cache size of the submatrix operation of the MD (5,000 and 100)
  • add_cache_size   The cache size of the add operation of the MD (5,000 and 100)
This command line option is optional. If it is not given, the default values will be used. If the option is present and the value for a parameter is zero or not given (i.e., there is no character between two consecutive commas) the default value for that parameter is used.


Example:    sssgSymGen_Linux_debug -P. -B0 -l1 -ttrace.txt
-d 50000,,10000,0,5000,5000

  • <Solver Name> is sssg
  • <Arch> is Linux

This command will generate the state space for the first experiment (Experiment_1) with trace level 1, place the experiment files in the current directory, and create a trace file named trace.txt. All parameters for the symbolic data structures except union_cache_size and MD_Utable_size are given by the user via the -d option. For those two parameters the default values are used.

References

  1. 1.0 1.1 S. Derisavi, P. Kemper, and W. H. Sanders. Symbolic state-space exploration and numerical analysis of state-sharing composed models. Linear Algebra and Its Applications, 386:137–166, 15 July 2003.
  2. 2.0 2.1 2.2 S. Derisavi, P. Kemper, and W. H. Sanders. Lumping matrix diagram representations of Markov chains. In DSN/PDS, Yokohama, Japan, June/July 2005.