Data Provider

The ROAST data provider (randomization engine) is a data generator based on the Mimesis open source library.

The generated data can be used to define a range of values or a set of discrete values for a parameter of a component. Examples include:

  • Clock divider where the value can be from 3 to 7

  • Bus data width where the value can be 32, 64, or 128 bits

The functions supplied by this engine are bound to the Python random module. Although the examples shown are numeric, the base functions also accept arbitrary objects as sequence elements. For example, the shuffle method can randomize the order of objects within a sequence.

Randomizer class

The Randomizer class is the base provider. The functionality includes:

Initialize random generator with seed

An optional seed can be provided for reproducible randomization. If seed is omitted or None, the current system time is used.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer(seed=12345)
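
Because the provider's functions are bound to the Python random module, seeding should behave as it does in the standard library. The sketch below does not use ROAST; it relies only on random.Random to illustrate that two generators created with the same seed produce the same sequence.

```python
import random

# Two independent generators created with the same seed
first = random.Random(12345)
second = random.Random(12345)

draws_first = [first.randint(3, 7) for _ in range(5)]
draws_second = [second.randint(3, 7) for _ in range(5)]

# Identical seeds yield identical sequences, so a failing run can be replayed
assert draws_first == draws_second
```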

Disable randomization

In some scenarios, randomization may need to be disabled. Randomization is enabled by default and can be disabled at the global or parameter level. To disable, set randomize to False. When disabled, the default value will be returned.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer(randomize=False)

To disable at the parameter level, set the randomize property to False when loading parameter values into the data provider. Both default and randomize properties are discussed in the section, Load provider with JSON data file.

Random module

The random module can be accessed directly through the class for methods such as the distribution functions.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.random.random()
0.27405095889396036
>>> randomizer.random.uniform(10, 100)
23.291595105628538

Load provider with JSON data file

A JSON data file can be loaded to establish a database of parameters and possible values.

parameters.json
 {
     "ip": {
         "attribute1": {
             "default": 64,
             "elements": [32, 64, 128, 256, 512],
             "excluded": [64, 128]
         },
         "attribute2": {
             "default": 127,
             "range": [0, 255],
             "excluded": [35, 36, 37]
         },
         "attribute3": {
             "default": 1.4,
             "range": [0, 2, 0.2]
         },
         "attribute4": {
             "default": 18,
             "range": [10, 20],
             "randomize": false
         },
         "attribute5": {
             "range": [20, 30],
             "replace": false
         },
         "attribute6": {
             "default": "0X08",
             "elements": ["0x02", "0o10", "0b10000"],
             "format": "#010b"
         },
         "attribute7": {
             "default": "0x0C",
             "range": ["0X00", "0O20", "0B10"],
             "format": "#04x"
         },
         "attribute8": {
             "default": "1.000000e+03",
             "range": ["8.000000e+02", "1.200000e+03", 50],
             "format": "e"
         },
         "attribute9": {
             "range": [1, 100]
         },
         "attribute10": {
             "range": [1, 100],
             "preset": "LOW_HEAVY"
         },
         "attribute11": {
             "range": [1, 100],
             "shape": [1, 10]
         }
     }
 }

For each attribute, there are nine properties that can be defined:

  1. elements - Sequence of discrete values.

  2. range - Sequence of range parameters: start, stop, and step.

    • If one value is provided, it is the stop value. It assumes that start is 0 and step is 1.

    • If two values are provided, it is the start and stop values. It assumes step is 1.

  3. excluded - Sequence of values to never return.

  4. default - Value returned if the global randomize or parameter randomize is set to False.

  5. format - Defines how string values are presented. See Format Specification Mini-Language for details. To specify hex, octal, or binary formatted strings, use the alternative format starting with #.

  6. randomize - Boolean to disable randomization for the specific parameter.

  7. replace - Boolean to determine whether sampling is with or without replacement. This setting will override the global replace setting.

  8. preset - Weighted randomization preset used to shape the generated data. The available presets are LOW_HEAVY, HIGH_HEAVY, NORMAL, INVERSE_NORMAL, and EXTREME_LIMITS.

  9. shape - Sequence of shape parameter values for the randomization distribution function. This is for the Alpha and Beta values of the Beta distribution.
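
As a rough illustration of how the range and excluded properties combine, the sketch below expands a range specification into its candidate values. The helper expand_range is hypothetical and is not ROAST's implementation. It assumes the stop value is inclusive, which matches the outputs shown in this document (for example, [20, 30] yielding 20 through 30).

```python
from decimal import Decimal


def expand_range(spec, excluded=()):
    """Expand [stop], [start, stop] or [start, stop, step] into a list of values.

    Hypothetical helper; the stop value is treated as inclusive (an assumption).
    """
    if len(spec) == 1:
        start, stop, step = 0, spec[0], 1
    elif len(spec) == 2:
        start, stop, step = spec[0], spec[1], 1
    else:
        start, stop, step = spec

    # Decimal arithmetic avoids float drift such as 0.6000000000000001
    d_start, d_stop, d_step = (Decimal(str(v)) for v in (start, stop, step))
    count = int((d_stop - d_start) / d_step)

    # Keep integers as int; anything involving a float is returned as float
    cast = int if all(isinstance(v, int) for v in (start, stop, step)) else float
    values = [cast(d_start + i * d_step) for i in range(count + 1)]
    return [v for v in values if v not in excluded]


print(expand_range([3]))                            # [0, 1, 2, 3]
print(expand_range([0, 1, 0.25]))                   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(len(expand_range([0, 255], [35, 36, 37])))    # 253
```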

Warning

  • Either elements or range must exist or an exception will be raised. Both should not be used in the same attribute. If both exist, elements will have priority.

  • Both preset and shape should not be used for the same attribute. If both exist, preset will have priority.

When a JSON file is specified, it is read and stored into parameters.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.parameters
<Box: {'ip': {'attribute1': {'default': 64, 'elements': [32, 64, 128, 256, 512], 'excluded': [64, 128]}, 'attribute2': {'default': 127, 'range': [0, 255], 'excluded': [35, 36, 37]}, 'attribute3': {'default': 9, 'range': [0, 127]}, 'attribute4': {'default': 127}, 'attribute5': {'default': 2, 'range': [0, 10, 2], 'excluded': [4, 6]}, 'attribute6': {'default': 14, 'range': [20]}, 'attribute7': {'default': 1.4, 'range': [0, 2, 0.2]}, 'attribute8': {'default': -1.4, 'range': [0, -2, -0.2]}, 'attribute9': {'default': -1.6, 'range': [-2, -1, 0.2]}, 'attribute10': {'default': 18, 'range': [10, 20], 'randomize': False}, 'attribute11': {'range': [20, 30], 'replacement': False}, 'attribute12': {'default': 8, 'elements': [2, 8, 16], 'format': '#010b'}, 'attribute13': {'default': 12, 'range': [0, 16, 2], 'format': '#04x'}, 'attribute14': {'default': 1000.0, 'range': [800.0, 1200.0, 50], 'format': 'e'}, 'delay_500': {'default': 52, 'range': [20, 100]}, 'delay_501': {'default': 52, 'range': [20, 100]}, 'delay_502': {'default': 52, 'range': [20, 100]}, 'delay_503': {'default': 52, 'range': [20, 100]}, 'ramp_500': {'default': 2, 'range': [1, 30]}, 'ramp_501': {'default': 2, 'range': [1, 30]}, 'ramp_502': {'default': 2, 'range': [1, 30]}, 'ramp_503': {'default': 2, 'range': [1, 30]}}}>

To retrieve default values, such as that of ip.attribute1, disable randomization:

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer(randomize=False)
>>> randomizer.datafile = "parameters.json"
>>> randomizer.get_value("ip.attribute1")
64
>>> randomizer.get_value("ip.attribute6")
'0b00001000'
>>> randomizer.get_value("ip.attribute7")
'0x0c'
>>> randomizer.get_value("ip.attribute8")
'1.000000e+03'

Note

  • Internally, string values are converted to float. Hex, octal, and binary strings are converted to int.

  • Changed in version 4.0: values replaced with elements due to library conflict.

  • Added in version 4.0: format, randomize, replace, preset, and shape properties.
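
The conversion and formatting described above can be pictured with built-in Python functions only. The helpers parse_value and present are hypothetical names used for illustration, not part of ROAST. The sketch assumes that prefixed literals are parsed with int(value, 0) and that the format property is passed to format().

```python
def parse_value(value):
    """Convert a string to a number: prefixed literals to int, others to float."""
    if isinstance(value, str):
        if value[:2].lower() in ("0x", "0o", "0b"):
            return int(value, 0)  # base 0 infers the base from the prefix
        return float(value)
    return value


def present(value, fmt=None):
    """Apply an optional format specification to a parsed value."""
    number = parse_value(value)
    return format(number, fmt) if fmt else number


print(present("0X08", "#010b"))        # 0b00001000
print(present("0x0C", "#04x"))         # 0x0c
print(present("1.000000e+03", "e"))    # 1.000000e+03
```

In the alternative form (#), the field width includes the two-character prefix, which is why "#010b" yields eight binary digits.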

Load provider through configuration

The randomization parameters can be loaded through configuration. The same properties defined in the previous section, Load provider with JSON data file, are used. To load through configuration, the parameters will need to be defined within a Box so that dot-based keys can be used to traverse the dictionary. After the configuration is generated, store it into parameters. There are two methods to define through configuration.

Method 1 (nested dictionary):

conf.py
from box import Box

parameters = Box(
    {
        "ip": {
            "attribute1": {
                "default": 64,
                "elements": [32, 64, 128, 256, 512],
                "excluded": [64, 128]
            },
            "attribute2": {
                "default": 127,
                "range": [0, 255],
                "excluded": [35, 36, 37]
            },
            "attribute3": {
                "default": 1.4,
                "range": [0, 2, 0.2]
            },
            "attribute4": {
                "default": 18,
                "range": [10, 20],
                "randomize": False
            },
            "attribute5": {
                "range": [20, 30],
                "replace": False
            }
        }
    },
    box_dots=True,
)

del Box

Method 2 (dot-based):

conf.py
from box import Box

parameters = Box(default_box=True, box_intact_types=[list, tuple])
parameters.ip.attribute1.default = 64
parameters.ip.attribute1.elements = [32, 64, 128, 256, 512]
parameters.ip.attribute1.excluded = [64, 128]
parameters.ip.attribute2.default = 127
parameters.ip.attribute2.range = [0, 255]
parameters.ip.attribute2.excluded = [35, 36, 37]
parameters.ip.attribute3.default = 1.4
parameters.ip.attribute3.range = [0, 2, 0.2]
parameters.ip.attribute4.default = 18
parameters.ip.attribute4.range = [10, 20]
parameters.ip.attribute4.randomize = False
parameters.ip.attribute5.range = [20, 30]
parameters.ip.attribute5.replace = False

del Box

With either method, the configuration can be generated through the Layered Configuration System. If using pytest, use the fixture for Configuration Generation.

>>> from roast.providers.randomizer import Randomizer
>>> from roast.confParser import generate_conf
>>> randomizer = Randomizer()
>>> config = generate_conf()
>>> randomizer.parameters = config.parameters
>>> randomizer.get_value("ip.attribute1")
256

Note

New in version 4.0.
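
A dot-based key such as "ip.attribute1" addresses a path through the nested parameters. As a rough illustration using plain dictionaries instead of Box, the hypothetical helper lookup below walks one level per dot-separated segment. It is a sketch of the idea, not how Box or ROAST implement it.

```python
from functools import reduce


def lookup(parameters, dotted_key):
    """Walk a nested dict one level per dot-separated segment (hypothetical helper)."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), parameters)


parameters = {"ip": {"attribute1": {"default": 64, "elements": [32, 64, 128, 256, 512]}}}

print(lookup(parameters, "ip.attribute1.default"))   # 64
```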

Boolean choice

This will randomly return True or False.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.boolean()
False
>>> randomizer.boolean()
True

All possible values

This does not perform any randomization. It is a helper function that returns all possible values with the excluded values removed.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.get_all_values("ip.attribute1")
[32, 256, 512]
>>> randomizer.get_all_values("ip.attribute8")
['8.000000e+02', '8.500000e+02', '9.000000e+02', '9.500000e+02', '1.000000e+03', '1.050000e+03', '1.100000e+03', '1.150000e+03', '1.200000e+03']

Single random value

This will return a random element.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.get_value("ip.attribute1")
256
>>> randomizer.get_value("ip.attribute7")
'0x10'

Sequence of random values (weighted)

A sequence of choices can be randomly generated from a sequence of values. There are three options:

  1. Length - To define how many elements are in returned sequence.

  2. Weights - To define which elements should be selected more often. The weights can be relative.

  3. Unique - To define if any selected elements can be repeated.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> attribute3 = randomizer.get_all_values("ip.attribute3")
>>> attribute3
[0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
>>> randomizer.choices(attribute3, length=4)
[0.4, 1.0, 1.0, 1.2]
>>> randomizer.choices(attribute3, length=4, unique=True)
[0.6, 0, 0.8, 1.8]
>>> weights = [10, 1, 1, 1, 10, 10, 1, 1, 1, 1, 10]
>>> randomizer.choices(attribute3, weights, length=4)
[0.4, 0, 1.0, 1.0]
>>> randomizer.choices(attribute3, weights, length=4, unique=True)
[2.0, 1.0, 0.8, 0]
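
The three options can be sketched with the standard library alone. The function weighted_choices below is a hypothetical stand-in, not ROAST's choices method. For the unique case it assumes a simple draw-and-remove scheme for weighted sampling without replacement.

```python
import random


def weighted_choices(items, weights=None, length=1, unique=False, rng=None):
    """Pick `length` items, optionally weighted and optionally without repeats."""
    rng = rng or random.Random()
    if not unique:
        return rng.choices(items, weights=weights, k=length)
    if length > len(items):
        raise ValueError("length exceeds the number of unique items")

    # Weighted sampling without replacement: draw one item, then remove it
    pool = list(items)
    pool_weights = list(weights) if weights is not None else [1] * len(pool)
    picked = []
    for _ in range(length):
        index = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        picked.append(pool.pop(index))
        pool_weights.pop(index)
    return picked


values = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
weights = [10, 1, 1, 1, 10, 10]   # relative weights, they need not sum to 1
sample = weighted_choices(values, weights, length=4, unique=True, rng=random.Random(1))
assert len(set(sample)) == 4      # no element is repeated
```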

Sequence of random values (essential)

A sequence of choices can be randomly generated from a sequence of values while guaranteeing that essential elements are included. This can also be considered a constrained shuffle. It is similar to the previous type, Sequence of random values (weighted), but without weights, and the results are always unique. There are two options:

  1. Essential - To define which elements must be included in returned sequence.

  2. Length - To define how many elements are in returned sequence.

Behaviors:

  • By default, if neither essential nor length is specified, a normal shuffle is returned.

  • If only essential is specified, a random length between the length of essential sequence and the length of items will be returned.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.shuffle(items=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
[8, 4, 2, 10, 7, 6, 9, 3, 1, 5]
>>> randomizer.shuffle(items=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), length=7)
(1, 5, 10, 8, 2, 4, 6)
>>> randomizer.shuffle(items=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], essential=[3, 6, 9])
[2, 9, 5, 7, 3, 6, 8]
>>> randomizer.shuffle(items=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), essential=[3, 6, 9], length=5)
(5, 6, 7, 3, 9)
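
The behaviors listed above can be approximated with the standard library. The function constrained_shuffle is a hypothetical sketch, not ROAST's shuffle method. It assumes that the items are unique and that every essential element appears in items.

```python
import random


def constrained_shuffle(items, essential=(), length=None, rng=None):
    """Return a shuffled selection of `items` that always contains `essential`."""
    rng = rng or random.Random()
    if length is None:
        # Neither option given: full shuffle. Only essential given: random length.
        length = rng.randint(len(essential), len(items)) if essential else len(items)
    if not len(essential) <= length <= len(items):
        raise ValueError("length must be between len(essential) and len(items)")

    optional = [item for item in items if item not in essential]
    selection = list(essential) + rng.sample(optional, length - len(essential))
    rng.shuffle(selection)
    # Mirror the input container type (list in, list out; tuple in, tuple out)
    return type(items)(selection)


result = constrained_shuffle((1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                             essential=[3, 6, 9], length=5, rng=random.Random(0))
assert {3, 6, 9} <= set(result) and len(result) == 5
```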

Note

New in version 4.0.

Single or sequence of random values (weighted distribution)

A single choice or a sequence of choices can be randomly generated from a sequence of values based on the Beta probability distribution. Presets are available that provide a predetermined probability of each element being selected from the provided sequence of elements.

  • LOW_HEAVY - elements near lower end more frequently

  • HIGH_HEAVY - elements near higher end more frequently

  • NORMAL - elements near median more frequently

  • INVERSE_NORMAL - elements near ends more frequently

  • EXTREME_LIMITS - elements near ends significantly more frequently

parameters.json
 {
     "ip": {
         "attribute9": {
             "range": [1, 100]
         },
         "attribute10": {
             "range": [1, 100],
             "preset": "LOW_HEAVY"
         },
         "attribute11": {
             "range": [1, 100],
             "shape": [1, 10]
         }
     }
 }
>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> values = []
>>> for _ in range(10):
...     value = randomizer.get_value("ip.attribute10") # LOW_HEAVY
...     values.append(value)
...
>>> values
[1, 13, 7, 17, 38, 21, 21, 16, 7, 7]
>>>
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> values = []
>>> for _ in range(10):
...     value = randomizer.get_value("ip.attribute11") # a=1, b=10
...     values.append(value)
...
>>> values
[1, 2, 3, 9, 3, 8, 3, 2, 13, 22]

The preset can be directly specified when calling get_value(). This will override any setting specified in the configuration.

>>> from roast.providers.randomizer import Randomizer, WeightPreset
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> values = []
>>> for _ in range(10):
...     value = randomizer.get_value("ip.attribute9", preset=WeightPreset.LOW_HEAVY)
...     values.append(value)
...
>>> values
[5, 3, 9, 35, 12, 5, 5, 9, 18, 15]
>>>
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> values = []
>>> for _ in range(10):
...     value = randomizer.get_value("ip.attribute9", preset=WeightPreset.HIGH_HEAVY)
...     values.append(value)
...
>>> values
[94, 95, 90, 99, 97, 93, 90, 80, 99, 97]

The Alpha and Beta shape parameters can also be provided as parameters. As an example, we can provide the values of the LOW_HEAVY preset. This will also override any setting in the configuration.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> values = []
>>> for _ in range(10):
...     value = randomizer.get_value("ip.attribute9", a=1, b=10)
...     values.append(value)
...
>>> values
[6, 25, 8, 16, 6, 1, 12, 5, 8, 10]
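
To see how the shape parameters skew the selection, the sketch below maps a Beta(a, b) draw onto a position in the list of candidate values using random.betavariate. This is only an approximation of the idea. ROAST precomputes per-value weights, and the helper beta_pick is a hypothetical name.

```python
import random


def beta_pick(values, a, b, rng):
    """Pick one value; the position in `values` follows a Beta(a, b) draw."""
    x = rng.betavariate(a, b)                      # float in [0, 1]
    index = min(int(x * len(values)), len(values) - 1)
    return values[index]


rng = random.Random(2024)
values = list(range(1, 101))

# a=1, b=10 are the LOW_HEAVY shape values quoted above; the mirrored pair
# (a=10, b=1) is only an assumed stand-in for a high-heavy shape.
low = [beta_pick(values, 1, 10, rng) for _ in range(1000)]
high = [beta_pick(values, 10, 1, rng) for _ in range(1000)]

print(sum(low) / len(low) < 50 < sum(high) / len(high))   # True
```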

Warning

When using the weighted distribution feature, the distribution parameters cannot be changed on the fly. If values have been generated for an attribute using LOW_HEAVY, simply specifying a new preset will not switch it to generating HIGH_HEAVY values. This is because the weights for each possible value are generated during the initial call to get_value() and stored in the weights property within the randomizer for performance reasons. If the distribution parameters need to change, create another instance of the randomizer or empty the stored weights.

Choices defined by expression

This will return randomized values defined by an expression.

For example, randomized delays and ramp times with a condition to define the relationship.

The condition is ip.delay_502 >= ip.delay_503 + ip.ramp_503

parameters.json
 {
     "ip": {
         "delay_502": {
             "default": 52,
             "range": [20, 100]
         },
         "delay_503": {
             "default": 52,
             "range": [20, 100]
         },
         "ramp_503": {
             "default": 2,
             "range": [1, 30]
         }
     }
 }
>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.generate_conditional("ip.delay_502 >= ip.delay_503 + ip.ramp_503")
{'ip.delay_502': 67, 'ip.delay_503': 43, 'ip.ramp_503': 12}

In addition to defined attributes, variables can be used in the expression.

For example, a dynamically defined offset as part of the expression.

The condition is ip.delay_502 >= ip.delay_503 + offset.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> offset = 10
>>> randomizer.generate_conditional("ip.delay_502 >= ip.delay_503 + offset", offset=offset)
{'ip.delay_502': 80, 'ip.delay_503': 40}
>>> offset = 20
>>> randomizer.generate_conditional("ip.delay_502 >= ip.delay_503 + offset", offset=offset)
{'ip.delay_502': 100, 'ip.delay_503': 65}

The complete set of operators that can be used is listed in the Arithmetic Parser User Guide.
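
One simple way to picture conditional generation is rejection sampling: draw a value for every attribute and retry until the condition holds. The sketch below is an assumption for illustration only. ROAST evaluates a string expression through its arithmetic parser, whereas the hypothetical helper sample_until takes a Python callable.

```python
import random


def sample_until(ranges, condition, rng=None, max_tries=10_000):
    """Draw one value per name until `condition(values)` holds (rejection sampling)."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        values = {name: rng.randint(low, high) for name, (low, high) in ranges.items()}
        if condition(values):
            return values
    raise RuntimeError("no combination satisfied the condition")


ranges = {"delay_502": (20, 100), "delay_503": (20, 100), "ramp_503": (1, 30)}
result = sample_until(
    ranges,
    lambda v: v["delay_502"] >= v["delay_503"] + v["ramp_503"],
    rng=random.Random(7),
)
assert result["delay_502"] >= result["delay_503"] + result["ramp_503"]
```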

Sequence of choices defined by expression

This will return a sequence of randomized values based on a condition.

For example, consider a randomized sequence of four delay values where each value must be greater than the previous one. The condition is: delay_500 < delay_501 < delay_502 < delay_503.

parameters.json
 {
     "ip": {
         "delay_500": {
             "default": 52,
             "range": [20, 100]
         },
         "delay_501": {
             "default": 52,
             "range": [20, 100]
         },
         "delay_502": {
             "default": 52,
             "range": [20, 100]
         },
         "delay_503": {
             "default": 52,
             "range": [20, 100]
         }
     }
 }
>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.generate_sequence("prev < current", ["ip.delay_500", "ip.delay_501", "ip.delay_502", "ip.delay_503"])
{'ip.delay_500': 41, 'ip.delay_501': 78, 'ip.delay_502': 81, 'ip.delay_503': 86}

While both prev and current are pre-defined and can be used to define the condition, any expression can be used.

Variables can also be used within the expression.

For example, a dynamically defined offset as part of the expression where the condition is prev + offset <= current.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> offset = 10
>>> randomizer.generate_sequence("prev + offset <= current", ["ip.delay_500", "ip.delay_501", "ip.delay_502"], offset=offset)
{'ip.delay_500': 26, 'ip.delay_501': 36, 'ip.delay_502': 72}

The complete set of operators that can be used is listed in the Arithmetic Parser User Guide.
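
The prev and current relationship can be pictured as a check applied to each pair of neighbouring values. The sketch below uses rejection sampling with a Python callable in place of ROAST's string expression. The helper sample_sequence is hypothetical and assumes all attributes share one range.

```python
import random


def sample_sequence(names, low, high, condition, rng=None, max_tries=10_000):
    """Draw one value per name so condition(prev, current) holds for each neighbour pair."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        drawn = [rng.randint(low, high) for _ in names]
        if all(condition(prev, current) for prev, current in zip(drawn, drawn[1:])):
            return dict(zip(names, drawn))
    raise RuntimeError("no sequence satisfied the condition")


offset = 10
names = ["ip.delay_500", "ip.delay_501", "ip.delay_502"]
result = sample_sequence(
    names, 20, 100,
    lambda prev, current: prev + offset <= current,
    rng=random.Random(3),
)
assert result["ip.delay_500"] + offset <= result["ip.delay_501"]
```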

Exporting generated values

Whenever a random value is generated by the provider, the value is stored in the data class attribute, a Box dictionary. This can be very useful for debugging and can be exported to a JSON file. This JSON file can then be used as a database of previously generated values, as discussed in the section Exclude values used in previous runs.

Since the dictionary may have dotted keys, a special method is provided to convert any dotted keys into a nested dictionary that can be properly exported to JSON.

>>> import json
>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.get_value("ip.attribute1")
32
>>> randomizer.data
<Box: {'ip.attribute1': [32]}>
>>> randomizer.to_json("generated.json")
>>> with open("generated.json", "r") as f:
...    parsed = json.load(f)
...
>>> parsed
{'ip': {'attribute1': [32]}}
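
The conversion from dotted keys to a nested dictionary can be sketched in a few lines. The helper unflatten is a hypothetical name used for illustration and is not the method ROAST provides.

```python
import json


def unflatten(flat):
    """Turn {'a.b': 1} into {'a': {'b': 1}} so it serialises as nested JSON."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})   # create intermediate levels on demand
        node[leaf] = value
    return nested


print(json.dumps(unflatten({"ip.attribute1": [32], "ip.attribute5": [26]})))
# {"ip": {"attribute1": [32], "attribute5": [26]}}
```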

Note

New in version 4.0.

Exclude values used in previous runs

Values generated by the provider can be excluded for use in subsequent runs. This allows tests to sample without replacement over multiple runs. To enable this, import the exported data file as described in the previous section. Upon import, the global replace property will be set to False. To override and disable for a parameter, set the replace property to True at the parameter level as described in Load provider with JSON data file.

By default, samples are with replacement.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer(seed=12345)
>>> randomizer.datafile = "parameters.json"
>>> randomizer.get_all_values("ip.attribute1")
[32, 256, 512]
>>> randomizer.get_value("ip.attribute1")
512
>>> randomizer.get_all_values("ip.attribute1")
[32, 256, 512]

Sampling can be switched to without replacement at the parameter level. The example shown uses a JSON parameter file.

parameters.json
 {
     "ip": {
         "attribute5": {
             "range": [20, 30],
             "replace": false
         }
     }
 }

The value generated from the first call will be excluded from possible values on the next call.

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer(seed=12345)
>>> randomizer.datafile = "parameters.json"
>>> randomizer.get_all_values("ip.attribute5")
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
>>> randomizer.get_value("ip.attribute5")
26
>>> randomizer.get_all_values("ip.attribute5")
[20, 21, 22, 23, 24, 25, 27, 28, 29, 30]
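
Sampling without replacement amounts to remembering what has been returned and filtering it out of the candidates on the next call. The class NoRepeatSampler below is a hypothetical sketch of that bookkeeping, not ROAST's implementation.

```python
import random


class NoRepeatSampler:
    """Remember drawn values per key and never return them again."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.data = {}          # key -> list of values already generated

    def remaining(self, key, candidates):
        used = self.data.get(key, [])
        return [value for value in candidates if value not in used]

    def get_value(self, key, candidates):
        pool = self.remaining(key, candidates)
        if not pool:
            raise ValueError(f"all values for {key!r} have been used")
        value = self.rng.choice(pool)
        self.data.setdefault(key, []).append(value)
        return value


sampler = NoRepeatSampler(seed=12345)
candidates = list(range(20, 31))
drawn = [sampler.get_value("ip.attribute5", candidates) for _ in candidates]
assert sorted(drawn) == candidates      # every value returned exactly once
```

Once every candidate has been used, the sketch raises an error rather than repeating a value. How ROAST behaves in that situation is not covered here.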

To exclude values from an exported data file, load it as an excludes file, which sets the global replace property to False. The contents of the excludes file then become the initial value of the data class attribute. After randomized values are generated, optionally export all of the accumulated values back to the generated values JSON file.

>>> import json
>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.get_value("ip.attribute1")
32
>>> randomizer.data
<Box: {'ip.attribute1': [32]}>
>>> randomizer.to_json("generated.json")
>>>
>>> randomizer = Randomizer()
>>> randomizer.datafile = "parameters.json"
>>> randomizer.excludes_file = "generated.json"
>>> randomizer.data
<Box: {'ip': {'attribute1': [32]}}>
>>> randomizer.get_all_values("ip.attribute1")
[256, 512]
>>> randomizer.get_value("ip.attribute1")
512
>>> randomizer.data
<Box: {'ip': {'attribute1': [32, 512]}}>
>>> randomizer.get_all_values("ip.attribute1")
[256]
>>> randomizer.to_json()
>>> with open("generated.json", "r") as f:
...    parsed = json.load(f)
...
>>> parsed
{'ip': {'attribute1': [32, 512]}}

Warning

The parameter setting will override the global setting. For example, if the global setting is False and the parameter setting is True, the attribute will sample with replacement, meaning that it will not exclude previously generated values loaded from file.

Note

New in version 4.0.

Custom data providers

Custom providers can be created and dynamically added to generate specific data.

from mimesis import BaseProvider

class SomeProvider(BaseProvider):
    class Meta:
        name = "some_provider"

    @staticmethod
    def hello():
        return "Hello!"

This can be used as such:

>>> from roast.providers.randomizer import Randomizer
>>> randomizer = Randomizer()
>>> randomizer.add_provider(SomeProvider)
>>> randomizer.some_provider.hello()
'Hello!'

For details, see the Custom Providers documentation on the Mimesis website.