
Developing a model using Python

Tor-Einar Skog, Senior developer, NIBIO

Updated: 2023-02-23

This page builds on A step-by-step introduction to implementing prediction models in VIPS

Preparations

The tools you need to develop a VIPS model in Python are:

You should be familiar with

  • The Python programming language, version 3
  • Using Python's venv

Workflow example

The normal workflow is:

  1. You put some correctly formatted weather data in a file inside the project.
  2. You combine this with the other configuration data.
  3. You develop the model based on these input data.
  4. You must have one main class that extends the VIPSModel abstract base class, which is available in the VIPSCore-Python-Common package.
  5. You use the test framework to test single methods that are part of the algorithms, or to test the complete model.
  6. When you're happy with how the model works, you can test-deploy it to the VIPSCore server (TODO: Document this).

We will take this step by step below.

Implementing a forecasting model

We are going to implement a forecasting model for a virtual fungus called «Fungus pilosus flavis» (please bear with me, any phytopathologists who might read this). Let's say that there is a forecasting model for it that states that

  • There is no infection risk until you have reached 500 day degrees (Celsius)
  • After that, the risk multiplies by 2 for each consecutive hour of leaf wetness (starting at 1 on the first hour). When reaching the threshold of 24, there is serious risk of infection, and measures should be taken.

We'll be using MS Visual Studio Code for this example, but the process should be transferable to other IDEs.

Creating a new project for the forecasting model

We are going to create a Python package. There is a very good post about how to do this on Real Python, and Python's own documentation is also good (and a bit shorter/simpler).

Start by creating a folder named e.g. FungusPilosisFlavisModel. Enter it, and initialize git inside, like this (or however you normally do it!):

$ git init

It is recommended to create a .gitignore file with this standard content:

Create and activate a Python virtualenv for this project. This can be done outside or inside the project folder, just make sure that your IDE knows where to find it. If you create it inside the folder, the IDE may autodiscover it and suggest that it will use it as the default virtualenv in this project.

$ python3 -m venv venv
$ source venv/bin/activate
(venv)$

Open the folder with your IDE, and add these files and folders

Package scaffolding

Below you'll find example contents of the pyproject.toml file. Make sure to edit the data that are specific for your project, such as

  • name
  • version
  • authors
  • dependencies

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "vips_fungus_pilosus_flavis_model"
version = "0.0.1"
description = "Example VIPS model, showcasing functionality"
readme = "README.md"
authors = [{ name="Foo Barson", email="tor-einar.skog@nibio.no" }]
license = { file = "LICENSE"}
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: GNU Affero General Public License v3",
    "Operating System :: OS Independent",
]
dependencies = [
    "shapely",
    "pydantic",
    "pytz",
    "pandas",
    "vipscore_common @ git+https://gitlab.nibio.no/VIPS/vipscore-python-common.git"
]

requires-python = ">=3.9"

[project.optional-dependencies]
dev = ["pytest"]

Make the package editable locally

If you have not already done so, create and activate the Python virtualenv for this project (see above). Then install your package into it in editable mode, so that changes to the source take effect without reinstalling.

# Create virtualenv
$ python3 -m venv venv
$ source venv/bin/activate
(venv)$
# Install your package in editable mode inside your virtualenv so that you can develop and test it
(venv)$ python -m pip install -e .

Create the main VIPSModel class

In /src, create a folder named e.g. vips_fungus_pilosus_flavis_model. Inside this folder, create the main module fungus_pilosus_flavis_model.py. Then add these contents to the file:

from vipscore_common.vips_model import VIPSModel
from vipscore_common.entities import Result, ModelConfiguration, WeatherObservation, WeatherElements
from vipscore_common.data_utils import *



class FungusPilosisFlavisModel(VIPSModel):
    """
        This is the result of a VIPS Model implementation class
    """

    MODEL_ID        = "FUNGUSPILO"
    COPYRIGHT       = "(c) 2023 ACME Industries"

Make sure you add an __init__.py file in the same folder as your module, so that the folder can be imported as a package.

Create the method for finding when 500 day degrees has been passed

To find out when 500 day degrees (since some date) have passed, you need the mean temperature of each day. All weather observations in VIPS are represented by an instance of the class WeatherObservation. This class has a few important properties:

  • elementMeasurementTypeId: Rain, mean temperature, leaf wetness etc.
  • timeMeasured: the timestamp of the observation
  • logIntervalId: Hourly, daily or monthly measurement
  • value: the numerical value of the weather observation

We need a list containing one WeatherObservation with the mean temperature for each day. A simple approach could be:

  1. Loop through all the WeatherObservation objects, and add the value to the total day degree sum as we do so
  2. When the threshold of 500 has been reached, return the date of that WeatherObservation object.

So we could start by writing this method:

        
    THRESHOLD = 500.0

    def get_date_when_day_degree_limit_has_passed(self, observations: list):
        # Initialize the day degree counter
        day_degrees = 0.0
        # Iterate through the list of observations
        # !! Assuming the observations list is sorted chronologically !!
        for observation in observations:
            # Make sure that only daily mean temperature observations are used
            if (
                observation.logIntervalId == WeatherObservation.LOG_INTERVAL_ID_1D
                and observation.elementMeasurementTypeId == WeatherElements.TEMPERATURE_MEAN
            ):
                # Add to the day degree sum
                day_degrees = day_degrees + observation.value
                # If the threshold is reached, return the time of the current temperature measurement
                if day_degrees >= self.THRESHOLD:
                    return observation.timeMeasured
        # We have finished looping through the observations, and day_degrees has
        # not reached the threshold. So we can't return a date; we return None instead
        return None

IMPORTANT NOTE: These kinds of operations are better solved using Pandas, but for people unfamiliar with using Pandas, we stick with the simplest form of Python.
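
For readers who are curious about the Pandas route mentioned in the note, here is a minimal sketch. It assumes that the daily mean temperatures have already been extracted from the WeatherObservation objects into a pandas Series indexed by date. The function name and the sample numbers are made up for illustration and are not part of vipscore_common.

```python
import pandas as pd


def first_date_reaching_threshold(daily_means: pd.Series, threshold: float = 500.0):
    """Return the first index value where the running sum reaches the threshold, or None."""
    # Sort by date, then accumulate the daily means into a running day degree sum
    running_sum = daily_means.sort_index().cumsum()
    # Keep only the dates on which the threshold has been reached
    reached = running_sum[running_sum >= threshold]
    return None if reached.empty else reached.index[0]


# Hypothetical, unrealistically warm data so that the example stays short
temps = pd.Series(
    [200.0, 250.0, 100.0, 50.0],
    index=pd.to_datetime(["2016-05-01", "2016-05-02", "2016-05-03", "2016-05-04"]),
)
print(first_date_reaching_threshold(temps))
```

With these numbers the running sum is 200, 450, 550, 600, so the third date is the first one at or above 500. If the threshold is never reached, the function returns None, just like the loop version.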

Start testing

Now that we have a method, we need to start testing. We will use pytest. Start by adding a test module named test_fungus_pilasus_flavis_model.py in the tests folder, and add imports and a method declaration:

import datetime as dt
import unittest

import pytz

from src.vips_fungus_pilosus_flavis_model.fungus_pilosus_flavis_model import *

class TestFungusPilasusFlavisModel(unittest.TestCase):
    def test_get_date_when_day_degree_limit_has_passed(self):
        # TODO: Get observations list
        observations = None
        # Instantiate the model
        instance = FungusPilosisFlavisModel()
        result = instance.get_date_when_day_degree_limit_has_passed(observations)
        # The date on which we expect the day degree sum to reach the threshold
        expected_date = dt.datetime(2016, 5, 25, 22, 0, tzinfo=pytz.timezone("UTC"))
        self.assertEqual(result, expected_date)

As you can tell, we lack a couple of important parts here:

  • The list of weather observations (observations)
  • The expected date

We need to get hold of weather data:

  • Mean daily temperature
  • Hourly leaf wetness

Test data can be obtained from NIBIO's Norwegian Agromet service. So to get daily temperature values for a period, you can request:

https://lmt.nibio.no/services/rest/vips/getdata/forecastfallback?weatherStationId=5&elementMeasurementTypes[]=TM&logInterval=1d&startDate=2016-03-01&startTime=00&endDate=2016-09-30&endTime=00&timeZone=Europe/Oslo

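Save the returned results as e.g. tests/tm.json.

Now we add this helper function to the test module:

from vipscore_common.data_utils import get_weather_observations_from_json

def get_temperature_data():
    # The path is relative to the project root, which is where pytest is run from
    with open("tests/tm.json") as f:
        # Convert the JSON text to a list of WeatherObservation objects
        return get_weather_observations_from_json(f.read())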

...and we call it from the test method:

class TestFungusPilasusFlavisModel(unittest.TestCase):
    def test_get_date_when_day_degree_limit_has_passed(self):
        # Get observations list
        observations = get_temperature_data()
        # Instantiate the model
        instance = FungusPilosisFlavisModel()
        result = instance.get_date_when_day_degree_limit_has_passed(observations)
        # The date on which we expect the day degree sum to reach the threshold
        expected_date = datetime(2016, 5, 25, 22, 0, tzinfo=pytz.timezone("UTC"))
        self.assertEqual(result, expected_date)

Let's see if we can get this to work. Install pytest and run:

(venv) $ pip install pytest
(venv) $ pytest

If everything else is correctly set up, this will fail with the following error message:

============================================================ test session starts ============================================================
platform linux -- Python 3.10.6, pytest-7.2.1, pluggy-1.0.0
rootdir: /home/treinar/nextcloud/MaDiPHS/workshop_2023-02/FungusPilosusFlavisModel
collected 1 item                                                                                                                            

tests/test_fungus_pilasus_flavis_model.py F                                                                                           [100%]

================================================================= FAILURES ==================================================================
________________________________ TestFungusPilasusFlavisModel.test_get_date_when_day_degree_limit_has_passed ________________________________

self = <tests.test_fungus_pilasus_flavis_model.TestFungusPilasusFlavisModel testMethod=test_get_date_when_day_degree_limit_has_passed>

    def test_get_date_when_day_degree_limit_has_passed(self):
        # TODO: Get observations list
        observations = None
        # Instantiate the model
>       instance = FungusPilosisFlavisModel()
E       TypeError: Can't instantiate abstract class FungusPilosisFlavisModel with abstract methods copyright, get_model_description, get_model_name, get_model_usage, get_result, get_warning_status_interpretation, license, model_id, sample_config, set_configuration

tests/test_fungus_pilasus_flavis_model.py:15: TypeError
========================================================== short test summary info ==========================================================
FAILED tests/test_fungus_pilasus_flavis_model.py::TestFungusPilasusFlavisModel::test_get_date_when_day_degree_limit_has_passed - TypeError: Can't instantiate abstract class FungusPilosisFlavisModel with abstract methods copyright, get_model_description, get_model_n...
============================================================= 1 failed in 0.63s =============================================================

pytest fails because we have not implemented any of the abstract methods of VIPSModel. So let's do that, for now as stubs that simply pass:

    def set_configuration(self, model_configuration: ModelConfiguration):
        """
            Set the configuration object (with all its possible parameters)
            Must be done before you call get_result
        """
        pass

    def get_result(self) -> list[Result]:
        """Get the results as a list of Result objects (TODO ref)"""
        pass

    @property
    def model_id(self) -> str:
        """10-character ID of the model. Must be unique (at least in the current system)"""
        pass

    @property
    def sample_config(self) -> dict:
        """A sample configuration in JSON format (TODO check relation with Dict)"""
        pass

    @property
    def license(self) -> str:
        """Returns the license for this piece of software"""
        pass

    @property
    def copyright(self) -> str:
        """Name of person/organization that holds the copyright, and contact information"""
        pass

    def get_model_name(self, language: str) -> str:
        """Returns the model name in the specified language (<a href="http://www.loc.gov/standards/iso639-2/php/English_list.php">ISO-639-2</a>)"""
        pass


    def get_model_description(self, language: str) -> str:
        """Returns the model description in the specified language (<a href="http://www.loc.gov/standards/iso639-2/php/English_list.php">ISO-639-2</a>)"""
        pass

    def get_warning_status_interpretation(self, language: str) -> str:
        """How to interpret the warning status (red-yellow-green, what does it mean?) in the specified language (<a href="http://www.loc.gov/standards/iso639-2/php/English_list.php">ISO-639-2</a>)"""
        pass

    def get_model_usage(self, language: str) -> str:
        """Technical manual for this model, in the specified language  (<a href="http://www.loc.gov/standards/iso639-2/php/English_list.php">ISO-639-2</a>)"""
        pass

Running pytest again should make the tests pass.
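
The TypeError above is standard behaviour of Python's abc module rather than anything specific to VIPS. The following sketch shows the same mechanism in isolation. TinyModelBase, IncompleteModel and StubbedModel are hypothetical classes invented for this illustration; the real VIPSModel has more abstract members.

```python
from abc import ABC, abstractmethod


class TinyModelBase(ABC):
    """Hypothetical, much reduced stand-in for VIPSModel."""

    @property
    @abstractmethod
    def model_id(self) -> str:
        ...

    @abstractmethod
    def get_result(self) -> list:
        ...


class IncompleteModel(TinyModelBase):
    """Implements nothing, so it cannot be instantiated."""


class StubbedModel(TinyModelBase):
    """Implements every abstract member, even if only as a stub."""

    @property
    def model_id(self) -> str:
        return "FUNGUSPILO"

    def get_result(self) -> list:
        return []


def can_instantiate(cls) -> bool:
    # Instantiating a class with unimplemented abstract members raises TypeError
    try:
        cls()
        return True
    except TypeError:
        return False


print(can_instantiate(IncompleteModel))  # False
print(can_instantiate(StubbedModel))     # True
```

This is why even empty stubs are enough to get past instantiation: the check only looks at whether each abstract member has been overridden, not at what the override does.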

Create the method to calculate the infection risk

We can operate on hourly weather data for leaf wetness and calculate the infection risk. The input is a list of hourly leaf wetness observations. The output is a dictionary with the timestamp as key and the infection risk as value, so each timestamp (for instance 24 July 2014 14:00 UTC) has exactly one value.

To get hourly leaf wetness values for the same period and location, request:

https://lmt.nibio.no/services/rest/vips/getdata/forecastfallback?weatherStationId=5&elementMeasurementTypes[]=BT&logInterval=1h&startDate=2016-03-01&startTime=00&endDate=2016-09-30&endTime=00&timeZone=Europe/Oslo

An example of a solution can be:

    def get_infection_risk(self, observations:list):
        # Create the map with dates and infection risk values
        risk_map = {}
        
        # Counter for consecutive hours of leaf wetness
        consecutive_hours_with_leaf_wetness = 0

        # !! Assuming the observations list is sorted chronologically!!
        # Loop through the list of observations
        for observation in observations:
            # We define the lower threshold for leaf wetness to be 10 minutes per hour
            if observation.value > 10.0:
                # Leaf wetness registered, add to consecutive hours counter
                consecutive_hours_with_leaf_wetness = consecutive_hours_with_leaf_wetness + 1
            else:
                # No leaf wetness, reset counter
                consecutive_hours_with_leaf_wetness = 0
            # We set the risk value
            risk_map[observation.timeMeasured] = consecutive_hours_with_leaf_wetness * 2
        # Return the map with all values
        return risk_map

Exercise: Write a test for get_infection_risk()
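
One possible starting point for the exercise is sketched below. So that it runs without vipscore_common or downloaded data, it uses a hypothetical FakeObservation class and a standalone copy of the loop from get_infection_risk(); in the real test you would call the method on a FungusPilosisFlavisModel instance instead. The helper make_hourly is also made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FakeObservation:
    """Hypothetical stand-in for WeatherObservation (only the fields used here)."""
    timeMeasured: datetime
    value: float


def get_infection_risk(observations):
    """Standalone copy of the loop in the model method shown above."""
    risk_map = {}
    consecutive_hours = 0
    for observation in observations:
        # More than 10 minutes of leaf wetness counts as a wet hour
        consecutive_hours = consecutive_hours + 1 if observation.value > 10.0 else 0
        risk_map[observation.timeMeasured] = consecutive_hours * 2
    return risk_map


def make_hourly(values, start=datetime(2016, 6, 1, tzinfo=timezone.utc)):
    """Build one fake observation per hour, in chronological order."""
    return [FakeObservation(start + timedelta(hours=i), v) for i, v in enumerate(values)]


def test_counter_resets_on_dry_hour():
    observations = make_hourly([60, 60, 0, 60])
    risks = list(get_infection_risk(observations).values())
    # Two wet hours, one dry hour that resets the counter, then one wet hour
    assert risks == [2, 4, 0, 2]


test_counter_resets_on_dry_hour()
```

Small hand-made series like this make it easy to check edge cases, such as a value of exactly 10.0, which the comparison treats as dry.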

Putting it together

We now have the most important methods created (and successfully tested). What we need to do now is to get data in (set configuration, get weather data etc) and get the results out in the expected format.

Data in

Input data are sent in a large lump called a ModelConfiguration. It's a key-based store of many different kinds of objects: numbers, strings, dates, WeatherObservations. This configuration object is sent to the model through the method set_configuration. So to get the weather data, we need to extract them from the configuration object in that method. An example of how to do this is as follows:

First, at the top of the class, declare the object that holds the weather data:

    observations = None

This attribute will stay empty (None) until set_configuration does something about it. So let's do that, e.g.:

    def set_configuration(self, model_configuration: ModelConfiguration):
        """
            Set the configuration object (with all its possible parameters)
            Must be done before you call get_result
        """
        # Get the observation list, using the data_utils helper module
        self.observations = get_weather_observations_from_json_list(model_configuration.config_parameters["observations"])

So now we have the weather data in a list, and we can start using them.
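
Because the configuration may carry daily temperatures and hourly leaf wetness in the same list, it can be worth sorting the list and grouping it by element type before the observations are handed to the individual methods. The sketch below shows one way to do that. FakeObservation and group_by_element are hypothetical names, and the element codes TM and BT are taken from the request URLs above; whether WeatherObservation stores them in exactly this form is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FakeObservation:
    """Hypothetical stand-in for WeatherObservation (only the fields used here)."""
    timeMeasured: datetime
    elementMeasurementTypeId: str
    value: float


def group_by_element(observations):
    """Sort chronologically, then group the observations by element type."""
    grouped = {}
    for observation in sorted(observations, key=lambda o: o.timeMeasured):
        grouped.setdefault(observation.elementMeasurementTypeId, []).append(observation)
    return grouped


mixed = [
    FakeObservation(datetime(2016, 6, 1, 1, tzinfo=timezone.utc), "BT", 60.0),
    FakeObservation(datetime(2016, 6, 1, 0, tzinfo=timezone.utc), "TM", 12.5),
    FakeObservation(datetime(2016, 6, 1, 0, tzinfo=timezone.utc), "BT", 0.0),
]
grouped = group_by_element(mixed)
print([o.value for o in grouped["BT"]])  # [0.0, 60.0]
print([o.value for o in grouped["TM"]])  # [12.5]
```

Grouping up front also takes care of the "sorted chronologically" assumption that the methods in this tutorial rely on.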

Data out

Data out are sent as a list of Result objects. The method to get the data is called get_result(), unsurprisingly. An example of this method could be:


    CONTROLLED_INFECTION_RISK = "CONTROLLED_INFECTION_RISK"

    def get_result(self) -> list[Result]:
        """Get the results as a list of Result objects (TODO ref)"""
        # Initialize the list of results
        results = []
        # !! Assuming the observations list is sorted chronologically!! TODO Sort algorithm
        # Which date did day degree sum exceed 500?
        day_degree_limit_reach_date = self.get_date_when_day_degree_limit_has_passed(self.observations)

        # Get infection risk for the whole period
        uncontrolled_infection_risk = self.get_infection_risk(self.observations)
        # Get all dates from the map of infection risk
        date_list = list(uncontrolled_infection_risk.keys())
        date_list.sort()

        for current_date in date_list:
            result = Result(
                validTimeStart=current_date,
                validTimeEnd=None,
                warningStatus=0 # Temporary, set it later
            )

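            # If we're at or after the date when the day degree sum reached 500, use the infection risk
            # (if the threshold was never reached, the date is None and the risk stays at 0)
            if day_degree_limit_reach_date is not None and current_date >= day_degree_limit_reach_date: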
                # Set infection risk
                result.set_value(self.MODEL_ID, self.CONTROLLED_INFECTION_RISK, "%s" % uncontrolled_infection_risk[current_date])
            else:
                # Set infection risk to 0
                result.set_value(self.MODEL_ID, self.CONTROLLED_INFECTION_RISK, "0")

            # Set the warning status
            # If controlled infection risk < 64, status is NO RISK
            # Otherwise it's HIGH RISK
            result.warning_status = Result.WARNING_STATUS_NO_RISK if uncontrolled_infection_risk[current_date] <64 else Result.WARNING_STATUS_HIGH_RISK
            results.append(result)
        return results
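
The core of get_result() is the gating step: the leaf wetness based risk is only reported from the day degree date onwards. The sketch below isolates that step as a plain function so that it can be tried without the Result class. gate_risk is a hypothetical helper name, and treating a missing date (None) as "no risk yet" is an assumption about the intended behaviour.

```python
from datetime import datetime, timedelta, timezone


def gate_risk(risk_by_time: dict, limit_date):
    """Report the risk only from limit_date onwards; before that, or if limit_date is None, report 0."""
    gated = {}
    for timestamp in sorted(risk_by_time):
        if limit_date is not None and timestamp >= limit_date:
            gated[timestamp] = risk_by_time[timestamp]
        else:
            gated[timestamp] = 0
    return gated


start = datetime(2016, 5, 25, 20, tzinfo=timezone.utc)
hours = [start + timedelta(hours=i) for i in range(4)]
uncontrolled = dict(zip(hours, [2, 4, 6, 8]))

print(list(gate_risk(uncontrolled, hours[2]).values()))  # [0, 0, 6, 8]
print(list(gate_risk(uncontrolled, None).values()))      # [0, 0, 0, 0]
```

Keeping this step in a small function of its own would also make it straightforward to unit test separately from the Result handling.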

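Now it's time to test the methods together. Add this test to the test class (get_lw_data() is assumed to be a helper analogous to get_temperature_data() that reads the saved leaf wetness data):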

    def test_get_result(self):
        """
        We get an infection risk of 10 at a certain point in the time series
        """
        tm_obs = get_temperature_data()
        lw_obs = get_lw_data()

        observations = tm_obs + lw_obs
        
        instance = FungusPilosisFlavisModel()
        model_config = ModelConfiguration(
            model_id = instance.MODEL_ID,
            config_parameters = {"observations": observations}
        )
        instance.set_configuration(model_config)

        results = instance.get_result()

        self.assertIsNotNone(results)

        self.assertEqual(int(results[5094].get_value(instance.MODEL_ID,instance.CONTROLLED_INFECTION_RISK)),10)

Implementing the meta information methods

So, now you have a forecasting model that produces the expected results. When this model is deployed to the VIPS core runtime, it is discovered automatically and added to the list of available models. In order for other systems (like VIPSLogic or another client) to be able to query and show information about the model, it needs to implement the methods that provide documentation:

  • get_model_name() - the name of the model. For instance «Fungus pilosus flavis model»
  • license (property) - Open Source? Proprietary? Your pick
  • copyright (property) - For instance «(c) 2014 Bioforsk»
  • get_model_description() - Detailed description of how the model works, from a biological perspective
  • get_model_usage() - How to configure the model (what parameters are needed, what values they may have and so on)
  • sample_config (property) - A sample JSON configuration.

Most of these methods have two versions: One takes language into account, one doesn't. Translation in model documentation is part of a presently unwritten chapter. For now, you can do this if you want as a general pattern:

    def get_model_name(self, language = VIPSModel.default_language) -> str:
        """Returns the model name in the specified language (<a href="http://www.loc.gov/standards/iso639-2/php/English_list.php">ISO-639-2</a>)"""
        return "Fungus pilosus flavis model"
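
If you want to return different text per language before the translation chapter exists, a simple dictionary lookup with a fallback is one option. This is only a sketch and not necessarily how the Reference Model does it: DEFAULT_LANGUAGE stands in for VIPSModel.default_language, and the language codes and the translated name are made up for illustration.

```python
DEFAULT_LANGUAGE = "en"  # assumption: stands in for VIPSModel.default_language

# Hypothetical translations, keyed by language code
MODEL_NAMES = {
    "en": "Fungus pilosus flavis model",
    "nb": "Fungus pilosus flavis-modell",
}


def get_model_name(language: str = DEFAULT_LANGUAGE) -> str:
    """Return the name in the requested language, falling back to the default language."""
    return MODEL_NAMES.get(language, MODEL_NAMES[DEFAULT_LANGUAGE])


print(get_model_name("nb"))  # Fungus pilosus flavis-modell
print(get_model_name("fr"))  # Fungus pilosus flavis model
```

The fallback means that a client asking for a language without a translation still gets a usable name instead of an error.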

The Reference Model contains examples for translation and how to include images in the description text.