A step-by-step introduction to implementing prediction models in VIPS

Tor-Einar Skog, Senior developer, NIBIO

Updated: 2022-08-30

What you will learn

This document describes how to implement and test a forecasting model that can be used on the VIPS platform.

Prerequisites

  • You should be familiar with how the VIPS system works. It is recommended that you read the VIPS introduction
  • You should have a basic understanding of the Java programming language.
  • You should be somewhat familiar with NetBeans. You can use another IDE that you’re familiar with, but bear in mind that some of the instructions may not apply in that case.
  • For model implementation in R or Python (currently not recommended), familiarity with the language at hand is of course recommended.
  • You need the library VIPSCommon (a file called VIPSCommon-2022.1.jar) somewhere on your local computer. You can clone and build it from here: https://gitlab.nibio.no/VIPS/VIPSCommon
  • You need the example NetBeans project «FungusPilosusFlavisModel» somewhere on your local computer. You can clone it from our GitLab: https://gitlab.nibio.no/VIPS/test/funguspilosusflavismodel

Model design in VIPS

The structure of a model

Model concept

A model is conceptually illustrated above. You have a set of input data in some form, you have the analysis/algorithms happening inside the model, and the model returns a set of results. In VIPS, certain design requirements must be met:

  • The model must be programmed in Java or another language that can run on the Java Virtual Machine. This includes R and Python (only 2.7, not recommended)
  • The model must implement an interface (a design contract)
  • The model must be packaged as a JAR (Java Archive) file
  • The input data must be in a specific format (details below)
  • Results must be returned in a specific format (details below)
  • The model must provide its own description and usage information in at least English, and additionally in any other preferred language

When a model meets these requirements, it can be deployed to the VIPS Core runtime server and be made available without any more configuration. The model can be called over HTTP/REST from any authenticated client on the Internet. In order to set up batch jobs (running at regular intervals) in the VIPS admin, some configuration classes need to be added to the VIPSLogic system, which acts as an authenticated client.

Developing a model

What you need to develop a VIPS model is:

  • A decent coding environment (IDE) like NetBeans, Eclipse or IntelliJ.
  • A library of VIPS classes called VIPSCommon.jar
  • A testing framework like JUnit. This is normally bundled with your IDE (see above)

The normal workflow is that you have some correctly formatted (see other documentation) weather data in a file that you put on the project’s classpath, you mix this with the other configuration data and develop the model based on these input data. You must have one main class that implements the Model interface, which is available in the VIPSCommon.jar library. The test framework can be used to test single methods that are part of the algorithms or you can test the complete model.

When you’re happy with how the model works you can test deploy it to the VIPSCore server (TODO: Document this)

Implementing a forecasting model step-by-step

In this project, we are going to implement a forecasting model for a virtual fungus called «Fungus pilosus flavis» (please bear with me, any phytopathologists who might read this). Let's say that there is a forecasting model for it that states that

  • There is no infection risk until you have reached 500 day degrees (Celsius)
  • After that, the risk multiplies by 2 for each consecutive hour of leaf wetness (starting at 1 on the first hour). When reaching the threshold of 24, there is serious risk of infection, and measures should be taken.

We’ll be using NetBeans IDE for this example, but the process should be transferable to other IDEs. NetBeans can be downloaded from here: https://netbeans.apache.org/download/ Select either the Java EE version or the one with everything. Follow the install instructions and start NetBeans.

Creating a new NetBeans project for the forecasting model

  1. Start the NetBeans application and remove the «Start Page». Select File -> New Project and select project type Java Application. Put it somewhere that you’ll remember.
  2. NetBeans sets up the project structure for you, but you should create a package. Right click Source Packages and select New -> Java Package. Name it whatever you want, e.g. com.acme.vips.funguspilosusflavis
  3. Create the main model class: Right click on the package and select New -> Java Class. Name it e.g. FungusPilosusFlavisModel
  4. The next thing you should do is to add the basic dependency for the model: the library called VIPSCommon. Right click on Libraries and select Add JAR/Folder. Locate the file (it should be included with this documentation) and add it.

Now you should be ready to code.

Working on the model class

  1. To make sure that the file is compliant with the VIPS Model specifications, change this

    public class FungusPilosusFlavisModel {

    to this

    public class FungusPilosusFlavisModel implements Model{
  2. NetBeans complains. Click on the light bulb that pops up in the gutter, and select Add import for no.bioforsk.vips.model.Model

  3. NetBeans complains again. Click on the light bulb and select Implement all abstract methods

  4. NetBeans creates all the methods that are part of the Model interface. Please note that these methods at the moment do not do anything, except cause an error (throw an UnsupportedOperationException) if they are called. The implementation of the methods is entirely up to you.

Now the class is ready to be programmed and do some calculations.
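
To make the point about the generated methods concrete, here is a minimal sketch of what such a stub looks like and how it behaves when called. The interface SketchModel and its single method getName() are hypothetical stand-ins used only for illustration; they are not the real no.bioforsk.vips.model.Model interface, which has its own set of methods.

```java
public class StubSketch {

    // Hypothetical stand-in for a design contract; NOT the real VIPS Model interface
    interface SketchModel {
        String getName();
    }

    // Roughly what an IDE-generated method looks like before you fill it in
    static class FungusSketch implements SketchModel {
        @Override
        public String getName() {
            throw new UnsupportedOperationException("Not supported yet.");
        }
    }

    // Returns true if calling the unimplemented method fails with the expected exception
    static boolean stubThrows() {
        try {
            new FungusSketch().getName();
            return false;
        } catch (UnsupportedOperationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Stub throws: " + stubThrows());
    }
}
```

The class compiles because the contract is formally fulfilled, but any caller gets an exception until you replace the body with real code.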

Create the method for finding when 500 day degrees has been passed

To find out when 500 day degrees (since some date) have passed, you need the mean temperature of each day. All weather observations in VIPS are represented by an instance of the object WeatherObservation. This object has a few important properties:

  • ElementMeasurementTypeId: Rain, mean temperature, leaf wetness etc.
  • TimeMeasured
  • LogInterval: Hourly, Daily, Monthly measurement
  • Value: the numerical value of the weather observation

We need a list that contains one WeatherObservation with the mean temperature for each day. So we could start by writing this method:

public Date getDateWhenDayDegreeLimitHasPassed(List<WeatherObservation> observations)
{
}

NetBeans complains, because it can't find the definition of WeatherObservation. Click on the light bulb and select Add import for no.bioforsk.vips.entity.WeatherObservation, and then Add import for java.util.Date (and for java.util.List, if it is not imported already). NetBeans still complains, but that's because the method does not return anything yet. So a simple approach could be:

  1. Loop through all the WeatherObservation objects, and add the value to the total day degree sum as we do so
  2. When the threshold of 500 has been reached, return the date of that WeatherObservation object.

Sample code for this could be:

public Date getDateWhenDayDegreeLimitHasPassed(List<WeatherObservation> observations)
{
    // Needs java.util.Collections, java.util.Date and java.util.List imported
    // Make sure the observations are in chronological order
    Collections.sort(observations);
    // Initialize the day degree counter
    double dayDegrees = 0.0;
    // Iterate through the list of observations
    for(WeatherObservation obs:observations)
    {
        // Make sure it's only daily temperature observations that are used
        if(obs.getLogIntervalId()
                .equals(WeatherObservation.LOG_INTERVAL_ID_1D)
            && obs.getElementMeasurementTypeId()
                .equals(WeatherElements.TEMPERATURE_MEAN))
        {
            // Add to dayDegree sum
            dayDegrees += obs.getValue();
            // If threshold is reached, return the date of the current temperature
            // measurement
            if(dayDegrees >= 500.0)
            {
                return obs.getTimeMeasured();
            }
        }
    }
    // We have finished looping through the observations, and dayDegrees has
    // not passed 500. So we can't return a Date, we must return null (nothing)
    return null;
}
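
If you want to try the accumulation logic without VIPSCommon on the classpath, the sketch below does the same thing with plain Java. The class DailyMean, the method names, the dates and the temperature values are all made up for illustration, and the limit is lowered to 30 so that three days are enough to reach it.

```java
import java.util.Arrays;
import java.util.List;

public class DayDegreeSketch {

    // Hypothetical stand-in for a daily mean temperature observation
    static class DailyMean {
        final String date;   // ISO date, kept as text for brevity
        final double value;  // daily mean temperature in degrees Celsius

        DailyMean(String date, double value) {
            this.date = date;
            this.value = value;
        }
    }

    // Returns the date on which the running sum first reaches the limit,
    // or null if the limit is never reached
    static String dateWhenLimitPassed(List<DailyMean> days, double limit) {
        double sum = 0.0;
        for (DailyMean day : days) {
            sum += day.value;
            if (sum >= limit) {
                return day.date;
            }
        }
        return null;
    }

    // Invented sample data, already in chronological order
    static List<DailyMean> sample() {
        return Arrays.asList(
                new DailyMean("2020-05-01", 12.0),
                new DailyMean("2020-05-02", 10.5),
                new DailyMean("2020-05-03", 9.0));
    }

    public static void main(String[] args) {
        // Running sums are 12.0, 22.5 and 31.5, so the third day passes 30
        System.out.println(dateWhenLimitPassed(sample(), 30.0)); // prints 2020-05-03
        // The sample never gets near 500, so there is no date to return
        System.out.println(dateWhenLimitPassed(sample(), 500.0)); // prints null
    }
}
```

The null result is worth remembering: code that calls the real method must be prepared for a season in which the threshold has not been reached yet.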

Creating the method to calculate the infection risk

We can operate on hourly weather data for leaf wetness and calculate the infection risk. The input will be a list of hourly leaf wetness observations. The output will be a map with the timestamp as key and the infection risk as value, so each timestamp (for instance 24th July 2014 14:00 UTC) has exactly one value.

An example of a solution can be:

public Map<Date, Integer> getInfectionRisk(List<WeatherObservation> observations)
{
    // Create the map with dates and infection risk values
    Map<Date, Integer> riskMap = new HashMap<>();
    // Make sure the observations are in chronological order
    Collections.sort(observations);
    // Counter for consecutive hours of leaf wetness
    Integer consecutiveHoursOfLeafWetness = 0;
    // Loop through the list of observations
    for(WeatherObservation obs:observations)
    {
        // We define a lower threshold for leaf wetness to be 10mins/hour
        if(obs.getValue() > 10.0)
        {
            // Leaf wetness registered, add to consecutive hours counter
            consecutiveHoursOfLeafWetness++;
        }
        else
        {
            // No leaf wetness, reset counter
            consecutiveHoursOfLeafWetness = 0;
        }
        
        // We set the risk value
        riskMap.put(obs.getTimeMeasured(), consecutiveHoursOfLeafWetness * 2);
    }
    // Return the map with all values
    return riskMap;
}
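
To see which numbers the counter logic above produces, here is a small stand-alone sketch of the same loop. It takes a plain array of wet minutes per hour instead of WeatherObservation objects; the class name, the method name and the input values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LeafWetnessSketch {

    // One risk value per hour: consecutive wet hours times 2, reset to 0 on a dry hour
    static List<Integer> riskSeries(double[] wetMinutesPerHour) {
        List<Integer> risks = new ArrayList<>();
        int consecutiveWetHours = 0;
        for (double minutes : wetMinutesPerHour) {
            if (minutes > 10.0) {
                // More than 10 minutes of leaf wetness counts as a wet hour
                consecutiveWetHours++;
            } else {
                // A dry hour breaks the streak
                consecutiveWetHours = 0;
            }
            risks.add(consecutiveWetHours * 2);
        }
        return risks;
    }

    public static void main(String[] args) {
        // Dry, wet, wet, dry, wet
        double[] wetMinutes = {0, 30, 60, 5, 45};
        System.out.println(riskSeries(wetMinutes)); // prints [0, 2, 4, 0, 2]
    }
}
```

With the calculation as written in the method above, the risk grows by 2 for every consecutive wet hour, so the threshold of 24 is reached after 12 wet hours in a row. A list is used here only to keep the printout in order; the method above stores its values in a HashMap, which does not keep the timestamps sorted.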

How can we be sure that these methods work? Testing to the rescue!

In order to ensure that the methods work, we should test them. Ok, maybe these methods are so simple that they do not need testing. But in most models you will have complex calculations, and for that you need testing to ensure correctness.

For Java, the most common approach is to use a testing framework called JUnit (which is part of a larger family of testing frameworks for different programming languages called «xUnit», see: http://en.wikipedia.org/wiki/XUnit). To set up a test for the model class, right click on it in the Projects tab and select Tools -> Create/Update tests. Click OK in the dialog box. A test class is created for you, and all public methods in the model class get a corresponding test method. When you run the test, which you can do by right clicking the model class and selecting Test File or just pressing [CTRL]+[F6], you will see that all tests fail. This is by design. It’s now up to you to select which test methods to keep.

A weather data file is included in the documentation. Copy this into the test folder.

How testing works in JUnit

Let's begin with writing a very simple test method:

@Test
public void helloTest()
{
    String expected = "Hello Test!";
    String result = "HellOOO Test!";
    
    Assert.assertEquals(expected,result);
}

The method has @Test before the declaration. This is a so-called annotation, which tells JUnit which methods in the class are meant to be run during a test. The test consists of two string variables (expected and result). These variables are asserted (or expected) to be equal, and this is tested using the Assert.assertEquals method in JUnit. Of course the two strings are not equal, so the test will fail. You can try that by simply pressing the key combination [ALT]+[F6], which runs all tests for the current project. You should see this in the lower part of NetBeans:

Failing JUnit test in NetBeans

Now you can try to make the test pass. You can do that by setting the strings to have the same value. Run the test again ([ALT]+[F6]), then you should see that the test passes:

Passing JUnit test in NetBeans

So now we can add a test for one of the methods that we created in the FungusPilosusFlavisModel class. Let's start with testing if getDateWhenDayDegreeLimitHasPassed can find the correct date for when we have passed 500 day degrees. We use the data from the file JSONWeatherData.json (you'll find it under Other Test Sources). These data are quite easy to import into a spreadsheet. By doing that, you can add up the temperatures and find that on July 8th 2012, the day degrees have reached a value of 509.5. So we can test this in the following way.

@Test
public void canFindDateWhenDayDegreeLimitPassed()
{
    FungusPilosusFlavisModel instance = new FungusPilosusFlavisModel();
    // We create the expected date
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("Europe/Oslo"));
    cal.set(2012, Calendar.JULY, 8, 0, 0, 0);
    cal.set(Calendar.MILLISECOND, 0);
    Date expected = cal.getTime();
    ModelConfiguration config = new WeatherDataFileReader().getModelConfigurationWithWeatherData("/JSONWeatherData.json", "FUNGUSPILO");
    List<WeatherObservation> observations = (List<WeatherObservation>)config.getConfigParameter("observations");
    Date result = instance.getDateWhenDayDegreeLimitHasPassed(observations);
    
    assertEquals(expected,result);
}
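
One detail in the test deserves a closer look: Calendar.getInstance() starts out with the current time, and the six-argument set(...) call leaves the millisecond field untouched. Without the extra cal.set(Calendar.MILLISECOND, 0), the expected date would almost never be exactly equal to a timestamp at midnight. The stand-alone sketch below builds the date the same way as the test and compares it with the same instant built with java.time. The class and method names are invented for illustration.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class ExpectedDateSketch {

    // Midnight on 8 July 2012 in Europe/Oslo, built the same way as in the test
    static Date withMillisCleared() {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("Europe/Oslo"));
        cal.set(2012, Calendar.JULY, 8, 0, 0, 0);
        // The call above keeps whatever milliseconds "now" had, so clear them
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    // The same instant expressed with java.time, for comparison
    static Date viaJavaTime() {
        ZonedDateTime midnight = ZonedDateTime.of(
                2012, 7, 8, 0, 0, 0, 0, ZoneId.of("Europe/Oslo"));
        return Date.from(midnight.toInstant());
    }

    public static void main(String[] args) {
        System.out.println(withMillisCleared().equals(viaJavaTime()));
    }
}
```

The comparison in the test only succeeds if the observation timestamps in the data file are themselves at midnight in the same time zone, which is an assumption about the sample data rather than something this sketch can check.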

If you run the tests now, you'll (hopefully) see that both tests passed. Both tests? Yes, you have two tests:

  • helloTest
  • canFindDateWhenDayDegreeLimitPassed

Each time you change the program and compile it, these tests are run. This means that if you change something in the program (either intentionally or unintentionally) that makes these tests fail, you will be informed. This might not seem so relevant for such a simple program, but believe me, it will save your day when the code gets big and ugly. Also, writing tests helps you think of how the program works. If you write the tests first, you will think more clearly about the problem, in my experience.

Exercise: Write a test for getInfectionRisk()

Putting it together

We now have the most important methods created (and successfully tested). What we need to do now is to get data in (set configuration, get weather data etc) and get the results out in the expected format.