Agenda
• Create Data Operation
• Create Data Product
• Create Transformation Script
• Create Data Transformation
• See Data Analysis
• See Data Array
• Plot Data
This tutorial shows an example of Machine Learning with Wendelin.
Based on the data from the previous tutorials, we will predict air pressure from
humidity using scikit-learn's Linear Regression and visualize the results.
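Linear Regression here means fitting a straight line, pressure ≈ slope × humidity + intercept, that minimizes the sum of squared errors (ordinary least squares). The script in this tutorial lets scikit-learn's LinearRegression do the fitting. The pure-Python sketch below only illustrates what is computed for a single input feature. The function name fit_line and the sample values are invented for illustration and are not Wendelin data.

```python
# Ordinary least squares for one feature: pressure ~ slope * humidity + intercept.
# fit_line is a hypothetical helper; the sample values are made up for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

humidity = [40.0, 50.0, 60.0, 70.0]           # hypothetical humidity_mean values
pressure = [1020.0, 1015.0, 1010.0, 1005.0]   # hypothetical pressure_mean values

slope, intercept = fit_line(humidity, pressure)
predicted = [slope * h + intercept for h in humidity]
print(slope, intercept)  # prints -0.5 1040.0
```

Because these invented points lie exactly on a line, the predictions reproduce the inputs; real sensor data will leave residuals.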
Before starting this tutorial, make sure you have read and completed the following
tutorials:
• HowTo Transform Data
• HowTo Resample Data
• HowTo Create Notebook with Graphs
Data Operation
Open your Wendelin dashboard.
In Modules, click on Data Operations Module.
Add Data Operation
Click on Add to add a new Data Operation.
Add Data Operation Cont.
Click Proceed to continue.
Fill the Form
Fill the form to create Data Operation.
Title - we name it Predict Pressure
Reference - data-operation-predict-pressure
Script ID - DataAnalysisLine_predictPressureFromHumidity: the script that performs
the prediction. It doesn't exist yet; we will create it later in this tutorial.
At the end, don't forget to Save the changes.
Validate
Click on Validate on the left side panel to validate the Data Operation.
Confirm Validation
Click Validate to confirm validation.
Data Product
Now we need to create a new Data Product, which will be the output Data Product of
the Transformation.
Create Data Product
Create a new Data Product as described in the HowTo Create Data Product tutorial,
with the following values:
Title - Predicted Data Array
Quantity Unit - Unit/Piece
Reference - environment-predicted-array
Item Types
• Data Array
Use
• Big Data/Ingestion/Stream Ingestion
At the end, don't forget to save the changes and Validate the Data Product.
Portal Callables
After the Data Product is created and validated, navigate to the page called Portal
Callables by clicking on Callable on the left side panel.
Portal Callables Cont.
Here we will create and store the prediction script.
Add Transformation Script
Click on Add button to add a new script.
Add Transformation Script Cont.
Choose Python Script as Document Type and click on Create Document to create an
empty Python script.
Fill The Form
Define the ID, Title and Reference of your script.
We name it DataAnalysisLine_predictPressureFromHumidity, the same Script ID we
set in the Data Operation at the beginning of this tutorial.
Next we define the parameters we will pass to our script:
in_array - the input dictionary that contains the Data Array where the data
produced by the previous transformation is stored.
out_array - the output dictionary that contains the Data Array where the results
of the prediction will be stored.
At the end click Save to save the changes.
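Both parameters are dictionaries: the script looks up the Data Array under the key "Data Array" and reads its content with getArray(). If you want to exercise the script's guard clauses outside Wendelin, a stand-in object is enough. In this sketch FakeDataArray and get_input_array are hypothetical helpers, not Wendelin API, and the field layout (date, humidity_mean and pressure_mean as floats) is an assumption based on the field names the script reads. Your real array may use different types.

```python
import numpy as np

class FakeDataArray(object):
    """Hypothetical stand-in for a Wendelin Data Array; only getArray() is mimicked."""

    def __init__(self, array=None):
        self._array = array

    def getArray(self):
        return self._array

def get_input_array(in_array):
    """Mirror the script's guard clauses: return None when there is nothing to predict."""
    in_zbigarray = in_array["Data Array"].getArray()
    if in_zbigarray is None or in_zbigarray.shape[0] == 0:
        return None
    return in_zbigarray

# Assumed record layout with the fields the prediction script reads.
record_dtype = [("date", "f8"), ("humidity_mean", "f8"), ("pressure_mean", "f8")]

missing = {"Data Array": FakeDataArray(None)}
empty = {"Data Array": FakeDataArray(np.zeros(0, dtype=record_dtype))}
filled = {"Data Array": FakeDataArray(np.zeros(3, dtype=record_dtype))}

print(get_input_array(missing))        # None
print(get_input_array(empty))          # None
print(get_input_array(filled).shape)   # (3,)
```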
Transformation Script
We write the script in the text area at the bottom of the page.
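The script drops samples whose humidity_mean or pressure_mean is 0 before training. Humidity and pressure have to be filtered with one shared mask: filtering each column with its own mask shortens them independently, so when the zeros sit in different rows the remaining values no longer belong to the same timestamp (and the two arrays can even end up with different lengths, which train_test_split rejects). A minimal NumPy sketch with invented values:

```python
import numpy as np

# Invented sample values; in this sketch 0 marks a missing measurement.
records = np.array(
    [(40.0, 1020.0), (0.0, 1015.0), (60.0, 0.0), (70.0, 1005.0)],
    dtype=[("humidity_mean", "f8"), ("pressure_mean", "f8")],
)
humidity = records["humidity_mean"]
pressure = records["pressure_mean"]

# Separate masks: both results have 3 values, but 60.0 is now paired with
# 1015.0, and those two values come from different rows.
h_sep = humidity[humidity != 0]   # [40, 60, 70]
p_sep = pressure[pressure != 0]   # [1020, 1015, 1005]

# One shared mask keeps only rows where both values are present.
mask = (humidity != 0) & (pressure != 0)
h_pair = humidity[mask]           # [40, 70]
p_pair = pressure[mask]           # [1020, 1005]

# scikit-learn estimators expect a 2-D array of shape (n_samples, n_features).
X = h_pair.reshape(-1, 1)
y = p_pair.reshape(-1, 1)
print(X.shape, y.shape)           # (2, 1) (2, 1)
```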
Transformation Script Cont.
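import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

def predict(zbigarray):
    # Read the humidity and pressure columns produced by the previous transformation
    humidity = zbigarray[:]["humidity_mean"]
    pressure = zbigarray[:]["pressure_mean"]
    # Drop rows where either value is 0, using one mask so the samples stay paired
    valid = (humidity != 0) & (pressure != 0)
    humidity = humidity[valid]
    pressure = pressure[valid]
    # Hold out 20% of the samples for testing; scikit-learn expects 2-D inputs
    humid_train, humid_test, pressure_train, pressure_test = train_test_split(
        humidity.reshape(-1, 1), pressure.reshape(-1, 1), test_size=0.2, random_state=0)
    regressor = LinearRegression()
    regressor.fit(humid_train, pressure_train)  # training the algorithm
    # Predict pressure for every input row so it can be plotted against the actual values
    pressure_pred = regressor.predict(zbigarray[:]["humidity_mean"].reshape(-1, 1))
    # Pass columns explicitly so the order is Date, Actual, Predicted on any Python/pandas version
    df = pd.DataFrame({'Date': zbigarray[:]["date"],
                       'Actual': zbigarray[:]["pressure_mean"],
                       'Predicted': pressure_pred.flatten()},
                      columns=['Date', 'Actual', 'Predicted'])
    return df

out_data_array = out_array["Data Array"]
in_data_array = in_array["Data Array"]
in_zbigarray = in_data_array.getArray()
# Nothing to predict if the input array does not exist yet or is empty
# (a Zope Python Script body runs as a function, so return is allowed here)
if in_zbigarray is None:
    return
if in_zbigarray.shape[0] == 0:
    return
df = predict(in_zbigarray)
# convert_datetime64 only exists in older pandas (it was removed in pandas 1.0)
ndarray = df.to_records(convert_datetime64=False)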
dtype = [('Date', '
%% fetch
js: jio.js
js: ndarray_bundle.js
js: wendelin.js
js: https://cdn.plot.ly/plotly-latest.min.js
%% js
hateoas_url = "https://softinst133633.host.vifib.net/erp5/web_site_module/default_wendelin_front/hateoas/";
jio = jIO.createJIO({
  type: "erp5",
  url: hateoas_url,
  default_view_reference: "view"
});
gadget = {
  getSetting: function (property) {
    return new RSVP.Queue()
      .push(function () {
        if (property == "hateoas_url") {
          return hateoas_url;
        }
        return;
      });
  },
  jio_getAttachment: function (id, url, parameter_dict) {
    return jio.getAttachment(id, url, parameter_dict);
  },
  jio_get: function (id) { return jio.get(id); }
};
var prediction_label_list = ["Date", "Actual", "Predicted"];
var prediction_graph = document.getElementById('prediction_plot_div');
plot_prediction();
function plot_prediction() {
  return getPredictionData()
    .push(function (data) {
      console.log("data");
      console.log(data);
      var layout = {barmode: 'stack', 'title': 'Predict Pressure Based on Humidity'};
      Plotly.plot(prediction_graph, data, layout);
    });
}
function getPredictionData(start_date, stop_date) {
  function unpack(rows, key) {
    return rows.map(function (row) { return row[key]; });
  }
  array_id = "data_array_module/90";
  prediction_graph_data = [];
  var start_index = 0;
  var stop_index = undefined;
  // Find the started Data Analysis Line that produces "environment-predicted-array"
  return jio.allDocs({
    query: 'portal_type:"Data Analysis Line" AND ' +
      'title: "Predicted Data" AND ' +
      'resource_reference:"environment-predicted-array" AND ' +
      'simulation_state:"started"'
  })
    .push(function (result) {
      var data_analysis_line_id = result.data.rows[0].id;
      // Find the Data Array aggregated by that Data Analysis Line
      return jio.allDocs({
        query: 'portal_type:"Data Array" AND ' +
          'aggregate_related_relative_url:"' + data_analysis_line_id + '"'
      });
    })
    .push(function (result) {
      array_id = result.data.rows[0].id;
      // Read only the first row to get the start date of the array
      return wendelin.getArrayRawSlice(gadget, array_id, 0, 1);
    })
    .push(function (result) {
      array_start_date = wendelin.convertFirstColToDate([[result.data[0]]])[0][0];
      // Not taken as written: start_index is initialised to 0 above
      if (start_index === undefined) {
        start_index = Math.max(0, Math.ceil((start_date - array_start_date) / (frequency * 1000)));
        stop_index = Math.ceil((stop_date - array_start_date) / (frequency * 1000));
      }
      return wendelin.getArrayRawSlice(gadget, array_id, start_index, stop_index);
    })
    .push(function (result) {
      // Collect one column per label in prediction_label_list
      for (var i = 0; i < prediction_label_list.length; i += 1) {
        prediction_graph_data = prediction_graph_data.concat(nj.unpack(result.pick(null, prediction_label_list[i])));
      }
      return prediction_graph_data;
    })
.push(function(result){
var filtered_graph_data = [];
for (var i=0; i