Commodity Price Prediction Using Machine Learning Approach

Surya
Jan 21, 2021

“Praise, like gold and diamonds, owes its value only to its scarcity”

Motivation for This Project:

Around the world, many people buy commodities such as gold, silver, diamonds, or copper every day, or invest in gold, but they cannot predict the price. People need to know today's commodity price, tomorrow's price, and future prices, because this helps them decide when to invest or buy, now or in the future.

Objectives:

The main objective of this project is to predict the future prices of metals such as gold, silver, and copper using machine learning techniques, with the aim of obtaining the most accurate results possible.

Mathematical Work:

The LBMA declares the gold rate per troy ounce.

1 ounce = 31.103 grams

Suppose 1 ounce = $1923.

Price per gram = today's ounce rate / 31.103

i.e. 1923 / 31.103 = 61.83

Therefore 1 gram is $61.83. Now convert dollars to rupees.

Price in rupees = price in dollars × rupees per dollar (the exchange rate)

i.e. 61.83 × 73.42 = Rs. 4539

So 1 gram costs Rs. 4539 in India. However, the IBJA applies further factors, such as import duty and demand, to arrive at one final rate. For example, with a 12% import duty the amount rises by about Rs. 544 (12% of 4539 is 544.68), giving 4539 + 544 = Rs. 5083; the rate IBJA declared for 1 gram of gold was Rs. 5093.

1 gram (24 carat) = Rs. 5093

22 carat rate = 24 carat rate × 0.916

= 5093 × 0.916 = Rs. 4665

1 pound (8 grams) of 22 carat gold = 4665 × 8 = Rs. 37,320
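The same chain of conversions can be checked with a short Python sketch. It uses the example values above (illustrative rates, not live prices), and the function names are hypothetical. Note that the dollar-to-rupee step multiplies by the exchange rate.

```python
# Worked example of the gold-rate arithmetic above, using the post's example values.
GRAMS_PER_OUNCE = 31.103   # grams in one troy ounce
PURITY_22K = 0.916         # fraction of pure gold in 22-carat gold
GRAMS_PER_POUND = 8        # "1 pound" of gold, as used in the example


def usd_per_gram(usd_per_ounce: float) -> float:
    """Convert an LBMA per-ounce price to dollars per gram."""
    return usd_per_ounce / GRAMS_PER_OUNCE


def inr_per_gram(usd_gram: float, inr_per_usd: float) -> float:
    """Convert dollars to rupees by multiplying by the exchange rate."""
    return usd_gram * inr_per_usd


def add_import_duty(inr_gram: float, duty: float) -> float:
    """Raise the price by an import duty given as a fraction (0.12 means 12%)."""
    return inr_gram * (1 + duty)


usd_gram = usd_per_gram(1923)                # about 61.83
inr_gram = inr_per_gram(usd_gram, 73.42)     # about 4539
with_duty = add_import_duty(inr_gram, 0.12)  # about 5084 (the post's 5083 is 4539 + 544)
ibja_24k = 5093                              # rate IBJA declared in the example
rate_22k = round(ibja_24k * PURITY_22K)      # 4665
pound_22k = rate_22k * GRAMS_PER_POUND       # 37320
print(round(usd_gram, 2), round(inr_gram), round(with_duty), rate_22k, pound_22k)
```

The duty step works on the unrounded rupee price, so it lands one rupee above the hand calculation; the declared IBJA rate is hard-coded because it also reflects factors (such as demand) that are not modelled here.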

Tools and Libraries:

Jupyter Notebook and Spyder (development environments)

Python programming

Scikit-learn (supervised machine learning)

Flask (web application)

Heroku (deployment)

Methodology

Steps followed during the project:

Obtain the data

Scrubbing the data

Exploratory data analysis

Feature selection

Split train test data

Build various types of models

Predict the output

Evaluate the models with metrics

Select the best model

Create a Flask server

Develop the web application with Flask

Deploy to Heroku

Data Collection

Data for this study was collected from various sources, covering January 2001 to the current date. The data has 1718 rows and 6 columns in total, with attributes such as the silver, gold, and copper prices. The proposed system collects each commodity's data from its source and then combines all the data into one data frame.

Sources

Gold ETF: https://finance.yahoo.com/

Silver: https://finance.yahoo.com/

Copper: https://in.investing.com/

Data Preprocessing

The following preprocessing methods are used in the proposed system:

Combine all the data

Create a new column

Handling Missing values

Handling categorical values

Extract Date and Time

Select columns

Why are these methods needed?

1. Combine all the data

The proposed system collects data for several commodities (gold, silver, and copper) from various sources, so all the data is finally combined into one new data frame with the help of merge.

2. Create a new column

Because all the commodity data is combined into one frame, a user could easily confuse which rows belong to which metal. The proposed system therefore creates a new column and fills it during merging with a label for each row (gold, silver, or copper).
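The post combines the per-commodity tables with merge. Because the later steps work with a single `commodity` column holding three labels, here is a minimal sketch that instead stacks the tables row-wise with `pd.concat` and fills the new label column at the same time. The function name and the column names are assumptions, and the prices in the usage lines are made-up placeholders.

```python
import pandas as pd


def combine_commodities(frames: dict) -> pd.DataFrame:
    """Stack per-commodity price tables into one frame with a 'commodity' label.

    `frames` maps a commodity name (e.g. "gold") to a DataFrame that has the
    same columns for every commodity (for example Date and Close).
    """
    # Add the label column to each table before stacking them.
    labelled = [df.assign(commodity=name) for name, df in frames.items()]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(labelled, ignore_index=True)


# Made-up placeholder prices, for illustration only.
gold = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"], "Close": [100.0, 101.0]})
silver = pd.DataFrame({"Date": ["2021-01-01"], "Close": [20.0]})
cm = combine_commodities({"gold": gold, "silver": silver})
print(cm)
```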

3. Handling Null values

Null values in the dataset are a major obstacle to predicting the rate, so they must be handled by either filling or dropping them. The proposed system's data has some null values, and it fills them rather than dropping them, because dropping rows is not an efficient approach for future prediction. The filling is done with pandas.
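The post does not say which fill strategy is used. As a minimal sketch, assuming the rows are sorted by date within each commodity, one common choice for price series is to carry the last known price forward within each commodity and back-fill any gap at the very start. The function name and column names are hypothetical.

```python
import pandas as pd


def fill_missing_prices(cm: pd.DataFrame, price_cols: list) -> pd.DataFrame:
    """Fill gaps in price columns without dropping any rows.

    Assumption: rows are sorted by Date within each commodity. A missing price
    is replaced by that commodity's previous known price (forward fill), and
    any leading gap by the next known price (backward fill).
    """
    out = cm.copy()
    out[price_cols] = out.groupby("commodity")[price_cols].ffill()
    out[price_cols] = out.groupby("commodity")[price_cols].bfill()
    return out


df = pd.DataFrame({
    "commodity": ["gold", "gold", "gold", "silver", "silver"],
    "Close": [None, 10.0, None, 5.0, None],  # made-up values with gaps
})
print(fill_missing_prices(df, ["Close"]))
```

Grouping by commodity keeps a silver price from being copied into a gold row when all commodities share one frame.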

4. Handling Categorical data

Machine learning algorithms do not accept categorical (text) data directly, so the proposed system converts categorical data to numerical data with scikit-learn's LabelEncoder. In this dataset, both the commodity and date columns are of object type. The date column is converted with a datetime function, and the other important feature, the commodity column, holds three categories that are label-encoded.

from sklearn.preprocessing import LabelEncoder
Le = LabelEncoder()
# Map each commodity name (gold, silver, copper) to an integer label
cm['commodity'] = Le.fit_transform(cm['commodity'])

5. Extract Date and Time

The dataset's Date column is needed for prediction, but it is stored as an object (string) type. The proposed system therefore first converts it with pandas' datetime functions and then extracts the year, month, and day as separate features.

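import pandas as pd
# Convert Date from object (string) type to datetime, then extract numeric features
cm['Date'] = pd.to_datetime(cm['Date'])
cm['year'] = cm['Date'].dt.year
cm['month'] = cm['Date'].dt.month
cm['day'] = cm['Date'].dt.day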

6. Select Columns

The dataset has many features, but not all of them are needed for prediction (for example, a name column does not help predict the rate), so unwanted columns are removed depending on the problem. In this step, I separate the feature and target variables. I will not use the Close feature directly and will use the Adjusted Close as the target variable.

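cm['S_3'] = cm['Close'].rolling(window=3).mean()  # 3-day moving average of Close
cm['S_9'] = cm['Close'].rolling(window=9).mean()  # 9-day moving average of Close
cm['next_day_price'] = cm['Close'].shift(-1)  # next row's Close, i.e. the next day's price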

With shift(-1), the proposed system stores each row's next-day closing price in a new column, giving an approximate next-day target.
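When gold, silver, and copper rows are stacked in one frame, a plain rolling window or shift can run across the boundary between two commodities. A minimal sketch, assuming `cm` has `commodity`, `Date`, and `Close` columns, computes the same features per commodity with `groupby`; the function name is hypothetical.

```python
import pandas as pd


def add_rolling_features(cm: pd.DataFrame) -> pd.DataFrame:
    """Add 3- and 9-day moving averages and the next-day price per commodity.

    Grouping by commodity keeps a window or shift from mixing gold rows with
    silver rows when all commodities are stacked in one frame.
    """
    out = cm.sort_values(["commodity", "Date"]).copy()
    close = out.groupby("commodity")["Close"]
    out["S_3"] = close.transform(lambda s: s.rolling(window=3).mean())
    out["S_9"] = close.transform(lambda s: s.rolling(window=9).mean())
    out["next_day_price"] = close.shift(-1)  # NaN on each commodity's last row
    return out


df = pd.DataFrame({
    "commodity": ["gold"] * 4 + ["silver"] * 3,
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
                            "2021-01-01", "2021-01-02", "2021-01-03"]),
    "Close": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0],  # made-up values
})
print(add_rolling_features(df))
```

Rows whose windows are incomplete, or that have no next day, come out as NaN; such rows are typically dropped before training.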

Select the input and output features:

# Input features and target variable
X = cm[['commodity', 'year', 'month', 'day']]
y = cm['Close']

Training Process:

Training an ML model means providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process.

The proposed system uses supervised machine learning: the model learns how the dependent variable (the price) relates to the independent variables (the features) from historical data.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

The proposed system uses three types of train/test split. In the first experiment, 80% of the data is used for training and 20% for testing, with scikit-learn's train_test_split.

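# Fraction of rows used for training
t = .8
# Row index at which to split: the first 80% of rows are used for training
t = int(t*len(cm))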

In the second experiment, 70% of the data is used for training and 30% for testing.

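# Fraction of rows used for training
t = .7
# Row index at which to split: the first 70% of rows are used for training
t = int(t*len(cm))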
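The split index `t` computed above can be used to slice the data in date order. A minimal sketch, assuming `X` and `y` are already sorted by date (the function name is hypothetical): unlike `train_test_split`, whose default is `shuffle=True`, slicing by position keeps every test row later than every training row, so no "future" prices leak into training. Passing `shuffle=False` to `train_test_split` gives a similar ordered split.

```python
import pandas as pd


def chronological_split(X: pd.DataFrame, y: pd.Series, train_fraction: float):
    """Split features and target by position, keeping the earliest rows for training."""
    t = int(train_fraction * len(X))  # same split index as in the post
    return X.iloc[:t], X.iloc[t:], y.iloc[:t], y.iloc[t:]


# Ten made-up rows already in date order.
X = pd.DataFrame({"day": range(10)})
y = pd.Series(range(10))
X_train, X_test, y_train, y_test = chronological_split(X, y, 0.8)
print(len(X_train), len(X_test))
```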

Model Building

This is a regression problem, because the output is a continuous value, so the proposed system uses supervised machine learning techniques. Several algorithms are compared to find the model with the best score.

  1. Linear Regression (and Ridge)
  2. Decision Tree Regressor
  3. Random Forest Regressor

Linear Regression: Linear regression models the target (the dependent variable) as a linear combination of the input features (the independent variables). It fits the coefficients by minimizing the sum of squared differences between the actual and predicted values over all the training data.

from sklearn.linear_model import LinearRegression
# Fit ordinary least squares on the training split
linear = LinearRegression().fit(X_train, y_train)
print("Linear Regression model")
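For reference, the ordinary least squares objective that scikit-learn's LinearRegression minimizes is written below, where x_i is the feature vector of training row i (in this post: commodity, year, month, day), y_i is its target price, w holds the coefficients, and b is the intercept. Every one of the n training rows contributes to the sum.

```latex
(\hat{\mathbf{w}}, \hat{b}) = \arg\min_{\mathbf{w},\, b} \sum_{i=1}^{n} \bigl( y_i - \mathbf{w}^{\top}\mathbf{x}_i - b \bigr)^2
```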

Experiment-1: Linear Regression

The proposed system first trains a linear regression model with the fit() method. Used without any tuning, linear regression does not give an accurate result; the model was trained with default settings plus n_jobs=-1, which only lets scikit-learn use all CPU cores and does not change the fitted coefficients.

In the figure above, the actual and predicted prices do not match, so the next algorithm was tried.

Ridge: Ridge and Lasso are both variants of linear regression that help the model generalize, aiming for low bias and low variance. Ridge adds one more term to the cost function, a penalty on the size of the coefficients; put simply, this moves the model from high variance toward lower variance.

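from sklearn.linear_model import Ridge
# alpha sets the strength of the penalty on the coefficients
rr = Ridge(alpha=0.01)
rr.fit(X_train, y_train)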
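To see what the penalty does, here is a small illustration on synthetic data (not the commodity dataset). As alpha grows, Ridge shrinks the coefficients toward zero; with a tiny alpha such as the 0.01 used above, it stays very close to plain linear regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data, made up for illustration: y depends linearly on 3 features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
print("OLS", np.round(ols.coef_, 3))

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2; the intercept is not penalized.
for alpha in (0.01, 10.0, 1000.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(ridge.coef_, 3), round(float(np.linalg.norm(ridge.coef_)), 3))
```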

Decision Tree Regressor: A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed.

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=0)
# Fit on the training split only, so the test split stays unseen
regressor.fit(X_train, y_train)
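The "smaller and smaller subsets" idea can be seen on made-up one-dimensional data. In this sketch, `max_depth` limits how many times the data is split; each leaf predicts the mean target of its subset, so a tree of depth d produces at most 2**d distinct predictions, and a depth-1 tree places its single split at the step in the data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up 1-D data: a step at x = 5 plus a little noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 101).reshape(-1, 1)
y = np.where(X[:, 0] < 5, 1.0, 3.0) + rng.normal(scale=0.05, size=101)

for depth in (1, 2, 3):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y)
    distinct = len(np.unique(tree.predict(X)))
    print(f"depth={depth} leaves={tree.get_n_leaves()} distinct predictions={distinct}")
```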

Random Forest Regressor:

A random forest is a group of decision trees. It builds a forest of trees with added randomness; the forest is an ensemble of decision trees, most of the time trained with the "bagging" method. The general idea of bagging is that combining several learning models improves the overall result. Random forests correct for decision trees' habit of overfitting to their training set.

from sklearn.ensemble import RandomForestRegressor
regr = RandomForestRegressor(max_depth=10, random_state=0)
# Fit on the training split only, so the test split stays unseen
regr.fit(X_train, y_train)
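The ensemble idea can be checked directly on made-up data: scikit-learn's RandomForestRegressor trains each tree on a bootstrap sample of the rows (its default `bootstrap=True`), and the forest's prediction is the mean of its individual trees' predictions, available through `estimators_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=50, max_depth=10, random_state=0).fit(X, y)

# Predictions of every individual tree for the first five rows.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
print(per_tree.mean(axis=0))  # average of the trees
print(forest.predict(X[:5]))  # the forest's own prediction
```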

Evaluation Metrics

Evaluation metrics measure the error of the model. The proposed system computes them with scikit-learn's metrics module, which provides mean squared error (MSE); root mean squared error (RMSE) is its square root, and accuracy is also available for classification. Because this is purely a regression problem, MSE and RMSE are the metrics used.
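A minimal sketch of both metrics, using `sklearn.metrics.mean_squared_error` and taking the square root with NumPy for RMSE. The function name is hypothetical and the prices are made up.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def regression_errors(y_true, y_pred):
    """Return (MSE, RMSE) for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    rmse = float(np.sqrt(mse))  # RMSE is in the same units as the price
    return mse, rmse


# Made-up actual vs predicted prices, for illustration only.
mse, rmse = regression_errors([100.0, 102.0, 101.0], [101.0, 100.0, 101.0])
print(mse, rmse)
```

RMSE is often easier to read than MSE because it is expressed in the price's own units.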

Deployment Process in Flask

Once the product runs on a local server, the next step is to deploy it so that users can easily access it.

In the proposed system, the app was first run on a local server, and then the code and files were uploaded to GitHub. After that, a Heroku account was created, because the product is deployed through Heroku. Heroku provides a URL, and users access the app through that URL.

Running on a local server:

When running locally, the app is served at http://127.0.0.1:5000/.
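A minimal sketch of a Flask prediction endpoint, assuming a JSON POST that carries the same four inputs as `X` above. The `/predict` route, the JSON field names, and `PlaceholderModel` are hypothetical and not taken from the project's repository; a real app would load the trained regressor instead of the placeholder.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class PlaceholderModel:
    """Hypothetical stand-in for the trained regressor (e.g. one loaded with pickle)."""

    def predict(self, rows):
        # A real scikit-learn model would return one predicted price per row.
        return [0.0 for _ in rows]


model = PlaceholderModel()
FEATURES = ("commodity", "year", "month", "day")  # same inputs as X in the post


@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    row = [int(data[name]) for name in FEATURES]
    return jsonify({"prediction": float(model.predict([row])[0])})
```

Calling `app.run()` starts Flask's development server, which listens on http://127.0.0.1:5000/ by default; the endpoint can also be exercised without a server through `app.test_client()`.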

Deployment on Heroku

Once the Flask app works, the next step is to deploy it on Heroku, because an app deployed in the cloud can be used by anyone, anywhere. The app was deployed at https://cmpredictionapi.herokuapp.com

Limitations:

The proposed system has some limitations, listed below:

The proposed system predicts commodity prices through a rolling method (moving averages of past prices).

The system cannot predict other kinds of commodities, such as diamonds.

To predict a rate, the user must enter the rolling inputs, that is, the given open and close prices; the system cannot predict without those features. This is the main limitation of the proposed system.

In future, when the user selects a date, month, and year, the system could automatically predict the future rate and recommend whether to buy the commodity the user selected. The project is also going to be launched as a product that users can access through a mobile application.

https://github.com/jaisurya0508/MetalPrice
