Using COD and CML to build applications that predict stock data

No, not really. You probably won’t get rich unless you work really hard. As nice as it would be, you can’t really predict a stock price with ML alone, but now I have your attention!

Continuing from my previous blog post about how awesome and easy it is to develop web-based applications backed by Cloudera Operational Database (COD), I started a small project to integrate COD with another CDP cloud experience, Cloudera Machine Learning (CML). 

In this demo, I will try to predict the behavior of a stock’s open price based on its historical data, that is, whether the open price will go up or down. I am not a data scientist, but there are many examples online of how to do that (I took some code samples, fixed them, and adjusted them to work with COD). For this purpose, I will use the LSTM (long short-term memory) algorithm. Recurrent neural networks (RNNs) in general, and LSTM specifically, work very well with time-series data.
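To make the "up or down" framing concrete, here is a minimal sketch of how a price series can be cut into fixed-length windows with a direction label, which is the usual input shape for an LSTM. The function name `make_windows`, the window length, and the sample prices are assumptions for illustration and are not taken from the project's code.

```python
from typing import List, Tuple


def make_windows(prices: List[float], window: int) -> Tuple[List[List[float]], List[int]]:
    """Turn a price series into (window, label) training pairs.

    Each sample holds `window` consecutive open prices. The label is 1 if the
    next open price is higher than the last price in the window, otherwise 0.
    """
    samples, labels = [], []
    for start in range(len(prices) - window):
        history = prices[start:start + window]
        next_price = prices[start + window]
        samples.append(history)
        labels.append(1 if next_price > history[-1] else 0)
    return samples, labels


# Hypothetical open prices, oldest first.
opens = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
X, y = make_windows(opens, window=3)
print(X)
print(y)
```

In a real model, the windows would typically be scaled and reshaped to (samples, timesteps, features) before being fed to the network.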

For the avoidance of doubt, we are not purporting to be stock market experts, and nothing in this blog post should be taken as financial advice in any way.  This is purely an example of how to develop a solution using Cloudera’s software. 

Main components used in this demo:

  • Cloudera Operational Database (COD), as mentioned in my previous post, is a managed dbPaaS solution available as an experience in Cloudera Data Platform (CDP)
  • CML is designed for data scientists and ML engineers, enabling them to create and manage ML projects from code to production. Main features of CML: 
    • Development Environment for Data Scientists, Isolated, Containerized, and Elastic
    • Production ML Toolkit  – Deploying, Serving, Monitoring, and Governance of ML models
    • App Serving – Build and Serve Custom applications for ML use-cases
    • Pre-packaged applications that deliver insights to business users
      • Simple, drag-and-drop building of dashboards and apps with Cloudera Data Visualization
      • Template applications as starter kits for your use-cases
  • Stock Data – for pulling the stock data, I used the Alpha Vantage service (free version). It provides a daily summary of stock statistics (open, close, low, high, volume)
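As a rough sketch of what pulling the daily summary can look like, the snippet below builds a request URL for Alpha Vantage's `TIME_SERIES_DAILY` function and flattens a response into rows. The field names ("Time Series (Daily)", "1. open", and so on) are as I recall them from the Alpha Vantage documentation, so check them against a live response. The helper names and the sample values are assumptions, and `fetch_daily` is defined but not called here because it needs network access and a real API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.alphavantage.co/query"


def build_daily_url(symbol: str, api_key: str) -> str:
    """Build the query URL for the daily time series of one symbol."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_daily(symbol: str, api_key: str) -> dict:
    """Download the raw JSON payload (requires network access)."""
    with urlopen(build_daily_url(symbol, api_key), timeout=30) as resp:
        return json.load(resp)


def parse_daily(payload: dict) -> list:
    """Flatten the daily series into a list of dicts sorted by date."""
    series = payload.get("Time Series (Daily)", {})
    rows = []
    for day in sorted(series):
        values = series[day]
        rows.append({
            "date": day,
            "open": float(values["1. open"]),
            "high": float(values["2. high"]),
            "low": float(values["3. low"]),
            "close": float(values["4. close"]),
            "volume": int(values["5. volume"]),
        })
    return rows


# Hypothetical payload in the shape described above (values are made up).
sample = {
    "Time Series (Daily)": {
        "2021-01-05": {"1. open": "101.0", "2. high": "103.5", "3. low": "100.2",
                       "4. close": "102.8", "5. volume": "1200"},
        "2021-01-04": {"1. open": "100.0", "2. high": "102.0", "3. low": "99.5",
                       "4. close": "101.0", "5. volume": "1000"},
    }
}
rows = parse_daily(sample)
print(rows[0])
```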

Build the application

The first thing we need to do is to create a database in COD. 

1. Log in to Cloudera Data Platform Public Cloud control plane

2. Choose Operational Database and then click on “Create Database”

3. Choose your environment and name your database

4. Once the database is up and running, switch to the JDBC tab

5. Set your CDP workload password

6. Now let’s move to CML. Go back to the control plane, click on “Machine learning”, and then “Provision workspace”. Give your workspace a name and select the environment you want to use

7. Once the workspace is provisioned, create a new project, give it a name and use git for the source code. All source code can be found here

8. Once the project is created, you will see all of its files in the project directory.

9. A few additional logistics: we need to create environment variables for the project to store the database access details and the Alpha Vantage API key. To do this, go to the “project settings -> advanced” tab
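Inside the project, the code can then read those variables at startup and fail early if one is missing. This is a minimal sketch: the variable names (`DB_URL`, `DB_USER`, `DB_PASS`, `ALPHA_VANTAGE_KEY`) are hypothetical, so use whatever names you defined in the project settings.

```python
import os

# Hypothetical variable names; match them to the ones set in the project settings.
REQUIRED_VARS = ("DB_URL", "DB_USER", "DB_PASS", "ALPHA_VANTAGE_KEY")


def load_settings(environ=os.environ) -> dict:
    """Return the required settings, raising if any variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Demonstration with a stand-in mapping instead of the real environment.
fake_env = {
    "DB_URL": "https://example.invalid/avatica",
    "DB_USER": "workload-user",
    "DB_PASS": "not-a-real-password",
    "ALPHA_VANTAGE_KEY": "demo",
}
settings = load_settings(fake_env)
print(sorted(settings))
```

Passing the mapping as a parameter keeps the function easy to exercise without touching the real session environment.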

10. Now, let’s start running the project – click on “new session”, give your session a name, choose “python 3” and the resource profile

11. Either on the terminal or the CLI at the bottom, install all the required libraries by running “pip3 install -r requirements.txt”

12. The next step is to create the table in our database where the data will be stored. For that purpose, run the setup script.
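For readers who want a feel for what such a setup step involves, here is a hedged sketch using Phoenix SQL through the `phoenixdb` Python driver. It is not the project's actual setup script: the table name, column names, and the `to_row` and `create_and_load` helpers are assumptions, and the date is stored as a string to keep the example simple. `create_and_load` is defined but not called, since it needs a running database and credentials.

```python
TABLE = "STOCKS"  # hypothetical table name

# Phoenix DDL: composite primary key on (symbol, trade_date).
CREATE_SQL = (
    f"CREATE TABLE IF NOT EXISTS {TABLE} ("
    "symbol VARCHAR NOT NULL, "
    "trade_date VARCHAR NOT NULL, "
    "open_price DOUBLE, high_price DOUBLE, low_price DOUBLE, "
    "close_price DOUBLE, volume BIGINT "
    "CONSTRAINT pk PRIMARY KEY (symbol, trade_date))"
)

# Phoenix uses UPSERT instead of INSERT; phoenixdb uses '?' placeholders.
UPSERT_SQL = f"UPSERT INTO {TABLE} VALUES (?, ?, ?, ?, ?, ?, ?)"


def to_row(symbol: str, record: dict) -> tuple:
    """Convert one daily record (a dict) into the column order of the table."""
    return (symbol, record["date"], record["open"], record["high"],
            record["low"], record["close"], record["volume"])


def create_and_load(database_url: str, user: str, password: str, rows: list) -> None:
    """Create the table if needed and upsert the rows (needs a live database)."""
    import phoenixdb  # imported lazily so the rest of the sketch runs without the driver

    conn = phoenixdb.connect(
        database_url,
        autocommit=True,
        authentication="BASIC",
        avatica_user=user,
        avatica_password=password,
    )
    try:
        cursor = conn.cursor()
        cursor.execute(CREATE_SQL)
        cursor.executemany(UPSERT_SQL, rows)
    finally:
        conn.close()


example = to_row("IBM", {"date": "2021-01-04", "open": 100.0, "high": 102.0,
                         "low": 99.5, "close": 101.0, "volume": 1000})
print(example)
```

Because the primary key is (symbol, trade_date), re-running the load for the same day overwrites the existing row rather than duplicating it.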

13. Now, let’s start testing our model! Go to runner.py and run it. Basically, this script does the following:

a) Gets the daily stock data from Alpha Vantage

b) Performs basic data transformation

c) Stores the data in Cloudera Operational Database

d) Runs the model and creates the model file (tech_ind_model.py)

e) Runs the prediction for the last 120 days

f) Calculates the earnings predicted if we buy and sell stocks at the exact rates
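The last step, estimating earnings, can be illustrated with a very small backtest. This sketch is one possible interpretation and not necessarily what runner.py does: it assumes we buy at today's open whenever the predicted next open is higher, and sell at the next day's actual open. The function name and the sample prices are made up.

```python
def simulate_earnings(actual, predicted, shares=1):
    """Sum the profit of buying at today's open and selling at tomorrow's open,
    but only on days where the model predicts that tomorrow's open is higher."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = 0.0
    for today in range(len(actual) - 1):
        tomorrow = today + 1
        if predicted[tomorrow] > actual[today]:
            total += (actual[tomorrow] - actual[today]) * shares
    return total


# Hypothetical open prices and model predictions for the same four days.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [100.0, 101.0, 101.5, 104.0]
earnings = simulate_earnings(actual, predicted)
print(earnings)
```

In this toy series, the model trades on the first and third days and skips the second, so a single wrong call on a larger move could easily erase the gains, and the sketch ignores fees and slippage entirely.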

14. While the program is running, you will see that it’s collecting data for each stock, uploading it to COD, performing the prediction, and plotting a chart displaying the predicted price vs the current price. Although the chart shows that the trend in most cases is very similar, even a slight difference can impact the total revenue from those recommendations. The output of the run includes the below-mentioned information for each stock:

a) recommendation: keep/buy/sell 

b) last predicted: last predicted price 

c) price: last price

d) predicted: prediction for the next open price

e) signal: high/low, indicating whether the recommendation is strong or weak
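To show how fields like these could be derived from a prediction, here is a hypothetical sketch. The thresholds (`hold_band`, `strong_threshold`) and the rule itself are my assumptions for illustration; the project may compute its recommendation and signal differently.

```python
def recommend(last_price, predicted_next, hold_band=0.005, strong_threshold=0.02):
    """Map the predicted relative change to a recommendation and a signal.

    - keep: the predicted change is within +/- hold_band
    - buy / sell: the predicted change is above / below that band
    - signal is "high" when the absolute change reaches strong_threshold
    """
    change = (predicted_next - last_price) / last_price
    if abs(change) < hold_band:
        recommendation = "keep"
    elif change > 0:
        recommendation = "buy"
    else:
        recommendation = "sell"
    signal = "high" if abs(change) >= strong_threshold else "low"
    return {
        "recommendation": recommendation,
        "price": last_price,
        "predicted": predicted_next,
        "signal": signal,
    }


# Hypothetical prices: last open of 100.0 and a predicted next open of 103.0.
result = recommend(100.0, 103.0)
print(result)
```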

15. The last time I ran the program, following the recommendations would have lost money on most stocks, but at the same time, there were a few that predicted a profit of up to 20%!

Now, algorithmic trading in general and prediction of stock prices are topics that have been around for many years. In order to operationalize machine learning use cases like this, you need powerful tools that are easy-to-use, scalable, and enterprise-ready.

Cloudera Operational Database provides sub-second latency for random read/writes and is exactly what is needed for real-time applications like those used in the financial services industry. 

To get started with CML and COD…

Hope you find it useful,

Happy coding!!
