Project Submission: DeCloud ML
A DeCloud AI/ML platform powered entirely by Filebase and Akash is here! (This is a prototype; DON’T use it in production.)
Check it out at: https://gradientdescent.hns.siasky.net/
Main GitHub at GitHub - spacepotahto/decloud-ai-platform
Custom images involved:
UPDATE 10/23: The text below was my original submission, but I decided to also make an equivalent comprehensive video demo:
There’s no audio; I recommend watching at 2x speed
As described in my project proposal above, I’ve implemented all three bullet points. If you’re a data scientist or ML practitioner, these are likely familiar and desirable features. In the sections below, I’ll showcase the platform with this example ML pipeline:
- Start with the MNIST handwritten digits image dataset (stored in Filebase S3).
- Deploy a custom compute instance that comes with a hosted Jupyter notebook environment (powered by Akash).
- Within the Jupyter notebook, pull the dataset from Filebase S3, train a simple handwritten-digit classifier with TensorFlow, and upload the final model to Filebase.
- Deploy a prediction endpoint by pulling the model from Filebase and serving it with TensorFlow Serving hosted on Akash.
- Use the prediction endpoint to power a simple digits recognizer web application (hosted on Akash).
To follow along, you need a Filebase account, a Chrome or Brave browser with the Keplr wallet extension installed, and an Akash wallet with at least 11 AKT in it.
For preparation, I have the MNIST dataset ready in a bucket called “dataset-3b5d3834-72a8-49f5-9e1a-40f5cd8a77c5”. The original data is from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz; to follow along, you can download it and upload it to “(data bucket)/MNIST/mnist.npz” in your own bucket. I also have another, empty bucket called “decloud-ml”, which will be used to store everything done from the web UI (you’ll see).
Create Deployment Certificate and Add Filebase Access Keys
In the DeCloud ML web UI, go to the “Keys & Certs” section to create a deployment certificate and to add your Filebase access keys. Specify the bucket you want to use to store everything in. In this example, it’ll be my “decloud-ml” bucket.
Deploy Jupyter Notebook with TensorFlow
Note that the “Notebooks” tab is empty; that’s because we haven’t deployed any notebooks yet. Click the “Create Notebook” button. You should see a form like the one below that lets you specify the resources for the compute instance and the base image. GPU support will come once Akash supports it; for now, this prototype only offers a Jupyter notebook image that comes with TensorFlow:
Click “Deploy Notebook”, and follow the steps to deploy the instance on Akash:
Open Notebook and Do Data Science Stuff!
Once deployed, you should see a new row in the table in the “Notebooks” tab. Click the “Open” button to launch the Jupyter environment. The password is the one set in the deployment form. Normally you would probably start with a fresh notebook, but for the purpose of this demo, I’ve prepared a sample notebook that contains all the code for loading the data, training the model, and uploading the model:
Here are some screenshots of the notebook code. The data is downloaded from my Filebase “dataset-3b5d3834-72a8-49f5-9e1a-40f5cd8a77c5” bucket using the Python boto3 package. You can run all the cells via “Run → Run All Cells”, or press Shift + Enter on individual cells. After training finishes, the model is uploaded to Filebase for downstream use:
Also, since we’re using TensorFlow, we can monitor training progress with TensorBoard! Back in the web UI, click the “!” exclamation-icon button under “Deployment Info” to find the external port that maps to port 6006, which is where TensorBoard is served. For example, in our case the mapped port is “31602” and the deployment is at http://kp7siee655bh577flc4jj6iv2c.ingress.provider-0.prod.sjc1.akash.pub/, so we can access TensorBoard at http://kp7siee655bh577flc4jj6iv2c.ingress.provider-0.prod.sjc1.akash.pub:31602
Deploy Model Prediction Endpoint
After running the notebook, you should see your model (called “mnist”) uploaded to your Filebase bucket:
To serve this model as a REST endpoint, we can deploy a TensorFlow Serving instance on Akash that pulls the saved model from Filebase. From the web UI, go to the “Models” tab, fill in the necessary information (in our case the project name is “MNIST Classifier” and the model is “mnist”, matching the object path names on Filebase), and click the button to deploy:
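Outside this platform, the same idea can be sketched as a standalone config: TF Serving builds that include S3 filesystem support can read a SavedModel straight from S3-compatible storage via standard environment variables. The key values below are placeholders, and whether `s3://` paths work depends on the TF Serving build:

```shell
docker run -p 8501:8501 \
  -e AWS_ACCESS_KEY_ID=YOUR_FILEBASE_ACCESS_KEY \
  -e AWS_SECRET_ACCESS_KEY=YOUR_FILEBASE_SECRET_KEY \
  -e S3_ENDPOINT=s3.filebase.com \
  -e S3_USE_HTTPS=1 \
  -e MODEL_NAME=mnist \
  -e MODEL_BASE_PATH=s3://decloud-ml \
  tensorflow/serving
```

The image serves the REST API on port 8501 and resolves the model at `${MODEL_BASE_PATH}/${MODEL_NAME}`, i.e. `s3://decloud-ml/mnist` here.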
Example usage of the model prediction endpoint
Once deployed, your model prediction endpoint is accessible at http://THE_HOSTED_URI/v1/models/mnist:predict via REST calls, and can be consumed by e.g. a web application. I built a simple handwritten digits classifier web application hosted on Akash and powered by this REST endpoint as an example: http://tbb1kvviedbtv12i4349el5714.ingress.provider-0.prod.ams1.akash.pub/
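As a sketch of consuming the endpoint from Python (the host is the placeholder from above; the request/response shapes follow TF Serving’s REST predict API, where “instances” is a batch of inputs, here 28×28 grayscale images):

```python
import json
import urllib.request

def predict_digit(image, host="http://THE_HOSTED_URI"):
    """POST one 28x28 image (nested lists of floats in [0, 1]) to TF Serving."""
    payload = json.dumps({"instances": [image]}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/v1/models/mnist:predict",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        scores = json.loads(resp.read())["predictions"][0]
    return max(range(len(scores)), key=scores.__getitem__)  # argmax = digit

# Request body shape only (no network call made here):
blank = [[0.0] * 28 for _ in range(28)]
body = {"instances": [blank]}
```

The web app below does essentially this from the browser, drawing the digit onto a canvas and sending its pixels as the `instances` payload.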
If you’re curious, you can check out the web app code, or even deploy your own instance on Akash using the provided SDL in my GitHub: decloud-ai-platform/demo-ml-web-app at main · spacepotahto/decloud-ai-platform · GitHub
Again, this is just a proof of concept, so please DO NOT use it in production. Given the limited hackathon time relative to the complexity of the project, error handling is weak and security needs improvement, among other things. If you do try it, I highly encourage you to clear your browser cache after use, or to rotate your Filebase keys. If I end up improving this in the future, the above will be addressed, and more.