Akash Filebase AI/ML DeCloud Platform

Machine learning and AI require lots of storage (for datasets, model checkpoints, production models, etc.) and lots of compute (for experimenting with and training different models, serving the models in production, etc.). Filebase and Akash respectively provide both at a fraction of the cost of their centralized counterparts, but right now there is no AI/ML platform that takes advantage of them. Let’s change that.

For this hackathon, I propose to build the beginnings of a DeCloud AI Platform, similar to offerings in traditional clouds like Google AI Platform, Amazon SageMaker, Azure ML, etc. Given the limited time of the hackathon, I will focus on implementing a web interface that enables what I think are some of the core features common to most traditional offerings:

  • From the web UI, users will be able to specify and deploy a custom compute instance that comes with a hosted Jupyter notebook environment, at the click of a button. The ability to specify GPUs will come in the future, when Akash supports it.
  • Within their Jupyter environment, users will be able to load datasets from Filebase S3 and train their models. Model checkpoints and final models will also be uploaded to Filebase. Users can close their Jupyter deployment and the models will still be available from Filebase.
  • From the web interface, users will be able to see all their models saved to Filebase. There will be a UI that allows users to pick a model to deploy as an endpoint, so that consuming ML apps can use the endpoint to perform model inference.

Such a website will interface with the Keplr wallet extension for Akash deployment payments, and will be powered by AkashJS.

Give this man an Oscar

Happy to see that my choice in delegating to you was a worthwhile one!

I really like this idea! It seems the competition will be fierce :grin:

Nice idea!

How are things going with the project?

Project Submission: DeCloud ML

A DeCloud AI/ML Platform, powered entirely by Filebase and Akash, is here! (This is a prototype; please DON’T use it in production.)

Check it out at: https://gradientdescent.hns.siasky.net/
Main GitHub repo: https://github.com/spacepotahto/decloud-ai-platform
Custom images involved:
https://github.com/spacepotahto/docker-jupyter-s3/pkgs/container/jupyter-s3-tensorflow-notebook
https://github.com/spacepotahto/docker-tensorflow-serving-s3/pkgs/container/tensorflow-serving-s3
https://github.com/users/spacepotahto/packages/container/package/demo-mnist-flask
https://github.com/users/spacepotahto/packages/container/package/demo-mnist-web-app

UPDATE 10/23: The text below was my original submission, but I decided to also make an equivalent comprehensive video demo:

There’s no audio; I recommend watching at 2x speed :slight_smile:

As described in my project proposal above, I’ve implemented essentially all three bullet points. If you’re a data scientist or ML practitioner, these are likely familiar and desirable features. In the following sections, I’ll showcase the platform with the following example ML pipeline:

  1. Start with the MNIST handwritten digits image dataset (stored in Filebase S3).
  2. Deploy a custom compute instance that comes with a hosted Jupyter notebook environment (powered by Akash).
  3. Within the Jupyter notebook, pull the dataset from Filebase S3, train a simple handwritten digits classifier using Tensorflow and the dataset, and upload the final model to Filebase.
  4. Deploy a prediction endpoint, by pulling the model from Filebase and using Tensorflow Serving hosted on Akash.
  5. Use the prediction endpoint to power a simple digits recognizer web application (hosted on Akash).

Prerequisites

To follow along, you need a Filebase account, a Chrome or Brave browser with the Keplr wallet extension installed, and an Akash wallet with at least 11 AKT in it.

Filebase Setup

For preparation, I have the MNIST dataset ready in a bucket called “dataset-3b5d3834-72a8-49f5-9e1a-40f5cd8a77c5”. The original data is from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz; you can download it and put it at “(data bucket)/MNIST/mnist.npz” in your own bucket to follow along. I also have another, empty bucket called “decloud-ml”, which will be used to store everything done from the web UI (you’ll see).
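If you’d rather script this setup than use the Filebase web console, here is a minimal sketch. It assumes Filebase’s S3-compatible endpoint at `https://s3.filebase.com`; the bucket name and access keys are placeholders you must substitute with your own:

```python
import os
import urllib.request

def download_mnist(dest: str = "mnist.npz") -> str:
    """Fetch the original MNIST archive that tf.keras uses."""
    url = "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest

def make_filebase_client(access_key: str, secret_key: str):
    """boto3 S3 client pointed at Filebase's S3-compatible endpoint."""
    import boto3  # imported here so the other helpers work without boto3 installed
    return boto3.client(
        "s3",
        endpoint_url="https://s3.filebase.com",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )

def dataset_key(dataset: str, filename: str) -> str:
    """Object-key layout used in this demo: <dataset name>/<file>."""
    return f"{dataset}/{filename}"

# Usage (requires real Filebase keys; bucket name is illustrative):
# s3 = make_filebase_client("MY_KEY", "MY_SECRET")
# s3.upload_file(download_mnist(), "my-data-bucket", dataset_key("MNIST", "mnist.npz"))
```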

Create Deployment Certificate and Add Filebase Access Keys

In the DeCloud ML web UI, go to the “Keys & Certs” section to create a deployment certificate and to add your Filebase access keys. Specify the bucket you want to use to store everything in. In this example, it’ll be my “decloud-ml” bucket.
[Screenshot: decloud-ml-keys]

Deploy Jupyter Notebook with TensorFlow

Note that the “Notebooks” tab is empty; that’s because we haven’t deployed any notebooks yet. Click the “Create Notebook” button. You should see a form like the one below that lets you specify the resources for the compute instance and the base image. GPU support will come later, when Akash supports it; right now, for this prototype, you can only choose a Jupyter notebook image that comes with Tensorflow:

Click “Deploy Notebook”, and follow the steps to deploy the instance on Akash:
[Screenshot: decloud-ml-notebook]

Open Notebook and Do Data Science Stuff!

Once deployed, you should see a new row in the table in the “Notebooks” tab. Click the “Open” button to launch the Jupyter environment. The password is the one set in the deployment form. Normally you would probably start with a fresh notebook, but for the purpose of this demo, I’ve prepared a sample notebook that contains all the code for loading the data, training the model, and uploading the model:
[Screenshot: decloud-ml-jupyter]

Here are some screenshots of the notebook code. The data is downloaded from my Filebase “dataset-3b5d3834-72a8-49f5-9e1a-40f5cd8a77c5” bucket using the Python boto3 package. You can run all the cells via “Run → Run All Cells”, or Shift + Enter on individual cells. After the model finishes training, it is uploaded to Filebase for downstream use:
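The notebook’s data-loading step can be sketched as below. The `s3` argument is a boto3 client configured with the Filebase endpoint (as in the setup above); the array names match the keys inside the official `mnist.npz` archive, and the training step is left as a commented outline since the exact model is up to you:

```python
import io
import numpy as np

def load_mnist_from_bucket(s3, bucket: str, key: str = "MNIST/mnist.npz"):
    """Download mnist.npz from a Filebase bucket into memory and unpack it."""
    buf = io.BytesIO()
    s3.download_fileobj(bucket, key, buf)
    buf.seek(0)
    with np.load(buf) as data:
        return (data["x_train"], data["y_train"]), (data["x_test"], data["y_test"])

def preprocess(images: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values into [0, 1] floats for training."""
    return images.astype("float32") / 255.0

# Sketch of the training step (TensorFlow is preinstalled in the notebook image):
# import tensorflow as tf
# model = tf.keras.Sequential([
#     tf.keras.layers.Flatten(input_shape=(28, 28)),
#     tf.keras.layers.Dense(128, activation="relu"),
#     tf.keras.layers.Dense(10, activation="softmax"),
# ])
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(preprocess(x_train), y_train, epochs=5)
# model.save("mnist/1")  # SavedModel in a versioned directory, for TF Serving
```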





Also, since we’re using Tensorflow, we can monitor the model training progress using Tensorboard! Back in the web UI, if you click the “!” exclamation icon button under “Deployment Info”, you can find the external port that maps to port 6006, which is where Tensorboard is served. For example, in our case the mapped port is “31602” and the deployment is at http://kp7siee655bh577flc4jj6iv2c.ingress.provider-0.prod.sjc1.akash.pub/, so we can access Tensorboard at http://kp7siee655bh577flc4jj6iv2c.ingress.provider-0.prod.sjc1.akash.pub:31602

Deploy Model Prediction Endpoint

After running the notebook, you should see your model (called “mnist”) uploaded to your Filebase bucket:

To serve this model as a REST endpoint, we can deploy a Tensorflow Serving instance on Akash that pulls the saved model from Filebase. From the web UI, go to the “Models” tab, fill in the necessary information (in our case, the project name is “MNIST Classifier” and the model is “mnist”, to match the object path names on Filebase), and click the button to deploy:
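For the Filebase-backed serving to work, the uploaded objects should follow TF Serving’s versioned layout, `<model>/<version>/<SavedModel files>`. A hedged sketch of mirroring a local SavedModel export into a bucket with that layout (the `s3` client and bucket name are assumptions from the earlier setup):

```python
import os

def model_object_key(model_name: str, version: int, relpath: str) -> str:
    """TF Serving expects a versioned layout: <model>/<version>/<files>."""
    return f"{model_name}/{version}/{relpath}".replace(os.sep, "/")

def upload_saved_model(s3, bucket: str, model_name: str, version: int,
                       export_dir: str) -> None:
    """Walk a local SavedModel export directory and mirror every file
    into the bucket, preserving relative paths under the version prefix."""
    for root, _dirs, files in os.walk(export_dir):
        for name in files:
            local = os.path.join(root, name)
            rel = os.path.relpath(local, export_dir)
            s3.upload_file(local, bucket, model_object_key(model_name, version, rel))

# Usage (bucket name illustrative):
# upload_saved_model(s3, "decloud-ml", "mnist", 1, "mnist/1")
```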

Example usage of the model prediction endpoint

Once deployed, your model prediction endpoint is accessible via REST calls at http://THE_HOSTED_URI/v1/models/mnist:predict, and can be consumed by e.g. a web application. As an example, I built a simple handwritten digits recognizer web application, hosted on Akash and powered by this REST endpoint: http://tbb1kvviedbtv12i4349el5714.ingress.provider-0.prod.ams1.akash.pub/
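Calling the endpoint follows TF Serving’s standard REST API: a POST with a JSON body of the form `{"instances": [...]}`, answered with `{"predictions": [...]}`. A minimal client sketch (the host is whatever URI Akash assigned to your deployment; `batch` is a list of 28×28 pixel arrays):

```python
import json

def predict_payload(instances) -> str:
    """Build the JSON body TF Serving's :predict REST API expects."""
    return json.dumps({"instances": instances})

def parse_predictions(body: str):
    """Extract the per-instance outputs from a :predict response body."""
    return json.loads(body)["predictions"]

# Usage against a deployed endpoint (hypothetical host):
# import requests
# url = "http://THE_HOSTED_URI/v1/models/mnist:predict"
# probs = parse_predictions(requests.post(url, data=predict_payload(batch)).text)
# digit = max(range(10), key=lambda i: probs[0][i])  # argmax over class scores
```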

If you’re curious, you can check out the web app code, or even deploy your own on Akash using the provided SDL in my GitHub repo: https://github.com/spacepotahto/decloud-ai-platform/tree/main/demo-ml-web-app

Disclaimer

Again, this is just a proof of concept, so please DO NOT use it in production. Given the limited hackathon time relative to the complexity of the project, error handling is weak and security needs improvement, among other things. If you do try it, I highly encourage you to clear your browser cache after use, or to rotate your Filebase keys. If I end up improving this in the future, though, the above will be addressed, and more.

I shouldn’t applaud my hackathon competitors but this is pretty awesome.

Thanks, love your work on FileBox too!!

Updated the above post with a video demo.

Outstanding job! Congratulations!

This is tremendous. So cool. I’m an ML Engineer, so it’s amazing to see this kind of use-case developed.

Gonna give your validator some love next time I delegate.

Thanks, much appreciated!!

Thanks! I used to do a lot of ML too, so this project is very close to my heart. Really glad you think it’s cool!

Really cool project @SpacePotato ! I hope more people will start to use it :slight_smile:

Thanks @baktun14 ! :blush:

Appreciate
