Sharing my first deploy experience - feedback welcome

Hello!

I just spent a number of hours trying to deploy an app and wanted to share some issues, some of which are probably already known:

  1. The app is peer-to-peer and needs to tell peers the IP:PORT it is running on; however, it is impossible to determine the ephemeral (non-80) port from within the container.
    – I tried to leverage the port 80 mapping, but then the container's external IP and the port 80 proxy IP no longer match.
    – I tried abusing the port 80 mapping, but since I need 2 ports to publish to peers I was always 1 short. If only there were a magical way for the container to ask about itself/exports/port mappings via a metadata filesystem or HTTP address, like the AWS EC2 metadata service…

1a) I guess we need smarter live-reconfig container software, but that will take some time.
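For context, the EC2 metadata service alluded to above is a link-local HTTP endpoint an instance can query about itself. A minimal sketch of that pattern (the address and paths are EC2's, not anything Akash provides today):

```python
from urllib.request import urlopen

# EC2 publishes instance metadata at a fixed link-local address.
METADATA_BASE = "http://169.254.169.254/latest/meta-data/"

def metadata_url(path):
    """Build the metadata URL for a key such as 'public-ipv4'."""
    return METADATA_BASE + path

def fetch_metadata(path, timeout=2):
    """Fetch one value; this only answers from inside an EC2 instance."""
    with urlopen(metadata_url(path), timeout=timeout) as resp:
        return resp.read().decode()

# From inside an instance, fetch_metadata("public-ipv4") returns the
# externally visible IP -- exactly what a P2P container needs to announce.
```

Something equivalent on the provider side (external IP plus the actual port mappings) would solve the self-discovery problem above.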

  2. I didn’t find a way to read the docker console logs (docker logs … container), but it sure would be good to know whether my container ever booted.

  3. I assume if I run SSL, it will be on a random ephemeral port since only port 80 is mapped. I also assume that shoving TLS into port 80 won’t work well. That makes it hard to use Cloudflare or other SSL terminators that don’t have a port-mapping feature.

While yes, it’s true I can have Cloudflare terminate SSL on the proxied port 80, it’s less than ideal to leave the Cloudflare→Akash leg in full plain text.

  4. I found it strange to create a deployment, pay for a lease, then have to send-manifest the same YML file to the provider. – If I have secrets in deploy.yml, am I blindly sending sensitive parts to strangers? Better to have a separate deploy-resource-request.yml and a provider-runtime-manifest.yml?

  5. Minor thing, but various commands reply with JSON and YML without much of a pattern. Maybe blockchain vs provider? No idea. It would be cool to eventually make it all either JSON or YML, and later support both (the AWS API allows a preference specification). This is a small barrier to entry for folks who will already struggle digging through layers of nested objects to pull out key/values.

  6. I’m ashamed to say that even though I’ve worked with JSON/YML for years, by the time I wrote a program to auto-parse the responses and get to the next step, the system-provided bids were closed and I failed to make a lease. Then I had to figure out why my 5 AKT were no longer with me (newbie issue), and how in the world to get them back (I thought I had lost them due to the late lease). All is well, but just sharing my confusion. If bid windows stayed open longer for fools like me, I could have learned how to shut down my deploy the next day. Which reminds me: why must I close my deployment just to get new bids? Wouldn’t it be nicer to simply request new bids? Why tell me the bids are closed when I’m clearly asking for bids in order to make a leasing decision? A historical bid API would be better if I wanted to know why I failed. I’m sure there’s good reason for things to be the way they are right now; I just completely missed it on this first pass.
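As an aside for anyone scripting the same flow, the parsing itself can stay small. A minimal sketch, assuming a bid-list JSON response shaped roughly like this (field names are illustrative, not the exact Akash schema):

```python
import json

# Illustrative response; the real bid-list schema may differ.
raw = """
{
  "bids": [
    {"bid": {"bid_id": {"provider": "akash1aaa"}, "price": {"denom": "uakt", "amount": "12"}}},
    {"bid": {"bid_id": {"provider": "akash1bbb"}, "price": {"denom": "uakt", "amount": "7"}}}
  ]
}
"""

def cheapest_bid(payload):
    """Return (provider, price) for the lowest-priced open bid."""
    bids = json.loads(payload)["bids"]
    best = min(bids, key=lambda b: int(b["bid"]["price"]["amount"]))
    return best["bid"]["bid_id"]["provider"], int(best["bid"]["price"]["amount"])

print(cheapest_bid(raw))  # → ('akash1bbb', 7)
```

The race against the bid window, not the parsing, is where I actually lost.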

Anyways, I just wanted to share my experience so that devs/team know where I stumbled/know what might be important to others.

I’m sure I would have never noticed any of this if I just was deploying port 80 HTTP sites… which are… admittedly getting very rare these days.

Not an urgent request to fix/change anything.


Thank you so much for this @awef. You’ve identified some things that we are working on resolving, and also some new pain points that we should consider.

A few quick responses:

  2. I didn’t find a way to read the docker console logs (docker logs … container), but it sure would be good to know whether my container ever booted.

Check out the akash provider lease-logs and akash provider lease-events commands.

  3. I assume if I run SSL, it will be on a random ephemeral port since only port 80 is mapped. I also assume that shoving TLS into port 80 won’t work well. That makes it hard to use Cloudflare or other SSL terminators that don’t have a port-mapping feature.

We’re planning on adding SSL to the shared ingresses and also adding IP address leasing through the marketplace (Elastic IP in other systems). Either of these might have made your experience better; hopefully one of them will be available soon.

  4. I found it strange to create a deployment, pay for a lease, then have to send-manifest the same YML file to the provider. – If I have secrets in deploy.yml, am I blindly sending sensitive parts to strangers? Better to have a separate deploy-resource-request.yml and a provider-runtime-manifest.yml?

deploy.yml is the source for both the on-chain data used to create orders and the off-chain data used to run workloads. The on-chain data includes resources and provider requirements, but no workload runtime information (no image, environment variables, etc.).
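The split described above can be pictured roughly like this. The section names (services, profiles) follow Akash's SDL, but the partition code is only an illustration of the idea, not actual client code:

```python
# Sketch: which parts of a parsed deploy.yml go where.
sdl = {
    "services": {   # off-chain: sent only to the provider you leased from
        "web": {"image": "nginx", "env": ["SECRET=hunter2"],
                "expose": [{"port": 80}]},
    },
    "profiles": {   # on-chain: resource and pricing requirements
        "compute": {"web": {"resources": {"cpu": {"units": "0.5"},
                                          "memory": {"size": "512Mi"}}}},
        "placement": {"dc": {"pricing": {"web": {"denom": "uakt", "amount": 100}}}},
    },
}

def on_chain_view(sdl):
    """Only resource requirements reach the chain: no image, no env vars."""
    return {"profiles": sdl["profiles"]}

def manifest_view(sdl):
    """The manifest with workload details goes straight to the leased provider."""
    return {"services": sdl["services"]}
```

So secrets in the services section never land on chain; they only travel to the one provider you chose.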

You can limit the providers that are able to bid on your orders by using “audited attributes”. This feature models off-chain reputation that we use every day when deploying to other cloud platforms. The examples in our documentation only allow bids from providers that the Akash team is working closely with, for instance.

  5. Minor thing, but various commands reply with JSON and YML without much of a pattern. Maybe blockchain vs provider? No idea. It would be cool to eventually make it all either JSON or YML, and later support both (the AWS API allows a preference specification). This is a small barrier to entry for folks who will already struggle digging through layers of nested objects to pull out key/values.

Yes, this is very irritating. The underlying framework that we are using has been quite inconsistent with this. We have a large CLI overhaul planned to address this.

You can try passing -o json to most commands to standardize the output.
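Until the overhaul lands, one way to cope is to force JSON everywhere and parse that single format. A hypothetical wrapper (assumes an akash binary on PATH; the subcommand names are whatever you already run, and only the -o json part is the point):

```python
import json
import subprocess

def build_cmd(*args):
    """Append -o json so every command replies in one format."""
    return ["akash", *args, "-o", "json"]

def akash_json(*args):
    """Run an akash subcommand and parse its JSON reply (needs akash on PATH)."""
    out = subprocess.run(build_cmd(*args), capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

# e.g. akash_json("query", "deployment", "list", "--owner", some_address)
```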

I’m sure I would have never noticed any of this if I just was deploying port 80 HTTP sites… which are… admittedly getting very rare these days.

Agreed. A lot more features coming down the pike. This was basically an MVP to get it in the hands of users - this kind of feedback is exactly what we were looking for. Thanks!


Good to hear about your future plans for SSL.

For audited attributes: while I believe the team has done their work, I think non-technical folks could value the team's time investment much more highly if we could download an HTML/PDF/message that says: your deployment is running on servers with certified audited attributes, as governed by “Audit XYZ by Auditor A on Date D according to terms T”.

This raises the trust level from “trust this 1-bit value” to “these are the full terms you can trust” – at least as far as a non-technical person looking for a ToS could understand.

Obviously not urgent, but I think it would make the team's efforts at delivering quality much more recognizable, and give tech nerds a little extra firepower when explaining to their boss/customer why this is okay and trustworthy. (Crypto is still in the spooky bucket for many folks.)


It works well and helped me find that my container had exceeded its storage limits. I didn’t know there were storage limits. The SDL doesn’t seem to allow disk-space requirements, but it would certainly be good to be rejected ahead of time if I want 1 TB of disk and it’s obvious I can’t get it.

Would be good to document whether killed pods / containers continue to incur compute charges.

Although lease-events did show my container being warned / evicted:

{
  "type": "Warning",
  "time": "0001-01-01T00:00:00Z",
  "reason": "Evicted",
  "note": "Pod ephemeral local storage usage exceeds the total limit of containers 1073741824.",
  "object": {
    "kind": "Pod"
  }
}
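For anyone else decoding that note: 1073741824 bytes is exactly 1 GiB, presumably the ephemeral storage granted to this deployment. A quick sanity check:

```python
# The eviction note's limit, decoded.
limit_bytes = 1073741824
assert limit_bytes == 1024 ** 3          # exactly 1 GiB
print(limit_bytes // 1024 ** 2, "MiB")   # → 1024 MiB
```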

lease-status shows little sign of problems (it would be good if it did):

{
  "name": "web",
  "available": 1,
  "total": 1,
  "uris": [
    "xyz.ingress.sjc1p0.mainnet.akashian.io"
  ],
  "observed_generation": 1,
  "replicas": 1,
  "updated_replicas": 1,
  "ready_replicas": 1,
  "available_replicas": 1,
  "should_add_number_of_killed_or_evicted_pods": "haha a few"
}


I think it would be good to be able to docker exec sh into a running container.

Ah, and… it would be good to be able to keep the akash key password in memory for a short period, like ssh-agent / ssh-add with an expiry time. That would make akash easier to learn/use.

For ssh-agent / ssh-add-style behavior, you can use a keyring backend type of pass, which uses gpg-agent behind the scenes.


Thanks again for the great feedback, @awef.

This is actually done but hasn’t been released yet: Add provider lease-shell command by hydrogen18 · Pull Request #1293 · ovrclk/akash · GitHub


It does, see here.

Ah, totally missed that. Thanks!