Akash provider started bleeding fees with repeated failed "Withdraw Lease" transactions every 5 minutes

This is mostly a copy from the “Akash Network Devs” Discord, #providers channel, from July 5th, 2021; posting it here as Singh Balvinder (Dehazelabs) suggested.

Hi,

I’ve spotted some weird failed “Withdraw Lease” transactions coming in every 5 minutes, each with a 1000uakt fee.

Jul 05 10:09:07 lamborghini start-provider.sh[1029866]: I[2021-07-05|10:09:07.983] broadcast response                           cmp=client/broadcaster response="Response:\n  TxHash: 59B97254DDF3E6F5A75B8D635375FA95C7BCE6F48356F7CCA44F8AD77F2FDB09\n  Raw Log: []\n  Logs: []" err=null
Jul 05 10:14:08 lamborghini start-provider.sh[1029866]: I[2021-07-05|10:14:08.025] broadcast response                           cmp=client/broadcaster response="Response:\n  TxHash: 1778728BC7E54F96D1E4B13EA8E9590C0B18EA81B3D46895CF926CB30A4C8951\n  Raw Log: []\n  Logs: []" err=null
Jul 05 10:19:08 lamborghini start-provider.sh[1029866]: I[2021-07-05|10:19:08.061] broadcast response                           cmp=client/broadcaster response="Response:\n  TxHash: 12B87AD8BD8D7C4E67C374F5728369FF64C93DA02AA5FD0AB8A3F179F246000E\n  Raw Log: []\n  Logs: []" err=null
Jul 05 10:24:09 lamborghini start-provider.sh[1029866]: I[2021-07-05|10:24:09.678] broadcast response                           cmp=client/broadcaster response="Response:\n  TxHash: 9D2EE00EA7577BA9B67449C6F52AC2D3AB54DC7B1FE7DD20F29CD06A87A04994\n  Raw Log: []\n  Logs: []" err=null

The reason for the failed TXs is “failed to execute message; message index: 0: payment closed”.

Example TX: 9D2EE00EA7577BA9B67449C6F52AC2D3AB54DC7B1FE7DD20F29CD06A87A04994
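
As a side note, the failure reason can be read from the transaction itself. A minimal sketch over a hypothetical sample response, since I’m assuming the JSON field layout of `akash query tx <hash> -o json` here rather than quoting a real node:

```shell
# Hypothetical sample of what `akash query tx <hash> -o json` returns for
# one of these failed TXs (fields trimmed; shape assumed, verify on your node).
sample='{"height":"1598200","code":5,"raw_log":"failed to execute message; message index: 0: payment closed"}'

# A non-zero "code" means the tx failed; "raw_log" carries the reason.
echo "$sample" | jq -r 'if .code != 0 then "FAILED: " + .raw_log else "ok" end'
```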

The entire list of these transactions is attached to my provider’s address: https://www.mintscan.io/akash/account/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0

It seems all of them are caused by this address (the client): https://www.mintscan.io/akash/account/akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda

Before proceeding to investigate what was going on, I stopped my Akash provider. Once I stopped it, the last transaction was of the Close Bid type and failed too, with “failed to execute message; message index: 0: unknown lease for bid” (1000uakt fee). TX: C69BEA60167545285ED45C36B4A9FD324A6D82F998A8609C94E9DBE17D029703

Without making any changes to the Akash provider, I started it again:

Jul 05 10:33:23 lamborghini start-provider.sh[2246051]: I[2021-07-05|10:33:23.136] broadcast response                           cmp=client/broadcaster response="Response:\n  TxHash: 6118F8426E89C514853866926A124057984698EAEC5CD1BA3D9C9B0426B3CB48\n  Raw Log: []\n  Logs: []" err=null
Jul 05 10:33:23 lamborghini start-provider.sh[2246051]: I[2021-07-05|10:33:23.136] bid complete                                 module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1

What I’m seeing now is that the Create Bid transaction succeeded, with the same 1000uakt fee. TX: 6118F8426E89C514853866926A124057984698EAEC5CD1BA3D9C9B0426B3CB48

It seems quite dangerous to me that someone can just cause my provider to start bleeding the funds every 5 minutes :confused:

It looks like that Create Bid transaction, TX: 6118F8426E89C514853866926A124057984698EAEC5CD1BA3D9C9B0426B3CB48, was to deposit 50 AKT. I can see now that 50 AKT is gone from my provider’s address (akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0) … (temporarily, perhaps?)

akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda’s deployment sequence (DSEQ) was 1597593, and the deployment is currently closed:

$ akash query deployment get   --owner akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda   --node $AKASH_NODE   --dseq 1597593
deployment:
  created_at: "1597594"
  deployment_id:
    dseq: "1597593"
    owner: akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda
  state: closed
  version: GuoOgKOl6wiMH9pmkv1T+SDP6f9aN3NB9Ai6J47UCuU=
escrow_account:
  balance:
    amount: "0"
    denom: uakt
  id:
    scope: deployment
    xid: akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1597593
  owner: akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda
  settled_at: "1597631"
  state: closed
  transferred:
    amount: "42700"
    denom: uakt
groups:
- created_at: "1597594"
  group_id:
    dseq: "1597593"
    gseq: 1
    owner: akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda
  group_spec:
    name: westcoast
    requirements:
      attributes: []
      signed_by:
        all_of: []
        any_of: []
    resources:
    - count: 1
      price:
        amount: "5000"
        denom: uakt
      resources:
        cpu:
          attributes: []
          units:
            val: "500"
        endpoints:
        - kind: SHARED_HTTP
        memory:
          attributes: []
          quantity:
            val: "536870912"
        storage:
          attributes: []
          quantity:
            val: "536870912"
  state: closed
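
Side note: the `transferred` amount above is in uakt; converting (1 AKT = 1,000,000 uakt) shows how little actually moved for this short-lived lease:

```shell
# uakt -> AKT for the escrow "transferred" amount shown above
# (1 AKT = 1,000,000 uakt).
awk 'BEGIN { printf "%.4f AKT\n", 42700 / 1000000 }'
```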

And here is when he attempted to create his deployment on my provider; it was just a single attempt:

"2021-06-30T20:03:30Z","06A05BB4A34CC63FD698791CB5EE768397D661317711BA054C51FED68CE3E6A5","/akash.deployment.v1beta1.MsgCreateDeployment",""
"2021-06-30T20:04:26Z","9D961BCBEBB0BE9DFB58715DC60C59E02696CE2E47EB64780DD4D80F568DEA52","/akash.market.v1beta1.MsgCreateLease","akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda","1597593",1,1,"akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"
"2021-06-30T20:07:16Z","EB7CFD0B7ADE6B893AD84E20B160D70FA4E4FEC36C77ADD64BC7143D53A51D03","/akash.deployment.v1beta1.MsgCloseDeployment",""

Thinking about how to unlock my 50 AKT now, I decided to try stopping my Akash provider once again:

I hit:

E[2021-07-05|11:09:50.293] error unreserving reservation
root@lamborghini:~# systemctl stop akash-provider
root@lamborghini:~# journalctl --no-pager -u akash-provider --since '1 min ago' -l
-- Logs begin at Sun 2021-07-04 11:07:53 UTC, end at Mon 2021-07-05 11:10:25 UTC. --
Jul 05 11:09:30 lamborghini start-provider.sh[2246051]: I[2021-07-05|11:09:30.158] syncing sequence                             cmp=client/broadcaster local=1409 remote=1409
Jul 05 11:09:40 lamborghini start-provider.sh[2246051]: I[2021-07-05|11:09:40.158] syncing sequence                             cmp=client/broadcaster local=1409 remote=1409
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: I[2021-07-05|11:09:50.158] syncing sequence                             cmp=client/broadcaster local=1409 remote=1409
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: D[2021-07-05|11:09:50.293] shutting down                                module=provider-service cmp=balance-checker
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: I[2021-07-05|11:09:50.293] draining order monitors                      module=bidengine-service qty=2
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: D[2021-07-05|11:09:50.293] shutdown complete                            module=provider-service cmp=balance-checker
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: I[2021-07-05|11:09:50.293] shutting down                                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: I[2021-07-05|11:09:50.293] shutting down                                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1346631/1/1
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: D[2021-07-05|11:09:50.293] unreserving reservation                      module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: D[2021-07-05|11:09:50.293] draining deployment managers...              module=provider-cluster cmp=service qty=0
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: D[2021-07-05|11:09:50.293] unreserving reservation                      module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1346631/1/1
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: E[2021-07-05|11:09:50.293] error unreserving reservation                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1 err="not running"
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: E[2021-07-05|11:09:50.294] error unreserving reservation                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1346631/1/1 err="not running"
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: D[2021-07-05|11:09:50.294] closing bid                                  module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: D[2021-07-05|11:09:50.294] closing bid                                  module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1346631/1/1
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: E[2021-07-05|11:09:50.294] closing bid                                  module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1 err="context canceled"
Jul 05 11:09:50 lamborghini systemd[1]: Stopping Akash Provider...
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: E[2021-07-05|11:09:50.294] closing bid                                  module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1346631/1/1 err="context canceled"
Jul 05 11:09:50 lamborghini start-provider.sh[2246051]: I[2021-07-05|11:09:50.294] Waiting on provider attributes service       module=bidengine-service
Jul 05 11:09:50 lamborghini systemd[1]: akash-provider.service: Succeeded.
Jul 05 11:09:50 lamborghini systemd[1]: Stopped Akash Provider.

Starting akash provider:

Jul 05 11:11:00 lamborghini start-provider.sh[2272342]: I[2021-07-05|11:11:00.030] group fetched                                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1
Jul 05 11:11:00 lamborghini start-provider.sh[2272342]: I[2021-07-05|11:11:00.030] requesting reservation                       module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1
Jul 05 11:11:00 lamborghini start-provider.sh[2272342]: D[2021-07-05|11:11:00.031] reservation requested                        module=provider-cluster cmp=service cmp=inventory-service order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1 resources="group_id:<owner:\"akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda\" dseq:1598194 gseq:1 > state:open group_spec:<name:\"westcoast\" requirements:<signed_by:<> > resources:<resources:<cpu:<units:<val:\"500\" > > memory:<quantity:<val:\"536870912\" > > storage:<quantity:<val:\"536870912\" > > endpoints:<> > count:1 price:<denom:\"uakt\" amount:\"5000\" > > > created_at:1598195 "
Jul 05 11:11:00 lamborghini start-provider.sh[2272342]: I[2021-07-05|11:11:00.031] Reservation fulfilled                        module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1
Jul 05 11:11:00 lamborghini start-provider.sh[2272342]: I[2021-07-05|11:11:00.031] Fulfillment already exists                   module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1

Looks like my 50 AKT is stuck in limbo forever?

I’ve just figured out that I can initiate that Withdraw Lease transaction manually, which makes me think that is what my Akash provider was trying to perform every 5 minutes, bleeding 1000uakt in fees each time.

Here is the same transaction, failed for the same reason (“failed to execute message; message index: 0: payment closed”):
TX: A0A7A0E443D03368D84EBA3A7E171B958E9BE69BC76B1FB5E21ED7B77089BD94

This is how I manually issued it from my provider:

$ akash tx market lease withdraw --chain-id akashnet-2 --dseq 1597593 --gseq 1 --oseq 1 --from andy --provider akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 --owner akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda --fees 1000uakt

Yay! I’ve finally figured it out and got my 50 AKT back to my own provider.

So the Create Bid transaction that happened right after I restarted the Akash provider was somehow related to the weirdness when my provider started bleeding while trying to perform Withdraw Lease (probably before attempting the Close Bid), as I described above.

Now, here is how I got my 50 AKT back.
Looking into the Create Bid, I took note of the dseq, gseq, oseq, and order owner params, and issued akash tx market bid close with these:

$ akash query tx 6118F8426E89C514853866926A124057984698EAEC5CD1BA3D9C9B0426B3CB48
...
      deposit:
        amount: "50000000"
        denom: uakt
      order:
        dseq: "1598194"
        gseq: 1
        oseq: 1
        owner: akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda
...
$ akash tx market bid close --chain-id akashnet-2 --dseq 1598194 --gseq 1 --oseq 1 --from andy --provider akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 --owner akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda --fees 1000uakt

That’s all :slight_smile:

But I still need to figure out how to prevent the provider from bleeding due to repeatedly failing (every 5 min) Withdraw Lease TXs …


Thanks for reporting this. I’m escalating this issue to the lead developers.

Thank you so much for the detailed report @andy01!

I’m sorry that it took this much digging.

Provider funds are held in an escrow account until they’re withdrawn - similar to withdrawing staking rewards (and for the same reason: performance). Because of this, we have a process that withdraws funds for a lease if the provider’s account balance falls below a certain amount.

The account balance is checked every five minutes; your account’s balance is apparently below the threshold, and so it is trying to withdraw every five minutes.
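
In other words, the withdrawal trigger is roughly a threshold check. A sketch of that logic in shell (the balance and threshold values here are illustrative assumptions, not the provider’s actual code):

```shell
# Sketch of the periodic balance check described above. The real check
# lives in the provider's balance-checker component and runs every 5 min;
# the numbers below are made-up placeholders.
balance=900000      # uakt currently in the provider wallet (example)
threshold=1000000   # uakt trigger level (assumed)

if [ "$balance" -lt "$threshold" ]; then
  echo "below threshold: broadcast MsgWithdrawLease for each lease"
else
  echo "balance ok"
fi
```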

So that is one problem for your case that we should address.

Secondly, it is apparently not catching that the bid had been closed. This could be from a missed event, but we should handle that case by looking at the result status as well.

Both of these issues are relatively easy to fix. Thanks again for the report, it’s very helpful. If you need more funds to test around with, please let us know and we will facilitate.

Cheers,
-Adam

Edit: tickets for this report:


Thank you @abozanich for filing the github reports!

IIUC, the tooLow is BidMinDeposit (DefaultBidMinDeposit), which is currently set to 50 AKT.

The OrderMaxBids is defaultOrderMaxBids, which is currently 20.

Does it mean that, to be on the safe side, a provider running with these defaults should have a balance of at least [OrderMaxBids x BidMinDeposit] + t (i.e. 50 AKT * 20 MaxBids + 10 = 1010 AKT), so it can accommodate all 20 bids? // I’m introducing t as an extra variable for the extra funds the provider needs to sign transactions during its lifetime.
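
For what it’s worth, here is the arithmetic from my question spelled out (the constants are the defaults quoted above; t is my own made-up safety margin):

```shell
# [OrderMaxBids x BidMinDeposit] + t, using the quoted defaults.
bid_min_deposit=50   # AKT (DefaultBidMinDeposit)
order_max_bids=20    # (defaultOrderMaxBids)
t=10                 # AKT, my arbitrary extra for tx fees

echo "$(( order_max_bids * bid_min_deposit + t )) AKT"
```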

Sorry for the noob question; I still haven’t really read the whitepaper yet… which I probably should, one day :slight_smile:

Thanks in advance!

That would definitely be enough, but it’s not quite necessary. The OrderMaxBids that you’re looking at limits the number of bids that can be made for a single order - at most 20 providers are currently allowed to bid on any one order.

You need BidMinDeposit for every open bid that you have. Losing bids are closed automatically. It’s good to have some room so that you can bid on as many leases as you are able to fulfill, but room for 20 bids might be excessive.

I’ll DM you so that we can get some more tokens in your hands to play around with. I really appreciate you kicking the tires!

@abozanich

I noticed today that I am missing 50 AKT on my provider (I had 110 AKT but was seeing only 60 AKT), despite there being no active deployments.
https://www.mintscan.io/akash/account/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0

I had to manually close this bid in order to get my 50 AKT free again…

akash tx market bid close --chain-id akashnet-2 --dseq 1842661 --gseq 1 --oseq 1 --from andy --provider akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 --owner akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda --fees 1000uakt

TX https://www.mintscan.io/akash/txs/FB08AD4E2FEB0C243EAEB1D9B384B41BE5FA1F625E82A9B07AF3A40ED50927C0

I had to hunt it down by checking my Akash provider logs for “error unreserving reservation” and trying to close each of those:

Jul 24 19:43:06 lamborghini start-provider.sh[2407606]: E[2021-07-24|19:43:06.727] error unreserving reservation                module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1931824/1/1 err="not running"
Jul 24 19:43:06 lamborghini start-provider.sh[2407606]: E[2021-07-24|19:43:06.727] error unreserving reservation                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1842661/1/1 err="not running"
Jul 24 19:43:06 lamborghini start-provider.sh[2407606]: E[2021-07-24|19:43:06.727] error unreserving reservation                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1598194/1/1 err="not running"
Jul 24 19:43:06 lamborghini start-provider.sh[2407606]: E[2021-07-24|19:43:06.727] error unreserving reservation                module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1346631/1/1 err="not running"
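
To make the hunt slightly less manual, the owner/dseq pairs can be pulled out of those log lines. A sketch against one sample line from the journal above (the `order=` field format is as it appears in these logs):

```shell
# Extract owner and dseq from an "error unreserving reservation" log line.
# Sample line copied from the journal output above.
line='E[2021-07-24|19:43:06.727] error unreserving reservation module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1842661/1/1 err="not running"'

echo "$line" \
  | grep -o 'order=[^ ]*' \
  | cut -d= -f2 \
  | awk -F/ '{print "owner=" $1, "dseq=" $2}'
```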

Now, I’m wondering, is there a better and easier way to see the following:

  • how many AKT my Akash provider currently has in escrow in total? (it’s not easy to see, unless one remembers how many AKT one had)
  • which deployment owner is holding my Akash provider’s deposit in its custody? (the only way I could find this out was by hunting down the “error unreserving reservation” messages and trying to close the bid for each deployment manually)
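
One way to answer both questions today might be akash query market bid list with --state open piped through jq; a sketch over a hypothetical sample response, since I’m assuming the JSON field layout here (verify against your node):

```shell
# Hypothetical sample of `akash query market bid list --provider <addr>
# --state open -o json`; field names are assumptions, check your node.
sample='{"bids":[
  {"bid":{"bid_id":{"owner":"akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda","dseq":"1842661","gseq":1,"oseq":1}},
   "escrow_account":{"balance":{"amount":"50000000","denom":"uakt"}}},
  {"bid":{"bid_id":{"owner":"akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h","dseq":"1931824","gseq":1,"oseq":1}},
   "escrow_account":{"balance":{"amount":"50000000","denom":"uakt"}}}
]}'

# Total uakt locked in open-bid escrow:
echo "$sample" | jq -r '[.bids[].escrow_account.balance.amount | tonumber] | add'

# Which owners/deployments are holding it:
echo "$sample" | jq -r '.bids[] | .bid.bid_id.owner + " dseq=" + .bid.bid_id.dseq'
```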

Luckily, I kept track of how many AKT my provider had, so I could spot that something was going on…
Otherwise, I would simply have forgotten how much my provider had on its balance…

I guess some folks may forget this too one day.

It’d also be nice if it were possible to see the following in the Akash explorer:

  • the amount of AKT held in escrow deposits;
  • maybe some information about the active/inactive deployments too (including the broken ones, which I had to manually hunt down and close the bid for…)

https://www.mintscan.io/akash
https://akash.bigdipper.live

There is a configuration option that should help with this:

--bid-timeout duration                        time after which bids are cancelled if no lease is created (default 5m0s)

The provider should close bids automatically if they aren’t accepted within the configured time period. This should return the deposit to the wallet used by your provider. The only exception I can think of is that if you restart the provider, I don’t think it will close the bids which were opened previously.
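
For example, assuming the provider is started via akash provider run (the flag name comes from the help text quoted above; every other flag is elided):

```shell
# Illustrative only: lengthen the bid timeout to 10 minutes.
# Other required flags (key, chain-id, etc.) omitted.
akash provider run --bid-timeout 10m ...
```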

bid-timeout is set to 5 min by default, so that’s neither the issue nor the solution to the issue.

The Akash provider restart could potentially be the cause, but I’m more worried that there is no easy way to detect these ghost bids, as they reduce the provider’s balance through escrow deposits.

Should I open a separate issue for this?


It also happened with my provider.
I got my 50 AKT back after manually closing the bid:

akash query market bid list --provider akash1rt2qk45a75tjxzathkuuz6sq90jthekehnz45z --state open

akash tx market bid close --chain-id akashnet-2 --dseq 1965165 --gseq 1 --oseq 1 --from dehazelabs --provider akash1rt2qk45a75tjxzathkuuz6sq90jthekehnz45z --owner akash15egam89a3qruarxh3xe24fuuadea0ajre85meh --fees 200uakt -b sync -y

Looks like the issue is still present.

I’ve noticed I’m losing about 1 AKT per day.
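
Rough arithmetic on that rate, assuming the loss is purely the flat 1000uakt fee per withdraw TX (1 AKT = 1,000,000 uakt):

```shell
# How many 1000uakt-fee TXs would it take to burn 1 AKT per day?
fee_uakt=1000
loss_uakt_per_day=$(( 1 * 1000000 ))   # 1 AKT in uakt

echo "$(( loss_uakt_per_day / fee_uakt )) txs/day"
```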

I’ve got 4 deployments:

$ akash query market lease list --node $AKASH_NODE --provider akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 --state=active --page 1 --limit 999999 -o json | jq -r '.leases[] | [(.lease.lease_id | .provider, .owner, .dseq, .gseq, .oseq), (.lease.price.amount|tonumber), .lease.state, (.escrow_payment.withdrawn.amount|tonumber / pow(10; 6))] | @csv' | column -s',' -t | sort -n -k6,6
"akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"  "akash1yd0e6wkrc6h0xj75yru0eqfr7hsqlenl9jecw6"  "2334461"  1  1  4  "active"  0.55738  <<<< this one I've just killed by a mistake
"akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"  "akash1aa5jv8ufkrgmjkqellk9t6thjcgvxuxf2yaaz4"  "2350086"  1  1  5  "active"  0.618605
"akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"  "akash1uk60wngpcm6vd50f232jyccw8pn9efzgld57c4"  "2396400"  1  1  8  "active"  0.619184
"akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"  "akash1yd0e6wkrc6h0xj75yru0eqfr7hsqlenl9jecw6"  "2378262"  1  1  8  "active"  0.764328

And my provider is issuing a lot of akash.market.v1beta1.MsgWithdrawLease TXs, spending 1000uakt on each:

Here: file.md · GitHub

Or, when sorted:

$ akash query txs --events "message.sender=akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0&message.action=withdraw-lease" --page 76 --limit 100  | jq -r '.txs[] | [.timestamp, .txhash[0:8], .height, (.tx | (.auth_info.fee.amount[0].amount|tonumber / pow(10; 6)), (.body.messages[] | ."@type", (.bid_id | (.dseq, .gseq, .oseq,  .owner, .provider))))] | @csv' | tr ',' ' ' | awk '{print $6 $9}' | sort -n | uniq -c
     12 "2334461""akash1yd0e6wkrc6h0xj75yru0eqfr7hsqlenl9jecw6"
     13 "2350086""akash1aa5jv8ufkrgmjkqellk9t6thjcgvxuxf2yaaz4"
     12 "2378262""akash1yd0e6wkrc6h0xj75yru0eqfr7hsqlenl9jecw6"
     12 "2396400""akash1uk60wngpcm6vd50f232jyccw8pn9efzgld57c4"

All of them are related to 4 currently active deployments on my Akash Provider.
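
Summing the counts above as a quick sanity check (these are only the TXs on this one result page):

```shell
# Total withdraw-lease TXs in the per-deployment counts above,
# and the fees they burned at 1000uakt each.
total=0
for c in 12 13 12 12; do
  total=$(( total + c ))
done
echo "$total txs, $(( total * 1000 )) uakt in fees"
```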

My Akash Provider was built from sources - master branch, this commit chore: Export DeploymentIDOptions (#1360) · ovrclk/akash@bd78206 · GitHub

I’ve opened a github issue Akash Provider is bleeding · Issue #1363 · ovrclk/akash · GitHub

The settings for this are configurable. See here