I'm trying to abort a backup on an NDB cluster like this:

ndb_mgm> abort backup 2304081512
Abort of backup 2304081512 ordered

But nothing changes; the backup seems to be stalled:

ndb_mgm> all report backup
Node 2: Backup not started
Node 3: Local backup status: backup 2304081512 started from node 12
 #Records: 1950 #LogRecords: 8705
 Data: 635312 bytes Log: 2676612 bytes
Node 4: Local backup status: backup 2304081512 started from node 12
 #Records: 1768 #LogRecords: 7211
 Data: 522256 bytes Log: 2133088 bytes

And if I try to start a new one, I get an error because a backup is already running (2304081512):

ndb_mgm> start backup 2304091121 snapshotstart
Connected to Management Server at: 192.168.169.40:1186
Waiting for completed, this may take several minutes
Node 3: Backup request from 12 failed to start. Error: 1302
Backup failed
*  3001: Could not start backup
*        A backup is already running: Permanent error: Application error

All the nodes seem to be okay, except for an annoying message I got from the NDB engine on node 2:

2023-04-07 10:31:57 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2315: 'Node declared dead. See error log for details(Arbitration error). Temporary error, restart node'.

I really don't know how to get out of this, and I need help.

Best regards, Julien

1 Answer

I had to perform a rolling upgrade on that cluster this weekend in order to move the VMs to Debian 12 (bookworm). NDB backups had been stopped for 3 months and, just out of curiosity, I ran one after the upgrade and... surprise, it worked!

I really don't know what happened but performing the following tasks on each node one by one solved the issue:

  • stop VRRP to force requests onto another node
  • stop the local ndb and mysqld engines
  • stop the local ndb-master
  • upgrade the operating system to the latest bullseye
  • upgrade the operating system to bookworm
  • reboot the VM
  • start ndb, mysqld and ndb-master
  • check the logs and wait for the replication to occur
  • if all is okay, move on to the next node
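The important part of that list is the ordering: one node at a time, and no moving on until the current node is healthy again. A rough sketch of that control flow, in case someone wants to script it, is below. The step wording mirrors the list, while `run_rolling`, the `run_step` callback and the node names are hypothetical; the callback is where you would plug in your own commands and health checks, which I am not reproducing here.

```python
from typing import Callable, Iterable

# Per-node steps, in the order given in the list above.
STEPS = [
    "stop vrrp",
    "stop local ndb and mysqld",
    "stop local ndb-master",
    "upgrade OS to latest bullseye",
    "upgrade OS to bookworm",
    "reboot vm",
    "start ndb, mysqld and ndb-master",
    "check logs and wait for replication",
]


def run_rolling(nodes: Iterable[str], run_step: Callable[[str, str], bool]) -> list[str]:
    """Apply STEPS to each node in turn and return the nodes fully completed.

    run_step(node, step) is a caller-supplied (hypothetical) callback that
    performs the step and returns True on success. The loop stops at the
    first failure so that only one node is ever out of service.
    """
    completed = []
    for node in nodes:
        for step in STEPS:
            if not run_step(node, step):
                return completed  # do not touch the remaining nodes
        completed.append(node)
    return completed


def dry_run(node: str, step: str) -> bool:
    """Print what would be done instead of doing it."""
    print(f"[{node}] {step}")
    return True


# Node names here are placeholders, not the real hosts.
finished = run_rolling(["node-a", "node-b"], dry_run)
```

Stopping at the first failed step is a deliberate choice: with the previous node already back in the cluster, a failure on the current one still leaves the others serving requests.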

I don't think this answer is acceptable but it may help someone in the same situation.

(Note that the blocking backup 2304081512 disappeared from all the nodes, but I don't know when.)
