How to detect robocopy's failure to delete from source?

3

On my SQL Server instance I use SQL Agent to run a daily backup job in two steps.

One of the steps uses Robocopy to move local backup files to network storage.

The command to move the files looks like this:

robocopy M:\backups \\NAS\backups$\Database /MOV /MIR /XJ /NP /COPY:DT

The output of the command in the job step history looks like this:

-------------------------------------------------------------------------------     
ROBOCOPY     ::     Robust File Copy for Windows                                
-------------------------------------------------------------------------------      
Started : Fri Jul 20 00:55:42 2012     
Source : M:\backups\       
Dest : \\NAS\backups$\Database        
Files : *.*            
Options : *.* /S /E /COPY:DT /MOV /PURGE /MIR /NP /XJ /R:1000000 /W:30     
------------------------------------------------------------------------------  
3  M:\backups\
      *EXTRA File       15.5 m  GeoDisplay_Full_2012-07-19-000004.bak
      *EXTRA File       41.3 m  GeoDisplay2_Full_2012-07-19-000004.bak
      *EXTRA File      264.1 g  Webstore_Full_2012-07-19-000004.bak
      New File          15.5 m  GeoDisplay_Full_2012-07-20-000002.bak
      New File          41.4 m  GeoStore_Full_2012-07-20-000002.bak
      New File         302.1 g  Webstore_Full_2012-07-20-000002.bak

      2012/07/20 04:34:50 ERROR 32 (0x00000020) Deleting Source File M:\backups\Webstore_Full_2012-07-20-000002.bak  The process cannot access the file because it is being used by another process.       
------------------------------------------------------------------------------
      Total    Copied   Skipped  Mismatch    FAILED    Extras
Dirs :         1         0         1         0         0         0     
Files :         3         3         0         0         0         3     
Bytes : 302.187 g 302.187 g         0         0         0 264.181 g     
Times :   3:38:57   3:38:45                       0:00:00   0:00:11         
Speed :            24720063 Bytes/sec.     
Speed :            1414.493 MegaBytes/min.       
Ended : Fri Jul 20 04:34:50 2012.  
Process exit code 3.

The text output clearly shows ERROR 32. Robocopy failed to honor the /MOV switch (delete from source after copy) because another process had a handle on one file when robocopy tried to delete it.

Robocopy returned exit code 3 (new files copied to destination and extra files deleted from destination). This is correct but incomplete because there is no way to tell from the error code that any operation failed.

SQL Agent considers only the return code, not the command output, when determining the success of a command shell operation. I could modify the Robocopy command to save the output to disk, and parse the output in an extra job step, but this requires extra programming and would add another dependency to the backup job.

Is there any way to detect this kind of failure without searching Robocopy's text output for the string ERROR 32?

windows-server-2008
robocopy
sql-server
asked on Server Fault Jul 20, 2012 by Iain Samuel McLean Elder • edited Jul 25, 2012 by Iain Samuel McLean Elder

2 Answers

5

Whether it's RoboCopy or some other method, I prefer to log all output and then post-process the log. I personally prefer Perl for that task but use whatever you're comfortable with. I aslo like to have the test checking script email me the results, showing just whether it succeeded or in the case of a failure the relevant lines from the log.

In my opinion, any backup operation that does not include reporting the results is a disaster waiting to happen, simply because you cannot possibly have confidence in an operation that is not checked. Humans do an appallingly poor job of checking logs, so make the extra effort to write the script. Without it you can just about be guaranteed that some critical operation will fail one day and you'll be none the wiser for it, possibly risking far more than just the data.

answered on Server Fault Jul 22, 2012 by John Gardeniers
4

The exit code description "new files copied to destination and extra files deleted from destination" is not technically accurate. You should not rely only on the description.

The exit code 0x3 is a bit flag. The friendly message should be interpreted as follows:

"One of more files were copied successfully"

PLUS

"Extra files or directories were detected. Examine the log file for more information."

Code    Meaning
0   No errors occurred and no files were copied.
1   One of more files were copied successfully.
2   Extra files or directories were detected.  Examine the log file for more information.
4   Mismatched files or directories were detected.  Examine the log file for more information.
8   Some files or directories could not be copied and the retry limit was exceeded.
16  Robocopy did not copy any files.  Check the command line parameters and verify that Robocopy has enough rights to write to the destination folder.

Robocopy exit codes
https://blogs.technet.com/b/deploymentguys/archive/2008/06/16/robocopy-exit-codes.aspx

The upshot is John is correct. There's no way around examining the logs if you want to have a robust backup operation. Additionally, you may want to have a script that runs after the Robocopy operation that removes the handle to the dangling files, then removes the files.

answered on Server Fault Jul 22, 2012 by Greg Askew

User contributions licensed under CC BY-SA 3.0