CrashPlan and Filebot, beware of extended metadata/attributes!

Recently I decided to rename some of my media collection with Filebot, which is a great tool for batch renaming media content. I also use CrashPlan on my main file server, which backs up my data to the CrashPlan cloud (CrashPlan Central). After the rename, CrashPlan got stuck in a weird cycle and kept disconnecting from the backup destination. I was slightly puzzled by this, as renaming files shouldn’t cause any issues: thanks to CrashPlan’s data deduplication, only the changed data blocks are sent to the cloud rather than the whole file. After some troubleshooting I found the root cause, and it’s all to do with metadata!
Extended file attributes and NTFS
Because my file server operating system is Windows Server 2012 R2 Essentials, the underlying filesystem of my SMB shares is NTFS. I could have used ReFS (as I’m using Storage Spaces), but I chose to stick with NTFS for familiarity when I first configured it. Because NTFS supports extended file attributes, Filebot stores additional information against the files it manipulates. One reason is to provide an easy way to revert in case the renaming process labels a file as something it isn’t. While rare, this can happen, and a simple revert command undoes the damage. Various data about the media content can be stored in the extended metadata as well.
Illegal characters in my metadata? It’s more likely than you think!
Quite by chance, both CrashPlan and Filebot are Java-based applications. While that doesn’t mean anything in particular, it did help with finding the root cause. After the batch rename operation, CrashPlan got stuck in a strange backup cycle while analysing one of the changed media files. Once it got to 100%, the connection to CrashPlan Central would immediately drop, the application would sit at “waiting for a connection”, and then the process would repeat for the same file over and over again. After some digging in the logs, I found a Java stacktrace being thrown on the same file each time. Below is the stacktrace, edited for clarity:
STACKTRACE:: com.code42.exception.DebugException: BQ:: Caught unexpected exception...closing session!
    at com.code42.backup.save.BackupQueue.handleWorkerException(BackupQueue.java:1958)
    at com.code42.backup.save.BackupQueue.access$1000(BackupQueue.java:97)
    at com.code42.backup.save.BackupQueue$TodoWorker.handleException(BackupQueue.java:1828)
    at com.code42.utils.AWorker.run(AWorker.java:150)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.file.InvalidPathException: Illegal char <:> at index 47: X:\Example\Folder\File.mp4:net.filebot.filename
    at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPath.parse(Unknown Source)
    at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
    at java.io.File.toPath(Unknown Source)
    at com.code42.io.FileUtility.getSafePath(FileUtility.java:653)
    at com.code42.io.FileUtility.getSafePath(FileUtility.java:637)
    at com.code42.io.path.FileId.getFileId(FileId.java:139)
    at com.code42.backup.save.BackupQueue.addResourceFileTodos(BackupQueue.java:1135)
    at com.code42.backup.save.BackupQueue.processTodo(BackupQueue.java:931)
    at com.code42.backup.save.BackupQueue.access$800(BackupQueue.java:97)
    at com.code42.backup.save.BackupQueue$TodoWorker.doWork(BackupQueue.java:1811)
    at com.code42.utils.AWorker.run(AWorker.java:148)
The issue is the “java.nio.file.InvalidPathException” and the illegal character at the reported index position (47). In this case the colon in the “:net.filebot.filename” portion is triggering the error. This colon is not in the filename or anywhere obvious. It is part of the metadata attached to the file, which NTFS addresses as a named stream after the filename, and you cannot see it unless you go looking for it or analyse the file with a suitable tool. Filebot had added this information to every file it renamed, so all of a sudden every one of those files threw an exception when CrashPlan tried to analyse or send its data. Interestingly, CrashPlan did not show an error or indicate any problem in the application interface; it was only visible in the application logs. That is something which perhaps needs to be addressed.
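To make the shape of that error message concrete, here is a minimal Python sketch that pulls the real file path and the stream name out of a single "Caused by" log line. The function and class names are my own inventions for illustration, and it assumes the exception message sits on its own line, as Java normally writes it. Note that in the shortened example path the second colon sits at index 26, so the reported 47 presumably refers to my original, unedited path.

```python
import re
from typing import NamedTuple, Optional

# Message format taken from the stack trace above; the rest of the line is the path.
_ILLEGAL_CHAR = re.compile(
    r"InvalidPathException: Illegal char <(?P<char>.)> at index (?P<index>\d+): (?P<path>.+)$"
)


class StreamReference(NamedTuple):  # hypothetical helper type
    file_path: str       # the real file on disk
    stream_name: str     # the name after the second colon
    reported_index: int  # index Java reported for the illegal character


def parse_illegal_path(line: str) -> Optional[StreamReference]:
    """Split 'X:\\dir\\file.ext:stream' from a log line into file and stream."""
    match = _ILLEGAL_CHAR.search(line.strip())
    if match is None:
        return None
    raw = match.group("path")
    # Skip the drive-letter colon ("X:") before looking for the stream separator.
    start = 2 if re.match(r"[A-Za-z]:", raw) else 0
    separator = raw.find(":", start)
    if separator == -1:
        return None
    return StreamReference(
        raw[:separator], raw[separator + 1:], int(match.group("index"))
    )


line = (
    r"Caused by: java.nio.file.InvalidPathException: Illegal char <:> "
    r"at index 47: X:\Example\Folder\File.mp4:net.filebot.filename"
)
print(parse_illegal_path(line))
```

Running something like this over the whole log would give a quick list of which files CrashPlan is choking on, without reading every stack trace by hand.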
Disabling metadata and extended attributes
Filebot has had this feature for some time, and it can be disabled. It is, however, enabled by default, so Filebot will keep writing metadata to files in every rename job. To disable it, edit the filebot.launcher.l4j.ini file, which on Windows is typically located in the C:\Program Files\FileBot directory.
Within this config file there will be a block like this:
# use NTFS extended attributes for storing metadata
-DuseExtendedFileAttributes=true
-DuseCreationDate=false
Ensure that you set useExtendedFileAttributes to false. Make sure not to remove the “-D” part of that line, as this is how Java system properties are passed to the application. To edit the file you will need to run Notepad or a similar text editor with elevated privileges.
From then on, Filebot will no longer add extended attributes to files in any subsequent rename jobs.
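If you would rather script the change than edit the file by hand, the following is a rough Python sketch of the same edit. Both function names are hypothetical, the config path is whatever you pass in, and it assumes each property sits at the start of its own line as in the block above. It only touches the one property and keeps the “-D” prefix intact.

```python
import re
import shutil
from pathlib import Path

# Matches the property at the start of a line, so comment lines are left alone.
_FLAG = re.compile(r"^([ \t]*-DuseExtendedFileAttributes=)\S*", re.MULTILINE)


def disable_extended_attributes(config_text: str) -> str:
    """Return the config text with useExtendedFileAttributes set to false."""
    return _FLAG.sub(r"\g<1>false", config_text)


def patch_config_file(config_path: str) -> None:
    """Rewrite the launcher config in place, keeping a .bak copy first.

    Needs an elevated prompt if the file lives under Program Files.
    """
    path = Path(config_path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(disable_extended_attributes(path.read_text()))


sample = (
    "# use NTFS extended attributes for storing metadata\n"
    "-DuseExtendedFileAttributes=true\n"
    "-DuseCreationDate=false\n"
)
print(disable_extended_attributes(sample))
```

Calling patch_config_file with the path to your own launcher config would apply the change on disk; the backup copy is there in case a Filebot update or a typo leaves the file in a bad state.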
Removing extended attributes and metadata
As I had already renamed over 400 files, I needed to clear this metadata, otherwise CrashPlan was going to have problems sending the changed data blocks up to the cloud. Fortunately this is easily done by running a clear operation from the command line:
filebot -script fn:xattr --action clear X:\Example\Folder\Path
Replace the example drive letter and folder path with the actual directory, and Filebot will remove the metadata from any files that carry it.
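To double-check that nothing was missed, a small Python sketch like the one below can walk a folder and report any file that still carries the stream named in the stack trace. This is only a sketch under assumptions: the function names are mine, it relies on Windows accepting the “file:stream” syntax when opening a file on NTFS, and it only looks for net.filebot.filename, so other streams Filebot may write would need adding. On any other filesystem it should simply report nothing.

```python
import os
from typing import Iterator

# Stream name taken from the stack trace; other Filebot streams may exist.
STREAM_NAME = "net.filebot.filename"


def has_stream(file_path: str, stream_name: str = STREAM_NAME) -> bool:
    """True if the named stream on the file can be opened (NTFS on Windows)."""
    try:
        with open(f"{file_path}:{stream_name}", "rb"):
            return True
    except OSError:
        # Stream (or file) not found, or the filesystem has no such concept.
        return False


def files_with_stream(root: str, stream_name: str = STREAM_NAME) -> Iterator[str]:
    """Yield every file under root that still has the named stream attached."""
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if has_stream(path, stream_name):
                yield path


if __name__ == "__main__":
    # Placeholder path from the example above; replace with your own folder.
    for leftover in files_with_stream(r"X:\Example\Folder\Path"):
        print(leftover)
```

An empty result after the clear operation suggests CrashPlan should no longer trip over those files.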
CrashPlan is happy again
Once the extended attributes were cleared, CrashPlan was able to analyse and send the changed data blocks without any issues and was no longer disconnecting from CrashPlan Central. I’ve reported this discovery to the CrashPlan developers, as perhaps this situation can be handled better, particularly as there was no error or warning to the user, and not everyone is technically minded enough to read Java log files and debug a stacktrace! While this is not the fault of CrashPlan, I still think an illegal character could be handled better than with a silent Java stacktrace in the logs.
Overall, however, this was an interesting problem to discover, with a relatively easy fix to prevent it happening in the future. CrashPlan can now get back to doing what it does best: safely backing up my files to the cloud!
CrashPlan, Code42, Data for Life, and the stylized C are trademarks of Code42 Software, Inc.