Software Engineering
third-party-libraries
Updated Wed, 28 Sep 2022 19:45:28 GMT

How do I handle malformed compressed input data that crashes an external library?


I have a Java application that receives compressed files as input. The application reads the header information of these files and passes the compressed bytes to an external native library for decompression (via JNI). One of the files we received contained a corrupt blob of compressed bytes, which leads to a hard crash of the dynamically loaded library and of our application (no exceptions etc.).

Upon inspection of the compressed array that is passed to the library, we verified that the data is indeed corrupt, while the header information is fine.

The question I have is:

How can I prevent my application from crashing from these corrupted input files?

Thoughts:

  • It seems to me that there is no way to inspect compressed data for validity without decompressing it.
  • Inspecting the file header as a sanity check is not enough, since the header information is well formed.
  • Making the called library more robust against malformed data would effectively mean forking the external library, which I want to avoid if possible.

Any pointers are appreciated.




Solution

If you are going to use a native library that might crash (regardless of whether the input data is malformed or the library has a bug), the only safe way to protect your own application from being crashed as well is to run the library in a separate process. Unfortunately, this often means some extra work: you will probably need to build a "wrapper app" for the library and implement some kind of interprocess communication between your app and that wrapper.

For most real-world cases, processes (and only processes) provide sufficient isolation to protect other applications from being shut down when a library hits a stack overflow or attempts an illegal memory access.
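To make the idea concrete, here is a minimal sketch of the calling side. It assumes a hypothetical worker executable that reads the compressed bytes from stdin, calls the native library, writes the decompressed bytes to stdout and exits with code 0 on success. The names `IsolatedDecompressor` and `WorkerFailedException`, the worker command and the timeout are all illustrative and not part of any existing library. Temp files are used instead of pipes only to keep the example short.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

public final class IsolatedDecompressor {

    /** Thrown when the worker process crashes, times out, or reports an error. */
    public static final class WorkerFailedException extends Exception {
        public WorkerFailedException(String message) {
            super(message);
        }
    }

    // Hypothetical command line that starts the wrapper app around the native library.
    private final List<String> workerCommand;
    private final long timeoutSeconds;

    public IsolatedDecompressor(List<String> workerCommand, long timeoutSeconds) {
        this.workerCommand = List.copyOf(workerCommand);
        this.timeoutSeconds = timeoutSeconds;
    }

    public byte[] decompress(byte[] compressed)
            throws IOException, InterruptedException, WorkerFailedException {
        Path in = Files.createTempFile("blob", ".in");
        Path out = Files.createTempFile("blob", ".out");
        try {
            Files.write(in, compressed);

            // The worker gets the compressed bytes on stdin and writes its result to stdout.
            Process worker = new ProcessBuilder(workerCommand)
                    .redirectInput(in.toFile())
                    .redirectOutput(out.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();

            // A hung worker is killed instead of blocking the main application forever.
            if (!worker.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                worker.destroyForcibly().waitFor();
                throw new WorkerFailedException("worker timed out");
            }

            // A native crash in the worker shows up here as a non-zero exit code,
            // while the calling JVM keeps running.
            int exit = worker.exitValue();
            if (exit != 0) {
                throw new WorkerFailedException("worker exited with code " + exit);
            }
            return Files.readAllBytes(out);
        } finally {
            Files.deleteIfExists(in);
            Files.deleteIfExists(out);
        }
    }
}
```

With this shape, a corrupt blob turns into an ordinary `WorkerFailedException` that the application can catch, log and skip. The cost is one process start per call. A long-running worker that is restarted after a crash would reduce that overhead, at the price of a more involved protocol.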

The only alternatives to the former suggestion or to forking are to ask the library's author to implement better error handling, or to send them a pull request if you are willing to implement the missing error handling yourself. However, even if the author is willing to assist, a native library of any real complexity always carries a risk of bugs that cannot be handled by a simple try/catch in your Java application. If you want to be safe, do both: ask the author for a library change and wrap the library in its own process.





Comments (3)

  • +8 – For completeness: switching libraries could also be an alternative. — Aug 26, 2022 at 11:28  
  • +0 – If asking the author, consider sending a sample of the corrupt data that causes the problem. Reproducible bugs are more likely to get squished. — Aug 26, 2022 at 12:11  
  • +0 – @JonasH: that is not wrong, but often easier said than done. There must be another lib available for the same purpose, it must fulfill one's functional and nonfunctional requirements (including license terms, performance, security, etc.), and it should be more robust than the current one. Those are a lot of extra preconditions. — Aug 29, 2022 at 15:56