Scan File Uploads for Scripts, Macros, and Executables in Java

Learn how to scan files for a wide range of potential threats, including scripts, macros, executables, and more.

Aug 20, 2024

Viruses and malware remain prevalent threats in our contemporary security landscape, but they aren’t the only threats worth worrying about. While the practice of systematically studying and categorizing virus and malware signatures has become more efficient and effective in the last few decades (many such attacks use identifiable versions of the same virus and malware libraries), other file-based attack methods have picked up steam and become increasingly difficult to identify.

Attacks with malicious code in the form of scripts, macros, and executables aren’t new by any stretch of the imagination; they are, however, much easier to conceal from traditional antivirus software, and they’re often capable of causing damage comparable to (or even greater than) that of traditional virus and malware attacks. Mistakenly opening a macro-enabled Office document can rapidly escalate into a remote code execution attack, and accessing a disguised executable file (dressed up as a JPG file, for example) can quickly corrupt or exfiltrate sensitive data without the victim’s knowledge.

Rather than attempting to study and categorize these attacks in the same way we’ve approached identifying viruses and malware, it’s best to take an even more stringent security approach. Scripts, macros, and executables are unique and identifiable file types at their core, which means applying strict content verification policies against those file types will identify them and set them apart from other content in a file upload process. On the one hand, this approach may deny perfectly legitimate file uploads from entering our system, but on the other, should we really be taking a risk with anything identifiable as executable content in the first place?
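To make the idea of content verification concrete, here is a minimal local sketch of the kind of check involved: comparing a file's leading bytes against what its name claims. It is an illustration only, not how the scanning API works internally; the class and method names (DisguisedFileCheck, isSuspicious, and so on) are hypothetical, and it recognizes just two executable signatures (Windows "MZ" and Linux ELF) plus the JPEG start marker.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Cloudmersive SDK.
final class DisguisedFileCheck {

    // True when the bytes start with a well-known executable signature:
    // "MZ" (Windows PE) or 0x7F 'E' 'L' 'F' (Linux ELF).
    static boolean looksExecutable(byte[] header) {
        if (header.length >= 2 && header[0] == 'M' && header[1] == 'Z') {
            return true;
        }
        return header.length >= 4
                && header[0] == 0x7F && header[1] == 'E'
                && header[2] == 'L' && header[3] == 'F';
    }

    // True when the bytes start with the JPEG start-of-image marker FF D8 FF.
    static boolean looksLikeJpeg(byte[] header) {
        return header.length >= 3
                && (header[0] & 0xFF) == 0xFF
                && (header[1] & 0xFF) == 0xD8
                && (header[2] & 0xFF) == 0xFF;
    }

    // Flags content that starts with an executable signature, or a file
    // whose name claims JPEG while its content does not look like one.
    static boolean isSuspicious(String fileName, byte[] header) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        boolean claimsJpeg = lower.endsWith(".jpg") || lower.endsWith(".jpeg");
        return looksExecutable(header) || (claimsJpeg && !looksLikeJpeg(header));
    }

    // Reads up to the first 8 bytes of a file (requires Java 11+).
    static byte[] readHeader(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return in.readNBytes(8);
        }
    }
}
```

A check like this only covers a handful of formats, which is why a dedicated scanning service that verifies many file types is the more practical option for real uploads.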

Tutorial: Scan Files for Custom Content Threats in Java

In this tutorial, we’ll learn how to take advantage of a virus scanning API that simultaneously checks files for two categories of threats:

Virus and malware signatures (sourced from a continuously updated database of 17+ million signatures)
Custom content threats including scripts, macros, executables, invalid files (i.e., disguised files), XML external entities & JSON insecure deserialization files, password protected files, and more.

We’ll learn how to structure our API calls in Java and quickly increase our web applications’ protection from a broad range of potential threats.

Step 1: Install the Maven SDK

We’ll kick off our walkthrough by adding references to the repository and dependency in our pom.xml. JitPack is used to build the library on demand.

Let’s add the following repository reference:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

And the following dependency reference:

<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>

Step 2: Add the Import Classes

With installation out of the way, we’ll now add the following imports to the top of our Java file:

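// Import classes:
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.model.VirusScanAdvancedResult; // result type used in Step 4
import com.cloudmersive.client.ScanApi;
import java.io.File; // needed for the input file in Step 4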

Step 3: Configure an API Key

In our penultimate step, we’ll add in a snippet to capture our API key and authorize our connection.

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

If we don’t have a Cloudmersive API key, we can get one by visiting the Cloudmersive website and creating a free account. Free accounts allow up to 800 API calls per month with zero commitments (our API call limit will reset every month in perpetuity unless we decide to scale up our plan).
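Rather than hard-coding the key where "YOUR API KEY" appears, we may prefer to load it at runtime. The sketch below reads it from an environment variable; the variable name CLOUDMERSIVE_API_KEY and the ApiKeySource class are assumptions made for this example and are not defined by the SDK.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the Cloudmersive SDK.
final class ApiKeySource {

    // Assumed environment variable name; choose whatever fits our deployment.
    static final String ENV_VAR = "CLOUDMERSIVE_API_KEY";

    // Returns the trimmed key, or fails fast when it is missing or blank.
    static String resolve(Map<String, String> env) {
        String key = env.get(ENV_VAR);
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("Missing environment variable " + ENV_VAR);
        }
        return key.trim();
    }

    public static void main(String[] args) {
        String apiKey = resolve(System.getenv());
        // The value can then be passed to Apikey.setApiKey(apiKey) in place of the literal.
        System.out.println("Loaded an API key of length " + apiKey.length());
    }
}
```

Taking the environment as a Map parameter keeps the lookup easy to test, and failing fast on a missing key avoids sending unauthenticated requests.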

Step 4: Instantiate the API & Configure Request Variables

We’ll now set our request variables, which include our file path and a set of customizable threat rules covering the content types mentioned above. Setting all of the Boolean values below to false is recommended; each one can be switched to true if a particular threatening file type is genuinely necessary in our specific file upload or file scanning scenario.

ScanApi apiInstance = new ScanApi();
File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
Boolean allowExecutables = false; // Boolean | Set to false to block executable files (program code) from being allowed in the input file.  Default is false (recommended).
Boolean allowInvalidFiles = false; // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document.  Default is false (recommended).
Boolean allowScripts = false; // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file.  Set to true to allow these file types.  Default is false (recommended).
Boolean allowPasswordProtectedFiles = false; // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords.  Set to true to allow these file types.  Default is false (recommended).
Boolean allowMacros = false; // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
Boolean allowXmlExternalEntities = false; // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
Boolean allowInsecureDeserialization = false; // Boolean | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
Boolean allowHtml = false; // Boolean | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats.  Set to true to allow these file types.  Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatibility].
String restrictFileTypes = ""; // String | Specify a restricted set of file formats to allow as clean, as a comma-separated list of file formats; for example, .pdf,.docx,.png would allow only PDF, PNG and Word document files.  All files must pass content verification against this list of file formats; if they do not, the result will be returned as CleanResult=false.  Set restrictFileTypes parameter to null or empty string to disable (as done here); default is disabled.
try {
    VirusScanAdvancedResult result = apiInstance.scanFileAdvanced(inputFile, allowExecutables, allowInvalidFiles, allowScripts, allowPasswordProtectedFiles, allowMacros, allowXmlExternalEntities, allowInsecureDeserialization, allowHtml, restrictFileTypes);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ScanApi#scanFileAdvanced");
    e.printStackTrace();
}

If our request fails for any reason, the catch block prints the exception and its stack trace to standard error, so we can track down failures and make sure suspicious files were scanned properly. When the request succeeds, the scan result is printed to standard output.
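In practice we’ll usually want to act on the scan result rather than only print it. The sketch below shows one way an upload handler might turn the result into an accept or reject decision. To keep it runnable without the SDK, it uses a hypothetical ScanOutcome class as a stand-in for the response; only CleanResult is named in the parameter notes above, and the other field names are assumptions that should be checked against the actual VirusScanAdvancedResult model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Cloudmersive SDK.
final class UploadDecision {

    // Stand-in for the response fields we care about. Field names other than
    // cleanResult are assumptions, not confirmed SDK property names.
    static final class ScanOutcome {
        final boolean cleanResult;
        final boolean containsExecutable;
        final boolean containsScript;
        final boolean containsMacros;

        ScanOutcome(boolean cleanResult, boolean containsExecutable,
                    boolean containsScript, boolean containsMacros) {
            this.cleanResult = cleanResult;
            this.containsExecutable = containsExecutable;
            this.containsScript = containsScript;
            this.containsMacros = containsMacros;
        }
    }

    // Returns an empty list for a clean file, otherwise the reasons to reject it.
    static List<String> rejectionReasons(ScanOutcome outcome) {
        List<String> reasons = new ArrayList<>();
        if (outcome.cleanResult) {
            return reasons;
        }
        if (outcome.containsExecutable) reasons.add("executable content");
        if (outcome.containsScript) reasons.add("script content");
        if (outcome.containsMacros) reasons.add("macros");
        if (reasons.isEmpty()) {
            // Not clean, but none of the flags we track explain why.
            reasons.add("scan did not return a clean result");
        }
        return reasons;
    }

    public static void main(String[] args) {
        ScanOutcome outcome = new ScanOutcome(false, true, false, true);
        List<String> reasons = rejectionReasons(outcome);
        System.out.println(reasons.isEmpty() ? "Accept upload" : "Reject upload: " + reasons);
    }
}
```

Treating anything other than a clean result as a rejection keeps the handler conservative, which matches the strict approach described at the start of this article.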

That’s all there is to it! Now we have a simple, low-code solution capable of detecting a wide range of potential threats.

Cloudmersive Technical Blog
