Scan GCP Storage for Viruses & Malware in Python

Maintain a high degree of cloud object storage security with a powerful in-storage scanning API.

Jul 05, 2024

Scalable object storage is one of the greatest advantages modern cloud computing has to offer. Pay-as-you-go cloud storage models allow small & fast-growing online businesses to capture sudden increases in client-side (website/web app) uploads without suffering from the debilitating overhead costs of server provisioning and management.

It’s not all sunshine and roses, however. The growing adoption of cloud storage over the last decade has also turned object storage containers into attractive vectors for malware distribution.

Most businesses paying for object storage containers intend to disseminate the files they’re storing to a number of (internal or external) clients, and threat actors know that. Those threat actors also know that cleverly obfuscated malware stands a good chance of bypassing generic file upload security policies. By successfully uploading a malicious file to an object storage container, they might be able to launch attacks against a number of different victims.

Tutorial: Scan GCP Objects for Viruses, Malware, and Other Threats in Python

Scanning files for threats in storage provides crucial security redundancy for fast-expanding object storage containers. In this tutorial, we’ll learn how to call an API that scans objects for a wide range of malicious content types (including viruses, malware, and custom content threats) directly from our GCP storage container. We’ll use Python code examples to structure our API call. Before we get started, we’ll need to gather the following information from our GCP storage account:

Bucket Name
Object Name (i.e., file name)
JSON Credential File (service account credential for GCP)
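
Before making any API calls, it can help to confirm that the JSON credential file really is a service account key. The snippet below is a minimal sketch, assuming the file carries the fields Google normally writes into service account keys ("type", "client_email", and "private_key"). The helper name check_service_account_file is hypothetical and not part of any SDK; it only inspects the file locally.

```python
import json


def check_service_account_file(path):
    """Return the service account email if the file looks like a GCP key file.

    Hypothetical helper for illustration: it checks the shape of the JSON
    only and never contacts Google Cloud.
    """
    with open(path, "r", encoding="utf-8") as handle:
        info = json.load(handle)

    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ValueError("Credential file is not a service account key")

    # Both fields are needed for the key to be usable.
    missing = [key for key in ("client_email", "private_key") if not info.get(key)]
    if missing:
        raise ValueError("Credential file is missing fields: " + ", ".join(missing))

    return info["client_email"]
```

Running this check first means a wrong or truncated credential file fails with a clear local error instead of a failed scan request.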

Step 1: Install the SDK

To kick off our tutorial, we’ll first install the Cloudmersive Virus API client using the following pip install command:

pip install cloudmersive-virus-api-client

Step 2: Add the Imports

With installation out of the way, we’ll next add the following imports to the top of our Python file:

from __future__ import print_function
import time
import cloudmersive_virus_api_client
from cloudmersive_virus_api_client.rest import ApiException
from pprint import pprint

Step 3: Configure an API Key

We’ll use an API key to handle authorization for our cloud scanning API calls. To get an API key, we just need to create a free account on the Cloudmersive website. Our free API key will give us a limit of 800 API calls per month with zero commitments (we can scale up our plan for more API calls if/when our storage containers experience a higher volume of uploads).

When we have our API key, we can include the following configuration snippet in our file and replace the ‘YOUR_API_KEY’ placeholder text with our API key string:

# Configure API key authorization: Apikey
configuration = cloudmersive_virus_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
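
Hard-coding the key is fine for a quick test, but a key committed to source control is easy to leak. As a hedged alternative, the sketch below reads the key from an environment variable. The variable name CLOUDMERSIVE_API_KEY and the helper read_api_key are assumptions made for this example, not names defined by the SDK.

```python
import os


def read_api_key(env=None, var_name="CLOUDMERSIVE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption chosen for this example.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError("Set the %s environment variable first" % var_name)
    return key


# Usage with the configuration object created above:
# configuration.api_key['Apikey'] = read_api_key()
```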

Step 4: Instantiate the API & Call the Virus Scanning Function

We’ll now wrap up our API call by creating an instance of the API and calling the cloud object scanning function.

# create an instance of the API class
api_instance = cloudmersive_virus_api_client.ScanCloudStorageApi(cloudmersive_virus_api_client.ApiClient(configuration))
bucket_name = 'bucket_name_example' # str | Name of the bucket in Google Cloud Storage
object_name = 'object_name_example' # str | Name of the object or file in Google Cloud Storage.  If the object name contains Unicode characters, you must base64 encode the object name and prepend it with 'base64:', such as: 'base64:6ZWV6ZWV6ZWV6ZWV6ZWV6ZWV'.
json_credential_file = '/path/to/inputfile' # file | Service Account credential for Google Cloud stored in a JSON file.
allow_executables = False # bool | Set to False to block executable files (program code) from being allowed in the input file.  Set to True to allow them.  Default is False (recommended). (optional)
allow_invalid_files = False # bool | Set to False to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document.  Default is False (recommended). (optional)
allow_scripts = False # bool | Set to False to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file.  Set to True to allow these file types.  Default is False (recommended). (optional)
allow_password_protected_files = False # bool | Set to False to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords.  Set to True to allow these file types.  Default is False (recommended). (optional)
allow_macros = False # bool | Set to False to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats.  Set to True to allow these file types.  Default is False (recommended). (optional)
allow_xml_external_entities = False # bool | Set to False to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats.  Set to True to allow these file types.  Default is False (recommended). (optional)
restrict_file_types = '.pdf,.docx,.png' # str | Comma-separated list of file formats to allow as clean; this example allows only PDF, Word document and PNG files.  All files must pass content verification against this list of file formats; if they do not, the result will be returned as CleanResult=false.  Set to an empty string to disable; default is disabled. (optional)

try:
    # Advanced scan of a Google Cloud Platform (GCP) Storage file for viruses
    api_response = api_instance.scan_cloud_storage_scan_gcp_storage_file_advanced(bucket_name, object_name, json_credential_file, allow_executables=allow_executables, allow_invalid_files=allow_invalid_files, allow_scripts=allow_scripts, allow_password_protected_files=allow_password_protected_files, allow_macros=allow_macros, allow_xml_external_entities=allow_xml_external_entities, restrict_file_types=restrict_file_types)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ScanCloudStorageApi->scan_cloud_storage_scan_gcp_storage_file_advanced: %s\n" % e)
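
The object_name parameter comment mentions that names containing Unicode characters must be base64 encoded and prefixed with 'base64:'. A minimal sketch of that step follows. It assumes the name is encoded as UTF-8 bytes before base64 encoding, which is consistent with the sample string in the parameter comment; the helper name encode_object_name is hypothetical.

```python
import base64


def encode_object_name(object_name):
    """Return the name unchanged if it is pure ASCII, else 'base64:' + encoded name.

    Assumes UTF-8 as the byte encoding of the object name.
    """
    if object_name.isascii():
        return object_name
    encoded = base64.b64encode(object_name.encode("utf-8")).decode("ascii")
    return "base64:" + encoded


# Usage: pass the encoded value as the object name argument.
# object_name = encode_object_name('reports/2024-summary.pdf')
```

ASCII names pass through untouched, so the helper can be applied to every object name before the scan call.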

This API will perform a signature-based virus and malware scan (referencing a database of 17+ million signatures), and it’ll also verify the contents of each file to ensure those contents conform to the file’s stated extension. If, for example, a threat actor uploads a JPG image file that really contains executable content, this API will detect the hidden executable content and return a “ContainsExecutable”: true response. We can refer to the below JSON response model to better understand the full scope of our threat diagnostic:

{
  "Successful": true,
  "CleanResult": true,
  "ContainsExecutable": true,
  "ContainsInvalidFile": true,
  "ContainsScript": true,
  "ContainsPasswordProtectedFile": true,
  "ContainsRestrictedFileFormat": true,
  "ContainsMacros": true,
  "ContainsXmlExternalEntities": true,
  "ContainsInsecureDeserialization": true,
  "ContainsHtml": true,
  "ContainsUnsafeArchive": true,
  "ContainsOleEmbeddedObject": true,
  "VerifiedFileFormat": "string",
  "FoundViruses": [
    {
      "FileName": "string",
      "VirusName": "string"
    }
  ],
  "ErrorDetailedDescription": "string",
  "FileSize": 0,
  "ContentInformation": {
    "ContainsJSON": true,
    "ContainsXML": true,
    "ContainsImage": true,
    "RelevantSubfileName": "string"
  }
}
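
To act on a scan result, we typically need one pass/fail decision plus the reasons behind it. The sketch below works on a plain Python dict that uses the same keys as the JSON response model; the SDK itself returns a model object, so copying its values into such a dict is an assumption of this example. The helper name summarize_scan is hypothetical. The decision relies on CleanResult alone, and the Contains* flags are reported as context, since whether a flag counts against a file depends on the allow options passed in the request.

```python
# Top-level boolean flags from the response model.
CONTENT_FLAGS = (
    "ContainsExecutable",
    "ContainsInvalidFile",
    "ContainsScript",
    "ContainsPasswordProtectedFile",
    "ContainsRestrictedFileFormat",
    "ContainsMacros",
    "ContainsXmlExternalEntities",
    "ContainsInsecureDeserialization",
    "ContainsHtml",
    "ContainsUnsafeArchive",
    "ContainsOleEmbeddedObject",
)


def summarize_scan(result):
    """Return (is_clean, reasons) for a dict shaped like the response model."""
    if not result.get("Successful", False):
        detail = result.get("ErrorDetailedDescription") or "no details"
        return False, ["scan failed: " + str(detail)]

    # Collect every flag that came back true, in the order listed above.
    reasons = [flag for flag in CONTENT_FLAGS if result.get(flag)]

    # FoundViruses may be missing or null when nothing was detected.
    for virus in result.get("FoundViruses") or []:
        reasons.append("virus: " + str(virus.get("VirusName", "unknown")))

    return bool(result.get("CleanResult")), reasons
```

A caller could, for instance, move or delete the object when is_clean is False and log the reasons for later review.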

Conclusion

There’s no such thing as “too careful” when it comes to content security. If we allow threat actors to distribute malware through weakly secured object storage containers, we run the risk of damaging our reputation and creating serious setbacks for our business. Integrating an in-storage scanning API into our own security applications is one of the many ways we can quickly improve object storage security.

Interested in learning more about Cloudmersive security APIs? Feel free to contact us directly, and we’ll be happy to answer any questions you have.

Cloudmersive Technical Blog
