How to Split Text Files by Line in Python

Learn how to extract plain text content line-by-line into an efficient object array.

Jul 25, 2024

Each line of a plain-text (txt) file is neatly separated from the next, which makes it an excellent format for storing code, data, and normalized rich text content. These clearly defined boundaries allow us to easily slice and dice content for a variety of purposes, including parsing, script debugging, text mining (sentiment analysis, keyword extraction, etc.), and more.

In this tutorial, we’ll learn how to call an API that splits a text file into a series of strings, exactly one per line of the original txt document. It returns our text content as an array of objects, each containing a line’s text along with the line number it came from. The API supports multiple newline types, so we can split txt files originating from Windows and Unix-like systems in the same way. That makes processing text lines in downstream operations simple and organized.
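
To make the target output concrete, here is a minimal local sketch that produces a result of the same general shape using Python’s built-in str.splitlines() instead of the API. The function name split_into_numbered_lines and the dictionary keys line_number and line_contents are illustrative assumptions, not the API’s documented schema, and numbering from 1 is a choice made for this sketch.

```python
def split_into_numbered_lines(text):
    """Return one dict per line of text, numbered from 1 (sketch only)."""
    # str.splitlines() treats \n, \r\n, and \r (among others) as line
    # boundaries, so Windows and Unix-style files split the same way.
    return [
        {"line_number": number, "line_contents": contents}
        for number, contents in enumerate(text.splitlines(), start=1)
    ]


# Mixed newline styles in one string, to show uniform handling
sample = "first line\r\nsecond line\nthird line"
numbered = split_into_numbered_lines(sample)
for entry in numbered:
    print(entry["line_number"], entry["line_contents"])
```

This only illustrates the idea of pairing each line with its number; the API call in the steps below does the actual work.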

Step 1: Install the SDK

To kick things off, we’ll run the pip install command below to install the SDK:

pip install cloudmersive-convert-api-client

Step 2: Add the Imports

Next up, we’ll add the following imports to the top of our Python file:

from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

Step 3: Configure an API Key

We’ll now copy the below configuration snippet into our file, and we’ll enter an API key within this snippet to authorize our requests:

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'

To get an API key, we just need to visit the Cloudmersive website and create a free account. With a free account, we’ll get a limit of 800 API calls per month (the total resets at the beginning of each month).
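
If we would rather not hardcode the key in our source file, one option is to read it from an environment variable. The sketch below is an assumption-laden example: the helper name load_api_key and the variable name CLOUDMERSIVE_API_KEY are hypothetical and are not defined by the SDK.

```python
import os


def load_api_key(env_var="CLOUDMERSIVE_API_KEY"):
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return key


# Usage with the configuration object from Step 3:
# configuration.api_key['Apikey'] = load_api_key()
```

Failing early here keeps a missing key from surfacing later as an authorization error on the request itself.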

Step 4: Create an Instance of the API Class

To finish up, we’ll instantiate the API class and call the text splitting function with our file path:

# create an instance of the API class
api_instance = cloudmersive_convert_api_client.SplitDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/inputfile' # file | Input file to perform the operation on.

try:
    # Split a single Text file (txt) into lines
    api_response = api_instance.split_document_txt_by_line(input_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling SplitDocumentApi->split_document_txt_by_line: %s\n" % e)

When our operation is complete, we can process each object in our response array separately, and we can use the line number data to efficiently organize our information.
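
As a rough sketch of that downstream processing, the example below indexes non-blank lines by their line number. It assumes the response exposes a result_lines list whose items have line_number and line_contents attributes; those attribute names are an assumption here and should be checked against the printed api_response. A stand-in object is used so the sketch runs without calling the API, and lines_by_number is a hypothetical helper name.

```python
from types import SimpleNamespace


def lines_by_number(response):
    """Map each line number to its text, skipping blank lines."""
    # ASSUMPTION: response.result_lines is a list of objects exposing
    # line_number and line_contents attributes.
    return {
        line.line_number: line.line_contents
        for line in (response.result_lines or [])
        if line.line_contents and line.line_contents.strip()
    }


# Stand-in for api_response so the sketch runs without an API call
fake_response = SimpleNamespace(
    result_lines=[
        SimpleNamespace(line_number=1, line_contents="alpha"),
        SimpleNamespace(line_number=2, line_contents=""),
        SimpleNamespace(line_number=3, line_contents="gamma"),
    ]
)

indexed = lines_by_number(fake_response)
for number in sorted(indexed):
    print("%d: %s" % (number, indexed[number]))
```

Keeping the original line numbers as dictionary keys means a filtered line can still be traced back to its position in the source file.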

That’s all there is to it. Now we can streamline text processing in our Python applications using just a few lines of code!

Cloudmersive Technical Blog
