In this article, we will learn how to automate S3 file processing with Python and AWS Lambda. Managing file uploads and processing data manually can be time-consuming and error-prone, especially at scale. Thankfully, AWS provides a seamless way to automate this workflow using S3 and Lambda. Today we’ll walk through how to set up an event-driven system where any CSV file uploaded to an S3 bucket automatically triggers a Lambda function that processes the data using Python. Whether you’re dealing with logs, reports, or real-time data feeds, this solution helps you build a lightweight, serverless pipeline that runs reliably without the need to manage infrastructure.
Prerequisites
- An active AWS Account.
- IAM Role with Lambda & S3 permissions.
- Basic understanding of Python and AWS Console.
Step #1: Create an AWS S3 Bucket
From the AWS Management Console, navigate to S3 (Simple Storage Service). This bucket will serve as our data landing zone. Click “Create bucket”.

Select Bucket type: General purpose. Provide a unique bucket name (e.g., my-python-aws-lambda-bucket). Remember, S3 bucket names must be globally unique.

You can leave other options as default for this tutorial. Click “Create bucket”.
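If you prefer to script this step instead of clicking through the console, a minimal boto3 sketch is shown below. The bucket name matches the example above; the region is an assumption, and buckets created outside us-east-1 need a CreateBucketConfiguration.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")  # assumed region; change as needed

# Bucket names must be globally unique, so substitute your own
s3.create_bucket(Bucket="my-python-aws-lambda-bucket")

# For any region other than us-east-1, a location constraint is required, e.g.:
# s3.create_bucket(
#     Bucket="my-python-aws-lambda-bucket",
#     CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
# )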

Now, let’s create a sample CSV file that we’ll use for testing. Open any plain text editor (e.g., Notepad, VS Code). Add the following content.
EmployeeID,Name,Department,Salary
101,Alice Smith,Engineering,75000
102,Bob Johnson,Marketing,60000
103,Charlie Brown,HR,55000
104,Diana Miller,Sales,80000

Save the file as Employee_Details.csv
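If you would rather generate this file with Python than a text editor, a quick sketch using the standard csv module (with the same sample rows) looks like this:

import csv

rows = [
    ["EmployeeID", "Name", "Department", "Salary"],
    ["101", "Alice Smith", "Engineering", "75000"],
    ["102", "Bob Johnson", "Marketing", "60000"],
    ["103", "Charlie Brown", "HR", "55000"],
    ["104", "Diana Miller", "Sales", "80000"],
]

# Write the sample data to Employee_Details.csv in the current directory
with open("Employee_Details.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)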
Navigate back to your newly created S3 bucket in the AWS console. Click the “Upload” button.

Click “Add files” and select your Employee_Details.csv file. Click “Upload”.

Below you can see the file is uploaded successfully.
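The same upload can also be done from Python; here is a hedged boto3 sketch that assumes the bucket and file names used above and that your local AWS credentials are configured:

import boto3

s3 = boto3.client("s3")

# Upload the local CSV to the bucket created in Step #1
s3.upload_file(
    Filename="Employee_Details.csv",
    Bucket="my-python-aws-lambda-bucket",
    Key="Employee_Details.csv",
)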

Step #2: Create an IAM Role for Lambda in AWS
Every AWS service needs permissions to interact with other services. Our Lambda function will need permission to read from S3 and to write logs to CloudWatch.
Login to the AWS Management Console. Navigate to IAM (Identity and Access Management).
In the left navigation pane, click on “Roles”.

Then click “Create role”.

For “Select type of trusted entity,” choose “AWS service”. For “Use case,” select “Lambda” and click “Next”.

In the “Add permissions” window, search for and select the following managed policies:
- AWSLambdaBasicExecutionRole: Grants permissions for Lambda to write logs to CloudWatch.
- AmazonS3ReadOnlyAccess: Grants read-only access to S3 buckets.
Click “Next”.


Enter a descriptive role name, e.g., Lambda-S3-Trigger-Role. Optionally, add a description and tags. Click “Create role”.

You can see below that our role Lambda-S3-Trigger-Role was created successfully.
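For reference, the same role can also be created programmatically. The sketch below is not required for this tutorial; it simply mirrors the console steps above using boto3 and assumes your credentials have IAM permissions.

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the Lambda service assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="Lambda-S3-Trigger-Role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Execution role for the S3-triggered Lambda function",
)

# Attach the two managed policies selected in the console
for policy_arn in [
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
]:
    iam.attach_role_policy(RoleName="Lambda-S3-Trigger-Role", PolicyArn=policy_arn)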

Step #3: Create the AWS Lambda Function with S3 Triggers
This is where we set up the serverless function that will execute our Python code.
From the AWS Management Console, navigate to Lambda. Click “Create function”.

Select “Author from scratch”.
Configure the basic information (a boto3 equivalent of this step is sketched after this list):
- Function name: lambda-s3-trigger
- Runtime: Choose your Python version (e.g., Python 3.8, Python 3.9, or Python 3.13 if available).
- Architecture: x86_64 (default)
- Under “Change default execution role,” select “Use an existing role” and choose the IAM role you created, Lambda-S3-Trigger-Role.
- Click “Create function”.
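As mentioned above, this step can also be scripted. The boto3 sketch below assumes your handler code lives in a local file named lambda_function.py and uses a placeholder account ID in the role ARN; adjust both before running.

import io
import zipfile

import boto3

lambda_client = boto3.client("lambda")

# Zip the handler file in memory (assumes lambda_function.py is in the current directory)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.write("lambda_function.py")

lambda_client.create_function(
    FunctionName="lambda-s3-trigger",
    Runtime="python3.13",  # match the runtime you chose in the console
    Role="arn:aws:iam::123456789012:role/Lambda-S3-Trigger-Role",  # placeholder account ID
    Handler="lambda_function.lambda_handler",
    Code={"ZipFile": buf.getvalue()},
)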

Now, let’s configure the trigger so that our Lambda function runs automatically when a file is uploaded to S3.
Once the Lambda function is created, you’ll see its overview page. Click on the “Add trigger” button.

In the “Add trigger” configuration:
- Select S3 from the dropdown list.
- Bucket: Select the S3 bucket you created, my-python-aws-lambda-bucket.
- Event types: Select “PUT” for finer control.
- Suffix: Enter .csv so the function only triggers for CSV files.
- Acknowledge the recursive invocation warning (read it carefully!).
- Click “Add”.

Now our trigger is added to the Lambda function.
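If you ever need to wire up this trigger outside the console, a rough boto3 sketch is below. The function ARN and account ID are placeholders; note that S3 also needs permission to invoke the function, which the console normally adds for you.

import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

# Placeholder ARN: replace the region and account ID with your own
function_arn = "arn:aws:lambda:us-east-1:123456789012:function:lambda-s3-trigger"

# Allow the S3 bucket to invoke the Lambda function
lambda_client.add_permission(
    FunctionName="lambda-s3-trigger",
    StatementId="s3-invoke-permission",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::my-python-aws-lambda-bucket",
)

# Register the notification: PUT events, .csv suffix only
s3.put_bucket_notification_configuration(
    Bucket="my-python-aws-lambda-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": function_arn,
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}},
        }]
    },
)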

Step #4: Update the Lambda Code with Python Script
This is the Python code that will read and process your CSV data.
Open your lambda-s3-trigger function in the Lambda console. Navigate to the “Code” tab.

Replace the default code with the following Python script. Important: Click the “Deploy” button to save and deploy your changes.
import json
import boto3
from io import StringIO

import pandas as pd

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        # Extract bucket name and file name from the S3 event
        s3_Bucket_Name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_File_Name = event["Records"][0]["s3"]["object"]["key"]
        print(f"Processing file: {s3_File_Name} from bucket: {s3_Bucket_Name}")

        # Get the object from S3
        s3_object = s3_client.get_object(Bucket=s3_Bucket_Name, Key=s3_File_Name)
        body = s3_object['Body']

        # Read the CSV content into a Pandas DataFrame
        csv_string = body.read().decode('utf-8')
        dataframe = pd.read_csv(StringIO(csv_string))

        # Print the first 3 rows of the DataFrame to CloudWatch Logs
        print("Successfully read CSV data. Head (first 3 rows):")
        print(dataframe.head(3))
    except Exception as err:
        print(f"An error occurred: {err}")
        # You might want to log the full traceback for debugging in a real scenario
        # import traceback
        # traceback.print_exc()

    return {
        'statusCode': 200,
        'body': json.dumps('Lambda function executed successfully!')
    }
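One quick aside before moving on: real S3 events URL-encode the object key (a file name with spaces arrives with + characters), so production handlers often pass the key through urllib.parse.unquote_plus. As a minimal local sanity check, you can also call the handler directly with a hand-built event that mimics the fields it reads; the sketch below assumes it runs in the same module as lambda_handler and that your local credentials can read the object.

from urllib.parse import unquote_plus

# Minimal event mimicking the structure the handler parses
test_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "my-python-aws-lambda-bucket"},
            # unquote_plus is a no-op here, but matters for keys with spaces
            "object": {"key": unquote_plus("Employee_Details.csv")},
        }
    }]
}

# The context argument is unused by this handler, so None is fine locally
print(lambda_handler(test_event, None))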

Step #5: Test the Lambda Function
Before we enable full functionality, let’s manually test our code.
- In the Lambda console, click the “Test” button.
- Choose “Create new event”.
- Event name: Enter s3-python-test-event.
- Event sharing settings: Select Private.
- Event template: Choose s3-put.
- In the test event JSON, replace the placeholder bucket name (example-bucket) and object key (test/key) with your S3 bucket name my-python-aws-lambda-bucket and the name of the CSV file you uploaded, Employee_Details.csv.
- Click “Test”.

You will observe an error in the “Execution Results” and “Function Logs” similar to ModuleNotFoundError: No module named 'pandas'. This is expected because the Pandas library is not natively available in the default Lambda runtime environment.

To resolve the ModuleNotFoundError, we need to attach a Lambda Layer containing the Pandas library.
On your Lambda function’s overview page, scroll down to the “Layers” section. Click “Add a layer”.

In the “Add layer” window:
- Select “AWS layers”.
- Search for a layer containing “Pandas” that matches your Python runtime, e.g., AWSSDKPandas-Python313.
- Select the latest “Version” of the chosen Pandas layer.
- Click “Add”.

As you can see, our layer is added.
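Attaching the layer can also be scripted; the ARN below is a placeholder, since the exact ARN of the AWS-managed pandas layer depends on your region and Python runtime (the console shows the real one when you select the layer).

import boto3

lambda_client = boto3.client("lambda")

# Placeholder: copy the real layer ARN from the console's "Add layer" screen
pandas_layer_arn = "arn:aws:lambda:<region>:<account-id>:layer:AWSSDKPandas-Python313:<version>"

# Note: Layers replaces the function's entire layer list
lambda_client.update_function_configuration(
    FunctionName="lambda-s3-trigger",
    Layers=[pandas_layer_arn],
)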

Now that the Pandas layer is attached, let’s run the test again.
- Go back to your Lambda function’s “Code” tab.
- Click the “Test” button.
- Run the s3-python-test-event again.
The “Status” in the “Execution Results” should change to “Succeeded”. In the “Function Logs” (scroll down in the execution results), you should now see the output of print(dataframe.head(3)), which will display the first 3 rows of the data from your Employee_Details.csv file, confirming that Pandas was successfully imported and used to read the CSV.

Step #6: Trigger Lambda with a Real S3 Upload
With our Lambda function now fully operational, let’s see it in action by uploading a new file to S3.
Create another CSV file on your local machine, perhaps named Employee_Details_1.csv, with slightly different data (or entirely new data).
ID,Name,City
201,John Doe,New York
202,Jane Doe,Los Angeles
203,Peter Pan,London

Navigate back to your S3 bucket in the AWS console. Click “Upload”. Select your new Employee_Details_1.csv file and upload it.

Now our file Employee_Details_1.csv is uploaded successfully.

Step #7: Verify Execution in CloudWatch Logs
The Lambda function should have executed automatically after the upload. To see its output, go to your Lambda function’s overview page, click the “Monitor” tab, and then click the “View CloudWatch logs” button.

This will take you to the CloudWatch Logs console. Click on the most recent “Log stream”.

Inside the log stream, you will find the print statements from your Lambda function, including the head(3) output of the data from Employee_Details_1.csv.
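If you prefer to fetch these logs from Python instead of the console, a small boto3 sketch is shown below; it assumes the default log group name that Lambda creates for the function (/aws/lambda/lambda-s3-trigger).

import boto3

logs = boto3.client("logs")
log_group = "/aws/lambda/lambda-s3-trigger"  # default log group for the function

# Find the most recent log stream for the function
latest_stream = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=1,
)["logStreams"][0]["logStreamName"]

# Print its events, which include the handler's print() output
for event in logs.get_log_events(
    logGroupName=log_group, logStreamName=latest_stream
)["events"]:
    print(event["message"], end="")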

Conclusion:
By combining AWS S3 and Lambda, you’ve created a powerful serverless solution that automatically reacts to file uploads and processes data on the fly. This setup not only improves efficiency but also minimizes the operational overhead. Using Python and Pandas adds flexibility, allowing you to manipulate and analyze your data as needed. You can extend this system to perform complex ETL tasks, push data to databases, or trigger downstream services.
Related Articles:
Secure SFTP Setup Using AWS Transfer Family