How to Automate Excel and CSV Reports with Python

Why automate Excel and CSV reports?

Many business reports start with raw CSV or Excel files. The manual workflow usually includes opening a spreadsheet, cleaning rows, fixing column names, calculating totals, creating summaries, and saving a final report.

Python can turn this repeated workflow into a reusable script. This saves time, reduces manual mistakes, and makes the reporting process easier to repeat every week or month.

What this workflow does

A practical Excel and CSV automation workflow usually includes these steps:

Step 01

Read Files

Load data from CSV, XLSX, or XLS files into a Pandas DataFrame.

Step 02

Validate Data

Check required columns and stop the script if important data is missing.

Step 03

Clean Rows

Normalize dates, text fields, quantities, prices, and invalid rows.

Step 04

Export Report

Create an Excel file with cleaned data and useful summary sheets.

Required input format

For this example, the input file should include these columns:

date
product
category
quantity
unit_price

sample sales_sample.csv

date,product,category,quantity,unit_price
2026-01-05,USB Cable,Accessories,5,8.5
2026-01-07,Wireless Mouse,Accessories,2,24.9
2026-02-10,External SSD,Storage,2,89.9
2026-03-01,Monitor,Hardware,1,210

Python example

The following example shows the core idea: read data, calculate revenue, create summaries, and export the result as an Excel file.

generate_report.py

from pathlib import Path
import pandas as pd


def generate_report(input_file: str, output_file: str) -> None:
    input_path = Path(input_file)
    output_path = Path(output_file)

    if not input_path.exists():
        raise FileNotFoundError(f"Input file not found: {input_path}")

    df = pd.read_csv(input_path)

    required_columns = {"date", "product", "category", "quantity", "unit_price"}
    missing_columns = required_columns - set(df.columns)

    if missing_columns:
        raise ValueError(f"Missing required columns: {', '.join(missing_columns)}")

    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce").fillna(0).astype(int)
    df["unit_price"] = pd.to_numeric(df["unit_price"], errors="coerce").fillna(0)

    df = df.dropna(subset=["date"])
    df = df[df["quantity"] > 0]

    df["total_revenue"] = df["quantity"] * df["unit_price"]
    df["month"] = df["date"].dt.to_period("M").astype(str)

    monthly_summary = (
        df.groupby("month", as_index=False)
        .agg(total_revenue=("total_revenue", "sum"))
        .sort_values("total_revenue", ascending=False)
    )

    category_summary = (
        df.groupby("category", as_index=False)
        .agg(total_revenue=("total_revenue", "sum"))
        .sort_values("total_revenue", ascending=False)
    )

    with pd.ExcelWriter(output_path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Cleaned Data", index=False)
        monthly_summary.to_excel(writer, sheet_name="Monthly Summary", index=False)
        category_summary.to_excel(writer, sheet_name="Category Summary", index=False)

    print(f"Report saved to: {output_path}")


if __name__ == "__main__":
    generate_report(
        input_file="sample_data/sales_sample.csv",
        output_file="output/sales_report.xlsx",
    )

What the Excel report contains

A useful automated report should not only copy raw data into a new file. It should structure the output in a way that helps the user understand the data faster.

Cleaned Data: normalized rows with calculated revenue.
Monthly Summary: revenue grouped by month.
Category Summary: revenue grouped by category.
Product Summary: revenue grouped by product.
Report Info: metadata such as source file, total rows, and date range.

Where this is useful

This kind of automation can be used in many small business workflows:

Monthly sales reports
Inventory summaries
Client order reports
Finance exports
Marketing campaign reports
CSV cleanup workflows

The best automation projects are usually small but repetitive. If a report is created often, it is a good candidate for Python automation.

Portfolio project

I created a GitHub repository for this exact workflow. The project includes sample data, a reusable report generator, a demo script, documentation, and basic tests.

Conclusion

Python is a strong tool for automating Excel and CSV reporting tasks. A well-structured script can read raw files, clean data, create summaries, and export a useful report without repeating the same manual spreadsheet work.