Invoice Processing
Automate invoice processing by extracting structured data from PDF documents.
At this time, structured data extraction only works on images, so this guide assumes your PDF invoice has a single page. If your invoice has multiple pages, you must run the extract-text operation on each page separately.
Automating invoice processing can significantly reduce manual data entry, prevent errors, and speed up your accounts payable workflow. With the Scan Documents API, you can extract structured data from your invoices, making it easy to integrate with your accounting system.
This guide demonstrates how to build an automated invoice processing solution using the Scan Documents API.
See in Postman
This guide's API calls are available as a Postman collection. You can use it to quickly test the API and see how it works.
Business Problem
Your company receives hundreds of invoices in PDF format every month. Manually extracting information like the invoice number, due date, line items, and total amount is a tedious and error-prone process. You need a way to automate this data extraction to improve efficiency and accuracy.
Solution
The Scan Documents API can help you solve this problem by extracting structured data from your PDF invoices. Here's how:
Step 1: Upload the Invoice PDF
First, you'll upload the invoice PDF to the API.
Upload a File
Creates a new file
curl -X POST "https://api.scan-documents.com/v1/files" \
-H "x-api-key: YOUR_API_KEY" \
-F name="Invoice" \
-F file="@/path/to/your/invoice.pdf"
The API will respond with a file object for the uploaded PDF. Take note of the file ID in the response, as you'll need it for the next step.
{
"id": "file_hvx41hshvvy1shop",
"name": "Invoice",
"type": "application/pdf",
"properties": {
"size": 29537,
"page_count": 1
},
"task_id": null,
"created_at": "2025-08-23T15:46:37.000Z"
}
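Because this guide assumes a single-page PDF, it can help to check the file object before moving on. The following is a minimal Python sketch that relies only on the fields shown in the response above; the function name `require_single_page_pdf` is hypothetical and not part of the API.

```python
def require_single_page_pdf(file_obj: dict) -> str:
    """Return the file ID if the upload is a one-page PDF, else raise ValueError."""
    if file_obj.get("type") != "application/pdf":
        raise ValueError(f"expected a PDF, got {file_obj.get('type')!r}")
    pages = file_obj.get("properties", {}).get("page_count")
    if pages != 1:
        raise ValueError(f"this guide assumes one page, got page_count={pages!r}")
    return file_obj["id"]


# The upload response from this step, written as a Python dict.
upload_response = {
    "id": "file_hvx41hshvvy1shop",
    "name": "Invoice",
    "type": "application/pdf",
    "properties": {"size": 29537, "page_count": 1},
    "task_id": None,
    "created_at": "2025-08-23T15:46:37.000Z",
}

print(require_single_page_pdf(upload_response))  # prints file_hvx41hshvvy1shop
```

If the check fails for a multi-page invoice, each rendered page needs its own extract-text call, as noted at the top of this guide.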
Download Sample Invoice PDF
You can use this sample invoice PDF to follow along with the guide.
Step 2: Convert PDF to Image
Next, convert the PDF to an image format that the extract-text operation can process. We'll convert it to a high-resolution PNG image.
Render PDF to Image
Creates a task to render a PDF file into one or more image files.
curl -X POST "https://api.scan-documents.com/v1/pdf-operations/render" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "file_hvx41hshvvy1shop"
}'
The API will respond with a task object. Once the task is completed, its result will contain the ID of the rendered image file.
{
"id": "task_xmcg7g55pu45myau",
"operation": "render",
"status": "completed",
"parameters": {
"input": "file_hvx41hshvvy1shop",
"dpi": 300
},
"result": {
"generated_files": [
{
"id": "file_d2l001p7fw1yhcb7",
"name": "Receipt - 1",
"type": "image/png",
"properties": {
"size": 588331,
"width": 2380,
"height": 3368
},
"task_id": "task_xmcg7g55pu45myau",
"created_at": "2025-08-23T15:48:20.000Z"
}
]
},
"callback_url": null,
"created_at": "2025-08-23T15:48:13.000Z",
"updated_at": "2025-08-23T15:48:21.000Z"
}
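Since the rendered file only becomes available once the task is completed, a client usually waits for that status and then reads the generated file IDs. The sketch below shows one way to do this in Python. It makes several assumptions: `fetch_task` is a hypothetical callable that you supply to retrieve the current task object (this guide does not show how a task is retrieved), and the `"pending"` and `"failed"` status names are guesses. Only `"completed"` appears in the response above.

```python
import time
from typing import Callable


def wait_for_task(fetch_task: Callable[[str], dict], task_id: str,
                  interval: float = 1.0, max_attempts: int = 30) -> dict:
    """Call fetch_task repeatedly until the task reports status 'completed'."""
    for _ in range(max_attempts):
        task = fetch_task(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":  # assumed status name, not shown in this guide
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} not completed after {max_attempts} attempts")


def generated_file_ids(task: dict) -> list[str]:
    """Return the ID of every file listed under result.generated_files."""
    return [f["id"] for f in task["result"]["generated_files"]]


# Offline demonstration with a fake fetcher instead of a real HTTP call.
_responses = iter([
    {"id": "task_xmcg7g55pu45myau", "status": "pending"},  # assumed in-progress status
    {
        "id": "task_xmcg7g55pu45myau",
        "status": "completed",
        "result": {"generated_files": [{"id": "file_d2l001p7fw1yhcb7"}]},
    },
])
task = wait_for_task(lambda _id: next(_responses), "task_xmcg7g55pu45myau", interval=0)
print(generated_file_ids(task))  # prints ['file_d2l001p7fw1yhcb7']
```

Returning a list rather than a single ID keeps the helper usable for multi-page PDFs, where each page would need its own extract-text call.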
Step 3: Extract Structured Data
Now, we'll use the extract-text operation with a JSON schema to extract the invoice number, due date, line items, and total amount.
Here's the JSON schema we'll use:
{
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "The invoice number."
},
"due_date": {
"type": "string",
"description": "The due date of the invoice in YYYY-MM-DD format."
},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": "The description of the line item."
},
"quantity": {
"type": "number",
"description": "The quantity of the line item."
},
"unit_price": {
"type": "number",
"description": "The unit price of the line item."
},
"amount": {
"type": "number",
"description": "The total amount for the line item."
}
},
"required": ["description", "quantity", "unit_price", "amount"]
}
},
"total_amount": {
"type": "number",
"description": "The total amount of the invoice."
}
},
"required": ["invoice_number", "due_date", "line_items", "total_amount"]
}
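When adapting a schema like the one above to your own invoices, it is easy to list a field under "required" that is missing from "properties". The following Python sketch checks for that mistake locally before any API call is made. The helper name `missing_required` is hypothetical, and it only handles the "object" and "array" constructs used in this guide.

```python
def missing_required(schema: dict, path: str = "$") -> list[str]:
    """List required names that have no matching entry under "properties"."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}.{name}")
        # Recurse into nested objects and arrays.
        for name, sub in props.items():
            problems.extend(missing_required(sub, f"{path}.{name}"))
    elif schema.get("type") == "array":
        problems.extend(missing_required(schema.get("items", {}), f"{path}[]"))
    return problems


# A shortened version of the invoice schema (descriptions omitted for brevity).
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "due_date": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "quantity", "unit_price", "amount"],
            },
        },
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "due_date", "line_items", "total_amount"],
}

print(missing_required(invoice_schema))  # prints []
```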
Now, let's make the API call:
Extract Text
Creates a task to extract text from a specified image file.
curl -X POST "https://api.scan-documents.com/v1/image-operations/extract-text" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "file_d2l001p7fw1yhcb7",
"format": "json",
"schema": {
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "The invoice number."
},
"due_date": {
"type": "string",
"description": "The due date of the invoice in YYYY-MM-DD format."
},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": "The description of the line item."
},
"quantity": {
"type": "number",
"description": "The quantity of the line item."
},
"unit_price": {
"type": "number",
"description": "The unit price of the line item."
},
"amount": {
"type": "number",
"description": "The total amount for the line item."
}
},
"required": ["description", "quantity", "unit_price", "amount"]
}
},
"total_amount": {
"type": "number",
"description": "The total amount of the invoice."
}
},
"required": ["invoice_number", "due_date", "line_items", "total_amount"]
}
}'
The completed task's result contains the extracted data as a JSON-encoded string in its content field:
{
"id": "task_dqnr9ijylyl8ynf3",
"operation": "extract-text",
"status": "completed",
"parameters": {
"input": "file_d2l001p7fw1yhcb7",
"format": "json",
"schema": {
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "The invoice number."
},
"due_date": {
"type": "string",
"description": "The due date of the invoice in YYYY-MM-DD format."
},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": "The description of the line item."
},
"quantity": {
"type": "number",
"description": "The quantity of the line item."
},
"unit_price": {
"type": "number",
"description": "The unit price of the line item."
},
"amount": {
"type": "number",
"description": "The total amount for the line item."
}
},
"required": [
"description",
"quantity",
"unit_price",
"amount"
]
}
},
"total_amount": {
"type": "number",
"description": "The total amount of the invoice."
}
},
"required": [
"invoice_number",
"due_date",
"line_items",
"total_amount"
]
}
},
"result": {
"format": "json",
"content": "{\n \"invoice_number\": \"26B34523-DRAFT\",\n \"due_date\": \"2022-02-05\",\n \"line_items\": [\n {\n \"description\": \"Shoes\",\n \"quantity\": 1,\n \"unit_price\": 48.99,\n \"amount\": 48.99\n }\n ],\n \"total_amount\": 48.99\n}"
},
"callback_url": null,
"created_at": "2025-08-23T15:54:36.000Z",
"updated_at": "2025-08-23T15:54:46.000Z"
}
The parsed content will look like this:
{
"invoice_number": "26B34523-DRAFT",
"due_date": "2022-02-05",
"line_items": [
{
"description": "Shoes",
"quantity": 1,
"unit_price": 48.99,
"amount": 48.99
}
],
"total_amount": 48.99
}
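Note that `result.content` in the task object is a JSON-encoded string, so it has to be decoded once more before the fields can be used. Here is a minimal Python sketch of that step, using only the standard library; the function name `parse_extracted_content` is hypothetical.

```python
import json


def parse_extracted_content(task: dict) -> dict:
    """Decode the JSON string stored in result.content of an extract-text task."""
    result = task["result"]
    if result.get("format") != "json":
        raise ValueError(f"expected format 'json', got {result.get('format')!r}")
    return json.loads(result["content"])


# Trimmed copy of the task object shown above.
task = {
    "status": "completed",
    "result": {
        "format": "json",
        "content": '{\n "invoice_number": "26B34523-DRAFT",\n "due_date": "2022-02-05",\n "line_items": [\n {\n "description": "Shoes",\n "quantity": 1,\n "unit_price": 48.99,\n "amount": 48.99\n }\n ],\n "total_amount": 48.99\n}',
    },
}

invoice = parse_extracted_content(task)
print(invoice["invoice_number"])  # prints 26B34523-DRAFT
print(invoice["total_amount"])    # prints 48.99
```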
With the structured data extracted, you can now easily integrate it into your accounting system, ERP, or any other application, streamlining your entire accounts payable process.
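Before handing the data to an accounting system, a few local sanity checks can catch extraction mistakes. The Python sketch below is one possible approach, not part of the API. It assumes the invoice total equals the sum of the line items, which will not hold for invoices with tax, shipping, or discounts, and the function name `check_invoice` and the one-cent tolerance are illustrative choices.

```python
from datetime import date
from decimal import Decimal


def check_invoice(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # The schema asks for YYYY-MM-DD, so the value should parse as an ISO date.
    try:
        date.fromisoformat(invoice["due_date"])
    except ValueError:
        problems.append(f"due_date is not YYYY-MM-DD: {invoice['due_date']!r}")

    # Use Decimal (built from str) to avoid binary floating-point rounding.
    running = Decimal("0")
    for i, item in enumerate(invoice["line_items"]):
        qty = Decimal(str(item["quantity"]))
        unit = Decimal(str(item["unit_price"]))
        amount = Decimal(str(item["amount"]))
        if abs(qty * unit - amount) > tolerance:
            problems.append(f"line {i}: {qty} x {unit} != {amount}")
        running += amount

    # Assumption: no tax, shipping, or discount lines outside line_items.
    total = Decimal(str(invoice["total_amount"]))
    if abs(running - total) > tolerance:
        problems.append(f"line items sum to {running}, total is {total}")
    return problems


invoice = {
    "invoice_number": "26B34523-DRAFT",
    "due_date": "2022-02-05",
    "line_items": [
        {"description": "Shoes", "quantity": 1, "unit_price": 48.99, "amount": 48.99}
    ],
    "total_amount": 48.99,
}

print(check_invoice(invoice))  # prints []
```

Returning a list of problems instead of raising on the first one lets a reviewer see every discrepancy for an invoice at once.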