Task Operations

Learn about the task operations in the API.

Overview

Task Operations represent asynchronous jobs that you can initiate through the Scan Documents API to process or transform a File.

When you request an operation (like extracting text from an image or merging PDF documents), the API creates a Task object to track its progress. You can then query the status of this task using its unique ID.

Example Task Object

Here's what a typical Task object for a completed extract-text operation might look like:

{
  "id": "task_euyrvozb9302uwhq",
  "operation": "extract-text",
  "status": "completed",
  "parameters": {
    "input": "file_abc123xyz",
    "format": "markdown"
  },
  "result": {
    "format": "markdown",
    "content": "**This** is the *extracted* text content"
  },
  "created_at": "2021-05-03T10:00:00Z",
  "updated_at": "2021-05-03T10:05:00Z"
}

Now, let's break down the properties of this Task object.

Properties

Every Task object shares a common structure, regardless of the specific operation being performed:

id
string

A unique identifier for the task (e.g., task_euyrvozb9302uwhq). You use this ID to check the task's status.

operation
string

A string indicating the type of operation requested (e.g., extract-text, convert, merge).

status
string

The current state of the task. See Task Statuses below.

parameters
object

An object containing the specific inputs you provided when creating the task (e.g., the input file ID, target format, quality settings). The structure varies depending on the operation.

result
object

An object containing the outcome of the task. Its structure depends on the status and operation.

  • If status is pending or processing, this object is usually empty.
  • If status is completed, this object contains the successful output (e.g., extracted text content, list of generated file IDs).
  • If status is failed, this object contains the error information (an error string and, if available, a details object).

created_at
string

The date and time when the task was created, in ISO 8601 format (e.g., 2021-05-03T10:00:00Z).

updated_at
string

The date and time when the task's status was last updated, in ISO 8601 format (e.g., 2021-05-03T10:05:00Z).
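
As an illustration of the common structure described above, the following is a minimal Python sketch that parses a Task JSON document into a typed object. The Task class and the parse_task and parse_timestamp helpers are hypothetical names introduced here, not part of any official client library.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass
class Task:
    """Hypothetical container mirroring the documented Task properties."""
    id: str
    operation: str
    status: str
    parameters: dict[str, Any]
    result: dict[str, Any]
    created_at: datetime
    updated_at: datetime


def parse_timestamp(value: str) -> datetime:
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so convert it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_task(raw: str) -> Task:
    data = json.loads(raw)
    return Task(
        id=data["id"],
        operation=data["operation"],
        status=data["status"],
        # parameters and result vary by operation, so keep them as plain dicts.
        parameters=data.get("parameters") or {},
        result=data.get("result") or {},
        created_at=parse_timestamp(data["created_at"]),
        updated_at=parse_timestamp(data["updated_at"]),
    )
```

Keeping parameters and result as plain dictionaries reflects the fact that their structure depends on the operation and status.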

Task Statuses

A task can be in one of the following states:

  • pending: The task has been accepted but has not yet started processing.
  • processing: The task is currently being executed.
  • completed: The task finished successfully. The result object contains the output.
  • failed: The task could not be completed due to an error. The result object contains details about the failure.
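
The four states above can be modeled in client code so that a caller knows when to stop waiting. This is a small Python sketch; TaskStatus and is_terminal are hypothetical names, and treating completed and failed as the only final states is inferred from the descriptions above.

```python
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: a task never leaves the completed or failed state.
TERMINAL_STATUSES = frozenset({TaskStatus.COMPLETED, TaskStatus.FAILED})


def is_terminal(status: str) -> bool:
    """Return True once the task has finished, successfully or not.

    Raises ValueError for a status string that is not one of the four
    documented values.
    """
    return TaskStatus(status) in TERMINAL_STATUSES
```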

Available Operations

Tasks are initiated by making POST requests to specific endpoints under /v1/image-operations/ or /v1/pdf-operations/.
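
To make the request shape concrete, here is a hedged Python sketch that builds (but does not send) such a POST request using only the standard library. Several details are assumptions not confirmed by this page: the BASE_URL placeholder, the Bearer authorization header, sending the parameters as a JSON body, and the exact endpoint path passed by the caller. Consult the endpoint reference for the real values.

```python
import json
import urllib.request

# Placeholder host: an assumption, replace with the real API base URL.
BASE_URL = "https://api.example.com"

ALLOWED_PREFIXES = ("/v1/image-operations/", "/v1/pdf-operations/")


def build_task_request(
    endpoint_path: str, parameters: dict, api_key: str
) -> urllib.request.Request:
    """Build a POST request that would initiate a task (hypothetical helper)."""
    if not endpoint_path.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"Unexpected operation endpoint: {endpoint_path}")
    # Assumption: the operation parameters are sent as a JSON request body.
    body = json.dumps(parameters).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint_path,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API uses Bearer token authentication.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send the request. A call such as build_task_request("/v1/image-operations/extract-text", {"input": "file_abc123xyz", "format": "markdown"}, "YOUR_API_KEY") uses an assumed endpoint path for illustration only.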

Image Operations

These operations work on image files (image/png, image/jpeg, image/webp).

PDF Operations

These operations work on PDF files (application/pdf).

Error Handling

If a task encounters an issue, its status will change to failed. The result object will then contain:

error
string

A string describing the error.

details
object

An object containing additional context or specifics about the error, if available.

Common reasons for failure include providing an invalid file ID, using incorrect parameters (e.g., invalid page range, unsupported format), or internal processing errors.

Example

Here is an example of a failed task object:

{
  "id": "task_euyrvozb9302uwhq",
  "operation": "extract-text",
  "status": "failed",
  "parameters": {
    "input": "file_abc123xyz",
    "format": "markdown"
  },
  "result": {
    "error": "Source file not found.",
    "details": {
        "file_id": "file_abc123xyz",
        "reason": "The file might have been deleted."
    }
  },
  "created_at": "2021-05-03T10:00:00Z",
  "updated_at": "2021-05-03T10:05:00Z"
}
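
A failed task like the one above can be turned into an exception in client code. The following Python sketch assumes the task has already been decoded into a dictionary; TaskFailedError and unwrap_result are hypothetical names introduced for illustration.

```python
from typing import Any


class TaskFailedError(Exception):
    """Raised when a task's status is failed (hypothetical helper type)."""

    def __init__(self, task_id: str, message: str, details: dict[str, Any]):
        super().__init__(f"Task {task_id} failed: {message}")
        self.task_id = task_id
        self.message = message
        self.details = details


def unwrap_result(task: dict[str, Any]) -> dict[str, Any]:
    """Return the result of a completed task, or raise if it failed."""
    status = task["status"]
    result = task.get("result") or {}
    if status == "failed":
        # details is only present "if available", so default to an empty dict.
        raise TaskFailedError(
            task["id"],
            result.get("error", "unknown error"),
            result.get("details") or {},
        )
    if status != "completed":
        raise ValueError(f"Task {task['id']} is still {status}")
    return result
```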

Waiting for Task Completion

Operations are asynchronous, meaning they may take time to complete.

You can check the status of a task by making a GET request to the task's endpoint. Alternatively, you can listen for the task.completed webhook event, which is triggered when the task completes.
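
The polling approach can be sketched in Python as follows. Because this page does not give the exact task endpoint, the sketch takes a caller-supplied fetch_task function that performs the GET request and returns the decoded Task as a dictionary. The names wait_for_task and fetch_task, the default interval, and the default timeout are all assumptions chosen for illustration.

```python
import time
from typing import Any, Callable


def wait_for_task(
    fetch_task: Callable[[str], dict[str, Any]],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the task is completed or failed, then return it.

    Raises TimeoutError if the task is still running when the next poll
    would fall after the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        if task["status"] in ("completed", "failed"):
            return task
        # Give up before sleeping if the next poll would exceed the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

The sleep parameter exists only so the loop can be exercised without real delays. For long-running operations, the webhook route avoids repeated requests entirely.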
