Full-text recognition

The /fulltext and /fulltext_by_lines methods return all the text found in a document. Unlike the /recognize method, they do not search for specific fields, do not use dictionaries or masks, and cannot send text for manual re-checking.

API specification

Below is the API specification for the two full-text recognition methods. For details on how to compose a query, see Connecting and testing.

fulltext

POST https://latest.handl.ai/fulltext

This method requires access to the cloud version of Handl to work correctly. The text is returned word by word, and each word is accompanied by a confidence level.

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| priority | integer | Task priority; defaults to 1. |
| async | boolean | true: the request runs in asynchronous mode (see "Asynchronous mode" in the "Connecting and testing" section). false: the request runs in synchronous mode. |
| doc2pdf | boolean | true: returns a PDF file with the recognition results embedded in the text layer. false: standard mode. |

Request Body

| Name | Type | Description |
| --- | --- | --- |
| image | string | File to be recognized |
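
The query parameters above can be combined into a request URL. The following is a minimal sketch, not an official client: the helper name `build_fulltext_url` is hypothetical, and serializing booleans as lowercase `true`/`false` is an assumption based on how the values are written in the table. Authentication and the way the `image` file is attached are described in Connecting and testing and are not covered here.

```python
from urllib.parse import urlencode

BASE_URL = "https://latest.handl.ai"


def build_fulltext_url(priority=1, async_mode=False, doc2pdf=False):
    """Build the /fulltext URL with its documented query parameters.

    Assumption: boolean values are sent as lowercase "true"/"false".
    """
    params = {
        "priority": priority,
        "async": str(async_mode).lower(),
        "doc2pdf": str(doc2pdf).lower(),
    }
    return f"{BASE_URL}/fulltext?{urlencode(params)}"


if __name__ == "__main__":
    # Synchronous request with default priority and no PDF output.
    print(build_fulltext_url())
    # Asynchronous request with a different priority value.
    print(build_fulltext_url(priority=3, async_mode=True))
```

The resulting URL is the target of the POST request; the file to be recognized goes in the request body as the `image` field.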

{
  "detail": [], // technical information
  "items": [
    {
      "words": [
        {
          "text": "text", // a word from the input file
          "confidence": 0.8697810769 // confidence that the word was recognized correctly
        },
        {
          "text": "example", // a word from the input file
          "confidence": 0.8697810769 // confidence that the word was recognized correctly
        }
      ]
    }
  ],
  "task_id": null, // internal id of the task
  "code": null, // error code
  "message": null, // error description
  "errno": null, // error number
  "traceback": null, // error traceback
  "fake": null,
  "pages_count": null,
  "docs_count": null
}
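
Because every word carries a confidence level, a client can flag uncertain words for review. The sketch below assumes the response has already been parsed into a Python dictionary with the shape shown above; the function name `low_confidence_words` and the 0.9 threshold are illustrative choices, not part of the API.

```python
def low_confidence_words(response, threshold=0.9):
    """Return the text of every word whose confidence is below the threshold.

    `response` is the parsed /fulltext response (a dict with an "items" list,
    where each item holds a "words" list of {"text", "confidence"} objects).
    """
    flagged = []
    for item in response.get("items") or []:
        for word in item.get("words") or []:
            if word["confidence"] < threshold:
                flagged.append(word["text"])
    return flagged


if __name__ == "__main__":
    sample = {
        "items": [
            {
                "words": [
                    {"text": "text", "confidence": 0.95},
                    {"text": "example", "confidence": 0.52},
                ]
            }
        ]
    }
    print(low_confidence_words(sample))
```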

fulltext_by_lines

POST https://latest.handl.ai/fulltext_by_lines

This method can work in a closed internal IT system. The text is returned line by line, and each line is accompanied by a confidence level.

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| priority | integer | Task priority; defaults to 1. |
| async | boolean | true: the request runs in asynchronous mode (see "Asynchronous mode" in the "Connecting and testing" section). false: the request runs in synchronous mode. |
| language | boolean | true: the response contains a PDF file with the recognition results embedded in the text layer. false: standard mode. |

Request Body

| Name | Type | Description |
| --- | --- | --- |
| image | string | File to be recognized |

{
  "detail": [], // technical information
  "items": [
    {
      "words": [
        {
          "text": "text", // a line of text from the input file
          "confidence": 0.8697810769 // confidence that the line was recognized correctly
        },
        {
          "text": "example", // a line of text from the input file
          "confidence": 0.8697810769 // confidence that the line was recognized correctly
        }
      ]
    }
  ],
  "task_id": null, // internal id of the task
  "code": null, // error code
  "message": null, // error description
  "errno": null, // error number
  "traceback": null, // error traceback
  "fake": null, // not used in this method
  "pages_count": null, // not used in this method
  "docs_count": null // not used in this method
}
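
Since /fulltext_by_lines returns one entry per line, the plain text of the document can be rebuilt by joining the entries. This is a minimal sketch assuming the response has been parsed into a dictionary with the shape shown above; the function name `lines_to_text` is hypothetical, and joining with a newline is one possible choice.

```python
def lines_to_text(response):
    """Join the recognized lines of a /fulltext_by_lines response into one string.

    Each entry in an item's "words" list holds one line of text in this method.
    """
    lines = []
    for item in response.get("items") or []:
        for line in item.get("words") or []:
            lines.append(line["text"])
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "items": [
            {
                "words": [
                    {"text": "first line", "confidence": 0.91},
                    {"text": "second line", "confidence": 0.87},
                ]
            }
        ]
    }
    print(lines_to_text(sample))
```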
