# Document recognition

![](/files/-Mfn9PygBFG9L2UeCmip)

### Algorithm of the API /recognize method

1. The algorithm looks for rectangular shapes in the incoming image that look like documents and cuts them out.
2. The classifier assigns a class to each cut-out area: Passport, Driver’s License, and so on.
3. The algorithm evaluates the orientation of the document in space. If necessary, the classifier rotates or mirrors the document.
4. The algorithm finds and cuts out information fields from the document. For a passport, for example, these could be the First Name, the Last Name, Passport Series, MRZ, and so on.
5. The OCR algorithm reads and extracts information from the cut-out information fields.
6. The OCR algorithm assigns a "confidence level" to the information extracted from each field.
7. If manual OCR mode is enabled, the HITL feature processes the "cut out field + extracted text" pair.
8. The extracted text is verified with masks and dictionaries.

### **Confidence level**

The `confidence` parameter in the response shows how confident the algorithm is that the characters were recognized correctly:

* 0.90-1.00 - absolutely sure;
* 0.70-0.89 - quite sure;
* 0.50-0.69 - possible mistake in the answer;
* 0.01-0.49 - an error in the field is very likely;
* 0 - there is definitely an error in the field.

The algorithm returns an empty value with zero confidence for a field if the digitized text does not pass the mask and dictionary checks. For example, the date of birth "56.12.1988" will not be returned as an answer.
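
The ranges above can be applied on the client side with a small helper. The sketch below is illustrative only: `confidence_label` and `passes_date_mask` are hypothetical names, values that fall between the listed ranges (for example 0.895) are assigned to the lower bucket by assumption, and the date check merely imitates the kind of mask check described above. The service's actual masks and dictionaries are not documented on this page.

```python
from datetime import datetime


def confidence_label(confidence: float) -> str:
    """Map a confidence value to the interpretation listed above."""
    if confidence >= 0.90:
        return "absolutely sure"
    if confidence >= 0.70:
        return "quite sure"
    if confidence >= 0.50:
        return "possible mistake"
    if confidence > 0:
        return "probable error"
    return "definite error"


def passes_date_mask(text: str) -> bool:
    """Hypothetical mask check: accept only real day.month.year dates."""
    try:
        datetime.strptime(text, "%d.%m.%Y")
    except ValueError:
        return False
    return True


print(confidence_label(0.5262104272842407))  # possible mistake
print(passes_date_mask("56.12.1988"))        # False
print(passes_date_mask("01.04.2004"))        # True
```

A client could, for instance, route every field labeled "possible mistake" or worse to manual review.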

### **Comparing fields with an external file**

This function compares the results of field detection with the text in your file. This is useful when you want to reconcile data from document images with data from other sources. To use the function, pass a JSON file in the optional `verify_fields` parameter.

An example JSON file for comparing the series and number and the full name from an RF passport with the recognition results is shown below:

```javascript
{
  "series_and_number": "1111 222222",
  "surname": "Söze",
  "first_name": "Keyser",
  "other_names": "Roger"
}
```

To compose your JSON file, copy the field names from the API specification.

The reconciliation function returns a "valid" attribute for each field of the document. Available attribute values are:

* **true** - the field text in the JSON file matches the recognition result;
* **false** - the field text in the JSON file does not match the recognition result;
* **null** - the field does not exist in the JSON file.

In addition, the reconciliation function returns the `levenshtein` attribute: the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between the recognition result and the corresponding field in the external JSON file.
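
To make the two attributes concrete, here is a minimal sketch of how `valid` and `levenshtein` could be computed for a single field. It is not the service's implementation: the `reconcile` helper is hypothetical, the comparison is an exact string match (whether the service normalizes case or whitespace first is not stated here), and returning `None` for `levenshtein` when the field is absent is an assumption.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def reconcile(recognized: str, expected: Optional[str]) -> dict:
    """Imitate the `valid` / `levenshtein` attributes for one field."""
    if expected is None:  # field absent from the JSON file (assumed behavior)
        return {"valid": None, "levenshtein": None}
    return {
        "valid": recognized == expected,
        "levenshtein": levenshtein(recognized, expected),
    }


print(reconcile("1111 222223", "1111 222222"))  # {'valid': False, 'levenshtein': 1}
print(reconcile("Keyser", "Keyser"))            # {'valid': True, 'levenshtein': 0}
print(reconcile("Roger", None))                 # {'valid': None, 'levenshtein': None}
```

A small distance on a mismatched field usually points to a single misread character rather than a different document.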

### **API specification**

Below is the API specification for the document recognition method. See [Connecting and testing](https://app.gitbook.com/@dbrain/s/ru/~/drafts/-Me1O9_QQdWJ11DEMXxT/v/dbrain-english-documentation/connection) for more information on how to put together a recognition request.

## recognize

<mark style="color:green;">`POST`</mark> `https://latest.handl.ai/recognize`

#### Query Parameters

| Name                                         | Type    | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| -------------------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| external\_check\_fake                        | boolean | <p><strong>true</strong> - when using checks against external databases (<strong>external\_check\_...</strong>) no actual checking is performed, but a template response is returned (necessary for debugging)<br><strong>false</strong> - the check imitation is disabled</p>                                                                                                                                                                                                                                                                                                          |
| external\_check\_vehicle\_dtp\_and\_restrict | boolean | <p><strong>true -</strong> search available Government Automobile databases for accidents involving the given vehicle and encumbrance by certificate of registration (<strong>vehicle\_registration\_certificate\_front</strong>)<br><strong>false</strong> - the access to the database is disabled for this check</p>                                                                                                                                                                                                                                                                 |
| external\_check\_vehicle\_wanted\_list       | boolean | **true** - search in the external databases if the vehicle is wanted by using the **vehicle\_registration\_certificate\_front** or **pts\_front** values. **false** - the access to the database is disabled for this check                                                                                                                                                                                                                                                                                                                                                             |
| external\_check\_vehicle\_restrict\_list     | boolean | <p><strong>true</strong> - search in the external databases for the vehicle's encumbrance using the <strong>vehicle\_registration\_certificate\_front</strong> or <strong>pts\_front</strong> values. <br><strong>false</strong> - the access to the database for this check is disabled</p>                                                                                                                                                                                                                                                                                            |
| external\_check\_fico                        | boolean | <p><strong>true -</strong> returns FICO financial scoring for an individual by passport (<strong>passport\_main</strong>).<br><strong>false</strong> - the access to the database is disabled for this check</p> |
| external\_check\_pledges\_list               | boolean | <p><strong>true</strong> - searches external databases for outstanding debts of an individual using the given passport information (<strong>passport\_main</strong>) <br><strong>false</strong> - database access is disabled for this check</p>                                                                                                                                                                                                                                                                                                                                        |
| external\_check\_inn                         | boolean | <p><strong>true</strong> - searches external databases to find the INN number using the given passport information (<strong>passport\_main</strong>).<br><strong>false</strong> - reference to external databases by INN number is disabled</p>                                                                                                                                                                                                                                                                                                                                         |
| external\_check\_is\_valid                   | boolean | **true -** checks document authenticity via available government databases. At the moment only passport authentication is available (doc\_type="passport\_main"). **false -** document authentication by databases is disabled.                                                                                                                                                                                                                                                                                                                                                         |
| doc\_type                                    | array   | A list of document types to be recognized in the incoming file. It is used for deterministic processes, for example, if only the main passport spread needs to be processed in the flow, and all other types do not need to be answered.                                                                                                                                                                                                                                                                                                                                                |
| priority                                     | integer | **>0, the default value is 1.** Priority of asynchronous task in the queue for processing                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| simple\_cropper                              | boolean | <p><strong>false (default) -</strong> the simplified algorithm of cutting documents from images is not used.<br><strong>true -</strong> the simplified algorithm of cutting documents from images is used: it is faster but less accurate. Documents in images with complex backgrounds may be cut out less accurately.</p>                                                                                                                                                                                                                                                             |
| async                                        | boolean | <p><strong>true -</strong> asynchronous mode of request processing.<br><strong>false -</strong> synchronous mode of processing requests.</p>                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| check\_fake\_experimental                    | boolean | **Out of date and not used.**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| check\_fake                                  | boolean | <p><strong>true -</strong> the algorithm looks for signs of modification in the file metadata via digital editors, the result is returned in a separate field called “fake”.<br><strong>false -</strong> the metadata checking algorithm is disabled.</p>                                                                                                                                                                                                                                                                                                                               |
| use\_internal\_api                           | boolean | <p><strong>true -</strong> normalizes the "Issued by" field in the passport according to the embedded database. This allows you to get the text without misprints even from low-quality images, but may lead to inconsistency between the document and the result<br><strong>false</strong> — normalization is disabled</p>                                                                                                                                                                                                                                                             |
| use\_external\_api                           | boolean | **Out of date and not used.**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| pdf\_raw\_images                             | boolean | <p><strong>true -</strong> the algorithm leaves the decision of PDF files’ rasterization to the <strong>auto\_pdf\_raw\_images</strong> parameter.<br><strong>false -</strong> all PDF files will be rasterized, the value of the <strong>auto\_pdf\_raw\_images</strong> parameter will be ignored.</p>                                                                                                                                                                                                                                                                                |
| auto\_pdf\_raw\_images                       | boolean | <p><strong>true -</strong> the algorithm looks for a text layer in PDF files. If it is found, the PDF will be rasterized.<br><strong>false -</strong> the algorithm will never rasterize PDF files.</p>                                                                                                                                                                                                                                                                                                                                                                                 |
| dpi                                          | integer | **>0, the default value is 300 -** sets the number of pixels per inch for PDF rasterization. We recommend 300. Higher values usually do not improve the quality but increase the size of the image. |
| quality                                      | integer | **0-100, the default value is 75 -** sets the degree of JPEG compression for PDF rasterization. The recommended value is 75, which balances the size of the image against its quality. |
| gauss                                        | number  | **Out of date and not used.** |
| mode                                         | string  | <p><strong>default -</strong> the whole recognition pipeline is used; <strong>recognize\_only</strong> - disables the cutting and orienting algorithms, the original image goes to the recognition stage<br></p>                                                                                                                                                                                                                                                                                                                                                                        |
| with\_hitl                                   | boolean | <p><strong>true -</strong> sends the document fields to human validators for verification and manual recognition.<br><strong>false</strong> - manual recognition is disabled</p>                                                                                                                                                                                                                                                                                                                                                                                                        |
| hitl\_async                                  | boolean | <p><strong>true -</strong> allows the HITL module to return document field values asynchronously, without waiting for the whole set of fields to be completed. The parameter works only in manual document recognition mode (with\_hitl=true). A response with an incomplete set of fields is returned with code 202; a response with the complete set of fields is returned with code 200.<br><strong>false -</strong> switches off asynchronous HITL mode; the method returns the answer only after all the fields in the document are processed</p> |
| hitl\_required\_fields                       | array   | Allows for identification of required information fields in the document within the array - this tells the HITL module that it can return fields asynchronously only after the required information fields have been processed. Only works with **with\_hitl** and **hitl\_async** enabled.                                                                                                                                                                                                                                                                                             |
| hitl\_sla                                    | string  | **Not used**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
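
The interaction between `pdf_raw_images` and `auto_pdf_raw_images` is easier to follow as a decision function. The sketch below is one reading of the two table rows above, not the service's code. The function name and the `has_text_layer` argument are hypothetical.

```python
def pdf_will_be_rasterized(pdf_raw_images: bool,
                           auto_pdf_raw_images: bool,
                           has_text_layer: bool) -> bool:
    """Return True if, per the table, the PDF would be rasterized."""
    if not pdf_raw_images:
        # pdf_raw_images=false: every PDF is rasterized,
        # auto_pdf_raw_images is ignored.
        return True
    if not auto_pdf_raw_images:
        # auto_pdf_raw_images=false: PDFs are never rasterized.
        return False
    # auto_pdf_raw_images=true: rasterize only when a text layer is found.
    return has_text_layer


print(pdf_will_be_rasterized(False, False, False))  # True
print(pdf_will_be_rasterized(True, True, True))     # True
print(pdf_will_be_rasterized(True, True, False))    # False
print(pdf_will_be_rasterized(True, False, True))    # False
```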

#### Request Body

| Name           | Type   | Description                                                                                       |
| -------------- | ------ | ------------------------------------------------------------------------------------------------- |
| verify\_fields | string | Activates the function of checking the fields against information in an external file (see above) |
| image          | object | File whose contents must be p                                                                     |
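
As an illustration of how the query parameters combine, the following sketch builds a request URL with the standard library only. Several details are assumptions rather than documented behavior: booleans are serialized as `true`/`false`, array parameters such as `doc_type` are sent as repeated keys, and `build_recognize_url` is a hypothetical helper. Authorization and the encoding of the request body are not covered here.

```python
from urllib.parse import urlencode

RECOGNIZE_URL = "https://latest.handl.ai/recognize"  # endpoint from this page


def build_recognize_url(**params) -> str:
    """Serialize query parameters into a /recognize URL.

    Assumptions: booleans become 'true'/'false'; list values are
    sent as repeated keys (doc_type=a&doc_type=b).
    """
    pairs = []
    for name, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                item = "true" if item else "false"
            pairs.append((name, str(item)))
    return f"{RECOGNIZE_URL}?{urlencode(pairs)}"


url = build_recognize_url(doc_type=["passport_main"], with_hitl=False, dpi=300)
print(url)
# https://latest.handl.ai/recognize?doc_type=passport_main&with_hitl=false&dpi=300
```

The resulting URL would then be the target of a `POST` whose body carries the `image` file and, optionally, `verify_fields`.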

{% tabs %}
{% tab title="200 The request is successful.

The "doc\_type" attribute specifies the type of the found document.
" %}

```javascript
{
  "detail": [ // technical information
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ],
  "items": [
    {
      "doc_type": "passport_main", // document type
      "fields": {
        "date_of_birth": { // document field name
          "text": "01.04.2004", // value of the document field
          "confidence": 0.5262104272842407, // confidence level
          "valid": null, // verification result if verify_fields is present in the request
          "levenshtein": null, // levenshtein distance after verification if present in verify_fields parameter
          "coords": [ // field coordinates in the input file
            [
              [
                873,
                1468
              ],
              [
                1187,
                1468
              ],
              [
                1187,
                1520
              ],
              [
                873,
                1520
              ]
            ]
          ]
        },
        "date_of_issue": { // document field name
          "text": "01.04.2018", // value of the document field
          "confidence": 0.5271461009979248, // confidence level
          "valid": null, // verification result if verify_fields is present in the request
          "levenshtein": null, // levenshtein distance after verification if present in verify_fields parameter
          "coords": [ // field coordinates in the input file
            [
              [
                752,
                1142
              ],
              [
                947,
                1142
              ],
              [
                947,
                1196
              ],
              [
                752,
                1196
              ]
            ]
          ]          
        }
      },
      "color": true, // whether the image within the object is in color ("true") or black and white ("false")
      "other": {
        "external_check_results": {
          "inn": "1234567890",
          "fico": {
            "errorCode": 0,
            "status": "SUCCESS",
            "exclusionCode": 0,
            "score": 620,
            "reasonCode1": "D9",
            "reasonCode2": "M1",
            "reasonCode3": "T5",
            "reasonCode4": "A6",
            "reasonCode1Desc": "Too little time since last delinquency",
            "reasonCode2Desc": "Number of delinquent accounts",
            "reasonCode3Desc": "Too many recent credit history inquiries on the subject.",
            "reasonCode4Desc": "Amount owed on delinquent accounts",
            "scoreSource": "nbki"
          },
          "pledges_list": [],
          "vehicle_restrict_list": [],
          "vehicle_wanted_list": [],
          "vehicle_dtp_and_restrict": {
            "restrict_list": [],
            "dtp_list": []
          },
          "is_valid": true
        },
        "external_check_errors": {
          "inn": "string",
          "fico": "string",
          "pledges_list": "string",
          "vehicle_restrict_list": "string",
          "vehicle_wanted_list": "string",
          "vehicle_dtp_and_restrict": "string",
          "is_valid": "string"
        }
      },
      "error": null // = "Recognition for requested document not implemented" if the model is not trained to recognize the document
    }
  ],
  "task_id": null, // internal ID of the task
  "code": null, // error code
  "message": null, // error message within the object
  "errno": null, // error number
  "traceback": null, // error message within the limits of object
  "fake": true, // result returned when check_fake = "true"
  "pages_count": 1, // number of pages in the input file
  "docs_count": 1 // number of documents in the input file
}
```

{% endtab %}

{% tab title="422 The request contains invalid input parameters" %}

```javascript
{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
```

{% endtab %}
{% endtabs %}
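
A typical client walks the `items` array of a 200 response and inspects each field's `confidence`. The sketch below shows one way to collect fields that may need review. The helper name, the 0.7 threshold, and the `surname` entry in the sample data are illustrative assumptions. Only the key names (`items`, `doc_type`, `fields`, `text`, `confidence`) come from the response example above.

```python
def fields_needing_review(response: dict, threshold: float = 0.7) -> list:
    """Return (doc_type, field_name, text) for fields below the threshold."""
    flagged = []
    for item in response.get("items") or []:
        for name, field in (item.get("fields") or {}).items():
            if field["confidence"] < threshold:
                flagged.append((item["doc_type"], name, field["text"]))
    return flagged


# Sample data shaped like the 200 response; values are illustrative.
sample = {
    "items": [{
        "doc_type": "passport_main",
        "fields": {
            "date_of_birth": {"text": "01.04.2004", "confidence": 0.5262104272842407},
            "surname": {"text": "Soze", "confidence": 0.98},
        },
    }]
}

print(fields_needing_review(sample))
# [('passport_main', 'date_of_birth', '01.04.2004')]
```

The threshold of 0.7 corresponds to the lower bound of the "quite sure" range. A stricter or looser value may suit a given workflow better.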

