WCX_TESS - a C++ image-to-text and PDF-to-text converter implemented as a TC packer plugin.
Based on code from Tesseract, Poppler, Leptonica and/or OpenCV libraries.
Text recognition uses "trained models" from Tesseract.
Russian and English models are included in the basic archive (*.traineddata files).
If you need any other models, download them and add their language codes to the "redtess.json" config.
Use the "langs" key for this. Mixed entries such as "eng, rus" are allowed.
You will see all these values in the TC panel as files with the txt extension inside the virtual archive.
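As an illustration of how a mixed "langs" entry could be handled, here is a minimal C++ sketch. It is not taken from the plugin sources: the function names split_langs and to_tesseract_langs are hypothetical, and the only outside fact it relies on is that Tesseract accepts several languages joined with "+" (for example "eng+rus").

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a "langs" entry such as "eng, rus" into
// individual language codes, dropping spaces, tabs and empty pieces.
std::vector<std::string> split_langs(const std::string& entry) {
    std::vector<std::string> codes;
    std::string current;
    for (char ch : entry) {
        if (ch == ',') {
            if (!current.empty()) codes.push_back(current);
            current.clear();
        } else if (ch != ' ' && ch != '\t') {
            current += ch;
        }
    }
    if (!current.empty()) codes.push_back(current);
    return codes;
}

// Hypothetical helper: join the codes with '+', the form Tesseract
// accepts when several languages are requested at once.
std::string to_tesseract_langs(const std::string& entry) {
    std::string joined;
    for (const std::string& code : split_langs(entry)) {
        if (!joined.empty()) joined += '+';
        joined += code;
    }
    return joined;
}
```

A string built this way could be passed as the language argument of tesseract::TessBaseAPI::Init. Whether the plugin does exactly this internally is an assumption.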
The "Fast" version of the "trained models" is used by default.
It works fast, though it can have some problems (but not so bad!).
You can get the "Best" version of the models using this link:
https://github.com/tesseract-ocr/tessdata_best
Then replace the contents of the tessdata folder.
Or use the normal models:
https://github.com/tesseract-ocr/tessdata
You can also enable support for many other image formats (see the "formats" key in the config).
Any picture format supported by Leptonica or OpenCV can be used with this plugin.
Multi-page support is currently enabled for the TIFF format.
PDFs are rasterized in memory before recognition, so try tuning the DPI in the configuration file.
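To get a feeling for how the DPI setting affects memory use, the following C++ sketch estimates the bitmap size of a rasterized PDF page. It only relies on the fact that PDF page sizes are measured in points (1/72 inch). The names RasterSize, raster_size and raster_bytes are hypothetical, and the bytes-per-pixel value the plugin really uses is not known here.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical type: pixel dimensions of a rasterized page.
struct RasterSize {
    int width;   // pixels
    int height;  // pixels
};

// Convert a page size in PDF points (1 point = 1/72 inch) to pixels
// at the given DPI, rounding up to whole pixels.
RasterSize raster_size(double width_pt, double height_pt, double dpi) {
    return { static_cast<int>(std::ceil(width_pt * dpi / 72.0)),
             static_cast<int>(std::ceil(height_pt * dpi / 72.0)) };
}

// Approximate size of the uncompressed bitmap in bytes.
// bytes_per_pixel is an assumption (for example 1 for grayscale, 4 for RGBA).
std::size_t raster_bytes(RasterSize size, int bytes_per_pixel) {
    return static_cast<std::size_t>(size.width) * size.height * bytes_per_pixel;
}
```

For example, a US Letter page (612 x 792 points) at 300 DPI becomes 2550 x 3300 pixels, so doubling the DPI roughly quadruples the memory needed per page.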
Leptonica is the default library for the plugin, but you can switch to OpenCV.