Use PaddleOCR
OCR is nothing new to most of us. There are plenty of open-source AI engines that can do it, but I needed an easy-to-use one.
Recently I was given a task to convert images into text. There were a lot of images, so processing them one by one by hand was out of the question. Instead, I wrote a Python script to handle this tedious task.
Basic Processing
Every original image has the same useless frame around the part I need. Cropping it off is worthwhile for two reasons: the smaller image is faster to process, and the frame contains unrelated text that would otherwise end up in the output.
Cropping is an easy problem. Just install the Pillow package with pip:
pip install Pillow
Then use Pillow to do the cropping:
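Here is a minimal sketch of the cropping step, assuming every image has the same frame so that one fixed crop box fits all of them. The box coordinates, the `*.png` pattern and the helper names (`cropped_path`, `crop_image`, `crop_folder`) are hypothetical; `Image.open`, `Image.crop` and `save` are standard Pillow calls.

```python
from pathlib import Path
from typing import List

# Hypothetical crop box (left, upper, right, lower) in pixels. Measure the
# frame on one of your own images and replace these numbers.
CROP_BOX = (100, 80, 1180, 900)
PREFIX = "crop-"


def cropped_path(src: Path) -> Path:
    """Output path: same folder, original file name prefixed with 'crop-'."""
    return src.with_name(PREFIX + src.name)


def crop_image(src: Path, box=CROP_BOX) -> Path:
    """Crop one image to `box` and save the result next to the original."""
    from PIL import Image  # Pillow; imported lazily so the path helpers work without it

    dst = cropped_path(src)
    with Image.open(src) as img:
        img.crop(box).save(dst)
    return dst


def crop_folder(folder: Path, pattern: str = "*.png") -> List[Path]:
    """Crop every matching image in `folder`, skipping already-cropped files."""
    outputs = []
    # sorted() materializes the glob first, so new crop- files are not revisited.
    for src in sorted(folder.glob(pattern)):
        if src.name.startswith(PREFIX):
            continue  # don't crop our own output a second time
        outputs.append(crop_image(src))
    return outputs
```

For example, `crop_folder(Path("images"))` would crop every PNG in a hypothetical `images` folder and write `crop-<name>.png` next to each original.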
Now we have cropped images, each saved under the original file name with a crop- prefix.
Install PaddlePaddle
It’s not easy to install PaddlePaddle with pip, because it needs to run some native compilation. Anaconda is also complex to install and leaves a lot of garbage files behind. The cleanest way is to use Docker together with VS Code’s remote development extensions.
Of course, you need to install Docker first, which is out of this post’s scope.
Then run the following command to create and start a container from the PaddlePaddle image:
Something to note
You can change the mounted volume to the folder you want to process.
This image is pulled from the registry of Baidu (the company that created PaddlePaddle), which is fast in China. You can also pull it from Docker Hub. The PaddlePaddle in this image is the CPU build. Every computer has a CPU, of course, but if you have a GPU with CUDA, you can choose another image with the matching tag. Still, the CPU image almost always works, while GPU support is harder to configure.
I don’t know why --network=host is needed. With it the container shares the host’s network stack, and the container doesn’t publish any ports anyway. Maybe it gives the container faster Internet access, or maybe VS Code’s remote extension needs it?
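To confirm which build you actually got inside the container, a quick check like the one below may help. It is a sketch: `describe_paddle_build` is a hypothetical helper, while `paddle.__version__` and `paddle.device.is_compiled_with_cuda()` are PaddlePaddle 2.x calls. For a fuller installation test, PaddlePaddle also provides `paddle.utils.run_check()`.

```python
def describe_paddle_build() -> str:
    """Report the PaddlePaddle version and whether it was compiled with CUDA."""
    try:
        import paddle
    except ImportError:
        return "PaddlePaddle is not installed in this environment"
    # The CPU image should report "CPU-only"; a CUDA image should report GPU.
    build = "GPU (CUDA)" if paddle.device.is_compiled_with_cuda() else "CPU-only"
    return f"PaddlePaddle {paddle.__version__}: {build} build"


if __name__ == "__main__":
    print(describe_paddle_build())
```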
Install PaddleOCR
The image above only contains PaddlePaddle. PaddleOCR is a separate package built on top of it and has to be installed on its own. This time, though, plain pip works:
pip install paddleocr
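PaddleOCR’s Python API and result format have changed between releases, so it can be worth checking which version pip installed before writing code against it. A small sketch using only the standard library’s `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist: str) -> str:
    """Return the installed version of a pip distribution, or 'not installed'."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print("paddleocr:", installed_version("paddleocr"))
```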
Coding
The next step is to write the Python code, which is also the easiest part. Connect to the container you just created from VS Code, and happy coding!
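A minimal sketch of such a script, written against the PaddleOCR 2.x Python API (2.6 or later, where `ocr.ocr()` returns one list per page of `[box, (text, confidence)]` items, or `None` for a page with no text). Releases 3.x changed this interface, so treat the result handling as an assumption to check against your installed version. `lines_from_result` and `ocr_folder` are hypothetical helpers, `lang="ch"` selects the Chinese model, and `image_text_list` holding `(file name, text)` pairs is an assumed shape.

```python
from pathlib import Path
from typing import List, Tuple


def lines_from_result(result) -> List[str]:
    """Collect recognized strings from one PaddleOCR 2.x `ocr()` result.

    Assumed shape (PaddleOCR 2.6+): a list with one entry per page, where each
    page is a list of [box, (text, confidence)] items, or None if no text found.
    """
    texts = []
    for page in result or []:
        for _box, (text, _confidence) in page or []:
            texts.append(text)
    return texts


def ocr_folder(folder: Path, prefix: str = "crop-", lang: str = "ch") -> List[Tuple[str, str]]:
    """OCR every cropped image in `folder`; return (file name, text) pairs."""
    from paddleocr import PaddleOCR  # lazy import so the helper above works without it

    # Create the engine once: loading the detection/recognition models is slow.
    ocr = PaddleOCR(use_angle_cls=True, lang=lang)
    image_text_list = []
    for path in sorted(folder.glob(prefix + "*")):
        result = ocr.ocr(str(path), cls=True)
        image_text_list.append((path.name, "\n".join(lines_from_result(result))))
    return image_text_list
```

Inside the container, `image_text_list = ocr_folder(Path("images"))` would process a hypothetical `images` folder mounted from the host.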
Now you can do whatever else you need with image_text_list.
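As one example of a next step, here is a sketch that writes each image’s text to its own `.txt` file, using UTF-8 so Chinese output survives. It assumes `image_text_list` holds `(image file name, text)` pairs; that shape, the `save_texts` name and the output folder are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple


def save_texts(image_text_list: Iterable[Tuple[str, str]], out_dir: Path) -> int:
    """Write each text to `<image stem>.txt` in `out_dir`; return how many were written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for image_name, text in image_text_list:
        target = out_dir / (Path(image_name).stem + ".txt")
        target.write_text(text, encoding="utf-8")  # explicit UTF-8 for CJK text
        written += 1
    return written
```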
Finally
Now just run the script, or better yet, customize it.
By the way, PaddleOCR is far more accurate than Tesseract on Chinese text. Maybe that’s because it was created by Baidu, a Chinese company, or maybe I missed some Tesseract configuration. I haven’t tested English.