This article was last updated on the evening of September 27, 2020.

Introduction

tesserocr is an OCR (optical character recognition) library for Python.
It is essentially a thin Python wrapper around the tesseract engine's C++ API.

Documentation

Resource	URL
tesserocr GitHub	https://github.com/sirfz/tesserocr
tesserocr PyPI	https://pypi.org/project/tesserocr/
tesseract downloads (Windows installers)	https://digi.bib.uni-mannheim.de/tesseract/
tesseract language packs	https://github.com/tesseract-ocr/tessdata
tesseract documentation	https://github.com/tesseract-ocr/tesseract/wiki/Documentation

Installation

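Installing tesseract

First, download the core engine, tesseract.

Generally, choose a stable release, i.e. a version whose name does not contain dev, beta, or a similar tag.
After downloading, run the installer. It offers an option to install additional language packs.
If you have a VPN or proxy, simply tick the languages you want and they will be installed along with tesseract.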
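To check which language packs actually ended up installed, tesseract's command-line tool can list them with tesseract --list-langs. The sketch below wraps that call from Python. It assumes tesseract is already on PATH (installed_languages returns None otherwise), and the helper names are made up for illustration. As far as I know, some older tesseract releases print the list to stderr rather than stdout, so the sketch falls back to stderr.

```python
from __future__ import annotations

import shutil
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Turn the text printed by `tesseract --list-langs` into language codes."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...").
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages() -> list[str] | None:
    """Return installed language codes, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Prefer stdout; fall back to stderr for versions that print the list there.
    text = result.stdout if result.stdout.strip() else result.stderr
    return parse_list_langs(text)


if __name__ == "__main__":
    print(installed_languages())
```

If chi_sim (Simplified Chinese) is missing from the result, install it manually as described in the next section.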

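Manually installing tesseract language packs

Official documentation: https://tesseract-ocr.github.io/tessdoc/Data-Files
The language files (.traineddata) can be downloaded from the official GitHub repositories linked on that page.

Once downloaded, place the files in the tessdata directory that tesserocr reads from.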
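To script the manual install, the sketch below copies a downloaded .traineddata file into a tessdata directory and then lists the language codes found there. A language code is just the file name without its .traineddata suffix, for example chi_sim for chi_sim.traineddata. The function names are hypothetical, and you must supply the tessdata directory for your own machine.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def install_traineddata(src: str | Path, tessdata_dir: str | Path) -> Path:
    """Copy a downloaded *.traineddata file into tessdata_dir; return its new path."""
    src = Path(src)
    if src.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {src}")
    dest_dir = Path(tessdata_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest


def list_languages(tessdata_dir: str | Path) -> list[str]:
    """Return the language codes (file stems) of all *.traineddata files."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```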

Adding tesseract to the PATH environment variable
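The permanent fix is to add the tesseract install directory to PATH in the Windows system environment-variable settings. For a quick test, the sketch below prepends a directory to PATH for the current Python process only, then checks whether the tesseract executable can be found. TESSERACT_DIR assumes the Windows installer's default location, C:\Program Files\Tesseract-OCR. Change it if you installed tesseract elsewhere.

```python
import os
import shutil

# ASSUMPTION: default install location of the Windows installer; adjust as needed.
TESSERACT_DIR = r"C:\Program Files\Tesseract-OCR"


def ensure_on_path(directory: str) -> bool:
    """Prepend directory to PATH for this process only, if it is not there yet.

    Returns True if the tesseract executable can be found afterwards.
    """
    current = os.environ.get("PATH", "")
    if directory not in current.split(os.pathsep):
        os.environ["PATH"] = directory + os.pathsep + current if current else directory
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    print("tesseract found:", ensure_on_path(TESSERACT_DIR))
```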

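Installing tesserocr

pip install tesserocr pillow
First, try installing directly with pip.
If that fails, download the prebuilt .whl file that matches your Python version and install it with pip install <file>.whl.
Wheel downloads (Windows builds):
https://github.com/simonflueckiger/tesserocr-windows_build/releases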
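Wheel file names encode the Python version and platform. For example, cp38-cp38-win_amd64 is for 64-bit CPython 3.8 on Windows. The small sketch below builds that tag for the interpreter you are running, so you can pick the matching file. It assumes CPython 3.8 or newer, because older versions used a different ABI tag (such as cp37m).

```python
import sys
import sysconfig


def wheel_tag() -> str:
    """Return a tag such as "cp38-cp38-win_amd64" for the running interpreter.

    ASSUMPTION: CPython 3.8+, where the ABI tag equals the Python tag.
    """
    py = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # sysconfig.get_platform() gives e.g. "win-amd64"; wheel names use "_" instead.
    plat = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return f"{py}-{py}-{plat}"


if __name__ == "__main__":
    print(wheel_tag())
```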

Testing the installation

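import tesserocr
from PIL import Image

# Open the test image with Pillow
img = Image.open(r'example/0_Basic_usage_of_the_library/tesserocr/pic/0_hello_world.png')
# Run OCR on the image and return the recognized text as a string
chars = tesserocr.image_to_text(img)
print(chars)
Replace the image path with the path to your own image.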
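The image_to_text helper above is convenient for a quick test. For more control, tesserocr also exposes the underlying Tesseract API through the PyTessBaseAPI class. The sketch below uses it to return the recognized text together with Tesseract's mean confidence score. It assumes the English pack (eng) is installed, and ocr_with_confidence is a hypothetical helper name. As far as I know, creating one PyTessBaseAPI object and reusing it for many images also avoids re-initializing Tesseract on every call.

```python
from __future__ import annotations


def ocr_with_confidence(image_path: str, lang: str = "eng") -> tuple[str, int]:
    """OCR one image file; return (recognized text, mean confidence 0-100).

    tesserocr and Pillow are imported inside the function so this module
    can still be imported on a machine where they are not installed yet.
    """
    from PIL import Image
    from tesserocr import PyTessBaseAPI

    # PyTessBaseAPI is a context manager; leaving the block frees the engine.
    with Image.open(image_path) as img, PyTessBaseAPI(lang=lang) as api:
        api.SetImage(img)
        text = api.GetUTF8Text()
        confidence = api.MeanTextConf()
    return text, confidence
```

Call it as text, confidence = ocr_with_confidence(r'path/to/your/image.png'). For Chinese text, pass lang="chi_sim", provided chi_sim.traineddata is installed.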

Possible errors

RuntimeError: Failed to init API, possibly an invalid tessdata path:......

Copy the tessdata folder from the tesseract installation directory to the location given in the error message.
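Instead of copying the folder, you can also point tesserocr at an existing tessdata directory. Both image_to_text and PyTessBaseAPI accept a path argument, and Tesseract also reads the TESSDATA_PREFIX environment variable. The sketch below looks for a directory that contains .traineddata files, checking TESSDATA_PREFIX first and then a few candidate locations. The candidate paths are assumptions, so edit them for your system.

```python
from __future__ import annotations

import os
from pathlib import Path

# ASSUMPTION: common install locations; edit this tuple for your machine.
CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tessdata",
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/share/tessdata",
)


def find_tessdata(candidates: tuple[str, ...] = CANDIDATES) -> str | None:
    """Return the first directory that contains *.traineddata files, or None."""
    dirs: list[str] = []
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        # TESSDATA_PREFIX may point at tessdata itself or at its parent.
        dirs += [prefix, os.path.join(prefix, "tessdata")]
    dirs += list(candidates)
    for d in dirs:
        p = Path(d)
        if p.is_dir() and any(p.glob("*.traineddata")):
            return d
    return None


if __name__ == "__main__":
    print("tessdata:", find_tessdata())
```

Once you have checked that the result is not None, pass it in, for example tesserocr.image_to_text(img, path=find_tessdata()). As far as I know, Tesseract 4 and later expect the tessdata directory itself here, while 3.x expected its parent. That is why the sketch tries both forms of TESSDATA_PREFIX.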


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.
