PYTHON FOR CHARACTER RECOGNITION – TESSERACT
Tesseract is an open-source optical character recognition (OCR) engine, and pytesseract is the Python wrapper around it. It is used to extract the text embedded in an image. Combined with a library such as OpenCV, Tesseract can cover both locating text in an image (text detection) and reading what that text says (text recognition).
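INSTALLATION (PYTHON 3.X):
Open a terminal or command prompt and type:
pip install pytesseract
pip install opencv-python
pytesseract only wraps the Tesseract engine, so the Tesseract program itself must also be installed on the system and be reachable from the PATH.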
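A quick way to confirm that the Tesseract program is reachable before running any OCR code is to look it up on the PATH. The sketch below uses only the standard library; the helper name find_tesseract is made up for this example. If the program is installed somewhere that is not on the PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("Tesseract not found on PATH; install it or set pytesseract.pytesseract.tesseract_cmd.")
else:
    print("Tesseract found at", path)
```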
OPENING A SIMPLE IMAGE:
Import cv2.
Import pytesseract.
Save the test image in the same directory as the script.
Read the image into a variable with the cv2.imread() function, passing the file name of the image as the parameter.
To resize the image, call cv2.resize() and pass the target size as (width, height).
Display the image with cv2.imshow('window_name', image).
Add cv2.waitKey(0) to keep the window open until a key is pressed.
import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (720, 480))
cv2.imshow('Result', img)
cv2.waitKey(0)
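Resizing to a fixed resolution such as (720, 480) stretches any image whose proportions differ from that size, and distorted characters can be harder to recognize. As a minimal sketch, the helper below (fit_within is a name invented for this example) computes a size that fits inside a bounding box while keeping the aspect ratio. The 1600x1200 input is an assumed example, not a property of test.jpg.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside the given bounds, preserving aspect ratio."""
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A hypothetical 1600x1200 photo fitted into a 720x480 area
print(fit_within(1600, 1200, 720, 480))  # (640, 480)
```

With OpenCV, img.shape is ordered (height, width, channels) while cv2.resize() expects (width, height), so the call would look like h, w = img.shape[:2] followed by img = cv2.resize(img, fit_within(w, h, 720, 480)).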
CONVERTING IMAGE TO STRING
- Import cv2 and pytesseract.
- Save the test image in the same directory as the script.
- Read the image into a variable with the cv2.imread() function, passing the file name of the image as the parameter.
- Display the image with cv2.imshow('window_name', image).
- To convert the image to a string, call pytesseract.image_to_string(image), passing the image variable itself (not a quoted name), and store the result in a variable.
- Print the string.
- Add cv2.waitKey(0) to keep the window open until a key is pressed.
import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
print(pytesseract.image_to_string(img))
cv2.imshow('Result', img)
cv2.waitKey(0)
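The string returned by image_to_string() usually needs some tidying before it is used further: it can contain blank lines and stray spaces, and it may end with a form-feed character. The following is a small sketch of such a cleanup step. The helper name clean_ocr_text and the sample string are invented for illustration and are not real OCR output.

```python
def clean_ocr_text(raw):
    """Strip surrounding whitespace from each line and drop empty lines."""
    # str.splitlines() also splits on the form feed ("\f") character
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# Made-up sample standing in for the result of pytesseract.image_to_string(img)
sample = "  Hello World \n\n   second line\n\f"
print(clean_ocr_text(sample))
```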
CONVERTING IMAGE-TEXT TO AUDIO
To convert an image to audio, first convert the image to text, then convert the text to audio.
Import pytesseract and cv2.
Import os.
Open a command prompt and type pip install gtts.
Add from gtts import gTTS to the script.
Follow the steps above to convert the image to a string.
Store the extracted string in a variable.
Create the audio with the gTTS() function, passing the text and the language as parameters.
Save the audio with the save() method.
Play the audio by passing the file name to os.system('file_name'); on Windows this opens the file in the default player.
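import os
import cv2
import pytesseract
from gtts import gTTS

img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
hImg, wImg, _ = img.shape

# image_to_boxes returns one line per character: "char x1 y1 x2 y2 page",
# with the origin at the bottom-left corner of the image
boxes = pytesseract.image_to_boxes(img)
xy = pytesseract.image_to_string(img)

for b in boxes.splitlines():
    b = b.split(' ')
    x, y, w, h = int(b[1]), int(b[2]), int(b[3]), int(b[4])
    # OpenCV's origin is the top-left corner, so the y values are flipped
    cv2.rectangle(img, (x, hImg - y), (w, hImg - h), (50, 50, 255), 1)
    cv2.putText(img, b[0], (x, hImg - y + 13), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (50, 205, 50), 1)

cv2.imshow('Detected text', img)
cv2.waitKey(0)  # the window is only drawn once waitKey runs; press a key to continue

audio = gTTS(text=xy, lang='en', slow=False)
audio.save("saved_audio.mp3")  # gTTS writes MP3 data, so use an .mp3 extension
# Opens the file with the default player on Windows;
# on macOS use "open saved_audio.mp3", on Linux "xdg-open saved_audio.mp3"
os.system("saved_audio.mp3")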
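The box-drawing loop above relies on two details that are easy to miss: each line of the image_to_boxes() output has the form "char x1 y1 x2 y2 page", and its y coordinates are measured from the bottom of the image, while OpenCV measures from the top. As a sketch of that conversion in isolation, the helper below (parse_boxes is a name invented here) turns such text into top-left and bottom-right corners in OpenCV's coordinate system. The sample string is hand-written for illustration and is not real Tesseract output.

```python
def parse_boxes(box_text, image_height):
    """Convert image_to_boxes-style text into (char, top_left, bottom_right) tuples
    using a top-left origin, as OpenCV drawing functions expect."""
    results = []
    for line in box_text.splitlines():
        parts = line.split(' ')
        if len(parts) < 5:
            continue  # skip empty or malformed lines
        char = parts[0]
        x1, y1, x2, y2 = (int(p) for p in parts[1:5])
        # Flip the y axis: the box's upper edge (y2) becomes the smaller y value
        top_left = (x1, image_height - y2)
        bottom_right = (x2, image_height - y1)
        results.append((char, top_left, bottom_right))
    return results


# Hand-written sample in the "char x1 y1 x2 y2 page" layout, for a 360-pixel-high image
sample = "H 10 300 22 320 0\ni 24 300 28 318 0"
for item in parse_boxes(sample, image_height=360):
    print(item)
```

Each returned pair of corners can be passed directly to cv2.rectangle(img, top_left, bottom_right, color, thickness).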