OCR and Text Summarization with EasyOCR and GPT-3 Tutorial

Acquiring Advanced Skills: EasyOCR and GPT-3 at Your Fingertips

By the end of this AI tutorial, you'll know how to use EasyOCR for text extraction from various sources like photos, and harness the power of OpenAI's GPT-3 for text summarization!

Unraveling EasyOCR: A Software Powerhouse

EasyOCR is an open-source Python library for optical character recognition (OCR), maintained by Jaided AI. It is built on PyTorch, supports more than 80 languages, and ships with pretrained text detection and recognition models, so you can read text from an image in just a few lines of code.

YOLOv7 Unveiled: The Future of Object Detection

YOLOv7 is a member of the YOLO family of single-stage object detectors and, when this tutorial was written, the newest one. It passes image frames through a backbone to extract features, combines those features in a 'neck', and hands them to the 'head' of the network, which predicts the locations and classes of objects and draws bounding boxes around them. A post-processing step, non-maximum suppression (NMS), then removes overlapping duplicate boxes to produce the final prediction.
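To make the NMS step concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is an illustration only, not YOLOv7's actual implementation; the (x1, y1, x2, y2) box format, the function names iou and nms, and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a
        # higher-scoring box that was already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes and one separate box (made-up numbers).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap anything and survives.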

YOLOv7, developed by Chien-Yao Wang (WongKinYiu on GitHub), Alexey Bochkovskiy (AlexeyAB), and Hong-Yuan Mark Liao, introduces several changes to the YOLO network and training routines to improve bounding box accuracy and inference speed. These include extended efficient layer aggregation networks (E-ELAN), compound model scaling, planned re-parameterized convolution, and an auxiliary head trained with coarse-to-fine label assignment. The YOLOv7 GitHub repository, written in Python on top of PyTorch, provides the code needed to start training YOLOv7 on custom data.

Getting Started With EasyOCR

Installing Dependencies

We will start by installing the necessary libraries:

pip install easyocr
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip install "openai<1.0" python-dotenv

Coding

For this tutorial, I will use Visual Studio Code (VS Code), but you can use whatever environment you want, including notebooks or Google Colab.

Note: I will use a single file for this tutorial, but feel free to split code into modules.

Import Dependencies

import easyocr
import torch

Text Extraction from Images

For this task, we will use EasyOCR. We will create a class that will be able to extract text for us!

In the __init__ method, we define the Reader for English. It uses the GPU if one is available and downloads the models to the ./models directory if they are not there yet.

The __call__ method lets us invoke extract_text by calling the instance like a function, for example:

reader = EasyOCRReader()
extracted_texts, annotated_image = reader(image)

The last method is extract_text. It takes an image as an argument, discards any text detected with a confidence below 45%, and returns a tuple: the list of extracted texts and the image with bounding boxes drawn on it.
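Here is a minimal sketch of what such a class could look like. It is not the original code. It assumes easyocr, torch, and OpenCV (cv2, which is installed along with EasyOCR) are available; the helper filter_detections, the min_confidence and model_dir parameters, and the box color are illustrative choices. The heavy libraries are imported inside the methods only so that the filtering helper can be tried without them.

```python
def filter_detections(detections, min_confidence=0.45):
    """Keep OCR results whose confidence is at least min_confidence.

    Each detection is a (bounding_box, text, confidence) tuple, which is
    the shape Reader.readtext() returns by default.
    """
    return [d for d in detections if d[2] >= min_confidence]


class EasyOCRReader:
    def __init__(self, model_dir="./models", min_confidence=0.45):
        import easyocr
        import torch

        self.min_confidence = min_confidence
        # English reader; use the GPU when one is available and store
        # (or download, if missing) the models in model_dir.
        self.reader = easyocr.Reader(
            ["en"],
            gpu=torch.cuda.is_available(),
            model_storage_directory=model_dir,
            download_enabled=True,
        )

    def __call__(self, image):
        # Calling the instance is shorthand for extract_text().
        return self.extract_text(image)

    def extract_text(self, image):
        """Return (texts, annotated_image) for a BGR NumPy image."""
        import cv2

        annotated = image.copy()
        texts = []
        detections = self.reader.readtext(image)
        for box, text, _conf in filter_detections(detections, self.min_confidence):
            # box holds four corner points; draw a rectangle from the
            # top-left corner to the bottom-right corner.
            top_left = tuple(int(v) for v in box[0])
            bottom_right = tuple(int(v) for v in box[2])
            cv2.rectangle(annotated, top_left, bottom_right, (0, 255, 0), 2)
            texts.append(text)
        return texts, annotated
```

With this sketch, usage would look like texts, annotated = EasyOCRReader()(cv2.imread("photo.jpg")), where photo.jpg is a placeholder path.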

Results

Here are the results after extracting text from an image:

  • Image with bounding boxes
  • Extracted text

Not that bad!

Text Summarization

We've already done a great job, but it's not over yet. Now we move on to text summarization using GPT-3.

Here we also create a class, one that will handle our requests to GPT-3.

First, I will create a .env file and put my OpenAI API key in it.
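One way to read the key back from that file is sketched below, under a few assumptions: the .env file contains a line of the form OPENAI_API_KEY=<your key> (the variable name is a common convention, not something stated above), and the optional python-dotenv package does the file parsing. The helper name load_api_key is hypothetical.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the environment or in a .env file."""
    try:
        # python-dotenv copies the entries of a .env file into
        # os.environ without overwriting variables that are already set.
        from dotenv import load_dotenv

        load_dotenv()
    except ImportError:
        # Without python-dotenv, only real environment variables are used.
        pass

    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    return key
```

Failing early with a clear message is a design choice here: a missing key would otherwise only surface later as an authentication error from the API.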

Now I will define the class for GPT-3.

GPT-3 Class Definition

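import openai  # legacy SDK (openai<1.0); openai.Completion was removed in 1.0


class GPT3:
    def __init__(self, api_key):
        self.api_key = api_key
        # The legacy SDK reads the key from this module-level attribute
        openai.api_key = api_key
        # Model used in the original tutorial; OpenAI has retired it since,
        # so swap in a completion model your account can access
        self.model = 'text-davinci-002'

    def __call__(self, prompt):
        # Calling the instance forwards to predict()
        return self.predict(prompt)

    def predict(self, prompt):
        # max_tokens defaults to 16, which would cut summaries short
        content = openai.Completion.create(
            model=self.model, prompt=prompt, max_tokens=256
        )
        return content['choices'][0]['text'].strip()

    def summarize(self, text):
        return self.predict(f'Summarize this text: {text}')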

Testing the Code!

After completing the work, our code looks like this:

// Insert your complete code here
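As a rough sketch of how the two pieces could be wired together (this is not the original script), the function below accepts any OCR callable that returns (texts, annotated_image), as extract_text is described above, and any summarizer callable such as GPT3.summarize. The name summarize_image and the stand-in callables in the usage lines are hypothetical.

```python
def summarize_image(image, ocr, summarize):
    """Run OCR on image, join the text fragments, and summarize them.

    ocr(image) must return (texts, annotated_image); summarize(text)
    must return a string. Returns (full_text, summary, annotated_image).
    """
    texts, annotated = ocr(image)
    full_text = " ".join(t.strip() for t in texts if t.strip())
    if not full_text:
        # Nothing readable was found, so skip the API call entirely.
        return "", "", annotated
    return full_text, summarize(full_text), annotated


# Stand-ins so the sketch runs without a GPU or an API key.
def fake_ocr(image):
    return ["Hello ", "world", "  "], image


def fake_summarize(text):
    return text.upper()


full_text, summary, _ = summarize_image("photo", fake_ocr, fake_summarize)
print(full_text)  # Hello world
print(summary)    # HELLO WORLD
```

With the real classes, the call would be along the lines of summarize_image(image, EasyOCRReader(), GPT3(api_key).summarize).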

I will run the code again and see what happens.

Final Results

We can evaluate the results:

  • Image with bounding boxes
  • Extracted text and summarization results

Wow! Look at this! We really can create a simple app that summarizes our text from a normal photo. Hope you can make good use of it.

Here I leave you a link to the entire repository. Have fun!

How Many AI Apps Can I Build?

That's a silly question because the only limitations are your resources. If you have a really good idea that can solve a real-world problem, you're halfway there. You also need to actually build it. And to market it as well. But we can help you with all of those steps.

Just join our AI Hackathons and tell our amazing community of over 52,000 AI builders from all around the world about your idea. Then build it with them in 7 days and apply it to our AI Slingshot program. It's really easy, right? Lablab.ai is a place for innovation, and we welcome you to join!
