Data Science LAB

[Python] Recognizing Korean Text in Images with EasyOCR

ㅅ ㅜ ㅔ ㅇ 2022. 10. 19. 17:26

EasyOCR makes it easy to recognize Korean text in images.

It currently supports around 80 languages, and because it is open source, anyone can use it for free.

 

 

First, install it with pip.

!pip install easyocr

 

 

 

Import the required modules

import matplotlib.pyplot as plt
from imutils.perspective import four_point_transform
from imutils.contours import sort_contours
import imutils
from easyocr import Reader
import cv2
import requests
import numpy as np
import pandas as pd  # needed for pd.read_csv below
from PIL import ImageFont, ImageDraw, Image
import os
import re
import tqdm

 

 

 

Dataset layout

train.csv and test.csv each contain the paths to the individual image files.

The train and test folders hold the image dataset as PNG files.

 

 

 

Load the CSV files

train_df = pd.read_csv('../data/train.csv')
test_df = pd.read_csv('../data/test.csv')

 

 

 

 

 

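Create a function that loads an image

def load_image(img_path):
    # Drop the first two characters of the CSV path (presumably a leading './') before joining.
    image_path = os.path.join('../data', img_path[2:])
    img = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None instead of raising when the file cannot be read.
        raise FileNotFoundError(image_path)
    # OpenCV loads images in BGR order; convert to RGB.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img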
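The slice `img_path[2:]` in `load_image` appears to strip a leading `./` from the path stored in the CSV. Here is a minimal sketch of that path handling on its own, assuming the CSV entries look like `./test/TEST_0000.png` (a hypothetical file name). The helper `resolve_image_path` is also hypothetical and not part of the original code.

```python
import os

def resolve_image_path(img_path, data_root='../data'):
    """Join a CSV-relative path onto the data folder.

    Assumes paths in the CSV may start with './'. The prefix is
    stripped only when present, unlike a blind img_path[2:] slice.
    """
    if img_path.startswith('./'):
        img_path = img_path[2:]
    return os.path.join(data_root, img_path)

# Hypothetical file name, for illustration only.
print(resolve_image_path('./test/TEST_0000.png'))
```

Checking for the prefix first means a path without `./` is not silently truncated.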

 

 

 

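Run OCR and collect the results

results = []

# Build the reader once; creating it inside the loop reloads the model for every image.
reader = Reader(lang_list=['ko'], gpu=True)

# tqdm.tqdm wraps the iterable to show a progress bar.
for i_path in tqdm.tqdm(test_df['img_path']):
    img = load_image(i_path)
    # detail=0 returns only the recognized strings.
    result = reader.readtext(img, detail=0)
    text = ''.join(result)
    results.append(text)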
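With `detail=0`, `readtext` returns only the recognized strings. Left at its default, each detection comes back as a (bounding box, text, confidence) triple, which makes it possible to discard low-confidence fragments before joining. The sketch below shows that idea only. The helper `join_confident`, the 0.5 threshold, and the sample detections are all made up for illustration and are not part of the original post.

```python
def join_confident(detections, min_conf=0.5):
    """Concatenate text from detections whose confidence is >= min_conf.

    Each detection is expected to be (bbox, text, confidence), the shape
    readtext returns when detail is left at its default of 1.
    """
    return ''.join(text for _bbox, text, conf in detections if conf >= min_conf)

# Made-up detections for illustration; real ones would come from reader.readtext(img).
sample = [
    ([[0, 0], [40, 0], [40, 20], [0, 20]], '한글', 0.92),
    ([[45, 0], [60, 0], [60, 20], [45, 20]], '#', 0.12),
    ([[65, 0], [120, 0], [120, 20], [65, 20]], '인식', 0.81),
]
print(join_confident(sample))
```

Whether a threshold helps depends on the images, so treat the value as something to tune rather than a recommendation.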

 

 

 

 

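Remove special characters, digits, and spaces

# Keep only Hangul syllables, digits, ASCII letters, and whitespace.
n_result = [re.sub(r"[^\uAC00-\uD7A30-9a-zA-Z\s]", "", x) for x in results]
# Remove digits.
n_result = [re.sub(r"[0-9]", "", x) for x in n_result]
# Remove space characters.
n_result = [x.replace(" ", "") for x in n_result]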

n_result

 

 

 

 

 

Write the results to a CSV file

sub_df = pd.read_csv('../data/sample_submission.csv')
sub_df['text'] = n_result
sub_df.to_csv('easyocr.csv',index=False)