Data Science LAB
[Python] Recognizing Korean Text in Images with EasyOCR
2022. 10. 19. 17:26
EasyOCR makes it easy to recognize Korean text in images.
It currently supports more than 80 languages, and because it is open source, anyone can use it for free.
First, install it with pip (the leading ! runs the shell command from a Jupyter notebook cell):
!pip install easyocr
Import the required modules
import os
import re
import cv2  # image loading and color conversion
import pandas as pd  # reading and writing the CSV files
from easyocr import Reader  # EasyOCR text reader
from tqdm import tqdm  # progress bar for the OCR loop
Dataset structure
- train.csv and test.csv contain the path of each image.
- The train and test folders hold the image files themselves, stored as PNG.
Load the CSV files
train_df = pd.read_csv('../data/train.csv')
test_df = pd.read_csv('../data/test.csv')
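The layout described above can be mimicked with a tiny in-memory CSV. This is only an illustration: the id column and file names such as TEST_00000.png are hypothetical, and the only column the post actually relies on is img_path. It uses the standard-library csv module instead of pandas so it runs on its own.

```python
import csv
import io

# Hypothetical contents of test.csv: an id column plus a relative image path
SAMPLE_CSV = """id,img_path
TEST_00000,./test/TEST_00000.png
TEST_00001,./test/TEST_00001.png
"""

def read_image_paths(csv_text):
    """Return the img_path column as a list, like test_df['img_path']."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["img_path"] for row in reader]

paths = read_image_paths(SAMPLE_CSV)
print(paths)  # ['./test/TEST_00000.png', './test/TEST_00001.png']
```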
Define a function that loads an image
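def load_image(img_path):
    # img_path[2:] drops a two-character prefix (presumably "./") before joining with the data root
    image_path = os.path.join('../data', img_path[2:])
    img = cv2.imread(image_path)  # returns None if the file cannot be read
    if img is None:
        raise FileNotFoundError(image_path)
    # OpenCV loads images in BGR order; convert to RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img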
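load_image slices off the first two characters of img_path, which assumes every path starts with a two-character prefix such as "./". A slightly more defensive variant, sketched below as a hypothetical helper named resolve_image_path, removes the prefix only when it is present and leaves other paths untouched. The data root ../data comes from the post.

```python
import os

DATA_ROOT = "../data"  # data directory used in the post

def resolve_image_path(img_path, root=DATA_ROOT):
    """Hypothetical helper: join a CSV-relative path like './test/x.png' onto root."""
    # Strip a leading './' only if it is actually there, instead of always slicing [2:]
    if img_path.startswith("./"):
        img_path = img_path[2:]
    return os.path.join(root, img_path)

print(resolve_image_path("./test/TEST_00000.png"))
print(resolve_image_path("test/TEST_00000.png"))
```

Both calls resolve to the same file, so the helper tolerates CSVs with or without the "./" prefix.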
Run OCR on every test image and collect the results
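# Create the reader once; building it inside the loop reloads the model for every image
reader = Reader(lang_list=['ko'], gpu=True)
results = []
for i_path in tqdm(test_df['img_path']):
    img = load_image(i_path)
    # detail=0 returns only the recognized strings, without boxes or confidences
    result = reader.readtext(img, detail=0)
    text = ''.join(result)
    results.append(text)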
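With detail=0, readtext returns just a list of strings, and ''.join glues them together with no separator. Without detail=0, each item is instead a (bounding box, text, confidence) tuple, which lets you drop low-confidence fragments before joining. The sketch below works on hand-made tuples in that shape (the boxes, texts, and scores are made up, and the 0.5 threshold is an arbitrary example), so it runs without EasyOCR installed.

```python
# Made-up items in the (bbox, text, confidence) shape that readtext returns by default
fake_results = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], "안녕", 0.98),
    ([[55, 0], [120, 0], [120, 20], [55, 20]], "하세요", 0.91),
    ([[0, 30], [40, 30], [40, 50], [0, 50]], "@#", 0.12),
]

def join_confident_text(results, min_conf=0.5, sep=""):
    """Concatenate recognized strings whose confidence is at least min_conf."""
    return sep.join(text for _bbox, text, conf in results if conf >= min_conf)

print(join_confident_text(fake_results))       # 안녕하세요
print(join_confident_text(fake_results, 0.0))  # 안녕하세요@#
```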
Remove special characters, digits, and whitespace
n_result = [re.sub(r"[^\uAC00-\uD7A30-9a-zA-Z\s]", "", x) for x in results]
n_result = [re.sub(r"[0-9]", "", x) for x in n_result]
n_result = [x.replace(" ", "") for x in n_result]
n_result
Write the results to a CSV file
sub_df = pd.read_csv('../data/sample_submission.csv')
sub_df['text'] = n_result
sub_df.to_csv('easyocr.csv',index=False)
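Assigning n_result to sub_df['text'] works only when there is exactly one prediction per row of sample_submission.csv (pandas raises a ValueError otherwise), and it relies on test.csv and the sample submission listing images in the same order. Below is a standard-library sketch of the same step that makes the length check explicit. The id column, the output file name, and the helper name write_submission are assumptions, not part of the post.

```python
import csv

def write_submission(ids, texts, path):
    """Hypothetical helper: write one (id, text) row per test image."""
    if len(ids) != len(texts):
        raise ValueError(f"{len(ids)} ids but {len(texts)} predictions")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "text"])  # header row, as DataFrame.to_csv writes by default
        writer.writerows(zip(ids, texts))

# Example with made-up ids and predictions
write_submission(["TEST_00000", "TEST_00001"], ["안녕하세요", "OCR"], "easyocr_sketch.csv")
```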