RE Archives - icodebroker

[PYTHON/COMMON] sub function : Removing numeric words from string list items

■ Shows how to use the sub function to remove numeric words from the items of a string list. ▶ main.py


import re

resultList1 = ['1 서울 ', '인천 2 ', '3 수원 ', ' 대전 4', '5 광주 ', '6 대구 ', ' 부산 7']

resultList2 = [re.sub(r"\b\d+\b", "", result).strip() for result in resultList1]

print(resultList2)

"""
['서울', '인천', '수원', '대전', '광주', '대구', '부산']
"""

※ r"\b\d+\b" : matches a standalone word consisting of one or more digits

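As a quick check of the word-boundary behaviour in the note above, the sketch below runs the same substitution on a few hypothetical items (the `items` list is made up for illustration and is not from the original post). Digits glued to letters, as in "3rd" or "A1", are not standalone words, so they should survive.

```python
import re

# Hypothetical sample data, not taken from the original post.
items = ["1 Seoul", "Busan 2", "3rd Street", "A1 road", "10 20 Daegu"]

# \b\d+\b only matches digit runs that stand alone as a word,
# so "3rd" and "A1" are left untouched.
cleaned = [re.sub(r"\b\d+\b", "", item).strip() for item in items]

print(cleaned)  # ['Seoul', 'Busan', '3rd Street', 'A1 road', 'Daegu']
```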
[PYTHON/COMMON] Match class : Using named match groups

■ Shows how to access match results by group name with the Match class. ▶ Example code (PY)


import re

pattern = re.compile(r"(?P<area_code>\d{2,3})-(?P<exchange_number>\d{3,4})-(?P<user_number>\d{4})")

match = pattern.match("02-123-4567")

print(match.group("user_number")) # 4567

print(match.start("user_number")) # 7

print(match.groupdict()) # {'area_code': '02', 'exchange_number': '123', 'user_number': '4567'}

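The example above assumes the input always matches. When it does not, match returns None and calling group on it raises AttributeError. A minimal sketch of a guard follows; `parse_phone` is a hypothetical helper name, not part of the re module.

```python
import re

PHONE = re.compile(r"(?P<area_code>\d{2,3})-(?P<exchange_number>\d{3,4})-(?P<user_number>\d{4})")

def parse_phone(text):
    """Hypothetical helper: return the named groups as a dict, or None if there is no match."""
    match = PHONE.match(text)
    if match is None:
        return None
    return match.groupdict()

print(parse_phone("032-1234-5678"))  # {'area_code': '032', 'exchange_number': '1234', 'user_number': '5678'}
print(parse_phone("no phone here"))  # None
```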

[PYTHON/COMMON] Pattern class : Using the match method

■ Shows how to use the match method of the Pattern class. ▶ Example code (PY)


import re

pattern = re.compile(r"(\d{2,3})-(\d{3,4})-(\d{4})")

print(bool(pattern.match("02-123-4567"))) # True

print(bool(pattern.match("02-가123-4567"))) # False

print(bool(pattern.match("3402-123-4567"))) # False

print(bool(pattern.match("032-123-4567"))) # True

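Note that match only anchors at the start of the string, so trailing characters are ignored. If the whole string must conform, fullmatch is one option. The sketch below uses a made-up input with one extra digit to show the difference.

```python
import re

pattern = re.compile(r"(\d{2,3})-(\d{3,4})-(\d{4})")

# Hypothetical input with one digit too many at the end.
text = "02-123-45678"

loose = bool(pattern.match(text))       # only the prefix "02-123-4567" has to match
strict = bool(pattern.fullmatch(text))  # the entire string has to match

print(loose)   # True
print(strict)  # False
```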

[PYTHON/COMMON] Using the Match class

■ Shows how to use the Match class. ▶ Example code (PY)


import re

pattern = re.compile(r"(\d{2,3})-(\d{3,4})-(\d{4})")

match = pattern.match("02-123-4567")

print(match.groups()) # ('02', '123', '4567')

print(match.group()) # 02-123-4567

print(match.group(1)) # 02

print(match.start()) # 0

print(match.end()) # 11

print(match.start(2)) # 3

print(match.end(2)) # 6

print(match.string[match.start(2):match.end(3)]) # 123-4567

[PYTHON/COMMON] compile function : Getting strings in lazy (non-greedy) mode with ".*?"

■ Shows how to use ".*?" with the compile function to get strings in lazy (non-greedy) mode. ▶ Example code (PY)


import urllib.request
import re

httpResponse = urllib.request.urlopen("http://www.example.com")

htmlBytes = httpResponse.read()

httpResponse.close()

html = htmlBytes.decode("utf-8") # Decode the response bytes directly; str(bytes) would yield the "b'...'" repr instead.

pattern = re.compile(r"<.*?>", re.I | re.S)

list1 = pattern.findall(html)

print(list1)

"""
['<!doctype html>', '<html>', '<head>', '<title>', '</title>', '<meta charset="utf-8" />', '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />', '<meta name="viewport" content="width=device-width, initial-scale=1" />', '<style type="text/css">', '</style>', '</head>', '<body>', '<div>', '<h1>', '</h1>', '<p>', '</p>', '<p>', '<a href="https://www.iana.org/domains/example">', '</a>', '</p>', '</div>', '</body>', '</html>']
"""

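The same lazy match can be tried without a network call. The sketch below uses a short hypothetical HTML string and also shows a common alternative, the negated character class `[^>]*`, which stops at the first ">" without relying on lazy matching.

```python
import re

# Hypothetical inline HTML, so the example runs offline.
html = "<p>Hello <b>world</b></p>"

lazy_tags = re.findall(r"<.*?>", html)       # lazy: stop at the first ">"
negated_tags = re.findall(r"<[^>]*>", html)  # negated class: ">" can never be consumed

print(lazy_tags)     # ['<p>', '<b>', '</b>', '</p>']
print(negated_tags)  # ['<p>', '<b>', '</b>', '</p>']
```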

[PYTHON/COMMON] compile function : Getting the title from the TITLE tag of a web page

■ Shows how to use the compile function to get the title from the TITLE tag of a web page. ▶ Example code (PY)


import urllib.request
import re

httpResponse = urllib.request.urlopen("http://www.example.com")

htmlBytes = httpResponse.read()

httpResponse.close()

html = htmlBytes.decode("utf-8") # Decode the response bytes directly; str(bytes) would yield the "b'...'" repr instead.

pattern = re.compile(r".*?<title.*?>(.*)</title>", re.I | re.S)

list1 = pattern.findall(html)

print(list1)

"""
['Example Domain']
"""

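The capture group `(.*)` in the pattern above is greedy, so on a page with more than one closing title tag it would run to the last one. A sketch with a hypothetical page (an inline SVG carrying its own title element) shows the effect and a lazy variant that stops at the first closing tag.

```python
import re

# Hypothetical page with two <title> elements (the second one belongs to an inline SVG).
html = ("<html><head><title>Example Domain</title></head>"
        "<body><svg><title>Logo</title></svg></body></html>")

# Greedy capture: runs up to the LAST </title>.
greedy = re.findall(r".*?<title.*?>(.*)</title>", html, re.I | re.S)

# Lazy capture: stops at the FIRST </title>.
lazy = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)

print(greedy)         # ['Example Domain</title></head><body><svg><title>Logo']
print(lazy.group(1))  # Example Domain
```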

[PYTHON/COMMON] compile function : Getting strings in greedy mode with ".*"

■ Shows how to use ".*" with the compile function to get strings in greedy mode. ▶ Example code (PY)


import urllib.request
import re

httpResponse = urllib.request.urlopen("http://www.example.com")

htmlBytes = httpResponse.read()

httpResponse.close()

html = htmlBytes.decode("utf-8") # Decode the response bytes directly; str(bytes) would yield the "b'...'" repr instead.

pattern = re.compile(r"<.*>", re.I | re.S)

list1 = pattern.findall(html)

print(list1)

"""
['<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>']
"""

[PYTHON/COMMON] compile function : Using the MULTILINE (or M) regex compile option

■ Shows how to use the MULTILINE (or M) regex compile option with the compile function. ▶ Example code (PY)


import re

text = """나무가 춤을 추면
바람이 불고,
나무가 잠잠하면
바람도 자오"""

pattern = re.compile("^.+", re.MULTILINE)

list1 = pattern.findall(text)

print(list1)

"""
['나무가 춤을 추면', '바람이 불고,', '나무가 잠잠하면', '바람도 자오']
"""

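For contrast, the sketch below runs the same pattern with and without the flag on a hypothetical three-line string. Without MULTILINE, "^" matches only at the very start of the string, so only the first line is found.

```python
import re

# Hypothetical sample text.
text = "first line\nsecond line\nthird line"

without_flag = re.findall(r"^.+", text)            # "^" matches only at the start of the string
with_flag = re.findall(r"^.+", text, re.MULTILINE)  # "^" also matches after every newline

print(without_flag)  # ['first line']
print(with_flag)     # ['first line', 'second line', 'third line']
```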

[PYTHON/COMMON] compile function : Using the IGNORECASE (or I) regex compile option

■ Shows how to use the IGNORECASE (or I) regex compile option with the compile function. ▶ Example code (PY)


import re

text = "Apple is a big company and apple is very delicious."

pattern = re.compile("apple", re.IGNORECASE)

list1 = pattern.findall(text)

print(list1)

"""
['Apple', 'apple']
"""

[PYTHON/COMMON] compile function : Compiling a regular expression

■ Shows how to compile a regular expression with the compile function. ▶ Example code (PY)


import re

pattern = re.compile(r"app\w*")

list1 = pattern.findall("application orange apple banana")

print(list1)

"""
['application', 'apple']
"""

[PYTHON/COMMON] sub function : Replacing at most N pattern matches

■ Shows how to use the sub function to replace at most N occurrences of a pattern. ▶ Example code (PY)


import re

target = re.sub(r"[:,|\s]", ", ", "apple:orange banana|tomato", count=2) # Pass count by keyword; only the first two matches are replaced.

print(target)

"""
apple, orange, banana|tomato
"""

[PYTHON/COMMON] sub function : Using the matched string in the replacement string

■ Shows how to reference the matched string inside the replacement string of the sub function. ▶ Example code (PY)


import re

target1 = re.sub(r"\b(\d{4}-\d{4})\b", r"<I>\1</I>", "Copyright Derick 1990-2009")

print(target1)

"""
Copyright Derick <I>1990-2009</I>
"""

target2 = re.sub(r"\b(?P<year>\d{4}-\d{4})\b", r"<I>\g<year></I>", "Copyright Derick 1990-2009")

print(target2)

"""
Copyright Derick <I>1990-2009</I>
"""

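One case where the `\g<...>` form matters even for numbered groups: when the reference is followed by a literal digit. The sketch below (the input string is made up) appends a "0" to the captured number. Writing the replacement as `\10` would be read as a reference to group 10, which this pattern does not define.

```python
import re

# Hypothetical input string.
text = "width 25"

# \g<1> names group 1 explicitly, so the trailing 0 is a literal character.
appended = re.sub(r"(\d+)", r"\g<1>0", text)
print(appended)  # width 250

# r"\10" is parsed as a reference to group 10, which does not exist here.
try:
    re.sub(r"(\d+)", r"\10", text)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True
print(ambiguous_failed)  # True
```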

[PYTHON/COMMON] sub function : Replacing strings that match a pattern

■ Shows how to use the sub function to replace strings that match a pattern. ▶ Example code (PY)


import re

target = re.sub("-", "", "001208-1234567")

print(target)

"""
0012081234567
"""

[PYTHON/COMMON] findall function : Getting a list of all pattern matches in a search string

■ Shows how to use the findall function to get a list of every match of a pattern in a search string. ▶ Example code (PY)


import re

list1 = re.findall(r"app\w*", "application orange apple banana")

print(list1)

"""
['application', 'apple']
"""

[PYTHON/COMMON] split function : Splitting a multi-line string

■ Shows how to use the split function to split a multi-line string. ▶ Example code (PY)


import re

text = """누나!
이 겨울에도
눈이 가득히 왔습니다."""

list1 = re.split("\n+", text)

print(list1)

"""
['누나!', '이 겨울에도', '눈이 가득히 왔습니다.']
"""

[PYTHON/COMMON] search function : Checking whether a pattern exists

■ Shows how to use the search function to check whether a pattern exists. ▶ Example code (PY)


import re

text1 = "35th"

match1 = re.search("[0-9]*th", text1)

if match1 is None:
    print("No match.")
else:
    print(match1.string)

text2 = "     35th"

match2 = re.search("[0-9]*th", text2)

if match2 is None:
    print("No match.")
else:
    print(match2.string)

"""
35th
     35th
"""

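One caveat with the pattern above: `[0-9]*` also accepts zero digits, so a bare "th" inside any word counts as a match. The sketch below, using a made-up input, compares it with `[0-9]+`, which requires at least one digit.

```python
import re

# Hypothetical input that contains "th" but no digits.
text = "other"

loose = re.search(r"[0-9]*th", text)   # zero digits allowed, so "th" alone matches
strict = re.search(r"[0-9]+th", text)  # at least one digit required

print(loose.group())  # th
print(strict)         # None
```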

[PYTHON/COMMON] split function : Splitting a string

■ Shows how to use the split function to split a string. ▶ Example code (PY)


import re

list1 = re.split("[:. ]+", "apple orange:banana  tomato") # Use the ':', '.', and ' ' characters as separators.

print(list1)

"""
['apple', 'orange', 'banana', 'tomato']
"""

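Two variations on the split call that may be useful, sketched with made-up inputs: wrapping the separator pattern in a capture group keeps the separators in the result, and maxsplit limits the number of splits.

```python
import re

# A capture group around the separator pattern keeps the separators in the result.
with_separators = re.split(r"([:. ]+)", "apple orange:banana")
print(with_separators)  # ['apple', ' ', 'orange', ':', 'banana']

# maxsplit=1 splits only at the first separator.
first_only = re.split(r"[:. ]+", "apple orange:banana  tomato", maxsplit=1)
print(first_only)  # ['apple', 'orange:banana  tomato']
```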

[PYTHON/COMMON] match function : Checking whether a pattern exists

■ Shows how to use the match function to check whether a pattern exists. ▶ Example code (PY)


import re

text1 = "35th"

match1 = re.match("[0-9]*th", text1)

if match1 is None:
    print("No match.")
else:
    print(match1.string)

text2 = "     35th"

match2 = re.match("[0-9]*th", text2)

if match2 is None:
    print("No match.")
else:
    print(match2.string)

"""
35th
No match.
"""

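The second case fails because match only tries the pattern at position 0. As a rough sketch of the relationship between the two functions: search with a leading "^" behaves like match here, and match can still succeed if the pattern itself skips the leading whitespace.

```python
import re

text = "     35th"

# search with "^" is anchored at the start, just like match, so it fails too.
anchored = re.search(r"^[0-9]*th", text)
print(anchored)  # None

# Letting the pattern consume the leading whitespace makes match succeed.
skipped = re.match(r"\s*([0-9]*th)", text)
print(skipped.group(1))  # 35th
```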

[PYTHON/COMMON] Using the ? metacharacter

■ Shows how to use the ? metacharacter. ▶ Example code 1 (PY)


import re

pattern = re.compile("<.*>")

match = pattern.search("<html><head><title>Title</title>")

print(match.group())

"""
<html><head><title>Title</title>
"""

▶ Example code 2 (PY)


import re

pattern = re.compile("<.*?>")

match = pattern.search("<html><head><title>Title</title>")

print(match.group())

"""
<html>
"""

※ Greedy vs.

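Besides turning a quantifier lazy, "?" on its own means "zero or one of the preceding item". The sketch below shows both roles side by side; the sample strings are made up for illustration.

```python
import re

# "?" as an optional quantifier: the "u" may appear zero or one time.
spellings = re.findall(r"colou?r", "color colour colr")
print(spellings)  # ['color', 'colour']

# "?" after another quantifier makes it lazy: match as little as possible.
greedy = re.search(r"\d+", "12345").group()
lazy = re.search(r"\d+?", "12345").group()
print(greedy)  # 12345
print(lazy)    # 1
```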
[PYTHON/COMMON] Pattern class : Passing a function to the sub method

■ Shows how to pass a replacement function to the sub method of the Pattern class. ▶ Example code (PY)


import re

def ConvertToHexadecimalString(match):
    value = int(match.group())
    return hex(value)

pattern = re.compile(r"\d+")

result = pattern.sub(ConvertToHexadecimalString, "Call 65490 for printing, 49152 for user code.")

print(result)

"""
Call 0xffd2 for printing, 0xc000 for user code.
"""

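The replacement function receives the full Match object, so it can also use capture groups. A minimal sketch follows, assuming a made-up unit table (`UNITS`) and helper name (`to_bytes`) that are not part of the original post.

```python
import re

# Hypothetical unit table for illustration.
UNITS = {"kb": 1024, "mb": 1024 ** 2}

def to_bytes(match):
    """Hypothetical callback: convert e.g. '4kb' to its size in bytes."""
    number, unit = match.groups()
    return str(int(number) * UNITS[unit])

pattern = re.compile(r"(\d+)(kb|mb)")

result = pattern.sub(to_bytes, "cache 4kb, heap 2mb")

print(result)  # cache 4096, heap 2097152
```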

[PYTHON/COMMON] Pattern class : Using reference syntax with the sub method

■ Shows how to use group reference syntax with the sub method of the Pattern class. ▶ Example code 1 (PY)


import re

pattern = re.compile(r"(?P<name>\w+)\s+(?P<phone>\d+[-]\d+[-]\d+)")

result = pattern.sub(r"\g<phone> \g<name>", "park 111-2222-3333") # Raw string, so "\g" is not treated as an invalid string escape.

print(result)

"""
111-2222-3333 park
"""

▶ Example code 2 (PY)

[PYTHON/COMMON] Pattern class : Replacing strings with the subn method

■ Shows how to replace strings with the subn method of the Pattern class. ▶ Example code (PY)


import re

pattern = re.compile("(blue|white|red)")

result = pattern.subn("color", "blue socks and red shoes")

print(result)

"""
('color socks and color shoes', 2)
"""

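Since subn returns a (new string, replacement count) tuple, it can be unpacked to check whether anything was replaced at all. A short sketch with a second, made-up input that contains none of the colors:

```python
import re

pattern = re.compile("(blue|white|red)")

changed_text, changed = pattern.subn("color", "blue socks and red shoes")
same_text, unchanged = pattern.subn("color", "green hat")  # hypothetical input without a match

print(changed_text, changed)  # color socks and color shoes 2
print(same_text, unchanged)   # green hat 0
```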

[PYTHON/COMMON] Pattern class : Replacing strings with the sub method

■ Shows how to replace strings with the sub method of the Pattern class. ▶ Example code 1 (PY)


import re

pattern = re.compile("(blue|white|red)")

result = pattern.sub("color", "blue socks and red shoes")

print(result)

"""
color socks and color shoes
"""

▶ Example code 2 (PY)


import re

pattern = re.compile("(blue|white|red)")

result = pattern.sub("color", "blue socks and red shoes", count = 1)

print(result)

"""
color socks and red shoes
"""

[PYTHON/COMMON] Using positive lookahead

■ Shows how to use positive lookahead. ▶ Example code (PY)


import re

pattern = re.compile(".+(?=:)")

match = pattern.search("http://google.com")

print(match.group())

"""
http
"""

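The lookahead `(?=:)` checks that a colon follows but does not include it in the match. The sketch below contrasts this with a pattern that consumes the colon, and applies the same idea to a hypothetical CSS string to pick out only the numbers followed by "px".

```python
import re

# Without lookahead the colon becomes part of the match.
consumed = re.search(r".+:", "http://google.com").group()
print(consumed)  # http:

# Hypothetical CSS snippet: keep only numbers that are followed by "px".
css = "width: 600px; margin: 5em; padding: 20px"
pixels = re.findall(r"\d+(?=px)", css)
print(pixels)  # ['600', '20']
```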

[PYTHON/COMMON] Using negative lookahead

■ Shows how to use negative lookahead. ▶ Example code (PY)


import re

pattern = re.compile(".*[.](?!bat$|exe$).*$")

match = pattern.search("autoexec.bat\nprogram.exe\ndata.csv")

print(match.group())

"""
data.csv
"""

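The example above finds data.csv partly because it is the last line: without MULTILINE, "$" only matches at the end of the string. To filter a list with one file name per line, a variant along the following lines may work better. This is a sketch under stated assumptions, not the original author's pattern: the file list is made up, and `[^.\n]*` is swapped in for `.*` so that only the final extension is tested.

```python
import re

# Hypothetical file list; one name per line.
names = "autoexec.bat\nprogram.exe\ndata.csv\narchive.tar.bat\nnotes.txt"

# MULTILINE lets "^" and "$" match at every line boundary.
# [^.\n]* means the text after the tested dot contains no further dot,
# so the lookahead always inspects the last extension of the name.
pattern = re.compile(r"^.*[.](?!bat$|exe$)[^.\n]*$", re.MULTILINE)

kept = pattern.findall(names)

print(kept)  # ['data.csv', 'notes.txt']
```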