■ Shows how to use ".*?" in a pattern built with re.compile to match strings lazily (non-greedily), so that each match is as short as possible.
▶ Example code (PY)
import re
import urllib.request

# Fetch the page and read the raw response body as bytes
with urllib.request.urlopen("http://www.example.com") as httpResponse:
    htmlBytes = httpResponse.read()

# Decode the bytes into text (the page declares charset=utf-8)
html = htmlBytes.decode("utf-8")

# ".*?" is lazy: each match ends at the first ">" after a "<",
# so every tag is captured as a separate match.
# re.S lets "." also match newlines, in case a tag spans several lines.
pattern = re.compile(r"<.*?>", re.I | re.S)

list1 = pattern.findall(html)

print(list1)

"""
['<!doctype html>', '<html>', '<head>', '<title>', '</title>', '<meta charset="utf-8" />', '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />', '<meta name="viewport" content="width=device-width, initial-scale=1" />', '<style type="text/css">', '</style>', '</head>', '<body>', '<div>', '<h1>', '</h1>', '<p>', '</p>', '<p>', '<a href="https://www.iana.org/domains/example">', '</a>', '</p>', '</div>', '</body>', '</html>']
"""
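The difference between lazy and greedy matching is easiest to see side by side. Below is a minimal sketch that runs offline on a small, made-up HTML string. The `html` value is a hypothetical sample, not the example.com page. It compares the greedy pattern `<.*>` with the lazy pattern `<.*?>`.

```python
import re

# Hypothetical sample input, defined inline so no network access is needed
html = '<p>Hello</p>\n<a href="https://www.iana.org/domains/example">More</a>'

# Greedy: ".*" consumes as much as it can, so the single match runs from
# the first "<" to the last ">" in the string (re.S lets it cross the newline)
greedy = re.compile(r"<.*>", re.S).findall(html)

# Lazy: ".*?" stops at the first ">" that completes the match,
# so each tag becomes its own match
lazy = re.compile(r"<.*?>", re.S).findall(html)

print(greedy)
# ['<p>Hello</p>\n<a href="https://www.iana.org/domains/example">More</a>']
print(lazy)
# ['<p>', '</p>', '<a href="https://www.iana.org/domains/example">', '</a>']
```

With the greedy pattern, the text between tags ("Hello", "More") ends up inside the one long match. Adding "?" after "*" is what splits the result into individual tags.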