[Python] Learning Beautiful Soup


Before we begin

This post is based on <파이썬으로 데이터 주무르기> by 민형기, published by BJPUBLIC.
For more detail, please refer to the book.

Learning Beautiful Soup

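Beautiful Soup: a module for parsing HTML and XML documents (such as saved web pages) and pulling data out of them. It does not download pages from the internet by itself.
First, download the HTML code below.

Then import BeautifulSoup from the bs4 package.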

from bs4 import BeautifulSoup

Read the downloaded HTML file (open() in read mode, 'r')

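# Read the whole file; the with block closes it automatically
with open("../data/03. test_first.html", 'r') as f:
    page = f.read()
soup = BeautifulSoup(page, 'html.parser')
# prettify() returns the parsed document as an indented string
print(soup.prettify())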

children: lists the elements one level below the soup object

list(soup.children)

Accessing the html tag from Python

html = list(soup.children)[2]
html
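
A minimal, self-contained sketch of the two steps above. The inline `page` string is a hypothetical stand-in for test_first.html (the real file is not reproduced in this post), laid out so that the html tag also lands at index 2.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for test_first.html; the real file may differ.
page = """<!DOCTYPE html>
<html>
<head><title>Very Simple HTML Code</title></head>
<body><p id="first">Happy PinkWink.</p></body>
</html>"""
soup = BeautifulSoup(page, "html.parser")

# Top level: the doctype, the line break after it, then the <html> tag.
children = list(soup.children)
print([type(child).__name__ for child in children])

# The <html> tag is the third item, hence index 2.
html = children[2]
print(html.name)
```

The index depends on the file layout: without the doctype line or the line break after it, the html tag would sit at a different position.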

Examining the children of html
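
A rough sketch of this step, using a hypothetical inline document in place of test_first.html. It labels each child of the html tag so the line-break text nodes between tags are visible.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical stand-in for test_first.html; the real file may differ.
page = """<!DOCTYPE html>
<html>
<head><title>Very Simple HTML Code</title></head>
<body><p id="first">Happy PinkWink.</p></body>
</html>"""
soup = BeautifulSoup(page, "html.parser")
html = soup.html

html_children = list(html.children)
# Line breaks between tags are text nodes, so label them separately.
labels = [c.name if isinstance(c, Tag) else "text" for c in html_children]
print(labels)
```

With this layout, head and body are separated by text nodes, which is why body ends up at index 3 rather than 1.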

Examining the body tag (index 3) among the children of html - 1

Examining the body tag (index 3) among the children of html - 2
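
One plausible reading of the two steps, sketched on a hypothetical inline document: first reach body by index through children, then reach the same tag directly by name. The sample markup is an assumption, not the contents of test_first.html.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for test_first.html; the real file may differ.
page = """<!DOCTYPE html>
<html>
<head><title>Very Simple HTML Code</title></head>
<body><p id="first">Happy PinkWink.</p></body>
</html>"""
soup = BeautifulSoup(page, "html.parser")
html = soup.html

# Step 1: walk down through children; indexes 0 and 2 are line-break text nodes.
body = list(html.children)[3]
print(body.name)

# Step 2: jump straight to the tag by name.
same_tag = soup.body
print(same_tag is body)
```

The attribute shortcut does not depend on whitespace in the file, so it is usually less fragile than counting children.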

find / find_all: for when you already know which tag you need (find returns only the first match)
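
A small sketch of the difference between the two calls, on hypothetical sample markup (not the contents of test_first.html).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
page = """<body>
<p class="inner-text">Happy PinkWink.</p>
<p class="outer-text">Data Science is funny.</p>
</body>"""
soup = BeautifulSoup(page, "html.parser")

first_p = soup.find("p")       # the first matching tag only
all_p = soup.find_all("p")     # every matching tag, as a list-like result
missing = soup.find("table")   # None when nothing matches

print(first_p.get_text())
print(len(all_p))
print(missing)
```

Because find returns None on a miss, it is worth checking the result before calling methods on it.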

Finding p tags whose class is outer-text
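
A sketch of this search, assuming markup along the lines below (hypothetical, the real file may differ). The keyword is spelled `class_` because `class` is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
page = """<div>
<p class="inner-text">Happy PinkWink.</p>
<p class="outer-text">Data Science is funny.</p>
<span class="outer-text">Not a paragraph.</span>
</div>"""
soup = BeautifulSoup(page, "html.parser")

# Both conditions must hold: the tag name is p and the class is outer-text.
matches = soup.find_all("p", class_="outer-text")
for tag in matches:
    print(tag.get_text())
```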

Finding outer-text by class name only
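
A sketch of searching by class alone, on hypothetical markup. With the tag name left out, any tag carrying the class is returned.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
page = """<div>
<p class="inner-text">Happy PinkWink.</p>
<p class="outer-text">Data Science is funny.</p>
<span class="outer-text">Not a paragraph.</span>
</div>"""
soup = BeautifulSoup(page, "html.parser")

# No tag name given, so the tag type does not matter.
matches = soup.find_all(class_="outer-text")
names = [tag.name for tag in matches]
print(names)
```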

Finding the tag whose id is first
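
A sketch of an id lookup, assuming the id in question is `first` and using hypothetical markup. Since an id should be unique within a page, find is enough here.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the id value "first" is an assumption.
page = """<div>
<p class="inner-text first-item" id="first">Happy PinkWink.</p>
<p class="inner-text second-item">Happy Data Science.</p>
</div>"""
soup = BeautifulSoup(page, "html.parser")

tag = soup.find(id="first")
print(tag.get_text())
print(tag["class"])   # a class attribute with several values comes back as a list
```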

Finding the contents of head
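
A sketch of reaching the head tag, on a hypothetical document. A tag name used as an attribute (soup.head) is a shortcut for find on that name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for test_first.html; the real file may differ.
page = """<html>
<head><title>Very Simple HTML Code</title></head>
<body><p>Happy PinkWink.</p></body>
</html>"""
soup = BeautifulSoup(page, "html.parser")

head = soup.head                # same tag as soup.find("head")
print(head)
print(head.title.string)        # text of the <title> inside head
```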

next_sibling

next_sibling.next_sibling
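
A sketch of both sibling steps, on a hypothetical document where head and body sit on separate lines. The first next_sibling is then the line break between the tags, and the second is the body tag.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for test_first.html; the real file may differ.
page = """<html>
<head><title>Very Simple HTML Code</title></head>
<body><p>Happy PinkWink.</p></body>
</html>"""
soup = BeautifulSoup(page, "html.parser")
head = soup.head

one_step = head.next_sibling                 # the text node between the tags
two_steps = head.next_sibling.next_sibling   # the next tag at the same level

print(repr(one_step))
print(two_steps.name)
```

If the file had no whitespace between the closing head tag and the opening body tag, a single next_sibling would already return body.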

get_text(): returns only the text inside a tag
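
A sketch of get_text() on hypothetical markup (the example.com URL is a placeholder). It also shows how it differs from .string when a tag contains nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
page = """<body>
<p id="first">Happy <a href="https://example.com">PinkWink</a>.</p>
<p id="second">Data Science is funny.</p>
</body>"""
soup = BeautifulSoup(page, "html.parser")

# get_text() joins all text inside the tag, including text in nested tags.
for p in soup.find_all("p"):
    print(p.get_text())

first = soup.find("p")
text = first.get_text()
only_string = first.string   # None here, because the tag has several children
```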

Finding href attributes to get link addresses

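# Collect every <a> tag in the document
links = soup.find_all('a')
for each in links:
    href = each['href']       # the link address
    text = each.string        # the link text
    print(text + '->' + href)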
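
The loop above assumes every a tag has an href attribute and contains only plain text. A more defensive sketch, on hypothetical markup with placeholder example.com URLs, might look like this:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
page = """<body>
<a href="https://example.com/a">First link</a>
<a name="top">Anchor without href</a>
<a href="https://example.com/b"><b>Bold</b> link</a>
</body>"""
soup = BeautifulSoup(page, "html.parser")

results = []
for a in soup.find_all("a"):
    href = a.get("href")      # None instead of a KeyError when href is missing
    if href is None:
        continue
    text = a.get_text()       # still works when .string would be None
    results.append((text, href))
    print(text + " -> " + href)
```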


Recordian: I believe in the power of keeping records.