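Weather and Fine Dust Concentration

Let's fetch the weather and the fine dust concentration.

Among Naver's services are a weather page and a fine dust page.

Let's crawl these two pages and build an application that shows the weather and the fine dust concentration on screen.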

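Weather

This is the target we need to crawl.

In the screenshot, the text in the red box on the right (the page source) is what appears on the left (the rendered page). The part to pay close attention to is the class name.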

<li class="nm">흐리고 비</li>
<span class="temp"><strong>11</strong>℃</span>
<span class="rain"><strong>80</strong>%</span>

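These three tags hold the condition (흐리고 비, "cloudy with rain"), the temperature (11℃), and the chance of rain (80%). Fetching them gives us this morning's weather for Seoul and Gyeonggi.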
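To try these three selectors without touching the network, here is a minimal sketch that parses a hand-written copy of the snippet above with BeautifulSoup. The SAMPLE_HTML string is an assumption modeled on the tags shown, not markup fetched from Naver, and the live page may differ.

```python
from bs4 import BeautifulSoup

# Assumed sample markup modeled on the three tags above (not fetched from Naver)
SAMPLE_HTML = """
<ul>
  <li class="nm">흐리고 비</li>
</ul>
<span class="temp"><strong>11</strong>℃</span>
<span class="rain"><strong>80</strong>%</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching tag (or None); class_ avoids the Python keyword "class"
condition = soup.find("li", class_="nm").get_text(strip=True)
temperature = soup.find("span", class_="temp").get_text(strip=True)
rain_chance = soup.find("span", class_="rain").get_text(strip=True)

print(condition, temperature, rain_chance)
```

get_text(strip=True) joins the text inside <strong> with the unit that follows it, so the temperature comes back as a single string.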

Naver URL: https://weather.naver.com/rgn/cityWetrMain.nhn

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import requests
URL = "https://weather.naver.com/rgn/cityWetrMain.nhn"  # regional weather
html = requests.get(URL).text
soup = BeautifulSoup(html, 'html.parser')
print(soup)

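02: Import the BeautifulSoup class, which parses the HTML.
03: Import the requests module, which fetches the data.
04: Set the URL where the data lives.
05: Fetch the page with requests.get(URL). Adding .text keeps only the part we need: the response body as a string.
06: Pass the fetched HTML to BeautifulSoup and have html.parser parse it.
07: Print the parsed document.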
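The code above calls requests.get(URL) with no timeout and never checks whether the request succeeded. Below is a minimal sketch of a more defensive fetch. fetch_html is a hypothetical helper, and the 10-second timeout is an arbitrary choice. The optional session parameter lets you reuse a requests.Session or substitute a stub in tests.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its HTML as text (hypothetical helper).

    Adds a timeout and an HTTP status check that the tutorial code lacks.
    """
    # Fall back to the requests module itself when no session is given
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

In the script above, html = fetch_html(URL) would take the place of html = requests.get(URL).text.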

Running the code above prints the page's entire HTML. The data we need sits in the elements with the classes nm, temp, and rain. Let's extract them one at a time.

from bs4 import BeautifulSoup
import requests
URL = "https://weather.naver.com/rgn/cityWetrMain.nhn"  # regional weather
html = requests.get(URL).text
soup = BeautifulSoup(html, 'html.parser')
weatherTable = soup.find_all("li", class_="nm")
print(weatherTable)

06: Find every li tag with the class nm.

Run it and look at the output. As the red outline shows, the result is a list, because the page has more than one li tag with the class nm.
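A quick way to see this list behavior offline: the sketch below runs find_all on made-up markup with three li tags of class nm. The sample conditions are invented for illustration. find_all returns a ResultSet, which is a subclass of Python's list, so len(), indexing, and for loops all work on it.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: several cities, each with its own <li class="nm">
SAMPLE_HTML = """
<ul><li class="nm">흐리고 비</li></ul>
<ul><li class="nm">맑음</li></ul>
<ul><li class="nm">구름많음</li></ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
weather_table = soup.find_all("li", class_="nm")  # every match, in document order

print(len(weather_table))
for tag in weather_table:
    print(tag.text)
```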

Let's fetch the temperature and the chance of rain the same way.

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import requests
URL = "https://weather.naver.com/rgn/cityWetrMain.nhn"  # regional weather
html = requests.get(URL).text
soup = BeautifulSoup(html, 'html.parser')
weatherTable = soup.find_all("li", class_="nm")
print(weatherTable)
tempTable = soup.find_all("span", class_="temp")
print(tempTable)
rainTable = soup.find_all("span", class_="rain")
print(rainTable)

Comparing the output with the page, you can see that the weather for each city comes out in order.
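Because the three lists come out in the same city order, one way to keep each city's values together is to zip them. This sketch assumes the lists line up one-to-one (the i-th nm, temp, and rain tags belong to the same city), which you should confirm against the live page. The sample markup and numbers are made up.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: two cities, each with a condition, temperature and rain chance
SAMPLE_HTML = """
<li class="nm">흐리고 비</li><span class="temp"><strong>11</strong>℃</span><span class="rain"><strong>80</strong>%</span>
<li class="nm">맑음</li><span class="temp"><strong>15</strong>℃</span><span class="rain"><strong>10</strong>%</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
weather_table = soup.find_all("li", class_="nm")
temp_table = soup.find_all("span", class_="temp")
rain_table = soup.find_all("span", class_="rain")

# zip() pairs the i-th tag of each list and stops at the shortest list
rows = []
for weather, temp, rain in zip(weather_table, temp_table, rain_table):
    rows.append((
        weather.get_text(strip=True),
        int(temp.strong.get_text()),  # the number inside <strong>, in ℃
        int(rain.strong.get_text()),  # the number inside <strong>, in %
    ))

for row in rows:
    print(row)
```

Converting the numbers to int at this point makes later comparisons or storage simpler than keeping strings like "11℃".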

To pull out only what we wanted, this morning's weather for Seoul and Gyeonggi, take the first element of each list.

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import requests
URL = "https://weather.naver.com/rgn/cityWetrMain.nhn"  # regional weather
html = requests.get(URL).text
soup = BeautifulSoup(html, 'html.parser')
weatherTable = soup.find_all("li", class_="nm")
tempTable = soup.find_all("span", class_="temp")
rainTable = soup.find_all("span", class_="rain")
print(weatherTable[0].text)
print(tempTable[0].text)
print(rainTable[0].text)

10-12: Take the first element of each list and print its text.

We got the result we wanted.
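Indexing with [0] raises IndexError if find_all finds nothing, for example after the page layout changes. A minimal alternative, sketched below with a hypothetical first_text helper, uses soup.find(), which returns the first match or None.

```python
from bs4 import BeautifulSoup


def first_text(soup, name, class_name):
    """Return the stripped text of the first <name class=class_name>, or None.

    Hypothetical helper: soup.find() returns None instead of raising
    IndexError when nothing matches.
    """
    tag = soup.find(name, class_=class_name)
    return tag.get_text(strip=True) if tag is not None else None


# Made-up markup containing a condition but no temperature
soup = BeautifulSoup('<li class="nm">흐리고 비</li>', "html.parser")
print(first_text(soup, "li", "nm"))
print(first_text(soup, "span", "temp"))
```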

The fine dust concentration is just as easy to get: follow the same process, finding the tag, class, or id that holds the data and accessing it.
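BeautifulSoup can also locate elements with CSS selectors through select() and select_one(). The sketch below fetches the same value both ways from made-up markup. The ids seoul and busan are assumptions for illustration only and do not come from Naver's page.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the ids "seoul" and "busan" are hypothetical
SAMPLE_HTML = """
<div id="seoul">
  <span class="temp"><strong>11</strong>℃</span>
</div>
<div id="busan">
  <span class="temp"><strong>14</strong>℃</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Same lookup two ways: find() with tag/id/class filters, select_one() with a CSS selector
by_find = soup.find("div", id="seoul").find("span", class_="temp").strong.text
by_css = soup.select_one("div#seoul span.temp strong").text

print(by_find, by_css)
```

Which style to use is mostly a matter of taste; a CSS selector can express a nested path in one string, while chained find() calls make each step explicit.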

Fine Dust

Naver Weather: https://weather.naver.com/air/airFcast.nhn

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import requests
URL = "https://weather.naver.com/air/airFcast.nhn"  # fine dust
html = requests.get(URL).text
soup = BeautifulSoup(html, 'html.parser')
airTable = soup.find_all("div", class_="list_air_inn")
for air in airTable:
    today = air.find_all("li")
    for data in today:
        print(data.text)
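The loop above prints every li as one flat stream, so you lose track of which list_air_inn block each line came from. Here is a minimal sketch that keeps the grouping, run against made-up sample markup. The structure is assumed from the selectors in the code above, and the item text is invented.

```python
from bs4 import BeautifulSoup

# Made-up sample markup shaped like what the loop above expects:
# one <div class="list_air_inn"> per block, each holding several <li> items
SAMPLE_HTML = """
<div class="list_air_inn"><ul><li>Seoul: good</li><li>Gyeonggi: moderate</li></ul></div>
<div class="list_air_inn"><ul><li>Busan: good</li></ul></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keep each block's items together instead of printing one flat stream
air_blocks = [
    [li.get_text(strip=True) for li in block.find_all("li")]
    for block in soup.find_all("div", class_="list_air_inn")
]

for items in air_blocks:
    print(items)
```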

