Web Scraper: Scraping Douban Movie for 《鸭王2》

Time: 2024-10-23 13:46:29

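Douban Movie is a popular film rating site where users can find ratings and reviews for all kinds of films. One of them, 《鸭王2》, is a sequel that has drawn a fair amount of attention. To learn more about it, I decided to write a scraper that collects the data on its Douban Movie page.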

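First, I need Python libraries for scraping. I chose Requests to fetch pages and BeautifulSoup to parse them. Both can be installed with pip (pip install requests beautifulsoup4).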

Next, I send a GET request with Requests to fetch the Douban Movie page for 《鸭王2》.

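import requests

# URL of the film's Douban Movie page
url = 'https://movie.douban.com/subject/26279289/'
# Douban may reject the default Requests User-Agent, so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on a 4xx or 5xx response
content = response.text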
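
If I later want to fetch more than one page, the request step can be wrapped in a small reusable function. The following is only a sketch: the name fetch_html, the User-Agent value, and the 10-second timeout are my own assumptions, and the optional session parameter is there so the function can be exercised without touching the network.

```python
import requests

# Assumed browser-like header; Douban may reject the default Requests User-Agent.
DEFAULT_HEADERS = {'User-Agent': 'Mozilla/5.0'}


def fetch_html(url, session=None, timeout=10):
    """Return the page body as text, raising on HTTP error status codes."""
    # Fall back to the requests module itself, which offers the same get() call.
    client = session if session is not None else requests
    response = client.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

Calling fetch_html(url) with the URL above would return the same text as response.text, and passing a requests.Session() as session would reuse one connection across several calls.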

With the page content in hand, I parse it with BeautifulSoup and extract the data I need, starting with the film's title and rating.

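from bs4 import BeautifulSoup

soup = BeautifulSoup(content, 'html.parser')
# The title is marked with the property attribute v:itemreviewed
title = soup.find('span', property='v:itemreviewed').text
# Match on the rating_num class alone; the 'll' layout class is not needed
rating = soup.find('strong', class_='rating_num').text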
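
One thing to watch: find returns None when nothing matches, so chaining .text directly raises AttributeError if the page layout differs from what I expect. The sketch below shows one way to guard against that. The helper name text_of and the inline HTML are made up for illustration; the snippet only imitates the attributes used above and is not copied from Douban.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the attributes used above; not real page content.
SAMPLE_HTML = """
<h1><span property="v:itemreviewed">Example Film</span></h1>
<strong class="ll rating_num" property="v:average">6.5</strong>
"""


def text_of(soup, name, **attrs):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = soup.find(name, **attrs)
    return tag.get_text(strip=True) if tag is not None else None


sample_soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
sample_title = text_of(sample_soup, 'span', property='v:itemreviewed')
sample_rating = text_of(sample_soup, 'strong', class_='rating_num')
# No tag with this property exists in the sample, so the helper returns None.
sample_missing = text_of(sample_soup, 'span', property='v:summary')
```

With a helper like this, a missing field shows up as None in the result instead of stopping the whole script.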

Next, I look up the director, the lead actors, and the release date.

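# find returns only the first match, so only the first listed director is kept
director = soup.find('a', rel='v:directedBy').text
# find_all collects every credited actor link
actors = [actor.text for actor in soup.find_all('a', rel='v:starring')]
# Only the first release date entry is kept
release_date = soup.find('span', property='v:initialReleaseDate').text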
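
The release date comes back as a raw string, which is awkward to sort or compare. Assuming the text starts with a YYYY-MM-DD date, possibly followed by a region in parentheses (this is my assumption about the format, not something verified here), a sketch of the conversion might look like this. parse_release_date is a hypothetical helper name.

```python
import re
from datetime import date

# Assumed format: a YYYY-MM-DD date, optionally followed by a region in parentheses.
DATE_PATTERN = re.compile(r'(\d{4})-(\d{1,2})-(\d{1,2})')


def parse_release_date(raw):
    """Return a datetime.date for the first YYYY-MM-DD found in raw, else None."""
    match = DATE_PATTERN.search(raw)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The digits matched the pattern but do not form a real calendar date.
        return None


# Made-up inputs for illustration only.
print(parse_release_date('2020-01-31(Hong Kong)'))  # prints 2020-01-31
print(parse_release_date('unknown'))                # prints None
```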

Then I extract the plot summary and the film's genre tags.

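# The 'all hidden' span may be absent (find returns None), so fall back to v:summary
summary_tag = soup.find('span', class_='all hidden') or soup.find('span', property='v:summary')
summary = summary_tag.text.strip() if summary_tag else ''
tags = [tag.text for tag in soup.find_all('span', property='v:genre')]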

With the code above, I have the data for 《鸭王2》 from Douban Movie. I store it in a dictionary so it is easy to process and analyze later.

movie_info = {
    'title': title,
    'rating': rating,
    'director': director,
    'actors': actors,
    'release_date': release_date,
    'summary': summary,
    'tags': tags
}
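
Everything extracted from the HTML is a string, including the rating. If I want to compare or average ratings later, converting it at this point is convenient. Here is a minimal sketch, assuming the rating text is either a decimal number or empty (for instance when a film has no score yet, which is my assumption). normalize_rating and the sample dictionary are hypothetical.

```python
def normalize_rating(raw):
    """Convert a rating string such as '6.5' to float; return None if unparsable."""
    try:
        return float(raw.strip())
    except (AttributeError, ValueError):
        # AttributeError covers None input, ValueError covers '' or non-numeric text.
        return None


# Placeholder values for illustration, not scraped data.
sample_info = {'title': 'Example Film', 'rating': ' 6.5 '}
sample_info['rating'] = normalize_rating(sample_info['rating'])
print(sample_info)  # prints {'title': 'Example Film', 'rating': 6.5}
```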

Finally, I can print the film information to the console or save it to a file.

import json

output_file = 'movie_info.json'
# ensure_ascii=False keeps Chinese text readable in the file instead of \u escapes
with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(movie_info, f, ensure_ascii=False, indent=4)
print('Movie info saved to file:', output_file)
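
To confirm that a file written this way can be loaded again, it can be read back and parsed. The sketch below uses a temporary directory and a placeholder dictionary so that it runs on its own; the values are invented for illustration.

```python
import json
import os
import tempfile

# Placeholder data standing in for the scraped dictionary.
sample_info = {'title': '示例电影', 'tags': ['喜剧']}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'movie_info.json')
    # ensure_ascii=False writes Chinese characters as-is rather than as \u escapes.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(sample_info, f, ensure_ascii=False, indent=4)
    # Read the raw text back, then parse it into Python objects.
    with open(path, encoding='utf-8') as f:
        raw_text = f.read()
    loaded = json.loads(raw_text)

# True when the parsed data equals what was written.
round_trip_ok = loaded == sample_info
```

Opening the file with encoding='utf-8' on both the write and the read matters here, since the platform default encoding is not always UTF-8.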

With the code above, I scraped the data for 《鸭王2》 from Douban Movie and saved it to a JSON file, ready for further analysis and processing.