Python Archive - 帝讯博客

Python Web Scraping for Beginners – Hands-On

小编 — Mon, 30 Mar 2026 02:05:58 +0000

Welcome to today's hands-on Python tutorial! Today we cover the basics of web scraping with Python.

1. Environment Setup

pip install requests beautifulsoup4 lxml

2. Basic Requests

import requests

# GET request (the timeout keeps the call from hanging on a slow server)
response = requests.get('https://www.example.com', timeout=10)
print(response.status_code)
print(response.text)

3. Parsing HTML

from bs4 import BeautifulSoup

html = '''
<h1>Title</h1>
'''
soup = BeautifulSoup(html, 'lxml')
print(soup.h1.text)

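4. Worked Example

# Scrape news headlines
import requests
from bs4 import BeautifulSoup

url = 'https://news.example.com'
# Identify the client and avoid hanging on a slow server
headers = {'User-Agent': 'Mozilla/5.0 (compatible; tutorial-scraper)'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
soup = BeautifulSoup(response.text, 'lxml')

headlines = soup.find_all('h2', class_='headline')
for h in headlines:
    print(h.text.strip())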
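
To try the selection logic without network access or extra packages, the sketch below swaps BeautifulSoup for the standard library's html.parser module. This is a substitute technique, not the tutorial's method, and the HeadlineParser class and SAMPLE_HTML string are made up for illustration.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2> whose class list contains 'headline'."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "headline" in classes:
            self._in_headline = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_headline:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            self.headlines.append("".join(self._buffer).strip())
            self._in_headline = False


# Hypothetical page content, inlined so the sketch runs offline
SAMPLE_HTML = """
<html><body>
  <h2 class="headline">First story</h2>
  <h2 class="sidebar">Not a headline</h2>
  <h2 class="headline featured">Second <em>story</em></h2>
</body></html>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
for title in parser.headlines:
    print(title)
```

For real pages BeautifulSoup is usually the more convenient choice; the point here is only that the h2/headline filter can be tested on a fixed string before it is pointed at a live site.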

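5. Things to Keep in Mind

Respect the site's robots.txt rules
Throttle your request rate
Send a User-Agent header
Use the data lawfully and in compliance with applicable rules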
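
The first two points can be sketched with the standard library's urllib.robotparser. The robots.txt content, the user agent string, and the URLs below are all hypothetical, and the rules are inlined so the example runs without network access. A real crawler would load the site's actual robots.txt instead.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption, not taken from a real site)
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-tutorial-bot/0.1"  # hypothetical identifier


def build_parser(robots_text):
    """Parse robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that the rules allow this user agent to fetch."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


robots = build_parser(ROBOTS_TXT)
candidates = [
    "https://news.example.com/",
    "https://news.example.com/private/drafts",
]
permitted = allowed_urls(robots, candidates)

# Fall back to a one-second pause when the site does not declare a delay
delay = robots.crawl_delay(USER_AGENT) or 1

for url in permitted:
    print(f"would fetch {url}, then wait {delay}s")
    # time.sleep(delay)  # enable in a real crawler, after the actual request
```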

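6. Summary

Web scraping is an important practical application of Python. Practice is the best way to get comfortable with it.

Follow us for more hands-on Python tutorials!

The post Python Web Scraping for Beginners – Hands-On first appeared on 帝讯博客.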

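Batch File Processing and Automation with Python – Advanced

小编 — Mon, 30 Mar 2026 02:05:57 +0000

Welcome to today's hands-on Python tutorial! Today we take a closer look at batch file processing and automation in Python, a core skill for every Python developer. Whether you are processing log files, renaming files in bulk, cleaning data, or automating office work, it all comes down to file operations.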

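1. File Reading and Writing Basics

1.1 The Right Way to Open a File

Python opens files with the open() function. The with statement is the recommended approach because it closes the file automatically:

# Recommended: the with statement closes the file automatically
with open('data.txt', 'r', encoding='utf-8') as f:
    content = f.read()
# The file is closed here; no manual f.close() is needed

# Not recommended: the file must be closed manually
f = open('data.txt', 'r', encoding='utf-8')
content = f.read()
f.close()  # forgetting this, or an exception before it, leaks the file handle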

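1.2 File Open Modes

Mode	Description	If the file does not exist
`'r'`	Read-only (default)	Raises FileNotFoundError
`'w'`	Write (truncates the existing file)	Creates a new file
`'a'`	Append (writes at the end)	Creates a new file
`'x'`	Exclusive creation	Creates a new file (raises FileExistsError if it already exists)
`'b'`	Binary mode	Combined with another mode, e.g. 'rb'
`'t'`	Text mode (default)	Combined with another mode, e.g. 'rt'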
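
A minimal sketch of how 'x', 'a', and 'w' behave on the same file. The file name demo.txt is arbitrary, and everything happens inside a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.txt"

    # 'x' creates the file, but only if it does not exist yet
    with open(target, "x", encoding="utf-8") as f:
        f.write("first\n")

    # A second exclusive create on the same path fails
    try:
        with open(target, "x", encoding="utf-8"):
            pass
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # 'a' keeps the existing content and writes at the end
    with open(target, "a", encoding="utf-8") as f:
        f.write("second\n")
    after_append = target.read_text(encoding="utf-8")

    # 'w' truncates the file before writing
    with open(target, "w", encoding="utf-8") as f:
        f.write("replaced\n")
    after_write = target.read_text(encoding="utf-8")

print(second_create_failed)
print(repr(after_append))
print(repr(after_write))
```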

1.3 Three Ways to Read a File

# Method 1: read() - read the whole file
with open('data.txt', 'r', encoding='utf-8') as f:
    content = f.read()  # returns a string
    print(f"File length: {len(content)} characters")

# Method 2: readline() - read line by line
with open('data.txt', 'r', encoding='utf-8') as f:
    line1 = f.readline()  # read the first line
    line2 = f.readline()  # read the second line
    print(f"First line: {line1.strip()}")

# Method 3: readlines() - read all lines into a list
with open('data.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()  # returns a list of strings
    print(f"{len(lines)} lines in total")
    for i, line in enumerate(lines, 1):
        print(f"Line {i}: {line.strip()}")

# Recommended: iterate over the file object directly (most memory-efficient)
with open('data.txt', 'r', encoding='utf-8') as f:
    for line_num, line in enumerate(f, 1):
        print(f"Line {line_num}: {line.strip()}")

1.4 Ways to Write a File

# Write mode (truncates the existing file)
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write("First line\n")
    f.write("Second line\n")
    f.write("Third line\n")

# Append mode (writes at the end)
with open('output.txt', 'a', encoding='utf-8') as f:
    f.write("Appended line\n")

# Write several lines at once
lines = ["apple\n", "banana\n", "orange\n"]
with open('fruits.txt', 'w', encoding='utf-8') as f:
    f.writelines(lines)  # note: writelines() does not add newlines for you

# Write to a file with print()
with open('output.txt', 'w', encoding='utf-8') as f:
    print("First line", file=f)
    print("Second line", file=f)
    print(f"Value: {42}", file=f)

2. pathlib – Modern Path Handling

2.1 Path Object Basics

from pathlib import Path

# Create a path object
p = Path('/home/user/documents/report.txt')

# Path components
print(p.parent)        # /home/user/documents (parent directory)
print(p.name)          # report.txt (file name)
print(p.stem)          # report (name without the extension)
print(p.suffix)        # .txt (extension)
print(p.suffixes)      # ['.txt'] (all extensions)

# Joining paths (recommended)
base = Path('/home/user')
subdir = base / 'documents' / '2026'
file_path = subdir / 'report.txt'
print(file_path)  # /home/user/documents/2026/report.txt
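
The difference between suffix and suffixes only shows up with multi-part extensions. The sketch below uses a hypothetical archive path and PurePosixPath, which manipulates paths without touching the file system and gives the same output on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical path with a two-part extension
archive = PurePosixPath("/home/user/backups/site.tar.gz")

suffix = archive.suffix                   # last extension only
suffixes = archive.suffixes               # every extension, in order
stem = archive.stem                       # name minus the last extension
renamed = archive.with_name("notes.txt")  # same directory, new file name
converted = archive.with_suffix(".zip")   # replaces only the last extension
parts = archive.parts                     # tuple of path components

print(suffix)      # .gz
print(suffixes)    # ['.tar', '.gz']
print(stem)        # site.tar
print(renamed)     # /home/user/backups/notes.txt
print(converted)   # /home/user/backups/site.tar.zip
print(parts)       # ('/', 'home', 'user', 'backups', 'site.tar.gz')
```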

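2.2 Path Checks and Conversions

from pathlib import Path

p = Path('/home/user/documents')

# Path checks
print(p.exists())        # does the path exist?
print(p.is_file())       # is it a file?
print(p.is_dir())        # is it a directory?
print(p.is_absolute())   # is it an absolute path?

# Path conversions
print(p.absolute())      # convert to an absolute path
print(p.resolve())       # absolute path with symbolic links resolved
print(p.relative_to('/home/user'))  # documents

# Get the current working directory
cwd = Path.cwd()
print(f"Current directory: {cwd}")

# Get the home directory
home = Path.home()
print(f"Home directory: {home}")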

3. Batch File Processing in Practice

3.1 Walking a Directory Tree

import os
from pathlib import Path

# Method 1: os.walk() - walk the directory tree
for root, dirs, files in os.walk('./documents'):
    print(f"Current directory: {root}")
    print(f"Subdirectories: {dirs}")
    print(f"Files: {files}")
    print("-" * 40)

# Method 2: Path.glob() - pattern matching
doc_dir = Path('./documents')

# Find all .txt files
txt_files = list(doc_dir.glob('*.txt'))
print(f"Found {len(txt_files)} txt files")

# Recursively find .txt files in all subdirectories
all_txt = list(doc_dir.rglob('*.txt'))
print(f"Found {len(all_txt)} txt files recursively")

# Find files matching a specific pattern
py_files = list(doc_dir.glob('**/*.py'))  # ** means recursive
print(f"Found {len(py_files)} Python files")

3.2 Batch Renaming Files

from pathlib import Path

# Batch rename: add a prefix to every file
# list() takes a snapshot first, so renamed files are not picked up again by glob
doc_dir = Path('./documents')
for file in list(doc_dir.glob('*.txt')):
    new_name = f"backup_{file.name}"
    file.rename(file.parent / new_name)
    print(f"Renamed: {file.name} -> {new_name}")

# Batch change the extension
for file in list(doc_dir.glob('*.txt')):
    new_name = file.with_suffix('.md')
    file.rename(new_name)
    print(f"Changed extension: {file.name} -> {new_name.name}")

# Rename with sequence numbers (sorted for a stable, repeatable order)
for i, file in enumerate(sorted(doc_dir.glob('*.md')), 1):
    new_name = file.parent / f"document_{i:03d}.md"
    file.rename(new_name)
    print(f"Renamed: {file.name} -> {new_name.name}")
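
Because a rename is hard to undo, one cautious variation is to build the list of renames first, inspect it, and only then apply it. The sketch below is one way to do that. The helper names plan_prefix_renames and apply_renames are illustrative, and the demo runs in a temporary directory.

```python
import tempfile
from pathlib import Path


def plan_prefix_renames(directory, prefix, pattern="*.txt"):
    """Return (source, target) pairs without touching the disk."""
    plan = []
    for src in sorted(Path(directory).glob(pattern)):
        if not src.is_file() or src.name.startswith(prefix):
            continue  # skip directories and files that already carry the prefix
        dst = src.with_name(prefix + src.name)
        if dst.exists():
            raise FileExistsError(f"{dst} already exists; refusing to overwrite")
        plan.append((src, dst))
    return plan


def apply_renames(plan):
    """Carry out a plan produced by plan_prefix_renames()."""
    for src, dst in plan:
        src.rename(dst)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "backup_c.txt"):
        (Path(tmp) / name).write_text("demo", encoding="utf-8")

    plan = plan_prefix_renames(tmp, "backup_")
    planned = [(src.name, dst.name) for src, dst in plan]
    print(planned)  # review the plan before applying it

    apply_renames(plan)
    final_names = sorted(p.name for p in Path(tmp).iterdir())
    print(final_names)
```

Running the planner a second time returns an empty list, since every file already carries the prefix, so the script is safe to re-run.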

3.3 Batch Reading and Processing

from pathlib import Path

# Read several files and merge them
doc_dir = Path('./documents')
all_content = []

for file in sorted(doc_dir.glob('*.txt')):
    with open(file, 'r', encoding='utf-8') as f:
        content = f.read()
        all_content.append(f"=== {file.name} ===\n{content}")

# Write the merged content to a new file
with open('merged.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(all_content))

print(f"Merged {len(all_content)} files into merged.txt")

# Collect file statistics in bulk
print("\nFile statistics:")
for file in sorted(doc_dir.glob('*.txt')):
    stat = file.stat()
    size_kb = stat.st_size / 1024
    print(f"{file.name}: {size_kb:.2f} KB")

3.4 Searching and Replacing File Contents

from pathlib import Path

# Search for files that contain a keyword
doc_dir = Path('./documents')
keyword = "Python"
matched_files = []

for file in doc_dir.glob('*.txt'):
    with open(file, 'r', encoding='utf-8') as f:
        content = f.read()
        if keyword in content:
            matched_files.append(file.name)
            count = content.count(keyword)
            print(f"{file.name}: {count} matches")

print(f"\n{len(matched_files)} files contain '{keyword}'")

# Replace text in bulk
old_text = "old version"
new_text = "new version"

for file in doc_dir.glob('*.txt'):
    with open(file, 'r', encoding='utf-8') as f:
        content = f.read()

    if old_text in content:
        new_content = content.replace(old_text, new_text)
        with open(file, 'w', encoding='utf-8') as f:
            f.write(new_content)
        print(f"Updated: {file.name}")
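
Opening a file with 'w' truncates it immediately, so a crash in the middle of the write can leave it half empty. One common safeguard, sketched here under the assumption that the files fit in memory, is to write the new content to a temporary file in the same directory and then swap it in with os.replace. The function name replace_in_file is illustrative.

```python
import os
import tempfile
from pathlib import Path


def replace_in_file(path, old, new, encoding="utf-8"):
    """Replace text in one file and return the number of replacements."""
    path = Path(path)
    content = path.read_text(encoding=encoding)
    count = content.count(old)
    if count == 0:
        return 0  # nothing to do, leave the file untouched

    # Write to a temporary file next to the original, then swap it in
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(content.replace(old, new))
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave a stray temporary file behind
        raise
    return count


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "notes.txt"
    sample.write_text("old version 1, old version 2\n", encoding="utf-8")

    replaced = replace_in_file(sample, "old version", "new version")
    untouched = replace_in_file(sample, "missing text", "anything")
    result = sample.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmp_dir).iterdir() if p.suffix == ".tmp"]

print(replaced, untouched, repr(result))
```

One trade-off: the swapped-in file carries the temporary file's permissions rather than the original's, which may matter for shared files.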

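4. Advanced File Operations

4.1 Copying, Moving, and Deleting Files

from pathlib import Path
import shutil

# Copy a file
src = Path('./source.txt')
dst = Path('./backup/source_copy.txt')
dst.parent.mkdir(parents=True, exist_ok=True)  # create the target directory
shutil.copy2(src, dst)  # copy2 preserves metadata
print(f"Copied: {src} -> {dst}")

# Copy an entire directory (raises FileExistsError if ./docs_backup already exists)
shutil.copytree('./docs', './docs_backup')
print("Copied the whole directory")

# Move a file
src = Path('./old_location/file.txt')
dst = Path('./new_location/file.txt')
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.move(src, dst)
print(f"Moved: {src} -> {dst}")

# Delete a file
Path('./temp.txt').unlink()  # deletes a single file; raises FileNotFoundError if it is missing
print("Deleted temp.txt")

# Delete an entire directory (permanently removes everything inside it)
shutil.rmtree('./old_backup')
print("Deleted the directory")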

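4.2 Working with Temporary Files

import tempfile
from pathlib import Path

# Create a temporary file
with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt', encoding='utf-8') as f:
    temp_path = f.name
    f.write("temporary content")
    print(f"Temporary file: {temp_path}")

# Use the temporary file
with open(temp_path, 'r', encoding='utf-8') as f:
    print(f"Read from temporary file: {f.read()}")

# Clean up the temporary file (required because delete=False was used)
Path(temp_path).unlink()
print("Temporary file removed")

# Create a temporary directory
with tempfile.TemporaryDirectory() as temp_dir:
    print(f"Temporary directory: {temp_dir}")
    # Work inside the temporary directory
    temp_file = Path(temp_dir) / 'test.txt'
    temp_file.write_text("test content", encoding='utf-8')
# The temporary directory is removed automatically when the with block exits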

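4.3 Handling File Encodings

import chardet  # third-party package: pip install chardet

# Detect a file's encoding
def detect_encoding(file_path):
    with open(file_path, 'rb') as f:
        result = chardet.detect(f.read(10000))  # inspect the first 10,000 bytes
    # chardet returns None when it cannot guess; fall back to UTF-8
    return result['encoding'] or 'utf-8'

# Convert encodings in bulk
from pathlib import Path

doc_dir = Path('./documents')
for file in doc_dir.glob('*.txt'):
    # Detect the encoding
    encoding = detect_encoding(file)
    print(f"{file.name}: detected encoding {encoding}")

    # Read with the detected encoding
    with open(file, 'r', encoding=encoding) as f:
        content = f.read()

    # Write back as UTF-8 (this overwrites the original file)
    with open(file, 'w', encoding='utf-8') as f:
        f.write(content)
    print(f"Converted: {file.name}")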
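
When installing chardet is not an option, a simpler alternative is to try a short list of likely encodings in order. This is a different technique from detection: it only works when the candidate list is a good guess for your files. The function name read_with_fallback and the default candidates (UTF-8, then GB18030 for legacy Chinese text) are assumptions made for this sketch.

```python
import tempfile
from pathlib import Path


def read_with_fallback(path, encodings=("utf-8", "gb18030")):
    """Decode a file with the first encoding that works; return (text, encoding)."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")


with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "legacy.txt"
    legacy.write_bytes("中文内容".encode("gbk"))  # simulate a legacy GBK file

    text, used = read_with_fallback(legacy)
    legacy.write_text(text, encoding="utf-8")  # convert to UTF-8 in place

    text_again, used_again = read_with_fallback(legacy)

print(used, used_again)
```

Strict candidates such as UTF-8 should come first, because permissive encodings will decode almost any byte sequence without raising an error.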

5. Hands-On Projects

5.1 Log File Analyzer

from pathlib import Path

def analyze_log(log_path):
    '''Analyze a log file.'''
    log_file = Path(log_path)

    error_count = 0
    warning_count = 0
    error_lines = []

    with open(log_file, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            if 'ERROR' in line:
                error_count += 1
                error_lines.append((line_num, line.strip()))
            elif 'WARNING' in line:
                warning_count += 1

    print("Log analysis results:")
    print(f"  Errors: {error_count}")
    print(f"  Warnings: {warning_count}")
    print("\nMost recent errors:")
    for line_num, line in error_lines[-5:]:
        print(f"  Line {line_num}: {line}")

# Example usage
analyze_log('./app.log')
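
The same counting can be generalized to every log level with collections.Counter and a regular expression. The sketch below assumes a log format in which the level appears as a standalone uppercase word. The sample lines are invented for illustration.

```python
import re
from collections import Counter

# Matches a standard level name as a whole word (assumed log format)
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def count_levels(lines):
    """Count how many lines carry each log level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines
SAMPLE_LOG = """\
2026-03-30 02:05:55 INFO service started
2026-03-30 02:05:56 WARNING disk usage at 85%
2026-03-30 02:05:57 ERROR failed to open config
2026-03-30 02:05:58 ERROR retry limit reached
""".splitlines()

levels = count_levels(SAMPLE_LOG)
for level, count in levels.most_common():
    print(f"{level}: {count}")
```

Because count_levels() accepts any iterable of lines, an open file object can be passed in directly, which keeps memory use flat for large logs.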

5.2 Batch Photo Renaming

from pathlib import Path
from datetime import datetime

def rename_photos(photo_dir, prefix="IMG"):
    '''Rename photos in bulk.'''
    photo_path = Path(photo_dir)

    # Collect all image files
    images = list(photo_path.glob('*.jpg')) + list(photo_path.glob('*.png'))
    images.sort(key=lambda p: p.stat().st_mtime)  # sort by modification time

    for i, img in enumerate(images, 1):
        # Build the new file name
        date_str = datetime.fromtimestamp(img.stat().st_mtime).strftime('%Y%m%d')
        new_name = f"{prefix}_{date_str}_{i:04d}{img.suffix}"
        new_path = img.parent / new_name

        # Rename
        img.rename(new_path)
        print(f"{img.name} -> {new_name}")

# Example usage
rename_photos('./photos', 'VACATION')

5.3 File Backup Tool

from pathlib import Path
import shutil
from datetime import datetime

def backup_files(source_dir, backup_dir, patterns=None):
    '''Back up files that match the given patterns.'''
    source = Path(source_dir)
    backup = Path(backup_dir)

    # Create a timestamped backup directory
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    backup_path = backup / f"backup_{timestamp}"
    backup_path.mkdir(parents=True, exist_ok=True)

    # Back up all files by default
    if patterns is None:
        patterns = ['*']

    # Copy the files
    copied_count = 0
    for pattern in patterns:
        for file in source.glob(pattern):
            if file.is_file():
                dest = backup_path / file.name
                shutil.copy2(file, dest)
                copied_count += 1

    print(f"Backup finished: {copied_count} files")
    print(f"Backup location: {backup_path}")
    return backup_path

# Example usage
backup_files('./documents', './backups', ['*.txt', '*.docx', '*.pdf'])
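
The tool above copies only the top level of the source directory into one flat folder. A possible extension, sketched below, walks subdirectories with rglob and recreates the relative layout under the backup directory, so files with the same name in different folders do not overwrite each other. The function name backup_tree and the demo directory layout are illustrative.

```python
import shutil
import tempfile
from pathlib import Path


def backup_tree(source_dir, backup_dir, pattern="*.txt"):
    """Copy matching files recursively, preserving their relative paths."""
    source = Path(source_dir)
    backup = Path(backup_dir)
    copied = []
    for file in sorted(source.rglob(pattern)):
        if not file.is_file():
            continue
        relative = file.relative_to(source)
        dest = backup / relative
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, dest)
        copied.append(relative.as_posix())
    return copied


with tempfile.TemporaryDirectory() as tmp:
    # Build a small demo tree with two files that share a name
    src = Path(tmp) / "documents"
    (src / "2025").mkdir(parents=True)
    (src / "2026").mkdir()
    (src / "2025" / "report.txt").write_text("last year", encoding="utf-8")
    (src / "2026" / "report.txt").write_text("this year", encoding="utf-8")

    copied = backup_tree(src, Path(tmp) / "backups")
    restored = (Path(tmp) / "backups" / "2026" / "report.txt").read_text(encoding="utf-8")

print(copied)
```

Keep the backup directory outside the source directory; otherwise a later run would start backing up earlier backups.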

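6. Common Errors and Fixes

6.1 File Not Found

from pathlib import Path

# Error: opening a missing file raises an exception
# with open('not_exist.txt', 'r') as f:
#     content = f.read()  # FileNotFoundError!

# Fix: check whether the file exists first
file = Path('not_exist.txt')
if file.exists():
    content = file.read_text(encoding='utf-8')
else:
    print("File not found; creating a new one")
    file.write_text("initial content", encoding='utf-8')

# Or use exception handling
try:
    content = file.read_text(encoding='utf-8')
except FileNotFoundError:
    print("File not found; using default content")
    content = "default content"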

6.2 Encoding Errors

# Error: the encoding does not match the file
# with open('file.txt', 'r', encoding='ascii') as f:
#     content = f.read()  # UnicodeDecodeError!

# Fix: use the correct encoding
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()

# Or ignore undecodable bytes (they are silently dropped)
with open('file.txt', 'r', encoding='utf-8', errors='ignore') as f:
    content = f.read()

# Or replace undecodable bytes with the U+FFFD replacement character
with open('file.txt', 'r', encoding='utf-8', errors='replace') as f:
    content = f.read()

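6.3 Permission Errors

from pathlib import Path

# Error: no write permission
# Path('/root/protected.txt').write_text("content")  # PermissionError!

# Fix: check permissions or choose another directory
file = Path('./user_file.txt')
try:
    file.write_text("content", encoding='utf-8')
except PermissionError:
    print("Permission denied; trying another directory")
    file = Path.home() / 'file.txt'
    file.write_text("content", encoding='utf-8')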

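7. Best Practices

Always use the with statement: files are closed automatically, which avoids resource leaks
Specify the encoding explicitly: always pass encoding='utf-8'
Use pathlib: it is more modern and easier to use than os.path

Read large files in chunks: avoid loading the whole file into memory at once

# process() stands for whatever handling your program applies to each chunk
with open('large_file.txt', 'r', encoding='utf-8') as f:
    for chunk in iter(lambda: f.read(8192), ''):
        process(chunk)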
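
As a concrete instance of chunked reading, the sketch below counts the lines of a file in fixed-size binary chunks, so memory use stays flat regardless of file size. The function name count_lines and the 64 KB default are arbitrary choices, and the demo file is generated in a temporary directory.

```python
import tempfile
from pathlib import Path


def count_lines(path, chunk_size=64 * 1024):
    """Count newline characters by reading the file in binary chunks."""
    total = 0
    with open(path, "rb") as f:
        # iter() keeps calling f.read() until it returns the sentinel b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += chunk.count(b"\n")
    return total


with tempfile.TemporaryDirectory() as tmp:
    big = Path(tmp) / "large_file.txt"
    big.write_bytes(b"".join(b"line %d\n" % i for i in range(1000)))

    # A tiny chunk size forces many reads, which exercises the loop
    line_total = count_lines(big, chunk_size=7)

print(line_total)  # 1000
```

Reading in binary mode avoids decoding costs and encoding errors. A final line without a trailing newline is not counted by this approach.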

Create the directory before writing:

from pathlib import Path

output = Path('./output/subdir/file.txt')
output.parent.mkdir(parents=True, exist_ok=True)
output.write_text("content", encoding='utf-8')

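8. Summary and Exercises

Today we took a close look at file handling in Python. Our suggestions:

Master the basics: open(), read(), write(), and the with statement
Get fluent with pathlib: modern path handling
Understand encodings: UTF-8 is the recommended choice, and it is worth passing explicitly because the default can depend on the platform
Practice: consolidate what you have learned on real projects

Exercises

Write a script that counts the total number of lines across all code files in a directory
Implement a simple file search tool with keyword search
Write a batch renaming tool that supports prefixes, suffixes, and sequence numbers
Implement a log analyzer that extracts errors and warnings
Create an automatic backup script that backs up important files on a schedule

Follow us for more hands-on Python tutorials! In the next lesson we will cover the basics of web scraping with Python.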


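Office Automation with Python: Working with Excel – Basics

小编 — Mon, 30 Mar 2026 02:05:55 +0000

Welcome to today's hands-on Python tutorial! Today we look at office automation with Python, focusing on Excel files.

1. Environment Setup

1.1 Installing the Libraries

pip install openpyxl pandas xlrd xlwt

1.2 Choosing a Library

openpyxl: reads and writes .xlsx files
pandas: data processing and analysis
xlrd/xlwt: read and write .xls files (legacy format); xlrd 2.0 and later reads only .xls, and xlwt is no longer maintained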

2. Reading Excel Files

2.1 Using openpyxl

from openpyxl import load_workbook

# Load the workbook
wb = load_workbook('data.xlsx')

# Select a worksheet
ws = wb['Sheet1']

# Read a cell
value = ws['A1'].value

# Iterate over rows
for row in ws.iter_rows():
    for cell in row:
        print(cell.value)

2.2 Using pandas

import pandas as pd

# Read the Excel file (pandas uses openpyxl to read .xlsx files)
df = pd.read_excel('data.xlsx')

# Inspect the data
print(df.head())
print(df.columns)

# Select a column
names = df['姓名']

3. Writing Excel Files

3.1 Creating a New File

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "数据表"

# Write the header row and the data rows
ws['A1'] = "姓名"
ws['B1'] = "年龄"
ws.append(["张三", 25])
ws.append(["李四", 28])

wb.save('output.xlsx')

3.2 Writing with pandas

import pandas as pd

data = {
    '姓名': ['张三', '李四'],
    '年龄': [25, 28],
    '城市': ['北京', '上海']
}

df = pd.DataFrame(data)
df.to_excel('output.xlsx', index=False)

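4. Data Processing in Practice

4.1 Filtering Data

# df is the DataFrame built in section 3.2
# Keep the records where the age is greater than 25
filtered = df[df['年龄'] > 25]

# Filter on several conditions
filtered = df[(df['年龄'] > 25) & (df['城市'] == '北京')]

4.2 Summary Statistics

# Mean
avg_age = df['年龄'].mean()

# Grouped statistics
grouped = df.groupby('城市')['年龄'].mean()

4.3 Merging Data

# Combine two Excel files
df1 = pd.read_excel('file1.xlsx')
df2 = pd.read_excel('file2.xlsx')

# Merge side by side, matching rows on the shared '姓名' column
merged = pd.merge(df1, df2, on='姓名')

# Stack vertically (ignore_index renumbers the rows from 0)
combined = pd.concat([df1, df2], ignore_index=True)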

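5. Batch Processing

5.1 Batch Reading

import os
import pandas as pd

# Collect every .xlsx file in the current directory
files = [f for f in os.listdir('.') if f.endswith('.xlsx')]
all_data = []

for file in files:
    df = pd.read_excel(file)
    all_data.append(df)

# Stack all files into one DataFrame with a fresh index
combined = pd.concat(all_data, ignore_index=True)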
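
One practical wrinkle: while a workbook is open, Excel keeps a hidden lock file next to it whose name starts with "~$", and a plain endswith('.xlsx') check picks it up too. The sketch below filters those out with pathlib and optionally skips named files such as a previous combined output. The helper name find_workbooks and the demo file names are made up; it creates empty placeholder files only, so it runs without pandas.

```python
import tempfile
from pathlib import Path


def find_workbooks(directory, exclude=()):
    """Return .xlsx files in sorted order, skipping lock files and excluded names."""
    excluded = set(exclude)
    return sorted(
        p for p in Path(directory).glob("*.xlsx")
        if p.is_file()
        and not p.name.startswith("~$")  # Excel lock file for an open workbook
        and p.name not in excluded
    )


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files standing in for real workbooks
    for name in ("jan.xlsx", "feb.xlsx", "~$jan.xlsx", "combined.xlsx", "notes.txt"):
        (Path(tmp) / name).touch()

    found = [p.name for p in find_workbooks(tmp, exclude=["combined.xlsx"])]

print(found)  # ['feb.xlsx', 'jan.xlsx']
```

The returned paths can be passed straight to pd.read_excel() in the loop above.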

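5.2 Batch Writing

departments = ['销售部', '技术部', '财务部']

for dept in departments:
    # get_department_data() is a placeholder for your own function
    # that returns one department's data as a DataFrame
    df = get_department_data(dept)
    df.to_excel(f'{dept}_报表.xlsx', index=False)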

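6. Formatting and Styles

from openpyxl.styles import Font, PatternFill

# ws is the worksheet created in section 3.1; call wb.save() afterwards to keep the changes

# Set the font (bold, red)
ws['A1'].font = Font(bold=True, color='FF0000')

# Set the background color (solid yellow)
ws['A1'].fill = PatternFill(start_color='FFFF00', fill_type='solid')

# Set the column width
ws.column_dimensions['A'].width = 20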

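7. Summary

Handling Excel files with Python can greatly improve office productivity. Practice is the best way to get comfortable with it.

Follow us for more hands-on Python tutorials!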
