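Solving encoding problems when reading CSV files with Python
Author: myrj
I ran into an obstacle while trying to read CSV files with Python.
Update: if you only want to skip the offending characters or errors, you can open the file like this: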
with open(os.path.join(directory, file), 'r', encoding="utf-8", errors="ignore") as data_file:
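To make the effect of `errors="ignore"` concrete, here is a minimal sketch. The sample bytes and the temporary file are made up for illustration. The point is only that undecodable bytes are dropped silently instead of raising, so some data can be lost without any warning.

```python
import os
import tempfile

# Hypothetical sample: valid UTF-8 text plus one stray byte (0xff) that
# can never appear in well-formed UTF-8.
raw = "caf\u00e9,".encode("utf-8") + b"\xff" + b"ok\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Strict decoding (the default) raises on the bad byte.
    try:
        with open(path, "r", encoding="utf-8") as data_file:
            data_file.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors="ignore" drops the bad byte and keeps everything else.
    with open(path, "r", encoding="utf-8", errors="ignore") as data_file:
        lenient_text = data_file.read()
finally:
    os.remove(path)

print(strict_failed)
print(ascii(lenient_text))  # ascii() keeps the output printable on any console
```

If losing characters silently is a concern, `errors="replace"` is a standard alternative that leaves a visible marker (U+FFFD) where a byte could not be decoded.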
Here is what I have tried so far.
import csv
import os

for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r') as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                print(row)
The error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
I then tried
with open(os.path.join(directory, file), 'r', encoding="UTF-8") as data_file:
Error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 223: character maps to <undefined>
Now, if I just print data_file, it reports that the files are cp1252-encoded, but if I try
with open(os.path.join(directory, file), 'r', encoding="cp1252") as data_file:
the error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
I also tried the recommended package.
The error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
The line I am trying to parse is:
2015-11-28 22:23:58,670805374291832832,479174464,"MarkCrawford15","RT @WhatTheFFacts: The tallest man in the world was Robert Pershing Wadlow of Alton, Illinois. He was slighty over 8 feet 11 inches tall.","None
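Any ideas or help are appreciated.
Solution
I would use csvkit, which automatically detects the appropriate encoding and decodes with it. For example:
import csvkit
reader = csvkit.reader(data_file)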
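If installing csvkit is not an option, a rough standard-library fallback is to try a short list of candidate encodings in order. This is a sketch of a different technique, not a description of what csvkit does internally. The function name `read_csv_rows` and the candidate list are assumptions chosen for illustration.

```python
import csv
import io

def read_csv_rows(raw_bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode raw_bytes with the first candidate encoding that works,
    then parse the text as CSV. Returns (encoding_used, rows)."""
    for encoding in candidates:
        try:
            text = raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
        # newline="" lets the csv module handle line endings itself
        rows = list(csv.reader(io.StringIO(text, newline="")))
        return encoding, rows
    raise ValueError("none of the candidate encodings could decode the data")

# Made-up sample rows, not taken from the question's data.
utf8_sample = 'id,text\n1,"wait\u2026"\n'.encode("utf-8")
cp1252_sample = 'id,text\n1,"caf\u00e9"\n'.encode("cp1252")

print(read_csv_rows(utf8_sample)[0])
print(read_csv_rows(cp1252_sample)[0])
```

A successful decode does not prove the guess is right: `latin-1` accepts any byte sequence, so it only serves as a last resort here, and the order of the candidates matters.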
As worked out in the chat, the solution is:
for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r', encoding="utf-8") as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                # Drop every non-ASCII character so print cannot fail on the console
                data = [i.encode('ascii', 'ignore').decode('ascii') for i in row]
                print(data)
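Why this works: a `UnicodeEncodeError` is raised when text is encoded, not when it is decoded. The failure in the question therefore most likely comes from `print` writing to a console whose code page lacks some characters, which would explain why changing the `encoding` passed to `open` made no difference. The sketch below assumes a `cp437` console purely for illustration (the question does not say which code page was in use). It uses a hypothetical helper, `console_safe`, that replaces unencodable characters instead of dropping all non-ASCII text.

```python
def console_safe(text, console_encoding):
    """Round-trip text through the console's encoding, substituting '?'
    for any character that encoding cannot represent."""
    return text.encode(console_encoding, errors="replace").decode(console_encoding)

row = ["MarkCrawford15", "He was tall\u2026 caf\u00e9"]  # made-up sample row

# Encoding U+2026 for the assumed cp437 console fails, as print would.
try:
    row[1].encode("cp437")
    failed = False
except UnicodeEncodeError:
    failed = True

# The approach from the answer: drop everything that is not ASCII.
ascii_only = [i.encode("ascii", "ignore").decode("ascii") for i in row]

# Gentler alternative: keep what the console can show, replace the rest.
replaced = [console_safe(i, "cp437") for i in row]

print(failed)
print(ascii_only)
print(ascii(replaced))  # ascii() keeps this line printable on any console
```

On Python 3.7 and later, calling `sys.stdout.reconfigure(errors="replace")` once at startup should give a similar effect for every `print` call without touching the row data.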