python 中文字符串的处理实现代码_程序人生

python 中文字符串的处理实现代码

admin

2023-07-31 02:03:42

0次

>>> teststr = \’我的eclipse不能正确的解码gbk码！\’
>>> teststr
\’\\xe6\\x88\\x91\\xe7\\x9a\\x84eclipse\\xe4\\xb8\\x8d\\xe8\\x83\\xbd\\xe6\\xad\\xa3\\xe7\\xa1\\xae\\xe7\\x9a\\x84\\xe8\\xa7\\xa3\\xe7\\xa0\\x81gbk\\xe7\\xa0\\x81\\xef\\xbc\\x81\’
>>> tests2 = u\’我的eclipse不能正确的解码gbk码！\’
>>> test3 = tests2.encode(\’gb2312\’)
>>> test3
\’\\xce\\xd2\\xb5\\xc4eclipse\\xb2\\xbb\\xc4\\xdc\\xd5\\xfd\\xc8\\xb7\\xb5\\xc4\\xbd\\xe2\\xc2\\xebgbk\\xc2\\xeb\\xa3\\xa1\’
>>> test3
\’\\xce\\xd2\\xb5\\xc4eclipse\\xb2\\xbb\\xc4\\xdc\\xd5\\xfd\\xc8\\xb7\\xb5\\xc4\\xbd\\xe2\\xc2\\xebgbk\\xc2\\xeb\\xa3\\xa1\’
>>> teststr
\’\\xe6\\x88\\x91\\xe7\\x9a\\x84eclipse\\xe4\\xb8\\x8d\\xe8\\x83\\xbd\\xe6\\xad\\xa3\\xe7\\xa1\\xae\\xe7\\x9a\\x84\\xe8\\xa7\\xa3\\xe7\\xa0\\x81gbk\\xe7\\xa0\\x81\\xef\\xbc\\x81\’
>>> test3.decode(\’gb2312\’).encode(\’utf-8\’)
\’\\xe6\\x88\\x91\\xe7\\x9a\\x84eclipse\\xe4\\xb8\\x8d\\xe8\\x83\\xbd\\xe6\\xad\\xa3\\xe7\\xa1\\xae\\xe7\\x9a\\x84\\xe8\\xa7\\xa3\\xe7\\xa0\\x81gbk\\xe7\\xa0\\x81\\xef\\xbc\\x81\’
>>> test3.decode(\’gb2312\’).encode(\’utf-8\’) == teststr
True
如上所见,test3变量(gb2312编码)经过解码(变成unicode字符串)后再使用utf-8编码,就成了与teststr值相同的串了.

通过上面的例子我们也发现,unicode字符串是gb2312字符串(windows就使用这种格式)与utf-8字符串(python本身使用)之间的一座桥梁.

python 中文字符串处理

上一篇：Python 匹配任意字符（包括换行符）的正则表达式写法

下一篇：动态创建类实例代码

python 中文字符串的处理实现代码

相关内容

热门资讯