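I've been writing a scraper in Python these past couple of days. One of its modules converts HTML into UBB code. There didn't seem to be a ready-made program for this online, so I wrote a function myself, which was also good practice for my regular expressions.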
import re

def Html2UBB(content):
    # Convert HTML tags to UBB tags.
    # Note: the tag names in the link, bold and [\1] patterns did not survive
    # in this post; they are reconstructed from the replacement templates
    # and should be treated as assumptions.
    pattern = re.compile(r'<a\s+href="([^"]+)"[^>]*>([\s\S]+?)</a>', re.I)
    content = pattern.sub(r'[url=\1]\2[/url]', content)
    pattern = re.compile(r'<img[^>]+src="([^"]+)"[^>]*>', re.I)
    content = pattern.sub(r'[img]\1[/img]', content)
    pattern = re.compile(r'<strong>([\s\S]+?)</strong>', re.I)
    content = pattern.sub(r'[b]\1[/b]', content)
    pattern = re.compile(r'<([iu])>([\s\S]+?)</\1>', re.I)
    content = pattern.sub(r'[\1]\2[/\1]', content)
    # Strip every tag that is still left.
    pattern = re.compile(r'<[^>]*?>', re.I)
    content = pattern.sub('', content)
    # Convert HTML entities to plain characters.
    content = content.replace('&lt;', '<')
    content = content.replace('&gt;', '>')
    content = content.replace('&rdquo;', '”')
    content = content.replace('&ldquo;', '“')
    content = content.replace('&quot;', '"')
    content = content.replace('&copy;', '©')
    content = content.replace('&reg;', '®')
    content = content.replace('&nbsp;', ' ')
    content = content.replace('&mdash;', '—')
    content = content.replace('&ndash;', '–')
    content = content.replace('&lsaquo;', '‹')
    content = content.replace('&rsaquo;', '›')
    content = content.replace('&hellip;', '…')
    # &amp; goes last so that text such as "&amp;lt;" is not decoded twice.
    content = content.replace('&amp;', '&')
    return content
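As a side note, the entity step could probably also be done with the standard library instead of one replace call per entity. This is only a minimal sketch, assuming Python 3.4 or later (where html.unescape exists); the helper name decode_entities is made up for this illustration and is not part of the function above.

```python
import html

def decode_entities(text):
    # html.unescape decodes named and numeric character references
    # in a single pass, so no per-entity replace call is needed.
    decoded = html.unescape(text)
    # &nbsp; decodes to U+00A0 (no-break space); map it to a plain
    # space, which is what the replace-based version does.
    return decoded.replace('\u00a0', ' ')

print(decode_entities('&lt;b&gt; &amp;&nbsp;&copy; 2012'))
```

Because the decoding happens in one pass, text such as "&amp;lt;" comes out as "&lt;" rather than being decoded twice.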
To use it, just call the Html2UBB function; the return value is the UBB code.
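To make the call concrete, here is a small self-contained sketch of the same idea applied to a sample string. It is not the original function: the name html_to_ubb_sketch, the RULES table and the sample input are all hypothetical, and the tag patterns are assumptions chosen for the illustration.

```python
import re

# (pattern, replacement) pairs, applied in order. The tags handled here
# are assumptions for this sketch, not a complete HTML-to-UBB mapping.
RULES = [
    (re.compile(r'<a\s+href="([^"]+)"[^>]*>([\s\S]+?)</a>', re.I),
     r'[url=\1]\2[/url]'),
    (re.compile(r'<img[^>]+src="([^"]+)"[^>]*>', re.I),
     r'[img]\1[/img]'),
    (re.compile(r'<strong>([\s\S]+?)</strong>', re.I),
     r'[b]\1[/b]'),
    # Drop whatever tags remain after the conversions above.
    (re.compile(r'<[^>]*?>'), ''),
]

def html_to_ubb_sketch(content):
    # Apply each rule in turn; later rules see the output of earlier ones.
    for pattern, replacement in RULES:
        content = pattern.sub(replacement, content)
    return content

sample = ('<p>See <a href="http://example.com/">my <strong>site</strong></a>'
          '<img class="x" src="a.png"></p>')
print(html_to_ubb_sketch(sample))
```

The order matters: the catch-all rule that strips leftover tags has to run last, otherwise the links and images would be removed before they could be converted.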