Need-Driven Learning with Python: How to Write a Python Script That Replaces Multi-Line Text in Files

admin

2023-07-30 21:59:41


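Until about three months ago, Python was a complete mystery to me. Then my manager handed me a task: delete (or replace) every comment that contains a specific set of lines, across all the source files of our project. The source tree is about 1 GB and takes more than half an hour to build on a mid-to-high-end Linux server, so there is a lot of code, and editing the files by hand one at a time was out of the question. The job clearly called for a script, and Python came to mind. I had never touched Python before, but after two weeks of cramming I managed to deliver.

I have long wanted to share the problems I ran into and how I solved them (my approach may not be a good one, but it solved my problem). I kept putting it off because I felt I did not know Python well enough. To finish the task quickly I had only skimmed while learning, skipping anything that did not bear on my immediate problem, and I could not settle down to read because my head was full of questions. A few days ago I calmed down and went through a Python book from cover to cover, and now feels like the right time to write things up.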

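This article covers:

Problem description
Approach
Implementation
Features of Python

1. Problem description

The project source tree is large, a mix of C and C++ written in many coding styles, with '.c', '.cc', '.cpp', '.h', '.hh' and other files. My task was to delete the comments that contain a specific set of lines, such as the following. (Disclaimer: this is just an example I picked; the project source does not contain this text.)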

* Redistribution and use in source and binary forms, with or without

* modification, are permitted provided that the following conditions

* are met:

* – Redistributions of source code must retain the above copyright

* notice, this list of conditions and the following disclaimer.

* – Redistribution in binary form must reproduce the above copyright

* notice, this list of conditions and the following disclaimer in

* the documentation and/or other materials provided with the

* distribution.

* Neither the name of Sun Microsystems, Inc. or the names of

* contributors may be used to endorse or promote products derived

* from this software without specific prior written permission.

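But the comments come in many forms. Some have a description of the source file before “Copyright 2002 Sun Microsystems, Inc. All rights reserved.”; some have one after “from this software without specific prior written permission.”; some use C++-style “//” comments instead of “/* */”; some lack the lines

“ * – Redistribution in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.”

and there are other variations as well. In short, the comments I need to delete, each containing those specific lines, come in many different formats!

So I decided to handle the job with a Python script. To match specific content, regular expressions came to mind, but I did not know how to build one that matches what is described above (if you do, please tell me!). I had to find another way.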
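On the regular-expression question, one possible answer is sketched below in Python 3. Rather than encoding the whole license text in a single pattern, it matches any comment (a /* ... */ block or a run of consecutive // lines) and then checks in plain Python whether that comment contains the marker phrases. The names MARKERS, COMMENT_RE and strip_marked_comments are made up for illustration, the marker phrases are taken from the example text above, and the sketch assumes that comment delimiters never appear inside string literals.

```python
import re

# Hypothetical marker phrases: a comment is deleted only if it contains all of them.
MARKERS = (
    "Redistribution and use in source and binary forms",
    "without specific prior written permission",
)

# Either a /* ... */ block (non-greedy, may span lines) or a run of
# consecutive lines that each begin with //.
COMMENT_RE = re.compile(
    r"/\*.*?\*/"
    r"|(?:^[ \t]*//[^\n]*\n?)+",
    re.DOTALL | re.MULTILINE,
)

# Leading comment decoration on a line: "/*", " * ", " */", "//".
_DECORATION = re.compile(r"^\s*(?:/\*+|\*+/?|//+)?\s*")


def strip_marked_comments(text, markers=MARKERS):
    """Return text with every comment that contains all markers removed."""

    def replace(match):
        comment = match.group(0)
        # Drop the decoration and join the lines, so that a phrase wrapped
        # across two comment lines can still be found.
        flat = " ".join(_DECORATION.sub("", line) for line in comment.splitlines())
        if all(marker in flat for marker in markers):
            return ""
        return comment  # unrelated comment: keep it unchanged

    return COMMENT_RE.sub(replace, text)
```

Because the pattern captures the whole comment, any file description that sits before or after the license lines inside the same comment is deleted with it, which is what the task asks for.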

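2. Approach

My plan: to delete every comment in the project source that contains the specific lines, the script must be able to:

Traverse all the source files ('.c', '.cc', '.cpp', '.h', '.hh') and process only files of those types
Find the comments that contain the specific lines and delete them
Handle special cases, such as symbolic links

These points translate into the following steps:

Step 1: Take as input the name of a source directory or a source file;

Step 2: If it is a file, check that its extension is one of '.c', '.cc', '.cpp', '.h', '.hh'; otherwise skip it;

Step 3: Check whether the file is a symbolic link; if it is, skip it;

Step 4: Look for a matching comment in the file; if one exists, delete it, otherwise leave the file alone;

Step 5: If it is a directory, process every file and subdirectory inside it, going back to Step 2.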
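The traversal part of these steps (everything except Step 4) can be sketched as follows. This is a minimal Python 3 illustration, not the script from this article: the names SOURCE_EXTENSIONS, is_candidate and iter_source_files are hypothetical, and it relies on os.walk, which by default does not descend into symlinked directories.

```python
import os

# The five extensions named in Step 2.
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".h", ".hh"}


def is_candidate(path):
    """Steps 2 and 3: a regular, non-symlink file with a C/C++ extension."""
    if os.path.islink(path) or not os.path.isfile(path):
        return False
    return os.path.splitext(path)[1] in SOURCE_EXTENSIONS


def iter_source_files(target):
    """Steps 1 and 5: yield every candidate file for a file or directory name."""
    if os.path.isdir(target):
        # os.walk visits the directory and all of its subdirectories.
        for dirpath, _dirnames, filenames in os.walk(target):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if is_candidate(path):
                    yield path
    elif is_candidate(target):
        yield target
```

A caller would loop over iter_source_files(name) and apply Step 4 to each path it yields.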

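The plan is clear. The key question is how to find out whether a file contains the matching content, and how to delete it. And for someone who has never used Python or any other scripting language, how to actually code it is a problem too!

How do I decide whether a comment is one that contains the specific lines? My approach is as follows (my regular expressions are weak, so this was the only way I had):

When a line contains /* or //, record the current line number as startLine
Search line by line for the specific lines, such as “Copyright 2002 Sun Microsystems, Inc. All rights reserved.”
Continue until */ is reached, or until the comment ends (for //). If the specific lines were found, record the line number where the comment ends as endLine
Finally, delete everything from startLine to endLine.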
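The line-based search above can be sketched like this in Python 3. It is an illustration under stated assumptions, not the original script: find_comment_ranges and delete_ranges are hypothetical names, each marker phrase is assumed to fit on a single line, and only comments that begin a line are considered (a comment trailing some code on the same line is ignored).

```python
def find_comment_ranges(lines, markers):
    """Return 1-based (startLine, endLine) pairs of comments containing all markers."""
    ranges = []
    i = 0
    while i < len(lines):
        stripped = lines[i].lstrip()
        if stripped.startswith("/*"):
            start = i
            rest = stripped[2:]  # text after the opening /*
            # Advance until the line that holds the closing */.
            while "*/" not in rest and i + 1 < len(lines):
                i += 1
                rest = lines[i]
            end = i
        elif stripped.startswith("//"):
            start = i
            # A // comment ends at the last consecutive // line.
            while i + 1 < len(lines) and lines[i + 1].lstrip().startswith("//"):
                i += 1
            end = i
        else:
            i += 1
            continue
        body = " ".join(lines[start:end + 1])
        if all(marker in body for marker in markers):
            ranges.append((start + 1, end + 1))
        i = end + 1
    return ranges


def delete_ranges(lines, ranges):
    """Return lines without the 1-based inclusive line ranges given."""
    doomed = set()
    for start, end in ranges:
        doomed.update(range(start, end + 1))
    return [line for number, line in enumerate(lines, start=1) if number not in doomed]
```

Collecting the ranges first and deleting afterwards keeps the line numbers stable while the file is being scanned.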

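3. Implementation

Without further ado, here is the code, implemented as described above. If you are not familiar with Python, please consult the relevant references.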


    #!/usr/bin/env python
    # Filename: comment.py

    import os, sys, fileinput

    #-------------------------------------------------------------
    def usage():
        print u'''
        help: comment.py
        [dirname]: Option, select a directory to operate
        [filename]: Option, select a file to operate
        Example: python comment.py /home/saylor/test
        '''

    #-------------------------------------------------------------
    def commentFile(src, fileList):
        '''
        description: comment files
        param src: Operate file name
        '''
        # Does the file exist?
        if not os.path.exists(src):
            print 'Error: file - %s doesn\'t exist.' % src
            return False
        # Symbolic links are skipped (Step 3).
        if os.path.islink(src):
            print 'Error: file - %s is just a link, will not handle it.' % src
            return False
        # Only files with an accepted extension are processed (Step 2).
        filetype = (os.path.splitext(src))[1]
        if not filetype in ['.c', '.h']:
            return False
        # Make the file writable if it is not.
        try:
            if not os.access(src, os.W_OK):
                os.chmod(src, 0664)
        except:
            print 'Error: you can not change %s\'s mode.' % src
        try:
            inputf = open(src, 'r')
            outputfilename = src + '.tmp'
            outputf = open(outputfilename, 'w')
            beginLine = 0
            endLine = 100000000
            isMatched = False
            #-----find the beginLine and endLine -------------------
            for eachline in fileinput.input(src):
                if eachline.find('/*') >= 0:
                    beginLine = fileinput

(The listing breaks off at this point.)
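The listing opens a second file named src + '.tmp' for its output. One way that write-then-replace pattern could be completed is sketched below. It is written for Python 3 (the listing itself is Python 2), rewrite_file is a hypothetical helper that is not part of comment.py, and the transform argument stands for whatever function removes the comments from the text. Reading as latin-1 is an assumption made so that source files in an unknown encoding round-trip byte for byte.

```python
import os
import shutil


def rewrite_file(path, transform):
    """Apply transform(text) -> text to the file; return True if it changed."""
    # latin-1 maps every byte to exactly one character, and newline=""
    # keeps the original line endings untouched.
    with open(path, "r", encoding="latin-1", newline="") as src:
        original = src.read()
    updated = transform(original)
    if updated == original:
        return False  # nothing to delete; leave the file alone
    tmp_path = path + ".tmp"
    try:
        with open(tmp_path, "w", encoding="latin-1", newline="") as dst:
            dst.write(updated)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the new file into place
    finally:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return True
```

Writing to a temporary file first means that a crash halfway through never leaves a half-written source file behind.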

