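A team was writing some Python code that involves child processes and needs to get rid of zombie processes. In practice they ran into the Python program exiting unexpectedly. At first they decided to debug the Python interpreter with GDB and trace where exit() was being called from. When I heard this, I felt the problem called for a different debugging approach. While helping them troubleshoot this failure, further problems came up beyond the original one.
Compared with the Python signal-handling problem that the Xi'an R&D center once ran into, this one shares a good deal of basic and prior knowledge, which I won't go over again here. Interested readers can look through the articles I have posted before.
What follows is a record of one concrete debugging and analysis session. To simplify the scene and make debugging easier, the original problem and the derived problems have been condensed into DebugPythonWithGDB_6.py and DebugPythonWithGDB_7.py.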
$ vi DebugPythonWithGDB_6.py
    #!/usr/bin/env python
    # -*- encoding: utf-8 -*-

    import sys, os, signal, subprocess, shlex, traceback

    def on_SIGCHLD ( signum, frame ) :
        print "[on_SIGCHLD"
        sys.stdout.write( "signum  = %u\n" % signum )
        traceback.print_stack( frame )
        print os.waitpid( -1, os.WNOHANG )
        """
        try :
            print os.waitpid( -1, os.WNOHANG )
        except OSError :
            sys.stdout.write( 'Line[%u]: OSError\n' % sys.exc_info()[2].tb_lineno )
        """
        print "on_SIGCHLD]"

    def do_more ( count ) :
        print '[do_more() begin %u]' % count
        os.system( r'printf "Child   = %u\n" $$;/bin/sleep 1' )
        """
        #
        # There is a race condition here; it can raise the odds of triggering the OSError exception
        #
        os.system( r'printf "Child   = %u\n" ...
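The commented-out try/except in on_SIGCHLD guards against os.waitpid() raising OSError. One plausible source of that error is that os.system() waits for its own child, so by the time the handler runs there may be nothing left to reap (ECHILD). The sketch below is not the team's code. It is a minimal illustration, written for Python 3 (the listing above is Python 2), of a handler that reaps in a loop and tolerates that case. The names `on_sigchld`, `reaped` and `spawn_and_reap` are hypothetical. In Python 2 the equivalent would be catching OSError and checking for errno.ECHILD.

```python
import os
import signal
import time

reaped = []  # (pid, raw status) pairs collected by the handler


def on_sigchld(signum, frame):
    """Reap every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: this process has no children left to wait for.
            return
        if pid == 0:
            # Children still exist, but none of them has exited yet.
            return
        reaped.append((pid, status))


def spawn_and_reap(exit_code, timeout=5.0):
    """Fork a child that exits at once; return (child pid, reaped list)."""
    del reaped[:]
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        child = os.fork()
        if child == 0:
            os._exit(exit_code)  # child: leave immediately, skip cleanup
        deadline = time.time() + timeout
        while not reaped and time.time() < deadline:
            time.sleep(0.05)  # the handler runs while we wait here
        return child, list(reaped)
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    child, results = spawn_and_reap(3)
    for pid, status in results:
        print("reaped pid=%d exit=%d" % (pid, os.WEXITSTATUS(status)))
```

Looping until waitpid() returns pid 0 or raises matters because several children exiting close together may be reported by a single SIGCHLD delivery.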
    $ python DebugPythonWithGDB_6.py 'python -c "import time;time.sleep(3600)"'
    Parent = 10244
    Child   = 10245
    [do_more() begin 0]
    [on_SIGCHLD
    signum  = 17
      File "DebugPythonWithGDB_6.py", line 81, in <module>
        main( os.path.basename( sys.argv[0] ), sys.argv[1:] )
      File "DebugPythonWithGDB_6.py", line 76, in main
        do_more( count )
      File "DebugPythonWithGDB_6.py", line 20, in do_more
        print '[do_more() begin %u]' % count
    (10245, 9)
    on_SIGCHLD]
    Child   = 10246
    [on_SIGCHLD
    signum  = 17
      File "DebugPythonWithGDB_6.py", line 81, in <module>
        main( os.path.basename( sys.argv[0] ), sys.argv[1:] )
      File "DebugPythonWithGDB_6.py", line 76, in main
        do_more( count )
      File "DebugPythonWithGDB_6.py", line 21, in do_more
    ...
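In the run above, os.waitpid() printed (10245, 9). The second element is the raw wait status, not an exit code. On Linux a raw status of 9 decodes as "terminated by signal 9", i.e. SIGKILL. The following Python 3 sketch shows one way to decode such a value. `describe_status` and `signal_name` are hypothetical helper names, not part of the original script, and the signal name shown assumes Linux signal numbering.

```python
import os
import signal


def signal_name(signum):
    """Return the symbolic name of a signal number, if Python knows it."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "unknown"


def describe_status(status):
    """Turn a raw waitpid() status into a readable description."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        return "killed by signal %d (%s)" % (signum, signal_name(signum))
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "other status %d" % status


if __name__ == "__main__":
    # 9 is the raw status reported for pid 10245 in the transcript.
    print(describe_status(9))
```

Decoding the status this way separates a child that was killed from one that exited on its own, which the bare tuple does not make obvious.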