
Python -- Get started

CPython

Build

On a 2019 MacBook Pro:

MACOSX_DEPLOYMENT_TARGET=10.15 ./configure --with-pydebug
bear -- make -j -s

On a 2022 MacBook with an M1 chip, I ran into a problem with OpenSSL, which I resolved by following this post:

CFLAGS="-O0" ./configure --with-pydebug --with-openssl=$(brew --prefix openssl)
bear -- make -j -s

I explicitly specified -O0 to disable optimization because I found that --with-pydebug only enables debugging and does not disable optimization. I use bear to generate the compilation database. Note: only use bear for the initial build. On an incremental build, bear overwrites the compilation database with entries for only the files that were recompiled, so just run plain make for incremental builds.
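
To double-check which flags a build actually ended up with, the freshly built interpreter can report its own configuration. The following is a minimal sketch, assuming a POSIX build where sysconfig exposes Py_DEBUG and the compiler flag variables; the helper name build_flags_summary is made up for illustration.

```python
import sys
import sysconfig


def build_flags_summary():
    """Report whether this interpreter is a debug build and which -O flags it recorded."""
    # Py_DEBUG is 1 for --with-pydebug builds; it can be None on platforms
    # where sysconfig has no Makefile data (for example Windows).
    py_debug = bool(sysconfig.get_config_var("Py_DEBUG"))
    # sys.gettotalrefcount only exists in debug builds, so it is a second signal.
    has_total_refcount = hasattr(sys, "gettotalrefcount")
    # Collect every optimization flag recorded in the compiler flag variables.
    opt_flags = []
    for name in ("CFLAGS", "PY_CFLAGS", "OPT"):
        value = str(sysconfig.get_config_var(name) or "")
        opt_flags.extend(tok for tok in value.split() if tok.startswith("-O"))
    return {
        "py_debug": py_debug,
        "has_total_refcount": has_total_refcount,
        "opt_flags": opt_flags,
    }


if __name__ == "__main__":
    print(build_flags_summary())
```

Saved to a file and run with the built binary (./python.exe on macOS), it should show whether -O0 really made it into the recorded flags.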

Run CPython inside gdb

Follow this post, https://hackmd.io/@klouielu/ByMHBMjFe?type=view, to get started with debugging CPython in gdb.

  • sudo gdb ./python.exe
  • set startup-with-shell off

GDB is not available on the M1 MacBook, so I use lldb instead.

lldb -- ./python.exe ~/tmp/test.py

I tried the cpython-lldb package, but I do not think it works with the latest version of CPython: I ran into weird behavior when printing dictionaries. If you want to try it out, you need to get familiar with lldb's scripting feature. First, we need to know which version of Python lldb links against.

$ otool -L /Applications/Xcode.app//Contents/SharedFrameworks/LLDB.framework/LLDB | grep -i pytho
        @rpath/Python3.framework/Versions/3.9/Python3 (compatibility version 3.9.0, current version 3.9.0)
        @rpath/Python3.framework/Versions/3.9/Python3 (compatibility version 3.9.0, current version 3.9.0)

lldb's scripting is tied to that Python version (3.9 here); we cannot use a different one.

However, I realized that in most cases I do not need to get into Python scripting at all. The expr command is powerful enough to inspect the local state.

The example Python code to debug is:

class A:
    def __init__(self, x = 5, y = 10):
        self.x = x
        self.y = y

    def foo(self):
        print(self.x)

    def __repr__(self):
        return f"<A>({self.x}, {self.y})"


a = A(5)
a.x

Example 1: print repr(obj)

(lldb) expr PyUnicode_AsUTF8(v->ob_type->tp_repr(v))
(const char *) $18 = 0x00000001009bf430 "<A>(5, 10)"

Example 2: print obj.__dict__

(lldb) expr -- PyObject* $xx = PyObject_GenericGetDict(v, 0)
(lldb) expr -- PyObject* $yy = ($xx)->ob_type->tp_repr($xx)
(lldb) expr PyUnicode_AsUTF8($yy)
(const char *) $2 = 0x00000001009bf3d0 "{'x': 5, 'y': 10}"

Example 3: get attribute

(lldb) expr -- PyObject* $rr = PyObject_GetAttr(v, PyUnicode_FromString("y"))
(lldb) expr PyUnicode_AsUTF8($rr->ob_type->tp_repr($rr))
(const char *) $21 = 0x000000010097b4f0 "10"
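
For orientation, the three lldb expressions above correspond roughly to the following Python-level operations on the same example object. This is only a sketch of the correspondence, not what lldb runs; the variable names repr_text, dict_text and attr_text are made up for illustration.

```python
class A:
    def __init__(self, x=5, y=10):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"<A>({self.x}, {self.y})"


a = A(5)

# Example 1: v->ob_type->tp_repr(v) calls the repr slot of the object's type.
repr_text = type(a).__repr__(a)

# Example 2: PyObject_GenericGetDict(v, 0) returns the instance __dict__,
# which is then passed through the dict type's own repr slot.
dict_text = repr(a.__dict__)

# Example 3: PyObject_GetAttr(v, "y") is ordinary attribute lookup.
attr_text = repr(getattr(a, "y"))

print(repr_text)
print(dict_text)
print(attr_text)
```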

Debug

There are multiple ways to set breakpoints in Python. PEP 553 introduced the built-in function breakpoint() in Python 3.7. For older versions, you can do

import code; code.interact(local={**globals(), **locals()})

or

import IPython; IPython.embed()

or

export PYTHONBREAKPOINT=IPython.embed

if IPython is installed.
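
The reason PYTHONBREAKPOINT works is that breakpoint() does not call pdb directly: it calls sys.breakpointhook(), and the default hook is what consults the environment variable. The sketch below swaps in a hypothetical recording_hook, so that it runs without a terminal, to show the dispatch.

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    """Hypothetical stand-in for a debugger: just record that it was invoked."""
    calls.append((args, kwargs))


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    # breakpoint() forwards its arguments to whatever sys.breakpointhook is.
    breakpoint()
finally:
    # Restore the default hook (the one that honors PYTHONBREAKPOINT).
    sys.breakpointhook = original_hook

print(len(calls))
```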

For pytest, you can use the command below to drop into an IPython shell automatically.

pytest --pdbcls=IPython.terminal.debugger:TerminalPdb

For multithreaded programs, neither pdb nor IPython.embed works well. Suppose you add a breakpoint at the same location in 3 threads. Once the breakpoint is hit, 3 pdb instances are created, and when you type a command in one pdb console and hit enter, it jumps to the pdb instance in another thread, which is very confusing. IPython.embed is worse in this case: it throws an exception, gets stuck, and needs kill -9 to stop it. The community has been asking for a way to stop all threads for a long time. See this Stack Overflow answer.

One way to work around this issue is to use a global thread lock.

import threading
_lock = threading.Lock()
...

with _lock:
  breakpoint()
  assert 1 == 1
...

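The assert 1 == 1 after breakpoint() gives the debugger a line to stop on inside the with block, so the thread does not leave the lock before the debugger takes over.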
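
To see the effect of the lock without an interactive terminal, here is a small sketch that replaces the debugger with a hypothetical fake_debugger hook and counts how many "sessions" are open at the same time. The names fake_debugger, worker and run_demo are made up for illustration; with the lock in place the peak should be one session.

```python
import sys
import threading
import time

_lock = threading.Lock()          # the global lock from the snippet above
_counter_lock = threading.Lock()  # protects the bookkeeping below
_active = 0
_max_active = 0


def fake_debugger(*args, **kwargs):
    """Stand-in for pdb: track how many sessions are open simultaneously."""
    global _active, _max_active
    with _counter_lock:
        _active += 1
        _max_active = max(_max_active, _active)
    time.sleep(0.01)  # pretend the user is typing at the prompt
    with _counter_lock:
        _active -= 1


def worker():
    with _lock:
        breakpoint()
        assert 1 == 1


def run_demo(thread_count=3):
    """Run the workers and return the peak number of simultaneous sessions."""
    global _active, _max_active
    _active = _max_active = 0
    original_hook = sys.breakpointhook
    sys.breakpointhook = fake_debugger
    try:
        threads = [threading.Thread(target=worker) for _ in range(thread_count)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        sys.breakpointhook = original_hook
    return _max_active


if __name__ == "__main__":
    print(run_demo())
```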

Attach to a python process

gdb has supported a Python API since gdb 7.0, but it must be configured with --with-python. We can use gdb --configuration to check whether the Python API is supported. See this post for more details.

Using Ubuntu as an example, you only need to run apt-get install gdb python3-dbg, and then you can attach to a running Python program with gdb python <pid>. In my case, however, our program runs on python3.8 while the system Python is python3.9, so apt-get install python3-dbg actually installs python3.9-dbg. As a result, gdb cannot find the Python stack trace, as shown below:

(gdb) py-bt
Unable to locate python frame

pyrasite is also very helpful for attaching to a running Python process.

Pyrasite’s implementation is interesting. Basically, it uses PyRun_SimpleString.

(gdb) call PyGILState_Ensure()
$1 = PyGILState_UNLOCKED
(gdb) call PyRun_SimpleString("print(5+10); print(1000)")
(gdb) set $f = (FILE*)(fopen("xiong2.py", "r"))
(gdb) print $f
$19 = (FILE *) 0x5630fc64f370
(gdb) call PyRun_SimpleFile($f, "xiong2.py")
(gdb) call fclose($f)
(gdb) call PyGILState_Release(1)

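For the enum values, see https://github.com/python/cpython/blob/878ead1ac1651965126322c1b3d124faf5484dc6/Include/pystate.h#L77. The 1 passed to PyGILState_Release is PyGILState_UNLOCKED, the value returned by PyGILState_Ensure above.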
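
The same three C API calls can be tried from inside a running interpreter through ctypes, which makes it easier to experiment before doing it from gdb. This is a sketch, not pyrasite's actual code; the helper name run_injected is made up. Note that ctypes.pythonapi already holds the GIL when it calls into the C API, so the Ensure/Release pair here only mirrors the gdb session.

```python
import ctypes
import sys

api = ctypes.pythonapi

# Declare the signatures of the C API functions used in the gdb session.
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Release.argtypes = [ctypes.c_int]
api.PyGILState_Release.restype = None
api.PyRun_SimpleString.argtypes = [ctypes.c_char_p]
api.PyRun_SimpleString.restype = ctypes.c_int


def run_injected(source):
    """Run source in the __main__ namespace; return 0 on success, -1 on error."""
    state = api.PyGILState_Ensure()
    try:
        return api.PyRun_SimpleString(source.encode("utf-8"))
    finally:
        # Hand back exactly the state that Ensure returned.
        api.PyGILState_Release(state)


if __name__ == "__main__":
    status = run_injected("injected_result = 5 + 10")
    print(status, sys.modules["__main__"].injected_result)
```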

Python syntax

Class

The Python grammar defines a class definition as

class_def:
    | decorators class_def_raw
    | class_def_raw

class_def_raw:
    | 'class' NAME ['(' [arguments] ')' ] ':' block

arguments:
    | args [','] &')'

args:
    | ','.(starred_expression | ( assignment_expression | expression !':=') !'=')+ [',' kwargs ]
    | kwargs

So basically, class can be defined as

class A(B, C, k1=v1, k2=v2):
    ...

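What can B, C, k1 and k2 be? The answer is in the function builtin___build_class__. Basically, B and C are the base classes (superclasses) of A. Among the keywords, metaclass selects the metaclass; any other keywords such as k1 and k2 are passed to the metaclass, first to its __prepare__ (which creates the class namespace) and then to the metaclass call that creates the class.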
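
A small sketch of how the bases and keywords travel through class creation. The names Meta, seen_by_prepare and options are made up for illustration. By default the extra keywords end up in __init_subclass__ of the parent class, which is why B accepts them here.

```python
class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Called first: builds the namespace the class body executes in.
        # The metaclass keyword itself is not included in kwargs.
        return {"seen_by_prepare": sorted(kwargs)}


class B:
    def __init_subclass__(cls, **kwargs):
        # type.__new__ forwards the extra class keywords here by default.
        super().__init_subclass__()
        cls.options = kwargs


class C:
    pass


class A(B, C, metaclass=Meta, k1="v1", k2="v2"):
    pass


print(A.__bases__)
print(type(A))
print(A.seen_by_prepare)
print(A.options)
```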

Name mangling

Check out this
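
As a quick illustration of the rule: inside a class body, an identifier that starts with two underscores and does not end with two underscores is rewritten to include the class name. The classes Account and Savings below are made up for this sketch.

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance  # stored as _Account__balance

    def balance(self):
        return self.__balance     # also compiled as _Account__balance


class Savings(Account):
    def __init__(self, balance):
        super().__init__(balance)
        # Stored as _Savings__balance, so it does not clobber the parent's value.
        self.__balance = -1


acct = Savings(100)
print(vars(acct))
print(acct.balance())
```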

Tools

pyright

pyright is a Python language server. It is also a command-line tool for type checking.

To make it work with a Python virtualenv, you need to add the config below.

"venvPath": "/opt/homebrew/Caskroom/miniforge/base/envs/",
"venv": "website-py3_11_0",
"extraPaths": ["./gen-py/"],

The venvPath config above can also be supplied with the command-line argument -v /opt/homebrew/Caskroom/miniforge/base/envs.
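
The three keys above are a fragment, not a complete file. Assuming they sit in the top-level object of a pyrightconfig.json at the project root (an assumption, since the post does not say where they go), a complete file can be generated like this. The helper name render_config is made up for illustration.

```python
import json

# Values copied from the fragment above; placing them in the top-level
# object of pyrightconfig.json is the assumption made by this sketch.
pyright_config = {
    "venvPath": "/opt/homebrew/Caskroom/miniforge/base/envs/",
    "venv": "website-py3_11_0",
    "extraPaths": ["./gen-py/"],
}


def render_config(config):
    """Serialize the settings as the JSON text of a config file."""
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Redirect the output into pyrightconfig.json to use it.
    print(render_config(pyright_config), end="")
```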

Misc

conda, conda-forge and miniforge

conda is a package manager, while conda-forge is a channel. Conda's default channel may require a paid license in some commercial use cases.

miniforge-installed conda is the same as Miniconda-installed conda, except that it uses the conda-forge channel (and only the conda-forge channel) as the default channel.


This post is licensed under CC BY 4.0 by the author.