Tech diary

Some random thoughts or learnings.

2024-12-05 Thu

I always come across real models. For example, today: https://2023.splashcon.org/details/iwaco-2023-papers/5/Borrow-checking-Hylo. Stay focused and consistent!

2024-10-06 Sun

I came across this repo https://github.com/libusual/libusual while digging into pgbouncer. It is a super cool toolbox. Just leaving a note here. I will read more about it.

2024-07-29 Mon

In Python, a traceback and a stack trace are two different concepts. A traceback is an object that records the frames between the place an exception is caught and the place it was raised. A stack trace is the chain of call frames from the current location back up to the main function.

The logging library supports both: exc_info=True for the traceback and stack_info=True for the stack trace. Also, logging.exception(...) sets exc_info=True by default.
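A minimal sketch of the difference, using only the standard logging module. The logger name "diary_demo" and the helper functions are made up for illustration; the output is captured in a StringIO so it can be inspected.

```python
import io
import logging


def make_logger(stream):
    # Hypothetical logger name; a fresh handler writes into the given stream.
    logger = logging.getLogger("diary_demo")
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def risky():
    raise ValueError("boom")


def demo():
    stream = io.StringIO()
    logger = make_logger(stream)
    try:
        risky()
    except ValueError:
        # exc_info=True appends the traceback of the exception being handled.
        logger.error("with exc_info", exc_info=True)
        # logger.exception(...) is an ERROR-level log with exc_info=True.
        logger.exception("via logger.exception")
    # stack_info=True appends the call stack leading to this logging call,
    # whether or not an exception is involved.
    logger.info("with stack_info", stack_info=True)
    return stream.getvalue()


if __name__ == "__main__":
    print(demo())
```

The two exception logs print a "Traceback (most recent call last):" section, while the last one prints a "Stack (most recent call last):" section.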

2024-07-23 Tue

Today I learned a very interesting technique for Remote Code Execution attacks. See link. Basically, kafka-ui has this vulnerability. We can use https://www.revshells.com/ to auto-generate the reverse shell code. It is fantastic.

2024-06-17 Mon

“You have to win the roadmap.” – Lisa Su

2024-06-12 Wed

TIL: https://github.com/rr-debugger/rr

2024-06-10 Mon

TIL: popcnt is a CPU instruction that counts the number of 1’s in a binary word. C++20 has a function for it, std::popcount: https://en.cppreference.com/w/cpp/numeric/popcount.
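To make the operation concrete, here is a small software version in Python. This is only a sketch of what the instruction computes (using Kernighan's clear-the-lowest-set-bit trick), not how the hardware does it; the function name is my own.

```python
def popcount(x: int) -> int:
    """Count the 1 bits in a non-negative integer."""
    if x < 0:
        raise ValueError("popcount is only defined here for non-negative ints")
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count


if __name__ == "__main__":
    for value in (0, 0b1011, 255, 2**64 - 1):
        # bin(value).count("1") is a slower but obvious cross-check.
        print(value, popcount(value), bin(value).count("1"))
```

On Python 3.10 and later, int.bit_count() does the same thing natively.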

2024-05-31 Fri

After upgrading macOS, my Nerd Fonts stopped working. After a quick search online, I found the problem was in iTerm2: settings -> profiles -> default -> Text -> font -> Hack Nerd Font. Maybe I should spend some time learning the internals of how fonts work inside a terminal emulator.

2024-05-15 Wed

I just watched a video from Jeff Dean about ML, and wrote down the papers mentioned below. Hopefully, I will have a chance to read them sometime in the future.

2024-05-12 Sunday

This is a good blog about heap analysis. http://jam-bazaar.blogspot.com/2009/11/memory-debugging-with-meliae.html

2024-04-28 Sunday

A good blog about the Golang runtime: https://zboya.github.io/post/go_scheduler/#runtimemain-%E7%9A%84%E6%89%A7%E8%A1%8C

2024-03-07 Thursday

A painful lesson about k8s: https://github.com/kubernetes/kubernetes/issues/43916

2024-1-23 Tuesday

I had bad luck after installing MySQL on my MacBook. I kept getting this error:

$ mysql -uroot
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)

I understand I can give it an explicit host, -h 127.0.0.1, to force it to use a TCP socket instead of the Unix socket, but I wanted to figure out why. After a few trials, I made it work by doing two things:

$ sudo rm -rf /opt/homebrew/var/mysql
$ brew postinstall mysql

This postinstall command was new to me. Then I realized that a brew formula has a post_install block. Ah! It needs some initialization work.

TIL.

2024-1-21 Sunday

Very nice compiler material https://c9x.me/compile/

2024-1-8 Monday

I recently read a blog post titled “Go: What We Got Right, What We Got Wrong” and the related comments on HN: https://news.ycombinator.com/item?id=38872362&p=2. The comments are quite insightful. I learned a new technical term: structural typing.

2023-12-24 Sun

I watched a few videos from Prof. Andy. I feel that the two most difficult problems in RDBMS are query optimization and concurrency control.

2023-12-16 Sat

I need code folding in my blog, and this post https://github.com/cotes2020/jekyll-theme-chirpy/issues/436 perfectly solved my problem.

2023-12-15 Fri

This is really a nice series about Mysql Internals https://www.zhihu.com/column/c_1453135878944567296

2023-12-10 Sun

When I was watching Prof. Andy’s CMU class talk Query Execution 1, one part struck me:

Expression trees are flexible but slow. JIT compilation can (sometimes) speed them up.

This is very interesting. This reminds me of the days when we wrote the execution engine for the matching service DSL. I should definitely spend some time reading Postgres JIT.

2023-11-30 Thu

TIL: parquet-tools

2023-11-25 Sat

I finally set up my tech blog https://dingxiong.github.io/

2023-11-24 Fri

Tmux on macOS drove me crazy. It reorders the PATH variable. See https://superuser.com/a/583502 .

2023-11-21 Tue

I finally understand Python async/await.

2023-10-26 Thu

TIL: how to calculate C_n^k % p: https://codeforces.com/blog/entry/78873
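For my own reference, one common way to do this when p is prime and n < p is to precompute factorials and inverse factorials, using Fermat's little theorem for the modular inverse. This is a sketch of that approach, not necessarily the one in the linked post; the function names and the default modulus are my own choices.

```python
MOD = 1_000_000_007  # a prime; assumed for illustration


def build_tables(limit, mod=MOD):
    """Return (fact, inv_fact) for 0..limit. Requires mod prime and limit < mod."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % mod
    inv_fact = [1] * (limit + 1)
    # Fermat's little theorem: a^(mod-2) is the inverse of a modulo a prime.
    inv_fact[limit] = pow(fact[limit], mod - 2, mod)
    for i in range(limit, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % mod
    return fact, inv_fact


def binom_mod(n, k, fact, inv_fact, mod=MOD):
    """C(n, k) modulo mod, using precomputed tables."""
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % mod * inv_fact[n - k] % mod


if __name__ == "__main__":
    fact, inv_fact = build_tables(100)
    print(binom_mod(10, 5, fact, inv_fact))
```

When n can exceed p, this table approach breaks down because n! becomes divisible by p, and something like Lucas's theorem is needed instead.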

2023-10-16 Mon

LC 1388 blows my mind. I should make it an exercise for my kid in the future.

2023-10-11 Wed

I learnt quite a few things about Java today: the C1 and C2 compilers, ZGC, etc. Most of the useful material can be found at https://openjdk.org/jeps .

2023-10-07 Sat

TIL: asof join.
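As I understand it, an asof join matches each row on the left with the most recent row on the right whose timestamp is not later than the left row's. A minimal sketch in plain Python, assuming both inputs are lists of (timestamp, value) tuples and the right side is sorted by timestamp; the sample trades and quotes are made up.

```python
from bisect import bisect_right


def asof_join(left, right):
    """For each (ts, value) in left, attach the value of the latest right row
    with timestamp <= ts, or None if there is no such row.
    `right` must be sorted by timestamp."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, left_value in left:
        # Index of the first right timestamp strictly greater than ts.
        i = bisect_right(right_ts, ts)
        right_value = right[i - 1][1] if i else None
        joined.append((ts, left_value, right_value))
    return joined


if __name__ == "__main__":
    trades = [(1, "a"), (5, "b"), (10, "c")]
    quotes = [(2, 100), (5, 101), (7, 102)]
    print(asof_join(trades, quotes))
```

Real engines also let you join on an equality key first (for example, a ticker symbol) and then apply the asof match within each key.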

TIL: While reading the ClickHouse documentation, I found a good blog post series from Tilak: Vectorized and Compiled Queries.

2023-09-30 Sat

TIL: Java playground https://dev.java/playground/

TIL: Java virtual threads https://inside.java/2021/05/10/networking-io-with-virtual-threads/ This is interesting because I hate async/await. Virtual threads are as easy to use as Golang goroutines, and like goroutines they are stackful: each one keeps its own stack, which the JVM stores on the heap. I am kind of liking Java again!

2023-09-14 Thu

TIL: the meaning of each field inside /proc/meminfo.
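A small parser helps when poking at these fields. This is a sketch that parses meminfo-style text into bytes; the sample numbers below are made up, and on a real Linux box you would pass in the contents of /proc/meminfo instead. Note that the "kB" unit in this file means 1024 bytes.

```python
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
Buffers:          512000 kB
Cached:          4096000 kB
HugePages_Total:       0
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.
    Values with a kB unit are converted to bytes; unitless values are kept as-is."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    # MemAvailable is the kernel's estimate of memory usable without swapping.
    print(info["MemAvailable"] / info["MemTotal"])
```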

2023-09-02 Sat

I spent almost one week studying the source code of Debezium and learnt a lot of details about it. Then I asked myself: if I need to read the source code of every tool I use, I will burn out. I admit that reading source code helps me in the long run, but I do not have the energy because I have a little baby. I would rather spend more time with family than on self-improvement.

2023-08-29 Tue

TIL: kafka advertised.listeners from this post https://stackoverflow.com/questions/52996028/accessing-local-kafka-from-within-services-deployed-in-local-docker-for-mac-inc

2023-08-22 Tue

TIL: a good sql parser in Java https://github.com/apache/calcite

2023-08-16 Wed

I always feel overwhelmed by the new Apache projects in the data area. Today I read a great post https://www.clouddatainsights.com/real-time-olap-databases-and-streaming-databases-a-comparison/ that clears up some of my confusion.

2023-08-10 Thu

Uff. I spent almost one week diving deep into TLS and finally got the blog post out.

2023-07-30 Sun

I was bored yesterday and tried to dig a bit more into distributed databases, so I found a YouTube video titled The Design & Architecture of CockroachDB Beta. Before watching it, I had a few questions at the top of my mind:

  1. How is strong consistency achieved? We are all familiar with the master-slave architecture, but in that architecture only the master node is the source of truth. A distributed db should not use it, because we should not always route the traffic to the same node. So how does CockroachDB solve this issue?
  2. How are distributed transactions achieved? Two-phase commit?
  3. I had already heard that CockroachDB is built on top of RocksDB. How does it implement a SQL layer on top of a KV store? And especially, how are secondary indexes constructed?
  4. Benchmark data compared to MySQL/PostgreSQL.

After watching the video, I think my first question is resolved. It is based on the Raft consensus algorithm and data replication. Basically, each write replicates data to multiple nodes, and only after a majority of the node group acknowledges it does the write return to the client. In this way, we have multiple sources of truth, and any node that holds a replica can answer a strongly consistent query. Now I am more curious about the performance, but this video does not answer my fourth question.
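To pin down the "majority acknowledges" idea for myself, here is a toy sketch. It is not Raft: there is no leader election, no terms, and no log matching. The Replica class and function names are hypothetical, and it only shows the quorum rule for deciding when a write can be reported as done.

```python
class Replica:
    """A toy replica that stores log entries when it is reachable."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []

    def append(self, entry):
        # Returns True (an acknowledgement) only if the replica is reachable.
        if not self.up:
            return False
        self.log.append(entry)
        return True


def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def replicated_write(replicas, entry):
    """Send the entry to every replica and report success only if a
    majority acknowledged it. In a real system an entry that misses the
    quorum stays uncommitted; this toy version just returns False."""
    acks = sum(1 for replica in replicas if replica.append(entry))
    return acks >= majority(len(replicas))


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c", up=False)]
    print(replicated_write(nodes, "x=1"))
```

With three replicas, the write survives one node being down but not two, which is why these groups are usually sized at 3 or 5.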

The video mentions how transactions are implemented, but only briefly. My first intuition is that this process has many race conditions that can break a transaction. Also, it does not give many details about isolation levels. My third question is briefly explained too, but I still do not know how an index is constructed or how it performs.

Anyway, I need to read more.

I am back after surfing the internet for the whole afternoon! There are quite a few blogs and videos discussing transactions in CockroachDB, but none of them show the details I want, so I was still lost until I read this paper: RAFT based Key-Value Store with Transaction Support. I admit that I love short papers! This paper is not about CockroachDB, but I believe CockroachDB uses the same idea. Basically, it is Raft + 2PC. The transaction coordinators form a Raft group. The data shards form a group of Raft groups. The transaction coordinator leader uses the 2PC (two-phase commit) algorithm to coordinate the data shard leaders to commit a transaction.
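A toy sketch of the 2PC part, to remind myself of the two phases. Everything here is hypothetical: the Shard class stands in for a data shard leader, and the Raft replication behind the coordinator and each shard is left out entirely, as are timeouts and crash recovery.

```python
class Shard:
    """Stand-in for a data shard leader taking part in a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.staged = None  # writes accepted in the prepare phase
        self.data = {}      # committed key-value pairs

    def prepare(self, writes):
        # Phase 1: vote yes only if this shard can apply the writes.
        if not self.can_commit:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (commit): make the staged writes visible.
        self.data.update(self.staged or {})
        self.staged = None

    def abort(self):
        # Phase 2 (abort): throw the staged writes away.
        self.staged = None


def two_phase_commit(plan):
    """plan is a list of (shard, writes). Commit everywhere or nowhere."""
    prepared = []
    for shard, writes in plan:
        if shard.prepare(writes):
            prepared.append(shard)
        else:
            for other in prepared:
                other.abort()
            return False
    for shard, _ in plan:
        shard.commit()
    return True


if __name__ == "__main__":
    s1, s2 = Shard("s1"), Shard("s2")
    print(two_phase_commit([(s1, {"a": 1}), (s2, {"b": 2})]), s1.data, s2.data)
```

The hard part that this sketch skips is what happens when the coordinator dies between the two phases, which is presumably why the coordinators form their own Raft group.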

So far, we have figured out how distributed transactions work. The paper also provides some benchmark data. A write transaction can take up to 5 seconds under heavy load! I am more curious to see CockroachDB’s benchmarks. We are still left with the 3rd and 4th questions.

OK. After digging around for some more time, it is now 11:41pm PST. I found this post from the CockroachDB team. It is well written. Now I understand how data is stored in the KV store and how secondary indices are implemented. But how about joins?
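To capture the general idea, here is a rough sketch of mapping table rows and a secondary index onto a key-value store. The key layout and function names are my own simplification for illustration, not CockroachDB's actual encoding (a real system uses an order-preserving binary encoding and a sorted store, while this uses readable strings and a dict).

```python
def insert(kv, table, pk, row, indexed_cols):
    """Store one row: one key per column under the primary key,
    plus one key per secondary index entry pointing back at the pk."""
    for col, val in row.items():
        kv[f"/{table}/primary/{pk}/{col}"] = val
    for col in indexed_cols:
        # The index entry carries everything in the key; the value is empty.
        kv[f"/{table}/{col}_idx/{row[col]}/{pk}"] = None


def lookup_by_index(kv, table, col, value):
    """Prefix scan over the index keys, returning matching primary keys."""
    prefix = f"/{table}/{col}_idx/{value}/"
    return [key[len(prefix):] for key in sorted(kv) if key.startswith(prefix)]


def get_row(kv, table, pk):
    """Prefix scan over the primary keys to rebuild one row."""
    prefix = f"/{table}/primary/{pk}/"
    return {key[len(prefix):]: val for key, val in kv.items() if key.startswith(prefix)}


if __name__ == "__main__":
    kv = {}
    insert(kv, "users", "1", {"name": "ann", "city": "sf"}, ["city"])
    insert(kv, "users", "2", {"name": "bob", "city": "nyc"}, ["city"])
    insert(kv, "users", "3", {"name": "cat", "city": "sf"}, ["city"])
    for pk in lookup_by_index(kv, "users", "city", "sf"):
        print(get_row(kv, "users", pk))
```

A query on an indexed column becomes a prefix scan over the index keys followed by point lookups on the primary keys.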

Btw, I came across quite a few databases I probably need to dig deep into.

  • RocksDB
  • BadgerDB
  • BoltDB. This is archived. There is a Hacker News post saying that the author of BadgerDB was disappointed by the performance of BoltDB and so created BadgerDB.
  • pebble
  • TiKV

2023-06-20 Tue

TIL: https://wg21.link/index.html is the place to search all C++ Standards Committee papers. It is really a wonderful place for me to dig into the history of some topics.

2023-06-15 Thu

TIL: LLVM does have its own C++ standard library, libc++, but its libc is not ready to use yet. So Clang uses whatever libc the OS provides: glibc, Apple's libc, musl, etc.

See more details in this post. https://stackoverflow.com/questions/59019932/what-standard-c-library-does-clang-use-glibc-its-own-or-some-other-one

Also learnt the difference between abort and exit: https://stackoverflow.com/questions/397075/what-is-the-difference-between-exit-and-abort

2023-06-10 Sat

TIL: https://github.com/haoNoQ/clang-analyzer-guide/releases

2023-06-07 Wed

Last time I said the LLVM static analyzer would be my next focus. But I found it so hard. I spent two days reading Prof. Anders Møller’s book Static Program Analysis and some accompanying videos, called “DECA I”, on YouTube. But I still found it not easy to follow.

Meanwhile, Prof. Claire Le Goues’s course Program Analysis seems quite interesting. I think it is better to learn it by doing projects.

2023-06-04 Sun

LLVM static analyzer will be my next focus.

2023-06-01 Thu

TIL:

The final period or comma goes inside the quotation marks, even if it is not a part of the quoted material, unless the quotation is followed by a citation. If a citation in parentheses follows the quotation, the period follows the citation.

Grammarly helps!

2023-05-27 Sat

TIL: Split Infinitives

I started using Grammarly, and it flagged a potential mistake in a phrase like "to easily use". According to https://www.niu.edu/writingtutorial/grammar/split-infinitives.shtml,

An infinitive is a verb preceded by the word to: (to write, to examine, to take, to cooperate). When an adverb appears between to and the verb itself, we get a split infinitive.

2023-05-24 Wed

TIL: The Inverted Pyramid Structure

2023-05-16 Tue

Definitely will never read the memory paper from Ulrich. :)

2023-05-14 Sun

After reading a lot about ELF in the last 3 weeks, I will pick up the mold source code. I found that I can attach VS Code to k8s. I will do that when I want to dig into x86 binary files.

But before that, I think I’d better read another paper by Ulrich Drepper first: What Every Programmer Should Know About Memory.

2023-05-07 Sun

TIL: at file scope, the static keyword in C gives a function or variable internal linkage, meaning it is visible only within its own source file. Thanks to Ulrich for his wonderful paper “How to Write Shared Libraries”.

2023-04-30 Sun

TIL: Ulrich Drepper was the long-time lead maintainer of glibc. He wrote quite a few good papers about Linux, core system internals, etc. See the list.

I will spend some time reading them. Some interesting ones:

  • https://www.akkadia.org/drepper/
  • https://www.akkadia.org/drepper/dsohowto.pdf

2023-04-29 Sat

TIL: The first C++ compiler was called Cfront. Basically, it just translated C++ to C.

2023-04-26 Wed

r/ExperiencedDevs is a really good subreddit!

2023-04-19 Wed

Today I spent a little time reading the Dynamo paper https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html. I did not finish it.

2023-04-06 Thu

I got interested in ELF recently. After finishing the major part of my boring Kafka work, I decided to take some time to dive into ELF. During this process, I found this blog series about linkers. But why study linkers for ELF? Because the two are closely related. Without some linker knowledge, I would probably never appreciate the value of the various sections inside ELF. Ian Lance Taylor is a great engineer and researcher. No doubt I will learn a lot from his blog posts.

2023-04-02 Sun

Today I finished following the kilo project! So fantastic! I learnt a lot about terminal editors and picked up some C knowledge. The author, Salvatore Sanfilippo, is also the author of Redis. I already follow his blog on Feedly. Hopefully, I will read more updates from him in the future.

Again, excited!

2023-03-30 Thu

Today I read a great blog post about termios: https://blog.nelhage.com/2009/12/a-brief-introduction-to-termios/ Beyond termios, it also helped me differentiate console, terminal, and shell. I may read Nelson’s other posts in the future. I noticed he was a staff engineer at Stripe working on a Ruby static checker. That is well deserved! I hope to have such strong technical capability one day!

2023-03-19 Sun

I built a Wails app recently, and today I tried to package it as a .pkg file on macOS. It was a terrible experience. First, in order to publish it to the App Store, you need to pay $99 per year for a developer account. OK, I do not want to pay. How about making it work only for myself, namely, adding it to Launchpad? I copied the full release folder to /Applications. Man, the app crashed immediately every time I clicked the icon. Sure, my guess is that the Launchpad virtual env does not allow port forwarding. But where are the app logs? Nowhere! From the snippet below, you can see that fds 0, 1, and 2 all point to /dev/null. There are some posts online about how to log to syslog and then use Console to view the logs. At this point, I lost all my patience and gave up. This reminded me of the dark days of configuring a Windows server.

Fuck it! Both Windows and macOS suck!

$ ps -ef |grep wails
  502 24870     1   0  1:21PM ??         0:00.03 /bin/bash /Applications/wails-example.app/Contents/MacOS/wails-example
  502 24878  1703   0  1:21PM ttys017    0:00.00 grep wails
(website-py3_11_0) 2023-03-19 13:21:41 (bash) /Applications/wails-example.app/Contents/MacOS
$ lsof -p 24870
COMMAND   PID      USER   FD   TYPE DEVICE SIZE/OFF                NODE NAME
bash    24870 xiongding  cwd    DIR   1,13      640                   2 /
bash    24870 xiongding  txt    REG   1,13  1326688 1152921500312422323 /bin/bash
bash    24870 xiongding    0r   CHR    3,2      0t0                 335 /dev/null
bash    24870 xiongding    1u   CHR    3,2     0t72                 335 /dev/null
bash    24870 xiongding    2u   CHR    3,2      0t0                 335 /dev/null
bash    24870 xiongding  255r   REG   1,13      131           110694417 /Applications/wails-example.app/Contents/MacOS/wails-exampl
This post is licensed under CC BY 4.0 by the author.