Showing posts with label linux.

Saturday, October 28, 2023

Flame graphs: getting /usr/share/d3-flame-graph/d3-flamegraph-base.html

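I don't know if you're like me: while reading page 268 of Systems Performance, 2nd edition (the Chinese edition, 性能之巔), I wanted to render perf data as a flame graph, but no matter where I looked I couldn't find /usr/share/d3-flame-graph/d3-flamegraph-base.html.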

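You need d3-flame-graph, which is a Node.js thing. After fighting with npm for a long time I gave up; I still couldn't produce the file. All I wanted was /usr/share/d3-flame-graph/d3-flamegraph-base.html. Couldn't I just download that one file? I didn't want to mess with npm.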

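I later found the answer in [PATCH v3] perf script flamegraph: Avoid d3-flame-graph package dependency. Something that only needed a single file download had tormented me for so long. The patch also mentions that getting this file on Debian is painful, and as it happens, I use Debian. Once /usr/share/d3-flame-graph/d3-flamegraph-base.html was in place, I could finally output a flame graph.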

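Just grab it directly and put it where perf looks for it; easy and painless:
mkdir -p /usr/share/d3-flame-graph
wget -O /usr/share/d3-flame-graph/d3-flamegraph-base.html https://cdn.jsdelivr.net/npm/d3-flame-graph@4.1.3/dist/templates/d3-flamegraph-base.html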

root@deb64:~# perf record -F 99 -a -g -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.892 MB perf.data (1982 samples) ]
root@deb64:~# perf script report flamegraph
dumping data to flamegraph.html

This writes flamegraph.html, shown in fig 1.

fig 1. flamegraph.html

Currently flame graph generation requires a d3-flame-graph template to
be installed. Unfortunately this is hard to come by for things like
Debian [1]. If the template isn't installed then ask if it should be
downloaded from jsdelivr CDN. The downloaded HTML file is validated
against an md5sum. If the download fails, generate a minimal flame
graph with the javascript coming from links to jsdelivr CDN.

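Some tools are written in Python, Perl, or Node.js, each with its own runtime environment, and with bad luck you can sink a lot of time into them. Most people can't be familiar with all of these environments; knowing one of them well is already pretty good.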

perf stat -e instructions,cycles,L1-dcache-load-misses,LLC-load-misses,dTLB-load-misses   ls /

apt-get install bpfcc-tools


profile-bpfcc

cpudist-bpfcc

Wednesday, November 10, 2021

telegram + gcin: cannot type Chinese

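You may have run into this: with telegram and gcin, gcin can't be invoked properly to type Chinese. You searched online and changed some environment variables, but unlike others you weren't so lucky and still can't use gcin to type Chinese, and then you found this post. Sorry to say, my answer is to replace gcin with fcitx. The problem most likely lies in gcin: under certain combinations, telegram simply can't use gcin for Chinese input. Until gcin is changed, this may have no solution.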

Because software keeps getting updated, fixes found earlier may not work with newer versions; that's to be expected.

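Another suggestion is to use the gcin and telegram packaged by your Linux distribution. That should cause fewer problems, and gcin should then work normally for Chinese input in telegram.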

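My situation: the old Ubuntu I run no longer gets telegram updates, so I used the newer telegram from the official website, and then gcin couldn't type Chinese. After some guesswork I reached the following conclusion.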

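telegram + gcin: it looks like, from some version on, telegram no longer links Qt dynamically, which prevents gcin from being invoked in some situations. That's a bit of a hassle, but fcitx somehow handles this case correctly.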

list 1. telegram 3.2.2 from the official website

	linux-vdso.so.1 (0x00007ffc31967000)
	libgtk3-nocsd.so.0 => /usr/lib/x86_64-linux-gnu/libgtk3-nocsd.so.0 (0x00007fc1fc23e000)
	libgio-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0 (0x00007fc1fbe9f000)
	libgobject-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007fc1fbc4b000)
	libglib-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007fc1fb935000)
	libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007fc1fb6f0000)
	libfreetype.so.6 => /usr/lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007fc1fb43c000)
	libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007fc1fb214000)
	libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fc1faedc000)
	libX11-xcb.so.1 => /usr/lib/x86_64-linux-gnu/libX11-xcb.so.1 (0x00007fc1facda000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc1faabb000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc1fa8b7000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1fa519000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc1fa128000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc1fc445000)
	libgmodule-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0 (0x00007fc1f9f24000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc1f9d07000)
	libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fc1f9adf000)
	libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007fc1f98c4000)
	libmount.so.1 => /lib/x86_64-linux-gnu/libmount.so.1 (0x00007fc1f9670000)
	libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007fc1f9468000)
	libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fc1f91f6000)
	libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007fc1f8fc4000)
	libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007fc1f8d92000)
	libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007fc1f8b8e000)
	libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007fc1f8988000)
	libblkid.so.1 => /lib/x86_64-linux-gnu/libblkid.so.1 (0x00007fc1f873b000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc1f8533000)
	libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007fc1f831e000)
	libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007fc1f8117000)

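list 1 shows that the telegram downloaded from the official website does not link Qt dynamically. That may be why the gcin Qt module doesn't work properly: the Qt library the gcin Qt module is linked against may be a different version from the Qt library telegram itself uses.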
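A quick way to check this yourself is to run `ldd` on the telegram binary and look for libQt lines. The same check as a small C sketch (count_qt_libs is a made-up name for illustration) counts the lines of captured ldd output that mention a libQt library; list 1 would give 0, while list 2 gives several.

```c
#include <string.h>

/* Count the lines of `ldd` output that name a shared Qt library
 * (libQt5Core.so.5, libQt5Gui.so.5, ...). Zero means Qt is not loaded
 * as a shared library: either unused or linked into the binary itself. */
int count_qt_libs(const char *ldd_output)
{
    int count = 0;
    const char *line = ldd_output;
    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        /* search only inside the current line; count each line once */
        for (size_t i = 0; i + 5 <= len; ++i) {
            if (memcmp(line + i, "libQt", 5) == 0) {
                ++count;
                break;
            }
        }
        line = end ? end + 1 : NULL;
    }
    return count;
}
```

The text could come from `popen("ldd /path/to/Telegram", "r")`; keep in mind a result of 0 only says Qt isn't a separate shared library, not which Qt version is built in.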

list 2. telegram packaged by the Linux distribution

 21 	libxcb-record.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-record.so.0 (0x00007f125c1c6000)
 22 	libxcb-screensaver.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-screensaver.so.0 (0x00007f125c1c1000)
 23 	libqrcodegencpp.so.1 => /usr/lib/x86_64-linux-gnu/libqrcodegencpp.so.1 (0x00007f125c1b1000)
 24 	libminizip.so.1 => /usr/lib/x86_64-linux-gnu/libminizip.so.1 (0x00007f125bfa5000)
 25 	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f125bf7c000)
 26 	libQt5Network.so.5 => /usr/lib/x86_64-linux-gnu/libQt5Network.so.5 (0x00007f125bdf2000)
 27 	libQt5Gui.so.5 => /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5 (0x00007f125b730000)
 28 	libQt5Core.so.5 => /usr/lib/x86_64-linux-gnu/libQt5Core.so.5 (0x00007f125b1e6000)
 29 	libQt5WaylandClient.so.5 => /usr/lib/x86_64-linux-gnu/libQt5WaylandClient.so.5 (0x00007f125b0b3000)
 30 	libdbusmenu-qt5.so.2 => /usr/lib/x86_64-linux-gnu/libdbusmenu-qt5.so.2 (0x00007f125b073000)
 31 	libQt5Widgets.so.5 => /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5 (0x00007f125a9e9000)
 32 	libQt5DBus.so.5 => /usr/lib/x86_64-linux-gnu/libQt5DBus.so.5 (0x00007f125a960000)
 33 	libKF5WaylandClient.so.5 => /usr/lib/x86_64-linux-gnu/libKF5WaylandClient.so.5 (0x00007f125a877000)

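list 2 L31 shows a dynamically linked Qt library, which is why I suspect gcin's problem is related to Qt. But if both used XIM, there shouldn't be such a difference; ref 2 mentions some issues with how Qt handles XIM, and that may be where the difference lies.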

Another option is to recompile telegram, which might solve it. If you really don't want to give up gcin, you'll have to use the web version.

Linux has other input method frameworks as well; I haven't used them, so I'm not sure whether they have similar problems.

ref:

Friday, October 22, 2021

Configuring wireless on Linux from the command line

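The Linux commands I least want to learn are disc burning and wireless configuration. By burning I mean copy-protected discs; by wireless I mean encrypted setups. Without encryption, the commands aren't really that hard.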

For wireless you sometimes have to get the wireless card working first. Fortunately that's easy nowadays; in the early days (around 2002), installing a wireless card driver on Linux took a lot of effort.

There was even Ndiswrapper, which let you use Windows drivers on Linux.

I usually configure wireless with wicd; NetworkManager is a bit heavy and I don't much like using it.

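Recently I took another shot at configuring wireless with commands, figuring that after all this time someone must have written easier documentation, and I actually found some.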

If you don't use complicated encryption settings, see “Ubuntu 網路設定 - iwlist, iwconfig 無線上網指令” (Ubuntu network setup: iwlist/iwconfig wireless commands).

If you use WPA, see the following article.

Combining the 2 articles above, I can finally set up my phone's WPA hotspot and the company's username/password connection, and no longer need wicd.

list 1

wpa_supplicant -Dnl80211,wext -iwlp2s0 -cphone-ap.cnf
wpa_supplicant -Dnl80211,wext -iwlp2s0 -coffice-ap.cnf
wpa_supplicant -B -i wlp2s0 -c /var/lib/wicd/configurations/70b31741b721 -Dwext

As list 2 shows, the hardest part is figuring out those parameters: how was I supposed to know I needed the settings on L4 ~ L8?

list 2. office-ap.cnf

 1 # reading passphrase from stdin
 2 network={
 3 	   ssid="Office"
 4         proto=RSN
 5         key_mgmt=WPA-EAP
 6         eap=PEAP
 7         pairwise=CCMP
 8         group=CCMP
 9         identity="user_name"
10         password="password"
11 	#psk="1122335566"
12 	#psk=aabbccddee
13 }

Find the wireless ESSIDs:

list 5

iwlist wireless_interface scan | grep ESSID
ex: 
root@u64:root# iwlist wlp2s0 scan | grep ESSID
                    ESSID:"KUAN"
                    ESSID:""
                    ESSID:"CHEN'S Family 2"
                    ESSID:"JOY_HALL"
                    ESSID:"BOLKH"
                    ESSID:"BOLKH"
                    ESSID:"CHEN'S Family 2"
                    ESSID:"BOLKH"
                    ESSID:"TOY"
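The same network name can show up several times in a scan (BOLKH appears three times above), probably once per access point or band. A minimal C sketch for pulling the name out of one line, assuming the `ESSID:"..."` format shown above (parse_essid is a hypothetical helper name):

```c
#include <stddef.h>
#include <string.h>

/* Extract the value of an ESSID:"..." field from one line of
 * `iwlist <iface> scan` output into out (NUL-terminated).
 * Returns 1 on success, 0 if the line has no ESSID field or out is too
 * small. An ESSID that itself contains a double quote is not handled. */
int parse_essid(const char *line, char *out, size_t out_size)
{
    const char *start = strstr(line, "ESSID:\"");
    if (start == NULL)
        return 0;
    start += strlen("ESSID:\"");
    const char *end = strchr(start, '"');
    if (end == NULL)
        return 0;
    size_t len = (size_t)(end - start);
    if (len + 1 > out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

An empty result corresponds to the `ESSID:""` line, which is typically a hidden network.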

Generate a WPA passphrase config file: wpa_passphrase KUAN > KUAN.cnf

The program gives no prompt at this point; type your password and press Enter, and wpa_passphrase exits. Note that the password is echoed on the screen.

list 3. phone-ap.cnf

# reading passphrase from stdin
network={
	ssid="phone"
	#psk="11223355"
	psk=aabbccddee
}
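list 2 and list 3 both follow wpa_supplicant's `network={...}` format. As a sketch, the two kinds of blocks can be generated with snprintf; the function names are hypothetical, and the PSK variant writes the passphrase in quoted form (wpa_supplicant accepts 8 to 63 characters there) instead of the hashed 64-hex-digit psk that wpa_passphrase produces.

```c
#include <stdio.h>
#include <string.h>

/* Write a WPA-PSK network block shaped like list 3 into buf.
 * Returns the snprintf result, or -1 if the passphrase length is outside
 * the 8..63 characters wpa_supplicant accepts for a quoted psk.
 * Quotes inside ssid/passphrase are not escaped in this sketch. */
int format_psk_network(char *buf, size_t size, const char *ssid,
                       const char *passphrase)
{
    size_t len = strlen(passphrase);
    if (len < 8 || len > 63)
        return -1;
    return snprintf(buf, size,
                    "network={\n"
                    "\tssid=\"%s\"\n"
                    "\tpsk=\"%s\"\n"
                    "}\n",
                    ssid, passphrase);
}

/* Write a WPA2-Enterprise PEAP block with the same fields as list 2. */
int format_peap_network(char *buf, size_t size, const char *ssid,
                        const char *identity, const char *password)
{
    return snprintf(buf, size,
                    "network={\n"
                    "\tssid=\"%s\"\n"
                    "\tproto=RSN\n"
                    "\tkey_mgmt=WPA-EAP\n"
                    "\teap=PEAP\n"
                    "\tpairwise=CCMP\n"
                    "\tgroup=CCMP\n"
                    "\tidentity=\"%s\"\n"
                    "\tpassword=\"%s\"\n"
                    "}\n",
                    ssid, identity, password);
}
```

The buffer can then be written to a file and passed to wpa_supplicant with `-c`, as in list 1.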

On Ubuntu 18.04, the wicd/wpa config files are in /var/lib/wicd/configurations

wpa_supplicant -B -i wlan0 -c /var/lib/wicd/configurations/92633b140e0c -Dwext

Sunday, September 5, 2021

Subscribing to / unsubscribing from the linux kernel mailing list

Reference: How do I unsubscribe from the linux-kernel mailing list?

Subscribe

Mail address: majordomo@vger.kernel.org
Body: subscribe linux-kernel
Subject: leave it empty

Unsubscribe

Mail address: majordomo@vger.kernel.org
Body: unsubscribe linux-kernel
Subject: leave it empty

Or see the following URL:

http://vger.kernel.org/majordomo-info.html

Clicking the link to the right of the red text opens your mail program with the subscribe/unsubscribe details filled in.

Send request in email to address <majordomo@vger.kernel.org>

To subscribe a list (``linux-kernel'' is given as an example), use following as the only content of your letter:

subscribe linux-kernel

Like via this URL: "subscribe linux-kernel".

To get off a list (``linux-kernel'' is given as an example), use following as the only content of your letter:

unsubscribe linux-kernel

Like via this URL: "unsubscribe linux-kernel".

Indeed these commands have optional second parameter: your email address, but Majordomo has a tendency to become upset, and refuse to serve, if you use it, and your "From:"/"Sender:"/"Reply-To:" headers don't match with your real address. Less confusion is better, of course.

A listing of all lists, and their archives at VGER's Majordomo.

Monday, November 9, 2020

Hiding the Firefox (Quantum) tab bar and showing only the tree tab

I've always liked the Firefox tree tab addon, but since the Quantum version, even with the tree tab addon installed, the original tab bar still shows. I later learned you have to write some CSS yourself.

Type about:config in the URL bar and look for toolkit.legacyUserProfileCustomizations.stylesheets. It's normally false; change it to true.

userChrome.css

#TabsToolbar { visibility: collapse !important; }

Create the userChrome.css file: find your profile directory, create a chrome directory inside it, and put the file there. Mine is /home/descent/.mozilla/firefox/nt3gd16x.default-release/chrome

I put it in the wrong directory at first, so it had no effect. The directory is a bit hard to find.

Type about:support in the URL bar and look under Application Basics; the Profile Directory is listed there.

Application Basics

  
Name 	Firefox
Version 	82.0.2
Build ID 	20201027185343
Distribution ID 	
User Agent 	Mozilla/5.0 (X11; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0
OS 	Linux 5.7.0-1-amd64 #1 SMP Debian 5.7.6-1 (2020-06-24)
Application Binary 	/usr/lib/firefox/firefox
Profile Directory 	/home/descent/.mozilla/firefox/nt3gd16x.default-release

The userChrome.css.bak variant below also hides the bookmarks toolbar, showing it only while the mouse hovers over the toolbar area.

userChrome.css.bak

#TabsToolbar { visibility: collapse !important; }
#PersonalToolbar {visibility: collapse !important;}
#navigator-toolbox:hover > #PersonalToolbar {visibility: visible !important;}

fig 2: the horizontal tab row is gone

Friday, May 29, 2020

Logging in to a BBS automatically with expect

expect is a Tcl extension, so you learn it by learning Tcl, plus the extensions expect itself adds.

Adapted from: “使用 expect 自動登入 bbs” (logging in to a BBS automatically with expect)

p2.exp

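#!/usr/bin/expect
set timeout 60

set BBS_ID [lindex $argv 0]
set BBS_PW [lindex $argv 1]
# debug output; note that this prints the password on the terminal
puts $BBS_ID
puts $BBS_PW

spawn ssh -oBatchMode=no -oStrictHostKeyChecking=no bbsu@ptt.cc

# PTT prompt: "or register with new:"
expect "或以 new 註冊: " { send "$BBS_ID\r" }
# PTT prompt: "please enter your password"
expect "請輸入您的密碼" { send "$BBS_PW\r" }
# PTT prompt: "press any key to continue"
expect "請按任意鍵繼續"
send "\r"

# hand the session back to the keyboard
interact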

Usage:
p2.exp user_name user_password

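If you don't know how to write certain keys, such as the arrow keys, you can use autoexpect to generate an expect script; it will show you how the arrow keys are written.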

After starting autoexpect, press up, down, left, and right in turn, then type exit to leave autoexpect. It generates a script.exp; just refer to its contents.

Of course, you can also use expect to post BBS articles automatically.

Sunday, May 24, 2020

pthread implementation practice (1) - pthread_create, pthread_exit, pthread_self

Share the food and there's some to spare; fight over it and nobody gets any.

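In the earlier post “user mode pthread implementation - simple_thread” I covered the concept of implementing user mode threads. This time I go further and implement the 3 functions required by “CS170: Project 2 - User Mode Thread Library (20% of project score)”: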

int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg);
void pthread_exit(void *value_ptr);
pthread_t pthread_self(void);

I tried it myself, and it was somewhat harder than I expected.

This post explains how I implemented pthread_create() and pthread_exit(). pthread_self() seemed easy, but it isn't quite that simple either.

[pthread_t pthread_self(void)]

pthread_self returns a pthread_t as the thread id, which pushed me into defining pthread_t as a number. pthread_t could also be defined as a struct, but that would be awkward to handle here.

I had originally defined pthread_t as a struct, so I changed it to:

typedef unsigned long long pthread_t;

struct ThreadData
{
  my_x32_jmp_buf jmp_buf_;
};
typedef std::pair<pthread_t, ThreadData> ThreadPair;

I also set up ThreadPair (a thread id paired with its saved context) as the data structure for managing threads.

pthread_t pthread_self(void)
{
  return thread_vec[current_index].first;
}

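That's all pthread_self() is: while a thread is running, it returns the id stored in the current ThreadPair. So it's important that current_index points at the right entry; get the calculation wrong and it points at another thread.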
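For comparison, a plain C sketch of the same idea: a fixed-size table of thread ids plus a current index, with the main thread registered first under id 1. my_thread_t, the thread_table_* functions and my_thread_self are hypothetical names chosen so they don't clash with the real pthread symbols.

```c
#include <stddef.h>

/* Hypothetical stand-in for pthread_t: a plain number, as in the post. */
typedef unsigned long long my_thread_t;

#define MAX_THREADS 128

static my_thread_t thread_ids[MAX_THREADS]; /* slot i holds a thread id */
static size_t thread_count;
static size_t current_index;                 /* slot of the running thread */

/* Register the main thread in slot 0 with the fixed id 1. */
void thread_table_init(void)
{
    thread_ids[0] = 1;
    thread_count = 1;
    current_index = 0;
}

/* Add a thread with the next id; returns 0 when the table is full. */
my_thread_t thread_table_add(void)
{
    if (thread_count == 0)
        thread_table_init();
    if (thread_count == MAX_THREADS)
        return 0;
    my_thread_t tid = thread_ids[thread_count - 1] + 1;
    thread_ids[thread_count++] = tid;
    return tid;
}

/* What the scheduler would call when it switches threads. */
void thread_table_switch_to(size_t index)
{
    if (index < thread_count)
        current_index = index;
}

/* The pthread_self() equivalent: report the id at current_index. */
my_thread_t my_thread_self(void)
{
    return thread_ids[current_index];
}
```

As in the post, everything hinges on the scheduler keeping current_index in sync with the thread that is really running.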

[void pthread_exit(void *value_ptr)]

pthread_create is relatively simple; pthread_exit is what stumped me.

In “user mode pthread implementation - simple_thread” I tested a thread with while(1), but what should happen if the thread finishes normally? It should exit cleanly, right?

But what counts as a clean exit?

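The scheduler must never pick this thread again. Otherwise it would keep getting run, but since its function has already returned, running it again just sends the CPU off to an invalid place (there's no valid code left to run), and eventually the whole process dies with a segmentation fault.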

Another problem: if the thread doesn't call pthread_exit() itself, we want pthread_exit() to be called automatically when the thread ends. Sounds like mission impossible?

simple_thread.cpp

int func3_ret = 33;
void *func3(void *arg)
{
  {
    printf("331 ");
    printf("332 ");
    printf("333 ");
    printf("334 ");
    printf("335 ");
    printf("\n");
  }
  return &func3_ret;
}

When the func3 thread ends, we want it to call pthread_exit(), and pthread_exit does some work to remove the func3 thread so that func3 is never picked to run again.

That already seems hard, but the assignment asks for more: pass the value of func3_ret to pthread_exit(void *value_ptr), so that when pthread_exit prints value_ptr it gets 33, as in list 1.

list 1

331 332 333 334 335
thread exit: 0x81d5074, retval: 33, current_index: 2

Seems a bit incredible, doesn't it?

Once again I had made it harder than it is. After reading the Implementation section of “CS170: Project 3 - Thread Synchronization (20% of project score)”, I saw there was a really simple way, and it suddenly clicked.

First, let me describe my own convoluted idea.

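To get pthread_exit() called after func3, manipulate the stack: store the address of pthread_exit in the return-address slot on the stack. Not too hard. But what about passing &func3_ret as pthread_exit's argument? That's harder. I first pushed a function push_arg, then pushed pthread_exit, so that pthread_exit could pick up push_arg's argument, i.e. &func3_ret. It looks impressive; don't worry if you didn't follow, because it was completely unnecessary.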

The elegant solution is to wrap the thread function in a wrapper function. Words may not make it clear, so show me the code.

simple_thread.cpp

215   void wrap_routine(void *(*start_routine) (void *), void *arg)
216   {
217     void *ptr = NULL;
218     if (start_routine)
219     {
220       printf("arg: %p\n", arg);
221       ptr = start_routine(arg);
222     }
223     pthread_exit(ptr);
224   }

236     thread_pair.second.jmp_buf_[0].eip = (intptr_t)wrap_routine;

251     *(intptr_t*)thread_pair.second.jmp_buf_[0].esp = (intptr_t)arg;
252     thread_pair.second.jmp_buf_[0].esp -= sizeof(intptr_t);
253
254     *(intptr_t*)thread_pair.second.jmp_buf_[0].esp = (intptr_t)start_routine;
255     thread_pair.second.jmp_buf_[0].esp -= sizeof(intptr_t);
256
257     *(intptr_t*)thread_pair.second.jmp_buf_[0].esp = (intptr_t)0; // simulate push return address
258     thread_pair.second.jmp_buf_[0].esp -= sizeof(intptr_t);

simple_thread.cpp L215 uses wrap_routine to wrap the thread function start_routine, then calls pthread_exit() at the end.

The eip in jmp_buf is then simply set to wrap_routine.

wrap_routine's 2 parameters, start_routine and arg, are pushed onto the stack in simple_thread.cpp L251 ~ 257.

This way, even if your thread function doesn't call pthread_exit, wrap_routine calls it for you, and it also gets start_routine's return value. Brilliant.
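The same wrapper trick can be tried without hand-editing eip/esp. This sketch swaps in glibc's ucontext functions (getcontext, makecontext, swapcontext, setcontext; obsolescent in POSIX but still provided by glibc) instead of the post's setjmp-based approach. struct tcb, wrap_routine, my_thread_exit and run_thread_to_completion are hypothetical names. wrap_routine takes no parameters and reads start_routine/arg from the TCB, since makecontext only passes int arguments portably.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical thread control block for this sketch. */
struct tcb {
    ucontext_t ctx;
    void *(*start_routine)(void *);
    void *arg;
    void *retval;
    int exited;
    char stack[STACK_SIZE];
};

static ucontext_t scheduler_ctx;   /* where a finished thread hands control back */
static struct tcb *current;        /* the thread that is running now */

/* Stand-in for pthread_exit(): keep the value, mark the thread finished,
 * and leave for good; nothing ever resumes this context again. */
static void my_thread_exit(void *value_ptr)
{
    current->retval = value_ptr;
    current->exited = 1;
    setcontext(&scheduler_ctx);    /* does not return on success */
}

/* The wrapper: run the user's function, then exit with its return value,
 * whether or not the function called my_thread_exit itself. */
static void wrap_routine(void)
{
    void *ret = NULL;
    if (current->start_routine)
        ret = current->start_routine(current->arg);
    my_thread_exit(ret);
}

/* Start fn(arg) on its own stack and run it until it exits.
 * Returns the value handed to my_thread_exit, or NULL on setup failure. */
void *run_thread_to_completion(struct tcb *t, void *(*fn)(void *), void *arg)
{
    t->start_routine = fn;
    t->arg = arg;
    t->retval = NULL;
    t->exited = 0;
    if (getcontext(&t->ctx) != 0)
        return NULL;
    t->ctx.uc_stack.ss_sp = t->stack;
    t->ctx.uc_stack.ss_size = sizeof t->stack;
    t->ctx.uc_link = NULL;         /* wrap_routine never falls off the end */
    makecontext(&t->ctx, wrap_routine, 0);
    current = t;
    if (swapcontext(&scheduler_ctx, &t->ctx) != 0)
        return NULL;
    return t->retval;              /* we get here via my_thread_exit */
}

/* Example thread body in the style of func3: it simply returns. */
static int func3_ret = 33;
void *func3(void *arg)
{
    (void)arg;
    printf("331 332 333 334 335\n");
    return &func3_ret;
}
```

Calling `run_thread_to_completion(&t, func3, NULL)` prints the 33x line and hands back `&func3_ret`, even though func3 never calls the exit function itself; that is exactly the job wrap_routine does in simple_thread.cpp.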

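That solves getting the thread function's return value into pthread_exit. Next, since this thread has finished, what else should pthread_exit do? I have pthread_exit pick the next thread function to run and mark the current thread as finished in the related data structures, so it's never picked again.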

There's a nasty problem here, though: this touches global variables that the signal handler also uses, so synchronization needs care. I haven't handled that yet. Todo

All of this prepares for pthread_join later.

[int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg)]

There's a problem I had overlooked: I never let the main thread continue running, so I had to add that. Also, when the main thread ends, all threads should end with it.

list 2. pthread_create

235   int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg)
236   {
237     static int init_main_thread = 0;
238     static DS::ThreadPair main_thread_pair;
239
240 #if 1
241     sigset_t sigs;
242     sigemptyset(&sigs);
243     sigaddset(&sigs, SIGRTMIN);
244     sigprocmask(SIG_SETMASK, &sigs, 0);
245 #endif
246
247     if (0 == init_main_thread)
248     { 
249       main_thread_pair.first = 1; // fixed to 1
250       thread_vec.push_back(main_thread_pair);
251       init_main_thread = 1;
252     }
253
254     ThreadPair thread_pair;
255     *thread = gen_tid();
256     thread_pair.first = *thread;
257
260     thread_pair.second.jmp_buf_[0].eip = (intptr_t)wrap_routine;
262     
263     auto stack_addr = thread_malloc_stack(BUF_SIZE);
264     if (stack_addr == 0)
265       return -1;
266     
267     printf("xx sizeof(intptr_t): %zu\n", sizeof(intptr_t));
268 
269     thread_pair.second.jmp_buf_[0].esp = ((intptr_t)stack_addr + BUF_SIZE - sizeof(intptr_t)); // current stack - 4 or 8
270   
271     printf("stack_addr + BUF_SIZE: %p\n", (char *)stack_addr+BUF_SIZE);
274 
275     *(intptr_t*)thread_pair.second.jmp_buf_[0].esp = (intptr_t)arg;
276     thread_pair.second.jmp_buf_[0].esp -= sizeof(intptr_t);
277     
278     *(intptr_t*)thread_pair.second.jmp_buf_[0].esp = (intptr_t)start_routine;
279     thread_pair.second.jmp_buf_[0].esp -= sizeof(intptr_t);
280 
281     *(intptr_t*)thread_pair.second.jmp_buf_[0].esp = (intptr_t)0; // simulate push return address
282     thread_pair.second.jmp_buf_[0].esp -= sizeof(intptr_t);
299 
300     thread_vec.push_back(thread_pair);
301     cur_thread = thread_vec.end() - 1;
302     current_index = thread_vec.size() - 1;
303     
304     auto &m_thread_pair = thread_vec[0];
305     if (my_setjmp(m_thread_pair.second.jmp_buf_) == 0)
306     {
307       my_longjmp(DS::thread_vec[DS::current_index].second.jmp_buf_, 1);
308     }
309     else
310     {
311       printf("m th\n");
312     }
313     return 0;
314   }

list 2. L304 ~ 312 saves the main thread's jmp_buf, and list 2. L247 ~ 252 adds the main thread's data structure to thread_vec with the tid fixed at 1. So the first call to pthread_create produces 2 thread entries: one for the function and one for the main thread.

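So the signal handler must only be allowed to fire after this point; otherwise, if the main thread's jmp_buf hasn't been set yet, there's no way back to the main thread. I didn't handle this case. It isn't hard: block the signal first, and unblock it after setjmp.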
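A minimal sketch of that block/unblock pattern with sigprocmask. It uses SIGUSR1 instead of the post's SIGRTMIN so the demo is self-contained; run_with_signal_blocked and demo_setup are hypothetical names. A signal raised during setup stays pending and is delivered once the old mask is restored.

```c
#define _GNU_SOURCE
#include <signal.h>

static volatile sig_atomic_t ticks;   /* how many times the handler ran */

static void on_tick(int sig)
{
    (void)sig;
    ++ticks;
}

/* Install on_tick for sig, then run setup() with sig blocked so the
 * handler cannot fire half-way through it. A signal that arrives during
 * setup stays pending and is delivered when the old mask is restored.
 * Returns 0 on success, -1 on error. */
int run_with_signal_blocked(int sig, void (*setup)(void))
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = on_tick;
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    setup();                          /* e.g. save the main thread's jmp_buf */
    return sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Demo setup step: signal ourselves in the middle of setup and record
 * whether the handler managed to run before setup finished. */
static int ticks_seen_during_setup = -1;

void demo_setup(void)
{
    raise(SIGUSR1);
    ticks_seen_during_setup = ticks;
}
```

`run_with_signal_blocked(SIGUSR1, demo_setup)` leaves ticks_seen_during_setup at 0 and ticks at 1: the handler waited until setup was done. In a program that also has real kernel threads, pthread_sigmask would be the call to use instead.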

The rest is setting up eip and esp in jmp_buf, the same as before.

source code:
https://github.com/descent/simple_thread/tree/master/cpp

Thread limit:

cat /proc/sys/kernel/threads-max 
On my system it's 39319.
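Reading the same value from C is just a file read; read_threads_max is a hypothetical helper. This is a system-wide limit; the per-user RLIMIT_NPROC limit and the memory needed for each thread's stack may stop you earlier.

```c
#include <stdio.h>

/* Read the system-wide thread limit from /proc/sys/kernel/threads-max.
 * Returns the value, or -1 if the file cannot be read (e.g. not Linux). */
long read_threads_max(void)
{
    FILE *f = fopen("/proc/sys/kernel/threads-max", "r");
    if (f == NULL)
        return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```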

Friday, March 20, 2020

Running remote X applications, including a Chinese input method

Local machine: a Linux PC. Remote machine: an eeepc 901. Connect from the local machine to the remote eeepc 901 over ssh.

On the local machine, set ForwardX11 yes in /etc/ssh/ssh_config.

Then, when you connect from the local machine to the eeepc 901, the eeepc 901's shell gets DISPLAY=localhost:10.0, so firefox on the remote eeepc 901 displays on the local machine. No need to set DISPLAY by hand.

Using a Chinese input method in remote X Window apps

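Setup: log in to the eeepc 901 and run its firefox. The eeepc 901 has gcin installed and the local machine has fcitx, so the eeepc 901's firefox uses gcin for Chinese input, while local X applications use fcitx. It's a bit convoluted; hope you're not dizzy yet.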

ssh username_abc@eeepc901_ip # use your own username/ip
export XMODIFIERS=@im=gcin
export GTK_IM_MODULE=gcin
export QT_IM_MODULE=gcin # for telegram
gcin&
firefox

You'll then see the eeepc 901's firefox come up on the local X, and Chinese input works normally.

On the Chinese input problem, see: “試了好多天, 終於搞定我的 X Client” (after many days of trying, I finally fixed my X client)

If you run into the following problem:

Error: cannot open display: localhost:10.0
X11 connection rejected because of wrong authentication.

Reference:
xauth not creating .Xauthority file

Recreate ~/.Xauthority with xauth:

# Rename the existing .Xauthority file by running the following command
mv .Xauthority old.Xauthority

# xauth will complain unless ~/.Xauthority exists
touch ~/.Xauthority

# only this one key is needed for X11 over SSH
xauth generate :0 . trusted

# generate our own key, xauth requires 128 bit hex encoding
xauth add ${HOST}:0 . $(xxd -l 16 -p /dev/urandom)

# To view a listing of the .Xauthority file, enter the following
xauth list
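The `xxd -l 16 -p /dev/urandom` part just produces 16 random bytes (128 bits) printed as 32 hex digits. If xxd isn't installed, the same thing in C looks roughly like this (make_xauth_cookie is a hypothetical name):

```c
#include <stdio.h>
#include <stddef.h>

/* Produce a 128-bit cookie as 32 lowercase hex digits, like
 * `xxd -l 16 -p /dev/urandom`. out must hold at least 33 bytes.
 * Returns 0 on success, -1 on failure. */
int make_xauth_cookie(char out[33])
{
    static const char hex[] = "0123456789abcdef";
    unsigned char bytes[16];
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(bytes, 1, sizeof bytes, f);
    fclose(f);
    if (got != sizeof bytes)
        return -1;
    for (size_t i = 0; i < sizeof bytes; ++i) {
        out[2 * i] = hex[bytes[i] >> 4];      /* high nibble */
        out[2 * i + 1] = hex[bytes[i] & 0x0f]; /* low nibble */
    }
    out[32] = '\0';
    return 0;
}
```

Print the result and paste it in place of the `$(xxd ...)` part of the `xauth add` command.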

Friday, March 13, 2020

linux/unix signal issues

Master the complex through the simple; take change as the rule.

Signals are very complicated. Mix them with fork and threads and they get frighteningly complex; for one of the tricky issues between signals and threads, see “thread 和 signal” (threads and signals).

Add C++ and C++ exception handling, and the complexity explodes. If you mix them carelessly, do you really know your program's execution path?

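When I first started Linux programming, every related book mentioned signals, but I only ever had a vague grasp of them: I seemed to understand, yet also seemed not to know signals at all. When I wrote a user mode thread library with signal + setjmp/longjmp, I learned signals all over again.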

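You can think of a signal as an interrupt. A CPU usually has an interrupt controller that accepts interrupts; signals are Unix's way of emulating interrupt behavior.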

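If you want to practice interrupt-controller programming but lack suitable hardware or don't want to deal with hardware register setup, you can practice with signal programs; the concepts are the same as writing interrupt code. That's what this post discusses.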

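Of course, hardware interrupts must not take much CPU time to handle, while signals have no such restriction: a while(1) inside a signal handler won't affect the rest of the system. But everything else to watch out for, such as reentrancy, is the same as with interrupts.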
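The usual way to stay on the safe side of reentrancy is to do almost nothing in the handler: set a `volatile sig_atomic_t` flag and do the real work later in the main loop, much like an interrupt handler that defers its work. A minimal sketch (the function names are hypothetical):

```c
#define _GNU_SOURCE
#include <signal.h>

/* The only thing the handler touches; sig_atomic_t writes are safe here. */
static volatile sig_atomic_t got_signal;

static void flag_handler(int sig)
{
    (void)sig;
    got_signal = 1;   /* no printf, no malloc: just record the event */
}

int install_flag_handler(int sig)
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = flag_handler;
    return sigaction(sig, &sa, NULL);
}

/* Called from the main loop in normal context: returns 1 if at least one
 * signal arrived since the last call, then clears the flag. */
int consume_signal_flag(void)
{
    if (!got_signal)
        return 0;
    got_signal = 0;
    return 1;
}
```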

The original signal design was imperfect and had some loopholes. The improved version came as a separate set of signals, and the imperfect old signals were kept for compatibility.

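Section 6.4 of 「Linux 内核源代码情景分析」 (Scenario Analysis of the Linux Kernel Source Code) explains how Linux implements signals. From it I learned why pause() can be interrupted by a signal, and why system calls can be interrupted by signals before they complete.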

Traditional signals (the imperfect version)

Signal     Value     Action   Comment
───────────────────────────────────────────
SIGHUP        1       Term    Hangup detected on controlling terminal
                              or death of controlling process
SIGINT        2       Term    Interrupt from keyboard
SIGQUIT       3       Core    Quit from keyboard
SIGILL        4       Core    Illegal Instruction
SIGABRT       6       Core    Abort signal from abort(3)
SIGFPE        8       Core    Floating point exception
SIGKILL       9       Term    Kill signal
SIGSEGV      11       Core    Invalid memory reference
SIGPIPE      13       Term    Broken pipe: write to pipe with no
                              readers
SIGALRM      14       Term    Timer signal from alarm(2)
SIGTERM      15       Term    Termination signal
SIGUSR1   30,10,16    Term    User-defined signal 1
SIGUSR2   31,12,17    Term    User-defined signal 2
SIGCHLD   20,17,18    Ign     Child stopped or terminated
SIGCONT   19,18,25    Cont    Continue if stopped
SIGSTOP   17,19,23    Stop    Stop process
SIGTSTP   18,20,24    Stop    Stop typed at terminal
SIGTTIN   21,21,26    Stop    Terminal input for background process
SIGTTOU   22,22,27    Stop    Terminal output for background process

The improved version:
SIGRTMIN ~ SIGRTMAX

This post looks at a few signal properties:
q11: While the SIGUSR1 signal handler is running, if another SIGUSR1 arrives, will that same SIGUSR1 handler be interrupted in order to run the SIGUSR1 handler for the newly arrived SIGUSR1?

Note: this means receiving the same signal, SIGUSR1, again. Receiving SIGUSR2 while handling SIGUSR1 is a different situation.

The answer is both “yes” and “no”.

If the answer is “no”, will the SIGUSR1 handler run once more after the first SIGUSR1 handler finishes?

The answer is both “yes” and “no”.

If the answer is “yes”, one more question:
q22: While the SIGUSR1 handler is running, if “two” more SIGUSR1 signals arrive, will the handler run 2 more times after the current one finishes?

Given the premise of q11, the answer is no; it still runs only once.

But if SIGUSR1 is replaced by SIGRTMIN, the same handler runs 2 times. Strange: why design two different behaviors? That is the difference between the old and new kinds of signals.

test_signal.c tests 2 signals: once with SIGUSR1 and once with SIGRTMIN. I deliberately make the signal handler sigalrm_fn delay for a while, so that one or two more SIGUSR1/SIGRTMIN can arrive before it finishes.

test_signal.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <signal.h>

void sigalrm_fn(int sig)
{
  static int cnt;
  /* printf is not async-signal-safe; tolerable only in a throwaway test */
  printf("got sig: %d, cnt: %d\n", sig, cnt);
  /* busy delay so more signals can arrive while the handler still runs;
     volatile keeps the compiler from optimizing the empty loops away */
  for (volatile int i = 0; i < 10000; ++i)
    for (volatile int j = 0; j < 1000; ++j)
      for (volatile int k = 0; k < 500; ++k)
        ;
  printf("end USR1!\n");
  ++cnt;
}

int main(int argc, char *argv[])
{
  printf("SIGRTMIN: %d\n", SIGRTMIN);
  printf("SIGRTMAX: %d\n", SIGRTMAX);
  /* swap these two lines to test SIGUSR1 instead of SIGRTMIN */
  //signal(SIGUSR1, sigalrm_fn);
  signal(SIGRTMIN, sigalrm_fn);

  while (1)
    pause();

  return 0;
}

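For SIGUSR1, if 2 SIGUSR1 signals are sent while sigalrm_fn is running, sigalrm_fn runs only once more; for SIGRTMIN, if 2 SIGRTMIN signals are sent while sigalrm_fn is running, sigalrm_fn runs 2 more times. Most people would consider the SIGRTMIN behavior more correct; that's the effect of the improvement.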

With traditional signals, the 2 signals are merged into one; with the rt signals, they are queued.
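The merge-versus-queue difference can also be observed deterministically, without racing killall against a busy loop: block the signal, send it twice, then unblock it and count handler runs. A sketch, assuming Linux/glibc (deliveries_after_block is a hypothetical name):

```c
#define _GNU_SOURCE
#include <signal.h>

static volatile sig_atomic_t handler_runs;

static void count_handler(int sig)
{
    (void)sig;
    ++handler_runs;
}

/* Send sig to ourselves `times` times while it is blocked, then unblock it
 * and report how many times the handler ran. Standard signals such as
 * SIGUSR1 collapse into one pending instance; real-time signals such as
 * SIGRTMIN are queued, one delivery per send. Returns -1 on error. */
int deliveries_after_block(int sig, int times)
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = count_handler;
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;

    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    handler_runs = 0;
    for (int i = 0; i < times; ++i)
        raise(sig);                          /* stays pending while blocked */

    sigprocmask(SIG_SETMASK, &old, NULL);    /* pending signals delivered here */
    return handler_runs;
}
```

`deliveries_after_block(SIGUSR1, 2)` should return 1, while `deliveries_after_block(SIGRTMIN, 2)` should return 2, matching the test_signal.c experiment.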

On my system SIGRTMIN is 34, so I test with the following commands.

killall -34 test_signal
killall -s SIGUSR1 test_signal

Before the end USR1! message is printed, send SIGRTMIN or SIGUSR1 2 more times, and you'll see this behavior.

If your platform gives different results, its default signal behavior may differ, in which case ... never mind.

If that hasn't stumped you yet, here's another one.

sysv_signal() is the version from the old unreliable era. Replace signal(SIGUSR1, sigalrm_fn) with sysv_signal(SIGUSR1, sigalrm_fn), and you'll find that sending SIGUSR1 while sigalrm_fn is running makes the program exit abruptly.

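This is because, besides not queuing signals, the old version lets another SIGUSR1 that arrives mid-execution interrupt the running signal handler. Moreover, while the SIGUSR1 handler runs, the installed handler sigalrm_fn has already been reset to the default action, and the default action for SIGUSR1 is to terminate the program, so the program just ends.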

So you had to call signal again inside the SIGUSR1 handler to re-install it, so that it wouldn't stay reset to the default. Strange design, isn't it!
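With sigaction the old System V behavior can be requested explicitly: according to the signal(2) man page, it corresponds to SA_RESETHAND | SA_NODEFER. The sketch below installs such a one-shot handler and checks that the disposition is back to SIG_DFL after the first delivery (the function names are hypothetical):

```c
#define _GNU_SOURCE
#include <signal.h>

static volatile sig_atomic_t oneshot_runs;

static void oneshot_handler(int sig)
{
    (void)sig;
    ++oneshot_runs;
}

/* System V style: SA_RESETHAND puts SIG_DFL back on entry to the handler,
 * SA_NODEFER leaves the signal unblocked while the handler runs. */
int install_sysv_style(int sig)
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    sa.sa_handler = oneshot_handler;
    return sigaction(sig, &sa, NULL);
}

/* Return 1 if sig is currently at its default disposition. */
int is_default_disposition(int sig)
{
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0)
        return 0;
    return cur.sa_handler == SIG_DFL;
}
```

After the first SIGUSR1 the handler is gone, so a second SIGUSR1 would take the default action and terminate the process: the sudden exit described above.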

So which behavior does my signal() have? Unfortunately there's no fixed answer; different glibc versions may implement it differently, so sigaction is more portable, though much more complicated to use.

Now you know why sigaction is so complicated: you need to know all of this to set its parameters correctly.

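The signal call above is roughly equivalent to the following sigaction (glibc's default BSD-style signal() additionally sets SA_RESTART).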

struct sigaction s1, old_s1;
sigemptyset(&s1.sa_mask);
s1.sa_flags = 0;
s1.sa_handler = sigalrm_fn;
sigaction(SIGUSR1, &s1, &old_s1);

If you want the running signal handler to be interruptible by the same signal, use the following flag:
s1.sa_flags = SA_NODEFER;
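To see what SA_NODEFER changes, the handler below re-raises the same signal once from inside itself and records the deepest nesting. A sketch (max_nesting and nesting_handler are hypothetical names):

```c
#define _GNU_SOURCE
#include <signal.h>

static volatile sig_atomic_t depth, max_depth, calls;

/* On the first call, raise the same signal again from inside the handler.
 * Without SA_NODEFER the signal is blocked during the handler, so the
 * second delivery waits until the handler returns (max depth 1).
 * With SA_NODEFER it interrupts the running handler (max depth 2). */
static void nesting_handler(int sig)
{
    ++depth;
    if (depth > max_depth)
        max_depth = depth;
    if (++calls == 1)
        raise(sig);
    --depth;
}

/* Returns the maximum handler nesting depth observed, or -1 on error. */
int max_nesting(int sig, int flags)
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    sa.sa_handler = nesting_handler;
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;
    depth = max_depth = calls = 0;
    raise(sig);
    return max_depth;
}
```

`max_nesting(SIGUSR1, 0)` should report 1, because the re-raised signal waits until the handler returns; `max_nesting(SIGUSR1, SA_NODEFER)` should report 2.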

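It looks like only a few more lines, but understanding the design behind them isn't easy. tlpi spends 3 chapters on signals, and I suspect few people fully understand them on a first read.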

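This post's description of signals isn't very precise; please consult the professional books: tlpi (The Linux Programming Interface) and the classic apue (Advanced Programming in the UNIX Environment). My main goal was to raise a few concepts and show that signals are not to be trifled with.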

I also read section 6.4 of 「Linux 内核源代码情景分析」 to understand how Linux implements signals, and gradually tied these concepts together.

I'm not familiar with the Windows message mechanism, but it should work differently from the interrupt style of signals.

The signal design is really something. Who came up with adding something this complicated to Unix?

Friday, March 6, 2020

pthread implementation practice (0) - user mode pthread implementation - simple_thread

Extreme heat brings wind; flowers in full bloom must fade.

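Only after reading a series of old articles, “Re: 什麼是 multi-thread” (Re: what is multi-thread), did I learn that threads can be done without OS kernel support. I had always thought the kernel had to support kernel threads before a thread library could be implemented, because I couldn't imagine how a library could provide threads if the kernel didn't, or what it could rely on in user mode to switch between 2 functions.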

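Oh, and the flame war itself is great reading too. The series has plenty of back-and-forth replies; as long as someone makes a reasoned argument, there's something to learn.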

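I had studied coroutines before; a coroutine gives up the CPU voluntarily. If a function doesn't give it up voluntarily, how can the CPU be taken away from it while it runs? The OS relies on timer interrupts; how can a user mode program achieve something similar?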

While searching, I found pthreads-1_60_beta6.tar.gz, a user mode pthread implementation developed by Chris Provenzano.

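It's a pthread library for Linux kernel 1.X and supports only x86 32-bit; Linux kernels of that time had no kernel thread support. I felt like I'd struck gold and wanted to see how Chris Provenzano did it. I meant to build it and trace it with gdb, but it doesn't seem to build in today's environment, so I gave up. And my code-reading skills are too weak; I couldn't figure anything out from the source code.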

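I also found a course assignment, CS170: Project 2 - User Mode Thread Library (20% of project score). Oh my! There really is such a course. I can't help but pity the students; this must be an assignment that troubled them for a long time! It's UCSB's OS course.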

UCSB is the University of California, Santa Barbara; I'm not very familiar with this school.

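Seeing how solid these schools' assignments are, I recall that my own OS course just covered the textbook, Operating System Concepts, and the exams tested textbook knowledge. The difference in level is huge; there was far too little hands-on work. That way the gap between CS majors and non-majors isn't very big, whereas if you can write a user mode thread library, the gap shows.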

Back then our teacher offered an alternative, reading BSD source code, but the class voted it down. A real pity.

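But the course isn't that heartless: its Implementation section explains some implementation details. With my poor English I struggled through it and picked up a few key points.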

You need setjmp/longjmp and signals.

If setjmp/longjmp is unfamiliar, see “setjmp/longjmp 實作 (x86 32 bit)” (setjmp/longjmp implementation, x86 32 bit).
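For a quick refresher, here is a minimal C example of the basic contract: setjmp returns 0 when it saves the context and returns the longjmp value when execution jumps back (setjmp_demo and work are illustrative names). Note that longjmp may only jump back into a function that is still active; jumping into a brand-new stack for a new thread is why the post ends up modifying the esp/eip fields of jmp_buf.

```c
#include <setjmp.h>

static jmp_buf env;

/* Recurse a few levels, then jump straight back to the setjmp point,
 * skipping all the intermediate returns. */
static void work(int n)
{
    if (n == 0)
        longjmp(env, 42);   /* setjmp will now "return" 42 */
    work(n - 1);
}

/* setjmp returns 0 when it saves the context, and the value passed to
 * longjmp when execution comes back to it. */
int setjmp_demo(int depth)
{
    switch (setjmp(env)) {
    case 0:
        work(depth);        /* never returns normally */
        return 0;
    case 42:
        return 42;
    default:
        return -1;
    }
}
```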

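Once I knew this, I was thrilled and thought I could write it easily, but when I actually started I ran into all kinds of difficulty: I didn't know where to call setjmp or where to call longjmp.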

ex.c

 1 void func1()
 2 {
 3   printf("1\n");
 4   printf("2\n");
 5   printf("3\n");
 6   printf("4\n");
 7   printf("5\n");
 8 }
 9
10 void func2()
11 {
12   printf("21\n");
13   printf("22\n");
14   printf("23\n");
15   printf("24\n");
16   printf("25\n");
17 }

Where in ex.c should I insert setjmp? Somewhere in ex.c L3 ~ L7? None of it seems right! And where should longjmp go?

It turned out the Implementation section continued further and I had missed that part. After rereading it, I came away with these insights:

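setjmp/longjmp - saves the state when switching between 2 functions; the stack also has to be saved separately.
signal/SIGALRM - this was the key that had me puzzled: use a signal to interrupt the running function. In the signal handler, save the running function's state (with setjmp), then pick another function and jump to it (with longjmp).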
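The SIGALRM part can be seen in isolation: a periodic timer interrupts a loop that never yields. In a real library the handler would save the context and switch threads; this sketch only counts ticks. It uses setitimer with ITIMER_REAL and the SA_NODEFER flag that the CS170 notes recommend; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t timer_ticks;

/* A real scheduler would save the current context here and switch to the
 * next thread; this sketch only counts how often the loop was interrupted. */
static void on_alarm(int sig)
{
    (void)sig;
    ++timer_ticks;
}

/* Start a periodic SIGALRM every interval_ms milliseconds. */
int start_preemption_timer(long interval_ms)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_alarm;
    sa.sa_flags = SA_NODEFER;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_ms / 1000;
    tv.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    tv.it_value = tv.it_interval;    /* first expiry after one interval */
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Stop the timer by arming it with zero. */
int stop_preemption_timer(void)
{
    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* A "thread body" that never yields: it spins until interrupted n times. */
int spin_until_ticks(int n)
{
    while (timer_ticks < n)
        ;                            /* no setjmp, no yield: pure busy loop */
    return timer_ticks;
}
```

The busy loop never calls anything, yet the handler keeps running every 10 ms or so when started with `start_preemption_timer(10)`; that interruption point is where a user mode scheduler gets its chance to switch threads.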

After reading some source code, I learned one more thing: you need to modify the esp and eip fields of jmp_buf.

CS170: Project 2 - User Mode Thread Library (20% of project score)

Project Goals

The goals of this project are:

to understand the idea of threads
to implement independent, parallel execution within a process

Administrative Information

The project is an individual project. It is due on Tuesday, April 30, 2019, 23:59:59 PST (no deadline extensions or late turn ins).

Implement a basic thread subsystem in Linux user mode

The goal of this project is to implement a basic thread system for Linux. As discussed in class, threads are independent units of execution that run (virtually) in parallel in the address space of a single process (and thus, share the same heap memory, open files, process identifier, ...). Each thread has its own context, which consists of (a) the set of CPU registers and (b) a stack. The goal of a thread subsystem is to provide applications that want to use threads a set of library functions (an interface) that the application can use to create and start new threads, terminate threads, or manipulate threads in different ways.
The most well-known and wide-spread standard that defines the interface for threads on Unix-style operating systems is called POSIX threads (or pthreads). The pthreads interface defines a set of functions, a few of which we want to implement for this project. Of course, there are different ways in which the pthreads interface can be realized, and systems have implemented a pthreads subsystem both in the OS kernel and in user mode. For this project, we aim to implement a few pthreads functions in user mode (as a library) on Linux.
More specifically, for this project, we want to implement the following POSIX thread functions (prototypes and explanations partially taken from the respective man pages):

int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg);

The pthread_create() function shall create a new thread within a process. Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by thread. In our implementation, the second argument (attr) shall always be NULL. The thread is created executing start_routine with arg as its sole argument. If the start_routine returns, the effect shall be as if there was an implicit call to pthread_exit() using the return value of start_routine as the exit status. Note that the thread in which main() was originally invoked differs from this. When it returns from main(), the effect shall be as if there was an implicit call to exit() using the return value of main() as the exit status.

void pthread_exit(void *value_ptr);

The pthread_exit() function shall terminate the calling thread. In our current implementation, we ignore the value passed in as the first argument (value_ptr) and clean up all information related to the terminating thread. The process shall exit with an exit status of 0 after the last thread has been terminated. The behavior shall be as if the implementation called exit() with a zero argument at thread termination time.

pthread_t pthread_self(void);

The pthread_self() function shall return the thread ID of the calling thread.
The goal of this project is to implement the three functions introduced above. For more details about error handling, please refer to the respective man pages. The code for these three functions should be compiled into an object file called threads.o, which will act as our thread library. Note that your code will not contain a main routine, and hence, cannot run as a stand-alone executable. To create the thread library, simply call the compiler like this:

gcc/g++ -c -o threads.o [list of your source files]

You should include the pthreads header file (#include <pthread.h>) in your source(s) so that you have the definitions of types such as pthread_t or pthread_attr_t available. The object file that was created can then be combined with a (third party) application that requires threads (and hence, calls the three functions that you have implemented in your library). This is done by running:

gcc/g++ -o application [list of application object files] threads.o

To test your thread implementation, we will compile your thread library with our test application. This test application will call into your pthread functions and check whether threads are properly started and terminated.

Implementation

Implementing a user space thread library seems like a daunting task at first. This section contains a number of hints and directions on how you could approach the problem.
First, you will need a data structure that can store information about a single thread. This data structure will likely need to hold, at least, information about the state of the thread (its set of registers), information about its stack (e.g., a pointer to the thread's stack area), and information about the status of the thread (whether it is running, ready to run, or has exited). This data structure is often called a thread control block (TCB). Since there will be multiple threads active at the same time, the thread control blocks should be stored in a list or a table (array). To make it easier, you can assume that a program will not create more than 128 threads.
You will likely need a routine that initializes your thread subsystem. This init routine should be called when the application calls pthread_create for the first time. Before that, there is only one thread running (the main program), and there is not much you need to do.
Once multiple threads are running, you will need a way of switching between different threads. To do this, you could use the library functions setjmp and longjmp. In a nutshell, setjmp saves the current state of a thread into a jmp_buf structure. longjmp uses this structure to restore (jump back) to a previously saved state. You will find these functions very useful, since they do exactly what is needed when switching from one thread to another and later resuming this thread's execution. More precisely, you can have the currently executing thread call setjmp and save the result in the thread's TCB. Then, your thread subsystem can pick another thread, and use the saved jmp_buf of that thread together with longjmp to resume the execution of the new thread.
The process of picking another thread is called scheduling. In our system, we want to have a very simple scheduler. Just cycle through the available threads in a round-robin fashion, giving an equal, fair share to each thread.
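Below is a minimal sketch of the round-robin selection, assuming a fixed table of 128 TCBs. The struct layout, the status names and pick_next() are hypothetical. A real TCB would also hold the saved register state (e.g. a jmp_buf) and a pointer to the thread's stack.

```c
/* Hypothetical TCB table for illustration; register state and stack
   pointer are omitted. */
#define MAX_THREADS 128

enum thread_status { TS_UNUSED, TS_READY, TS_RUNNING, TS_EXITED };

struct tcb {
    int id;
    enum thread_status status;
};

static struct tcb threads[MAX_THREADS];

/* Round-robin: starting just after `current`, return the index of the next
   thread that can run (READY, or the still-RUNNING current thread when it
   is the only one left). Returns -1 if nothing can run. */
static int pick_next(int current)
{
    for (int step = 1; step <= MAX_THREADS; step++) {
        int i = (current + step) % MAX_THREADS;
        if (threads[i].status == TS_READY || threads[i].status == TS_RUNNING)
            return i;
    }
    return -1;
}
```

Because the search always starts right after the current slot and wraps around, every runnable thread gets its turn before any thread runs twice.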
Following the discussion about setjmp and longjmp, there is one question that should immediately arise: How can we force the thread of a program to call setjmp every once in a while (and thus give up control of the CPU)? In particular, application developers do not need to know about your thread implementation, so it is unlikely that they will periodically call setjmp and allow your scheduler to switch to a new thread. To solve this issue, we can make use of signals and alarms. More precisely, we can use the ualarm or the setitimer function to set up a periodic timer that sends a SIGALRM signal every X milliseconds (for this project, we choose X to be 50ms). Whenever the alarm goes off, the operating system will invoke the signal handler for SIGALRM. So, you can install your own, custom signal handler that performs the scheduling (switching between threads) for you. For installing this signal handler, you should use sigaction with the SA_NODEFER flag (read the man page for sigaction for details). Otherwise (e.g., when using the deprecated signal function), alarms are automatically blocked while you are running the signal handler. This is something that we clearly do not want for this project.
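One way to set up the preemption timer is sketched below. It uses setitimer(ITIMER_REAL, ...) to deliver SIGALRM periodically, and installs the handler with sigaction and SA_NODEFER. The function names are hypothetical. The handler on_alarm() only counts ticks and stands in for the real scheduler; the context switch itself is left out.

```c
#define _DEFAULT_SOURCE 1        /* SA_NODEFER under a strict -std */
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks = 0;

/* Placeholder scheduler: a real one would save the running thread's
   context here and resume the next READY thread. */
static void on_alarm(int sig)
{
    (void)sig;
    ticks++;
}

/* Install the SIGALRM handler and start a periodic timer.
   Returns 0 on success, -1 on error. */
static int start_preemption_timer(long interval_ms)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;        /* keep SIGALRM unblocked inside the handler */
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_ms / 1000;
    tv.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    tv.it_value = tv.it_interval;    /* first expiry after one interval */
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer (all-zero it_value). */
static void stop_preemption_timer(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
}

/* Sleep until at least n timer ticks have been seen. */
static void wait_for_ticks(int n)
{
    while (ticks < n)
        pause();
}
```

start_preemption_timer(50) gives the 50 ms period the assignment asks for.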
Note that we require that your thread system supports thread preemption and switches between multiple threads that are ready. It is not okay to run each individual thread to completion before giving the next one a chance to execute.
A final problem that needs to be addressed is how to create a new thread. We have previously discussed means to switch between two threads once they have been created. However, there must also be a mechanism to create a new thread "from scratch." To this end, the system has to properly initialize the TCB for the new thread. This means that a new thread ID needs to be created. In addition, the system has to allocate a new stack. This can be done easily using malloc. For this project, we want to allocate for each thread a stack of 32,767 bytes. The last step is to initialize the thread's state so that it "resumes" execution from the start function that is given as argument to the pthread_create function. For this, we could use setjmp to save the state of the current thread in a jmp_buf, and then, modify this jmp_buf in two important ways. First, we want to change the program counter (the EIP) to point to the start function. Second, we want the stack pointer (the ESP) to point to the top of our newly allocated stack.
To modify the jmp_buf directly, we have to first understand that it is a very operating system and processor family-specific data structure that is typically not modified directly. On the CSIL machines, we can see the definition of the jmp_buf here in this header file: /usr/include/bits/setjmp.h. Given that we work on a 64-bit machine on CSIL, the jmp_buf is defined as an array of 8 long (64-bit) integers. Moreover, libc (x86_64/jmpbuf-offsets.h) defines the following constants as the eight integer elements of this structure:

#define JB_RBX   0
#define JB_RBP   1
#define JB_R12   2
#define JB_R13   3
#define JB_R14   4
#define JB_R15   5
#define JB_RSP   6
#define JB_PC    7

We can see that the stack pointer (RSP) has index 6 and the program counter (PC) has index 7 into the jmp_buf. This allows us to easily write the new values for ESP and EIP into a jmp_buf. Unfortunately, there is a small complication on the Linux systems in the lab. These machines are equipped with a libc that includes a security feature to protect the addresses stored in jump buffers. This security feature "mangles" (i.e., encrypts) a pointer before saving it in a jmp_buf. Thus, we also have to mangle our new stack pointer and program counter before we can write it into a jump buffer, otherwise decryption (and subsequent uses) will fail. To mangle a pointer before writing it into the jump buffer, make use of the following function:

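/* Mangle a pointer the way glibc's PTR_MANGLE does on x86_64:
   XOR it with the per-thread pointer guard stored at %fs:0x30,
   then rotate left by 0x11 (17) bits. */
static long int i64_ptr_mangle(long int p)
{
    long int ret;
    asm(" mov %1, %%rax;\n"
        " xor %%fs:0x30, %%rax;"
        " rol $0x11, %%rax;"
        " mov %%rax, %0;"
        : "=r"(ret)
        : "r"(p)
        : "%rax"
    );
    return ret;
}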

We are almost there. Now, we just have to remember that the start routine of every new thread expects a single argument (a void pointer called arg). Also, when the start routine returns, it should perform an implicit jump to pthread_exit. We first have to understand how arguments are passed between functions. When you think back to your compiler class, you might remember that arguments were passed on the stack. Indeed, this was the way it was done on "old" 32-bit x86 machines. However, we now have 64-bit machines, and these machines have more registers. To improve the performance of function calls, compiler writers decided to leverage these additional registers. More specifically, the function calling convention was changed, and the first six arguments are passed in registers. Only the 7th argument and onwards are passed on the stack. This page provides some more details about this. Now, the question is how we can pass the required argument to the start function of the thread? One possible solution is to introduce a wrapper function. This wrapper function could grab the argument from the TCB and then invoke the actual start function, passing it the required argument. In addition, the wrapper function also provides a convenient way to handle the case when the start function returns and you have to call pthread_exit. For this, you can simply invoke pthread_exit after the thread returns from the previous call to the start function. When you use a wrapper function, you need to make the program counter (EIP) point to this function instead of the actual start function. And you need to find a way to make the address of the start function available to the wrapper.
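Below is a minimal sketch of the wrapper idea. It assumes a hypothetical TCB that stores the start routine and its argument. thread_exit_stub() stands in for the library's own pthread_exit(); a real implementation would never return from it. add_one() is just an example start routine. Because the wrapper takes no parameters, the register-based calling convention does not matter: the argument travels through the TCB instead of through rdi.

```c
#include <stddef.h>

/* Hypothetical, simplified TCB: only the fields the wrapper needs. */
struct tcb {
    void *(*start_routine)(void *);
    void *arg;
    void *retval;
    int exited;
};

static struct tcb *current_tcb;   /* set by the scheduler before switching */

/* Stand-in for the library's pthread_exit(): a real one would mark the
   thread as exited and switch to another thread without returning. */
static void thread_exit_stub(void *retval)
{
    current_tcb->retval = retval;
    current_tcb->exited = 1;
}

/* The new thread's program counter points here, not at start_routine.
   The wrapper fetches the start routine and its argument from the TCB. */
static void thread_wrapper(void)
{
    void *ret = current_tcb->start_routine(current_tcb->arg);
    thread_exit_stub(ret);        /* implicit pthread_exit on return */
}

/* Example start routine used to exercise the wrapper. */
static void *add_one(void *arg)
{
    int *p = arg;
    (*p)++;
    return p;
}
```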
Once your stack is initialized and the (mangled) stack pointer (ESP) and program counter (EIP) are written to the jmp_buf, your new thread is all ready to go, and when the next SIGALRM arrives, it is ready to be scheduled!
This is a complicated project. Take it one step at a time, and make sure that everything works before moving on. Also, the debugger (gdb) is your friend. Expect that the code will crash. At this point, use the debugger to understand what is going on. For this project, you might want to look at the gdb commands disassemble and stepi. For example, your code might crash when you invoke longjmp for the first time to switch to a new thread, and you don't understand why. To debug this problem, you could set a breakpoint in __longjmp (say yes when the debugger asks whether you want to "make breakpoint pending on future shared library load"). Since __longjmp is a libc library function, you don't have the source directly available to the debugger. However, you can issue the disassemble command to show the assembly code for this (short) function. And you can use stepi to step forward for a single machine (assembly) instruction. This allows you to check if you restore the correct register values and also see the address where you end up jumping to. You can use the commands info registers to list the content of all the CPU registers, and you can use x [address] to print out the value that is stored in the memory at [address]. This is nice when you want to see where your stack pointer points to.

Deliverables

Please follow the instructions below exactly!

We use gradescope to manage your project submissions and to communicate the results back to you. You will submit all files that are part of your project via the gradescope web interface.
All your files must be in a directory named threads. The name of the threads library that we will test must be threads.o, and the POSIX function implementation must be done in C/C++. Of course, you cannot leverage any of the existing pthread library code to implement your thread library.
All files that you need to build your library must be included (sources, headers, makefile) in that folder. We will just call make and expect that the object file threads.o is built from your sources. Please do not include any object or executable files.
Gradescope does support built-in autograding, but, currently, we do not intend to use it. Instead, we will test your projects in our own environment. So, do not worry if you don't get immediate feedback or if the system tells you that the autograder is not running.
Your project must compile on a CSIL machine. If you worked on a Windows machine or your laptop at home, then make sure it still works on CSIL or modify it appropriately!
Include a README with this project. Explain what you did in the README. If you had problems, tell us why and what.

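Actually, even before reading the hints, I had thought of using setjmp/longjmp inside the signal handler, but I confused myself. When the signal handler runs, the stack is no longer quite the program's original stack, because the kernel modifies the original stack in order to jump into the signal handler. So I assumed that saving the state there would be useless. I was overthinking it.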

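The basic idea is this. Suppose we have two functions, func1 and func2, and func1 runs first. An alarm signal is set to fire every 5 ms, so the signal handler is called every 5 ms. In the handler we save the currently running function (func1) with setjmp and then longjmp to func2. Switching between func1 and func2 every 5 ms this way gives the effect of user-mode threads.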

The idea is similar to how the OS switches between processes.

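func1 and func2 each need their own stack, so that they do not overwrite each other's stack. Besides saving the registers with setjmp, we also have to provide stack space. So the esp field of the jmp_buf is changed to point to a pre-allocated stack area (simple_thread.c L112, L117).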

The other jmp_buf field to modify is eip. It has to point to the start of func1 or func2, so that longjmp begins execution at the start of that function.
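The same two steps can also be expressed with the ucontext API instead of editing esp/eip inside a jmp_buf by hand. The steps are: give each function its own stack, and make execution start at the function's entry point. This is a swapped-in technique, not what simple_thread.c does:

- makecontext() takes the stack from uc_stack and the entry point from its function argument.
- glibc handles any pointer mangling internally.
- The sketch switches cooperatively with swapcontext(), not from a signal handler.

It assumes glibc; the ucontext functions were removed from POSIX.1-2008 but glibc still provides them.

```c
#define _DEFAULT_SOURCE 1
#include <ucontext.h>

#define STACK_SIZE 32768

static ucontext_t main_ctx, ctx1, ctx2;
static char stack1[STACK_SIZE], stack2[STACK_SIZE];
static char trace[16];
static int pos;

static void func1(void)
{
    for (int i = 0; i < 3; i++) {
        trace[pos++] = '1';
        swapcontext(&ctx1, &ctx2);       /* save func1, resume func2 */
    }
}                                        /* returning follows uc_link */

static void func2(void)
{
    for (int i = 0; i < 3; i++) {
        trace[pos++] = '2';
        swapcontext(&ctx2, &ctx1);       /* save func2, resume func1 */
    }
}

/* Give each function a private stack and an entry point, run func1 first,
   and return the order in which the two functions ran. */
static const char *run_pingpong(void)
{
    getcontext(&ctx1);
    ctx1.uc_stack.ss_sp = stack1;        /* the "esp" part */
    ctx1.uc_stack.ss_size = sizeof stack1;
    ctx1.uc_link = &main_ctx;            /* where to go when func1 returns */
    makecontext(&ctx1, func1, 0);        /* the "eip" part */

    getcontext(&ctx2);
    ctx2.uc_stack.ss_sp = stack2;
    ctx2.uc_stack.ss_size = sizeof stack2;
    ctx2.uc_link = &main_ctx;
    makecontext(&ctx2, func2, 0);

    pos = 0;
    swapcontext(&main_ctx, &ctx1);
    trace[pos] = '\0';
    return trace;
}
```

run_pingpong() returns "121212". func1 and func2 alternate on their own stacks. When func1 finishes, uc_link sends control back to run_pingpong(), and func2 is simply left suspended.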

Unfortunately, jmp_buf depends on the CPU, so you have to figure out how this platform's jmp_buf lays out the saved registers.

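The Implementation section also gives another important piece of information. For security, the pointers stored in a jmp_buf are protected (mangled) by an algorithm so they cannot easily be tampered with, and the Implementation section provides a piece of code to help students deal with this.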

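I didn't use any of these. I didn't want to bother figuring them out, since I only wanted to understand how user-mode threads work. So I wrote my own setjmp/longjmp, called my_setjmp and my_longjmp, with a matching jmp_buf called my_jmp_buf.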

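One hard problem remains: after the signal handler calls our own my_longjmp, the signal handler is never called again. There is a subtle piece of magic here. You only know about it if you understand a little of how signals are implemented, or have read a book that covers signals. TLPI (The Linux Programming Interface) is a good one, and so, of course, is the classic APUE (Advanced Programming in the UNIX Environment).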

If you want to know how signals are implemented, see section 6.4 of 「Linux 内核源代码情景分析」.

In short, once a signal handler is invoked, by default that signal is blocked until the handler returns, at which point it is unblocked. Only then will the handler run again when the same signal arrives.
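This default blocking can be observed directly. The sketch below raises SIGUSR1 once and, inside the handler, checks whether SIGUSR1 is in the current signal mask. The function names are hypothetical. With sa_flags = 0 the signal is blocked inside the handler; with SA_NODEFER it is not.

```c
#define _DEFAULT_SOURCE 1        /* SA_NODEFER under a strict -std */
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t blocked_in_handler = -1;

static void usr1_handler(int sig)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);   /* NULL set: only read the mask */
    blocked_in_handler = sigismember(&cur, sig);
}

/* Install the handler with the given sa_flags, raise SIGUSR1 once, and
   report whether SIGUSR1 was blocked while the handler ran (1) or not (0). */
static int usr1_blocked_during_handler(int flags)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    sigaction(SIGUSR1, &sa, NULL);
    raise(SIGUSR1);              /* the handler has run when raise() returns */
    return blocked_in_handler;
}
```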

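But our signal handler never returns normally, because we longjmp into func1 or func2. A normal return from a signal handler means that, after the handler returns, sigreturn is called (man sigreturn). sigreturn switches from user mode back into kernel mode. The kernel then restores the interrupted context saved on the stack, including the signal mask that was in effect before the handler ran, so that the next time the process runs it continues where it was interrupted. Without that step, the blocked SIGUSR1 stays blocked, and no later SIGUSR1 ever runs the signal handler again.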

That is why lines L62 ~ L64 were added to simple_thread.c: they unblock SIGUSR1.

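However, if you use libc's sigsetjmp/siglongjmp with a non-zero savemask, you may not need to unblock SIGUSR1 yourself, because siglongjmp restores the signal mask saved by sigsetjmp. Whether plain setjmp/longjmp save and restore the mask depends on the platform. _setjmp/_longjmp never touch the signal mask. With those, just as with my own my_setjmp, you have to unblock SIGUSR1 yourself.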
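Here is a small sketch of the sigsetjmp/siglongjmp route, assuming POSIX/glibc semantics. When the jump buffer is saved with a non-zero savemask, siglongjmp() restores the signal mask recorded at that point, which undoes the automatic blocking of SIGUSR1. The handler leaves via siglongjmp() on every delivery, as simple_thread.c does with my_longjmp. count_handler_runs() is a hypothetical name.

```c
#define _DEFAULT_SOURCE 1
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf env;
static volatile sig_atomic_t handled = 0;

static void jump_out(int sig)
{
    (void)sig;
    handled++;
    siglongjmp(env, 1);   /* leave the handler without returning */
}

/* Raise SIGUSR1 `times` times, escaping the handler with siglongjmp each
   time; savemask is passed to sigsetjmp. Returns how often the handler ran. */
static int count_handler_runs(int times, int savemask)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_out;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                       /* SIGUSR1 blocked inside the handler */
    sigaction(SIGUSR1, &sa, NULL);

    handled = 0;
    for (volatile int i = 0; i < times; i++) {
        if (sigsetjmp(env, savemask) == 0) /* non-zero: also save the mask */
            raise(SIGUSR1);
    }
    int runs = handled;

    /* Cleanup: discard a SIGUSR1 left pending (savemask == 0 case) and
       unblock it, so it can never jump to a stale env. */
    sa.sa_handler = SIG_IGN;
    sigaction(SIGUSR1, &sa, NULL);
    sigset_t usr1;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &usr1, NULL);
    return runs;
}
```

With savemask = 1 the handler runs all 3 times. With savemask = 0, SIGUSR1 stays blocked after the first jump, so the later raises only leave it pending and the handler runs once. That is the same symptom simple_thread.c had before L62 ~ L64.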

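This leads into more advanced signal questions. Can a signal handler be interrupted? If two signals arrive while a handler is running, does the handler run twice more? It's fine if you are not familiar with these questions. In this example, if SIGUSR1 arrives again while the SIGUSR1 handler is running, the new one waits until the current handler has finished, and then the handler runs again.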

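This behavior is configurable (for example with the SA_NODEFER flag of sigaction). Which approach is better? I don't have an answer yet.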

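If SIGUSR1 arrives two or more times while the SIGUSR1 handler is running, the handler runs only once more afterwards, because a pending standard signal is just a flag and is not queued. That may worry you, since it means some func1/func2 switches can be lost. Yes, there is no way around it: traditional signals are just that "unreliable".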
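The merging of duplicate standard signals can be reproduced without any timing tricks. Block SIGUSR1, send it several times, then unblock it. The sketch below (hypothetical names) counts handler invocations. For comparison, it also accepts a real-time signal such as SIGRTMIN, which Linux queues per instance.

```c
#define _DEFAULT_SOURCE 1
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

static void count_signal(int sig)
{
    (void)sig;
    deliveries++;
}

/* Block signo, send it n times, unblock it, and return how many times the
   handler actually ran. */
static int deliveries_after_burst(int signo, int n)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(signo, &sa, NULL);

    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);

    deliveries = 0;
    sigprocmask(SIG_BLOCK, &set, NULL);
    for (int i = 0; i < n; i++)
        raise(signo);                        /* only marked pending while blocked */
    sigprocmask(SIG_UNBLOCK, &set, NULL);    /* pending instance(s) delivered now */
    return deliveries;
}
```

deliveries_after_burst(SIGUSR1, 3) reports a single delivery, while SIGRTMIN is delivered 3 times.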

For more on signals, see: linux/unix signal 議題 (linux/unix signal issues)

Wait! Didn't I say we would use SIGALRM? Why SIGUSR1? Because I later found SIGUSR1 easier to test with, so I switched to it.

After setting up func1 and func2 with setjmp, the program uses longjmp to start func2. From then on, the signal handler switches to func1, then the next signal switches back to func2, and so on.

simple_thread.c

  1 #include <stdio.h>
  2 #include <stdlib.h>
  3 #include <unistd.h>
  4 #include <sys/time.h>
  5 #include <signal.h>
  7 #include "my_setjmp.h"
  8 
 11 
 12 #define BUF_SIZE 32768
 13 char func1_stack[BUF_SIZE+64];
 14 char func2_stack[BUF_SIZE+64];
 16 
 17 my_jmp_buf th1;
 18 my_jmp_buf th2;
 21 
 22 my_jmp_buf *cur_th;
 23 my_jmp_buf *next_th;
 24 
 25 
 26 void func1()
 27 {
 28   while(1)
 29   {
 30     printf("1");
 31     printf("2");
 32     printf("3");
 33     printf("4");
 34     printf("5");
 35     printf("6");
 36     printf("7");
 37     printf("8");
 38     printf("9");
 39     printf("a");
 40     printf("\n");
 41   }
 42 }
 43 void func2()
 44 {
 45   while(1)
 46   {
 47     printf("21 ");
 48     printf("22 ");
 49     printf("23 ");
 50     printf("24 ");
 51     printf("25 ");
 52     printf("\n");
 53   }
 54 }
 55 
 57 void sigalrm_fn(int sig)
 58 {
 59   sigset_t sigs;
 60   /* Unblock the SIGUSR1 signal that got us here */
 61 #if 1
 62   sigemptyset (&sigs);
 63   sigaddset (&sigs, SIGUSR1);
 64   sigprocmask (SIG_UNBLOCK, &sigs, NULL);
 65 #endif
 66   printf("got USR1!\n");
 71 #if 1
 72     if (cur_th == &th1)
 73     {
 74       printf("2\n");
 75       next_th = &th2;
 76     }
 77     else
 78     {
 79       printf("1\n");
 80       next_th = &th1;
 81     }
 82 #endif
 83 
 86   if (my_setjmp(*cur_th) == 0)
 87   {
 88     cur_th = next_th;
 91     my_longjmp(*next_th, 1);
 92   }
 93   else
 94   {
 95     return;
 96   }
104   return;
105 }
106 
107 int main(int argc, char *argv[])
108 {
109   signal(SIGUSR1, sigalrm_fn);
110   my_setjmp(th1);
111   th1[0].eip = (unsigned long)func1;
112   th1[0].esp = (unsigned long)(func1_stack + BUF_SIZE);
113 
114   if (my_setjmp(th2) == 0)
115   {
116     th2[0].eip = (unsigned long)func2;
117     th2[0].esp = (unsigned long)(func2_stack + BUF_SIZE);
118     cur_th = &th2;
119     my_longjmp(th2, 1);
120   }
131 
132   while (1) 
133     pause();
181   return 0;
182 }

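func1 prints 123456789a and func2 prints 21 22 23 24 25. As the following video shows, func1 and func2 switch back and forth whenever SIGUSR1 is sent, so it basically works. Of course this is still far from a library like pthread, and many things have not been considered, but at least it's a small step. My "goal" was only to learn how a user-mode thread library is built, not to write a pthread library. Interested readers can carry on and complete the UCSB assignment.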

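For example, the main thread never continues: this version only alternates between func1 and func2, and the code after that point in main never runs again. Plenty of other things are incomplete or not implemented. When I later started implementing some pthread functions, I discovered how much I had missed, and those parts are not easy either. This core idea alone is still far from even a simple pthread implementation.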

You can send SIGUSR1 with the following command:

killall -s SIGUSR1 simple_thread

Development time from start to finish: 20200220 ~ 20200226. On 20200312 I added an x86_64 version of setjmp/longjmp.

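The CS170 assignment asks for more than this. It also requires mutex support, which I first thought needed atomic operations; that sounded scary again. I was wrong: atomic operations are not needed, and something like disabling interrupts is enough, which makes it a lot easier.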

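This assignment is the most basic of basics, but basic problems are not the same as easy problems. These concepts stay the same no matter how much time passes, so the effort you put into them is never wasted.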

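Next comes real thread switching. Just replace SIGUSR1 with SIGALRM and make it fire every second, so func1 and func2 switch once per second. If 1 second is too slow, you can use milliseconds instead; I'll leave that as an exercise for readers.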

The following video demonstrates debugging with gdb. It shows the C/asm source and the asm debug window at the same time, so you can see how setjmp/longjmp work.

ctrl+x 2 opens two windows
layout asm opens the asm debug window
ctrl+x o switches between windows
Top: source code
Middle: disassembled assembly
Bottom: gdb command area

My simple series has another entry: simple_thread, though the simple series is not simple at all.
source code:
https://github.com/descent/simple_thread

user mode thread implementation:

ref:

Thanks to KILLE for the following:
Another simple implementation (it looks like it was written by a student taking CS170):
https://github.com/DennisZZH/User-Mode-Thread-Library

Other implementations worth a look:
https://github.com/prabhendu/operating_system_1/blob/master/gtthread.c
https://github.com/citrix123/Uthreads/blob/master/uthread.c

徹底理解setjmp/longjmp並DIY一個簡單的協程 (Understanding setjmp/longjmp thoroughly and building a simple coroutine yourself)
https://www.twblogs.net/a/5ccbd4fabd9eee1ac30bd85c

彻底理解setjmp/longjmp并DIY一个简单的协程 (the same title in simplified Chinese)
https://blog.csdn.net/dog250/article/details/89742140

Saturday, February 29, 2020

bash history keybind

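bash has the ctrl+r hotkey for searching backward through previously entered commands, but there seems to be no convenient way to search forward. A friend showed me a nice setup that makes it easy to find previous commands. It needs fzf and fd.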

apt-get install fzf fd-find

list 1. ~/.bashrc

[[ -f /usr/share/doc/fzf/examples/key-bindings.bash ]] && . /usr/share/doc/fzf/examples/key-bindings.bash
[[ -f /usr/share/doc/fzf/examples/completion.bash ]] && . /usr/share/doc/fzf/examples/completion.bash
export FZF_DEFAULT_COMMAND="fd -H --exclude={.git,.hg}"
export FZF_DEFAULT_OPTS="--layout=reverse --select-1 --exit-0 --bind 'ctrl-o:execute(vim {})+abort'"
export FZF_CTRL_T_COMMAND="$FZF_DEFAULT_COMMAND -t f"
export FZF_CTRL_T_OPTS="--tabstop=2 --bind ?:toggle-preview --preview '(bat --number --color=always {}) 2> /dev/null | head -100' --preview-window=right:hidden"
export FZF_ALT_C_COMMAND="fd -H -t d --exclude={.git,.hg}"
export FZF_ALT_C_OPTS="--layout=reverse --select-1 --exit-0"

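Paste the commands in list 1 into ~/.bashrc and you get a convenient ctrl+r search. Fig 1 shows that after pressing ctrl+r you can use the arrow keys to pick the command to run. Note that on Debian the fd-find package installs the command as fdfind, so the fd in list 1 may need an alias or a symlink named fd.

After pressing ctrl+r and typing a search keyword, you get the list shown in fig 1.
ex: ctrl + r vi shows the vi-related commands you entered before.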

fig 1. ctrl + r

Saturday, April 20, 2019

mv: Directory not empty

mv is the Linux utility for moving files or directories, but in one situation mv cannot do the job and prints the following message:

command unable to remove target: Directory not empty

fig 1 Contents of the /tmp/o1 and /tmp/b1 directories

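Here the o1/1 directory contains a file (15), and the following command fails even though o1/1/15 does not clash with any file name in b1/1/. The reason is that the target /tmp/b1/1 already exists and is not empty. mv moves a directory by renaming it, and a rename can replace an empty directory but not a non-empty one.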

descent@debian64:o1$ mv 1 /tmp/b1/
mv: cannot move ‘1’ to ‘/tmp/b1/1’: Directory not empty
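To see this failure at the system-call level, the sketch below rebuilds the o1/b1 layout under a fresh temporary directory and calls rename(2) directly. That is the same kind of rename mv attempts when source and target are on the same filesystem. The helper names are hypothetical. On Linux the call fails with ENOTEMPTY (POSIX also allows EEXIST), which mv reports as "Directory not empty".

```c
#define _DEFAULT_SOURCE 1            /* mkdtemp() under a strict -std */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

static void touch(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f)
        fclose(f);
}

/* Build <tmp>/o1/1/15 and <tmp>/b1/1/keep, then rename o1/1 onto b1/1.
   Returns rename's errno (0 if it succeeded), -1 if setup failed. */
static int rename_onto_nonempty_dir(void)
{
    char base[] = "/tmp/mvdemoXXXXXX";
    char src[256], dst[256], f1[300], f2[300];
    if (mkdtemp(base) == NULL)
        return -1;

    snprintf(src, sizeof src, "%s/o1", base);
    mkdir(src, 0755);
    snprintf(src, sizeof src, "%s/o1/1", base);
    mkdir(src, 0755);
    snprintf(f1, sizeof f1, "%s/15", src);
    touch(f1);

    snprintf(dst, sizeof dst, "%s/b1", base);
    mkdir(dst, 0755);
    snprintf(dst, sizeof dst, "%s/b1/1", base);
    mkdir(dst, 0755);
    snprintf(f2, sizeof f2, "%s/keep", dst);
    touch(f2);

    int err = (rename(src, dst) == 0) ? 0 : errno;   /* like: mv o1/1 b1/ */

    /* best-effort cleanup of the temporary tree */
    remove(f1);
    remove(f2);
    remove(src);
    remove(dst);
    snprintf(src, sizeof src, "%s/o1", base);
    remove(src);
    snprintf(dst, sizeof dst, "%s/b1", base);
    remove(dst);
    remove(base);
    return err;
}
```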

ref:

ref 3 explains what happens when files are overwritten, but what about directories?
ref 2 suggests the -b option, but that may not be what we want.

list 1 info mv

Make a backup of each file that would otherwise be overwritten or
     removed.  Without this option, the original versions are destroyed.
     Use METHOD to determine the type of backups to make.  When this
     option is used but METHOD is not specified, then the value of the
     ‘VERSION_CONTROL’ environment variable is used.  And if
     ‘VERSION_CONTROL’ is not set, the default backup type is
     ‘existing’.

     ‘none’
     ‘off’
          Never make backups.

     ‘numbered’
     ‘t’
          Always make numbered backups.

     ‘existing’
     ‘nil’
          Make numbered backups of files that already have them, simple
          backups of the others.

     ‘simple’
     ‘never’
          Always make simple backups.  Please note ‘never’ is not to be
          confused with ‘none’.

list 1 is an excerpt from info mv. The CONTROL in mv --backup[=CONTROL] is one of the values listed there.

descent@debian64:o1$ mv -b 1 /tmp/b1/

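What we usually want is for /tmp/o1/1/15 to end up in /tmp/b1/1/, but none of the options in list 1 can do that. -b renames /tmp/b1/1/ to /tmp/b1/1~/ and then moves o1/1 into /tmp/b1/. -b is like --backup without an argument, so it uses VERSION_CONTROL or the default 'existing' type. Since there are no numbered backups yet, that produces a simple backup, the same as mv --backup=simple 1 /tmp/b1/.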

descent@debian64:o1$ ls /tmp/b1/
0  1  1~  2  2.txt  3

That is not what we want either. It seems the only way is to use cp, rsync or tar to get this result, and then delete the original directory.
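If everything is on one filesystem, a merge can also be done with rename() alone. Below is a hypothetical sketch. merge_move() first tries a plain rename(). When the target is an existing non-empty directory, it moves each child recursively and then removes the emptied source directory. Existing files in the target are overwritten, following rename() semantics. This is only an illustration: it does not move across filesystems and does no conflict handling beyond overwriting, so it is not a replacement for cp/rsync.

```c
#define _DEFAULT_SOURCE 1            /* scandir(), mkdtemp() under a strict -std */
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Move src to dst like rename(), but when dst is an existing non-empty
   directory, merge src into it recursively. Returns 0 or -1. */
static int merge_move(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != ENOTEMPTY && errno != EEXIST)
        return -1;                   /* e.g. EXDEV, EISDIR, ENOTDIR */

    /* Both are directories here. scandir() reads the whole listing up
       front, so moving entries away while iterating is safe. */
    struct dirent **names;
    int n = scandir(src, &names, NULL, NULL);
    if (n < 0)
        return -1;

    int rc = 0;
    for (int i = 0; i < n; i++) {
        const char *name = names[i]->d_name;
        if (strcmp(name, ".") != 0 && strcmp(name, "..") != 0) {
            char s[4096], d[4096];
            snprintf(s, sizeof s, "%s/%s", src, name);
            snprintf(d, sizeof d, "%s/%s", dst, name);
            if (merge_move(s, d) != 0)
                rc = -1;
        }
        free(names[i]);
    }
    free(names);
    return rc == 0 ? rmdir(src) : -1;   /* src is empty now */
}
```

For the example above, merge_move("/tmp/o1/1", "/tmp/b1/1") leaves 15 inside /tmp/b1/1/ next to the files already there, and removes /tmp/o1/1.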

Friday, June 15, 2018

Handling Chinese files in Windows/DOS format on Linux

On Linux, text is usually utf8 encoded with unix line endings. So what problems come up when handling Chinese files in big5/windows/dos format?

Many programs can break the display of Chinese text. Here are the ones I ran into:

terminal
vim
git/hg diff

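I use mate-terminal, which supports both utf8 and big5 modes. You might guess that viewing a windows/big5 file correctly requires switching the terminal to big5. The answer is both "yes" and "no": it depends on which program you use to view the file in the terminal.

t1.txt is a windows/big5 file whose content is "施逼\r\n".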

cat
mate-terminal in utf8 mode
descent@debian64:~$ cat /tmp/t1.txt
�I�G

mate-terminal in big5 mode
descent@debian64:~$ cat /tmp/t1.txt
施逼
descent@debian64:~$

less
mate-terminal in big5 mode

less.txt

<AC>I<B9>G
/media/vbox_share/tmp/t1.txt (END)

mate-terminal in utf8 mode

less.txt

<AC>I<B9>G
/media/vbox_share/tmp/t1.txt (END)

mate-terminal in big5 mode
export LESSCHARSET=latin1
施逼
/media/vbox_share/tmp/t1.txt (END)

mate-terminal in utf8 mode
export LESSCHARSET=latin1
�I�G
/media/vbox_share/tmp/t1.txt (END)

git/hg diff behave the same as less.

Now for the main event, vim. Several combinations display windows/big5 Chinese correctly. The one I use is:

mate-terminal in utf8 mode
~/.vimrc
set fileencodings=utf-8,big5,gb18030
set encoding=utf-8

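With this, vim tries the encodings in fileencodings in order, opens t1.txt as big5, and converts it to utf8 internally. I didn't set termencoding, so it is the same as encoding, which is utf8. Since mate-terminal also uses utf8, everything lines up: the windows/big5 Chinese file displays correctly, with no mojibake.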
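The conversion vim does here can be reproduced with iconv(3). Below is a minimal sketch, assuming glibc's iconv with its BIG5 converter available; big5_to_utf8() is a hypothetical name. The input bytes are the ones less displays for t1.txt above: <AC>I<B9>G, i.e. 0xAC 0x49 0xB9 0x47.

```c
#include <iconv.h>
#include <string.h>

/* Convert `in` (Big5 bytes) to UTF-8 in `out`, as vim does when it opens a
   file with fileencoding=big5 and encoding=utf-8.
   Returns the number of output bytes, or -1 on error. */
static long big5_to_utf8(const char *in, size_t in_len, char *out, size_t out_cap)
{
    iconv_t cd = iconv_open("UTF-8", "BIG5");
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;          /* iconv() takes char **, not const */
    char *dst = out;
    size_t src_left = in_len, dst_left = out_cap;
    size_t r = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (r == (size_t)-1)
        return -1;
    return (long)(out_cap - dst_left);
}
```

The 4 Big5 bytes of "施逼" become 6 UTF-8 bytes (3 per character), and the trailing \r\n passes through unchanged.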

If vim guesses the encoding wrong, :e ++enc=big5 makes vim reread the file and open it with fileencoding=big5.

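When you type Chinese text in vim and save, vim converts it from its internal utf8 encoding back to the file's original big5 encoding. When the file is opened in Windows Notepad, the Chinese displays correctly.

All locale settings are en_US.UTF-8; none are set to zh_TW.BIG5.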

descent@debian64:tmp$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

For vim's encoding settings, see ref 1.

Another issue is line endings: \r, \n, CR, LF.

cr: \r, 0xd, ^M
lf: \n, 0xa

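Windows uses \r\n for a line break, Linux uses \n, and classic Mac OS (before Mac OS X) used \r.
vi helpfully detects the file format: a windows-format file is saved back with windows line endings, so that is basically fine. But git diff shows an extra ^M, which is the CR. git can be configured not to report these line endings as differences.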
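To turn a Windows-format file into Unix line endings (what dos2unix does, or :set ff=unix followed by :w in vim), the core step is dropping the CR of every CRLF pair. Below is a minimal sketch; crlf_to_lf() is a hypothetical name, and a lone \r (old Mac style) is left untouched.

```c
#include <stddef.h>

/* Convert Windows line endings to Unix ones in place: every "\r\n"
   becomes "\n"; a lone '\r' is kept. Returns the new length. */
static size_t crlf_to_lf(char *buf, size_t len)
{
    size_t w = 0;
    for (size_t r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;                /* drop the CR of a CRLF pair */
        buf[w++] = buf[r];
    }
    return w;
}
```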

Why would a Linux user like me run into windows/big5 Chinese files at all? It's a long story, so I'll skip it.

ref:

Friday, May 26, 2017

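Working with samba on linux

Recently (20170513) the WannaCry ransomware has had everyone on edge. It spread by exploiting a vulnerability in Windows' SMBv1 implementation (MS17-010). Still, here are my notes on using smb.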

mount samba filesystem in linux

# mkdir /mnt/cifs
# mount -t cifs //server-name/share-name /mnt/cifs -o username=shareuser,password=sharepassword,domain=nixcraft
# mount -t cifs //192.168.101.100/sales /mnt/cifs -o username=shareuser,password=sharepassword,domain=nixcraft
mount -t cifs //192.168.121.32/descent /mnt/cifs/  -o username=myname,password=mypwd

ref: http://www.cyberciti.biz/faq/linux-mount-cifs-windows-share/

samba access symbolic link
ref: http://www.ubuntu-tw.org/modules/newbb/viewtopic.php?post_id=128058
edit smb.conf

follow symlinks = yes
unix extensions = no
wide links = yes

Using smbclient to copy files from an smb server

ref: http://www.linuxso.com/command/smbclient.html
--user username%passwd; in these examples the password is the empty string ''

get <remote file name> [local file name]
Copy the file called remote file name from the server to the
machine running the client. If specified, name the local copy local
file name. Note that all transfers in smbclient are binary. See
also the lowercase command.

Copy from the samba server to local:
$ smbclient -c "get e.iso /tmp/e.iso" \\\\127.0.0.1\\smb_test --user username%''

put <local file name> [remote file name]
Copy the file called local file name from the machine running the
client to the server. If specified, name the remote copy remote
file name. Note that all transfers in smbclient are binary. See
also the lowercase command.

Copy from local to the samba server:
$ /usr/bin/smbclient -c "put /tmp/exam.d e.txt" //127.0.0.1/smb_test/ --user username%''

Saturday, April 1, 2017

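Using linux cgroup to limit the resources used by firefox and chrome

Do firefox and chrome often eat up your Linux system's resources and make your desktop sluggish? Use cgroups to rein in these two resource hogs.

At first I found 《Running Firefox in a cgroup (using systemd)》, but it doesn't matter anymore and you don't need to read it. It relies on systemd plus a pile of commands I didn't understand, so I didn't even know what I was typing.

The method below configures cgroups by working directly with the cgroup virtual filesystem (the cgroup v1 interface).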

mount -t cgroup -o memory memcg /media/cgroup
cd /media/cgroup
mkdir browser
cd browser
echo 1536M > memory.limit_in_bytes # limit this group to 1536M of memory
echo 19654 > tasks # the pid of a terminal shell
echo $$ prints the pid of the current terminal shell; write that value into tasks.

Every program started from that terminal shell is limited to 1536M of memory. When I start firefox and google-chrome from this terminal, they are limited to 1536M.
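To confirm that a process really landed in the group, read /proc/<pid>/cgroup. That process can be the shell whose pid was written into tasks, or a browser started from it. Below is a short sketch; show_cgroups() is a hypothetical name. With the cgroup v1 memory hierarchy above, one line should contain memory:/browser. On a cgroup v2-only system you get a single 0::/path line instead.

```c
#include <stdio.h>

/* Print the cgroup membership of process `pid` ("self" for the caller),
   one "hierarchy-ID:controllers:path" line per hierarchy.
   Returns the number of lines, or -1 if the file cannot be read. */
static int show_cgroups(const char *pid)
{
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%s/cgroup", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int n = 0;
    while (fgets(line, sizeof line, f)) {
        fputs(line, stdout);
        n++;
    }
    fclose(f);
    return n;
}
```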

mount -t cgroup -o blkio cgroup /media/block/
cd /media/block
mkdir compiler
cd compiler
echo 250 > blkio.weight # IO scheduling weight of this group; the smaller the value, the less often the IO scheduler picks it
echo 19654 > tasks # the pid of a terminal shell

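Now the Linux desktop as a whole no longer gets stuck because of these two browsers. Sometimes the browsers themselves become sluggish instead, so tune these values to suit your system.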

ref:
linux kernel hacks (Traditional Chinese edition) chapter 2

Sunday, March 26, 2017

Using gdb+qemu to run/debug the raspberry pi linux kernel

the 1st edition: 20150408
the 2nd edition: 20150711 add rpi2 part, Initramfs source files conf
the 3rd edition: 20151001 /dev/ram0
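the 4th edition: 20170320 use the 4.4.y linux kernel

After taking the course 《Linux 内核分析》, I learned that qemu + gdb can be used to debug the linux kernel.

The rootfs provided by the course is here. It is very small and very helpful for learning: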

git clone  https://github.com/mengning/menu.git

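I wanted to do the same with the raspberry pi. ~~Since I don't have a physical pi board~~ (I got one later), I use this method to trace the linux kernel. But first I had to figure out how to build a linux kernel for qemu/pi.

《raspberry pi Kernel Building》 has a tutorial, but the kernel it produces cannot run in qemu. See 《build raspberrypi kernel for qemu》 instead (it requires a patch and selecting some options).

The old xecdesign.com material no longer exists, and it doesn't work with the 4.4.y kernel anyway. The linux-arm.patch from https://github.com/dhruvvyas90/qemu-rpi-kernel lets the rpi 4.4.y kernel boot in qemu, and tools/build-kernel-qemu provides a script that builds the kernel automatically.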

~~wget http://xecdesign.com/downloads/linux-qemu/linux-arm.patch~~
patch -p1 -d linux/ < linux-arm.patch

Remember to enable the debug option, so that the compiled kernel has debug symbols and gdb can do source-level debugging.

add debug option

Kernel hacking --->
   Compile-time checks and compiler options --->
     [*] Compile the kernel with debug info

~~This is my linux kernel config and the downloaded patch:~~
~~config and patch~~

https://github.com/descent/linux/tree/rpi-4.4.y-qemu has my patched kernel; qemu-config is the config file for qemu.

Test environment:

git clone https://github.com/raspberrypi/linux
git commit 8362c08dfc8dd9e54745b3f1e5e4ff0a1fb30614

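The next question: how do you load an initrd?

This differs a little from the x86 version: parameters have to be added with -append. 《Compiling Linux kernel for QEMU ARM emulator》 offers one approach, but its -append parameters still didn't let me use the rootfs initrd image from 《Linux内核分析》. Of course, the hello rootfs initrd image from that article didn't work either. In the end I found the following commands.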

qemu-system-arm -M versatilepb -cpu arm1176 -kernel /media/2/pi/linux/arch/arm/boot/zImage -m 128 -initrd rootfs  -append "initrd=rootfs"
qemu-system-arm -M versatilepb -cpu arm1176 -kernel /media/2/pi/linux/arch/arm/boot/zImage -m 128 -initrd m/rootfs.img  -append "initrd=rootfs.img"

menuos for raspberry pi in qemu


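Although I now have an arm debug environment, I still have some questions. At what address is linux loaded? Is that address fixed? Can the kernel be moved to another address?

x86 has a relocatable kernel option. How does it do the relocation?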

x86 linux option

Processor type and features
  (0x1000000) Physical address where the kernel is loaded
  [*] Build a relocatable kernel

The corresponding config option is:
CONFIG_RELOCATABLE=y

The arm version has no relocatable kernel option. There is AUTO_ZRELADDR=y (Enable "AUTO_ZRELADDR" support under "Boot" options), which seems to serve a similar purpose to RELOCATABLE.

The Linux Kernel: Configuring the Kernel Part 5 says:

This next kernel option (Build a relocatable kernel (RELOCATABLE)) allows the kernel to be placed somewhere else in the memory. The kernel file will be 10% larger,

深入探索 Kdump，第 3 部分: Kdump 原理探秘 (Deep dive into Kdump, Part 3: how Kdump works)
Relocatable kernel
Why a relocatable kernel matters

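Before kdump existed, the kernel could only boot from one fixed physical address. That was a limitation for kdump. To collect a memory image of the production kernel, the capture kernel cannot boot from the address the production kernel uses. So a second kernel, built to boot from a different address, had to serve as the capture kernel; that is why RHEL5 has a package called kernel-kdump. Innovation often comes from the pursuit of convenience: to spare developers from compiling an extra kernel, the "relocatable" feature was implemented for the kernel.
How it works

x86_64: modify the mapping of the text and data segments at run time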

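After booting, the kernel detects where it has been loaded. It then updates its own page tables accordingly, so that they reflect the correct virtual-to-physical mapping for the kernel's text and data segments.

i386: use pre-generated relocation information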

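On i386, the text and data segments are part of the hard-coded linear mapping region, so supporting relocation by modifying page tables is difficult. Instead, when the kernel is built, an extra table recording the locations of all symbols that need relocation is generated and placed into the bzImage-format kernel. After the kernel boots and decompresses itself, it performs the relocation using the load address and this table.

powerpc: link vmlinuz as a "position-independent executable"

Unlike x86, on powerpc /boot/vmlinuz is not a bzImage-format file but a plain ELF file, and the boot mechanism differs as well. So on powerpc, relocation is mainly implemented with the mature "position-independent executable" technique.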
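The i386 approach (a table of locations to patch) can be illustrated with a toy model. This is not kernel code, and the real relocation table format differs. In the model:

- The image holds absolute addresses computed for a link address of 0x1000000, the default physical load address in the x86 option above.
- The table lists which 32-bit words contain such addresses.
- apply_relocs() (a hypothetical name) adds the difference between the actual load address and the link address to each listed word.

```c
#include <stddef.h>
#include <stdint.h>

#define LINK_ADDR 0x1000000u   /* address the image was linked for */

/* Patch every word listed in relocs[] (indices of 32-bit words) by the
   difference between the actual load address and LINK_ADDR. */
static void apply_relocs(uint32_t *image, const size_t *relocs, size_t nrelocs,
                         uint32_t load_addr)
{
    uint32_t delta = load_addr - LINK_ADDR;   /* unsigned wrap-around is fine */
    for (size_t i = 0; i < nrelocs; i++)
        image[relocs[i]] += delta;
}
```

Words that are not listed (plain data) are left alone. That is why the build has to record exactly which locations hold addresses.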

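Below is my qemu command. With linux/vmlinux loaded in gdb, it lets you single-step the linux kernel. -s starts a gdb server on tcp port 1234, and -S halts the CPU at startup until gdb tells it to continue.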

qemu-system-arm -M versatilepb -cpu arm1176 -kernel linux/arch/arm/boot/zImage -m 128 -initrd m/rootfs.img  -append "initrd=rootfs.img" -s -S

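I wanted to start from the very first instruction but couldn't get that to work. So I followed the course steps and set the breakpoint at start_kernel. Debugging then works normally, with the same result as in the course. The register values at boot are shown below.

gdb.sh shows how to connect gdb to qemu: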

gdb.sh

descent@debian64:linux-4.4.1$ arm-linux-gnueabihf-gdb linux/vmlinux
GNU gdb (Debian 7.10-1+b1) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux...done.
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) b start_kernel
Breakpoint 1 at 0xc1846740: file init/main.c, line 498.
(gdb) c
Continuing.

Breakpoint 1, start_kernel () at init/main.c:498
498 {
(gdb)

rpi

(gdb) i r
r0             0x0    0
r1             0x0    0
r2             0x0    0
r3             0x0    0
r4             0x0    0
r5             0x0    0
r6             0x0    0
r7             0x0    0
r8             0x0    0
r9             0x0    0
r10            0x0    0
r11            0x0    0
r12            0x0    0
sp             0x0    0x0 <__vectors_start>
lr             0x0    0
pc             0x0    0x0 <__vectors_start>
cpsr           0x400001d3     1073742291

The Initramfs source file(s) option builds the initramfs directly into the kernel, so qemu's -initrd is no longer needed.

Initramfs source files conf

General setup  --->
  [*] Initial RAM filesystem and RAM disk (initramfs/initrd) support
  (/media/2/linux_kernel/m/rootfs) Initramfs source file(s)

Important:
/dev/console has to be copied into the Initramfs source directory (/media/2/linux_kernel/m/rootfs in my case). Otherwise you get console-related error messages at boot.

qemu command for arm (rpi)

qemu-system-arm -M versatilepb -cpu arm1176 -kernel arch/arm/boot/zImage -m 128

descent@debian32:~/linux-3.10.27/arch/x86/boot$ ls ~/git/FS/target2/dev/
console  mtd1   mtd13  mtd3  mtd7       mtdblock1   mtdblock13  mtdblock3  mtdblock7  ptmx    rtc0  tty2  tty6   ttyS1
kmem     mtd10  mtd14  mtd4  mtd8       mtdblock10  mtdblock14  mtdblock4  mtdblock8  pts     tty   tty3  tty7   ttyS2
mem      mtd11  mtd15  mtd5  mtd9       mtdblock11  mtdblock15  mtdblock5  mtdblock9  ram0    tty0  tty4  tty8   urandom
mtd0     mtd12  mtd2   mtd6  mtdblock0  mtdblock12  mtdblock2   mtdblock6  null       random  tty1  tty5  ttyS0  zero

init goes in /init; some environments may need it at /sbin/init.

To show qemu's messages in the terminal, use the following command. The key is the two options -nographic and -append "console=ttyAMA0".

qemu-system-arm -M versatilepb -cpu arm1176 -kernel arch/arm/boot/zImage -m 128 -nographic -append "console=ttyAMA0"

qemu rpi message for terminal

descent@debian64:linux$ qemu-system-arm -M versatilepb -cpu arm1176 -kernel arch/arm/boot/zImage -m 128 -nographic  -append "console=ttyAMA0"

(process:31080): GLib-WARNING **: /build/glib2.0-94amRy/glib2.0-2.50.1/./glib/gmem.c:483: custom memory allocation vtable not supported
pulseaudio: set_sink_input_volume() failed
pulseaudio: Reason: Invalid argument
pulseaudio: set_sink_input_mute() failed
pulseaudio: Reason: Invalid argument
Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0x0
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Initializing cgroup subsys cpuacct
Linux version 4.4.30-rt5+ (descent@debian64) (gcc version 4.8.3 20140303 (prerelease) (crosstool-NG linaro-1.13.1+bzr2650 - Linaro GCC 2014.03) ) #9 Mon Mar 20 17:27:35 CST 2017
CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
CPU: VIPT aliasing data cache, unknown instruction cache
Machine: ARM-Versatile PB
Memory policy: Data cache writeback
sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32480
Kernel command line: console=ttyAMA0
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 124120K/131072K available (4039K kernel code, 185K rwdata, 1088K rodata, 164K init, 139K bss, 6952K reserved, 0K cma-reserved)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0xc8800000 - 0xff800000   ( 880 MB)
    lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .text : 0xc0008000 - 0xc050a1bc   (5129 kB)
      .init : 0xc050b000 - 0xc0534000   ( 164 kB)
      .data : 0xc0534000 - 0xc05626c0   ( 186 kB)
       .bss : 0xc05626c0 - 0xc0585460   ( 140 kB)
NR_IRQS:224
VIC @f1140000: id 0x00041190, vendor 0x41
FPGA IRQ chip 0 "SIC" @ f1003000, 13 irqs, parent IRQ: 63
clocksource: timer3: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275 ns
Console: colour dummy device 80x30
Calibrating delay loop... 735.23 BogoMIPS (lpj=3676160)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
Disabling cpuset control group subsystem
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
CPU: Testing write buffer coherency: ok
Setting up static identity map for 0x8220 - 0x827c
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 1 part 20 variant b rev 5
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
Serial: AMBA PL011 UART driver
dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 44, base_baud = 0) is a PL011 rev1
console [ttyAMA0] enabled
dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 45, base_baud = 0) is a PL011 rev1
dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 46, base_baud = 0) is a PL011 rev1
fpga:09: ttyAMA3 at MMIO 0x10009000 (irq = 70, base_baud = 0) is a PL011 rev1
PCI core found (slot 11)
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [mem 0x50000000-0x5fffffff]
pci_bus 0000:00: root bus resource [mem 0x60000000-0x6fffffff pref]
pci_bus 0000:00: root bus resource [io  0x1000-0xffff]
pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:0c.0: BAR 2: assigned [mem 0x50000000-0x50001fff]
pci 0000:00:0c.0: BAR 1: assigned [mem 0x50002000-0x500023ff]
pci 0000:00:0c.0: BAR 0: assigned [io  0x1000-0x10ff]
vgaarb: loaded
SCSI subsystem initialized
clocksource: Switched to clocksource timer3
NET: Registered protocol family 2
TCP established hash table entries: 1024 (order: 0, 4096 bytes)
TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
TCP: Hash tables configured (established 1024 bind 1024)
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
NetWinder Floating Point Emulator V0.97 (double precision)
futex hash table entries: 256 (order: -1, 3072 bytes)
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
romfs: ROMFS MTD (C) 2007 Red Hat, Inc.
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
pl061_gpio dev:e4: PL061 GPIO chip @0x101e4000 registered
pl061_gpio dev:e5: PL061 GPIO chip @0x101e5000 registered
pl061_gpio dev:e6: PL061 GPIO chip @0x101e6000 registered
pl061_gpio dev:e7: PL061 GPIO chip @0x101e7000 registered
clcd-pl11x dev:20: PL110 rev0 at 0x10120000
clcd-pl11x dev:20: Versatile hardware, VGA display
Console: switching to colour frame buffer device 80x30
brd: module loaded
sym53c8xx 0000:00:0c.0: enabling device (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 93
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi host0: sym-2.2.3
scsi 0:0:2:0: CD-ROM            QEMU     QEMU CD-ROM      2.3. PQ: 0 ANSI: 5
scsi target0:0:2: tagged command queuing enabled, command queue depth 16.
scsi target0:0:2: Beginning Domain Validation
scsi target0:0:2: Domain Validation skipping write tests
scsi target0:0:2: Ending Domain Validation
sr 0:0:2:0: [sr0] scsi3-mmc drive: 16x/50x cd/rw xa/form2 cdda tray
cdrom: Uniform CD-ROM driver Revision: 3.20
physmap platform flash device: 04000000 at 34000000
physmap-flash.0: Found 1 x32 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
smc91x.c: v1.1, sep 22 2004 by Nicolas Pitre <nico@fluxnic.net>
smc91x smc91x.0 eth0: SMC91C11xFD (rev 1) at c8a5a000 IRQ 57
 [nowait]
smc91x smc91x.0 eth0: Ethernet addr: 52:54:00:12:34:56
mousedev: PS/2 mouse device common for all mice
ledtrig-cpu: registered to indicate activity on CPUs
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (1939 buckets, 7756 max)
ip_tables: (C) 2000-2006 Netfilter Core Team
NET: Registered protocol family 17
bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
Bridge firewalling registered
input: AT Raw Set 2 keyboard as /devices/fpga:06/serio0/input/input0
input: ImExPS/2 Generic Explorer Mouse as /devices/fpga:07/serio1/input/input2
VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
Please append a correct "root=" boot option; here are the available partitions:
0100            4096 ram0  (driver?)
0101            4096 ram1  (driver?)
0102            4096 ram2  (driver?)
0103            4096 ram3  (driver?)
0104            4096 ram4  (driver?)
0105            4096 ram5  (driver?)
0106            4096 ram6  (driver?)
0107            4096 ram7  (driver?)
0108            4096 ram8  (driver?)
0109            4096 ram9  (driver?)
010a            4096 ram10  (driver?)
010b            4096 ram11  (driver?)
010c            4096 ram12  (driver?)
010d            4096 ram13  (driver?)
010e            4096 ram14  (driver?)
010f            4096 ram15  (driver?)
0b00         1048575 sr0  driver: sr

qemu-system-i386

Script started on Wed 05 Aug 2015 04:59:52 PM CST
descent@debian32:~/linux-3.10.27/arch/x86/boot$ qemu-system-i386 -nographic -kernel bzImage -append "console=ttyS0"
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.27 (descent@debian32) (gcc version 4.8.5 (Debian 4.8.5-1) ) #2 SMP Wed Aug 5 16:41:20 CST 2015
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000007fdffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000007fe0000-0x0000000007ffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] Notice: NX (Execute Disable) protection missing in CPU!
[    0.000000] SMBIOS 2.8 present.
[    0.000000] e820: last_pfn = 0x7fe0 max_arch_pfn = 0x100000
[    0.000000] found SMP MP-table at [mem 0x000f6640-0x000f664f] mapped at [c00f6640]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000] init_memory_mapping: [mem 0x07800000-0x07bfffff]
[    0.000000] init_memory_mapping: [mem 0x00100000-0x077fffff]
[    0.000000] init_memory_mapping: [mem 0x07c00000-0x07fdffff]
[    0.000000] ACPI: RSDP 000f6470 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 07fe16a9 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 07fe0bda 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 07fe0040 00B9A (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACS 07fe0000 00040
[    0.000000] ACPI: SSDT 07fe0c4e 009AB (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 07fe15f9 00078 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 07fe1671 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] 0MB HIGHMEM available.
[    0.000000] 127MB LOWMEM available.
[    0.000000]   mapped low ram: 0 - 07fe0000
[    0.000000]   low ram: 0 - 07fe0000
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   Normal   [mem 0x01000000-0x07fdffff]
[    0.000000]   HighMem  empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009efff]
[    0.000000]   node   0: [mem 0x00100000-0x07fdffff]
[    0.000000] Using APIC driver default
[    0.000000] ACPI: PM-Timer IO Port: 0x608
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[    0.000000] e820: [mem 0x08000000-0xfffbffff] available for PCI devices
[    0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 13 pages/cpu @c7ed0000 s32064 r0 d21184 u53248
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32382
[    0.000000] Kernel command line: console=ttyS0
[    0.000000] PID hash table entries: 512 (order: -1, 2048 bytes)
[    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] Initializing HighMem for node 0 (00000000:00000000)
[    0.000000] Memory: 118524k/130944k available (5775k kernel code, 12028k reserved, 2092k data, 2124k init, 0k highmem)
[    0.000000] virtual kernel memory layout:
[    0.000000]     fixmap  : 0xfff15000 - 0xfffff000   ( 936 kB)
[    0.000000]     pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
[    0.000000]     vmalloc : 0xc87e0000 - 0xff7fe000   ( 880 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xc7fe0000   ( 127 MB)
[    0.000000]       .init : 0xc17b0000 - 0xc19c3000   (2124 kB)
[    0.000000]       .data : 0xc15a3f7b - 0xc17af0c0   (2092 kB)
[    0.000000]       .text : 0xc1000000 - 0xc15a3f7b   (5775 kB)
[    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[    0.000000] NR_IRQS:2304 nr_irqs:256 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [ttyS0] enabled
[    0.000000] tsc: Fast TSC calibration failed
[    0.000000] tsc: Unable to calibrate against PIT
[    0.000000] tsc: using HPET reference calibration
[    0.000000] tsc: Detected 3594.682 MHz processor
[    0.009236] Calibrating delay loop (skipped), value calculated using timer frequency.. 7189.36 BogoMIPS (lpj=3594682)
[    0.011114] pid_max: default: 32768 minimum: 301
[    0.013258] Security Framework initialized
[    0.015457] SELinux:  Initializing.
[    0.017211] Mount-cache hash table entries: 512
[    0.024728] Initializing cgroup subsys freezer
[    0.027285] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.027285] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.027285] tlb_flushall_shift: 6
[    0.067329] Freeing SMP alternatives: 24k freed
[    0.068019] ACPI: Core revision 20130328
[    0.082594] ACPI: All ACPI Tables successfully acquired
[    0.092004] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[    0.095000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.107000] smpboot: CPU0: Intel QEMU Virtual CPU version 2.3.0 (fam: 06, model: 06, stepping: 03)
[    0.110000] APIC calibration not consistent with PM-Timer: 107ms instead of 100ms
[    0.110000] APIC delta adjusted to PM-Timer: 6249946 (6741014)
[    0.110658] Performance Events: Broken PMU hardware detected, using software events only.
[    0.112041] Failed to access perfctr msr (MSR c1 is 0)
[    0.121634] Brought up 1 CPUs
[    0.122075] smpboot: Total of 1 processors activated (7189.36 BogoMIPS)
[    0.136000] RTC time:  8:59:58, date: 08/05/15
[    0.138000] NET: Registered protocol family 16
[    0.140000] kworker/u2:0 (13) used greatest stack depth: 7312 bytes left
[    0.145499] kworker/u2:0 (17) used greatest stack depth: 7188 bytes left
[    0.149506] ACPI: bus type PCI registered
[    0.152247] PCI: PCI BIOS revision 2.10 entry at 0xfd40f, last bus=0
[    0.153054] PCI: Using configuration type 1 for base access
[    0.158222] kworker/u2:0 (28) used greatest stack depth: 7176 bytes left
[    0.224664] bio: create slab <bio-0> at 0
[    0.228120] ACPI: Added _OSI(Module Device)
[    0.229000] ACPI: Added _OSI(Processor Device)
[    0.229020] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.230053] ACPI: Added _OSI(Processor Aggregator Device)
[    0.248036] ACPI: Interpreter enabled
[    0.249202] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20130328/hwxface-568)
[    0.251100] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20130328/hwxface-568)
[    0.253401] ACPI: (supports S0 S3 S4 S5)
[    0.254000] ACPI: Using IOAPIC for interrupt routing
[    0.256325] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.260855] ACPI: No dock devices found.
[    0.293274] kworker/u2:0 (303) used greatest stack depth: 7164 bytes left
[    0.307297] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.309451] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.312217] PCI host bridge to bus 0000:00
[    0.313176] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.314220] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.315057] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.316110] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.317059] pci_bus 0000:00: root bus resource [mem 0x08000000-0xfebfffff]
[    0.325852] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    0.326111] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4 SMB
[    0.343104] acpi PNP0A03:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.344070] acpi PNP0A03:00: Unable to request _OSC control (_OSC support mask: 0x08)
[    0.350525] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.351557] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.353559] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.355146] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.357429] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    0.360731] ACPI: Enabled 16 GPEs in block 00 to 0F
[    0.364861] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.365073] vgaarb: loaded
[    0.366034] vgaarb: bridge control possible 0000:00:02.0
[    0.369180] SCSI subsystem initialized
[    0.370104] ACPI: bus type ATA registered
[    0.374000] pps_core: LinuxPPS API ver. 1 registered
[    0.374042] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.376055] PTP clock support registered
[    0.377651] PCI: Using ACPI for IRQ routing
[    0.387092] cfg80211: Calling CRDA to update world regulatory domain
[    0.390840] NetLabel: Initializing
[    0.391045] NetLabel:  domain hash size = 128
[    0.392028] NetLabel:  protocols = UNLABELED CIPSOv4
[    0.394131] NetLabel:  unlabeled traffic allowed by default
[    0.396357] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.397000] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.399258] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.403000] Switching to clocksource hpet
[    0.459290] pnp: PnP ACPI init
[    0.460062] ACPI: bus type PNP registered
[    0.472349] pnp: PnP ACPI: found 10 devices
[    0.473069] ACPI: bus type PNP unregistered
[    0.476546] kworker/u2:0 (369) used greatest stack depth: 7020 bytes left
[    0.549015] NET: Registered protocol family 2
[    0.553424] TCP established hash table entries: 1024 (order: 1, 8192 bytes)
[    0.555696] TCP bind hash table entries: 1024 (order: 1, 8192 bytes)
[    0.557204] TCP: Hash tables configured (established 1024 bind 1024)
[    0.559307] TCP: reno registered
[    0.561870] UDP hash table entries: 256 (order: 1, 8192 bytes)
[    0.563621] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[    0.566137] NET: Registered protocol family 1
[    0.568983] RPC: Registered named UNIX socket transport module.
[    0.570720] RPC: Registered udp transport module.
[    0.571742] RPC: Registered tcp transport module.
[    0.572788] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.576091] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.576429] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.581731] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.622091] microcode: CPU0 sig=0x663, pf=0x1, revision=0x0
[    0.624728] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    0.626546] Scanning for low memory corruption every 60 seconds
[    0.633912] audit: initializing netlink socket (disabled)
[    0.636554] type=2000 audit(1438765198.636:1): initialized
[    0.708201] HugeTLB registered 4 MB page size, pre-allocated 0 pages
[    0.743534] VFS: Disk quotas dquot_6.5.2
[    0.745062] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    0.752887] kworker/u2:0 (544) used greatest stack depth: 6980 bytes left
[    0.761762] NFS: Registering the id_resolver key type
[    0.763958] Key type id_resolver registered
[    0.764838] Key type id_legacy registered
[    0.766713] msgmni has been set to 231
[    0.777784] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    0.780426] io scheduler noop registered
[    0.781273] io scheduler deadline registered
[    0.783633] io scheduler cfq registered (default)
[    0.786663] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    0.791211] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    0.793137] ACPI: Power Button [PWRF]
[    0.814357] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.839611] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.852105] Non-volatile memory driver v1.3
[    0.853475] Linux agpgart interface v0.103
[    0.857962] [drm] Initialized drm 1.1.0 20060810
[    0.869001] loop: module loaded
[    0.886918] scsi0 : ata_piix
[    0.889793] scsi1 : ata_piix
[    0.892069] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14
[    0.893396] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15
[    0.903225] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.907034] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.908692] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.917325] mousedev: PS/2 mouse device common for all mice
[    0.921552] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    0.928758] device-mapper: ioctl: 4.24.0-ioctl (2013-01-15) initialised: dm-devel@redhat.com
[    0.931408] cpuidle: using governor ladder
[    0.932440] cpuidle: using governor menu
[    0.934504] hidraw: raw HID events driver (C) Jiri Kosina
[    0.941414] Netfilter messages via NETLINK v0.30.
[    0.944016] nf_conntrack version 0.5.0 (1852 buckets, 7408 max)
[    0.952440] ctnetlink v0.93: registering with nfnetlink.
[    0.956965] ip_tables: (C) 2000-2006 Netfilter Core Team
[    0.959743] TCP: cubic registered
[    0.960075] Initializing XFRM netlink socket
[    0.963648] NET: Registered protocol family 10
[    0.968998] ip6_tables: (C) 2000-2006 Netfilter Core Team
[    0.972067] sit: IPv6 over IPv4 tunneling driver
[    0.975865] NET: Registered protocol family 17
[    0.977894] Key type dns_resolver registered
[    0.980079] Using IPI No-Shortcut mode
[    0.987210] registered taskstats version 1
[    0.990722]   Magic number: 11:675:977
[    1.053051] ata2.00: ATAPI: QEMU DVD-ROM, 2.3.0, max UDMA/100
[    1.056210] ata2.00: configured for MWDMA2
[    1.064049] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.3. PQ: 0 ANSI: 5
[    1.072887] sr0: scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    1.074694] cdrom: Uniform CD-ROM driver Revision: 3.20
[    1.079892] sr 1:0:0:0: Attached scsi generic sg0 type 5
[    1.088087] Freeing unused kernel memory: 2124k freed
[    1.128077] Write protecting the kernel text: 5776k
[    1.129247] Write protecting the kernel read-only data: 1696k

  *    *                                   ****       ****
 ***  ***     **        **      *    *    *    *     **
 * *  * *    *  *      *  *     *    *   *      *   **
 * *  * *   *    *    *    *    *    *   *      *    ***
 *  **  *   ******    *    *    *    *   *      *      **
 *      *   *         *    *    *    *   *      *       **
 *      *    *        *    *     *  **    *    *       **
 *      *     ***     *    *      **  *    ****     ****

MenuOS>>[    1.536217] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[    1.614615] tsc: Refined TSC clocksource calibration: 3594.685 MHz
[    1.617405] Switching to clocksource tsc

MenuOS>>help
help - Menu List
    * help - Menu List
    * version - MenuOS V1.0(Based on Linux 3.18.6)
    * quit - Quit from MenuOS
    * time - Show System Time
    * time-asm - Show System Time(asm)
MenuOS>>quit
quit - Quit from MenuOS
MenuOS>>qemu: terminating on signal 15 from pid 5972
descent@debian32:~/linux-3.10.27/arch/x86/boot$ exit

Script done on Wed 05 Aug 2015 05:00:09 PM CST

The rootfs is the one provided by 《Linux 内核分析》 (Linux Kernel Analysis). You can get it here:
git clone https://github.com/mengning/menu.git

I made a small change so that it also runs on an ARM kernel.

Compile it with the following command:

gcc -o init linktable.c menu.c test.c -m32 -static -lpthread
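The -m32 flag only makes sense for an x86 gcc. For the ARM kernel, the same sources have to go through a cross compiler instead. This sketch assumes the arm-linux-gnueabihf- toolchain prefix used by the kernel build commands in this post. It also assumes the modified sources build unchanged for ARM. init_build_cmd is a hypothetical helper, not part of the menu repository. It prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: print the command that builds the MenuOS init for a given target.
# -static matters: this minimal rootfs holds only init and /dev, with no
# libc or dynamic loader to resolve shared libraries at run time.

init_build_cmd() {
    srcs="linktable.c menu.c test.c"
    case "$1" in
        i386) echo "gcc -o init $srcs -m32 -static -lpthread" ;;
        arm)  echo "arm-linux-gnueabihf-gcc -o init $srcs -static -lpthread" ;;
        *)    echo "unsupported target: $1" >&2; return 1 ;;
    esac
}
```

Usage: eval "$(init_build_cmd arm)" && file init. The file output should then report a statically linked ARM executable.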

Build-related commands:
make i386_defconfig

make ARCH=arm menuconfig

make ARCH=arm versatile_defconfig
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig

make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=mnt/ext4 modules
sudo make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=mnt/ext4 modules_install
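Each make call above repeats ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-. If one step forgets it, kbuild falls back to the host architecture. The kernel Makefile also reads both variables from the environment, so one option is to export them once. sudo normally resets the environment, so the modules_install step still passes both variables explicitly. This is a dry-run sketch. The DRY_RUN switch and the function names are my own, and mnt/ext4 is the module install path from the commands above.

```shell
#!/bin/sh
# Sketch: build an ARM kernel and modules with ARCH/CROSS_COMPILE exported
# once, so no single make call silently targets the host architecture.
# Set DRY_RUN=1 to print the commands instead of running them.

export ARCH=arm
export CROSS_COMPILE=arm-linux-gnueabihf-

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_kernel() {
    defconfig=$1   # e.g. versatile_defconfig or bcmrpi_defconfig
    modpath=$2     # e.g. mnt/ext4 (the mounted rootfs partition)
    run make "$defconfig"
    run make
    run make INSTALL_MOD_PATH="$modpath" modules
    # sudo drops most environment variables by default, so pass them explicitly
    run sudo make ARCH="$ARCH" CROSS_COMPILE="$CROSS_COMPILE" INSTALL_MOD_PATH="$modpath" modules_install
}
```

Usage: DRY_RUN=1 build_kernel bcmrpi_defconfig mnt/ext4 lists the four steps. Drop DRY_RUN to run them.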

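An emulator really is just an emulator: on a real Raspberry Pi 2 the screen did not come up properly. I added quite a few entries to my /dev, and I don't know which of them are actually required. On the real machine, init has to go in /sbin/init.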
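I can't say which of those entries were essential here. /dev/console, however, is the node the kernel itself opens as init's stdin, stdout and stderr. When it is missing, the kernel prints "Warning: unable to open an initial console." and init runs without a terminal. Below is a sketch of a small /dev using the standard Linux device numbers and typical permission modes. It assumes the SD card's root partition is mounted at the path you pass in, and make_min_dev is a hypothetical helper. tty1 is typically the console shown on the Pi's HDMI framebuffer, and ttyAMA0 is its PL011 serial port. Creating device nodes needs root, so DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch: create a minimal /dev inside a rootfs directory (e.g. the SD card).
# Here-doc columns: name, kind (c = char, b = block), major, minor, mode.
# mknod needs root; with DRY_RUN=1 the commands are only printed.

make_min_dev() {
    root=$1
    if [ "${DRY_RUN:-0}" = 1 ]; then pfx=echo; else pfx=; fi
    $pfx mkdir -p "$root/dev"
    while read -r name kind major minor mode; do
        $pfx mknod -m "$mode" "$root/dev/$name" "$kind" "$major" "$minor"
    done <<'EOF'
console c 5 1 600
null c 1 3 666
tty1 c 4 1 620
ttyAMA0 c 204 64 660
EOF
}

# Usage (as root): make_min_dev /media/2
```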

rpi2-rfs

root@NB-debian:/media/2# ls /dev/ sbin/
/dev/:
autofs           mapper              sdb1      tty22  tty5     vboxdrvu
block            mcelog              sdb2      tty23  tty50    vboxnetctl
bsg              media0              sdb3      tty24  tty51    vboxusb
btrfs-control    mei                 serial    tty25  tty52    vcs
bus              mem                 sg0       tty26  tty53    vcs1
cdrom            mmcblk0             sg1       tty27  tty54    vcs2
cdrw             mmcblk0p1           sg2       tty28  tty55    vcs3
char             mmcblk0p2           shm       tty29  tty56    vcs4
console          mqueue              snapshot  tty3   tty57    vcs5
core             net                 snd       tty30  tty58    vcs6
cpu              network_latency     sr0       tty31  tty59    vcs7
cpu_dma_latency  network_throughput  stderr    tty32  tty6     vcsa
cuse             null                stdin     tty33  tty60    vcsa1
disk             port                stdout    tty34  tty61    vcsa2
dri              ppp                 tty       tty35  tty62    vcsa3
dvd              psaux               tty0      tty36  tty63    vcsa4
dvdrw            ptmx                tty1      tty37  tty7     vcsa5
fb0              pts                 tty10     tty38  tty8     vcsa6
fd               random              tty11     tty39  tty9     vcsa7
full             rfkill              tty12     tty4   ttyS0    vfio
fuse             rtc                 tty13     tty40  ttyS1    vga_arbiter
hidraw0          rtc0                tty14     tty41  ttyS2    vhci
hpet             sda                 tty15     tty42  ttyS3    vhost-net
hugepages        sda1                tty16     tty43  ttyUSB0  video0
initctl          sda2                tty17     tty44  uhid     watchdog
input            sda3                tty18     tty45  uinput   watchdog0
kmsg             sda4                tty19     tty46  urandom  xconsole
kvm              sda5                tty2      tty47  usb      zero
log              sda6                tty20     tty48  v4l
loop-control     sdb                 tty21     tty49  vboxdrv

sbin/:
init

Here is the version running on the real machine:

(Photo from the album "20150614 raspberry pi 2")

I finally got it working. I swapped the SD card in and out so many times that my hands nearly cramped.

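If it still doesn't work on an ARM platform, you may also need /dev/ram0 and the matching kernel parameter. The parameter may be optional, but /dev/ram0 is definitely required: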
root=/dev/ram0

For example:
bootargs root=/dev/ram0 rw console=ttyS0,115200 mem=768M@0x00000000
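If the rootfs is built into the kernel image, device nodes such as /dev/ram0 can be added without root privileges. One way to do this is to point CONFIG_INITRAMFS_SOURCE at a file list in the format read by the kernel's usr/gen_init_cpio. The sketch below assumes a statically linked init whose absolute path you pass in. write_initramfs_list is a hypothetical helper. ram0 is a block device with major 1 and minor 0, which matches the "0100 ... ram0" line in the ARM boot log in this post. The /init symlink makes the same binary reachable from both init locations.

```shell
#!/bin/sh
# Sketch: write a gen_init_cpio file list for a tiny rootfs.
# Line formats: dir <name> <mode> <uid> <gid>
#               nod <name> <mode> <uid> <gid> <b|c> <major> <minor>
#               file <name> <source> <mode> <uid> <gid>
#               slink <name> <target> <mode> <uid> <gid>

write_initramfs_list() {
    init_bin=$1    # absolute path to the static init binary
    out=$2         # list file to write
    cat > "$out" <<EOF
dir /dev 755 0 0
nod /dev/console 600 0 0 c 5 1
nod /dev/ram0 600 0 0 b 1 0
dir /sbin 755 0 0
file /sbin/init $init_bin 755 0 0
slink /init /sbin/init 777 0 0
EOF
}
```

Usage: write_initramfs_list "$PWD/init" initramfs.list. Then set CONFIG_INITRAMFS_SOURCE to the absolute path of initramfs.list and rebuild the kernel.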

