author    Yuqian Yang <crupest@crupest.life>  2026-01-23 23:16:45 +0800
committer Yuqian Yang <crupest@crupest.life>  2026-01-23 23:16:45 +0800
commit    78e3e234877cb10ca1088df31e831b36fa4a12c0 (patch)
tree      a4b86275895b33d47df4686e5ce8f98b57016f90 /www-2/content/posts
parent    3af5ef00b38c6962c6e3f63add0312fa6537b74b (diff)
HALF WORK!
Diffstat (limited to 'www-2/content/posts')
-rw-r--r--  www-2/content/posts/c-func-ext.md     101
-rw-r--r--  www-2/content/posts/nspawn.md         207
-rw-r--r--  www-2/content/posts/use-paddleocr.md  103
3 files changed, 411 insertions, 0 deletions
diff --git a/www-2/content/posts/c-func-ext.md b/www-2/content/posts/c-func-ext.md
new file mode 100644
index 0000000..1f5f822
--- /dev/null
+++ b/www-2/content/posts/c-func-ext.md
@@ -0,0 +1,101 @@
+---
+title: "Libc/POSIX Function \"Extensions\""
+date: 2025-03-04T13:40:33+08:00
+lastmod: 2025-03-04T13:40:33+08:00
+categories: coding
+tags:
+ - c
+ - posix
+---
+
+(I've given up on this, at least for Linux PAM.)
+
+Recently, I’ve been working on porting some libraries to GNU/Hurd. Many (old)
+libraries use [`*_MAX` constants on POSIX system
+interfaces](https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/nframe.html)
+to calculate buffer sizes. However, the GNU/Hurd maintainers advise against
+using them blindly and refuse to define them in system headers. When the old
+constants are gone, compatibility problems follow. To make my life easier, I'll
+collect some reusable code snippets here that help *fix `*_MAX` bugs*.
+
+<!--more-->
+
+```c
+#include <stdlib.h>
+#include <stdarg.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <errno.h>
+
+static inline char *xreadlink(const char *restrict path) {
+  char *buffer;
+  size_t allocated = 128;
+  ssize_t len;
+
+  while (1) {
+    buffer = (char *) malloc(allocated);
+    if (!buffer) { return NULL; }
+    len = readlink(path, buffer, allocated);
+    if (len < 0) { free(buffer); return NULL; }
+    /* readlink does not null-terminate; len == allocated means possible
+       truncation, so grow the buffer and retry. */
+    if ((size_t) len < allocated) { buffer[len] = '\0'; return buffer; }
+    free(buffer);
+    allocated *= 2;
+  }
+}
+
+static inline char *xgethostname(void) {
+  long max_host_name;
+  char *buffer;
+
+  max_host_name = sysconf(_SC_HOST_NAME_MAX);
+  if (max_host_name < 0) { max_host_name = 255; /* _POSIX_HOST_NAME_MAX */ }
+
+  buffer = malloc(max_host_name + 1);
+  if (!buffer) { return NULL; }
+
+  if (gethostname(buffer, max_host_name + 1)) {
+    free(buffer);
+    return NULL;
+  }
+
+  buffer[max_host_name] = '\0';
+  return buffer;
+}
+
+static inline char *xgetcwd(void) {
+  char *buffer;
+  size_t allocated = 128;
+
+  while (1) {
+    buffer = (char *) malloc(allocated);
+    if (!buffer) { return NULL; }
+    if (getcwd(buffer, allocated)) { return buffer; }
+    free(buffer);
+    if (errno == ERANGE) { allocated *= 2; continue; }
+    return NULL;
+  }
+}
+
+static inline __attribute__((__format__(__printf__, 2, 3))) int
+xsprintf(char **buf_ptr, const char *restrict format, ...) {
+  char *buffer;
+  int ret;
+
+  va_list args, args_copy;
+  va_start(args, format);
+  va_copy(args_copy, args);
+
+  /* First pass: measure. A va_list can't be reused, hence the copy. */
+  ret = vsnprintf(NULL, 0, format, args);
+  if (ret < 0) { goto out; }
+
+  buffer = malloc((size_t) ret + 1);
+  if (!buffer) { ret = -1; goto out; }
+
+  /* Second pass: actually format into the buffer. */
+  ret = vsnprintf(buffer, (size_t) ret + 1, format, args_copy);
+  if (ret < 0) { free(buffer); goto out; }
+
+  *buf_ptr = buffer;
+
+out:
+  va_end(args_copy);
+  va_end(args);
+  return ret;
+}
+```
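To check that the doubling loop actually behaves, the `xgetcwd` pattern can be restated as a self-contained snippet. The name `get_cwd_dyn` and the deliberately tiny initial size are mine, chosen so the retry path is exercised; this is a sketch of the same idea, not part of the snippets above.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Same growing-buffer pattern as xgetcwd: start small, double the
 * buffer on ERANGE, give up on any other error. */
static char *get_cwd_dyn(void) {
  size_t allocated = 2; /* deliberately tiny to force a few retries */

  while (1) {
    char *buffer = malloc(allocated);
    if (!buffer) { return NULL; }
    if (getcwd(buffer, allocated)) { return buffer; }
    free(buffer);
    if (errno != ERANGE) { return NULL; }
    allocated *= 2;
  }
}
```

Starting at 128 bytes, as above, is just a tuning choice: the loop is correct for any positive initial size.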
diff --git a/www-2/content/posts/nspawn.md b/www-2/content/posts/nspawn.md
new file mode 100644
index 0000000..866cf96
--- /dev/null
+++ b/www-2/content/posts/nspawn.md
@@ -0,0 +1,207 @@
+---
+title: "Use systemd-nspawn to Create a Development Sandbox"
+date: 2025-03-04T23:22:23+08:00
+lastmod: 2025-03-27T17:46:24+08:00
+---
+
+*systemd-nspawn* is a great tool for creating development sandboxes. Compared to
+other similar technologies, it's lightweight, flexible, and easy to use. In this
+blog, I'll present a simple guide to using it.
+
+<!--more-->
+
+## Advantages
+
+I've been using traditional VMs and Docker to create development environments.
+While both work fine, performance aside, they suffer from being overly
+isolated. Two big headaches for me are host network sharing in traditional VMs
+and the immutability of Docker container ports and mounts.
+
+*systemd-nspawn* is much more flexible. Every feature can be configured
+granularly and dynamically. For example, filesystem sharing can be configured to
+work like bind mounts, and network isolation can be disabled entirely, which
+exactly solves the two headaches mentioned above. Additionally, being part of
+*systemd*, it has the same excellent design as other *systemd* components.
+
+Debian has a similarly powerful tool called *schroot*, which is its official
+tool for automated package building. Unfortunately, it seems to be specific to
+Debian.
+
+## Usage
+
+*systemd-nspawn* consists of two parts that work together to achieve its VM
+functionality:
+
+1. The program `systemd-nspawn`, which runs other programs in an isolated
+ environment with user-specified settings. Each running VM is essentially a
+ group of processes launched via `systemd-nspawn`.
+2. Components for defining and managing VMs, possibly utilizing
+ `systemd-nspawn`.
+
+*systemd-nspawn* has a user interface similar to *systemd* services:
+
+- `[name].service` => `[name].nspawn`: Define VMs.
+ - Should be placed in `/etc/systemd/nspawn/`, where `machinectl` scans for VM
+ definitions.
+ - `[name]` serves as the VM's name. Use it to specify the VM when calling
+ `machinectl`. Note: You'd better use a valid hostname (avoid illegal
+ characters like `.`) to prevent weird errors.
+ - The options available roughly mirror `systemd-nspawn`'s CLI arguments, with
+ some adjustments to better fit VM semantics.
+ - Isolation-related options are usually prefixed with `Private` (e.g.,
+ `PrivateUsers=`).
+- `systemctl` => `machinectl`: Manage VMs.
+ - `enable`/`disable`: Set whether the VM starts automatically at system boot.
+ - `start`/`poweroff`/`reboot`/`terminate`/`kill`: Control the VM's running
+ state.
+ - `login`/`shell`: Do things inside the VM.
+
+I'll demonstrate how to create a Debian-based VM on Arch Linux as an example.
+You should adjust the commands based on your own situation.
+
+### Create Root Filesystem
+
+The root filesystem of a distribution can be created with a special tool from
+its package manager. For Debian-based distributions, that's `debootstrap`. If
+your host OS comes from a different package-manager ecosystem, the target
+distribution's tool and its keyrings (which might live in a separate package)
+have to be installed first.
+
+```bash-session
+# pacman -S debootstrap debian-archive-keyring ubuntu-keyring
+```
+
+Regular directories work perfectly as root filesystems, but other
+directory-like things, such as a `btrfs` subvolume, should work too.
+
+```bash-session
+# btrfs subvolume create /var/lib/machines/[name]
+```
+
+Now, run `debootstrap` to create a minimal filesystem. Update the command with
+the target distribution's codename and a mirror of your choice.
+
+```bash-session
+# debootstrap --include=dbus,libpam-systemd,libnss-systemd [codename] \
+ /var/lib/machines/[name] [mirror]
+```
+
+At this point, the filesystem contains only the distribution's essential
+packages, much like a base Docker image (e.g., `debian`), so you can customize
+it in a similar way.
+
+### Configure and Customize
+
+I'll present my personal configuration here as a reference. You can create a new
+one based on it or from scratch.
+
+1. Disable user isolation: `[Exec] PrivateUsers=no`
+2. Disable network isolation: `[Network] Private=no`
+3. Create a user with the same username, group name, UID and GIDs: should be
+ done inside the VM.
+4. Only bind a subdirectory under *home*: `[Files] Bind=/home/[user]/[subdir]`
+5. Set the hostname: `[Exec] Hostname=[hostname]`
+
+I disable user isolation because it's implemented using the kernel's user
+namespace, which adds many inconveniences due to UID/GID mapping.
+
+So, the final `.nspawn` file looks like this:
+
+```systemd
+/etc/systemd/nspawn/[name].nspawn
+---
+[Exec]
+PrivateUsers=no
+Hostname=[hostname]
+
+[Files]
+Bind=/home/[user]/[subdir]
+
+[Network]
+Private=no
+```
+
+If `machinectl` can already start the VM, you can log in to customize it
+further. Otherwise, you can use `systemd-nspawn` directly to enter the VM and
+run commands inside it. `--resolv-conf=bind-host` binds the host's
+`/etc/resolv.conf` file to make the network work.
+
+```bash-session
+# systemd-nspawn --resolv-conf=bind-host -D /var/lib/machines/[name]
+```
+
+Now, inside the VM, you can do whatever you like. My configuration requires
+creating the matching user manually.
+
+```bash-session
+# apt install locales lsb-release sudo \
+ nano vim less man bash-completion curl wget \
+ build-essential git
+# dpkg-reconfigure locales
+
+# useradd -m -G sudo -s /usr/bin/bash [user]
+# passwd [user]
+```
+
+Some setup steps may need to be done manually, especially those usually handled
+by the distribution's installer.
+
+1. Update `/etc/hostname` with the VM's real hostname.
+2. Update `/etc/hosts`.
+
+```plain
+/etc/hosts
+---
+127.0.0.1 localhost [hostname]
+::1 localhost ip6-localhost ip6-loopback
+ff02::1 ip6-allnodes
+ff02::2 ip6-allrouters
+```
+
+**Ubuntu 20.04 specific:** Due to [a bug in
+systemd](https://github.com/systemd/systemd/issues/22234), the backport source
+has to be added.
+
+```plain
+/etc/apt/sources.list
+---
+deb https://mirrors.ustc.edu.cn/ubuntu focal main restricted universe multiverse
+deb https://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
+deb https://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
+deb https://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted universe multiverse
+```
+
+### Use
+
+The following command starts a new shell session for the specified user inside
+the VM, where you can run commands and perform tasks.
+
+```bash-session
+# machinectl shell [user]@[name]
+```
+
+Another way is to use the `login` command to enter the *login console*. From
+there, you can log in as a user to start a shell session.
+
+```bash-session
+# machinectl login [name]
+```
+
+To exit a VM session (especially in the *login console*), press `CTRL+]` three
+times quickly in a row.
+
+### Snapshot
+
+The easiest way to back up or snapshot a VM is to create an archive of the VM's
+filesystem. You can use any archive tool you prefer, such as plain `tar`.
+If the VM's filesystem is a `btrfs` subvolume, native `btrfs` snapshots can be
+used here. Before creating a snapshot, you should power off the VM to avoid
+archiving runtime files.
+
+```bash-session
+# machinectl poweroff [name]
+# btrfs subvolume snapshot /var/lib/machines/[name] \
+ /var/lib/machines/btrfs-snapshots/[name]/[snapshot-name]
+```
+
+`machinectl` also provides an *image* feature similar to Docker, though I've
+never tried it. Feel free to explore it if you're interested!
diff --git a/www-2/content/posts/use-paddleocr.md b/www-2/content/posts/use-paddleocr.md
new file mode 100644
index 0000000..806df41
--- /dev/null
+++ b/www-2/content/posts/use-paddleocr.md
@@ -0,0 +1,103 @@
+---
+title: "Use PaddleOCR"
+date: 2022-11-30T13:25:36+08:00
+description: Simple steps to use PaddleOCR.
+categories: coding
+tags:
+ - AI
+ - python
+ - OCR
+---
+
+I guess [_OCR_](https://en.wikipedia.org/wiki/Optical_character_recognition) is not something new for us. While there are a lot of open-source artificial intelligence engines that achieve it, I needed an easy-to-use one.
+
+Recently I got a task to convert images into text. The number of images was fairly large, so OCRing them one by one manually was impossible. Instead, I wrote a Python script to handle this tedious task.
+
+<!--more-->
+
+## Basic Processing
+
+The original images contain an identical, useless frame around the part I need, so cropping is required: it improves performance (the image is smaller) and removes the unrelated text in the frame.
+
+Cropping is an easy problem. Just install the [`Pillow`](https://pillow.readthedocs.io/en/stable/) package with `pip`:
+
+```shell
+pip install Pillow
+```
+
+Then use `Pillow` to do the cropping:
+
+```python
+from os.path import join
+
+from PIL import Image
+
+dir_path = "."  # the directory containing the images
+image_file_list = ["image1.png", "image2.png", ...]
+crop_file_list = [f"crop-{image_file}" for image_file in image_file_list]
+
+# left, top, width, height
+geometry = (100, 200, 300, 400)
+print("Target geometry:", geometry)
+# convert to (left, top, right, bottom)
+geometry_ltrb = (geometry[0], geometry[1], geometry[0] +
+                 geometry[2], geometry[1] + geometry[3])
+
+# crop each image and save it under the crop- name
+for index, image_file in enumerate(image_file_list):
+    print(f"[{index + 1}/{len(image_file_list)}] Cropping '{image_file}' ...")
+    with Image.open(join(dir_path, image_file)) as image:
+        image.crop(geometry_ltrb).save(join(dir_path, crop_file_list[index]))
+```
+
+Now we have the cropped images, with the original filenames prefixed by `crop-`.
+
+## Install PaddlePaddle
+
+It's not easy to install [`PaddlePaddle`](https://github.com/PaddlePaddle/Paddle) with `pip` because it may need some native compilation. `Anaconda` is also complex to install and leaves a lot of garbage files behind. The cleanest way is to use [`Docker`](https://www.docker.com), together with the [`vscode` remote container extensions](https://code.visualstudio.com/docs/devcontainers/containers).
+
+Of course, you need to install Docker first, which is out of this blog's scope.
+
+Then run the following command to create and start a container from the `PaddlePaddle` image:
+
+```shell
+docker run -it --name ppocr -v "$PWD:/data" --network=host registry.baidubce.com/paddlepaddle/paddle:2.4.0-cpu /bin/bash
+```
+
+Some things to note:
+
+1. You can change the mounted volume to the directory you want to process.
+
+2. This image is pulled from the registry of [`Baidu`](https://baidu.com) (the company that created _PaddlePaddle_), which is fast in China. You can also pull it from `DockerHub`.
+
+3. This image's _PaddlePaddle_ build is CPU-only. If you have a GPU, or even [_CUDA_](https://developer.nvidia.com/cuda-downloads), you can pick another image with the corresponding tag. But the CPU image almost always works, and GPU setups are harder to configure.
+
+4. I don't know why `--network=host` is needed, since the container does not publish any ports. Perhaps it makes Internet access faster, or VSCode Remote needs it.
+
+## Install PaddleOCR
+
+The image above only contains _PaddlePaddle_. [_PaddleOCR_](https://github.com/PaddlePaddle/PaddleOCR) is a separate package built on top of it that needs to be installed individually. This time, though, we can just use `pip` again.
+
+```shell
+pip install paddleocr
+```
+
+## Coding
+
+The next step is to write the Python code, which is also the easiest part!
+You can connect to the container you just created with vscode, and then happy coding!
+
+```python
+from paddleocr import PaddleOCR
+
+ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # change the language to what you need
+image_text_list = []
+for index, crop_image_file in enumerate(crop_file_list):
+    print(f"[{index + 1}/{len(crop_file_list)}] OCRing '{crop_image_file}' ...")
+    result = ocr.ocr(crop_image_file, cls=True)
+    # The official docs are a bit inconsistent here: the result is a list with
+    # a single element per call.
+    result = result[0]
+    line_text_list = [line[1][0] for line in result]  # a list of text strings
+    image_text = "\n".join(line_text_list)
+    image_text_list.append(image_text)
+```
+
+Now you can do whatever else you need with the `image_text_list`.
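As one example of post-processing, here is a small sketch that saves each image's recognized text into a `.txt` file next to its crop. The `crop_file_list` and `image_text_list` values below are hypothetical stand-ins for the OCR results produced above:

```python
from pathlib import Path

# Hypothetical stand-ins for the OCR results produced by the loop above.
crop_file_list = ["crop-image1.png", "crop-image2.png"]
image_text_list = ["first image\ntext", "second image text"]

# Write each image's text next to its cropped file, e.g. crop-image1.txt.
for crop_image_file, image_text in zip(crop_file_list, image_text_list):
    out_path = Path(crop_image_file).with_suffix(".txt")
    out_path.write_text(image_text, encoding="utf-8")
```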
+
+## Finally
+
+Now just run the script. Or even better, customize it.
+
+By the way, `PaddleOCR` is far more accurate than [`tesseract`](https://tesseract-ocr.github.io) for __Chinese__. Maybe that's because it was created by _Baidu_, a local Chinese company, or maybe I missed some _tesseract_ configuration. I haven't tested English.