anduin revised this gist
1 file changed, 2 insertions, 2 deletions
plan.md
@@ -449,9 +449,9 @@ If bcache-super-show says that that the backing dev.data.cache_state state is cl | |||
449 | 449 | ||
450 | 450 | However, if clean, you could try force-starting the backing device without the cache device: | |
451 | 451 | ||
452 | - | ```b | |
452 | + | ```bash | |
453 | 453 | echo 1 | sudo tee /sys/class/block/$dev/bcache/running | |
454 | - | ||
454 | + | ``` | |
455 | 455 | ||
456 | 456 | ||
457 | 457 |
anduin revised this gist
1 file changed, 33 insertions, 1 deletion
plan.md
@@ -1,4 +1,4 @@ | |||
1 | - | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
1 | + | 最近,我的磁盘空间不太够用了。 | |
2 | 2 | ||
3 | 3 | ```bash | |
4 | 4 | anduin@ms-server:~$ cd /swarm-vol/ | |
@@ -559,3 +559,35 @@ However, if clean, you could try force-starting the backing device without cache | |||
559 | 559 | ```bash | |
560 | 560 | echo 1 | sudo tee /sys/class/block/$dev/bcache/running | |
561 | 561 | ``` | |
562 | + | ||
563 | + | ## Eject cache | |
564 | + | ||
565 | + | I used `bcache` only in a writethrough configuration, and IIRC even then `bcache` doesn't like it at all if the cache device vanishes while the machine is running. Expect the `bcache` device to stall completely if that happens. | |
566 | + | ||
567 | + | I haven't tried to remove the cache device while the machine is powered down, so I can't say anything about that. I do think though that `bcache` is still pretty touchy, so I'd recommend that you try that with a VM or a physical test machine first. | |
568 | + | ||
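
A quick way to see what you are dealing with before touching the cache device is to read the relevant sysfs nodes. A minimal sketch, assuming the device is `bcache0` as in the rest of this document:

```bash
# Active cache mode is the bracketed entry, e.g. "writethrough [writeback] ..."
cat /sys/block/bcache0/bcache/cache_mode

# Overall state (clean / dirty / no cache) and outstanding dirty data
cat /sys/block/bcache0/bcache/state
cat /sys/block/bcache0/bcache/dirty_data
```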
569 | + | ||
570 | + | ---------- | |
571 | + | ||
572 | + | ||
573 | + | To safely remove the cache device, you can detach the cache set from the bcache device: | |
574 | + | ||
575 | + | echo <cache-set-uuid> > /sys/block/bcache0/bcache/detach | |
576 | + | ||
577 | + | To determine the necessary cache set UUID, look in `/sys/fs/bcache/`: | |
578 | + | ||
579 | + | host ~ # ll /sys/fs/bcache/ | |
580 | + | total 0 | |
581 | + | drwxr-xr-x 7 root root 0 Feb 19 00:11 eb99feda-fac7-43dc-b89d-18765e9febb6 | |
582 | + | --w------- 1 root root 4096 Feb 19 00:11 register | |
583 | + | --w------- 1 root root 4096 Feb 7 07:17 register_quiet | |
584 | + | ||
585 | + | So for example in this case, run: | |
586 | + | ||
587 | + | echo eb99feda-fac7-43dc-b89d-18765e9febb6 > /sys/block/bcache0/bcache/detach | |
588 | + | ||
589 | + | The `state` file should say `no cache` after that: | |
590 | + | ||
591 | + | host ~ # cat /sys/block/bcache0/bcache/state | |
592 | + | no cache | |
593 | + |
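
Once the backing device reports `no cache`, the detached cache set can, if desired, be shut down and its SSD wiped for reuse. This is only a sketch combining the `stop` node and `wipefs` shown elsewhere in this document; the UUID and device name below are taken from the example above and are assumptions for illustration:

```bash
# Shut down the now-detached cache set (write-only sysfs trigger)
echo 1 | sudo tee /sys/fs/bcache/eb99feda-fac7-43dc-b89d-18765e9febb6/stop

# Once it has disappeared from /sys/fs/bcache/, clear the bcache superblock
# so the former cache SSD can be repurposed (destructive!)
sudo wipefs -a /dev/sdX1   # replace sdX1 with the actual cache partition
```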
anduin revised this gist
1 file changed, 643 deletions
plan.md
@@ -298,649 +298,6 @@ ls -l /sys/block/bcache0/bcache | |||
298 | 298 | ||
299 | 299 | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
300 | 300 | ||
301 | - | 使用下面的命令检查其状态: | |
302 | - | ||
303 | - | ```bash | |
304 | - | Related articles end}} | |
305 | - | ||
306 | - | [https://bcache.evilpiepirate.org/ Bcache] (block cache) allows one to use an SSD as a read/write cache (in writeback mode) or read cache (writethrough or writearound) for another blockdevice (generally a rotating HDD or array). This article will show how to install Arch using Bcache as the root partition. For an intro to bcache itself, see [https://bcache.evilpiepirate.org/ the bcache homepage]. Be sure to read and reference [https://docs.kernel.org/admin-guide/bcache.html the bcache manual]. | |
307 | - | ||
308 | - | {{Tip|An alternative to Bcache is the [[LVM#Cache|LVM cache]].}} | |
309 | - | ||
310 | - | Bcache needs the backing device to be formatted as a bcache block device. In most cases, [https://github.com/g2p/blocks blocks to-bcache] can do an in-place conversion. | |
311 | - | ||
312 | - | {{Out of date|Any source for bcache with btrfs causing corruption in 2024? The linked blog has no extra details }} | |
313 | - | ||
314 | - | {{Warning|1=<nowiki/> | |
315 | - | * Be sure you back up any important data first. | |
316 | - | * Bcache and [[btrfs]] could leave you with a corrupted filesystem. Please visit [https://www.hdevalence.ca/blog/2013-09-21-notes-on-my-archlinux-install this post] for more information. Btrfs wiki reports that it was fixed in kernels 3.19+ [https://btrfs.wiki.kernel.org/index.php/Gotchas#Historical_references]. | |
317 | - | }} | |
318 | - | ||
319 | - | == Setting up bcached btrfs file systems on an existing system == | |
320 | - | ||
321 | - | {{Warning|make-bcache '''will not''' import an existing drive or partition – it will reformat it.}} | |
322 | - | ||
323 | - | === Preparation === | |
324 | - | ||
325 | - | [[Install]] {{AUR|bcache-tools}}. | |
326 | - | ||
327 | - | Use fdisk to create the appropriate partitions on the SSD's and hard drives to hold the cache and the backing data. | |
328 | - | {{Tip| It is possible to create many partitions on a single drive. This allows for testing of elaborate setups before committing. Be aware all data will be lost when the drive fails. This will also kill performance of the drive, due to unfavorable access patterns.}} | |
329 | - | ||
330 | - | === Situation: 1 hard drive and 1 read cache SSD === | |
331 | - | ||
332 | - | {{Warning| | |
333 | - | * When a single hard drive fails, all data is lost. | |
334 | - | * Do not enable write caching, as that can cause data loss when the SSD fails | |
335 | - | }} | |
336 | - | +--------------+ | |
337 | - | | btrfs /mnt | | |
338 | - | +--------------+ | |
339 | - | | /dev/Bcache0 | | |
340 | - | +--------------+ | |
341 | - | | Cache | | |
342 | - | | /dev/sdk1 | | |
343 | - | +--------------+ | |
344 | - | | Data | | |
345 | - | | /dev/sdv1 | | |
346 | - | +--------------+ | |
347 | - | ||
348 | - | 1. Format the backing device (This will typically be your mechanical drive). The backing device can be a whole device, a partition or any other standard block device. This will create /dev/bcache0 | |
349 | - | ||
350 | - | # make-bcache -B /dev/sdv1 | |
351 | - | ||
352 | - | 2. Format the cache device (This will typically be your SSD). The cache device can be a whole device, a partition or any other standard block device | |
353 | - | ||
354 | - | # make-bcache -C /dev/sdk1 | |
355 | - | ||
356 | - | In this example the default block and bucket sizes of 512B and 128kB are used. The block size should match the backing devices sector size which will usually be either 512 or 4k. The bucket size should match the erase block size of the caching device with the intent of reducing write amplification. For example, using a HDD with 4k sectors and an SSD with an erase block size of 2MB this command would look like | |
357 | - | ||
358 | - | # make-bcache --block 4k --bucket 2M -C /dev/sdk1 | |
359 | - | ||
360 | - | {{Note|You may need to omit the {{ic|--block 4k}} option, see [https://unix.stackexchange.com/questions/359508/cannot-attach-cache-device-to-backing-device Cannot attach cache device to backing device].}} | |
361 | - | ||
362 | - | 3. Get the uuid of the cache device | |
363 | - | ||
364 | - | # bcache-super-show /dev/sdk1 | grep cset | |
365 | - | cset.uuid f0e01318-f4fd-4fab-abbb-d76d870503ec | |
366 | - | ||
367 | - | 4. Register the cache device against your backing device. Replace the example uuid with the uuid of your cache. Udev rules will take care of this on reboot and will only need to be done once. | |
368 | - | ||
369 | - | # echo f0e01318-f4fd-4fab-abbb-d76d870503ec > /sys/block/bcache0/bcache/attach | |
370 | - | ||
371 | - | 5. Create the btrfs filesystem. | |
372 | - | ||
373 | - | # mkfs.btrfs /dev/bcache0 | |
374 | - | ||
375 | - | 6. mount the filesystem | |
376 | - | ||
377 | - | # mount /dev/bcache0 /mnt | |
378 | - | ||
379 | - | 7. If you want to have this partition available during the initcpio (i.e. you require it at some point in the boot process) you need to add 'bcache' to your modules array in /etc/mkinitcpio.conf as well as adding the 'bcache' hook in your list between block and filesystems. You must then [[regenerate the initramfs]]. | |
380 | - | ||
381 | - | === Situation: Prevent all write access to a HDD === | |
382 | - | {{Warning| | |
383 | - | * When the hard drive or the SSD fails, all data is lost. | |
384 | - | * Consider using BTRFS RAID to prevent data loss when a SSD / HDD fails. | |
385 | - | }} | |
386 | - | In this situation the goal is to keep the HDD idle as long as possible. This is achieved by absorbing all writes with the SSD. The hard drive is only activated when the SSD is full, or when something is read that's not on the SSD. | |
387 | - | ||
388 | - | Enable the writeback cache mode: | |
389 | - | ||
390 | - | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
391 | - | ||
392 | - | Let bcache completely sync with the hard drive. | |
393 | - | ||
394 | - | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
395 | - | ||
396 | - | Don't let sequential IO bypass the cache: | |
397 | - | ||
398 | - | # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff | |
399 | - | ||
400 | - | Let bcache wait a week after the previous sync is done: | |
401 | - | ||
402 | - | # echo $((7*24*60*60)) > /sys/block/bcache0/bcache/writeback_delay | |
403 | - | ||
404 | - | Don't let bcache go around the cache when there's read / write congestion | |
405 | - | ||
406 | - | # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us | |
407 | - | # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us | |
408 | - | ||
409 | - | Put the HDD to sleep after 20 minutes: | |
410 | - | # hdparm -S 240 /dev/$(cat /sys/block/bcache0/bcache/backing_dev_name) | |
411 | - | /dev/sdh1: | |
412 | - | setting standby to 240 (20 minutes) | |
413 | - | ||
414 | - | ||
415 | - | First use lsblk to get the device names of the HDD and SSD. In this example /dev/sdh1 is the HDD, /dev/sdc1 is the SSD: | |
416 | - | ||
417 | - | # lsblk -M -s | |
418 | - | bcache0 254:0 0 931.5G 0 disk | |
419 | - | ├─sdc1 8:33 0 111.8G 0 part | |
420 | - | │ └─sdc 8:32 0 111.8G 0 disk | |
421 | - | └─sdh1 8:113 0 931.5G 0 part | |
422 | - | └─sdh 8:112 0 931.5G 0 disk | |
423 | - | ||
424 | - | Now Dstat can be used to monitor disk access to the members of the bcache set. | |
425 | - | ||
426 | - | $ dstat -D sdc1,sdh1 | |
427 | - | ||
428 | - | == Advanced operations == | |
429 | - | ||
430 | - | === Resize backing device === | |
431 | - | ||
432 | - | It is possible to resize the backing device so long as you do not move the partition start. This process is described in [https://lore.kernel.org/linux-bcache/CAH+dOxJv-ajvLfbUSo8dqG0a8_grNBhfxJ1EbmSrYZz0YXJM2w@mail.gmail.com/T/ the mailing list]. Here is an example using btrfs volume directly on bcache0. For LVM containers or for other filesystems, procedure will differ. | |
433 | - | ||
434 | - | ==== Example of growing ==== | |
435 | - | ||
436 | - | In this example, I grow the filesystem by 4GB. | |
437 | - | ||
438 | - | 1. Reboot to a live CD/USB Drive (need not be bcache enabled) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G larger. | |
439 | - | ||
440 | - | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
441 | - | ||
442 | - | 2. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum. For btrfs, that is | |
443 | - | ||
444 | - | # btrfs filesystem resize max / | |
445 | - | ||
446 | - | For ext3/4, that is: | |
447 | - | ||
448 | - | # resize2fs /dev/bcache0 | |
449 | - | ||
450 | - | ==== Example of shrinking ==== | |
451 | - | ||
452 | - | In this example, I shrink the filesystem by 4GB. | |
453 | - | ||
454 | - | 1. Disable writeback cache (switch to writethrough cache) and wait for the disk to flush. | |
455 | - | ||
456 | - | # echo writethrough > /sys/block/bcache0/bcache/cache_mode | |
457 | - | $ watch cat /sys/block/bcache0/bcache/state | |
458 | - | ||
459 | - | wait until state reports "clean". This might take a while. | |
460 | - | ||
461 | - | ===== Force flush of cache to backing device ===== | |
462 | - | ||
463 | - | I suggest to use | |
464 | - | ||
465 | - | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
466 | - | ||
467 | - | This will flush the dirty data of the cache to the backing device in less than a minute. | |
468 | - | ||
469 | - | Revert back the value after with | |
470 | - | ||
471 | - | # echo 10 > /sys/block/bcache0/bcache/writeback_percent | |
472 | - | ||
473 | - | 2. Shrink the mounted filesystem by something more than the desired amount, to ensure we do not accidentally clip it later. For btrfs, that is: | |
474 | - | ||
475 | - | # btrfs filesystem resize -5G / | |
476 | - | ||
477 | - | For ext3/4 you can use ''resize2fs'', but only if the partition is unmounted | |
478 | - | ||
479 | - | {{hc|$ df -h /home| | |
480 | - | /dev/bcache0 290G 20G 270G 1% /home | |
481 | - | }} | |
482 | - | ||
483 | - | # umount /home | |
484 | - | # resize2fs /dev/bcache0 283G | |
485 | - | ||
486 | - | 3. Reboot to a LiveCD/USB drive (does not need to support bcache) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G smaller. | |
487 | - | ||
488 | - | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
489 | - | ||
490 | - | 4. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum (that is, the size we shrunk the actual partition to in step 3). For btrfs, that is: | |
491 | - | ||
492 | - | # btrfs filesystem resize max / | |
493 | - | ||
494 | - | For ext3/4, that is: | |
495 | - | ||
496 | - | # resize2fs /dev/bcache0 | |
497 | - | ||
498 | - | 5. Re-enable writeback cache if you want that enabled: | |
499 | - | ||
500 | - | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
501 | - | ||
502 | - | {{Note|If you are very careful you can shrink the filesystem to the exact size in step 2 and avoid step 4. Be careful, though, many partition tools do not do exactly what you want, but instead adjust the requested partition start/end points to end on sector boundaries. This may be difficult to calculate ahead of time}} | |
503 | - | ||
504 | - | == Troubleshooting == | |
505 | - | ||
506 | - | === /dev/bcache device does not exist on bootup === | |
507 | - | ||
508 | - | If you are sent to a busy box shell with an error: | |
509 | - | ||
510 | - | {{bc|1= | |
511 | - | ERROR: Unable to find root device 'UUID=b6b2d82b-f87e-44d5-bbc5-c51dd7aace15'. | |
512 | - | You are being dropped to a recovery shell | |
513 | - | Type 'exit' to try and continue booting | |
514 | - | }} | |
515 | - | ||
516 | - | This might happen if the backing device is configured for "writeback" mode (default is writearound). When in "writeback" mode, the /dev/bcache0 device is not started until the cache device is both registered and attached. Registering is something that needs to happen every bootup, but attaching should only have to be done once. | |
517 | - | ||
518 | - | To continue booting, try one of the following: | |
519 | - | ||
520 | - | * Register both the backing device and the caching device | |
521 | - | ||
522 | - | # echo /dev/sda3 > /sys/fs/bcache/register | |
523 | - | # echo /dev/sdb > /sys/fs/bcache/register | |
524 | - | ||
525 | - | If the /dev/bcache0 device now exists, type exit and continue booting. You will need to fix your initcpio to ensure devices are registered before mounting the root device. | |
526 | - | ||
527 | - | {{Note| | |
528 | - | * An error of "sh: echo: write error: Invalid argument" means the device was already registered or is not recognized as either a bcache backing device or cache. If using the udev rule on boot it should only attempt to register a device if it finds a bcache superblock | |
529 | - | * This can also happen if using udev's 69-bcache.rules in Installation's step 7 and blkid and bcache-probe "disagree" due to rogue superblocks. See [https://bcache.evilpiepirate.org/#index6h1 bcache's wiki] for a possible explanation/resolution. | |
530 | - | }} | |
531 | - | ||
532 | - | * Re-attach the cache to the backing device: | |
533 | - | ||
534 | - | If the cache device was registered, a folder with the UUID of the cache should exist in {{ic|/sys/fs/bcache}}. Use that UUID when following the example below: | |
535 | - | ||
536 | - | {{hc|# ls /sys/fs/bcache/| | |
537 | - | b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 register register_quiet | |
538 | - | }} | |
539 | - | ||
540 | - | # echo b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 > /sys/block/sda/sda3/bcache/attach | |
541 | - | ||
542 | - | If the {{ic|/dev/bcache0}} device now exists, type exit and continue booting. You should not have to do this again. If it persists, ask on the bcache mailing list. | |
543 | - | ||
544 | - | {{Note|An error of {{ic|sh: echo: write error: Invalid argument}} means the device was already attached. An error of {{ic|sh: echo: write error: No such file or directory}} means the UUID is not a valid cache (make sure you typed it correctly).}} | |
545 | - | ||
546 | - | * Invalidate the cache and force the backing device to run without it. You might want to check some stats, such as "dirty_data" so you have some idea of how much data will be lost. | |
547 | - | ||
548 | - | {{hc|# cat /sys/block/sda/sda3/bcache/dirty_data| | |
549 | - | -3.9M | |
550 | - | }} | |
551 | - | ||
552 | - | dirty data is data in the cache that has not been written to the backing device. If you force the backing device to run, this data will be lost, even if you later re-attach the cache. | |
553 | - | ||
554 | - | {{hc|# cat /sys/block/sda/sda3/bcache/running| | |
555 | - | 0 | |
556 | - | }} | |
557 | - | ||
558 | - | # echo 1 > /sys/block/sda/sda3/bcache/running | |
559 | - | ||
560 | - | The {{ic|/dev/bcache0}} device will now exist. Type exit and continue booting. You might want to unregister the cache device and run make-bcache again. An fsck on {{ic|/dev/bcache0}} would also be wise. See the [https://docs.kernel.org/admin-guide/bcache.html bcache documentation]. | |
561 | - | ||
562 | - | {{Warning|Only invalidate the cache if one of the two options above did not work.}} | |
563 | - | ||
564 | - | === /sys/fs/bcache/ does not exist === | |
565 | - | ||
566 | - | The kernel you booted is not bcache enabled, or the bcache [[Kernel module#Manual module handling|module is not loaded]] | |
567 | - | ||
568 | - | === write error: Invalid argument when trying to attach a device due to mismatched block parameter === | |
569 | - | ||
570 | - | Given {{ic|bash: echo: write error: Invalid argument}} when trying to attach a device, and the actual error is shown with [[dmesg]]: | |
571 | - | ||
572 | - | bcache: bch_cached_dev_attach() Couldn't attach sdc: block size less than set's block size | |
573 | - | ||
574 | - | This happens because the {{ic|--block 4k}} parameter was not set on either device and defaults can mismatch. | |
575 | - | ||
576 | - | Creating both the backing and caching device in one command automatically solves the issue, but when using separate commands the block size parameter sometimes needs to be set manually on both devices. | |
577 | - | ||
578 | - | === Device or resource busy === | |
579 | - | When a device is in use as a bcache backing device, it can not be formatted nor partitioned: | |
580 | - | # make-bcache -C /dev/sdb1 | |
581 | - | Can't open dev /dev/sdb1: Device or resource busy | |
582 | - | ||
583 | - | # fdisk /dev/sdb | |
584 | - | ||
585 | - | Welcome to fdisk (util-linux 2.37.2). | |
586 | - | Changes will remain in memory only, until you decide to write them. | |
587 | - | Be careful before using the write command. | |
588 | - | ||
589 | - | This disk is currently in use - repartitioning is probably a bad idea. | |
590 | - | It's recommended to umount all file systems, and swapoff all swap | |
591 | - | partitions on this disk. | |
592 | - | ||
593 | - | ||
594 | - | Command (m for help): q | |
595 | - | ||
596 | - | To fix this, first run this command to confirm the disk is actually used as a bcache backing device: | |
597 | - | # bcache-super-show /dev/sdb1 | |
598 | - | sb.magic ok | |
599 | - | sb.first_sector 8 [match] | |
600 | - | sb.csum A3D2B8610F6C5E35 [match] | |
601 | - | sb.version 1 [backing device] | |
602 | - | ||
603 | - | dev.label (empty) | |
604 | - | dev.uuid 5a868788-65a2-4564-b4b7-c1817d0b6080 | |
605 | - | dev.sectors_per_block 1 | |
606 | - | dev.sectors_per_bucket 1024 | |
607 | - | dev.data.first_sector 16 | |
608 | - | dev.data.cache_mode 1 [writeback] | |
609 | - | dev.data.cache_state 2 [dirty] | |
610 | - | ||
611 | - | cset.uuid 42dcb651-6b53-4b65-bc49-9b1ca0acc5b1 | |
612 | - | ||
613 | - | Then stop the backing device. This will also remove the corresponding /dev/bcache device. | |
614 | - | # echo 1 > /sys/class/block/sdb1/bcache/stop | |
615 | - | ||
616 | - | # dmesg | |
617 | - | [ 3171.263577] bcache: bcache_device_free() bcache0 stopped | |
618 | - | Now the device can be partitioned: | |
619 | - | # fdisk /dev/sdb | |
620 | - | ||
621 | - | Welcome to fdisk (util-linux 2.37.2). | |
622 | - | Changes will remain in memory only, until you decide to write them. | |
623 | - | Be careful before using the write command. | |
624 | - | ||
625 | - | ||
626 | - | Command (m for help): q | |
627 | - | When fdisk exits, the kernel scans the drive again, notices it's a bcache backing device, and uses the drive as a backing device. | |
628 | - | # dmesg | |
629 | - | [ 3190.643270] sdb: sdb1 | |
630 | - | [ 3190.833029] bcache: register_bdev() registered backing device sdb1 | |
631 | - | This creates the directory bcache under /sys/class/block/sdb1/ | |
632 | - | # ls /sys/class/block/sdb1/ | |
633 | - | alignment_offset bcache dev discard_alignment holders inflight partition power ro size start stat subsystem uevent | |
634 | - | ||
635 | - | == See also == | |
636 | - | ||
637 | - | * [https://bcache.evilpiepirate.org Bcache Homepage] | |
638 | - | * [https://docs.kernel.org/admin-guide/bcache.html Bcache Manual] | |
639 | - | ||
640 | - | ================================================== | |
641 | - | ||
642 | - | 上面的信息是我从别的地方摘抄的。可能有用,可能没用。可以参考然后回答下面的问题。 | |
643 | - | ||
644 | - | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
645 | - | ||
646 | - | ```bash | |
647 | - | anduin@ms-server:~$ cd /swarm-vol/ | |
648 | - | anduin@ms-server:/swarm-vol$ df . -Th | |
649 | - | Filesystem Type Size Used Avail Use% Mounted on | |
650 | - | /dev/nvme2n1 ext4 7.0T 6.1T 559G 92% /swarm-vol | |
651 | - | anduin@ms-server:/swarm-vol$ cd /swarm-vol/nextcloud/ | |
652 | - | anduin@ms-server:/swarm-vol/nextcloud$ df . -Th | |
653 | - | Filesystem Type Size Used Avail Use% Mounted on | |
654 | - | /dev/nvme0n1 ext4 916G 554G 316G 64% /swarm-vol/nextcloud | |
655 | - | anduin@ms-server:/swarm-vol/nextcloud$ sudo fdisk -l | |
656 | - | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | |
657 | - | Disk model: INTEL SSDPED1D480GA | |
658 | - | Units: sectors of 1 * 512 = 512 bytes | |
659 | - | Sector size (logical/physical): 512 bytes / 512 bytes | |
660 | - | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
661 | - | Disklabel type: gpt | |
662 | - | Disk identifier: 75C97A6C-09A4-4375-8260-7A950D36C1B4 | |
663 | - | ||
664 | - | Device Start End Sectors Size Type | |
665 | - | /dev/nvme1n1p1 2048 1050623 1048576 512M EFI System | |
666 | - | /dev/nvme1n1p2 1050624 937701375 936650752 446.6G Linux filesystem | |
667 | - | ||
668 | - | ||
669 | - | Disk /dev/nvme2n1: 6.99 TiB, 7681501126656 bytes, 1875366486 sectors | |
670 | - | Disk model: WUS4BB076D7P3E3 | |
671 | - | Units: sectors of 1 * 4096 = 4096 bytes | |
672 | - | Sector size (logical/physical): 4096 bytes / 4096 bytes | |
673 | - | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
674 | - | ||
675 | - | ||
676 | - | Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors | |
677 | - | Disk model: CT1000P3PSSD8 | |
678 | - | Units: sectors of 1 * 512 = 512 bytes | |
679 | - | Sector size (logical/physical): 512 bytes / 512 bytes | |
680 | - | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
681 | - | anduin@ms-server:/swarm-vol/nextcloud$ cd /dev/disk/by-uuid/ | |
682 | - | anduin@ms-server:/dev/disk/by-uuid$ ls -ashl | |
683 | - | total 0 | |
684 | - | 0 drwxr-xr-x 2 root root 140 Jan 17 15:21 . | |
685 | - | 0 drwxr-xr-x 7 root root 140 Dec 28 05:45 .. | |
686 | - | 0 lrwxrwxrwx 1 root root 13 Jan 14 14:00 0377361e-2a7b-4024-a681-ea135c092cce -> ../../nvme0n1 | |
687 | - | 0 lrwxrwxrwx 1 root root 13 Dec 28 05:45 49fd5e45-6074-4370-a95f-c4404920aff5 -> ../../nvme2n1 | |
688 | - | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 9C58-514E -> ../../nvme1n1p1 | |
689 | - | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 b91352af-9477-4684-8d08-2a45c39bec98 -> ../../nvme1n1p2 | |
690 | - | anduin@ms-server:/dev/disk/by-uuid$ cat /etc/fstab | |
691 | - | # /etc/fstab: static file system information. | |
692 | - | # | |
693 | - | # Use 'blkid' to print the universally unique identifier for a | |
694 | - | # device; this may be used with UUID= as a more robust way to name devices | |
695 | - | # that works even if disks are added and removed. See fstab(5). | |
696 | - | # | |
697 | - | # <file system> <mount point> <type> <options> <dump> <pass> | |
698 | - | UUID=b91352af-9477-4684-8d08-2a45c39bec98 / ext4 errors=remount-ro 0 1 | |
699 | - | UUID=9C58-514E /boot/efi vfat umask=0077 0 1 | |
700 | - | /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
701 | - | /dev/disk/by-uuid/0377361e-2a7b-4024-a681-ea135c092cce /swarm-vol/nextcloud ext4 defaults,noatime,nofail 0 0 | |
702 | - | /swapfile none swap sw 0 0 | |
703 | - | ``` | |
704 | - | ||
705 | - | 由上面的信息,不难判断出: | |
706 | - | ||
707 | - | 我的系统盘是 b91352af-9477-4684-8d08-2a45c39bec98 ,当然这和我们要调查的内容没什么关系。 | |
708 | - | ||
709 | - | 我的数据都放在了 /swarm-vol 这个文件夹。它背后的磁盘是 `49fd5e45-6074-4370-a95f-c4404920aff5` | |
710 | - | ||
711 | - | 即使我暂时使用奇技淫巧,将 /swarm-vol 下的子文件夹 nextcloud 暂时挪到了 `0377361e-2a7b-4024-a681-ea135c092cce` 下,还是濒临不够了。 | |
712 | - | ||
713 | - | 但是,幸运的是,我购买了一个全新的大而慢的机械硬盘: | |
714 | - | ||
715 | - | ```bash | |
716 | - | Disk /dev/sda: 58.21 TiB, 64003468427264 bytes, 125006774272 sectors | |
717 | - | Disk model: RAID5 | |
718 | - | Units: sectors of 1 * 512 = 512 bytes | |
719 | - | Sector size (logical/physical): 512 bytes / 4096 bytes | |
720 | - | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
721 | - | ``` | |
722 | - | ||
723 | - | 为了测试它,我暂时挂载到了这里: | |
724 | - | ||
725 | - | ```bash | |
726 | - | /dev/sda /mnt/temp_big ext4 defaults,noatime,nofail 0 0 | |
727 | - | ``` | |
728 | - | ||
729 | - | 接下来,我认为我需要开始设计我的迁移改造计划。 | |
730 | - | ||
731 | - | 为了充分发挥我过去 49fd5e45-6074-4370-a95f-c4404920aff5,也就是nvme2n1,也就是 /swarm-vol 的快的固态的特性,又能发挥 /dev/sda 大的优点,我计划这样设计: | |
732 | - | ||
733 | - | 使用 bcache 系统,让 /dev/sda 作为真正的存储设备,再让 49fd5e45-6074-4370-a95f-c4404920aff5 作为缓存盘,同时开启写入缓存和读取缓存,这样我就拥有又大又快的存储了。 | |
734 | - | ||
735 | - | 考虑到我的缓存盘非常大(上面的信息可以得出,它足足有 6.99 TB 对吧?),我相信我可以设置非常激进的写入缓存和阅读缓存。而且我的缓存盘非常可靠,它几乎不会损坏,我也不担心短暂的数据丢失。我又不是银行,就都是电影。 | |
736 | - | ||
737 | - | 接下来,为了方便迁移,我开始设计我的迁移计划: | |
738 | - | ||
739 | - | # 阶段概要 | |
740 | - | ||
741 | - | ## 第一阶段 - 双数据阶段 | |
742 | - | ||
743 | - | 将 sda 格式化清空,作为 bcache 的后端。此时 nvme2n1 继续承载业务数据。不移除它。然后将业务数据使用 rsync 拷贝到 sda 中。 | |
744 | - | ||
745 | - | ## 第二阶段 - 暂停业务阶段 | |
746 | - | ||
747 | - | 将业务暂停,然后我最后运行一次rsync。这次rsync应该会跑得很快,因为只产生了增量数据差异。此时此刻,nvme2n1 (ext4)的数据,和 sda (bcache的后端)的数据是完全相同的。 | |
748 | - | ||
749 | - | ## 第三阶段 - 重构存储阶段 | |
750 | - | ||
751 | - | 将 nvme2n1 格式化。然后让它作为 bcache 的缓存端。再将得到的 bcache 虚拟盘,挂载到 /swarm-vol,实现业务无感。然后重启业务。 | |
752 | - | ||
753 | - | 注意:我没有任何额外的新空间可以用于备份!所以我的命令必须一次成功!一旦失败我们将万劫不复! | |
754 | - | ||
755 | - | ## 第一阶段 | |
756 | - | ||
757 | - | 接下来,我要开始第一阶段的迁移了。我第一阶段计划这么做: | |
758 | - | ||
759 | - | 目标 | |
760 | - | ||
761 | - | * 使用 make-bcache 将 /dev/sda 建立为 bcache 的后端(backing device)。 | |
762 | - | * 先不动现有 /dev/nvme2n1(现挂载于 /swarm-vol)上的业务数据,让业务继续运行。 | |
763 | - | * 格式化出的 /dev/bcache0 上创建一个文件系统(例如 ext4),然后将现有数据从 /swarm-vol 同步到这个新地方。 | |
764 | - | * 这是“第一阶段”,意在让 /dev/sda 上也有一份业务数据拷贝,从而腾出后续的操作空间。 | |
765 | - | ||
766 | - | 结果 | |
767 | - | ||
768 | - | * 最终会拥有两份数据: | |
769 | - | * 原始:/swarm-vol(在 /dev/nvme2n1 上) | |
770 | - | * 新的:/mnt/bcache(对应 /dev/bcache0,后端实际上是 /dev/sda) | |
771 | - | * 业务不中断 | |
772 | - | ||
773 | - | 我可以让服务继续使用 /swarm-vol,只要我在第一阶段只做数据拷贝、而不改动 /swarm-vol 自身。 | |
774 | - | 在第一阶段结束后,等我准备好,可以进入“第二阶段”短暂停机做增量 rsync 以及最终切换。 | |
775 | - | ||
776 | - | ```bash | |
777 | - | # 安装 bcache-tools | |
778 | - | sudo apt install bcache-tools | |
779 | - | ||
780 | - | # 仅示例,注意操作前先确认 /dev/sda 确实空置 | |
781 | - | # (在 fdisk 交互式命令中,删除旧分区、新建分区) | |
782 | - | sudo fdisk /dev/sda | |
783 | - | ||
784 | - | # 使用 wipefs 清除 sda 上的所有签名 | |
785 | - | sudo wipefs -a /dev/sda | |
786 | - | ||
787 | - | # 创建 bcache 后端 | |
788 | - | sudo make-bcache -B /dev/sda | |
789 | - | ||
790 | - | # 如果在 fdisk 里没有找到 /dev/bcache0,可以尝试 | |
791 | - | # 重新加载内核模块: | |
792 | - | sudo modprobe bcache | |
793 | - | ||
794 | - | # 如果还是没有,尝试手工创建 | |
795 | - | echo /dev/sda | sudo tee /sys/fs/bcache/register | |
796 | - | ||
797 | - | # 确认后端创建成功 | |
798 | - | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
799 | - | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
800 | - | # version: 1 | |
801 | - | # block_size: 1 | |
802 | - | # data_offset: 16 | |
803 | - | ||
804 | - | # 格式化后端 | |
805 | - | ls -ashl /dev/bcache0 | |
806 | - | sudo mkfs.ext4 /dev/bcache0 | |
807 | - | ||
808 | - | # 创建挂载点 | |
809 | - | sudo mkdir /mnt/bcache | |
810 | - | ||
811 | - | # 挂载 bcache 后端 | |
812 | - | sudo mount /dev/bcache0 /mnt/bcache | |
813 | - | ||
814 | - | # 确认挂载成功 | |
815 | - | cd /mnt/bcache | |
816 | - | ||
817 | - | # 确认挂载成功 | |
818 | - | df . -Th | |
819 | - | ||
820 | - | # (确认挂载成功后,开始 rsync) | |
821 | - | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
822 | - | ||
823 | - | # 同步 nextcloud 文件夹 | |
824 | - | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
825 | - | ``` | |
826 | - | ||
827 | - | ## 第二阶段 - 暂停业务并做最终同步 | |
828 | - | ||
829 | - | 在这一阶段,我将: | |
830 | - | 1. 暂停业务,使其不再写入 `/swarm-vol`(也就是旧的 nvme2n1)。 | |
831 | - | 2. 做最后一次增量 rsync,保证数据在 /dev/bcache0(后端 sda)上与旧数据完全一致。 | |
832 | - | 3. 卸载旧的 `/swarm-vol`,改为挂载新的 `/dev/bcache0` 到 `/swarm-vol`,这样就完成了切换。 | |
833 | - | ||
834 | - | **示例脚本**(在生产环境中,请根据自己实际服务的暂停方式作相应调整): | |
835 | - | ||
836 | - | ```bash | |
837 | - | # 1) 暂停业务 | |
838 | - | echo "停止相关业务/服务 (示例:docker-compose 或 systemctl stop 等)" | |
839 | - | docker-compose down | |
840 | - | sudo reboot # 重启服务器,确保业务不再写入 | |
841 | - | ||
842 | - | # 2) 做最后一次增量同步 | |
843 | - | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
844 | - | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
845 | - | ||
846 | - | # 3) 切换挂载点 | |
847 | - | sudo umount /swarm-vol | |
848 | - | ||
849 | - | echo "将 bcache0 挂载为新的 /swarm-vol..." | |
850 | - | sudo mount /dev/bcache0 /swarm-vol | |
851 | - | ||
852 | - | echo "检查挂载..." | |
853 | - | df -Th /swarm-vol | |
854 | - | ||
855 | - | echo "请人工确认 /swarm-vol 中的数据完整性;若无误,可以继续。" | |
856 | - | ``` | |
857 | - | ||
858 | - | 在执行完成后,`/swarm-vol` 已经切换到基于 `/dev/bcache0`(后端是 `/dev/sda`)的存储,业务就可以使用这套新存储。此时 `nvme2n1` 上的原有 ext4 数据已不再对外提供服务,但仍在物理上保留(尚未被清空)。 | |
859 | - | ||
860 | - | --- | |
861 | - | ||
862 | - | ## 第三阶段 - 将原 nvme2n1 作为 bcache 缓存设备 | |
863 | - | ||
864 | - | 在这一阶段,我将: | |
865 | - | 1. 确认 `/swarm-vol` 已经切换成功、业务运行正常且数据安全无误。 | |
866 | - | 2. 清空并格式化原本的 `nvme2n1` 为 bcache 缓存盘。 | |
867 | - | 3. 将缓存盘附加到已经存在的 bcache 后端(即 `/dev/sda`)上,使两者变为真正的 “大容量 + SSD 缓存” 组合。 | |
868 | - | 4. 根据需求,启用写回缓存(writeback)等激进模式。 | |
869 | - | ||
870 | - | **示例脚本**: | |
871 | - | ||
872 | - | ```bash | |
873 | - | # 1) 确认当前 /swarm-vol 已经是 /dev/bcache0,且业务正常 | |
874 | - | # (需人工自行验证,确认数据已在 /dev/sda + /dev/bcache0 上) | |
875 | - | # 此时可以停一下业务,或保持低负载也行,避免写入影响。 | |
876 | - | ||
877 | - | # 2) 清空 nvme2n1 (原来的 /swarm-vol) 注意,这将销毁原数据! | |
878 | - | echo "准备清空 /dev/nvme2n1..." | |
879 | - | sudo umount /dev/nvme2n1 || true # 若尚未卸载,可忽略报错 | |
880 | - | sudo wipefs -a /dev/nvme2n1 | |
881 | - | ||
882 | - | # 3) 将 nvme2n1 作为缓存盘初始化 | |
883 | - | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
884 | - | #在这个例子里,默认的block大小是512B、bucket大小是128kB。block的大小应该与后端设备的sector大小匹配(通常是512或者4k)。bucket的大小应该与缓存设备的擦除block大小匹配(以减少写入放大)。例如,如果是一个4k sector的HDD和一个擦除block大小是2MB的SSD搭配,命令就应该是这样的: | |
885 | - | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
886 | - | # 如果你需要查看 /dev/sda (也就是后端)的 block size,可以使用 fdisk -l /dev/sda 等命令。 | |
887 | - | # 如果你需要查看 /dev/nvme2n1 的擦除块大小,可以使用 nvme id-ns /dev/nvme2n1 等命令。一般是 4M | |
888 | - | sudo make-bcache --block 512 --bucket 4M -C /dev/nvme2n1 | |
889 | - | ||
890 | - | echo "检查生成的缓存盘信息..." | |
891 | - | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
892 | - | ||
893 | - | # 假设输出中 cset.uuid (或 dev.uuid) 为 11111111-2222-3333-4444-555555555555 | |
894 | - | # (这里仅演示,我需要看实际输出) | |
895 | - | ||
896 | - | CACHE_UUID="(此处填上实际的 cset.uuid)" | |
897 | - | ||
898 | - | # 4) 将缓存设备附加到现有的 /dev/bcache0(后端 /dev/sda) | |
899 | - | # /dev/bcache0 的 sysfs 路径可通过 ls /sys/block/bcache0/bcache 等命令确认 | |
900 | - | echo "附加缓存到现有 bcache 后端..." | |
901 | - | echo "$CACHE_UUID" | sudo tee /sys/block/bcache0/bcache/attach | |
902 | - | ||
903 | - | # 如果我看到 echo: write error: Invalid argument,通常是 block size 不匹配等问题 | |
904 | - | # 如果成功,则 /sys/block/bcache0/bcache/cache_mode 等节点应该出现 | |
905 | - | ||
906 | - | # 5) 为 bcache0 启用写回缓存模式(可选) | |
907 | - | echo "启用写回 (writeback) 缓存模式..." | |
908 | - | echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode | |
909 | - | ||
910 | - | # 可选:关闭顺序IO绕过等更激进的做法 | |
911 | - | # echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff | |
912 | - | # echo 0 | sudo tee /sys/block/bcache0/bcache/writeback_percent | |
913 | - | ||
914 | - | # 6) 确认缓存已生效 | |
915 | - | echo "确认 /dev/bcache0 依旧正常挂载在 /swarm-vol,并检查 sysfs 等信息:" | |
916 | - | mount | grep /swarm-vol | |
917 | - | ls -l /sys/block/bcache0/bcache | |
918 | - | ``` | |
919 | - | ||
920 | - | 至此,我已经完成了将旧的 nvme2n1 转变为 bcache 缓存设备的操作,并和 `/dev/sda` 组合为统一的逻辑卷 `/dev/bcache0`。接下来的要点包括: | |
921 | - | ||
922 | - | 1. **开机自动挂载** | |
923 | - | - 通常推荐在 `/etc/fstab` 中写入对 `/dev/bcache0` 的挂载。 | |
924 | - | - 同时需要注意在 initramfs 阶段加载 bcache 模块、或者确保 `bcache-tools` 的 udev 规则可以自动将 cache attach 到 backing device(以免重启后没了 /dev/bcache0)。在 Ubuntu 下,一般可通过 `sudo update-initramfs -u` 并检查 `/lib/udev/rules.d/69-bcache.rules` 等来确认。 | |
925 | - | ||
926 | - | 在 `/etc/fstab` 中添加: | |
927 | - | ||
928 | - | ```bash | |
929 | - | # 删除旧的 /swarm-vol 挂载 | |
930 | - | # /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
931 | - | # 然后添加新的 /swarm-vol 挂载 | |
932 | - | /dev/bcache0 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
933 | - | ``` | |
934 | - | ||
935 | - | 2. **确认写回模式的风险** | |
936 | - | - 写回模式(writeback)可以大幅提高速度,但在缓存盘掉电或故障时会丢失尚未写入后端的脏数据。既然我提到 SSD 质量较好,且并不特别在意短期丢失风险,可以大胆使用。 | |
937 | - | ||
938 | - | 3. **调优与监控** | |
939 | - | - 适当调节 `writeback_percent`、`sequential_cutoff` 等 sysfs 参数可以获得性能与风险的平衡。 | |
940 | - | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
941 | - | ||
942 | - | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
943 | - | ||
944 | 301 | 使用下面的命令检查其状态: | |
945 | 302 | ||
946 | 303 | ```bash |
anduin revised this gist
1 file changed, 460 insertions
plan.md
@@ -1,3 +1,306 @@ | |||
1 | + | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
2 | + | ||
3 | + | ```bash | |
4 | + | anduin@ms-server:~$ cd /swarm-vol/ | |
5 | + | anduin@ms-server:/swarm-vol$ df . -Th | |
6 | + | Filesystem Type Size Used Avail Use% Mounted on | |
7 | + | /dev/nvme2n1 ext4 7.0T 6.1T 559G 92% /swarm-vol | |
8 | + | anduin@ms-server:/swarm-vol$ cd /swarm-vol/nextcloud/ | |
9 | + | anduin@ms-server:/swarm-vol/nextcloud$ df . -Th | |
10 | + | Filesystem Type Size Used Avail Use% Mounted on | |
11 | + | /dev/nvme0n1 ext4 916G 554G 316G 64% /swarm-vol/nextcloud | |
12 | + | anduin@ms-server:/swarm-vol/nextcloud$ sudo fdisk -l | |
13 | + | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | |
14 | + | Disk model: INTEL SSDPED1D480GA | |
15 | + | Units: sectors of 1 * 512 = 512 bytes | |
16 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
17 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
18 | + | Disklabel type: gpt | |
19 | + | Disk identifier: 75C97A6C-09A4-4375-8260-7A950D36C1B4 | |
20 | + | ||
21 | + | Device Start End Sectors Size Type | |
22 | + | /dev/nvme1n1p1 2048 1050623 1048576 512M EFI System | |
23 | + | /dev/nvme1n1p2 1050624 937701375 936650752 446.6G Linux filesystem | |
24 | + | ||
25 | + | ||
26 | + | Disk /dev/nvme2n1: 6.99 TiB, 7681501126656 bytes, 1875366486 sectors | |
27 | + | Disk model: WUS4BB076D7P3E3 | |
28 | + | Units: sectors of 1 * 4096 = 4096 bytes | |
29 | + | Sector size (logical/physical): 4096 bytes / 4096 bytes | |
30 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
31 | + | ||
32 | + | ||
33 | + | Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors | |
34 | + | Disk model: CT1000P3PSSD8 | |
35 | + | Units: sectors of 1 * 512 = 512 bytes | |
36 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
37 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
38 | + | anduin@ms-server:/swarm-vol/nextcloud$ cd /dev/disk/by-uuid/ | |
39 | + | anduin@ms-server:/dev/disk/by-uuid$ ls -ashl | |
40 | + | total 0 | |
41 | + | 0 drwxr-xr-x 2 root root 140 Jan 17 15:21 . | |
42 | + | 0 drwxr-xr-x 7 root root 140 Dec 28 05:45 .. | |
43 | + | 0 lrwxrwxrwx 1 root root 13 Jan 14 14:00 0377361e-2a7b-4024-a681-ea135c092cce -> ../../nvme0n1 | |
44 | + | 0 lrwxrwxrwx 1 root root 13 Dec 28 05:45 49fd5e45-6074-4370-a95f-c4404920aff5 -> ../../nvme2n1 | |
45 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 9C58-514E -> ../../nvme1n1p1 | |
46 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 b91352af-9477-4684-8d08-2a45c39bec98 -> ../../nvme1n1p2 | |
47 | + | anduin@ms-server:/dev/disk/by-uuid$ cat /etc/fstab | |
48 | + | # /etc/fstab: static file system information. | |
49 | + | # | |
50 | + | # Use 'blkid' to print the universally unique identifier for a | |
51 | + | # device; this may be used with UUID= as a more robust way to name devices | |
52 | + | # that works even if disks are added and removed. See fstab(5). | |
53 | + | # | |
54 | + | # <file system> <mount point> <type> <options> <dump> <pass> | |
55 | + | UUID=b91352af-9477-4684-8d08-2a45c39bec98 / ext4 errors=remount-ro 0 1 | |
56 | + | UUID=9C58-514E /boot/efi vfat umask=0077 0 1 | |
57 | + | /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
58 | + | /dev/disk/by-uuid/0377361e-2a7b-4024-a681-ea135c092cce /swarm-vol/nextcloud ext4 defaults,noatime,nofail 0 0 | |
59 | + | /swapfile none swap sw 0 0 | |
60 | + | ``` | |
61 | + | ||
62 | + | 由上面的信息,不难判断出: | |
63 | + | ||
64 | + | 我的系统盘是 b91352af-9477-4684-8d08-2a45c39bec98 ,当然这和我们要调查的内容没什么关系。 | |
65 | + | ||
66 | + | 我的数据都放在了 /swarm-vol 这个文件夹。它背后的磁盘是 `49fd5e45-6074-4370-a95f-c4404920aff5` | |
67 | + | ||
68 | + | 即使我暂时使用奇技淫巧,将 /swarm-vol 下的子文件夹 nextcloud 暂时挪到了 `0377361e-2a7b-4024-a681-ea135c092cce` 下,还是濒临不够了。 | |
69 | + | ||
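
Before doing anything destructive, it may help to cross-check the UUID-to-device mapping in a single view; a minimal sketch using `lsblk`, which ships with Ubuntu:

```bash
# One view of device names, filesystem UUIDs, sizes and mount points
lsblk -o NAME,UUID,FSTYPE,SIZE,MOUNTPOINT
```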
70 | + | 但是,幸运的是,我购买了一个全新的大而慢的机械硬盘: | |
71 | + | ||
72 | + | ```bash | |
73 | + | Disk /dev/sda: 58.21 TiB, 64003468427264 bytes, 125006774272 sectors | |
74 | + | Disk model: RAID5 | |
75 | + | Units: sectors of 1 * 512 = 512 bytes | |
76 | + | Sector size (logical/physical): 512 bytes / 4096 bytes | |
77 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
78 | + | ``` | |
79 | + | ||
80 | + | 为了测试它,我暂时挂载到了这里: | |
81 | + | ||
82 | + | ```bash | |
83 | + | /dev/sda /mnt/temp_big ext4 defaults,noatime,nofail 0 0 | |
84 | + | ``` | |
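
A small sanity check after adding that temporary fstab entry; a sketch, assuming the mount point `/mnt/temp_big` from the line above:

```bash
# Create the mount point if it does not exist, apply fstab, and verify
sudo mkdir -p /mnt/temp_big
sudo mount -a
df -Th /mnt/temp_big
```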
85 | + | ||
86 | + | 接下来,我认为我需要开始设计我的迁移改造计划。 | |
87 | + | ||
88 | + | 为了充分发挥我过去 49fd5e45-6074-4370-a95f-c4404920aff5,也就是nvme2n1,也就是 /swarm-vol 的快的固态的特性,又能发挥 /dev/sda 大的优点,我计划这样设计: | |
89 | + | ||
90 | + | 使用 bcache 系统,让 /dev/sda 作为真正的存储设备,再让 49fd5e45-6074-4370-a95f-c4404920aff5 作为缓存盘,同时开启写入缓存和读取缓存,这样我就拥有又大又快的存储了。 | |
91 | + | ||
92 | + | 考虑到我的缓存盘非常大(上面的信息可以得出,它足足有 6.99 TB 对吧?),我相信我可以设置非常激进的写入缓存和阅读缓存。而且我的缓存盘非常可靠,它几乎不会损坏,我也不担心短暂的数据丢失。我又不是银行,就都是电影。 | |
93 | + | ||
94 | + | 接下来,为了方便迁移,我开始设计我的迁移计划: | |
95 | + | ||
96 | + | # 阶段概要 | |
97 | + | ||
98 | + | ## 第一阶段 - 双数据阶段 | |
99 | + | ||
100 | + | 将 sda 格式化清空,作为 bcache 的后端。此时 nvme2n1 继续承载业务数据。不移除它。然后将业务数据使用 rsync 拷贝到 sda 中。 | |
101 | + | ||
102 | + | ## 第二阶段 - 暂停业务阶段 | |
103 | + | ||
104 | + | 将业务暂停,然后我最后运行一次rsync。这次rsync应该会跑得很快,因为只产生了增量数据差异。此时此刻,nvme2n1 (ext4)的数据,和 sda (bcache的后端)的数据是完全相同的。 | |
105 | + | ||
106 | + | ## 第三阶段 - 重构存储阶段 | |
107 | + | ||
108 | + | 将 nvme2n1 格式化。然后让它作为 bcache 的缓存端。再将得到的 bcache 虚拟盘,挂载到 /swarm-vol,实现业务无感。然后重启业务。 | |
109 | + | ||
110 | + | 注意:我没有任何额外的新空间可以用于备份!所以我的命令必须一次成功!一旦失败我们将万劫不复! | |
111 | + | ||
112 | + | ## 第一阶段 | |
113 | + | ||
114 | + | 接下来,我要开始第一阶段的迁移了。我第一阶段计划这么做: | |
115 | + | ||
116 | + | 目标 | |
117 | + | ||
118 | + | * 使用 make-bcache 将 /dev/sda 建立为 bcache 的后端(backing device)。 | |
119 | + | * 先不动现有 /dev/nvme2n1(现挂载于 /swarm-vol)上的业务数据,让业务继续运行。 | |
120 | + | * 格式化出的 /dev/bcache0 上创建一个文件系统(例如 ext4),然后将现有数据从 /swarm-vol 同步到这个新地方。 | |
121 | + | * 这是“第一阶段”,意在让 /dev/sda 上也有一份业务数据拷贝,从而腾出后续的操作空间。 | |
122 | + | ||
123 | + | 结果 | |
124 | + | ||
125 | + | * 最终会拥有两份数据: | |
126 | + | * 原始:/swarm-vol(在 /dev/nvme2n1 上) | |
127 | + | * 新的:/mnt/bcache(对应 /dev/bcache0,后端实际上是 /dev/sda) | |
128 | + | * 业务不中断 | |
129 | + | ||
130 | + | 我可以让服务继续使用 /swarm-vol,只要我在第一阶段只做数据拷贝、而不改动 /swarm-vol 自身。 | |
131 | + | 在第一阶段结束后,等我准备好,可以进入“第二阶段”短暂停机做增量 rsync 以及最终切换。 | |
132 | + | ||
133 | + | ```bash | |
134 | + | # 安装 bcache-tools | |
135 | + | sudo apt install bcache-tools | |
136 | + | ||
137 | + | # 仅示例,注意操作前先确认 /dev/sda 确实空置 | |
138 | + | # (在 fdisk 交互式命令中,删除旧分区、新建分区) | |
139 | + | sudo fdisk /dev/sda | |
140 | + | ||
141 | + | # 使用 wipefs 清除 sda 上的所有签名 | |
142 | + | sudo wipefs -a /dev/sda | |
143 | + | ||
144 | + | # 创建 bcache 后端 | |
145 | + | sudo make-bcache -B /dev/sda | |
146 | + | ||
147 | + | # 如果在 fdisk 里没有找到 /dev/bcache0,可以尝试 | |
148 | + | # 重新加载内核模块: | |
149 | + | sudo modprobe bcache | |
150 | + | ||
151 | + | # 如果还是没有,尝试手工创建 | |
152 | + | echo /dev/sda | sudo tee /sys/fs/bcache/register | |
153 | + | ||
154 | + | # 确认后端创建成功 | |
155 | + | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
156 | + | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
157 | + | # version: 1 | |
158 | + | # block_size: 1 | |
159 | + | # data_offset: 16 | |
160 | + | ||
161 | + | # 格式化后端 | |
162 | + | ls -ashl /dev/bcache0 | |
163 | + | sudo mkfs.ext4 /dev/bcache0 | |
164 | + | ||
165 | + | # 创建挂载点 | |
166 | + | sudo mkdir /mnt/bcache | |
167 | + | ||
168 | + | # 挂载 bcache 后端 | |
169 | + | sudo mount /dev/bcache0 /mnt/bcache | |
170 | + | ||
171 | + | # 确认挂载成功 | |
172 | + | cd /mnt/bcache | |
173 | + | ||
174 | + | # 确认挂载成功 | |
175 | + | df . -Th | |
176 | + | ||
177 | + | # (确认挂载成功后,开始 rsync) | |
178 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
179 | + | ||
180 | + | # 同步 nextcloud 文件夹 | |
181 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
182 | + | ``` | |
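
Before declaring Phase 1 done, it is worth confirming the copy actually completed. A hedged sketch: a dry run (`-n`) of the same rsync command should list little or nothing beyond files the still-running services have touched since the last pass:

```bash
# Compare used space on source and destination
df -h /swarm-vol /mnt/bcache

# Dry run: show what would still be transferred or deleted
sudo rsync -Aavx --update --delete -n /swarm-vol/ /mnt/bcache/
```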
183 | + | ||
184 | + | ## 第二阶段 - 暂停业务并做最终同步 | |
185 | + | ||
186 | + | 在这一阶段,我将: | |
187 | + | 1. 暂停业务,使其不再写入 `/swarm-vol`(也就是旧的 nvme2n1)。 | |
188 | + | 2. 做最后一次增量 rsync,保证数据在 /dev/bcache0(后端 sda)上与旧数据完全一致。 | |
189 | + | 3. 卸载旧的 `/swarm-vol`,改为挂载新的 `/dev/bcache0` 到 `/swarm-vol`,这样就完成了切换。 | |
190 | + | ||
191 | + | **示例脚本**(在生产环境中,请根据自己实际服务的暂停方式作相应调整): | |
192 | + | ||
193 | + | ```bash | |
194 | + | # 1) 暂停业务 | |
195 | + | echo "停止相关业务/服务 (示例:docker-compose 或 systemctl stop 等)" | |
196 | + | docker-compose down | |
197 | + | sudo reboot # 重启服务器,确保业务不再写入 | |
198 | + | ||
199 | + | # 2) 做最后一次增量同步 | |
200 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
201 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
202 | + | ||
203 | + | # 3) 切换挂载点 | |
204 | + | sudo umount /swarm-vol | |
205 | + | ||
206 | + | echo "将 bcache0 挂载为新的 /swarm-vol..." | |
207 | + | sudo mount /dev/bcache0 /swarm-vol | |
208 | + | ||
209 | + | echo "检查挂载..." | |
210 | + | df -Th /swarm-vol | |
211 | + | ||
212 | + | echo "请人工确认 /swarm-vol 中的数据完整性;若无误,可以继续。" | |
213 | + | ``` | |
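
Before unmounting the old volume, two extra checks can confirm it is safe to switch; a sketch using standard tools (`lsof` also appears further down in this document; the `+D` directory scan is just one way to invoke it):

```bash
# Nothing should still hold files under /swarm-vol once services are stopped
sudo lsof +D /swarm-vol

# Final dry run: should report no pending transfers or deletions
sudo rsync -Aavx --update --delete -n /swarm-vol/ /mnt/bcache/
```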
214 | + | ||
215 | + | 在执行完成后,`/swarm-vol` 已经切换到基于 `/dev/bcache0`(后端是 `/dev/sda`)的存储,业务就可以使用这套新存储。此时 `nvme2n1` 上的原有 ext4 数据已不再对外提供服务,但仍在物理上保留(尚未被清空)。 | |
216 | + | ||
217 | + | --- | |
218 | + | ||
219 | + | ## 第三阶段 - 将原 nvme2n1 作为 bcache 缓存设备 | |
220 | + | ||
221 | + | 在这一阶段,我将: | |
222 | + | 1. 确认 `/swarm-vol` 已经切换成功、业务运行正常且数据安全无误。 | |
223 | + | 2. 清空并格式化原本的 `nvme2n1` 为 bcache 缓存盘。 | |
224 | + | 3. 将缓存盘附加到已经存在的 bcache 后端(即 `/dev/sda`)上,使两者变为真正的 “大容量 + SSD 缓存” 组合。 | |
225 | + | 4. 根据需求,启用写回缓存(writeback)等激进模式。 | |
226 | + | ||
227 | + | **示例脚本**: | |
228 | + | ||
229 | + | ```bash | |
230 | + | # 1) 确认当前 /swarm-vol 已经是 /dev/bcache0,且业务正常 | |
231 | + | # (需人工自行验证,确认数据已在 /dev/sda + /dev/bcache0 上) | |
232 | + | # 此时可以停一下业务,或保持低负载也行,避免写入影响。 | |
233 | + | ||
234 | + | # 2) 清空 nvme2n1 (原来的 /swarm-vol) 注意,这将销毁原数据! | |
235 | + | echo "准备清空 /dev/nvme2n1..." | |
236 | + | sudo umount /dev/nvme2n1 || true # 若尚未卸载,可忽略报错 | |
237 | + | sudo wipefs -a /dev/nvme2n1 | |
238 | + | ||
239 | + | # 3) 将 nvme2n1 作为缓存盘初始化 | |
240 | + | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
241 | + | #在这个例子里,默认的block大小是512B、bucket大小是128kB。block的大小应该与后端设备的sector大小匹配(通常是512或者4k)。bucket的大小应该与缓存设备的擦除block大小匹配(以减少写入放大)。例如,如果是一个4k sector的HDD和一个擦除block大小是2MB的SSD搭配,命令就应该是这样的: | |
242 | + | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
243 | + | # 如果你需要查看 /dev/sda (也就是后端)的 block size,可以使用 fdisk -l /dev/sda 等命令。 | |
244 | + | # 如果你需要查看 /dev/nvme2n1 的擦除块大小,可以使用 nvme id-ns /dev/nvme2n1 等命令。一般是 4M | |
245 | + | sudo make-bcache --block 512 --bucket 4M -C /dev/nvme2n1 | |
246 | + | ||
247 | + | echo "检查生成的缓存盘信息..." | |
248 | + | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
249 | + | ||
250 | + | # 假设输出中 cset.uuid (或 dev.uuid) 为 11111111-2222-3333-4444-555555555555 | |
251 | + | # (这里仅演示,我需要看实际输出) | |
252 | + | ||
253 | + | CACHE_UUID="(此处填上实际的 cset.uuid)" | |
254 | + | ||
255 | + | # 4) 将缓存设备附加到现有的 /dev/bcache0(后端 /dev/sda) | |
256 | + | # /dev/bcache0 的 sysfs 路径可通过 ls /sys/block/bcache0/bcache 等命令确认 | |
257 | + | echo "附加缓存到现有 bcache 后端..." | |
258 | + | echo "$CACHE_UUID" | sudo tee /sys/block/bcache0/bcache/attach | |
259 | + | ||
260 | + | # 如果我看到 echo: write error: Invalid argument,通常是 block size 不匹配等问题 | |
261 | + | # 如果成功,则 /sys/block/bcache0/bcache/cache_mode 等节点应该出现 | |
262 | + | ||
263 | + | # 5) 为 bcache0 启用写回缓存模式(可选) | |
264 | + | echo "启用写回 (writeback) 缓存模式..." | |
265 | + | echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode | |
266 | + | ||
267 | + | # 可选:关闭顺序IO绕过等更激进的做法 | |
268 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff | |
269 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/writeback_percent | |
270 | + | ||
271 | + | # 6) 确认缓存已生效 | |
272 | + | echo "确认 /dev/bcache0 依旧正常挂载在 /swarm-vol,并检查 sysfs 等信息:" | |
273 | + | mount | grep /swarm-vol | |
274 | + | ls -l /sys/block/bcache0/bcache | |
275 | + | ``` | |
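
A short verification that the cache really attached; a sketch, reusing the sysfs nodes and `bcache-super-show` output referenced above:

```bash
# The cache link under the bcache device should appear after a successful attach
ls /sys/block/bcache0/bcache/cache

# State should change from "no cache" to "clean" (or "dirty" once writeback starts)
cat /sys/block/bcache0/bcache/state

# Both superblocks should now report the same cset.uuid
sudo bcache-super-show /dev/sda | grep cset.uuid
sudo bcache-super-show /dev/nvme2n1 | grep cset.uuid
```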
276 | + | ||
277 | + | 至此,我已经完成了将旧的 nvme2n1 转变为 bcache 缓存设备的操作,并和 `/dev/sda` 组合为统一的逻辑卷 `/dev/bcache0`。接下来的要点包括: | |
278 | + | ||
279 | + | 1. **开机自动挂载** | |
280 | + | - 通常推荐在 `/etc/fstab` 中写入对 `/dev/bcache0` 的挂载。 | |
281 | + | - 同时需要注意在 initramfs 阶段加载 bcache 模块、或者确保 `bcache-tools` 的 udev 规则可以自动将 cache attach 到 backing device(以免重启后没了 /dev/bcache0)。在 Ubuntu 下,一般可通过 `sudo update-initramfs -u` 并检查 `/lib/udev/rules.d/69-bcache.rules` 等来确认。 | |
282 | + | ||
283 | + | 在 `/etc/fstab` 中添加: | |
284 | + | ||
285 | + | ```bash | |
286 | + | # 删除旧的 /swarm-vol 挂载 | |
287 | + | # /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
288 | + | # 然后添加新的 /swarm-vol 挂载 | |
289 | + | /dev/bcache0 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
290 | + | ``` | |
291 | + | ||
292 | + | 2. **确认写回模式的风险** | |
293 | + | - 写回模式(writeback)可以大幅提高速度,但在缓存盘掉电或故障时会丢失尚未写入后端的脏数据。既然我提到 SSD 质量较好,且并不特别在意短期丢失风险,可以大胆使用。 | |
294 | + | ||
295 | + | 3. **调优与监控** | |
296 | + | - 适当调节 `writeback_percent`、`sequential_cutoff` 等 sysfs 参数可以获得性能与风险的平衡。 | |
297 | + | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
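
Following up on points 1 and 3 above, a hedged sketch of the boot-persistence check and a simple monitoring loop (all paths are the ones already used in this document):

```bash
# Boot persistence: rebuild the initramfs and confirm the bcache udev rule is present
sudo update-initramfs -u
ls -l /lib/udev/rules.d/69-bcache.rules

# Monitoring: today's cache hit ratio and outstanding dirty data, refreshed every 5s
watch -n 5 'cat /sys/block/bcache0/bcache/cache/stats_day/cache_hit_ratio /sys/block/bcache0/bcache/dirty_data'
```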
298 | + | ||
299 | + | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
300 | + | ||
301 | + | 使用下面的命令检查其状态: | |
302 | + | ||
303 | + | ```bash | |
1 | 304 | Related articles end}} | |
2 | 305 | ||
3 | 306 | [https://bcache.evilpiepirate.org/ Bcache] (block cache) allows one to use an SSD as a read/write cache (in writeback mode) or read cache (writethrough or writearound) for another blockdevice (generally a rotating HDD or array). This article will show how to install Arch using Bcache as the root partition. For an intro to bcache itself, see [https://bcache.evilpiepirate.org/ the bcache homepage]. Be sure to read and reference [https://docs.kernel.org/admin-guide/bcache.html the bcache manual]. | |
@@ -638,6 +941,163 @@ ls -l /sys/block/bcache0/bcache | |||
638 | 941 | ||
639 | 942 | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
640 | 943 | ||
944 | + | 使用下面的命令检查其状态: | |
945 | + | ||
946 | + | ```bash | |
947 | + | anduin@ms-server:/sys/block/bcache0/bcache$ ls | |
948 | + | attach dirty_data sequential_cutoff stripe_size writeback_rate_fp_term_low | |
949 | + | backing_dev_name io_disable state writeback_consider_fragment writeback_rate_fp_term_mid | |
950 | + | backing_dev_uuid io_error_limit stats_day writeback_delay writeback_rate_i_term_inverse | |
951 | + | cache io_errors stats_five_minute writeback_metadata writeback_rate_minimum | |
952 | + | cache_mode label stats_hour writeback_percent writeback_rate_p_term_inverse | |
953 | + | clear_stats partial_stripes_expensive stats_total writeback_rate writeback_rate_update_seconds | |
954 | + | detach readahead_cache_policy stop writeback_rate_debug writeback_running | |
955 | + | dev running stop_when_cache_set_failed writeback_rate_fp_term_high | |
956 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./running | |
957 | + | 1 | |
958 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./state | |
959 | + | dirty | |
960 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./dirty_data | |
961 | + | 775.9M | |
962 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./writeback_running | |
963 | + | 1 | |
964 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./backing_dev_name | |
965 | + | sda | |
966 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./cache_mode | |
967 | + | writethrough [writeback] writearound none | |
968 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cd ./cache | |
969 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ ls | |
970 | + | average_key_size bucket_size congested flash_vol_create journal_delay_ms stats_hour tree_depth | |
971 | + | bdev0 cache0 congested_read_threshold_us internal root_usage_percent stats_total unregister | |
972 | + | block_size cache_available_percent congested_write_threshold_us io_error_halflife stats_day stop | |
973 | + | btree_cache_size clear_stats errors io_error_limit stats_five_minute synchronous | |
974 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cat ./errors | |
975 | + | [unregister] panic | |
976 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cat ./bucket_size | |
977 | + | 512.0k | |
978 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cat ./block_size | |
979 | + | 0.5k | |
980 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cd ./stats_day/ | |
981 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ ls | |
982 | + | bypassed cache_bypass_hits cache_bypass_misses cache_hit_ratio cache_hits cache_miss_collisions cache_misses | |
983 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cat ./cache_hit_ratio | |
984 | + | 4 | |
985 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cat ./cache_hits | |
986 | + | 11611 | |
987 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cat ./cache_misses | |
988 | + | 269927 | |
989 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cd /swarm-vol/ | |
990 | + | anduin@ms-server:/swarm-vol$ df . -Th | |
991 | + | Filesystem Type Size Used Avail Use% Mounted on | |
992 | + | /dev/bcache0 ext4 58T 6.7T 49T 13% /swarm-vol | |
993 | + | ||
994 | + | ``` | |
995 | + | ||
996 | + | # If unable to run `wipefs` on a device due to `Device or resource busy` error | |
997 | + | ||
998 | + | The error `Device or resource busy` indicates that the device `/dev/nvme1n1` is currently in use, preventing `wipefs` from accessing it. To resolve this, you need to ensure that no processes or mount points are actively using the device. Here are some steps to identify and resolve the issue: | |
999 | + | ||
1000 | + | --- | |
1001 | + | ||
1002 | + | ### **1. Check if the device is mounted** | |
1003 | + | Run: | |
1004 | + | ```bash | |
1005 | + | mount | grep /dev/nvme1n1 | |
1006 | + | ``` | |
1007 | + | If it is mounted, unmount it: | |
1008 | + | ```bash | |
1009 | + | sudo umount /dev/nvme1n1 | |
1010 | + | ``` | |
1011 | + | ||
1012 | + | --- | |
1013 | + | ||
1014 | + | ### **2. Check for active partitions** | |
1015 | + | If any partitions on `/dev/nvme1n1` are in use, they need to be unmounted: | |
1016 | + | ```bash | |
1017 | + | lsblk | |
1018 | + | ``` | |
1019 | + | Unmount active partitions: | |
1020 | + | ```bash | |
1021 | + | sudo umount /dev/nvme1n1pX # Replace "X" with the partition number | |
1022 | + | ``` | |
1023 | + | ||
1024 | + | --- | |
1025 | + | ||
1026 | + | ### **3. Check for `bcache` association** | |
1027 | + | The presence of `bcache0` suggests `bcache` is in use. Verify: | |
1028 | + | ```bash | |
1029 | + | sudo bcache-super-show /dev/nvme1n1 | |
1030 | + | ``` | |
1031 | + | If it is associated, unregister it: | |
1032 | + | ```bash | |
1033 | + | echo 1 | sudo tee /sys/block/bcacheX/bcache/stop # Replace "bcacheX" appropriately | |
1034 | + | ``` | |
1035 | + | Clear the `bcache` superblock: | |
1036 | + | ```bash | |
1037 | + | sudo wipefs -a /dev/nvme1n1 | |
1038 | + | ``` | |
1039 | + | ||
1040 | + | --- | |
1041 | + | ||
1042 | + | ### **4. Check for swap usage** | |
1043 | + | If the device is used as swap: | |
1044 | + | ```bash | |
1045 | + | cat /proc/swaps | |
1046 | + | sudo swapoff /dev/nvme1n1 | |
1047 | + | ``` | |
1048 | + | ||
1049 | + | --- | |
1050 | + | ||
1051 | + | ### **5. Check for open file descriptors** | |
1052 | + | List processes using the device: | |
1053 | + | ```bash | |
1054 | + | sudo lsof | grep /dev/nvme1n1 | |
1055 | + | ``` | |
1056 | + | Kill the processes if necessary: | |
1057 | + | ```bash | |
1058 | + | sudo kill -9 <PID> | |
1059 | + | ``` | |
1060 | + | ||
1061 | + | --- | |
1062 | + | ||
1063 | + | ### **6. Retry `wipefs`** | |
1064 | + | Once the device is no longer in use, retry: | |
1065 | + | ```bash | |
1066 | + | sudo wipefs -a /dev/nvme1n1 | |
1067 | + | ``` | |
1068 | + | ||
1069 | + | If issues persist, let me know the exact usage scenario, and I can assist further! | |
1070 | + | ||
1071 | + | ## If the bcache device does not show up in fdisk | |
1072 | + | ||
1073 | + | First, try some clean-up: | |
1074 | + | ||
1075 | + | ```bash | |
1076 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
1077 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
1078 | + | ``` | |
1079 | + | ||
1080 | + | Use bcache-super-show to get the uuids. | |
1081 | + | ||
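
A sketch of pulling those uuids out of the superblocks (the device names are the ones used in this setup and are assumptions; adjust as needed):

```bash
# cset.uuid comes from the cache device, dev.uuid from the backing device
cset_uuid=$(sudo bcache-super-show /dev/nvme2n1 | awk '/cset.uuid/ {print $2}')
backing_uuid=$(sudo bcache-super-show /dev/sda | awk '/dev.uuid/ {print $2}')
echo "cset: $cset_uuid  backing: $backing_uuid"
```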
1082 | + | Then try again to register: | |
1083 | + | ||
1084 | + | ```bash | |
1085 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/register | |
1086 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/register | |
1087 | + | ``` | |
1088 | + | ||
1089 | + | The cache uuid should exist in /dev/fs/bcache if the cache device is successfully registered. | |
1090 | + | ||
1091 | + | If bcache-super-show says that that the backing dev.data.cache_state state is clean and the cset.uuid consists only of zeros, the bcache device is in the invalid state and must be recreated. [source] | |
1092 | + | ||
1093 | + | However, if the state is clean, you can try force-starting the backing device without the cache device:
1094 | + | ||
1095 | + | ```bash
1096 | + | echo 1 | sudo tee /sys/class/block/$dev/bcache/running
1097 | + | ```
1098 | + |
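Afterwards the backing device should come up and `/dev/bcache0` should reappear; a quick check (with `$dev` set to the backing device name, as above):

```bash
cat /sys/class/block/$dev/bcache/state   # expect "no cache" when running without a cache
ls -l /dev/bcache0
```
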
641 | 1101 | # If unable to run `wipefs` on a device due to `Device or resource busy` error | |
642 | 1102 | ||
643 | 1103 | The error `Device or resource busy` indicates that the device `/dev/nvme1n1` is currently in use, preventing `wipefs` from accessing it. To resolve this, you need to ensure that no processes or mount points are actively using the device. Here are some steps to identify and resolve the issue: |
anduin revised this gist . Go to revision
1 file changed, 116 insertions, 6 deletions
plan.md
@@ -484,6 +484,13 @@ sudo wipefs -a /dev/sda | |||
484 | 484 | # 创建 bcache 后端 | |
485 | 485 | sudo make-bcache -B /dev/sda | |
486 | 486 | ||
487 | + | # 如果在 fdisk 里没有找到 /dev/bcache0,可以尝试 | |
488 | + | # 重新加载内核模块: | |
489 | + | sudo modprobe bcache | |
490 | + | ||
491 | + | # 如果还是没有,尝试手工创建 | |
492 | + | echo /dev/sda | sudo tee /sys/fs/bcache/register   # 注意:sudo echo ... > 不会以 root 权限写入,需改用 tee
493 | + | ||
487 | 494 | # 确认后端创建成功 | |
488 | 495 | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
489 | 496 | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
@@ -514,10 +521,6 @@ sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |||
514 | 521 | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/
515 | 522 | ``` | |
516 | 523 | ||
517 | - | 下面给出示例脚本,供我在“第二阶段”和“第三阶段”中参考使用。思路与第一阶段相同:谨慎操作、一次成功,避免数据丢失。 | |
518 | - | ||
519 | - | --- | |
520 | - | ||
521 | 524 | ## 第二阶段 - 暂停业务并做最终同步 | |
522 | 525 | ||
523 | 526 | 在这一阶段,我将: | |
@@ -575,9 +578,11 @@ sudo wipefs -a /dev/nvme2n1 | |||
575 | 578 | ||
576 | 579 | # 3) 将 nvme2n1 作为缓存盘初始化 | |
577 | 580 | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
578 | - | sudo make-bcache -C /dev/nvme2n1 | |
579 | - | # 若有需要,可以带上 --block/--bucket 参数。例如: | |
581 | + | #在这个例子里,默认的block大小是512B、bucket大小是128kB。block的大小应该与后端设备的sector大小匹配(通常是512或者4k)。bucket的大小应该与缓存设备的擦除block大小匹配(以减少写入放大)。例如,如果是一个4k sector的HDD和一个擦除block大小是2MB的SSD搭配,命令就应该是这样的: | |
580 | 582 | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
583 | + | # 如果你需要查看 /dev/sda (也就是后端)的 block size,可以使用 fdisk -l /dev/sda 等命令。 | |
584 | + | # 如果你需要查看 /dev/nvme2n1 的擦除块大小,可以使用 nvme id-ns /dev/nvme2n1 等命令。一般是 4M | |
585 | + | sudo make-bcache --block 512 --bucket 4M -C /dev/nvme2n1 | |
581 | 586 | ||
582 | 587 | echo "检查生成的缓存盘信息..." | |
583 | 588 | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
@@ -632,3 +637,108 @@ ls -l /sys/block/bcache0/bcache | |||
632 | 637 | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
633 | 638 | ||
634 | 639 | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
640 | + | ||
641 | + | # If unable to run `wipefs` on a device due to `Device or resource busy` error | |
642 | + | ||
643 | + | The error `Device or resource busy` indicates that the device `/dev/nvme1n1` is currently in use, preventing `wipefs` from accessing it. To resolve this, you need to ensure that no processes or mount points are actively using the device. Here are some steps to identify and resolve the issue: | |
644 | + | ||
645 | + | --- | |
646 | + | ||
647 | + | ### **1. Check if the device is mounted** | |
648 | + | Run: | |
649 | + | ```bash | |
650 | + | mount | grep /dev/nvme1n1 | |
651 | + | ``` | |
652 | + | If it is mounted, unmount it: | |
653 | + | ```bash | |
654 | + | sudo umount /dev/nvme1n1 | |
655 | + | ``` | |
656 | + | ||
657 | + | --- | |
658 | + | ||
659 | + | ### **2. Check for active partitions** | |
660 | + | If any partitions on `/dev/nvme1n1` are in use, they need to be unmounted: | |
661 | + | ```bash | |
662 | + | lsblk | |
663 | + | ``` | |
664 | + | Unmap active partitions: | |
665 | + | ```bash | |
666 | + | sudo umount /dev/nvme1n1pX # Replace "X" with the partition number | |
667 | + | ``` | |
668 | + | ||
669 | + | --- | |
670 | + | ||
671 | + | ### **4. Check for `bcache` association** | |
672 | + | The presence of `bcache0` suggests `bcache` is in use. Verify: | |
673 | + | ```bash | |
674 | + | sudo bcache-super-show /dev/nvme1n1 | |
675 | + | ``` | |
676 | + | If it is associated, unregister it: | |
677 | + | ```bash | |
678 | + | echo 1 | sudo tee /sys/block/bcacheX/bcache/stop # Replace "bcacheX" appropriately | |
679 | + | ``` | |
680 | + | Clear the `bcache` superblock: | |
681 | + | ```bash | |
682 | + | sudo wipefs -a /dev/nvme1n1 | |
683 | + | ``` | |
684 | + | ||
685 | + | --- | |
686 | + | ||
687 | + | ### **5. Check for swap usage** | |
688 | + | If the device is used as swap: | |
689 | + | ```bash | |
690 | + | cat /proc/swaps | |
691 | + | sudo swapoff /dev/nvme1n1 | |
692 | + | ``` | |
693 | + | ||
694 | + | --- | |
695 | + | ||
696 | + | ### **6. Check for open file descriptors** | |
697 | + | List processes using the device: | |
698 | + | ```bash | |
699 | + | sudo lsof | grep /dev/nvme1n1 | |
700 | + | ``` | |
701 | + | Kill the processes if necessary: | |
702 | + | ```bash | |
703 | + | sudo kill -9 <PID> | |
704 | + | ``` | |
705 | + | ||
706 | + | --- | |
707 | + | ||
708 | + | ### **7. Retry `wipefs`** | |
709 | + | Once the device is no longer in use, retry: | |
710 | + | ```bash | |
711 | + | sudo wipefs -a /dev/nvme1n1 | |
712 | + | ``` | |
713 | + | ||
714 | + | If issues persist, let me know the exact usage scenario, and I can assist further! | |
715 | + | ||
716 | + | ## If bcache device not showing up on fdisk | |
717 | + | ||
718 | + | 2 | |
719 | + | ||
720 | + | First, try some clean-up: | |
721 | + | ||
722 | + | ``` | |
723 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
724 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
725 | + | ``` | |
726 | + | ||
727 | + | Use bcache-super-show to get the uuids. | |
728 | + | ||
729 | + | Then try again to register: | |
730 | + | ||
731 | + | ```bash | |
732 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/register | |
733 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/register | |
734 | + | ``` | |
735 | + | ||
736 | + | The cache uuid should exist in /dev/fs/bcache if the cache device is successfully registered. | |
737 | + | ||
738 | + | If bcache-super-show says that that the backing dev.data.cache_state state is clean and the cset.uuid consists only of zeros, the bcache device is in the invalid state and must be recreated. [source] | |
739 | + | ||
740 | + | However, if clean, you could try force-starting the backing device without cache device: | |
741 | + | ||
742 | + | ```bash | |
743 | + | echo 1 | sudo tee /sys/class/block/$dev/bcache/running | |
744 | + | ``` |
anduin revised this gist . Go to revision
1 file changed, 634 insertions
plan.md(file created)
@@ -0,0 +1,634 @@ | |||
1 | + | Related articles end}} | |
2 | + | ||
3 | + | [https://bcache.evilpiepirate.org/ Bcache] (block cache) allows one to use an SSD as a read/write cache (in writeback mode) or read cache (writethrough or writearound) for another blockdevice (generally a rotating HDD or array). This article will show how to install Arch using Bcache as the root partition. For an intro to bcache itself, see [https://bcache.evilpiepirate.org/ the bcache homepage]. Be sure to read and reference [https://docs.kernel.org/admin-guide/bcache.html the bcache manual]. | |
4 | + | ||
5 | + | {{Tip|An alternative to Bcache is the [[LVM#Cache|LVM cache]].}} | |
6 | + | ||
7 | + | Bcache needs the backing device to be formatted as a bcache block device. In most cases, [https://github.com/g2p/blocks blocks to-bcache] can do an in-place conversion. | |
8 | + | ||
9 | + | {{Out of date|Any source for bcache with btrfs causing corruption in 2024? The linked blog has no extra details }} | |
10 | + | ||
11 | + | {{Warning|1=<nowiki/> | |
12 | + | * Be sure you back up any important data first. | |
13 | + | * Bcache and [[btrfs]] could leave you with a corrupted filesystem. Please visit [https://www.hdevalence.ca/blog/2013-09-21-notes-on-my-archlinux-install this post] for more information. Btrfs wiki reports that it was fixed in kernels 3.19+ [https://btrfs.wiki.kernel.org/index.php/Gotchas#Historical_references]. | |
14 | + | }} | |
15 | + | ||
16 | + | == Setting up bcached btrfs file systems on an existing system == | |
17 | + | ||
18 | + | {{Warning|make-bcache '''will not''' import an existing drive or partition – it will reformat it.}} | |
19 | + | ||
20 | + | === Preparation === | |
21 | + | ||
22 | + | [[Install]] {{AUR|bcache-tools}}. | |
23 | + | ||
24 | + | Use fdisk to create the appropriate partitions on the SSD's and hard drives to hold the cache and the backing data. | |
25 | + | {{Tip| It is possible to create many partitions on a single drive. This allows for testing of elaborate setups before committing. Be aware all data will be lost when the drive fails. This will also kill performance of the drive, due to unfavorable access patterns.}} | |
26 | + | ||
27 | + | === Situation: 1 hard drive and 1 read cache SSD === | |
28 | + | ||
29 | + | {{Warning| | |
30 | + | * When a single hard drive fails, all data is lost. | |
31 | + | * Do not enable write caching, as that can cause data loss when the SSD fails | |
32 | + | }} | |
33 | + | +--------------+ | |
34 | + | | btrfs /mnt | | |
35 | + | +--------------+ | |
36 | + | | /dev/Bcache0 | | |
37 | + | +--------------+ | |
38 | + | | Cache | | |
39 | + | | /dev/sdk1 | | |
40 | + | +--------------+ | |
41 | + | | Data | | |
42 | + | | /dev/sdv1 | | |
43 | + | +--------------+ | |
44 | + | ||
45 | + | 1. Format the backing device (This will typically be your mechanical drive). The backing device can be a whole device, a partition or any other standard block device. This will create /dev/bcache0 | |
46 | + | ||
47 | + | # make-bcache -B /dev/sdv1 | |
48 | + | ||
49 | + | 2. Format the cache device (This will typically be your SSD). The cache device can be a whole device, a partition or any other standard block device | |
50 | + | ||
51 | + | # make-bcache -C /dev/sdk1 | |
52 | + | ||
53 | + | In this example the default block and bucket sizes of 512B and 128kB are used. The block size should match the backing devices sector size which will usually be either 512 or 4k. The bucket size should match the erase block size of the caching device with the intent of reducing write amplification. For example, using a HDD with 4k sectors and an SSD with an erase block size of 2MB this command would look like | |
54 | + | ||
55 | + | # make-bcache --block 4k --bucket 2M -C /dev/sdk1 | |
56 | + | ||
57 | + | {{Note|You may need to omit the {{ic|--block 4k}} option, see [https://unix.stackexchange.com/questions/359508/cannot-attach-cache-device-to-backing-device Cannot attach cache device to backing device].}} | |
58 | + | ||
59 | + | 3. Get the uuid of the cache device | |
60 | + | ||
61 | + | # bcache-super-show /dev/sdk1 | grep cset | |
62 | + | cset.uuid f0e01318-f4fd-4fab-abbb-d76d870503ec | |
63 | + | ||
64 | + | 4. Register the cache device against your backing device. Replace the example uuid with the uuid of your cache. Udev rules will take care of this on reboot and will only need to be done once. | |
65 | + | ||
66 | + | # echo f0e01318-f4fd-4fab-abbb-d76d870503ec > /sys/block/bcache0/bcache/attach | |
67 | + | ||
68 | + | 5. Create the btrfs filesystem. | |
69 | + | ||
70 | + | # mkfs.btrfs /dev/bcache0 | |
71 | + | ||
72 | + | 6. mount the filesystem | |
73 | + | ||
74 | + | # mount /dev/bcache0 /mnt | |
75 | + | ||
76 | + | 7. If you want to have this partition available during the initcpio (i.e. you require it at some point in the boot process) you need to add 'bcache' to your modules array in /etc/mkinitcpio.conf as well as adding the 'bcache' hook in your list between block and filesystems. You must then [[regenerate the initramfs]]. | |
77 | + | ||
78 | + | === Situation: Prevent all write access to a HDD === | |
79 | + | {{Warning| | |
80 | + | * When the hard drive or the SSD fails, all data is lost. | |
81 | + | * Consider using BTRFS RAID to prevent data loss when a SSD / HDD fails. | |
82 | + | }} | |
83 | + | In this situation the goal is to keep the HDD idle as long as possible. This is achieved by absorbing all writes with the SSD. The hard drive is only activated when the SSD is full, or when something is read that's not on the SSD. | |
84 | + | ||
85 | + | Enable the writeback cache mode: | |
86 | + | ||
87 | + | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
88 | + | ||
89 | + | Let bcache completely sync with the hard drive. | |
90 | + | ||
91 | + | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
92 | + | ||
93 | + | Don't let sequential IO bypass the cache: | |
94 | + | ||
95 | + | # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff | |
96 | + | ||
97 | + | Let bcache wait a week after the previous sync is done: | |
98 | + | ||
99 | + | # echo $((7*24*60*60)) > /sys/block/bcache0/bcache/writeback_delay | |
100 | + | ||
101 | + | Don't let bcache go around the cache when there's read / write congestion | |
102 | + | ||
103 | + | # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us | |
104 | + | # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us | |
105 | + | ||
106 | + | Put the HDD to sleep after 20 minutes: | |
107 | + | # hdparm -S 240 /dev/$(cat /sys/block/bcache0/bcache/backing_dev_name) | |
108 | + | /dev/sdh1: | |
109 | + | setting standby to 240 (20 minutes) | |
110 | + | ||
111 | + | ||
112 | + | First use lsblk to get the device names of the HDD and SSD. In this example /dev/sdh1 is the HDD, /dev/sdc1 is the SSD: | |
113 | + | ||
114 | + | # lsblk -M -s | |
115 | + | bcache0 254:0 0 931.5G 0 disk | |
116 | + | ├─sdc1 8:33 0 111.8G 0 part | |
117 | + | │ └─sdc 8:32 0 111.8G 0 disk | |
118 | + | └─sdh1 8:113 0 931.5G 0 part | |
119 | + | └─sdh 8:112 0 931.5G 0 disk | |
120 | + | ||
121 | + | Now Dstat can be used to monitor disk access to the members of the bcache set. | |
122 | + | ||
123 | + | $ dstat -D sdc1,sdh1 | |
124 | + | ||
125 | + | == Advanced operations == | |
126 | + | ||
127 | + | === Resize backing device === | |
128 | + | ||
129 | + | It is possible to resize the backing device so long as you do not move the partition start. This process is described in [https://lore.kernel.org/linux-bcache/CAH+dOxJv-ajvLfbUSo8dqG0a8_grNBhfxJ1EbmSrYZz0YXJM2w@mail.gmail.com/T/ the mailing list]. Here is an example using btrfs volume directly on bcache0. For LVM containers or for other filesystems, procedure will differ. | |
130 | + | ||
131 | + | ==== Example of growing ==== | |
132 | + | ||
133 | + | In this example, I grow the filesystem by 4GB. | |
134 | + | ||
135 | + | 1. Reboot to a live CD/USB Drive (need not be bcache enabled) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G larger. | |
136 | + | ||
137 | + | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
138 | + | ||
139 | + | 2. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum. For btrfs, that is | |
140 | + | ||
141 | + | # btrfs filesystem resize max / | |
142 | + | ||
143 | + | For ext3/4, that is: | |
144 | + | ||
145 | + | # resize2fs /dev/bcache0 | |
146 | + | ||
147 | + | ==== Example of shrinking ==== | |
148 | + | ||
149 | + | In this example, I shrink the filesystem by 4GB. | |
150 | + | ||
151 | + | 1. Disable writeback cache (switch to writethrough cache) and wait for the disk to flush. | |
152 | + | ||
153 | + | # echo writethrough > /sys/block/bcache0/bcache/cache_mode | |
154 | + | $ watch cat /sys/block/bcache0/bcache/state | |
155 | + | ||
156 | + | wait until state reports "clean". This might take a while. | |
157 | + | ||
158 | + | ===== Force flush of cache to backing device ===== | |
159 | + | ||
160 | + | I suggest to use | |
161 | + | ||
162 | + | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
163 | + | ||
164 | + | This will flush the dirty data of the cache to the backing device in less a minute. | |
165 | + | ||
166 | + | Revert back the value after with | |
167 | + | ||
168 | + | # echo 10 > /sys/block/bcache0/bcache/writeback_percent | |
169 | + | ||
170 | + | 2. Shrink the mounted filesystem by something more than the desired amount, to ensure we do not accidentally clip it later. For btrfs, that is: | |
171 | + | ||
172 | + | # btrfs filesystem resize -5G / | |
173 | + | ||
174 | + | For ext3/4 you can use ''resize2fs'', but only if the partition is unmounted | |
175 | + | ||
176 | + | {{hc|$ df -h /home| | |
177 | + | /dev/bcache0 290G 20G 270G 1% /home | |
178 | + | }} | |
179 | + | ||
180 | + | # umount /home | |
181 | + | # resize2fs /dev/bcache0 283G | |
182 | + | ||
183 | + | 3. Reboot to a LiveCD/USB drive (does not need to support bcache) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G smaller. | |
184 | + | ||
185 | + | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
186 | + | ||
187 | + | 4. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum (that is, the size we shrunk the actual partition to in step 3). For btrfs, that is: | |
188 | + | ||
189 | + | # btrfs filesystem resize max / | |
190 | + | ||
191 | + | For ext3/4, that is: | |
192 | + | ||
193 | + | # resize2fs /dev/bcache0 | |
194 | + | ||
195 | + | 5. Re-enable writeback cache if you want that enabled: | |
196 | + | ||
197 | + | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
198 | + | ||
199 | + | {{Note|If you are very careful you can shrink the filesystem to the exact size in step 2 and avoid step 4. Be careful, though, many partition tools do not do exactly what you want, but instead adjust the requested partition start/end points to end on sector boundaries. This may be difficult to calculate ahead of time}} | |
200 | + | ||
201 | + | == Troubleshooting == | |
202 | + | ||
203 | + | === /dev/bcache device does not exist on bootup === | |
204 | + | ||
205 | + | If you are sent to a busy box shell with an error: | |
206 | + | ||
207 | + | {{bc|1= | |
208 | + | ERROR: Unable to find root device 'UUID=b6b2d82b-f87e-44d5-bbc5-c51dd7aace15'. | |
209 | + | You are being dropped to a recovery shell | |
210 | + | Type 'exit' to try and continue booting | |
211 | + | }} | |
212 | + | ||
213 | + | This might happen if the backing device is configured for "writeback" mode (default is writearound). When in "writeback" mode, the /dev/bcache0 device is not started until the cache device is both registered and attached. Registering is something that needs to happen every bootup, but attaching should only have to be done once. | |
214 | + | ||
215 | + | To continue booting, try one of the following: | |
216 | + | ||
217 | + | * Register both the backing device and the caching device | |
218 | + | ||
219 | + | # echo /dev/sda3 > /sys/fs/bcache/register | |
220 | + | # echo /dev/sdb > /sys/fs/bcache/register | |
221 | + | ||
222 | + | If the /dev/bcache0 device now exists, type exit and continue booting. You will need to fix your initcpio to ensure devices are registered before mounting the root device. | |
223 | + | ||
224 | + | {{Note| | |
225 | + | * An error of "sh: echo: write error: Invalid argument" means the device was already registered or is not recognized as either a bcache backing device or cache. If using the udev rule on boot it should only attempt to register a device if it finds a bcache superblock | |
226 | + | * This can also happen if using udev's 69-bcache.rules in Installation's step 7 and blkid and bcache-probe "disagree" due to rogue superblocks. See [https://bcache.evilpiepirate.org/#index6h1 bcache's wiki] for a possible explanation/resolution. | |
227 | + | }} | |
228 | + | ||
229 | + | * Re-attach the cache to the backing device: | |
230 | + | ||
231 | + | If the cache device was registered, a folder with the UUID of the cache should exist in {{ic|/sys/fs/bcache}}. Use that UUID when following the example below: | |
232 | + | ||
233 | + | {{hc|# ls /sys/fs/bcache/| | |
234 | + | b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 register register_quiet | |
235 | + | }} | |
236 | + | ||
237 | + | # echo b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 > /sys/block/sda/sda3/bcache/attach | |
238 | + | ||
239 | + | If the {{ic|/dev/bcache0}} device now exists, type exit and continue booting. You should not have to do this again. If it persists, ask on the bcache mailing list. | |
240 | + | ||
241 | + | {{Note|An error of {{ic|sh: echo: write error: Invalid argument}} means the device was already attached. An error of {{ic|sh: echo: write error: No such file or directory}} means the UUID is not a valid cache (make sure you typed it correctly).}} | |
242 | + | ||
243 | + | * Invalidate the cache and force the backing device to run without it. You might want to check some stats, such as "dirty_data" so you have some idea of how much data will be lost. | |
244 | + | ||
245 | + | {{hc|# cat /sys/block/sda/sda3/bcache/dirty_data| | |
246 | + | -3.9M | |
247 | + | }} | |
248 | + | ||
249 | + | dirty data is data in the cache that has not been written to the backing device. If you force the backing device to run, this data will be lost, even if you later re-attach the cache. | |
250 | + | ||
251 | + | {{hc|# cat /sys/block/sda/sda3/bcache/running| | |
252 | + | 0 | |
253 | + | }} | |
254 | + | ||
255 | + | # echo 1 > /sys/block/sda/sda3/bcache/running | |
256 | + | ||
257 | + | The {{ic|/dev/bcache0}} device will now exist. Type exit and continue booting. You might want to unregister the cache device and run make-bcache again. An fsck on {{ic|/dev/bcache0}} would also be wise. See the [https://docs.kernel.org/admin-guide/bcache.html bcache documentation]. | |
258 | + | ||
259 | + | {{Warning|Only invalidate the cache if one of the two options above did not work.}} | |
260 | + | ||
261 | + | === /sys/fs/bcache/ does not exist === | |
262 | + | ||
263 | + | The kernel you booted is not bcache enabled, or you the bcache [[Kernel module#Manual module handling|module is not loaded]] | |
264 | + | ||
265 | + | === write error: Invalid argument when trying to attach a device due to mismatched block parameter === | |
266 | + | ||
267 | + | Given {{ic|bash: echo: write error: Invalid argument}} when trying to attach a device, and the actual error is shown with [[dmesg]]: | |
268 | + | ||
269 | + | bcache: bch_cached_dev_attach() Couldn't attach sdc: block size less than set's block size | |
270 | + | ||
271 | + | This happens because the {{ic|--block 4k}} parameter was not set on either device and defaults can mismatch. | |
272 | + | ||
273 | + | Creating both the backing and caching device in one command automatically solves the issue, but when using separate commands the block size parameter sometimes needs to be set manually on both devices. | |
274 | + | ||
275 | + | === Device or resource busy === | |
276 | + | When a device is in use as a bcache backing device, it can not be formatted nor partitioned: | |
277 | + | # make-bcache -C /dev/sdb1 | |
278 | + | Can't open dev /dev/sdb1: Device or resource busy | |
279 | + | ||
280 | + | # fdisk /dev/sdb | |
281 | + | ||
282 | + | Welcome to fdisk (util-linux 2.37.2). | |
283 | + | Changes will remain in memory only, until you decide to write them. | |
284 | + | Be careful before using the write command. | |
285 | + | ||
286 | + | This disk is currently in use - repartitioning is probably a bad idea. | |
287 | + | It's recommended to umount all file systems, and swapoff all swap | |
288 | + | partitions on this disk. | |
289 | + | ||
290 | + | ||
291 | + | Command (m for help): q | |
292 | + | ||
293 | + | To fix this, first run this command to confirm the disk is actually used as a bcache backing device: | |
294 | + | # bcache-super-show /dev/sdb1 | |
295 | + | sb.magic ok | |
296 | + | sb.first_sector 8 [match] | |
297 | + | sb.csum A3D2B8610F6C5E35 [match] | |
298 | + | sb.version 1 [backing device] | |
299 | + | ||
300 | + | dev.label (empty) | |
301 | + | dev.uuid 5a868788-65a2-4564-b4b7-c1817d0b6080 | |
302 | + | dev.sectors_per_block 1 | |
303 | + | dev.sectors_per_bucket 1024 | |
304 | + | dev.data.first_sector 16 | |
305 | + | dev.data.cache_mode 1 [writeback] | |
306 | + | dev.data.cache_state 2 [dirty] | |
307 | + | ||
308 | + | cset.uuid 42dcb651-6b53-4b65-bc49-9b1ca0acc5b1 | |
309 | + | ||
310 | + | Then stop the backing device. This will also remove the corresponding /dev/bcache device. | |
311 | + | # echo 1 > /sys/class/block/sdb1/bcache/stop | |
312 | + | ||
313 | + | # dmesg | |
314 | + | [ 3171.263577] bcache: bcache_device_free() bcache0 stopped | |
315 | + | Now the device can be partitioned: | |
316 | + | # fdisk /dev/sdb | |
317 | + | ||
318 | + | Welcome to fdisk (util-linux 2.37.2). | |
319 | + | Changes will remain in memory only, until you decide to write them. | |
320 | + | Be careful before using the write command. | |
321 | + | ||
322 | + | ||
323 | + | Command (m for help): q | |
324 | + | When fdisk exits, the kernel scans the drive again, notices it's a bcache backing device, and uses the drive as a backing device. | |
325 | + | # dmesg | |
326 | + | [ 3190.643270] sdb: sdb1 | |
327 | + | [ 3190.833029] bcache: register_bdev() registered backing device sdb1 | |
328 | + | This creates the directory bcache under /sys/class/block/sdb1/ | |
329 | + | # ls /sys/class/block/sdb1/ | |
330 | + | alignment_offset bcache dev discard_alignment holders inflight partition power ro size start stat subsystem uevent | |
331 | + | ||
332 | + | == See also == | |
333 | + | ||
334 | + | * [https://bcache.evilpiepirate.org Bcache Homepage] | |
335 | + | * [https://docs.kernel.org/admin-guide/bcache.html Bcache Manual] | |
336 | + | ||
337 | + | ================================================== | |
338 | + | ||
339 | + | 上面的信息是我从别的地方摘抄的。可能有用,可能没用。可以参考然后回答下面的问题。 | |
340 | + | ||
341 | + | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
342 | + | ||
343 | + | ```bash | |
344 | + | anduin@ms-server:~$ cd /swarm-vol/ | |
345 | + | anduin@ms-server:/swarm-vol$ df . -Th | |
346 | + | Filesystem Type Size Used Avail Use% Mounted on | |
347 | + | /dev/nvme2n1 ext4 7.0T 6.1T 559G 92% /swarm-vol | |
348 | + | anduin@ms-server:/swarm-vol$ cd /swarm-vol/nextcloud/ | |
349 | + | anduin@ms-server:/swarm-vol/nextcloud$ df . -Th | |
350 | + | Filesystem Type Size Used Avail Use% Mounted on | |
351 | + | /dev/nvme0n1 ext4 916G 554G 316G 64% /swarm-vol/nextcloud | |
352 | + | anduin@ms-server:/swarm-vol/nextcloud$ sudo fdisk -l | |
353 | + | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | |
354 | + | Disk model: INTEL SSDPED1D480GA | |
355 | + | Units: sectors of 1 * 512 = 512 bytes | |
356 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
357 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
358 | + | Disklabel type: gpt | |
359 | + | Disk identifier: 75C97A6C-09A4-4375-8260-7A950D36C1B4 | |
360 | + | ||
361 | + | Device Start End Sectors Size Type | |
362 | + | /dev/nvme1n1p1 2048 1050623 1048576 512M EFI System | |
363 | + | /dev/nvme1n1p2 1050624 937701375 936650752 446.6G Linux filesystem | |
364 | + | ||
365 | + | ||
366 | + | Disk /dev/nvme2n1: 6.99 TiB, 7681501126656 bytes, 1875366486 sectors | |
367 | + | Disk model: WUS4BB076D7P3E3 | |
368 | + | Units: sectors of 1 * 4096 = 4096 bytes | |
369 | + | Sector size (logical/physical): 4096 bytes / 4096 bytes | |
370 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
371 | + | ||
372 | + | ||
373 | + | Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors | |
374 | + | Disk model: CT1000P3PSSD8 | |
375 | + | Units: sectors of 1 * 512 = 512 bytes | |
376 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
377 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
378 | + | anduin@ms-server:/swarm-vol/nextcloud$ cd /dev/disk/by-uuid/ | |
379 | + | anduin@ms-server:/dev/disk/by-uuid$ ls -ashl | |
380 | + | total 0 | |
381 | + | 0 drwxr-xr-x 2 root root 140 Jan 17 15:21 . | |
382 | + | 0 drwxr-xr-x 7 root root 140 Dec 28 05:45 .. | |
383 | + | 0 lrwxrwxrwx 1 root root 13 Jan 14 14:00 0377361e-2a7b-4024-a681-ea135c092cce -> ../../nvme0n1 | |
384 | + | 0 lrwxrwxrwx 1 root root 13 Dec 28 05:45 49fd5e45-6074-4370-a95f-c4404920aff5 -> ../../nvme2n1 | |
385 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 9C58-514E -> ../../nvme1n1p1 | |
386 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 b91352af-9477-4684-8d08-2a45c39bec98 -> ../../nvme1n1p2 | |
387 | + | anduin@ms-server:/dev/disk/by-uuid$ cat /etc/fstab | |
388 | + | # /etc/fstab: static file system information. | |
389 | + | # | |
390 | + | # Use 'blkid' to print the universally unique identifier for a | |
391 | + | # device; this may be used with UUID= as a more robust way to name devices | |
392 | + | # that works even if disks are added and removed. See fstab(5). | |
393 | + | # | |
394 | + | # <file system> <mount point> <type> <options> <dump> <pass> | |
395 | + | UUID=b91352af-9477-4684-8d08-2a45c39bec98 / ext4 errors=remount-ro 0 1 | |
396 | + | UUID=9C58-514E /boot/efi vfat umask=0077 0 1 | |
397 | + | /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
398 | + | /dev/disk/by-uuid/0377361e-2a7b-4024-a681-ea135c092cce /swarm-vol/nextcloud ext4 defaults,noatime,nofail 0 0 | |
399 | + | /swapfile none swap sw 0 0 | |
400 | + | ``` | |
401 | + | ||
402 | + | 由上面的信息,不难判断出: | |
403 | + | ||
404 | + | 我的系统盘是 b91352af-9477-4684-8d08-2a45c39bec98 ,当然这和我们要调查的内容没什么关系。 | |
405 | + | ||
406 | + | 我的数据都放在了 /swarm-vol 这个文件夹。它背后的磁盘是 `49fd5e45-6074-4370-a95f-c4404920aff5` | |
407 | + | ||
408 | + | 即使我暂时使用奇技淫巧,将 /swarm-vol 下的子文件夹 nextcloud 暂时挪到了 `0377361e-2a7b-4024-a681-ea135c092cce` 下,还是濒临不够了。 | |
409 | + | ||
410 | + | 但是,幸运的是,我购买了一个全新的大而慢的机械硬盘: | |
411 | + | ||
412 | + | ```bash | |
413 | + | Disk /dev/sda: 58.21 TiB, 64003468427264 bytes, 125006774272 sectors | |
414 | + | Disk model: RAID5 | |
415 | + | Units: sectors of 1 * 512 = 512 bytes | |
416 | + | Sector size (logical/physical): 512 bytes / 4096 bytes | |
417 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
418 | + | ``` | |
419 | + | ||
420 | + | 为了测试它,我暂时挂载到了这里: | |
421 | + | ||
422 | + | ```bash | |
423 | + | /dev/sda /mnt/temp_big ext4 defaults,noatime,nofail 0 0 | |
424 | + | ``` | |
425 | + | ||
426 | + | 接下来,我认为我需要开始设计我的迁移改造计划。 | |
427 | + | ||
428 | + | 为了充分发挥我过去 49fd5e45-6074-4370-a95f-c4404920aff5,也就是nvme2n1,也就是 /swarm-vol 的快的固态的特性,又能发挥 /dev/sda 大的优点,我计划这样设计: | |
429 | + | ||
430 | + | 使用 bcache 系统,让 /dev/sda 作为真正的存储设备,再让 49fd5e45-6074-4370-a95f-c4404920aff5 作为缓存盘,同时开启写入缓存和阅读缓存,这样我就拥有又大有快的存储了。 | |
431 | + | ||
432 | + | 考虑到我的缓存盘非常大(上面的信息可以得出,它足足有 6.99 TB 对吧?),我相信我可以设置非常激进的写入缓存和阅读缓存。而且我的缓存盘非常可靠,它几乎不会损坏,我也不担心短暂的数据丢失。我又不是银行,就都是电影。 | |
433 | + | ||
434 | + | 接下来,为了方便迁移,我开始设计我的迁移计划: | |
435 | + | ||
436 | + | # 阶段概要 | |
437 | + | ||
438 | + | ## 第一阶段 - 双数据阶段 | |
439 | + | ||
440 | + | 将 sda 格式化清空,作为 bcache 的后端。此时 nvme2n1 继续承载业务数据。不移除它。然后将业务数据使用 rsync 拷贝到 sda 中。 | |
441 | + | ||
442 | + | ## 第二阶段 - 暂停业务阶段 | |
443 | + | ||
444 | + | 将业务暂停,然后我最后运行一次rsync。这次rsync应该会跑得很快,因为只产生了增量数据差异。此时此刻,nvme2n1 (ext4)的数据,和 sda (bcache的后端)的数据是完全相同的。
445 | + | ||
446 | + | ## 第三阶段 - 重构存储阶段 | |
447 | + | ||
448 | + | 将 nvme2n1 格式化。然后让它作为 bcache 的缓存端。再将得到的 bcache 虚拟盘,挂载到 /swarm-vol,实现业务无感。然后重启业务。 | |
449 | + | ||
450 | + | 注意:我没有任何额外的新空间可以用于备份!所以我的命令必须一次成功!一旦失败我们将万劫不复! | |
451 | + | ||
452 | + | ## 第一阶段 | |
453 | + | ||
454 | + | 接下来,我要开始第一阶段的迁移了。我第一阶段计划这么做: | |
455 | + | ||
456 | + | 目标 | |
457 | + | ||
458 | + | * 使用 make-bcache 将 /dev/sda 建立为 bcache 的后端(backing device)。 | |
459 | + | * 先不动现有 /dev/nvme2n1(现挂载于 /swarm-vol)上的业务数据,让业务继续运行。 | |
460 | + | * 格式化出的 /dev/bcache0 上创建一个文件系统(例如 ext4),然后将现有数据从 /swarm-vol 同步到这个新地方。 | |
461 | + | * 这是“第一阶段”,意在让 /dev/sda 上也有一份业务数据拷贝,从而腾出后续的操作空间。 | |
462 | + | ||
463 | + | 结果 | |
464 | + | ||
465 | + | * 最终会拥有两份数据: | |
466 | + | * 原始:/swarm-vol(在 /dev/nvme2n1 上) | |
467 | + | * 新的:/mnt/bcache(对应 /dev/bcache0,后端实际上是 /dev/sda) | |
468 | + | * 业务不中断 | |
469 | + | ||
470 | + | 我可以让服务继续使用 /swarm-vol,只要我在第一阶段只做数据拷贝、而不改动 /swarm-vol 自身。 | |
471 | + | 在第一阶段结束后,等我准备好,可以进入“第二阶段”短暂停机做增量 rsync 以及最终切换。 | |
472 | + | ||
473 | + | ```bash | |
474 | + | # 安装 bcache-tools | |
475 | + | sudo apt install bcache-tools | |
476 | + | ||
477 | + | # 仅示例,注意操作前先确认 /dev/sda 确实空置 | |
478 | + | # (在 fdisk 交互式命令中,删除旧分区、新建分区) | |
479 | + | sudo fdisk /dev/sda | |
480 | + | ||
481 | + | # 使用 wipefs 清除 sda 上的所有签名 | |
482 | + | sudo wipefs -a /dev/sda | |
483 | + | ||
484 | + | # 创建 bcache 后端 | |
485 | + | sudo make-bcache -B /dev/sda | |
486 | + | ||
487 | + | # 确认后端创建成功 | |
488 | + | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
489 | + | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
490 | + | # version: 1 | |
491 | + | # block_size: 1 | |
492 | + | # data_offset: 16 | |
493 | + | ||
494 | + | # 格式化后端 | |
495 | + | ls -ashl /dev/bcache0 | |
496 | + | sudo mkfs.ext4 /dev/bcache0 | |
497 | + | ||
498 | + | # 创建挂载点 | |
499 | + | sudo mkdir /mnt/bcache | |
500 | + | ||
501 | + | # 挂载 bcache 后端 | |
502 | + | sudo mount /dev/bcache0 /mnt/bcache | |
503 | + | ||
504 | + | # 确认挂载成功 | |
505 | + | cd /mnt/bcache | |
506 | + | ||
507 | + | # 确认挂载成功 | |
508 | + | df . -Th | |
509 | + | ||
510 | + | # (确认挂载成功后,开始 rsync) | |
511 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
512 | + | ||
513 | + | # 同步 nextcloud 文件夹 | |
514 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/
515 | + | ``` | |
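Before moving on to the next phase, it may be worth a quick sanity check that the copy is complete; a minimal sketch using the same paths as above:

```bash
# Dry run: report any remaining differences without changing anything
sudo rsync -Aavxn --delete /swarm-vol/ /mnt/bcache/ | tail -n 20
# Rough size comparison of source and destination
sudo du -sh /swarm-vol /mnt/bcache
```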
516 | + | ||
517 | + | 下面给出示例脚本,供我在“第二阶段”和“第三阶段”中参考使用。思路与第一阶段相同:谨慎操作、一次成功,避免数据丢失。 | |
518 | + | ||
519 | + | --- | |
520 | + | ||
521 | + | ## 第二阶段 - 暂停业务并做最终同步 | |
522 | + | ||
523 | + | 在这一阶段,我将: | |
524 | + | 1. 暂停业务,使其不再写入 `/swarm-vol`(也就是旧的 nvme2n1)。 | |
525 | + | 2. 做最后一次增量 rsync,保证数据在 /dev/bcache0(后端 sda)上与旧数据完全一致。 | |
526 | + | 3. 卸载旧的 `/swarm-vol`,改为挂载新的 `/dev/bcache0` 到 `/swarm-vol`,这样就完成了切换。 | |
527 | + | ||
528 | + | **示例脚本**(在生产环境中,请根据自己实际服务的暂停方式作相应调整): | |
529 | + | ||
530 | + | ```bash | |
531 | + | # 1) 暂停业务 | |
532 | + | echo "停止相关业务/服务 (示例:docker-compose 或 systemctl stop 等)" | |
533 | + | docker-compose down | |
534 | + | sudo reboot # 重启服务器,确保业务不再写入 | |
535 | + | ||
536 | + | # 2) 做最后一次增量同步 | |
537 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
538 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
539 | + | ||
540 | + | # 3) 切换挂载点 | |
541 | + | sudo umount /swarm-vol | |
542 | + | ||
543 | + | echo "将 bcache0 挂载为新的 /swarm-vol..." | |
544 | + | sudo mount /dev/bcache0 /swarm-vol | |
545 | + | ||
546 | + | echo "检查挂载..." | |
547 | + | df -Th /swarm-vol | |
548 | + | ||
549 | + | echo "请人工确认 /swarm-vol 中的数据完整性;若无误,可以继续。" | |
550 | + | ``` | |
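If the `umount /swarm-vol` step fails with `target is busy`, something is still writing to the old volume; a quick way to see what (a sketch):

```bash
sudo fuser -vm /swarm-vol    # list processes still using the old mount
```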
551 | + | ||
552 | + | 在执行完成后,`/swarm-vol` 已经切换到基于 `/dev/bcache0`(后端是 `/dev/sda`)的存储,业务就可以使用这套新存储。此时 `nvme2n1` 上的原有 ext4 数据已不再对外提供服务,但仍在物理上保留(尚未被清空)。 | |
553 | + | ||
554 | + | --- | |
555 | + | ||
556 | + | ## 第三阶段 - 将原 nvme2n1 作为 bcache 缓存设备 | |
557 | + | ||
558 | + | 在这一阶段,我将: | |
559 | + | 1. 确认 `/swarm-vol` 已经切换成功、业务运行正常且数据安全无误。 | |
560 | + | 2. 清空并格式化原本的 `nvme2n1` 为 bcache 缓存盘。 | |
561 | + | 3. 将缓存盘附加到已经存在的 bcache 后端(即 `/dev/sda`)上,使两者变为真正的 “大容量 + SSD 缓存” 组合。 | |
562 | + | 4. 根据需求,启用写回缓存(writeback)等激进模式。 | |
563 | + | ||
564 | + | **示例脚本**: | |
565 | + | ||
566 | + | ```bash | |
567 | + | # 1) 确认当前 /swarm-vol 已经是 /dev/bcache0,且业务正常 | |
568 | + | # (需人工自行验证,确认数据已在 /dev/sda + /dev/bcache0 上) | |
569 | + | # 此时可以停一下业务,或保持低负载也行,避免写入影响。 | |
570 | + | ||
571 | + | # 2) 清空 nvme2n1 (原来的 /swarm-vol) 注意,这将销毁原数据! | |
572 | + | echo "准备清空 /dev/nvme2n1..." | |
573 | + | sudo umount /dev/nvme2n1 || true # 若尚未卸载,可忽略报错 | |
574 | + | sudo wipefs -a /dev/nvme2n1 | |
575 | + | ||
576 | + | # 3) 将 nvme2n1 作为缓存盘初始化 | |
577 | + | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
578 | + | sudo make-bcache -C /dev/nvme2n1 | |
579 | + | # 若有需要,可以带上 --block/--bucket 参数。例如: | |
580 | + | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
581 | + | ||
582 | + | echo "检查生成的缓存盘信息..." | |
583 | + | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
584 | + | ||
585 | + | # 假设输出中 cset.uuid (或 dev.uuid) 为 11111111-2222-3333-4444-555555555555 | |
586 | + | # (这里仅演示,我需要看实际输出) | |
587 | + | ||
588 | + | CACHE_UUID="(此处填上实际的 cset.uuid)" | |
589 | + | ||
590 | + | # 4) 将缓存设备附加到现有的 /dev/bcache0(后端 /dev/sda) | |
591 | + | # /dev/bcache0 的 sysfs 路径可通过 ls /sys/block/bcache0/bcache 等命令确认 | |
592 | + | echo "附加缓存到现有 bcache 后端..." | |
593 | + | echo "$CACHE_UUID" | sudo tee /sys/block/bcache0/bcache/attach | |
594 | + | ||
595 | + | # 如果我看到 echo: write error: Invalid argument,通常是 block size 不匹配等问题 | |
596 | + | # 如果成功,则 /sys/block/bcache0/bcache/cache_mode 等节点应该出现 | |
597 | + | ||
598 | + | # 5) 为 bcache0 启用写回缓存模式(可选) | |
599 | + | echo "启用写回 (writeback) 缓存模式..." | |
600 | + | echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode | |
601 | + | ||
602 | + | # 可选:关闭顺序IO绕过等更激进的做法 | |
603 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff | |
604 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/writeback_percent | |
605 | + | ||
606 | + | # 6) 确认缓存已生效 | |
607 | + | echo "确认 /dev/bcache0 依旧正常挂载在 /swarm-vol,并检查 sysfs 等信息:" | |
608 | + | mount | grep /swarm-vol | |
609 | + | ls -l /sys/block/bcache0/bcache | |
610 | + | ``` | |
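To confirm the cache really is attached and absorbing IO, the bcache sysfs nodes can be checked; a sketch (these paths appear once the attach in step 4 succeeds):

```bash
cat /sys/block/bcache0/bcache/state                         # "clean" or "dirty" once a cache is attached
cat /sys/block/bcache0/bcache/cache_mode                    # current mode is shown in [brackets]
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio   # hit ratio since stats were last cleared
```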
611 | + | ||
612 | + | 至此,我已经完成了将旧的 nvme2n1 转变为 bcache 缓存设备的操作,并和 `/dev/sda` 组合为统一的逻辑卷 `/dev/bcache0`。接下来的要点包括: | |
613 | + | ||
614 | + | 1. **开机自动挂载** | |
615 | + | - 通常推荐在 `/etc/fstab` 中写入对 `/dev/bcache0` 的挂载。 | |
616 | + | - 同时需要注意在 initramfs 阶段加载 bcache 模块、或者确保 `bcache-tools` 的 udev 规则可以自动将 cache attach 到 backing device(以免重启后没了 /dev/bcache0)。在 Ubuntu 下,一般可通过 `sudo update-initramfs -u` 并检查 `/lib/udev/rules.d/69-bcache.rules` 等来确认。 | |
617 | + | ||
618 | + | 在 `/etc/fstab` 中添加:
619 | + | ||
620 | + | ```bash | |
621 | + | # 删除旧的 /swarm-vol 挂载 | |
622 | + | # /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
623 | + | # 然后添加新的 /swarm-vol 挂载 | |
624 | + | /dev/bcache0 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
625 | + | ``` | |
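On Ubuntu, one way to make sure the bcache module and udev rules are in place at boot, as noted above (a sketch, not verified on this machine):

```bash
# Load bcache early via the initramfs, then rebuild it
echo bcache | sudo tee -a /etc/initramfs-tools/modules
sudo update-initramfs -u
# The udev rule shipped by bcache-tools should then register devices automatically
ls /lib/udev/rules.d/ | grep bcache
```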
626 | + | ||
627 | + | 2. **确认写回模式的风险** | |
628 | + | - 写回模式(writeback)可以大幅提高速度,但在缓存盘掉电或故障时会丢失尚未写入后端的脏数据。既然我提到 SSD 质量较好,且并不特别在意短期丢失风险,可以大胆使用。 | |
629 | + | ||
630 | + | 3. **调优与监控** | |
631 | + | - 适当调节 `writeback_percent`、`sequential_cutoff` 等 sysfs 参数可以获得性能与风险的平衡。 | |
632 | + | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
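For example (both tools need their packages installed; device names as in this plan):

```bash
dstat -D nvme2n1,sda 5              # per-device throughput every 5 seconds
iostat -xm nvme2n1 sda bcache0 1    # extended per-device stats in MB/s
```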
633 | + | ||
634 | + | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 |