anduin revised this gist
1 file changed, 2 insertions, 2 deletions
plan.md
@@ -449,9 +449,9 @@ If bcache-super-show says that that the backing dev.data.cache_state state is cl | |||
449 | 449 | ||
450 | 450 | However, if clean, you could try force-starting the backing device without the cache device: | |
451 | 451 | ||
452 | - | ```b | |
452 | + | ```bash | |
453 | 453 | echo 1 | sudo tee /sys/class/block/$dev/bcache/running | |
454 | - | ||
454 | + | ``` | |
455 | 455 | ||
456 | 456 | ||
457 | 457 |
anduin revised this gist
1 file changed, 33 insertions, 1 deletion
plan.md
@@ -1,4 +1,4 @@ | |||
1 | - | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
1 | + | 最近,我的磁盘空间不太够用了。 | |
2 | 2 | ||
3 | 3 | ```bash | |
4 | 4 | anduin@ms-server:~$ cd /swarm-vol/ | |
@@ -559,3 +559,35 @@ However, if clean, you could try force-starting the backing device without cache | |||
559 | 559 | ```bash | |
560 | 560 | echo 1 | sudo tee /sys/class/block/$dev/bcache/running | |
561 | 561 | ``` | |
562 | + | ||
563 | + | ## Eject cache | |
564 | + | ||
565 | + | I used `bcache` only in a writethrough configuration, and IIRC even then `bcache` doesn't like it at all if the cache device vanishes while the machine is running. Expect the `bcache` device to stall completely if that happens. | |
566 | + | ||
567 | + | I haven't tried to remove the cache device while the machine is powered down, so I can't say anything about that. I do think though that `bcache` is still pretty touchy, so I'd recommend that you try that with a VM or a physical test machine first. | |
568 | + | ||
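
A quick way to see what you are dealing with before touching the cache device is to read the relevant sysfs nodes. A minimal sketch, assuming the device is `bcache0` as in the rest of this document:

```bash
# Active cache mode is the bracketed entry, e.g. "writethrough [writeback] ..."
cat /sys/block/bcache0/bcache/cache_mode

# Overall state (clean / dirty / no cache) and outstanding dirty data
cat /sys/block/bcache0/bcache/state
cat /sys/block/bcache0/bcache/dirty_data
```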
569 | + | ||
570 | + | ---------- | |
571 | + | ||
572 | + | ||
573 | + | To safely remove the cache device, you can detach the cache set from the bcache device: | |
574 | + | ||
575 | + | echo <cache-set-uuid> > /sys/block/bcache0/bcache/detach | |
576 | + | ||
577 | + | To determine the necessary cache set UUID, look in `/sys/fs/bcache/`: | |
578 | + | ||
579 | + | host ~ # ll /sys/fs/bcache/ | |
580 | + | total 0 | |
581 | + | drwxr-xr-x 7 root root 0 Feb 19 00:11 eb99feda-fac7-43dc-b89d-18765e9febb6 | |
582 | + | --w------- 1 root root 4096 Feb 19 00:11 register | |
583 | + | --w------- 1 root root 4096 Feb 7 07:17 register_quiet | |
584 | + | ||
585 | + | So for example in this case, run: | |
586 | + | ||
587 | + | echo eb99feda-fac7-43dc-b89d-18765e9febb6 > /sys/block/bcache0/bcache/detach | |
588 | + | ||
589 | + | The `state` file should say `no cache` after that: | |
590 | + | ||
591 | + | host ~ # cat /sys/block/bcache0/bcache/state | |
592 | + | no cache | |
593 | + |
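
Once the backing device reports `no cache`, the detached cache set can, if desired, be shut down and its SSD wiped for reuse. This is only a sketch combining the `stop` node and `wipefs` shown elsewhere in this document; the UUID and device name below are taken from the example above and are assumptions for illustration:

```bash
# Shut down the now-detached cache set (write-only sysfs trigger)
echo 1 | sudo tee /sys/fs/bcache/eb99feda-fac7-43dc-b89d-18765e9febb6/stop

# Once it has disappeared from /sys/fs/bcache/, clear the bcache superblock
# so the former cache SSD can be repurposed (destructive!)
sudo wipefs -a /dev/sdX1   # replace sdX1 with the actual cache partition
```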
anduin revised this gist
1 file changed, 643 deletions
plan.md
@@ -298,649 +298,6 @@ ls -l /sys/block/bcache0/bcache | |||
298 | 298 | ||
299 | 299 | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
300 | 300 | ||
301 | - | 使用下面的命令检查其状态: | |
302 | - | ||
303 | - | ```bash | |
304 | - | Related articles end}} | |
305 | - | ||
306 | - | [https://bcache.evilpiepirate.org/ Bcache] (block cache) allows one to use an SSD as a read/write cache (in writeback mode) or read cache (writethrough or writearound) for another blockdevice (generally a rotating HDD or array). This article will show how to install Arch using Bcache as the root partition. For an intro to bcache itself, see [https://bcache.evilpiepirate.org/ the bcache homepage]. Be sure to read and reference [https://docs.kernel.org/admin-guide/bcache.html the bcache manual]. | |
307 | - | ||
308 | - | {{Tip|An alternative to Bcache is the [[LVM#Cache|LVM cache]].}} | |
309 | - | ||
310 | - | Bcache needs the backing device to be formatted as a bcache block device. In most cases, [https://github.com/g2p/blocks blocks to-bcache] can do an in-place conversion. | |
311 | - | ||
312 | - | {{Out of date|Any source for bcache with btrfs causing corruption in 2024? The linked blog has no extra details }} | |
313 | - | ||
314 | - | {{Warning|1=<nowiki/> | |
315 | - | * Be sure you back up any important data first. | |
316 | - | * Bcache and [[btrfs]] could leave you with a corrupted filesystem. Please visit [https://www.hdevalence.ca/blog/2013-09-21-notes-on-my-archlinux-install this post] for more information. Btrfs wiki reports that it was fixed in kernels 3.19+ [https://btrfs.wiki.kernel.org/index.php/Gotchas#Historical_references]. | |
317 | - | }} | |
318 | - | ||
319 | - | == Setting up bcached btrfs file systems on an existing system == | |
320 | - | ||
321 | - | {{Warning|make-bcache '''will not''' import an existing drive or partition – it will reformat it.}} | |
322 | - | ||
323 | - | === Preparation === | |
324 | - | ||
325 | - | [[Install]] {{AUR|bcache-tools}}. | |
326 | - | ||
327 | - | Use fdisk to create the appropriate partitions on the SSD's and hard drives to hold the cache and the backing data. | |
328 | - | {{Tip| It is possible to create many partitions on a single drive. This allows for testing of elaborate setups before committing. Be aware all data will be lost when the drive fails. This will also kill performance of the drive, due to unfavorable access patterns.}} | |
329 | - | ||
330 | - | === Situation: 1 hard drive and 1 read cache SSD === | |
331 | - | ||
332 | - | {{Warning| | |
333 | - | * When a single hard drive fails, all data is lost. | |
334 | - | * Do not enable write caching, as that can cause data loss when the SSD fails | |
335 | - | }} | |
336 | - | +--------------+ | |
337 | - | | btrfs /mnt | | |
338 | - | +--------------+ | |
339 | - | | /dev/Bcache0 | | |
340 | - | +--------------+ | |
341 | - | | Cache | | |
342 | - | | /dev/sdk1 | | |
343 | - | +--------------+ | |
344 | - | | Data | | |
345 | - | | /dev/sdv1 | | |
346 | - | +--------------+ | |
347 | - | ||
348 | - | 1. Format the backing device (This will typically be your mechanical drive). The backing device can be a whole device, a partition or any other standard block device. This will create /dev/bcache0 | |
349 | - | ||
350 | - | # make-bcache -B /dev/sdv1 | |
351 | - | ||
352 | - | 2. Format the cache device (This will typically be your SSD). The cache device can be a whole device, a partition or any other standard block device | |
353 | - | ||
354 | - | # make-bcache -C /dev/sdk1 | |
355 | - | ||
356 | - | In this example the default block and bucket sizes of 512B and 128kB are used. The block size should match the backing devices sector size which will usually be either 512 or 4k. The bucket size should match the erase block size of the caching device with the intent of reducing write amplification. For example, using a HDD with 4k sectors and an SSD with an erase block size of 2MB this command would look like | |
357 | - | ||
358 | - | # make-bcache --block 4k --bucket 2M -C /dev/sdk1 | |
359 | - | ||
360 | - | {{Note|You may need to omit the {{ic|--block 4k}} option, see [https://unix.stackexchange.com/questions/359508/cannot-attach-cache-device-to-backing-device Cannot attach cache device to backing device].}} | |
361 | - | ||
362 | - | 3. Get the uuid of the cache device | |
363 | - | ||
364 | - | # bcache-super-show /dev/sdk1 | grep cset | |
365 | - | cset.uuid f0e01318-f4fd-4fab-abbb-d76d870503ec | |
366 | - | ||
367 | - | 4. Register the cache device against your backing device. Replace the example uuid with the uuid of your cache. Udev rules will take care of this on reboot and will only need to be done once. | |
368 | - | ||
369 | - | # echo f0e01318-f4fd-4fab-abbb-d76d870503ec > /sys/block/bcache0/bcache/attach | |
370 | - | ||
371 | - | 5. Create the btrfs filesystem. | |
372 | - | ||
373 | - | # mkfs.btrfs /dev/bcache0 | |
374 | - | ||
375 | - | 6. mount the filesystem | |
376 | - | ||
377 | - | # mount /dev/bcache0 /mnt | |
378 | - | ||
379 | - | 7. If you want to have this partition available during the initcpio (i.e. you require it at some point in the boot process) you need to add 'bcache' to your modules array in /etc/mkinitcpio.conf as well as adding the 'bcache' hook in your list between block and filesystems. You must then [[regenerate the initramfs]]. | |
380 | - | ||
381 | - | === Situation: Prevent all write access to a HDD === | |
382 | - | {{Warning| | |
383 | - | * When the hard drive or the SSD fails, all data is lost. | |
384 | - | * Consider using BTRFS RAID to prevent data loss when a SSD / HDD fails. | |
385 | - | }} | |
386 | - | In this situation the goal is to keep the HDD idle as long as possible. This is achieved by absorbing all writes with the SSD. The hard drive is only activated when the SSD is full, or when something is read that's not on the SSD. | |
387 | - | ||
388 | - | Enable the writeback cache mode: | |
389 | - | ||
390 | - | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
391 | - | ||
392 | - | Let bcache completely sync with the hard drive. | |
393 | - | ||
394 | - | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
395 | - | ||
396 | - | Don't let sequential IO bypass the cache: | |
397 | - | ||
398 | - | # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff | |
399 | - | ||
400 | - | Let bcache wait a week after the previous sync is done: | |
401 | - | ||
402 | - | # echo $((7*24*60*60)) > /sys/block/bcache0/bcache/writeback_delay | |
403 | - | ||
404 | - | Don't let bcache go around the cache when there's read / write congestion | |
405 | - | ||
406 | - | # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us | |
407 | - | # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us | |
408 | - | ||
409 | - | Put the HDD to sleep after 20 minutes: | |
410 | - | # hdparm -S 240 /dev/$(cat /sys/block/bcache0/bcache/backing_dev_name) | |
411 | - | /dev/sdh1: | |
412 | - | setting standby to 240 (20 minutes) | |
413 | - | ||
414 | - | ||
415 | - | First use lsblk to get the device names of the HDD and SSD. In this example /dev/sdh1 is the HDD, /dev/sdc1 is the SSD: | |
416 | - | ||
417 | - | # lsblk -M -s | |
418 | - | bcache0 254:0 0 931.5G 0 disk | |
419 | - | ├─sdc1 8:33 0 111.8G 0 part | |
420 | - | │ └─sdc 8:32 0 111.8G 0 disk | |
421 | - | └─sdh1 8:113 0 931.5G 0 part | |
422 | - | └─sdh 8:112 0 931.5G 0 disk | |
423 | - | ||
424 | - | Now Dstat can be used to monitor disk access to the members of the bcache set. | |
425 | - | ||
426 | - | $ dstat -D sdc1,sdh1 | |
427 | - | ||
428 | - | == Advanced operations == | |
429 | - | ||
430 | - | === Resize backing device === | |
431 | - | ||
432 | - | It is possible to resize the backing device so long as you do not move the partition start. This process is described in [https://lore.kernel.org/linux-bcache/CAH+dOxJv-ajvLfbUSo8dqG0a8_grNBhfxJ1EbmSrYZz0YXJM2w@mail.gmail.com/T/ the mailing list]. Here is an example using btrfs volume directly on bcache0. For LVM containers or for other filesystems, procedure will differ. | |
433 | - | ||
434 | - | ==== Example of growing ==== | |
435 | - | ||
436 | - | In this example, I grow the filesystem by 4GB. | |
437 | - | ||
438 | - | 1. Reboot to a live CD/USB Drive (need not be bcache enabled) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G larger. | |
439 | - | ||
440 | - | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
441 | - | ||
442 | - | 2. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum. For btrfs, that is | |
443 | - | ||
444 | - | # btrfs filesystem resize max / | |
445 | - | ||
446 | - | For ext3/4, that is: | |
447 | - | ||
448 | - | # resize2fs /dev/bcache0 | |
449 | - | ||
450 | - | ==== Example of shrinking ==== | |
451 | - | ||
452 | - | In this example, I shrink the filesystem by 4GB. | |
453 | - | ||
454 | - | 1. Disable writeback cache (switch to writethrough cache) and wait for the disk to flush. | |
455 | - | ||
456 | - | # echo writethrough > /sys/block/bcache0/bcache/cache_mode | |
457 | - | $ watch cat /sys/block/bcache0/bcache/state | |
458 | - | ||
459 | - | wait until state reports "clean". This might take a while. | |
460 | - | ||
461 | - | ===== Force flush of cache to backing device ===== | |
462 | - | ||
463 | - | I suggest to use | |
464 | - | ||
465 | - | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
466 | - | ||
467 | - | This will flush the dirty data of the cache to the backing device in less than a minute. | |
468 | - | ||
469 | - | Revert back the value after with | |
470 | - | ||
471 | - | # echo 10 > /sys/block/bcache0/bcache/writeback_percent | |
472 | - | ||
473 | - | 2. Shrink the mounted filesystem by something more than the desired amount, to ensure we do not accidentally clip it later. For btrfs, that is: | |
474 | - | ||
475 | - | # btrfs filesystem resize -5G / | |
476 | - | ||
477 | - | For ext3/4 you can use ''resize2fs'', but only if the partition is unmounted | |
478 | - | ||
479 | - | {{hc|$ df -h /home| | |
480 | - | /dev/bcache0 290G 20G 270G 1% /home | |
481 | - | }} | |
482 | - | ||
483 | - | # umount /home | |
484 | - | # resize2fs /dev/bcache0 283G | |
485 | - | ||
486 | - | 3. Reboot to a LiveCD/USB drive (does not need to support bcache) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G smaller. | |
487 | - | ||
488 | - | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
489 | - | ||
490 | - | 4. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum (that is, the size we shrunk the actual partition to in step 3). For btrfs, that is: | |
491 | - | ||
492 | - | # btrfs filesystem resize max / | |
493 | - | ||
494 | - | For ext3/4, that is: | |
495 | - | ||
496 | - | # resize2fs /dev/bcache0 | |
497 | - | ||
498 | - | 5. Re-enable writeback cache if you want that enabled: | |
499 | - | ||
500 | - | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
501 | - | ||
502 | - | {{Note|If you are very careful you can shrink the filesystem to the exact size in step 2 and avoid step 4. Be careful, though, many partition tools do not do exactly what you want, but instead adjust the requested partition start/end points to end on sector boundaries. This may be difficult to calculate ahead of time}} | |
503 | - | ||
504 | - | == Troubleshooting == | |
505 | - | ||
506 | - | === /dev/bcache device does not exist on bootup === | |
507 | - | ||
508 | - | If you are sent to a busy box shell with an error: | |
509 | - | ||
510 | - | {{bc|1= | |
511 | - | ERROR: Unable to find root device 'UUID=b6b2d82b-f87e-44d5-bbc5-c51dd7aace15'. | |
512 | - | You are being dropped to a recovery shell | |
513 | - | Type 'exit' to try and continue booting | |
514 | - | }} | |
515 | - | ||
516 | - | This might happen if the backing device is configured for "writeback" mode (default is writearound). When in "writeback" mode, the /dev/bcache0 device is not started until the cache device is both registered and attached. Registering is something that needs to happen every bootup, but attaching should only have to be done once. | |
517 | - | ||
518 | - | To continue booting, try one of the following: | |
519 | - | ||
520 | - | * Register both the backing device and the caching device | |
521 | - | ||
522 | - | # echo /dev/sda3 > /sys/fs/bcache/register | |
523 | - | # echo /dev/sdb > /sys/fs/bcache/register | |
524 | - | ||
525 | - | If the /dev/bcache0 device now exists, type exit and continue booting. You will need to fix your initcpio to ensure devices are registered before mounting the root device. | |
526 | - | ||
527 | - | {{Note| | |
528 | - | * An error of "sh: echo: write error: Invalid argument" means the device was already registered or is not recognized as either a bcache backing device or cache. If using the udev rule on boot it should only attempt to register a device if it finds a bcache superblock | |
529 | - | * This can also happen if using udev's 69-bcache.rules in Installation's step 7 and blkid and bcache-probe "disagree" due to rogue superblocks. See [https://bcache.evilpiepirate.org/#index6h1 bcache's wiki] for a possible explanation/resolution. | |
530 | - | }} | |
531 | - | ||
532 | - | * Re-attach the cache to the backing device: | |
533 | - | ||
534 | - | If the cache device was registered, a folder with the UUID of the cache should exist in {{ic|/sys/fs/bcache}}. Use that UUID when following the example below: | |
535 | - | ||
536 | - | {{hc|# ls /sys/fs/bcache/| | |
537 | - | b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 register register_quiet | |
538 | - | }} | |
539 | - | ||
540 | - | # echo b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 > /sys/block/sda/sda3/bcache/attach | |
541 | - | ||
542 | - | If the {{ic|/dev/bcache0}} device now exists, type exit and continue booting. You should not have to do this again. If it persists, ask on the bcache mailing list. | |
543 | - | ||
544 | - | {{Note|An error of {{ic|sh: echo: write error: Invalid argument}} means the device was already attached. An error of {{ic|sh: echo: write error: No such file or directory}} means the UUID is not a valid cache (make sure you typed it correctly).}} | |
545 | - | ||
546 | - | * Invalidate the cache and force the backing device to run without it. You might want to check some stats, such as "dirty_data" so you have some idea of how much data will be lost. | |
547 | - | ||
548 | - | {{hc|# cat /sys/block/sda/sda3/bcache/dirty_data| | |
549 | - | -3.9M | |
550 | - | }} | |
551 | - | ||
552 | - | dirty data is data in the cache that has not been written to the backing device. If you force the backing device to run, this data will be lost, even if you later re-attach the cache. | |
553 | - | ||
554 | - | {{hc|# cat /sys/block/sda/sda3/bcache/running| | |
555 | - | 0 | |
556 | - | }} | |
557 | - | ||
558 | - | # echo 1 > /sys/block/sda/sda3/bcache/running | |
559 | - | ||
560 | - | The {{ic|/dev/bcache0}} device will now exist. Type exit and continue booting. You might want to unregister the cache device and run make-bcache again. An fsck on {{ic|/dev/bcache0}} would also be wise. See the [https://docs.kernel.org/admin-guide/bcache.html bcache documentation]. | |
561 | - | ||
562 | - | {{Warning|Only invalidate the cache if one of the two options above did not work.}} | |
563 | - | ||
564 | - | === /sys/fs/bcache/ does not exist === | |
565 | - | ||
566 | - | The kernel you booted is not bcache enabled, or the bcache [[Kernel module#Manual module handling|module is not loaded]] | |
567 | - | ||
568 | - | === write error: Invalid argument when trying to attach a device due to mismatched block parameter === | |
569 | - | ||
570 | - | Given {{ic|bash: echo: write error: Invalid argument}} when trying to attach a device, and the actual error is shown with [[dmesg]]: | |
571 | - | ||
572 | - | bcache: bch_cached_dev_attach() Couldn't attach sdc: block size less than set's block size | |
573 | - | ||
574 | - | This happens because the {{ic|--block 4k}} parameter was not set on either device and defaults can mismatch. | |
575 | - | ||
576 | - | Creating both the backing and caching device in one command automatically solves the issue, but when using separate commands the block size parameter sometimes needs to be set manually on both devices. | |
577 | - | ||
578 | - | === Device or resource busy === | |
579 | - | When a device is in use as a bcache backing device, it can not be formatted nor partitioned: | |
580 | - | # make-bcache -C /dev/sdb1 | |
581 | - | Can't open dev /dev/sdb1: Device or resource busy | |
582 | - | ||
583 | - | # fdisk /dev/sdb | |
584 | - | ||
585 | - | Welcome to fdisk (util-linux 2.37.2). | |
586 | - | Changes will remain in memory only, until you decide to write them. | |
587 | - | Be careful before using the write command. | |
588 | - | ||
589 | - | This disk is currently in use - repartitioning is probably a bad idea. | |
590 | - | It's recommended to umount all file systems, and swapoff all swap | |
591 | - | partitions on this disk. | |
592 | - | ||
593 | - | ||
594 | - | Command (m for help): q | |
595 | - | ||
596 | - | To fix this, first run this command to confirm the disk is actually used as a bcache backing device: | |
597 | - | # bcache-super-show /dev/sdb1 | |
598 | - | sb.magic ok | |
599 | - | sb.first_sector 8 [match] | |
600 | - | sb.csum A3D2B8610F6C5E35 [match] | |
601 | - | sb.version 1 [backing device] | |
602 | - | ||
603 | - | dev.label (empty) | |
604 | - | dev.uuid 5a868788-65a2-4564-b4b7-c1817d0b6080 | |
605 | - | dev.sectors_per_block 1 | |
606 | - | dev.sectors_per_bucket 1024 | |
607 | - | dev.data.first_sector 16 | |
608 | - | dev.data.cache_mode 1 [writeback] | |
609 | - | dev.data.cache_state 2 [dirty] | |
610 | - | ||
611 | - | cset.uuid 42dcb651-6b53-4b65-bc49-9b1ca0acc5b1 | |
612 | - | ||
613 | - | Then stop the backing device. This will also remove the corresponding /dev/bcache device. | |
614 | - | # echo 1 > /sys/class/block/sdb1/bcache/stop | |
615 | - | ||
616 | - | # dmesg | |
617 | - | [ 3171.263577] bcache: bcache_device_free() bcache0 stopped | |
618 | - | Now the device can be partitioned: | |
619 | - | # fdisk /dev/sdb | |
620 | - | ||
621 | - | Welcome to fdisk (util-linux 2.37.2). | |
622 | - | Changes will remain in memory only, until you decide to write them. | |
623 | - | Be careful before using the write command. | |
624 | - | ||
625 | - | ||
626 | - | Command (m for help): q | |
627 | - | When fdisk exits, the kernel scans the drive again, notices it's a bcache backing device, and uses the drive as a backing device. | |
628 | - | # dmesg | |
629 | - | [ 3190.643270] sdb: sdb1 | |
630 | - | [ 3190.833029] bcache: register_bdev() registered backing device sdb1 | |
631 | - | This creates the directory bcache under /sys/class/block/sdb1/ | |
632 | - | # ls /sys/class/block/sdb1/ | |
633 | - | alignment_offset bcache dev discard_alignment holders inflight partition power ro size start stat subsystem uevent | |
634 | - | ||
635 | - | == See also == | |
636 | - | ||
637 | - | * [https://bcache.evilpiepirate.org Bcache Homepage] | |
638 | - | * [https://docs.kernel.org/admin-guide/bcache.html Bcache Manual] | |
639 | - | ||
640 | - | ================================================== | |
641 | - | ||
642 | - | 上面的信息是我从别的地方摘抄的。可能有用,可能没用。可以参考然后回答下面的问题。 | |
643 | - | ||
644 | - | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
645 | - | ||
646 | - | ```bash | |
647 | - | anduin@ms-server:~$ cd /swarm-vol/ | |
648 | - | anduin@ms-server:/swarm-vol$ df . -Th | |
649 | - | Filesystem Type Size Used Avail Use% Mounted on | |
650 | - | /dev/nvme2n1 ext4 7.0T 6.1T 559G 92% /swarm-vol | |
651 | - | anduin@ms-server:/swarm-vol$ cd /swarm-vol/nextcloud/ | |
652 | - | anduin@ms-server:/swarm-vol/nextcloud$ df . -Th | |
653 | - | Filesystem Type Size Used Avail Use% Mounted on | |
654 | - | /dev/nvme0n1 ext4 916G 554G 316G 64% /swarm-vol/nextcloud | |
655 | - | anduin@ms-server:/swarm-vol/nextcloud$ sudo fdisk -l | |
656 | - | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | |
657 | - | Disk model: INTEL SSDPED1D480GA | |
658 | - | Units: sectors of 1 * 512 = 512 bytes | |
659 | - | Sector size (logical/physical): 512 bytes / 512 bytes | |
660 | - | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
661 | - | Disklabel type: gpt | |
662 | - | Disk identifier: 75C97A6C-09A4-4375-8260-7A950D36C1B4 | |
663 | - | ||
664 | - | Device Start End Sectors Size Type | |
665 | - | /dev/nvme1n1p1 2048 1050623 1048576 512M EFI System | |
666 | - | /dev/nvme1n1p2 1050624 937701375 936650752 446.6G Linux filesystem | |
667 | - | ||
668 | - | ||
669 | - | Disk /dev/nvme2n1: 6.99 TiB, 7681501126656 bytes, 1875366486 sectors | |
670 | - | Disk model: WUS4BB076D7P3E3 | |
671 | - | Units: sectors of 1 * 4096 = 4096 bytes | |
672 | - | Sector size (logical/physical): 4096 bytes / 4096 bytes | |
673 | - | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
674 | - | ||
675 | - | ||
676 | - | Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors | |
677 | - | Disk model: CT1000P3PSSD8 | |
678 | - | Units: sectors of 1 * 512 = 512 bytes | |
679 | - | Sector size (logical/physical): 512 bytes / 512 bytes | |
680 | - | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
681 | - | anduin@ms-server:/swarm-vol/nextcloud$ cd /dev/disk/by-uuid/ | |
682 | - | anduin@ms-server:/dev/disk/by-uuid$ ls -ashl | |
683 | - | total 0 | |
684 | - | 0 drwxr-xr-x 2 root root 140 Jan 17 15:21 . | |
685 | - | 0 drwxr-xr-x 7 root root 140 Dec 28 05:45 .. | |
686 | - | 0 lrwxrwxrwx 1 root root 13 Jan 14 14:00 0377361e-2a7b-4024-a681-ea135c092cce -> ../../nvme0n1 | |
687 | - | 0 lrwxrwxrwx 1 root root 13 Dec 28 05:45 49fd5e45-6074-4370-a95f-c4404920aff5 -> ../../nvme2n1 | |
688 | - | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 9C58-514E -> ../../nvme1n1p1 | |
689 | - | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 b91352af-9477-4684-8d08-2a45c39bec98 -> ../../nvme1n1p2 | |
690 | - | anduin@ms-server:/dev/disk/by-uuid$ cat /etc/fstab | |
691 | - | # /etc/fstab: static file system information. | |
692 | - | # | |
693 | - | # Use 'blkid' to print the universally unique identifier for a | |
694 | - | # device; this may be used with UUID= as a more robust way to name devices | |
695 | - | # that works even if disks are added and removed. See fstab(5). | |
696 | - | # | |
697 | - | # <file system> <mount point> <type> <options> <dump> <pass> | |
698 | - | UUID=b91352af-9477-4684-8d08-2a45c39bec98 / ext4 errors=remount-ro 0 1 | |
699 | - | UUID=9C58-514E /boot/efi vfat umask=0077 0 1 | |
700 | - | /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
701 | - | /dev/disk/by-uuid/0377361e-2a7b-4024-a681-ea135c092cce /swarm-vol/nextcloud ext4 defaults,noatime,nofail 0 0 | |
702 | - | /swapfile none swap sw 0 0 | |
703 | - | ``` | |
704 | - | ||
705 | - | 由上面的信息,不难判断出: | |
706 | - | ||
707 | - | 我的系统盘是 b91352af-9477-4684-8d08-2a45c39bec98 ,当然这和我们要调查的内容没什么关系。 | |
708 | - | ||
709 | - | 我的数据都放在了 /swarm-vol 这个文件夹。它背后的磁盘是 `49fd5e45-6074-4370-a95f-c4404920aff5` | |
710 | - | ||
711 | - | 即使我暂时使用奇技淫巧,将 /swarm-vol 下的子文件夹 nextcloud 暂时挪到了 `0377361e-2a7b-4024-a681-ea135c092cce` 下,还是濒临不够了。 | |
712 | - | ||
713 | - | 但是,幸运的是,我购买了一个全新的大而慢的机械硬盘: | |
714 | - | ||
715 | - | ```bash | |
716 | - | Disk /dev/sda: 58.21 TiB, 64003468427264 bytes, 125006774272 sectors | |
717 | - | Disk model: RAID5 | |
718 | - | Units: sectors of 1 * 512 = 512 bytes | |
719 | - | Sector size (logical/physical): 512 bytes / 4096 bytes | |
720 | - | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
721 | - | ``` | |
722 | - | ||
723 | - | 为了测试它,我暂时挂载到了这里: | |
724 | - | ||
725 | - | ```bash | |
726 | - | /dev/sda /mnt/temp_big ext4 defaults,noatime,nofail 0 0 | |
727 | - | ``` | |
728 | - | ||
729 | - | 接下来,我认为我需要开始设计我的迁移改造计划。 | |
730 | - | ||
731 | - | 为了充分发挥我过去 49fd5e45-6074-4370-a95f-c4404920aff5,也就是nvme2n1,也就是 /swarm-vol 的快的固态的特性,又能发挥 /dev/sda 大的优点,我计划这样设计: | |
732 | - | ||
733 | - | 使用 bcache 系统,让 /dev/sda 作为真正的存储设备,再让 49fd5e45-6074-4370-a95f-c4404920aff5 作为缓存盘,同时开启写入缓存和读取缓存,这样我就拥有又大又快的存储了。 | |
734 | - | ||
735 | - | 考虑到我的缓存盘非常大(上面的信息可以得出,它足足有 6.99 TB 对吧?),我相信我可以设置非常激进的写入缓存和阅读缓存。而且我的缓存盘非常可靠,它几乎不会损坏,我也不担心短暂的数据丢失。我又不是银行,就都是电影。 | |
736 | - | ||
737 | - | 接下来,为了方便迁移,我开始设计我的迁移计划: | |
738 | - | ||
739 | - | # 阶段概要 | |
740 | - | ||
741 | - | ## 第一阶段 - 双数据阶段 | |
742 | - | ||
743 | - | 将 sda 格式化清空,作为 bcache 的后端。此时 nvme2n1 继续承载业务数据。不移除它。然后将业务数据使用 rsync 拷贝到 sda 中。 | |
744 | - | ||
745 | - | ## 第二阶段 - 暂停业务阶段 | |
746 | - | ||
747 | - | 将业务暂停,然后我最后运行一次rsync。这次rsync应该会跑得很快,因为只产生了增量数据差异。此时此刻,nvme2n1 (ext4)的数据,和 sda (bcache的后端)的数据是完全相同的。 | |
748 | - | ||
749 | - | ## 第三阶段 - 重构存储阶段 | |
750 | - | ||
751 | - | 将 nvme2n1 格式化。然后让它作为 bcache 的缓存端。再将得到的 bcache 虚拟盘,挂载到 /swarm-vol,实现业务无感。然后重启业务。 | |
752 | - | ||
753 | - | 注意:我没有任何额外的新空间可以用于备份!所以我的命令必须一次成功!一旦失败我们将万劫不复! | |
754 | - | ||
755 | - | ## 第一阶段 | |
756 | - | ||
757 | - | 接下来,我要开始第一阶段的迁移了。我第一阶段计划这么做: | |
758 | - | ||
759 | - | 目标 | |
760 | - | ||
761 | - | * 使用 make-bcache 将 /dev/sda 建立为 bcache 的后端(backing device)。 | |
762 | - | * 先不动现有 /dev/nvme2n1(现挂载于 /swarm-vol)上的业务数据,让业务继续运行。 | |
763 | - | * 格式化出的 /dev/bcache0 上创建一个文件系统(例如 ext4),然后将现有数据从 /swarm-vol 同步到这个新地方。 | |
764 | - | * 这是“第一阶段”,意在让 /dev/sda 上也有一份业务数据拷贝,从而腾出后续的操作空间。 | |
765 | - | ||
766 | - | 结果 | |
767 | - | ||
768 | - | * 最终会拥有两份数据: | |
769 | - | * 原始:/swarm-vol(在 /dev/nvme2n1 上) | |
770 | - | * 新的:/mnt/bcache(对应 /dev/bcache0,后端实际上是 /dev/sda) | |
771 | - | * 业务不中断 | |
772 | - | ||
773 | - | 我可以让服务继续使用 /swarm-vol,只要我在第一阶段只做数据拷贝、而不改动 /swarm-vol 自身。 | |
774 | - | 在第一阶段结束后,等我准备好,可以进入“第二阶段”短暂停机做增量 rsync 以及最终切换。 | |
775 | - | ||
776 | - | ```bash | |
777 | - | # 安装 bcache-tools | |
778 | - | sudo apt install bcache-tools | |
779 | - | ||
780 | - | # 仅示例,注意操作前先确认 /dev/sda 确实空置 | |
781 | - | # (在 fdisk 交互式命令中,删除旧分区、新建分区) | |
782 | - | sudo fdisk /dev/sda | |
783 | - | ||
784 | - | # 使用 wipefs 清除 sda 上的所有签名 | |
785 | - | sudo wipefs -a /dev/sda | |
786 | - | ||
787 | - | # 创建 bcache 后端 | |
788 | - | sudo make-bcache -B /dev/sda | |
789 | - | ||
790 | - | # 如果在 fdisk 里没有找到 /dev/bcache0,可以尝试 | |
791 | - | # 重新加载内核模块: | |
792 | - | sudo modprobe bcache | |
793 | - | ||
794 | - | # 如果还是没有,尝试手工创建 | |
795 | - | echo /dev/sda | sudo tee /sys/fs/bcache/register | |
796 | - | ||
797 | - | # 确认后端创建成功 | |
798 | - | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
799 | - | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
800 | - | # version: 1 | |
801 | - | # block_size: 1 | |
802 | - | # data_offset: 16 | |
803 | - | ||
804 | - | # 格式化后端 | |
805 | - | ls -ashl /dev/bcache0 | |
806 | - | sudo mkfs.ext4 /dev/bcache0 | |
807 | - | ||
808 | - | # 创建挂载点 | |
809 | - | sudo mkdir /mnt/bcache | |
810 | - | ||
811 | - | # 挂载 bcache 后端 | |
812 | - | sudo mount /dev/bcache0 /mnt/bcache | |
813 | - | ||
814 | - | # 确认挂载成功 | |
815 | - | cd /mnt/bcache | |
816 | - | ||
817 | - | # 确认挂载成功 | |
818 | - | df . -Th | |
819 | - | ||
820 | - | # (确认挂载成功后,开始 rsync) | |
821 | - | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
822 | - | ||
823 | - | # 同步 nextcloud 文件夹 | |
824 | - | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
825 | - | ``` | |
826 | - | ||
827 | - | ## 第二阶段 - 暂停业务并做最终同步 | |
828 | - | ||
829 | - | 在这一阶段,我将: | |
830 | - | 1. 暂停业务,使其不再写入 `/swarm-vol`(也就是旧的 nvme2n1)。 | |
831 | - | 2. 做最后一次增量 rsync,保证数据在 /dev/bcache0(后端 sda)上与旧数据完全一致。 | |
832 | - | 3. 卸载旧的 `/swarm-vol`,改为挂载新的 `/dev/bcache0` 到 `/swarm-vol`,这样就完成了切换。 | |
833 | - | ||
834 | - | **示例脚本**(在生产环境中,请根据自己实际服务的暂停方式作相应调整): | |
835 | - | ||
836 | - | ```bash | |
837 | - | # 1) 暂停业务 | |
838 | - | echo "停止相关业务/服务 (示例:docker-compose 或 systemctl stop 等)" | |
839 | - | docker-compose down | |
840 | - | sudo reboot # 重启服务器,确保业务不再写入 | |
841 | - | ||
842 | - | # 2) 做最后一次增量同步 | |
843 | - | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
844 | - | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
845 | - | ||
846 | - | # 3) 切换挂载点 | |
847 | - | sudo umount /swarm-vol | |
848 | - | ||
849 | - | echo "将 bcache0 挂载为新的 /swarm-vol..." | |
850 | - | sudo mount /dev/bcache0 /swarm-vol | |
851 | - | ||
852 | - | echo "检查挂载..." | |
853 | - | df -Th /swarm-vol | |
854 | - | ||
855 | - | echo "请人工确认 /swarm-vol 中的数据完整性;若无误,可以继续。" | |
856 | - | ``` | |
857 | - | ||
858 | - | 在执行完成后,`/swarm-vol` 已经切换到基于 `/dev/bcache0`(后端是 `/dev/sda`)的存储,业务就可以使用这套新存储。此时 `nvme2n1` 上的原有 ext4 数据已不再对外提供服务,但仍在物理上保留(尚未被清空)。 | |
859 | - | ||
860 | - | --- | |
861 | - | ||
862 | - | ## 第三阶段 - 将原 nvme2n1 作为 bcache 缓存设备 | |
863 | - | ||
864 | - | 在这一阶段,我将: | |
865 | - | 1. 确认 `/swarm-vol` 已经切换成功、业务运行正常且数据安全无误。 | |
866 | - | 2. 清空并格式化原本的 `nvme2n1` 为 bcache 缓存盘。 | |
867 | - | 3. 将缓存盘附加到已经存在的 bcache 后端(即 `/dev/sda`)上,使两者变为真正的 “大容量 + SSD 缓存” 组合。 | |
868 | - | 4. 根据需求,启用写回缓存(writeback)等激进模式。 | |
869 | - | ||
870 | - | **示例脚本**: | |
871 | - | ||
872 | - | ```bash | |
873 | - | # 1) 确认当前 /swarm-vol 已经是 /dev/bcache0,且业务正常 | |
874 | - | # (需人工自行验证,确认数据已在 /dev/sda + /dev/bcache0 上) | |
875 | - | # 此时可以停一下业务,或保持低负载也行,避免写入影响。 | |
876 | - | ||
877 | - | # 2) 清空 nvme2n1 (原来的 /swarm-vol) 注意,这将销毁原数据! | |
878 | - | echo "准备清空 /dev/nvme2n1..." | |
879 | - | sudo umount /dev/nvme2n1 || true # 若尚未卸载,可忽略报错 | |
880 | - | sudo wipefs -a /dev/nvme2n1 | |
881 | - | ||
882 | - | # 3) 将 nvme2n1 作为缓存盘初始化 | |
883 | - | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
884 | - | #在这个例子里,默认的block大小是512B、bucket大小是128kB。block的大小应该与后端设备的sector大小匹配(通常是512或者4k)。bucket的大小应该与缓存设备的擦除block大小匹配(以减少写入放大)。例如,如果是一个4k sector的HDD和一个擦除block大小是2MB的SSD搭配,命令就应该是这样的: | |
885 | - | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
886 | - | # 如果你需要查看 /dev/sda (也就是后端)的 block size,可以使用 fdisk -l /dev/sda 等命令。 | |
887 | - | # 如果你需要查看 /dev/nvme2n1 的擦除块大小,可以使用 nvme id-ns /dev/nvme2n1 等命令。一般是 4M | |
888 | - | sudo make-bcache --block 512 --bucket 4M -C /dev/nvme2n1 | |
889 | - | ||
890 | - | echo "检查生成的缓存盘信息..." | |
891 | - | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
892 | - | ||
893 | - | # 假设输出中 cset.uuid (或 dev.uuid) 为 11111111-2222-3333-4444-555555555555 | |
894 | - | # (这里仅演示,我需要看实际输出) | |
895 | - | ||
896 | - | CACHE_UUID="(此处填上实际的 cset.uuid)" | |
897 | - | ||
898 | - | # 4) 将缓存设备附加到现有的 /dev/bcache0(后端 /dev/sda) | |
899 | - | # /dev/bcache0 的 sysfs 路径可通过 ls /sys/block/bcache0/bcache 等命令确认 | |
900 | - | echo "附加缓存到现有 bcache 后端..." | |
901 | - | echo "$CACHE_UUID" | sudo tee /sys/block/bcache0/bcache/attach | |
902 | - | ||
903 | - | # 如果我看到 echo: write error: Invalid argument,通常是 block size 不匹配等问题 | |
904 | - | # 如果成功,则 /sys/block/bcache0/bcache/cache_mode 等节点应该出现 | |
905 | - | ||
906 | - | # 5) 为 bcache0 启用写回缓存模式(可选) | |
907 | - | echo "启用写回 (writeback) 缓存模式..." | |
908 | - | echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode | |
909 | - | ||
910 | - | # 可选:关闭顺序IO绕过等更激进的做法 | |
911 | - | # echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff | |
912 | - | # echo 0 | sudo tee /sys/block/bcache0/bcache/writeback_percent | |
913 | - | ||
914 | - | # 6) 确认缓存已生效 | |
915 | - | echo "确认 /dev/bcache0 依旧正常挂载在 /swarm-vol,并检查 sysfs 等信息:" | |
916 | - | mount | grep /swarm-vol | |
917 | - | ls -l /sys/block/bcache0/bcache | |
918 | - | ``` | |
919 | - | ||
920 | - | 至此,我已经完成了将旧的 nvme2n1 转变为 bcache 缓存设备的操作,并和 `/dev/sda` 组合为统一的逻辑卷 `/dev/bcache0`。接下来的要点包括: | |
921 | - | ||
922 | - | 1. **开机自动挂载** | |
923 | - | - 通常推荐在 `/etc/fstab` 中写入对 `/dev/bcache0` 的挂载。 | |
924 | - | - 同时需要注意在 initramfs 阶段加载 bcache 模块、或者确保 `bcache-tools` 的 udev 规则可以自动将 cache attach 到 backing device(以免重启后没了 /dev/bcache0)。在 Ubuntu 下,一般可通过 `sudo update-initramfs -u` 并检查 `/lib/udev/rules.d/69-bcache.rules` 等来确认。 | |
925 | - | ||
926 | - | 在 `/etc/fstab` 中添加: | |
927 | - | ||
928 | - | ```bash | |
929 | - | # 删除旧的 /swarm-vol 挂载 | |
930 | - | # /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
931 | - | # 然后添加新的 /swarm-vol 挂载 | |
932 | - | /dev/bcache0 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
933 | - | ``` | |
934 | - | ||
935 | - | 2. **确认写回模式的风险** | |
936 | - | - 写回模式(writeback)可以大幅提高速度,但在缓存盘掉电或故障时会丢失尚未写入后端的脏数据。既然我提到 SSD 质量较好,且并不特别在意短期丢失风险,可以大胆使用。 | |
937 | - | ||
938 | - | 3. **调优与监控** | |
939 | - | - 适当调节 `writeback_percent`、`sequential_cutoff` 等 sysfs 参数可以获得性能与风险的平衡。 | |
940 | - | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
941 | - | ||
942 | - | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
943 | - | ||
944 | 301 | 使用下面的命令检查其状态: | |
945 | 302 | ||
946 | 303 | ```bash |
anduin revised this gist
1 file changed, 460 insertions
plan.md
@@ -1,3 +1,306 @@ | |||
1 | + | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
2 | + | ||
3 | + | ```bash | |
4 | + | anduin@ms-server:~$ cd /swarm-vol/ | |
5 | + | anduin@ms-server:/swarm-vol$ df . -Th | |
6 | + | Filesystem Type Size Used Avail Use% Mounted on | |
7 | + | /dev/nvme2n1 ext4 7.0T 6.1T 559G 92% /swarm-vol | |
8 | + | anduin@ms-server:/swarm-vol$ cd /swarm-vol/nextcloud/ | |
9 | + | anduin@ms-server:/swarm-vol/nextcloud$ df . -Th | |
10 | + | Filesystem Type Size Used Avail Use% Mounted on | |
11 | + | /dev/nvme0n1 ext4 916G 554G 316G 64% /swarm-vol/nextcloud | |
12 | + | anduin@ms-server:/swarm-vol/nextcloud$ sudo fdisk -l | |
13 | + | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | |
14 | + | Disk model: INTEL SSDPED1D480GA | |
15 | + | Units: sectors of 1 * 512 = 512 bytes | |
16 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
17 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
18 | + | Disklabel type: gpt | |
19 | + | Disk identifier: 75C97A6C-09A4-4375-8260-7A950D36C1B4 | |
20 | + | ||
21 | + | Device Start End Sectors Size Type | |
22 | + | /dev/nvme1n1p1 2048 1050623 1048576 512M EFI System | |
23 | + | /dev/nvme1n1p2 1050624 937701375 936650752 446.6G Linux filesystem | |
24 | + | ||
25 | + | ||
26 | + | Disk /dev/nvme2n1: 6.99 TiB, 7681501126656 bytes, 1875366486 sectors | |
27 | + | Disk model: WUS4BB076D7P3E3 | |
28 | + | Units: sectors of 1 * 4096 = 4096 bytes | |
29 | + | Sector size (logical/physical): 4096 bytes / 4096 bytes | |
30 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
31 | + | ||
32 | + | ||
33 | + | Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors | |
34 | + | Disk model: CT1000P3PSSD8 | |
35 | + | Units: sectors of 1 * 512 = 512 bytes | |
36 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
37 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
38 | + | anduin@ms-server:/swarm-vol/nextcloud$ cd /dev/disk/by-uuid/ | |
39 | + | anduin@ms-server:/dev/disk/by-uuid$ ls -ashl | |
40 | + | total 0 | |
41 | + | 0 drwxr-xr-x 2 root root 140 Jan 17 15:21 . | |
42 | + | 0 drwxr-xr-x 7 root root 140 Dec 28 05:45 .. | |
43 | + | 0 lrwxrwxrwx 1 root root 13 Jan 14 14:00 0377361e-2a7b-4024-a681-ea135c092cce -> ../../nvme0n1 | |
44 | + | 0 lrwxrwxrwx 1 root root 13 Dec 28 05:45 49fd5e45-6074-4370-a95f-c4404920aff5 -> ../../nvme2n1 | |
45 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 9C58-514E -> ../../nvme1n1p1 | |
46 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 b91352af-9477-4684-8d08-2a45c39bec98 -> ../../nvme1n1p2 | |
47 | + | anduin@ms-server:/dev/disk/by-uuid$ cat /etc/fstab | |
48 | + | # /etc/fstab: static file system information. | |
49 | + | # | |
50 | + | # Use 'blkid' to print the universally unique identifier for a | |
51 | + | # device; this may be used with UUID= as a more robust way to name devices | |
52 | + | # that works even if disks are added and removed. See fstab(5). | |
53 | + | # | |
54 | + | # <file system> <mount point> <type> <options> <dump> <pass> | |
55 | + | UUID=b91352af-9477-4684-8d08-2a45c39bec98 / ext4 errors=remount-ro 0 1 | |
56 | + | UUID=9C58-514E /boot/efi vfat umask=0077 0 1 | |
57 | + | /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
58 | + | /dev/disk/by-uuid/0377361e-2a7b-4024-a681-ea135c092cce /swarm-vol/nextcloud ext4 defaults,noatime,nofail 0 0 | |
59 | + | /swapfile none swap sw 0 0 | |
60 | + | ``` | |
61 | + | ||
62 | + | 由上面的信息,不难判断出: | |
63 | + | ||
64 | + | 我的系统盘是 b91352af-9477-4684-8d08-2a45c39bec98 ,当然这和我们要调查的内容没什么关系。 | |
65 | + | ||
66 | + | 我的数据都放在了 /swarm-vol 这个文件夹。它背后的磁盘是 `49fd5e45-6074-4370-a95f-c4404920aff5` | |
67 | + | ||
68 | + | 即使我暂时使用奇技淫巧,将 /swarm-vol 下的子文件夹 nextcloud 暂时挪到了 `0377361e-2a7b-4024-a681-ea135c092cce` 下,还是濒临不够了。 | |
69 | + | ||
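
Before doing anything destructive, it may help to cross-check the UUID-to-device mapping in a single view; a minimal sketch using `lsblk`, which ships with Ubuntu:

```bash
# One view of device names, filesystem UUIDs, sizes and mount points
lsblk -o NAME,UUID,FSTYPE,SIZE,MOUNTPOINT
```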
70 | + | 但是,幸运的是,我购买了一个全新的大而慢的机械硬盘: | |
71 | + | ||
72 | + | ```bash | |
73 | + | Disk /dev/sda: 58.21 TiB, 64003468427264 bytes, 125006774272 sectors | |
74 | + | Disk model: RAID5 | |
75 | + | Units: sectors of 1 * 512 = 512 bytes | |
76 | + | Sector size (logical/physical): 512 bytes / 4096 bytes | |
77 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
78 | + | ``` | |
79 | + | ||
80 | + | 为了测试它,我暂时挂载到了这里: | |
81 | + | ||
82 | + | ```bash | |
83 | + | /dev/sda /mnt/temp_big ext4 defaults,noatime,nofail 0 0 | |
84 | + | ``` | |
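
A small sanity check after adding that temporary fstab entry; a sketch, assuming the mount point `/mnt/temp_big` from the line above:

```bash
# Create the mount point if it does not exist, apply fstab, and verify
sudo mkdir -p /mnt/temp_big
sudo mount -a
df -Th /mnt/temp_big
```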
85 | + | ||
86 | + | 接下来,我认为我需要开始设计我的迁移改造计划。 | |
87 | + | ||
88 | + | 为了充分发挥我过去 49fd5e45-6074-4370-a95f-c4404920aff5,也就是nvme2n1,也就是 /swarm-vol 的快的固态的特性,又能发挥 /dev/sda 大的优点,我计划这样设计: | |
89 | + | ||
90 | + | 使用 bcache 系统,让 /dev/sda 作为真正的存储设备,再让 49fd5e45-6074-4370-a95f-c4404920aff5 作为缓存盘,同时开启写入缓存和读取缓存,这样我就拥有又大又快的存储了。 | |
91 | + | ||
92 | + | 考虑到我的缓存盘非常大(上面的信息可以得出,它足足有 6.99 TB 对吧?),我相信我可以设置非常激进的写入缓存和阅读缓存。而且我的缓存盘非常可靠,它几乎不会损坏,我也不担心短暂的数据丢失。我又不是银行,就都是电影。 | |
93 | + | ||
94 | + | 接下来,为了方便迁移,我开始设计我的迁移计划: | |
95 | + | ||
96 | + | # 阶段概要 | |
97 | + | ||
98 | + | ## 第一阶段 - 双数据阶段 | |
99 | + | ||
100 | + | 将 sda 格式化清空,作为 bcache 的后端。此时 nvme2n1 继续承载业务数据。不移除它。然后将业务数据使用 rsync 拷贝到 sda 中。 | |
101 | + | ||
102 | + | ## 第二阶段 - 暂停业务阶段 | |
103 | + | ||
104 | + | 将业务暂停,然后我最后运行一次rsync。这次rsync应该会跑得很快,因为只产生了增量数据差异。此时此刻,nvme2n1 (ext4)的数据,和 sda (bcache的后端)的数据是完全相同的。 | |
105 | + | ||
106 | + | ## 第三阶段 - 重构存储阶段 | |
107 | + | ||
108 | + | 将 nvme2n1 格式化。然后让它作为 bcache 的缓存端。再将得到的 bcache 虚拟盘,挂载到 /swarm-vol,实现业务无感。然后重启业务。 | |
109 | + | ||
110 | + | 注意:我没有任何额外的新空间可以用于备份!所以我的命令必须一次成功!一旦失败我们将万劫不复! | |
111 | + | ||
112 | + | ## 第一阶段 | |
113 | + | ||
114 | + | 接下来,我要开始第一阶段的迁移了。我第一阶段计划这么做: | |
115 | + | ||
116 | + | 目标 | |
117 | + | ||
118 | + | * 使用 make-bcache 将 /dev/sda 建立为 bcache 的后端(backing device)。 | |
119 | + | * 先不动现有 /dev/nvme2n1(现挂载于 /swarm-vol)上的业务数据,让业务继续运行。 | |
120 | + | * 格式化出的 /dev/bcache0 上创建一个文件系统(例如 ext4),然后将现有数据从 /swarm-vol 同步到这个新地方。 | |
121 | + | * 这是“第一阶段”,意在让 /dev/sda 上也有一份业务数据拷贝,从而腾出后续的操作空间。 | |
122 | + | ||
123 | + | 结果 | |
124 | + | ||
125 | + | * 最终会拥有两份数据: | |
126 | + | * 原始:/swarm-vol(在 /dev/nvme2n1 上) | |
127 | + | * 新的:/mnt/bcache(对应 /dev/bcache0,后端实际上是 /dev/sda) | |
128 | + | * 业务不中断 | |
129 | + | ||
130 | + | 我可以让服务继续使用 /swarm-vol,只要我在第一阶段只做数据拷贝、而不改动 /swarm-vol 自身。 | |
131 | + | 在第一阶段结束后,等我准备好,可以进入“第二阶段”短暂停机做增量 rsync 以及最终切换。 | |
132 | + | ||
133 | + | ```bash | |
134 | + | # 安装 bcache-tools | |
135 | + | sudo apt install bcache-tools | |
136 | + | ||
137 | + | # 仅示例,注意操作前先确认 /dev/sda 确实空置 | |
138 | + | # (在 fdisk 交互式命令中,删除旧分区、新建分区) | |
139 | + | sudo fdisk /dev/sda | |
140 | + | ||
141 | + | # 使用 wipefs 清除 sda 上的所有签名 | |
142 | + | sudo wipefs -a /dev/sda | |
143 | + | ||
144 | + | # 创建 bcache 后端 | |
145 | + | sudo make-bcache -B /dev/sda | |
146 | + | ||
147 | + | # 如果在 fdisk 里没有找到 /dev/bcache0,可以尝试 | |
148 | + | # 重新加载内核模块: | |
149 | + | sudo modprobe bcache | |
150 | + | ||
151 | + | # 如果还是没有,尝试手工创建 | |
152 | + | echo /dev/sda | sudo tee /sys/fs/bcache/register | |
153 | + | ||
154 | + | # 确认后端创建成功 | |
155 | + | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
156 | + | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
157 | + | # version: 1 | |
158 | + | # block_size: 1 | |
159 | + | # data_offset: 16 | |
160 | + | ||
161 | + | # 格式化后端 | |
162 | + | ls -ashl /dev/bcache0 | |
163 | + | sudo mkfs.ext4 /dev/bcache0 | |
164 | + | ||
165 | + | # 创建挂载点 | |
166 | + | sudo mkdir /mnt/bcache | |
167 | + | ||
168 | + | # 挂载 bcache 后端 | |
169 | + | sudo mount /dev/bcache0 /mnt/bcache | |
170 | + | ||
171 | + | # 确认挂载成功 | |
172 | + | cd /mnt/bcache | |
173 | + | ||
174 | + | # 确认挂载成功 | |
175 | + | df . -Th | |
176 | + | ||
177 | + | # (确认挂载成功后,开始 rsync) | |
178 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
179 | + | ||
180 | + | # 同步 nextcloud 文件夹 | |
181 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
182 | + | ``` | |
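
Before declaring Phase 1 done, it is worth confirming the copy actually completed. A hedged sketch: a dry run (`-n`) of the same rsync command should list little or nothing beyond files the still-running services have touched since the last pass:

```bash
# Compare used space on source and destination
df -h /swarm-vol /mnt/bcache

# Dry run: show what would still be transferred or deleted
sudo rsync -Aavx --update --delete -n /swarm-vol/ /mnt/bcache/
```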
183 | + | ||
184 | + | ## 第二阶段 - 暂停业务并做最终同步 | |
185 | + | ||
186 | + | 在这一阶段,我将: | |
187 | + | 1. 暂停业务,使其不再写入 `/swarm-vol`(也就是旧的 nvme2n1)。 | |
188 | + | 2. 做最后一次增量 rsync,保证数据在 /dev/bcache0(后端 sda)上与旧数据完全一致。 | |
189 | + | 3. 卸载旧的 `/swarm-vol`,改为挂载新的 `/dev/bcache0` 到 `/swarm-vol`,这样就完成了切换。 | |
190 | + | ||
191 | + | **示例脚本**(在生产环境中,请根据自己实际服务的暂停方式作相应调整): | |
192 | + | ||
193 | + | ```bash | |
194 | + | # 1) 暂停业务 | |
195 | + | echo "停止相关业务/服务 (示例:docker-compose 或 systemctl stop 等)" | |
196 | + | docker-compose down | |
197 | + | sudo reboot # 重启服务器,确保业务不再写入 | |
198 | + | ||
199 | + | # 2) 做最后一次增量同步 | |
200 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
201 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
202 | + | ||
203 | + | # 3) 切换挂载点 | |
204 | + | sudo umount /swarm-vol | |
205 | + | ||
206 | + | echo "将 bcache0 挂载为新的 /swarm-vol..." | |
207 | + | sudo mount /dev/bcache0 /swarm-vol | |
208 | + | ||
209 | + | echo "检查挂载..." | |
210 | + | df -Th /swarm-vol | |
211 | + | ||
212 | + | echo "请人工确认 /swarm-vol 中的数据完整性;若无误,可以继续。" | |
213 | + | ``` | |
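
Before unmounting the old volume, two extra checks can confirm it is safe to switch; a sketch using standard tools (`lsof` also appears further down in this document; the `+D` directory scan is just one way to invoke it):

```bash
# Nothing should still hold files under /swarm-vol once services are stopped
sudo lsof +D /swarm-vol

# Final dry run: should report no pending transfers or deletions
sudo rsync -Aavx --update --delete -n /swarm-vol/ /mnt/bcache/
```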
214 | + | ||
215 | + | 在执行完成后,`/swarm-vol` 已经切换到基于 `/dev/bcache0`(后端是 `/dev/sda`)的存储,业务就可以使用这套新存储。此时 `nvme2n1` 上的原有 ext4 数据已不再对外提供服务,但仍在物理上保留(尚未被清空)。 | |
216 | + | ||
217 | + | --- | |
218 | + | ||
219 | + | ## 第三阶段 - 将原 nvme2n1 作为 bcache 缓存设备 | |
220 | + | ||
221 | + | 在这一阶段,我将: | |
222 | + | 1. 确认 `/swarm-vol` 已经切换成功、业务运行正常且数据安全无误。 | |
223 | + | 2. 清空并格式化原本的 `nvme2n1` 为 bcache 缓存盘。 | |
224 | + | 3. 将缓存盘附加到已经存在的 bcache 后端(即 `/dev/sda`)上,使两者变为真正的 “大容量 + SSD 缓存” 组合。 | |
225 | + | 4. 根据需求,启用写回缓存(writeback)等激进模式。 | |
226 | + | ||
227 | + | **示例脚本**: | |
228 | + | ||
229 | + | ```bash | |
230 | + | # 1) 确认当前 /swarm-vol 已经是 /dev/bcache0,且业务正常 | |
231 | + | # (需人工自行验证,确认数据已在 /dev/sda + /dev/bcache0 上) | |
232 | + | # 此时可以停一下业务,或保持低负载也行,避免写入影响。 | |
233 | + | ||
234 | + | # 2) 清空 nvme2n1 (原来的 /swarm-vol) 注意,这将销毁原数据! | |
235 | + | echo "准备清空 /dev/nvme2n1..." | |
236 | + | sudo umount /dev/nvme2n1 || true # 若尚未卸载,可忽略报错 | |
237 | + | sudo wipefs -a /dev/nvme2n1 | |
238 | + | ||
239 | + | # 3) 将 nvme2n1 作为缓存盘初始化 | |
240 | + | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
241 | + | #在这个例子里,默认的block大小是512B、bucket大小是128kB。block的大小应该与后端设备的sector大小匹配(通常是512或者4k)。bucket的大小应该与缓存设备的擦除block大小匹配(以减少写入放大)。例如,如果是一个4k sector的HDD和一个擦除block大小是2MB的SSD搭配,命令就应该是这样的: | |
242 | + | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
243 | + | # 如果你需要查看 /dev/sda (也就是后端)的 block size,可以使用 fdisk -l /dev/sda 等命令。 | |
244 | + | # 如果你需要查看 /dev/nvme2n1 的擦除块大小,可以使用 nvme id-ns /dev/nvme2n1 等命令。一般是 4M | |
245 | + | sudo make-bcache --block 512 --bucket 4M -C /dev/nvme2n1 | |
246 | + | ||
247 | + | echo "检查生成的缓存盘信息..." | |
248 | + | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
249 | + | ||
250 | + | # 假设输出中 cset.uuid (或 dev.uuid) 为 11111111-2222-3333-4444-555555555555 | |
251 | + | # (这里仅演示,我需要看实际输出) | |
252 | + | ||
253 | + | CACHE_UUID="(此处填上实际的 cset.uuid)" | |
254 | + | ||
255 | + | # 4) 将缓存设备附加到现有的 /dev/bcache0(后端 /dev/sda) | |
256 | + | # /dev/bcache0 的 sysfs 路径可通过 ls /sys/block/bcache0/bcache 等命令确认 | |
257 | + | echo "附加缓存到现有 bcache 后端..." | |
258 | + | echo "$CACHE_UUID" | sudo tee /sys/block/bcache0/bcache/attach | |
259 | + | ||
260 | + | # 如果我看到 echo: write error: Invalid argument,通常是 block size 不匹配等问题 | |
261 | + | # 如果成功,则 /sys/block/bcache0/bcache/cache_mode 等节点应该出现 | |
262 | + | ||
263 | + | # 5) 为 bcache0 启用写回缓存模式(可选) | |
264 | + | echo "启用写回 (writeback) 缓存模式..." | |
265 | + | echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode | |
266 | + | ||
267 | + | # 可选:关闭顺序IO绕过等更激进的做法 | |
268 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff | |
269 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/writeback_percent | |
270 | + | ||
271 | + | # 6) 确认缓存已生效 | |
272 | + | echo "确认 /dev/bcache0 依旧正常挂载在 /swarm-vol,并检查 sysfs 等信息:" | |
273 | + | mount | grep /swarm-vol | |
274 | + | ls -l /sys/block/bcache0/bcache | |
275 | + | ``` | |
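
A short verification that the cache really attached; a sketch, reusing the sysfs nodes and `bcache-super-show` output referenced above:

```bash
# The cache link under the bcache device should appear after a successful attach
ls /sys/block/bcache0/bcache/cache

# State should change from "no cache" to "clean" (or "dirty" once writeback starts)
cat /sys/block/bcache0/bcache/state

# Both superblocks should now report the same cset.uuid
sudo bcache-super-show /dev/sda | grep cset.uuid
sudo bcache-super-show /dev/nvme2n1 | grep cset.uuid
```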
276 | + | ||
277 | + | 至此,我已经完成了将旧的 nvme2n1 转变为 bcache 缓存设备的操作,并和 `/dev/sda` 组合为统一的逻辑卷 `/dev/bcache0`。接下来的要点包括: | |
278 | + | ||
279 | + | 1. **开机自动挂载** | |
280 | + | - 通常推荐在 `/etc/fstab` 中写入对 `/dev/bcache0` 的挂载。 | |
281 | + | - 同时需要注意在 initramfs 阶段加载 bcache 模块、或者确保 `bcache-tools` 的 udev 规则可以自动将 cache attach 到 backing device(以免重启后没了 /dev/bcache0)。在 Ubuntu 下,一般可通过 `sudo update-initramfs -u` 并检查 `/lib/udev/rules.d/69-bcache.rules` 等来确认。 | |
282 | + | ||
283 | + | 在 `/etc/fstab` 中添加: | |
284 | + | ||
285 | + | ```bash | |
286 | + | # 删除旧的 /swarm-vol 挂载 | |
287 | + | # /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
288 | + | # 然后添加新的 /swarm-vol 挂载 | |
289 | + | /dev/bcache0 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
290 | + | ``` | |
291 | + | ||
292 | + | 2. **确认写回模式的风险** | |
293 | + | - 写回模式(writeback)可以大幅提高速度,但在缓存盘掉电或故障时会丢失尚未写入后端的脏数据。既然我提到 SSD 质量较好,且并不特别在意短期丢失风险,可以大胆使用。 | |
294 | + | ||
295 | + | 3. **调优与监控** | |
296 | + | - 适当调节 `writeback_percent`、`sequential_cutoff` 等 sysfs 参数可以获得性能与风险的平衡。 | |
297 | + | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
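
Following up on points 1 and 3 above, a hedged sketch of the boot-persistence check and a simple monitoring loop (all paths are the ones already used in this document):

```bash
# Boot persistence: rebuild the initramfs and confirm the bcache udev rule is present
sudo update-initramfs -u
ls -l /lib/udev/rules.d/69-bcache.rules

# Monitoring: today's cache hit ratio and outstanding dirty data, refreshed every 5s
watch -n 5 'cat /sys/block/bcache0/bcache/cache/stats_day/cache_hit_ratio /sys/block/bcache0/bcache/dirty_data'
```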
298 | + | ||
299 | + | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
300 | + | ||
301 | + | 使用下面的命令检查其状态: | |
302 | + | ||
303 | + | ```bash | |
1 | 304 | Related articles end}} | |
2 | 305 | ||
3 | 306 | [https://bcache.evilpiepirate.org/ Bcache] (block cache) allows one to use an SSD as a read/write cache (in writeback mode) or read cache (writethrough or writearound) for another blockdevice (generally a rotating HDD or array). This article will show how to install Arch using Bcache as the root partition. For an intro to bcache itself, see [https://bcache.evilpiepirate.org/ the bcache homepage]. Be sure to read and reference [https://docs.kernel.org/admin-guide/bcache.html the bcache manual]. | |
@@ -638,6 +941,163 @@ ls -l /sys/block/bcache0/bcache | |||
638 | 941 | ||
639 | 942 | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
640 | 943 | ||
944 | + | 使用下面的命令检查其状态: | |
945 | + | ||
946 | + | ```bash | |
947 | + | anduin@ms-server:/sys/block/bcache0/bcache$ ls | |
948 | + | attach dirty_data sequential_cutoff stripe_size writeback_rate_fp_term_low | |
949 | + | backing_dev_name io_disable state writeback_consider_fragment writeback_rate_fp_term_mid | |
950 | + | backing_dev_uuid io_error_limit stats_day writeback_delay writeback_rate_i_term_inverse | |
951 | + | cache io_errors stats_five_minute writeback_metadata writeback_rate_minimum | |
952 | + | cache_mode label stats_hour writeback_percent writeback_rate_p_term_inverse | |
953 | + | clear_stats partial_stripes_expensive stats_total writeback_rate writeback_rate_update_seconds | |
954 | + | detach readahead_cache_policy stop writeback_rate_debug writeback_running | |
955 | + | dev running stop_when_cache_set_failed writeback_rate_fp_term_high | |
956 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./running | |
957 | + | 1 | |
958 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./state | |
959 | + | dirty | |
960 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./dirty_data | |
961 | + | 775.9M | |
962 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./writeback_running | |
963 | + | 1 | |
964 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./backing_dev_name | |
965 | + | sda | |
966 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cat ./cache_mode | |
967 | + | writethrough [writeback] writearound none | |
968 | + | anduin@ms-server:/sys/block/bcache0/bcache$ cd ./cache | |
969 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ ls | |
970 | + | average_key_size bucket_size congested flash_vol_create journal_delay_ms stats_hour tree_depth | |
971 | + | bdev0 cache0 congested_read_threshold_us internal root_usage_percent stats_total unregister | |
972 | + | block_size cache_available_percent congested_write_threshold_us io_error_halflife stats_day stop | |
973 | + | btree_cache_size clear_stats errors io_error_limit stats_five_minute synchronous | |
974 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cat ./errors | |
975 | + | [unregister] panic | |
976 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cat ./bucket_size | |
977 | + | 512.0k | |
978 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cat ./block_size | |
979 | + | 0.5k | |
980 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache$ cd ./stats_day/ | |
981 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ ls | |
982 | + | bypassed cache_bypass_hits cache_bypass_misses cache_hit_ratio cache_hits cache_miss_collisions cache_misses | |
983 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cat ./cache_hit_ratio | |
984 | + | 4 | |
985 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cat ./cache_hits | |
986 | + | 11611 | |
987 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cat ./cache_misses | |
988 | + | 269927 | |
989 | + | anduin@ms-server:/sys/block/bcache0/bcache/cache/stats_day$ cd /swarm-vol/ | |
990 | + | anduin@ms-server:/swarm-vol$ df . -Th | |
991 | + | Filesystem Type Size Used Avail Use% Mounted on | |
992 | + | /dev/bcache0 ext4 58T 6.7T 49T 13% /swarm-vol | |
993 | + | ||
994 | + | ``` | |
995 | + | ||
996 | + | # If unable to run `wipefs` on a device due to `Device or resource busy` error | |
997 | + | ||
998 | + | The error `Device or resource busy` indicates that the device `/dev/nvme1n1` is currently in use, preventing `wipefs` from accessing it. To resolve this, you need to ensure that no processes or mount points are actively using the device. Here are some steps to identify and resolve the issue: | |
999 | + | ||
1000 | + | --- | |
1001 | + | ||
1002 | + | ### **1. Check if the device is mounted** | |
1003 | + | Run: | |
1004 | + | ```bash | |
1005 | + | mount | grep /dev/nvme1n1 | |
1006 | + | ``` | |
1007 | + | If it is mounted, unmount it: | |
1008 | + | ```bash | |
1009 | + | sudo umount /dev/nvme1n1 | |
1010 | + | ``` | |
1011 | + | ||
1012 | + | --- | |
1013 | + | ||
1014 | + | ### **2. Check for active partitions** | |
1015 | + | If any partitions on `/dev/nvme1n1` are in use, they need to be unmounted: | |
1016 | + | ```bash | |
1017 | + | lsblk | |
1018 | + | ``` | |
1019 | + | Unmount active partitions: | |
1020 | + | ```bash | |
1021 | + | sudo umount /dev/nvme1n1pX # Replace "X" with the partition number | |
1022 | + | ``` | |
1023 | + | ||
1024 | + | --- | |
1025 | + | ||
1026 | + | ### **3. Check for `bcache` association** | |
1027 | + | The presence of `bcache0` suggests `bcache` is in use. Verify: | |
1028 | + | ```bash | |
1029 | + | sudo bcache-super-show /dev/nvme1n1 | |
1030 | + | ``` | |
1031 | + | If it is associated, unregister it: | |
1032 | + | ```bash | |
1033 | + | echo 1 | sudo tee /sys/block/bcacheX/bcache/stop # Replace "bcacheX" appropriately | |
1034 | + | ``` | |
1035 | + | Clear the `bcache` superblock: | |
1036 | + | ```bash | |
1037 | + | sudo wipefs -a /dev/nvme1n1 | |
1038 | + | ``` | |
1039 | + | ||
1040 | + | --- | |
1041 | + | ||
1042 | + | ### **4. Check for swap usage** | |
1043 | + | If the device is used as swap: | |
1044 | + | ```bash | |
1045 | + | cat /proc/swaps | |
1046 | + | sudo swapoff /dev/nvme1n1 | |
1047 | + | ``` | |
1048 | + | ||
1049 | + | --- | |
1050 | + | ||
1051 | + | ### **5. Check for open file descriptors** | |
1052 | + | List processes using the device: | |
1053 | + | ```bash | |
1054 | + | sudo lsof | grep /dev/nvme1n1 | |
1055 | + | ``` | |
1056 | + | Kill the processes if necessary: | |
1057 | + | ```bash | |
1058 | + | sudo kill -9 <PID> | |
1059 | + | ``` | |
1060 | + | ||
1061 | + | --- | |
1062 | + | ||
1063 | + | ### **6. Retry `wipefs`** | |
1064 | + | Once the device is no longer in use, retry: | |
1065 | + | ```bash | |
1066 | + | sudo wipefs -a /dev/nvme1n1 | |
1067 | + | ``` | |
1068 | + | ||
1069 | + | If issues persist, let me know the exact usage scenario, and I can assist further! | |
1070 | + | ||
1071 | + | ## If the bcache device does not show up in fdisk | |
1072 | + | ||
1073 | + | First, try some clean-up: | |
1074 | + | ||
1075 | + | ```bash | |
1076 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
1077 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
1078 | + | ``` | |
1079 | + | ||
1080 | + | Use bcache-super-show to get the uuids. | |
1081 | + | ||
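
A sketch of pulling those uuids out of the superblocks (the device names are the ones used in this setup and are assumptions; adjust as needed):

```bash
# cset.uuid comes from the cache device, dev.uuid from the backing device
cset_uuid=$(sudo bcache-super-show /dev/nvme2n1 | awk '/cset.uuid/ {print $2}')
backing_uuid=$(sudo bcache-super-show /dev/sda | awk '/dev.uuid/ {print $2}')
echo "cset: $cset_uuid  backing: $backing_uuid"
```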
1082 | + | Then try again to register: | |
1083 | + | ||
1084 | + | ```bash | |
1085 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/register | |
1086 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/register | |
1087 | + | ``` | |
1088 | + | ||
1089 | + | The cache uuid should exist in /dev/fs/bcache if the cache device is successfully registered. | |
1090 | + | ||
1091 | + | If bcache-super-show says that that the backing dev.data.cache_state state is clean and the cset.uuid consists only of zeros, the bcache device is in the invalid state and must be recreated. [source] | |
1092 | + | ||
1093 | + | However, if the state is clean, you can try force-starting the backing device without the cache device:
1094 | + | ||
1095 | + | ```bash
1096 | + | echo 1 | sudo tee /sys/class/block/$dev/bcache/running
1097 | + | ```
1098 | + |
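Afterwards the backing device should come up and `/dev/bcache0` should reappear; a quick check (with `$dev` set to the backing device name, as above):

```bash
cat /sys/class/block/$dev/bcache/state   # expect "no cache" when running without a cache
ls -l /dev/bcache0
```
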
641 | 1101 | # If unable to run `wipefs` on a device due to `Device or resource busy` error | |
642 | 1102 | ||
643 | 1103 | The error `Device or resource busy` indicates that the device `/dev/nvme1n1` is currently in use, preventing `wipefs` from accessing it. To resolve this, you need to ensure that no processes or mount points are actively using the device. Here are some steps to identify and resolve the issue: |
anduin revised this gist . Go to revision
1 file changed, 116 insertions, 6 deletions
plan.md
@@ -484,6 +484,13 @@ sudo wipefs -a /dev/sda | |||
484 | 484 | # 创建 bcache 后端 | |
485 | 485 | sudo make-bcache -B /dev/sda | |
486 | 486 | ||
487 | + | # 如果在 fdisk 里没有找到 /dev/bcache0,可以尝试 | |
488 | + | # 重新加载内核模块: | |
489 | + | sudo modprobe bcache | |
490 | + | ||
491 | + | # 如果还是没有,尝试手工创建 | |
492 | + | echo /dev/sda | sudo tee /sys/fs/bcache/register   # 注意:sudo echo ... > 不会以 root 权限写入,需改用 tee
493 | + | ||
487 | 494 | # 确认后端创建成功 | |
488 | 495 | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
489 | 496 | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
@@ -514,10 +521,6 @@ sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |||
514 | 521 | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/
515 | 522 | ``` | |
516 | 523 | ||
517 | - | 下面给出示例脚本,供我在“第二阶段”和“第三阶段”中参考使用。思路与第一阶段相同:谨慎操作、一次成功,避免数据丢失。 | |
518 | - | ||
519 | - | --- | |
520 | - | ||
521 | 524 | ## 第二阶段 - 暂停业务并做最终同步 | |
522 | 525 | ||
523 | 526 | 在这一阶段,我将: | |
@@ -575,9 +578,11 @@ sudo wipefs -a /dev/nvme2n1 | |||
575 | 578 | ||
576 | 579 | # 3) 将 nvme2n1 作为缓存盘初始化 | |
577 | 580 | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
578 | - | sudo make-bcache -C /dev/nvme2n1 | |
579 | - | # 若有需要,可以带上 --block/--bucket 参数。例如: | |
581 | + | #在这个例子里,默认的block大小是512B、bucket大小是128kB。block的大小应该与后端设备的sector大小匹配(通常是512或者4k)。bucket的大小应该与缓存设备的擦除block大小匹配(以减少写入放大)。例如,如果是一个4k sector的HDD和一个擦除block大小是2MB的SSD搭配,命令就应该是这样的: | |
580 | 582 | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
583 | + | # 如果你需要查看 /dev/sda (也就是后端)的 block size,可以使用 fdisk -l /dev/sda 等命令。 | |
584 | + | # 如果你需要查看 /dev/nvme2n1 的擦除块大小,可以使用 nvme id-ns /dev/nvme2n1 等命令。一般是 4M | |
585 | + | sudo make-bcache --block 512 --bucket 4M -C /dev/nvme2n1 | |
581 | 586 | ||
582 | 587 | echo "检查生成的缓存盘信息..." | |
583 | 588 | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
@@ -632,3 +637,108 @@ ls -l /sys/block/bcache0/bcache | |||
632 | 637 | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
633 | 638 | ||
634 | 639 | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 | |
640 | + | ||
641 | + | # If unable to run `wipefs` on a device due to `Device or resource busy` error | |
642 | + | ||
643 | + | The error `Device or resource busy` indicates that the device `/dev/nvme1n1` is currently in use, preventing `wipefs` from accessing it. To resolve this, you need to ensure that no processes or mount points are actively using the device. Here are some steps to identify and resolve the issue: | |
644 | + | ||
645 | + | --- | |
646 | + | ||
647 | + | ### **1. Check if the device is mounted** | |
648 | + | Run: | |
649 | + | ```bash | |
650 | + | mount | grep /dev/nvme1n1 | |
651 | + | ``` | |
652 | + | If it is mounted, unmount it: | |
653 | + | ```bash | |
654 | + | sudo umount /dev/nvme1n1 | |
655 | + | ``` | |
656 | + | ||
657 | + | --- | |
658 | + | ||
659 | + | ### **2. Check for active partitions** | |
660 | + | If any partitions on `/dev/nvme1n1` are in use, they need to be unmounted: | |
661 | + | ```bash | |
662 | + | lsblk | |
663 | + | ``` | |
664 | + | Unmap active partitions: | |
665 | + | ```bash | |
666 | + | sudo umount /dev/nvme1n1pX # Replace "X" with the partition number | |
667 | + | ``` | |
668 | + | ||
669 | + | --- | |
670 | + | ||
671 | + | ### **4. Check for `bcache` association** | |
672 | + | The presence of `bcache0` suggests `bcache` is in use. Verify: | |
673 | + | ```bash | |
674 | + | sudo bcache-super-show /dev/nvme1n1 | |
675 | + | ``` | |
676 | + | If it is associated, unregister it: | |
677 | + | ```bash | |
678 | + | echo 1 | sudo tee /sys/block/bcacheX/bcache/stop # Replace "bcacheX" appropriately | |
679 | + | ``` | |
680 | + | Clear the `bcache` superblock: | |
681 | + | ```bash | |
682 | + | sudo wipefs -a /dev/nvme1n1 | |
683 | + | ``` | |
684 | + | ||
685 | + | --- | |
686 | + | ||
687 | + | ### **5. Check for swap usage** | |
688 | + | If the device is used as swap: | |
689 | + | ```bash | |
690 | + | cat /proc/swaps | |
691 | + | sudo swapoff /dev/nvme1n1 | |
692 | + | ``` | |
693 | + | ||
694 | + | --- | |
695 | + | ||
696 | + | ### **6. Check for open file descriptors** | |
697 | + | List processes using the device: | |
698 | + | ```bash | |
699 | + | sudo lsof | grep /dev/nvme1n1 | |
700 | + | ``` | |
701 | + | Kill the processes if necessary: | |
702 | + | ```bash | |
703 | + | sudo kill -9 <PID> | |
704 | + | ``` | |
705 | + | ||
706 | + | --- | |
707 | + | ||
708 | + | ### **7. Retry `wipefs`** | |
709 | + | Once the device is no longer in use, retry: | |
710 | + | ```bash | |
711 | + | sudo wipefs -a /dev/nvme1n1 | |
712 | + | ``` | |
713 | + | ||
714 | + | If issues persist, let me know the exact usage scenario, and I can assist further! | |
715 | + | ||
716 | + | ## If bcache device not showing up on fdisk | |
717 | + | ||
718 | + | 2 | |
719 | + | ||
720 | + | First, try some clean-up: | |
721 | + | ||
722 | + | ``` | |
723 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
724 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/pendings_cleanup | |
725 | + | ``` | |
726 | + | ||
727 | + | Use bcache-super-show to get the uuids. | |
728 | + | ||
729 | + | Then try again to register: | |
730 | + | ||
731 | + | ```bash | |
732 | + | echo $cset_uuid | sudo tee /sys/fs/bcache/register | |
733 | + | echo $backing_uuid | sudo tee /sys/fs/bcache/register | |
734 | + | ``` | |
735 | + | ||
736 | + | The cache uuid should exist in /dev/fs/bcache if the cache device is successfully registered. | |
737 | + | ||
738 | + | If bcache-super-show says that that the backing dev.data.cache_state state is clean and the cset.uuid consists only of zeros, the bcache device is in the invalid state and must be recreated. [source] | |
739 | + | ||
740 | + | However, if clean, you could try force-starting the backing device without cache device: | |
741 | + | ||
742 | + | ```bash | |
743 | + | echo 1 | sudo tee /sys/class/block/$dev/bcache/running | |
744 | + | ``` |
anduin revised this gist . Go to revision
1 file changed, 634 insertions
plan.md(file created)
@@ -0,0 +1,634 @@ | |||
1 | + | Related articles end}} | |
2 | + | ||
3 | + | [https://bcache.evilpiepirate.org/ Bcache] (block cache) allows one to use an SSD as a read/write cache (in writeback mode) or read cache (writethrough or writearound) for another blockdevice (generally a rotating HDD or array). This article will show how to install Arch using Bcache as the root partition. For an intro to bcache itself, see [https://bcache.evilpiepirate.org/ the bcache homepage]. Be sure to read and reference [https://docs.kernel.org/admin-guide/bcache.html the bcache manual]. | |
4 | + | ||
5 | + | {{Tip|An alternative to Bcache is the [[LVM#Cache|LVM cache]].}} | |
6 | + | ||
7 | + | Bcache needs the backing device to be formatted as a bcache block device. In most cases, [https://github.com/g2p/blocks blocks to-bcache] can do an in-place conversion. | |
8 | + | ||
9 | + | {{Out of date|Any source for bcache with btrfs causing corruption in 2024? The linked blog has no extra details }} | |
10 | + | ||
11 | + | {{Warning|1=<nowiki/> | |
12 | + | * Be sure you back up any important data first. | |
13 | + | * Bcache and [[btrfs]] could leave you with a corrupted filesystem. Please visit [https://www.hdevalence.ca/blog/2013-09-21-notes-on-my-archlinux-install this post] for more information. Btrfs wiki reports that it was fixed in kernels 3.19+ [https://btrfs.wiki.kernel.org/index.php/Gotchas#Historical_references]. | |
14 | + | }} | |
15 | + | ||
16 | + | == Setting up bcached btrfs file systems on an existing system == | |
17 | + | ||
18 | + | {{Warning|make-bcache '''will not''' import an existing drive or partition – it will reformat it.}} | |
19 | + | ||
20 | + | === Preparation === | |
21 | + | ||
22 | + | [[Install]] {{AUR|bcache-tools}}. | |
23 | + | ||
24 | + | Use fdisk to create the appropriate partitions on the SSD's and hard drives to hold the cache and the backing data. | |
25 | + | {{Tip| It is possible to create many partitions on a single drive. This allows for testing of elaborate setups before committing. Be aware all data will be lost when the drive fails. This will also kill performance of the drive, due to unfavorable access patterns.}} | |
26 | + | ||
27 | + | === Situation: 1 hard drive and 1 read cache SSD === | |
28 | + | ||
29 | + | {{Warning| | |
30 | + | * When a single hard drive fails, all data is lost. | |
31 | + | * Do not enable write caching, as that can cause data loss when the SSD fails | |
32 | + | }} | |
33 | + | +--------------+ | |
34 | + | | btrfs /mnt | | |
35 | + | +--------------+ | |
36 | + | | /dev/Bcache0 | | |
37 | + | +--------------+ | |
38 | + | | Cache | | |
39 | + | | /dev/sdk1 | | |
40 | + | +--------------+ | |
41 | + | | Data | | |
42 | + | | /dev/sdv1 | | |
43 | + | +--------------+ | |
44 | + | ||
45 | + | 1. Format the backing device (This will typically be your mechanical drive). The backing device can be a whole device, a partition or any other standard block device. This will create /dev/bcache0 | |
46 | + | ||
47 | + | # make-bcache -B /dev/sdv1 | |
48 | + | ||
49 | + | 2. Format the cache device (This will typically be your SSD). The cache device can be a whole device, a partition or any other standard block device | |
50 | + | ||
51 | + | # make-bcache -C /dev/sdk1 | |
52 | + | ||
53 | + | In this example the default block and bucket sizes of 512B and 128kB are used. The block size should match the backing devices sector size which will usually be either 512 or 4k. The bucket size should match the erase block size of the caching device with the intent of reducing write amplification. For example, using a HDD with 4k sectors and an SSD with an erase block size of 2MB this command would look like | |
54 | + | ||
55 | + | # make-bcache --block 4k --bucket 2M -C /dev/sdk1 | |
56 | + | ||
57 | + | {{Note|You may need to omit the {{ic|--block 4k}} option, see [https://unix.stackexchange.com/questions/359508/cannot-attach-cache-device-to-backing-device Cannot attach cache device to backing device].}} | |
58 | + | ||
59 | + | 3. Get the uuid of the cache device | |
60 | + | ||
61 | + | # bcache-super-show /dev/sdk1 | grep cset | |
62 | + | cset.uuid f0e01318-f4fd-4fab-abbb-d76d870503ec | |
63 | + | ||
64 | + | 4. Register the cache device against your backing device. Replace the example uuid with the uuid of your cache. Udev rules will take care of this on reboot and will only need to be done once. | |
65 | + | ||
66 | + | # echo f0e01318-f4fd-4fab-abbb-d76d870503ec > /sys/block/bcache0/bcache/attach | |
67 | + | ||
68 | + | 5. Create the btrfs filesystem. | |
69 | + | ||
70 | + | # mkfs.btrfs /dev/bcache0 | |
71 | + | ||
72 | + | 6. mount the filesystem | |
73 | + | ||
74 | + | # mount /dev/bcache0 /mnt | |
75 | + | ||
76 | + | 7. If you want to have this partition available during the initcpio (i.e. you require it at some point in the boot process) you need to add 'bcache' to your modules array in /etc/mkinitcpio.conf as well as adding the 'bcache' hook in your list between block and filesystems. You must then [[regenerate the initramfs]]. | |
77 | + | ||
78 | + | === Situation: Prevent all write access to a HDD === | |
79 | + | {{Warning| | |
80 | + | * When the hard drive or the SSD fails, all data is lost. | |
81 | + | * Consider using BTRFS RAID to prevent data loss when a SSD / HDD fails. | |
82 | + | }} | |
83 | + | In this situation the goal is to keep the HDD idle as long as possible. This is achieved by absorbing all writes with the SSD. The hard drive is only activated when the SSD is full, or when something is read that's not on the SSD. | |
84 | + | ||
85 | + | Enable the writeback cache mode: | |
86 | + | ||
87 | + | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
88 | + | ||
89 | + | Let bcache completely sync with the hard drive. | |
90 | + | ||
91 | + | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
92 | + | ||
93 | + | Don't let sequential IO bypass the cache: | |
94 | + | ||
95 | + | # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff | |
96 | + | ||
97 | + | Let bcache wait a week after the previous sync is done: | |
98 | + | ||
99 | + | # echo $((7*24*60*60)) > /sys/block/bcache0/bcache/writeback_delay | |
100 | + | ||
101 | + | Don't let bcache go around the cache when there's read / write congestion | |
102 | + | ||
103 | + | # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us | |
104 | + | # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us | |
105 | + | ||
106 | + | Put the HDD to sleep after 20 minutes: | |
107 | + | # hdparm -S 240 /dev/$(cat /sys/block/bcache0/bcache/backing_dev_name) | |
108 | + | /dev/sdh1: | |
109 | + | setting standby to 240 (20 minutes) | |
110 | + | ||
111 | + | ||
112 | + | First use lsblk to get the device names of the HDD and SSD. In this example /dev/sdh1 is the HDD, /dev/sdc1 is the SSD: | |
113 | + | ||
114 | + | # lsblk -M -s | |
115 | + | bcache0 254:0 0 931.5G 0 disk | |
116 | + | ├─sdc1 8:33 0 111.8G 0 part | |
117 | + | │ └─sdc 8:32 0 111.8G 0 disk | |
118 | + | └─sdh1 8:113 0 931.5G 0 part | |
119 | + | └─sdh 8:112 0 931.5G 0 disk | |
120 | + | ||
121 | + | Now Dstat can be used to monitor disk access to the members of the bcache set. | |
122 | + | ||
123 | + | $ dstat -D sdc1,sdh1 | |
124 | + | ||
125 | + | == Advanced operations == | |
126 | + | ||
127 | + | === Resize backing device === | |
128 | + | ||
129 | + | It is possible to resize the backing device so long as you do not move the partition start. This process is described in [https://lore.kernel.org/linux-bcache/CAH+dOxJv-ajvLfbUSo8dqG0a8_grNBhfxJ1EbmSrYZz0YXJM2w@mail.gmail.com/T/ the mailing list]. Here is an example using btrfs volume directly on bcache0. For LVM containers or for other filesystems, procedure will differ. | |
130 | + | ||
131 | + | ==== Example of growing ==== | |
132 | + | ||
133 | + | In this example, I grow the filesystem by 4GB. | |
134 | + | ||
135 | + | 1. Reboot to a live CD/USB Drive (need not be bcache enabled) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G larger. | |
136 | + | ||
137 | + | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
138 | + | ||
139 | + | 2. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum. For btrfs, that is | |
140 | + | ||
141 | + | # btrfs filesystem resize max / | |
142 | + | ||
143 | + | For ext3/4, that is: | |
144 | + | ||
145 | + | # resize2fs /dev/bcache0 | |
146 | + | ||
147 | + | ==== Example of shrinking ==== | |
148 | + | ||
149 | + | In this example, I shrink the filesystem by 4GB. | |
150 | + | ||
151 | + | 1. Disable writeback cache (switch to writethrough cache) and wait for the disk to flush. | |
152 | + | ||
153 | + | # echo writethrough > /sys/block/bcache0/bcache/cache_mode | |
154 | + | $ watch cat /sys/block/bcache0/bcache/state | |
155 | + | ||
156 | + | wait until state reports "clean". This might take a while. | |
157 | + | ||
158 | + | ===== Force flush of cache to backing device ===== | |
159 | + | ||
160 | + | I suggest to use | |
161 | + | ||
162 | + | # echo 0 > /sys/block/bcache0/bcache/writeback_percent | |
163 | + | ||
164 | + | This will flush the dirty data of the cache to the backing device in less a minute. | |
165 | + | ||
166 | + | Revert back the value after with | |
167 | + | ||
168 | + | # echo 10 > /sys/block/bcache0/bcache/writeback_percent | |
169 | + | ||
170 | + | 2. Shrink the mounted filesystem by something more than the desired amount, to ensure we do not accidentally clip it later. For btrfs, that is: | |
171 | + | ||
172 | + | # btrfs filesystem resize -5G / | |
173 | + | ||
174 | + | For ext3/4 you can use ''resize2fs'', but only if the partition is unmounted | |
175 | + | ||
176 | + | {{hc|$ df -h /home| | |
177 | + | /dev/bcache0 290G 20G 270G 1% /home | |
178 | + | }} | |
179 | + | ||
180 | + | # umount /home | |
181 | + | # resize2fs /dev/bcache0 283G | |
182 | + | ||
183 | + | 3. Reboot to a LiveCD/USB drive (does not need to support bcache) and use fdisk, gdisk, parted, or your other favorite tool to delete the backing partition and recreate it with the same start and a total size 4G smaller. | |
184 | + | ||
185 | + | {{Warning|Do not use a tool like GParted that might perform filesystem operations! It will not recognize the bcache partition and might overwrite part of it!!}} | |
186 | + | ||
187 | + | 4. Reboot to your normal install. Your filesystem will be currently mounted. That is fine. Issue the command to resize the partition to its maximum (that is, the size we shrunk the actual partition to in step 3). For btrfs, that is: | |
188 | + | ||
189 | + | # btrfs filesystem resize max / | |
190 | + | ||
191 | + | For ext3/4, that is: | |
192 | + | ||
193 | + | # resize2fs /dev/bcache0 | |
194 | + | ||
195 | + | 5. Re-enable writeback cache if you want that enabled: | |
196 | + | ||
197 | + | # echo writeback > /sys/block/bcache0/bcache/cache_mode | |
198 | + | ||
199 | + | {{Note|If you are very careful you can shrink the filesystem to the exact size in step 2 and avoid step 4. Be careful, though, many partition tools do not do exactly what you want, but instead adjust the requested partition start/end points to end on sector boundaries. This may be difficult to calculate ahead of time}} | |
200 | + | ||
201 | + | == Troubleshooting == | |
202 | + | ||
203 | + | === /dev/bcache device does not exist on bootup === | |
204 | + | ||
205 | + | If you are sent to a busy box shell with an error: | |
206 | + | ||
207 | + | {{bc|1= | |
208 | + | ERROR: Unable to find root device 'UUID=b6b2d82b-f87e-44d5-bbc5-c51dd7aace15'. | |
209 | + | You are being dropped to a recovery shell | |
210 | + | Type 'exit' to try and continue booting | |
211 | + | }} | |
212 | + | ||
213 | + | This might happen if the backing device is configured for "writeback" mode (default is writearound). When in "writeback" mode, the /dev/bcache0 device is not started until the cache device is both registered and attached. Registering is something that needs to happen every bootup, but attaching should only have to be done once. | |
214 | + | ||
215 | + | To continue booting, try one of the following: | |
216 | + | ||
217 | + | * Register both the backing device and the caching device | |
218 | + | ||
219 | + | # echo /dev/sda3 > /sys/fs/bcache/register | |
220 | + | # echo /dev/sdb > /sys/fs/bcache/register | |
221 | + | ||
222 | + | If the /dev/bcache0 device now exists, type exit and continue booting. You will need to fix your initcpio to ensure devices are registered before mounting the root device. | |
223 | + | ||
224 | + | {{Note| | |
225 | + | * An error of "sh: echo: write error: Invalid argument" means the device was already registered or is not recognized as either a bcache backing device or cache. If using the udev rule on boot it should only attempt to register a device if it finds a bcache superblock | |
226 | + | * This can also happen if using udev's 69-bcache.rules in Installation's step 7 and blkid and bcache-probe "disagree" due to rogue superblocks. See [https://bcache.evilpiepirate.org/#index6h1 bcache's wiki] for a possible explanation/resolution. | |
227 | + | }} | |
228 | + | ||
229 | + | * Re-attach the cache to the backing device: | |
230 | + | ||
231 | + | If the cache device was registered, a folder with the UUID of the cache should exist in {{ic|/sys/fs/bcache}}. Use that UUID when following the example below: | |
232 | + | ||
233 | + | {{hc|# ls /sys/fs/bcache/| | |
234 | + | b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 register register_quiet | |
235 | + | }} | |
236 | + | ||
237 | + | # echo b6b2d82b-f87e-44d5-bbc5-c51dd7aace15 > /sys/block/sda/sda3/bcache/attach | |
238 | + | ||
239 | + | If the {{ic|/dev/bcache0}} device now exists, type exit and continue booting. You should not have to do this again. If it persists, ask on the bcache mailing list. | |
240 | + | ||
241 | + | {{Note|An error of {{ic|sh: echo: write error: Invalid argument}} means the device was already attached. An error of {{ic|sh: echo: write error: No such file or directory}} means the UUID is not a valid cache (make sure you typed it correctly).}} | |
242 | + | ||
243 | + | * Invalidate the cache and force the backing device to run without it. You might want to check some stats, such as "dirty_data" so you have some idea of how much data will be lost. | |
244 | + | ||
245 | + | {{hc|# cat /sys/block/sda/sda3/bcache/dirty_data| | |
246 | + | -3.9M | |
247 | + | }} | |
248 | + | ||
249 | + | dirty data is data in the cache that has not been written to the backing device. If you force the backing device to run, this data will be lost, even if you later re-attach the cache. | |
250 | + | ||
251 | + | {{hc|# cat /sys/block/sda/sda3/bcache/running| | |
252 | + | 0 | |
253 | + | }} | |
254 | + | ||
255 | + | # echo 1 > /sys/block/sda/sda3/bcache/running | |
256 | + | ||
257 | + | The {{ic|/dev/bcache0}} device will now exist. Type exit and continue booting. You might want to unregister the cache device and run make-bcache again. An fsck on {{ic|/dev/bcache0}} would also be wise. See the [https://docs.kernel.org/admin-guide/bcache.html bcache documentation]. | |
258 | + | ||
259 | + | {{Warning|Only invalidate the cache if one of the two options above did not work.}} | |
260 | + | ||
261 | + | === /sys/fs/bcache/ does not exist === | |
262 | + | ||
263 | + | The kernel you booted is not bcache enabled, or you the bcache [[Kernel module#Manual module handling|module is not loaded]] | |
264 | + | ||
265 | + | === write error: Invalid argument when trying to attach a device due to mismatched block parameter === | |
266 | + | ||
267 | + | Given {{ic|bash: echo: write error: Invalid argument}} when trying to attach a device, and the actual error is shown with [[dmesg]]: | |
268 | + | ||
269 | + | bcache: bch_cached_dev_attach() Couldn't attach sdc: block size less than set's block size | |
270 | + | ||
271 | + | This happens because the {{ic|--block 4k}} parameter was not set on either device and defaults can mismatch. | |
272 | + | ||
273 | + | Creating both the backing and caching device in one command automatically solves the issue, but when using separate commands the block size parameter sometimes needs to be set manually on both devices. | |
274 | + | ||
275 | + | === Device or resource busy === | |
276 | + | When a device is in use as a bcache backing device, it can not be formatted nor partitioned: | |
277 | + | # make-bcache -C /dev/sdb1 | |
278 | + | Can't open dev /dev/sdb1: Device or resource busy | |
279 | + | ||
280 | + | # fdisk /dev/sdb | |
281 | + | ||
282 | + | Welcome to fdisk (util-linux 2.37.2). | |
283 | + | Changes will remain in memory only, until you decide to write them. | |
284 | + | Be careful before using the write command. | |
285 | + | ||
286 | + | This disk is currently in use - repartitioning is probably a bad idea. | |
287 | + | It's recommended to umount all file systems, and swapoff all swap | |
288 | + | partitions on this disk. | |
289 | + | ||
290 | + | ||
291 | + | Command (m for help): q | |
292 | + | ||
293 | + | To fix this, first run this command to confirm the disk is actually used as a bcache backing device: | |
294 | + | # bcache-super-show /dev/sdb1 | |
295 | + | sb.magic ok | |
296 | + | sb.first_sector 8 [match] | |
297 | + | sb.csum A3D2B8610F6C5E35 [match] | |
298 | + | sb.version 1 [backing device] | |
299 | + | ||
300 | + | dev.label (empty) | |
301 | + | dev.uuid 5a868788-65a2-4564-b4b7-c1817d0b6080 | |
302 | + | dev.sectors_per_block 1 | |
303 | + | dev.sectors_per_bucket 1024 | |
304 | + | dev.data.first_sector 16 | |
305 | + | dev.data.cache_mode 1 [writeback] | |
306 | + | dev.data.cache_state 2 [dirty] | |
307 | + | ||
308 | + | cset.uuid 42dcb651-6b53-4b65-bc49-9b1ca0acc5b1 | |
309 | + | ||
310 | + | Then stop the backing device. This will also remove the corresponding /dev/bcache device. | |
311 | + | # echo 1 > /sys/class/block/sdb1/bcache/stop | |
312 | + | ||
313 | + | # dmesg | |
314 | + | [ 3171.263577] bcache: bcache_device_free() bcache0 stopped | |
315 | + | Now the device can be partitioned: | |
316 | + | # fdisk /dev/sdb | |
317 | + | ||
318 | + | Welcome to fdisk (util-linux 2.37.2). | |
319 | + | Changes will remain in memory only, until you decide to write them. | |
320 | + | Be careful before using the write command. | |
321 | + | ||
322 | + | ||
323 | + | Command (m for help): q | |
324 | + | When fdisk exits, the kernel scans the drive again, notices it's a bcache backing device, and uses the drive as a backing device. | |
325 | + | # dmesg | |
326 | + | [ 3190.643270] sdb: sdb1 | |
327 | + | [ 3190.833029] bcache: register_bdev() registered backing device sdb1 | |
328 | + | This creates the directory bcache under /sys/class/block/sdb1/ | |
329 | + | # ls /sys/class/block/sdb1/ | |
330 | + | alignment_offset bcache dev discard_alignment holders inflight partition power ro size start stat subsystem uevent | |
331 | + | ||
332 | + | == See also == | |
333 | + | ||
334 | + | * [https://bcache.evilpiepirate.org Bcache Homepage] | |
335 | + | * [https://docs.kernel.org/admin-guide/bcache.html Bcache Manual] | |
336 | + | ||
337 | + | ================================================== | |
338 | + | ||
339 | + | 上面的信息是我从别的地方摘抄的。可能有用,可能没用。可以参考然后回答下面的问题。 | |
340 | + | ||
341 | + | 最近,我的磁盘空间不太够用了。所以我需要你的帮助。我的服务器是 Ubuntu。 | |
342 | + | ||
343 | + | ```bash | |
344 | + | anduin@ms-server:~$ cd /swarm-vol/ | |
345 | + | anduin@ms-server:/swarm-vol$ df . -Th | |
346 | + | Filesystem Type Size Used Avail Use% Mounted on | |
347 | + | /dev/nvme2n1 ext4 7.0T 6.1T 559G 92% /swarm-vol | |
348 | + | anduin@ms-server:/swarm-vol$ cd /swarm-vol/nextcloud/ | |
349 | + | anduin@ms-server:/swarm-vol/nextcloud$ df . -Th | |
350 | + | Filesystem Type Size Used Avail Use% Mounted on | |
351 | + | /dev/nvme0n1 ext4 916G 554G 316G 64% /swarm-vol/nextcloud | |
352 | + | anduin@ms-server:/swarm-vol/nextcloud$ sudo fdisk -l | |
353 | + | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | |
354 | + | Disk model: INTEL SSDPED1D480GA | |
355 | + | Units: sectors of 1 * 512 = 512 bytes | |
356 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
357 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
358 | + | Disklabel type: gpt | |
359 | + | Disk identifier: 75C97A6C-09A4-4375-8260-7A950D36C1B4 | |
360 | + | ||
361 | + | Device Start End Sectors Size Type | |
362 | + | /dev/nvme1n1p1 2048 1050623 1048576 512M EFI System | |
363 | + | /dev/nvme1n1p2 1050624 937701375 936650752 446.6G Linux filesystem | |
364 | + | ||
365 | + | ||
366 | + | Disk /dev/nvme2n1: 6.99 TiB, 7681501126656 bytes, 1875366486 sectors | |
367 | + | Disk model: WUS4BB076D7P3E3 | |
368 | + | Units: sectors of 1 * 4096 = 4096 bytes | |
369 | + | Sector size (logical/physical): 4096 bytes / 4096 bytes | |
370 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
371 | + | ||
372 | + | ||
373 | + | Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors | |
374 | + | Disk model: CT1000P3PSSD8 | |
375 | + | Units: sectors of 1 * 512 = 512 bytes | |
376 | + | Sector size (logical/physical): 512 bytes / 512 bytes | |
377 | + | I/O size (minimum/optimal): 512 bytes / 512 bytes | |
378 | + | anduin@ms-server:/swarm-vol/nextcloud$ cd /dev/disk/by-uuid/ | |
379 | + | anduin@ms-server:/dev/disk/by-uuid$ ls -ashl | |
380 | + | total 0 | |
381 | + | 0 drwxr-xr-x 2 root root 140 Jan 17 15:21 . | |
382 | + | 0 drwxr-xr-x 7 root root 140 Dec 28 05:45 .. | |
383 | + | 0 lrwxrwxrwx 1 root root 13 Jan 14 14:00 0377361e-2a7b-4024-a681-ea135c092cce -> ../../nvme0n1 | |
384 | + | 0 lrwxrwxrwx 1 root root 13 Dec 28 05:45 49fd5e45-6074-4370-a95f-c4404920aff5 -> ../../nvme2n1 | |
385 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 9C58-514E -> ../../nvme1n1p1 | |
386 | + | 0 lrwxrwxrwx 1 root root 15 Dec 28 05:45 b91352af-9477-4684-8d08-2a45c39bec98 -> ../../nvme1n1p2 | |
387 | + | anduin@ms-server:/dev/disk/by-uuid$ cat /etc/fstab | |
388 | + | # /etc/fstab: static file system information. | |
389 | + | # | |
390 | + | # Use 'blkid' to print the universally unique identifier for a | |
391 | + | # device; this may be used with UUID= as a more robust way to name devices | |
392 | + | # that works even if disks are added and removed. See fstab(5). | |
393 | + | # | |
394 | + | # <file system> <mount point> <type> <options> <dump> <pass> | |
395 | + | UUID=b91352af-9477-4684-8d08-2a45c39bec98 / ext4 errors=remount-ro 0 1 | |
396 | + | UUID=9C58-514E /boot/efi vfat umask=0077 0 1 | |
397 | + | /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
398 | + | /dev/disk/by-uuid/0377361e-2a7b-4024-a681-ea135c092cce /swarm-vol/nextcloud ext4 defaults,noatime,nofail 0 0 | |
399 | + | /swapfile none swap sw 0 0 | |
400 | + | ``` | |
401 | + | ||
402 | + | 由上面的信息,不难判断出: | |
403 | + | ||
404 | + | 我的系统盘是 b91352af-9477-4684-8d08-2a45c39bec98 ,当然这和我们要调查的内容没什么关系。 | |
405 | + | ||
406 | + | 我的数据都放在了 /swarm-vol 这个文件夹。它背后的磁盘是 `49fd5e45-6074-4370-a95f-c4404920aff5` | |
407 | + | ||
408 | + | 即使我暂时使用奇技淫巧,将 /swarm-vol 下的子文件夹 nextcloud 暂时挪到了 `0377361e-2a7b-4024-a681-ea135c092cce` 下,还是濒临不够了。 | |
409 | + | ||
410 | + | 但是,幸运的是,我购买了一个全新的大而慢的机械硬盘: | |
411 | + | ||
412 | + | ```bash | |
413 | + | Disk /dev/sda: 58.21 TiB, 64003468427264 bytes, 125006774272 sectors | |
414 | + | Disk model: RAID5 | |
415 | + | Units: sectors of 1 * 512 = 512 bytes | |
416 | + | Sector size (logical/physical): 512 bytes / 4096 bytes | |
417 | + | I/O size (minimum/optimal): 4096 bytes / 4096 bytes | |
418 | + | ``` | |
419 | + | ||
420 | + | 为了测试它,我暂时挂载到了这里: | |
421 | + | ||
422 | + | ```bash | |
423 | + | /dev/sda /mnt/temp_big ext4 defaults,noatime,nofail 0 0 | |
424 | + | ``` | |
425 | + | ||
426 | + | 接下来,我认为我需要开始设计我的迁移改造计划。 | |
427 | + | ||
428 | + | 为了充分发挥我过去 49fd5e45-6074-4370-a95f-c4404920aff5,也就是nvme2n1,也就是 /swarm-vol 的快的固态的特性,又能发挥 /dev/sda 大的优点,我计划这样设计: | |
429 | + | ||
430 | + | 使用 bcache 系统,让 /dev/sda 作为真正的存储设备,再让 49fd5e45-6074-4370-a95f-c4404920aff5 作为缓存盘,同时开启写入缓存和阅读缓存,这样我就拥有又大有快的存储了。 | |
431 | + | ||
432 | + | 考虑到我的缓存盘非常大(上面的信息可以得出,它足足有 6.99 TB 对吧?),我相信我可以设置非常激进的写入缓存和阅读缓存。而且我的缓存盘非常可靠,它几乎不会损坏,我也不担心短暂的数据丢失。我又不是银行,就都是电影。 | |
433 | + | ||
434 | + | 接下来,为了方便迁移,我开始设计我的迁移计划: | |
435 | + | ||
436 | + | # 阶段概要 | |
437 | + | ||
438 | + | ## 第一阶段 - 双数据阶段 | |
439 | + | ||
440 | + | 将 sda 格式化清空,作为 bcache 的后端。此时 nvme2n1 继续承载业务数据。不移除它。然后将业务数据使用 rsync 拷贝到 sda 中。 | |
441 | + | ||
442 | + | ## 第二阶段 - 暂停业务阶段 | |
443 | + | ||
444 | + | 将业务暂停,然后我最后运行一次rsync。这次rsync应该会跑得很快,因为只产生了增量数据差异。此时此刻,nvme2n1 (ext4)的数据,和 sda (bcache的后端)的数据是完全相同的。
445 | + | ||
446 | + | ## 第三阶段 - 重构存储阶段 | |
447 | + | ||
448 | + | 将 nvme2n1 格式化。然后让它作为 bcache 的缓存端。再将得到的 bcache 虚拟盘,挂载到 /swarm-vol,实现业务无感。然后重启业务。 | |
449 | + | ||
450 | + | 注意:我没有任何额外的新空间可以用于备份!所以我的命令必须一次成功!一旦失败我们将万劫不复! | |
451 | + | ||
452 | + | ## 第一阶段 | |
453 | + | ||
454 | + | 接下来,我要开始第一阶段的迁移了。我第一阶段计划这么做: | |
455 | + | ||
456 | + | 目标 | |
457 | + | ||
458 | + | * 使用 make-bcache 将 /dev/sda 建立为 bcache 的后端(backing device)。 | |
459 | + | * 先不动现有 /dev/nvme2n1(现挂载于 /swarm-vol)上的业务数据,让业务继续运行。 | |
460 | + | * 格式化出的 /dev/bcache0 上创建一个文件系统(例如 ext4),然后将现有数据从 /swarm-vol 同步到这个新地方。 | |
461 | + | * 这是“第一阶段”,意在让 /dev/sda 上也有一份业务数据拷贝,从而腾出后续的操作空间。 | |
462 | + | ||
463 | + | 结果 | |
464 | + | ||
465 | + | * 最终会拥有两份数据: | |
466 | + | * 原始:/swarm-vol(在 /dev/nvme2n1 上) | |
467 | + | * 新的:/mnt/bcache(对应 /dev/bcache0,后端实际上是 /dev/sda) | |
468 | + | * 业务不中断 | |
469 | + | ||
470 | + | 我可以让服务继续使用 /swarm-vol,只要我在第一阶段只做数据拷贝、而不改动 /swarm-vol 自身。 | |
471 | + | 在第一阶段结束后,等我准备好,可以进入“第二阶段”短暂停机做增量 rsync 以及最终切换。 | |
472 | + | ||
473 | + | ```bash | |
474 | + | # 安装 bcache-tools | |
475 | + | sudo apt install bcache-tools | |
476 | + | ||
477 | + | # 仅示例,注意操作前先确认 /dev/sda 确实空置 | |
478 | + | # (在 fdisk 交互式命令中,删除旧分区、新建分区) | |
479 | + | sudo fdisk /dev/sda | |
480 | + | ||
481 | + | # 使用 wipefs 清除 sda 上的所有签名 | |
482 | + | sudo wipefs -a /dev/sda | |
483 | + | ||
484 | + | # 创建 bcache 后端 | |
485 | + | sudo make-bcache -B /dev/sda | |
486 | + | ||
487 | + | # 确认后端创建成功 | |
488 | + | # UUID: d5a45ab0-60b2-4f3a-8cf1-4d4ca97c018c | |
489 | + | # Set UUID: 01442457-240d-4bf4-8140-b7a647659beb | |
490 | + | # version: 1 | |
491 | + | # block_size: 1 | |
492 | + | # data_offset: 16 | |
493 | + | ||
494 | + | # 格式化后端 | |
495 | + | ls -ashl /dev/bcache0 | |
496 | + | sudo mkfs.ext4 /dev/bcache0 | |
497 | + | ||
498 | + | # 创建挂载点 | |
499 | + | sudo mkdir /mnt/bcache | |
500 | + | ||
501 | + | # 挂载 bcache 后端 | |
502 | + | sudo mount /dev/bcache0 /mnt/bcache | |
503 | + | ||
504 | + | # 确认挂载成功 | |
505 | + | cd /mnt/bcache | |
506 | + | ||
507 | + | # 确认挂载成功 | |
508 | + | df . -Th | |
509 | + | ||
510 | + | # (确认挂载成功后,开始 rsync) | |
511 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
512 | + | ||
513 | + | # 同步 nextcloud 文件夹 | |
514 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/
515 | + | ``` | |
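Before moving on to the next phase, it may be worth a quick sanity check that the copy is complete; a minimal sketch using the same paths as above:

```bash
# Dry run: report any remaining differences without changing anything
sudo rsync -Aavxn --delete /swarm-vol/ /mnt/bcache/ | tail -n 20
# Rough size comparison of source and destination
sudo du -sh /swarm-vol /mnt/bcache
```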
516 | + | ||
517 | + | 下面给出示例脚本,供我在“第二阶段”和“第三阶段”中参考使用。思路与第一阶段相同:谨慎操作、一次成功,避免数据丢失。 | |
518 | + | ||
519 | + | --- | |
520 | + | ||
521 | + | ## 第二阶段 - 暂停业务并做最终同步 | |
522 | + | ||
523 | + | 在这一阶段,我将: | |
524 | + | 1. 暂停业务,使其不再写入 `/swarm-vol`(也就是旧的 nvme2n1)。 | |
525 | + | 2. 做最后一次增量 rsync,保证数据在 /dev/bcache0(后端 sda)上与旧数据完全一致。 | |
526 | + | 3. 卸载旧的 `/swarm-vol`,改为挂载新的 `/dev/bcache0` 到 `/swarm-vol`,这样就完成了切换。 | |
527 | + | ||
528 | + | **示例脚本**(在生产环境中,请根据自己实际服务的暂停方式作相应调整): | |
529 | + | ||
530 | + | ```bash | |
531 | + | # 1) 暂停业务 | |
532 | + | echo "停止相关业务/服务 (示例:docker-compose 或 systemctl stop 等)" | |
533 | + | docker-compose down | |
534 | + | sudo reboot # 重启服务器,确保业务不再写入 | |
535 | + | ||
536 | + | # 2) 做最后一次增量同步 | |
537 | + | sudo rsync -Aavx --update --delete /swarm-vol/ /mnt/bcache/ | |
538 | + | sudo rsync -Aavx --update --delete /swarm-vol/nextcloud/ /mnt/bcache/swarm-vol/ | |
539 | + | ||
540 | + | # 3) 切换挂载点 | |
541 | + | sudo umount /swarm-vol | |
542 | + | ||
543 | + | echo "将 bcache0 挂载为新的 /swarm-vol..." | |
544 | + | sudo mount /dev/bcache0 /swarm-vol | |
545 | + | ||
546 | + | echo "检查挂载..." | |
547 | + | df -Th /swarm-vol | |
548 | + | ||
549 | + | echo "请人工确认 /swarm-vol 中的数据完整性;若无误,可以继续。" | |
550 | + | ``` | |
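If the `umount /swarm-vol` step fails with `target is busy`, something is still writing to the old volume; a quick way to see what (a sketch):

```bash
sudo fuser -vm /swarm-vol    # list processes still using the old mount
```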
551 | + | ||
552 | + | 在执行完成后,`/swarm-vol` 已经切换到基于 `/dev/bcache0`(后端是 `/dev/sda`)的存储,业务就可以使用这套新存储。此时 `nvme2n1` 上的原有 ext4 数据已不再对外提供服务,但仍在物理上保留(尚未被清空)。 | |
553 | + | ||
554 | + | --- | |
555 | + | ||
556 | + | ## 第三阶段 - 将原 nvme2n1 作为 bcache 缓存设备 | |
557 | + | ||
558 | + | 在这一阶段,我将: | |
559 | + | 1. 确认 `/swarm-vol` 已经切换成功、业务运行正常且数据安全无误。 | |
560 | + | 2. 清空并格式化原本的 `nvme2n1` 为 bcache 缓存盘。 | |
561 | + | 3. 将缓存盘附加到已经存在的 bcache 后端(即 `/dev/sda`)上,使两者变为真正的 “大容量 + SSD 缓存” 组合。 | |
562 | + | 4. 根据需求,启用写回缓存(writeback)等激进模式。 | |
563 | + | ||
564 | + | **示例脚本**: | |
565 | + | ||
566 | + | ```bash | |
567 | + | # 1) 确认当前 /swarm-vol 已经是 /dev/bcache0,且业务正常 | |
568 | + | # (需人工自行验证,确认数据已在 /dev/sda + /dev/bcache0 上) | |
569 | + | # 此时可以停一下业务,或保持低负载也行,避免写入影响。 | |
570 | + | ||
571 | + | # 2) 清空 nvme2n1 (原来的 /swarm-vol) 注意,这将销毁原数据! | |
572 | + | echo "准备清空 /dev/nvme2n1..." | |
573 | + | sudo umount /dev/nvme2n1 || true # 若尚未卸载,可忽略报错 | |
574 | + | sudo wipefs -a /dev/nvme2n1 | |
575 | + | ||
576 | + | # 3) 将 nvme2n1 作为缓存盘初始化 | |
577 | + | echo "对 /dev/nvme2n1 执行 make-bcache -C(cache)..." | |
578 | + | sudo make-bcache -C /dev/nvme2n1 | |
579 | + | # 若有需要,可以带上 --block/--bucket 参数。例如: | |
580 | + | # sudo make-bcache --block 4k --bucket 2M -C /dev/nvme2n1 | |
581 | + | ||
582 | + | echo "检查生成的缓存盘信息..." | |
583 | + | sudo bcache-super-show /dev/nvme2n1 | grep -E "cset.uuid|dev.uuid" | |
584 | + | ||
585 | + | # 假设输出中 cset.uuid (或 dev.uuid) 为 11111111-2222-3333-4444-555555555555 | |
586 | + | # (这里仅演示,我需要看实际输出) | |
587 | + | ||
588 | + | CACHE_UUID="(此处填上实际的 cset.uuid)" | |
589 | + | ||
590 | + | # 4) 将缓存设备附加到现有的 /dev/bcache0(后端 /dev/sda) | |
591 | + | # /dev/bcache0 的 sysfs 路径可通过 ls /sys/block/bcache0/bcache 等命令确认 | |
592 | + | echo "附加缓存到现有 bcache 后端..." | |
593 | + | echo "$CACHE_UUID" | sudo tee /sys/block/bcache0/bcache/attach | |
594 | + | ||
595 | + | # 如果我看到 echo: write error: Invalid argument,通常是 block size 不匹配等问题 | |
596 | + | # 如果成功,则 /sys/block/bcache0/bcache/cache_mode 等节点应该出现 | |
597 | + | ||
598 | + | # 5) 为 bcache0 启用写回缓存模式(可选) | |
599 | + | echo "启用写回 (writeback) 缓存模式..." | |
600 | + | echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode | |
601 | + | ||
602 | + | # 可选:关闭顺序IO绕过等更激进的做法 | |
603 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff | |
604 | + | # echo 0 | sudo tee /sys/block/bcache0/bcache/writeback_percent | |
605 | + | ||
606 | + | # 6) 确认缓存已生效 | |
607 | + | echo "确认 /dev/bcache0 依旧正常挂载在 /swarm-vol,并检查 sysfs 等信息:" | |
608 | + | mount | grep /swarm-vol | |
609 | + | ls -l /sys/block/bcache0/bcache | |
610 | + | ``` | |
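To confirm the cache really is attached and absorbing IO, the bcache sysfs nodes can be checked; a sketch (these paths appear once the attach in step 4 succeeds):

```bash
cat /sys/block/bcache0/bcache/state                         # "clean" or "dirty" once a cache is attached
cat /sys/block/bcache0/bcache/cache_mode                    # current mode is shown in [brackets]
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio   # hit ratio since stats were last cleared
```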
611 | + | ||
612 | + | 至此,我已经完成了将旧的 nvme2n1 转变为 bcache 缓存设备的操作,并和 `/dev/sda` 组合为统一的逻辑卷 `/dev/bcache0`。接下来的要点包括: | |
613 | + | ||
614 | + | 1. **开机自动挂载** | |
615 | + | - 通常推荐在 `/etc/fstab` 中写入对 `/dev/bcache0` 的挂载。 | |
616 | + | - 同时需要注意在 initramfs 阶段加载 bcache 模块、或者确保 `bcache-tools` 的 udev 规则可以自动将 cache attach 到 backing device(以免重启后没了 /dev/bcache0)。在 Ubuntu 下,一般可通过 `sudo update-initramfs -u` 并检查 `/lib/udev/rules.d/69-bcache.rules` 等来确认。 | |
617 | + | ||
618 | + | 在 `/etc/fstab` 中添加:
619 | + | ||
620 | + | ```bash | |
621 | + | # 删除旧的 /swarm-vol 挂载 | |
622 | + | # /dev/disk/by-uuid/49fd5e45-6074-4370-a95f-c4404920aff5 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
623 | + | # 然后添加新的 /swarm-vol 挂载 | |
624 | + | /dev/bcache0 /swarm-vol ext4 defaults,noatime,nofail 0 0 | |
625 | + | ``` | |
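On Ubuntu, one way to make sure the bcache module and udev rules are in place at boot, as noted above (a sketch, not verified on this machine):

```bash
# Load bcache early via the initramfs, then rebuild it
echo bcache | sudo tee -a /etc/initramfs-tools/modules
sudo update-initramfs -u
# The udev rule shipped by bcache-tools should then register devices automatically
ls /lib/udev/rules.d/ | grep bcache
```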
626 | + | ||
627 | + | 2. **确认写回模式的风险** | |
628 | + | - 写回模式(writeback)可以大幅提高速度,但在缓存盘掉电或故障时会丢失尚未写入后端的脏数据。既然我提到 SSD 质量较好,且并不特别在意短期丢失风险,可以大胆使用。 | |
629 | + | ||
630 | + | 3. **调优与监控** | |
631 | + | - 适当调节 `writeback_percent`、`sequential_cutoff` 等 sysfs 参数可以获得性能与风险的平衡。 | |
632 | + | - 还可以用 `dstat -D nvme2n1,sda` 或者 `iostat -xm 1` 来观察实际读写流量和缓存命中情况。 | |
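For example (both tools need their packages installed; device names as in this plan):

```bash
dstat -D nvme2n1,sda 5              # per-device throughput every 5 seconds
iostat -xm nvme2n1 sda bcache0 1    # extended per-device stats in MB/s
```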
633 | + | ||
634 | + | 完成后,我就拥有一个**后端极大(/dev/sda)+ 前端极快(/dev/nvme2n1 作为缓存)**的综合存储系统,挂载于 `/swarm-vol`。这样就达到了我预想的“又大又快”的目的。 |