
Commit b77b4a4

Author: Andreas Gruenbacher
gfs2: Rework freeze / thaw logic
So far, at mount time, gfs2 would take the freeze glock in shared mode and then immediately drop it again, turning it into a cached glock that can be reclaimed at any time. To freeze the filesystem cluster-wide, the node initiating the freeze would take the freeze glock in exclusive mode, which would cause the freeze glock's freeze_go_sync() callback to run on each node. There, gfs2 would freeze the filesystem and schedule gfs2_freeze_func() to run. gfs2_freeze_func() would try to re-acquire the freeze glock in shared mode, which would block until the thaw; it would then thaw the filesystem and drop the freeze glock again. The initiating node would keep the freeze glock held in exclusive mode. To thaw the filesystem, the initiating node would drop the freeze glock again, which would allow gfs2_freeze_func() to resume on all nodes, leaving the filesystem in the thawed state.

It turns out that in freeze_go_sync(), we cannot reliably and safely freeze the filesystem. This is primarily because the final unmount of a filesystem takes a write lock on the s_umount rw semaphore before calling into gfs2_put_super(), and freeze_go_sync() needs to call freeze_super(), which also takes a write lock on the same semaphore, causing a deadlock. We could work around this by trying to take an active reference on the super block first, which would prevent unmount from running at the same time. But that can fail, and freeze_go_sync() isn't actually allowed to fail.

To get around this, this patch changes the freeze glock locking scheme as follows:

At mount time, each node takes the freeze glock in shared mode. To freeze a filesystem, the initiating node first freezes the filesystem locally and then drops and re-acquires the freeze glock in exclusive mode. All other nodes notice that there is contention on the freeze glock in their go_callback callbacks, and they schedule gfs2_freeze_func() to run. There, they freeze the filesystem locally and drop and re-acquire the freeze glock before re-thawing the filesystem. This happens outside of the glock state engine, so failing is allowed there.

From a cluster point of view, taking and immediately dropping a glock is indistinguishable from taking the glock and only dropping it upon contention, so this new scheme is compatible with the old one.

Thanks to Li Dong <[email protected]> for reporting a locking bug in gfs2_freeze_func() in a previous version of this commit.

Signed-off-by: Andreas Gruenbacher <[email protected]>
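The new freeze protocol can be pictured as a small state machine. The following is a sequential toy model in C, not gfs2 code. The types and functions (enum lock_mode, struct node, mount_all(), freeze_cluster(), thaw_cluster()) are hypothetical. Every glock request is granted instantly. The thaw path on the initiating node (drop EX, take SH again, thaw locally) is an assumption, because that part of the patch is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the cluster-wide freeze glock's lock modes. */
enum lock_mode { MODE_UN, MODE_SH, MODE_EX };

/* Hypothetical per-node state; not a real gfs2 structure. */
struct node {
	enum lock_mode held;	/* this node's hold on the freeze glock */
	bool frozen;		/* is the local filesystem frozen? */
};

/* EX is grantable only when no other node holds the glock at all. */
static bool ex_grantable(const struct node *nodes, size_t n, size_t self)
{
	for (size_t i = 0; i < n; i++)
		if (i != self && nodes[i].held != MODE_UN)
			return false;
	return true;
}

/* Mount: every node takes the freeze glock in shared mode and keeps it. */
static void mount_all(struct node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		nodes[i].held = MODE_SH;
		nodes[i].frozen = false;
	}
}

/*
 * Freeze: the initiator freezes locally, drops SH and requests EX. That
 * request conflicts with every other node's SH hold, so each of them (via
 * go_callback and the freeze work) freezes locally and drops SH. Once no
 * other holds remain, EX is granted to the initiator.
 */
static bool freeze_cluster(struct node *nodes, size_t n, size_t initiator)
{
	nodes[initiator].frozen = true;
	nodes[initiator].held = MODE_UN;
	for (size_t i = 0; i < n; i++) {
		if (i == initiator || nodes[i].held != MODE_SH)
			continue;
		nodes[i].frozen = true;		/* freeze locally ... */
		nodes[i].held = MODE_UN;	/* ... then drop SH */
	}
	if (!ex_grantable(nodes, n, initiator))
		return false;
	nodes[initiator].held = MODE_EX;
	return true;
}

/*
 * Thaw (assumed): the initiator drops EX and takes SH again. The other
 * nodes' pending SH requests are then granted, and each node thaws.
 */
static void thaw_cluster(struct node *nodes, size_t n, size_t initiator)
{
	nodes[initiator].held = MODE_SH;
	nodes[initiator].frozen = false;
	for (size_t i = 0; i < n; i++) {
		if (i == initiator || !nodes[i].frozen)
			continue;
		nodes[i].held = MODE_SH;
		nodes[i].frozen = false;
	}
}
```

In this model, a node that keeps its shared hold and drops it only on contention reaches the same frozen state as one that would have dropped it right away.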
1 parent cad1e15 commit b77b4a4


7 files changed (+178, -110 lines)


fs/gfs2/glops.c

+19 -33
@@ -561,47 +561,33 @@ static void inode_go_dump(struct seq_file *seq, struct gfs2_glock *gl,
 }
 
 /**
- * freeze_go_sync - promote/demote the freeze glock
+ * freeze_go_callback - A cluster node is requesting a freeze
  * @gl: the glock
+ * @remote: true if this came from a different cluster node
  */
 
-static int freeze_go_sync(struct gfs2_glock *gl)
+static void freeze_go_callback(struct gfs2_glock *gl, bool remote)
 {
-	int error = 0;
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+	struct super_block *sb = sdp->sd_vfs;
+
+	if (!remote ||
+	    gl->gl_state != LM_ST_SHARED ||
+	    gl->gl_demote_state != LM_ST_UNLOCKED)
+		return;
 
 	/*
-	 * We need to check gl_state == LM_ST_SHARED here and not gl_req ==
-	 * LM_ST_EXCLUSIVE. That's because when any node does a freeze,
-	 * all the nodes should have the freeze glock in SH mode and they all
-	 * call do_xmote: One for EX and the others for UN. They ALL must
-	 * freeze locally, and they ALL must queue freeze work. The freeze_work
-	 * calls freeze_func, which tries to reacquire the freeze glock in SH,
-	 * effectively waiting for the thaw on the node who holds it in EX.
-	 * Once thawed, the work func acquires the freeze glock in
-	 * SH and everybody goes back to thawed.
+	 * Try to get an active super block reference to prevent racing with
+	 * unmount (see trylock_super()). But note that unmount isn't the only
+	 * place where a write lock on s_umount is taken, and we can fail here
+	 * because of things like remount as well.
 	 */
-	if (gl->gl_state == LM_ST_SHARED && !gfs2_withdrawn(sdp) &&
-	    !test_bit(SDF_NORECOVERY, &sdp->sd_flags)) {
-		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
-		error = freeze_super(sdp->sd_vfs);
-		if (error) {
-			fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
-				error);
-			if (gfs2_withdrawn(sdp)) {
-				atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
-				return 0;
-			}
-			gfs2_assert_withdraw(sdp, 0);
-		}
-		queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work);
-		if (test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))
-			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_FREEZE |
-				       GFS2_LFC_FREEZE_GO_SYNC);
-		else /* read-only mounts */
-			atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
+	if (down_read_trylock(&sb->s_umount)) {
+		atomic_inc(&sb->s_active);
+		up_read(&sb->s_umount);
+		if (!queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work))
+			deactivate_super(sb);
 	}
-	return 0;
 }
 
 /**
@@ -761,9 +747,9 @@ const struct gfs2_glock_operations gfs2_rgrp_glops = {
 };
 
 const struct gfs2_glock_operations gfs2_freeze_glops = {
-	.go_sync = freeze_go_sync,
 	.go_xmote_bh = freeze_go_xmote_bh,
 	.go_demote_ok = freeze_go_demote_ok,
+	.go_callback = freeze_go_callback,
 	.go_type = LM_TYPE_NONDISK,
 	.go_flags = GLOF_NONDISK,
 };
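The new freeze_go_callback() queues the freeze work only if it can first pin the super block. The user-space sketch below models that control flow under stated assumptions. The rw semaphore, the s_active counter and the work queue are single-threaded stand-ins with hypothetical names (down_read_trylock_model(), queue_work_model(), struct sb_model). The function also returns bool so the outcome can be checked, whereas the kernel version returns void.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types used by freeze_go_callback(). */
enum lm_state { LM_ST_UNLOCKED, LM_ST_SHARED, LM_ST_EXCLUSIVE };

struct rwsem_model {		/* single-threaded model of an rw semaphore */
	int readers;
	bool writer;
};

struct sb_model {
	struct rwsem_model s_umount;
	int s_active;		/* an atomic_t in the kernel */
	bool work_pending;	/* models sd_freeze_work being queued */
};

static bool down_read_trylock_model(struct rwsem_model *sem)
{
	if (sem->writer)
		return false;	/* e.g. unmount or remount holds it for writing */
	sem->readers++;
	return true;
}

static void up_read_model(struct rwsem_model *sem)
{
	sem->readers--;
}

/* Like queue_work(): returns false if the work item is already pending. */
static bool queue_work_model(struct sb_model *sb)
{
	if (sb->work_pending)
		return false;
	sb->work_pending = true;
	return true;
}

/* Mirrors freeze_go_callback(); returns true if the freeze work was queued. */
static bool freeze_callback_model(struct sb_model *sb, bool remote,
				  enum lm_state state, enum lm_state demote_state)
{
	/* Only react to a remote request to drop our shared hold. */
	if (!remote || state != LM_ST_SHARED || demote_state != LM_ST_UNLOCKED)
		return false;

	if (!down_read_trylock_model(&sb->s_umount))
		return false;	/* can't pin the super block: do nothing */
	sb->s_active++;		/* pin: no shutdown while s_active > 0 */
	up_read_model(&sb->s_umount);

	if (!queue_work_model(sb)) {
		sb->s_active--;	/* already queued: drop the extra reference */
		return false;
	}
	return true;		/* the queued work now owns the reference */
}
```

As the code comment in the patch says, the trylock can fail because of unmount or remount. In that case the model does nothing and takes no reference. If the work is already pending, the extra reference is dropped again, which is what deactivate_super() does in the real callback.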

fs/gfs2/log.c

-2
@@ -1136,8 +1136,6 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 		if (flags & (GFS2_LOG_HEAD_FLUSH_SHUTDOWN |
 			     GFS2_LOG_HEAD_FLUSH_FREEZE))
 			gfs2_log_shutdown(sdp);
-		if (flags & GFS2_LOG_HEAD_FLUSH_FREEZE)
-			atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
 	}
 
 out_end:

fs/gfs2/ops_fstype.c

+2 -3
@@ -1140,7 +1140,6 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc)
 	int silent = fc->sb_flags & SB_SILENT;
 	struct gfs2_sbd *sdp;
 	struct gfs2_holder mount_gh;
-	struct gfs2_holder freeze_gh;
 	int error;
 
 	sdp = init_sbd(sb);
@@ -1269,15 +1268,15 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc)
 		}
 	}
 
-	error = gfs2_freeze_lock_shared(sdp, &freeze_gh, 0);
+	error = gfs2_freeze_lock_shared(sdp, &sdp->sd_freeze_gh, 0);
 	if (error)
 		goto fail_per_node;
 
 	if (!sb_rdonly(sb))
 		error = gfs2_make_fs_rw(sdp);
 
-	gfs2_freeze_unlock(&freeze_gh);
 	if (error) {
+		gfs2_freeze_unlock(&sdp->sd_freeze_gh);
 		if (sdp->sd_quotad_process)
 			kthread_stop(sdp->sd_quotad_process);
 		sdp->sd_quotad_process = NULL;
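After this change, the shared hold taken at mount lives in sdp->sd_freeze_gh instead of a local holder, and gfs2_fill_super() releases it only when mounting fails. Below is a minimal model of that lifetime with hypothetical names. The make_fs_rw_error parameter stands in for the result of gfs2_make_fs_rw(). The final release at unmount happens in code outside this excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: where mount keeps its shared freeze glock hold. */
struct sbd_model {
	bool freeze_gh_held;	/* models sdp->sd_freeze_gh being held */
};

static int freeze_lock_shared_model(struct sbd_model *sdp)
{
	sdp->freeze_gh_held = true;
	return 0;
}

static void freeze_unlock_model(struct sbd_model *sdp)
{
	sdp->freeze_gh_held = false;
}

/* make_fs_rw_error stands in for the result of gfs2_make_fs_rw(). */
static int fill_super_model(struct sbd_model *sdp, int make_fs_rw_error)
{
	int error = freeze_lock_shared_model(sdp);

	if (error)
		return error;
	error = make_fs_rw_error;
	if (error) {
		freeze_unlock_model(sdp);	/* only the failure path drops it */
		return error;
	}
	return 0;	/* success: the shared hold stays for the mount's lifetime */
}
```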

fs/gfs2/recovery.c

+12 -12
@@ -404,7 +404,7 @@ void gfs2_recover_func(struct work_struct *work)
 	struct gfs2_inode *ip = GFS2_I(jd->jd_inode);
 	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
 	struct gfs2_log_header_host head;
-	struct gfs2_holder j_gh, ji_gh, thaw_gh;
+	struct gfs2_holder j_gh, ji_gh;
 	ktime_t t_start, t_jlck, t_jhd, t_tlck, t_rep;
 	int ro = 0;
 	unsigned int pass;
@@ -465,14 +465,14 @@ void gfs2_recover_func(struct work_struct *work)
 		ktime_ms_delta(t_jhd, t_jlck));
 
 	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
-		fs_info(sdp, "jid=%u: Acquiring the freeze glock...\n",
-			jd->jd_jid);
-
-		/* Acquire a shared hold on the freeze glock */
+		mutex_lock(&sdp->sd_freeze_mutex);
 
-		error = gfs2_freeze_lock_shared(sdp, &thaw_gh, LM_FLAG_PRIORITY);
-		if (error)
+		if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN) {
+			mutex_unlock(&sdp->sd_freeze_mutex);
+			fs_warn(sdp, "jid=%u: Can't replay: filesystem "
+				"is frozen\n", jd->jd_jid);
 			goto fail_gunlock_ji;
+		}
 
 		if (test_bit(SDF_RORECOVERY, &sdp->sd_flags)) {
 			ro = 1;
@@ -496,7 +496,7 @@ void gfs2_recover_func(struct work_struct *work)
 			fs_warn(sdp, "jid=%u: Can't replay: read-only block "
 				"device\n", jd->jd_jid);
 			error = -EROFS;
-			goto fail_gunlock_thaw;
+			goto fail_gunlock_nofreeze;
 		}
 
 		t_tlck = ktime_get();
@@ -514,15 +514,15 @@ void gfs2_recover_func(struct work_struct *work)
 			lops_after_scan(jd, error, pass);
 			if (error) {
 				up_read(&sdp->sd_log_flush_lock);
-				goto fail_gunlock_thaw;
+				goto fail_gunlock_nofreeze;
 			}
 		}
 
 		recover_local_statfs(jd, &head);
 		clean_journal(jd, &head);
 		up_read(&sdp->sd_log_flush_lock);
 
-		gfs2_freeze_unlock(&thaw_gh);
+		mutex_unlock(&sdp->sd_freeze_mutex);
 		t_rep = ktime_get();
 		fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, "
 			"jhead:%lldms, tlck:%lldms, replay:%lldms]\n",
@@ -543,8 +543,8 @@ void gfs2_recover_func(struct work_struct *work)
 	fs_info(sdp, "jid=%u: Done\n", jd->jd_jid);
 	goto done;
 
-fail_gunlock_thaw:
-	gfs2_freeze_unlock(&thaw_gh);
+fail_gunlock_nofreeze:
+	mutex_unlock(&sdp->sd_freeze_mutex);
 fail_gunlock_ji:
 	if (jlocked) {
 		gfs2_glock_dq_uninit(&ji_gh);
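Journal recovery no longer takes the freeze glock. Instead, it serializes against freezing with sd_freeze_mutex and refuses to replay while sd_freeze_state is not SFS_UNFROZEN. The unlock points differ by path:

- The frozen check unlocks the mutex itself and then jumps to fail_gunlock_ji.
- The later failures jump to fail_gunlock_nofreeze, which does the unlock.

So every path must release the mutex exactly once. The user-space sketch below checks that invariant. Its struct, helpers and the -EBUSY and -EIO codes are hypothetical: the diff does not show which error value gfs2 reports for the frozen case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state gfs2_recover_func() touches here. */
enum freeze_state { SFS_UNFROZEN, SFS_FROZEN };

struct sbd_model {
	bool freeze_mutex_held;		/* models sd_freeze_mutex */
	enum freeze_state freeze_state;	/* models sd_freeze_state */
	bool read_only;			/* models the "ro" check */
	bool replay_fails;		/* models an error from the journal scan */
};

static void mutex_lock_model(struct sbd_model *sdp)
{
	assert(!sdp->freeze_mutex_held);	/* no recursive locking */
	sdp->freeze_mutex_held = true;
}

static void mutex_unlock_model(struct sbd_model *sdp)
{
	assert(sdp->freeze_mutex_held);		/* no double unlock */
	sdp->freeze_mutex_held = false;
}

/* Every path that takes the mutex must release it exactly once. */
static int replay_model(struct sbd_model *sdp)
{
	int error;

	mutex_lock_model(sdp);
	if (sdp->freeze_state != SFS_UNFROZEN) {
		/* Like the patch: unlock here, then skip the unlock label. */
		mutex_unlock_model(sdp);
		error = -EBUSY;		/* hypothetical error code */
		goto fail;
	}
	if (sdp->read_only) {
		error = -EROFS;
		goto fail_nofreeze;
	}
	if (sdp->replay_fails) {
		error = -EIO;		/* hypothetical error code */
		goto fail_nofreeze;
	}
	mutex_unlock_model(sdp);	/* success path */
	return 0;

fail_nofreeze:
	mutex_unlock_model(sdp);
fail:
	return error;
}
```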
