2018-09-12 09:16:07 +08:00
// SPDX-License-Identifier: GPL-2.0
2012-11-29 13:28:09 +09:00
/*
2012-11-02 17:10:40 +09:00
 * fs/f2fs/inode.c
 *
 * Copyright (c) 2012 Samsung Electronics Co., Ltd.
 *             http://www.samsung.com/
 */
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/writeback.h>
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait(), which takes on that
responsibility. Code that wants to retry a memory allocation can call
this function, passing the GFP flags that were used. It will wait
for however long is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page() either made some reclaim progress or
waited for a while before failing, so there is no need for much
further waiting: memalloc_retry_wait() will wait until the current
jiffy ends. If this condition is not met, then alloc_page() won't have
waited much, if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops use.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
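The wait policy described in this commit message can be modelled in user space. This is a minimal sketch, not the kernel implementation. `SKETCH_CAN_BLOCK` and `SKETCH_NORETRY` are hypothetical flag bits that stand in for `gfpflags_allow_blocking()` and `__GFP_NORETRY`. `alloc_with_retry()` is a hypothetical caller. Instead of sleeping, the loop only records which kind of wait the policy would choose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two GFP tests; not the kernel's values. */
#define SKETCH_CAN_BLOCK 0x1u /* models gfpflags_allow_blocking() */
#define SKETCH_NORETRY   0x2u /* models __GFP_NORETRY */

enum retry_wait {
	WAIT_UNTIL_JIFFY_ENDS, /* allocator probably already waited or reclaimed */
	WAIT_LONGER,           /* allocator probably failed without waiting */
};

/* Choose the kind of wait the commit message describes. */
enum retry_wait choose_retry_wait(unsigned int flags)
{
	bool can_block = flags & SKETCH_CAN_BLOCK;
	bool noretry = flags & SKETCH_NORETRY;

	return (can_block && !noretry) ? WAIT_UNTIL_JIFFY_ENDS : WAIT_LONGER;
}

/*
 * Hypothetical caller-side retry loop: the caller passes the flags it
 * allocated with instead of choosing a timeout itself. The first
 * 'simulated_failures' attempts fail on purpose; '*waits' counts how
 * many times the loop would have slept.
 */
void *alloc_with_retry(size_t size, unsigned int flags,
		       int simulated_failures, int *waits)
{
	*waits = 0;
	for (;;) {
		void *p = NULL;

		if (simulated_failures > 0)
			simulated_failures--;
		else
			p = malloc(size);
		if (p)
			return p;
		/* A real implementation would sleep according to this policy. */
		(void)choose_retry_wait(flags);
		(*waits)++;
	}
}
```

For example, `alloc_with_retry(64, SKETCH_CAN_BLOCK, 2, &waits)` fails twice by construction, consults the policy twice and then returns the buffer with `waits == 2`. The design point shows up in the caller: it reports the flags it used and never picks a timeout itself.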
2022-01-14 14:07:14 -08:00
#include <linux/sched/mm.h>
2023-04-08 02:31:47 +08:00
#include <linux/lz4.h>
#include <linux/zstd.h>
2012-11-02 17:10:40 +09:00

#include "f2fs.h"
#include "node.h"
2017-06-14 23:00:56 +08:00
#include "segment.h"
2019-03-04 17:19:04 +08:00
#include "xattr.h"
2012-11-02 17:10:40 +09:00

2013-04-20 01:28:40 +09:00
#include <trace/events/f2fs.h>

2021-05-20 19:51:50 +08:00
#ifdef CONFIG_F2FS_FS_COMPRESSION
extern const struct address_space_operations f2fs_compress_aops;
#endif

2016-10-14 11:51:23 -07:00
void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync)
2016-06-30 19:09:37 -07:00
{
2018-01-11 11:26:19 +09:00
	if (is_inode_flag_set(inode, FI_NEW_INODE))
		return;

2024-06-04 15:56:36 +08:00
	if (f2fs_readonly(F2FS_I_SB(inode)->sb))
		return;

2016-10-14 11:51:23 -07:00
	if (f2fs_inode_dirtied(inode, sync))
2016-06-30 19:09:37 -07:00
		return;
2016-10-14 11:51:23 -07:00

2025-03-27 13:56:06 +08:00
	/* only atomic file w/ FI_ATOMIC_COMMITTED can be set vfs dirty */
	if (f2fs_is_atomic_file(inode) &&
			!is_inode_flag_set(inode, FI_ATOMIC_COMMITTED))
2024-09-04 08:33:06 -07:00
		return;

2016-06-30 19:09:37 -07:00
	mark_inode_dirty_sync(inode);
}

2012-11-02 17:10:40 +09:00
void f2fs_set_inode_flags(struct inode *inode)
{
	unsigned int flags = F2FS_I(inode)->i_flags;
2014-04-15 14:19:38 +08:00
	unsigned int new_fl = 0;
2012-11-02 17:10:40 +09:00

2018-04-03 15:08:17 +08:00
	if (flags & F2FS_SYNC_FL)
2014-04-15 14:19:38 +08:00
		new_fl |= S_SYNC;
2018-04-03 15:08:17 +08:00
	if (flags & F2FS_APPEND_FL)
2014-04-15 14:19:38 +08:00
		new_fl |= S_APPEND;
2018-04-03 15:08:17 +08:00
	if (flags & F2FS_IMMUTABLE_FL)
2014-04-15 14:19:38 +08:00
		new_fl |= S_IMMUTABLE;
2018-04-03 15:08:17 +08:00
	if (flags & F2FS_NOATIME_FL)
2014-04-15 14:19:38 +08:00
		new_fl |= S_NOATIME;
2018-04-03 15:08:17 +08:00
	if (flags & F2FS_DIRSYNC_FL)
2014-04-15 14:19:38 +08:00
		new_fl |= S_DIRSYNC;
2018-12-12 15:20:11 +05:30
	if (file_is_encrypt(inode))
2017-10-09 12:15:35 -07:00
		new_fl |= S_ENCRYPTED;
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
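The placement rule stated in this commit message ("starting at the first 64K boundary beyond i_size") comes down to rounding i_size up to a multiple of 65536. The sketch below applies that rule exactly as the message states it. It is an assumption that the current f2fs helper still computes the position this way, and both function names are hypothetical.

```c
#include <stdint.h>

/* Alignment the commit message gives for the start of verity metadata. */
#define VERITY_METADATA_ALIGN 65536ULL

/* Round i_size up to the next 64 KiB boundary (unchanged if already aligned). */
uint64_t verity_metadata_pos(uint64_t i_size)
{
	return (i_size + VERITY_METADATA_ALIGN - 1) & ~(VERITY_METADATA_ALIGN - 1);
}

/*
 * Nonzero if the page starting at page_index * page_size lies in the
 * metadata region, i.e. wholly past the data visible to user space.
 */
int page_is_verity_metadata(uint64_t i_size, uint64_t page_index,
			    uint64_t page_size)
{
	return page_index * page_size >= verity_metadata_pos(i_size);
}
```

With 4096-byte pages and a 5000-byte file, for instance, the metadata starts at byte 65536, which is page index 16. Pages 2 to 15 are neither file data nor metadata.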
2019-07-22 09:26:24 -07:00
	if (file_is_verity(inode))
		new_fl |= S_VERITY;
f2fs: Support case-insensitive file name lookups
Modeled after commit b886ee3e778e ("ext4: Support case-insensitive file
name lookups")
"""
This patch implements the actual support for case-insensitive file name
lookups in f2fs, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, F2FS only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache regardless of which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance, but fixing it requires some work
in the VFS layer. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when an invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in f2fs are implemented
in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for F2FS because:
- It has a lower cost than NFC/NFKC (which require
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistically accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
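The dcache scheme described above has two parts. d_hash() hashes the casefolded name, so every equivalent spelling lands in the same bucket. d_compare() then matches names case-insensitively, while the stored name stays byte-for-byte what the user wrote. The sketch below shows that idea in a deliberately simplified form. It folds ASCII only, with `tolower()`, instead of applying the Unicode NFD plus casefolding from fs/unicode. FNV-1a is an arbitrary hash chosen for illustration. All names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only stand-in for casefolding; the real code folds full Unicode. */
static unsigned char fold_ascii(unsigned char c)
{
	return (unsigned char)tolower(c);
}

/* Hash the casefolded name so equivalent names share a bucket (FNV-1a). */
uint32_t casefold_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= fold_ascii((unsigned char)name[i]);
		h *= 16777619u;
	}
	return h;
}

/* Return 0 when the two names are equal after casefolding. */
int casefold_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	if (alen != blen)
		return 1;
	for (size_t i = 0; i < alen; i++)
		if (fold_ascii((unsigned char)a[i]) != fold_ascii((unsigned char)b[i]))
			return 1;
	return 0;
}
```

Neither function modifies the name it is given. That matches the "name-preserving" property: only lookups fold case, and what is written to disk is unchanged.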
2019-07-23 16:05:29 -07:00
	if (flags & F2FS_CASEFOLD_FL)
		new_fl |= S_CASEFOLD;
2015-08-24 10:41:32 +08:00
	inode_set_flags(inode, new_fl,
2017-10-09 12:15:35 -07:00
			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
2019-09-21 14:26:33 -07:00
			S_ENCRYPTED|S_VERITY|S_CASEFOLD);
2012-11-02 17:10:40 +09:00
}

2025-03-31 21:12:49 +01:00
static void __get_inode_rdev(struct inode *inode, struct folio *node_folio)
2013-10-08 18:01:51 +09:00
{
2025-03-31 21:12:49 +01:00
	__le32 *addr = get_dnode_addr(inode, node_folio);
2017-07-19 00:19:06 +08:00

2013-10-08 18:01:51 +09:00
	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
			S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
2023-12-10 17:20:37 +08:00
		if (addr[0])
			inode->i_rdev = old_decode_dev(le32_to_cpu(addr[0]));
2013-10-08 18:01:51 +09:00
		else
2023-12-10 17:20:37 +08:00
			inode->i_rdev = new_decode_dev(le32_to_cpu(addr[1]));
2013-10-08 18:01:51 +09:00
	}
}

2025-03-31 21:12:49 +01:00
static void __set_inode_rdev(struct inode *inode, struct folio *node_folio)
2015-03-17 17:16:35 -07:00
{
2025-03-31 21:12:49 +01:00
	__le32 *addr = get_dnode_addr(inode, node_folio);
2017-07-19 00:19:06 +08:00

2013-10-08 18:01:51 +09:00
	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
		if (old_valid_dev(inode->i_rdev)) {
2023-12-10 17:20:37 +08:00
			addr[0] = cpu_to_le32(old_encode_dev(inode->i_rdev));
			addr[1] = 0;
2013-10-08 18:01:51 +09:00
		} else {
2023-12-10 17:20:37 +08:00
			addr[0] = 0;
			addr[1] = cpu_to_le32(new_encode_dev(inode->i_rdev));
			addr[2] = 0;
2013-10-08 18:01:51 +09:00
		}
	}
}

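`__set_inode_rdev()` and `__get_inode_rdev()` store a device number in the first address slots of the dnode. A device that fits the 16-bit "old" encoding goes in slot 0. Otherwise slot 0 is zeroed and the 32-bit "new" encoding goes in slot 1. On read, a non-zero slot 0 selects the old decoder. The sketch below re-creates that round trip in user space. The encode/decode helpers follow the kernel's `old_*_dev()` and `new_*_dev()` bit layouts, but they operate on a plain major/minor struct instead of `dev_t`. `struct devnum`, `set_rdev_slots()` and `get_rdev_slots()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* A device number as a plain major/minor pair (not the kernel's dev_t). */
struct devnum {
	unsigned int major;
	unsigned int minor;
};

/* 16-bit "old" format: 8-bit major, 8-bit minor. */
bool old_valid_dev(struct devnum d)
{
	return d.major < 256 && d.minor < 256;
}

uint32_t old_encode_dev(struct devnum d)
{
	return (d.major << 8) | d.minor;
}

struct devnum old_decode_dev(uint32_t v)
{
	struct devnum d = { (v >> 8) & 255, v & 255 };
	return d;
}

/* 32-bit "new" format: 12-bit major, 20-bit minor split around it. */
uint32_t new_encode_dev(struct devnum d)
{
	return (d.minor & 0xff) | (d.major << 8) | ((d.minor & ~0xffu) << 12);
}

struct devnum new_decode_dev(uint32_t v)
{
	struct devnum d = { (v & 0xfff00) >> 8,
			    (v & 0xff) | ((v >> 12) & 0xfff00) };
	return d;
}

/* Mirrors __set_inode_rdev(): slot 0 holds the old format, slot 1 the new. */
void set_rdev_slots(uint32_t addr[3], struct devnum d)
{
	if (old_valid_dev(d)) {
		addr[0] = old_encode_dev(d);
		addr[1] = 0;
	} else {
		addr[0] = 0;
		addr[1] = new_encode_dev(d);
		addr[2] = 0;
	}
}

/* Mirrors __get_inode_rdev(): a non-zero slot 0 means the old format. */
struct devnum get_rdev_slots(const uint32_t addr[3])
{
	return addr[0] ? old_decode_dev(addr[0]) : new_decode_dev(addr[1]);
}
```

Device 0:0 is also handled consistently. It is "old-valid", so it encodes to 0 in slot 0 with slot 1 cleared, and the reader then falls through to `new_decode_dev(0)`, which yields 0:0 again.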
2025-03-31 21:12:32 +01:00
static void __recover_inline_status(struct inode *inode, struct folio *ifolio)
2014-10-23 19:48:09 -07:00
{
2025-03-31 21:12:33 +01:00
	void *inline_data = inline_data_addr(inode, ifolio);
2015-01-06 14:28:43 +08:00
	__le32 *start = inline_data;
2017-07-19 00:19:05 +08:00
	__le32 *end = start + MAX_INLINE_DATA(inode) / sizeof(__le32);
2014-10-23 19:48:09 -07:00

2015-01-06 14:28:43 +08:00
	while (start < end) {
		if (*start++) {
2025-03-31 21:12:32 +01:00
			f2fs_folio_wait_writeback(ifolio, NODE, true, true);
2014-10-23 19:48:09 -07:00

2016-05-20 10:13:22 -07:00
			set_inode_flag(inode, FI_DATA_EXIST);
2025-07-08 18:03:06 +01:00
			set_raw_inline(inode, F2FS_INODE(ifolio));
2025-03-31 21:12:32 +01:00
			folio_mark_dirty(ifolio);
2015-01-06 14:28:43 +08:00
			return;
		}
2014-10-23 19:48:09 -07:00
	}
2015-01-06 14:28:43 +08:00
	return;
2014-10-23 19:48:09 -07:00
}

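`__recover_inline_status()` treats the inline area as an array of 32-bit words and stops at the first non-zero word. Only then does it mark the inode as having data. The scan on its own looks like this in user space. `max_inline_bytes` plays the role of `MAX_INLINE_DATA(inode)`, the function name is hypothetical, and the flag and dirtying side effects are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Report whether any 32-bit word of the inline area is non-zero.
 * As in the kernel loop, trailing bytes that do not fill a whole word
 * are not examined, and the scan stops at the first non-zero word.
 */
bool inline_area_has_data(const void *inline_data, size_t max_inline_bytes)
{
	const uint32_t *start = inline_data;
	const uint32_t *end = start + max_inline_bytes / sizeof(uint32_t);

	while (start < end) {
		if (*start++)
			return true; /* kernel: set FI_DATA_EXIST, mark dirty */
	}
	return false;
}
```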
2025-07-08 18:03:15 +01:00
static
bool f2fs_enable_inode_chksum(struct f2fs_sb_info *sbi, struct folio *folio)
2017-07-31 20:19:09 +08:00
{
2025-07-08 18:03:49 +01:00
	struct f2fs_inode *ri = &F2FS_NODE(folio)->i;
2017-07-31 20:19:09 +08:00

2018-10-24 18:34:26 +08:00
	if (!f2fs_sb_has_inode_chksum(sbi))
2017-07-31 20:19:09 +08:00
		return false;

2025-07-08 18:03:34 +01:00
	if (!IS_INODE(folio) || !(ri->i_inline & F2FS_EXTRA_ATTR))
2017-07-31 20:19:09 +08:00
		return false;

2018-04-14 01:02:34 +08:00
	if (!F2FS_FITS_IN_INODE(ri, le16_to_cpu(ri->i_extra_isize),
				i_inode_checksum))
2017-07-31 20:19:09 +08:00
		return false;

	return true;
}

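`f2fs_enable_inode_chksum()` only trusts `i_inode_checksum` if `F2FS_FITS_IN_INODE()` confirms the field ends inside the extra-attribute space that the inode advertises through `i_extra_isize`. The sketch below shows that kind of `offsetof`-based bounds check. The layout is invented for illustration: `struct sketch_inode` and its fields do not match `struct f2fs_inode`, and the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified layout; NOT struct f2fs_inode. */
struct sketch_inode {
	uint8_t  fixed_part[16];  /* attributes present in every inode */
	/* the extra-attribute area starts here */
	uint16_t extra_isize;     /* bytes of extra area in use */
	uint16_t pad;
	uint32_t projid;
	uint32_t inode_checksum;
};

/* Offset where the extra-attribute area begins in this layout. */
#define SKETCH_OLD_ATTRIBUTE_SIZE offsetof(struct sketch_inode, extra_isize)

/* A field fits if it ends within the extra area the inode declares. */
#define SKETCH_FITS_IN_INODE(extra_isize, field)                    \
	(offsetof(struct sketch_inode, field) +                     \
	 sizeof(((struct sketch_inode *)0)->field) <=               \
	 SKETCH_OLD_ATTRIBUTE_SIZE + (extra_isize))

bool checksum_field_present(uint16_t extra_isize)
{
	return SKETCH_FITS_IN_INODE(extra_isize, inode_checksum);
}
```

In this layout the checksum occupies bytes 24 to 27, so it counts as present only when `extra_isize` is at least 12. An inode written before the field existed declares a smaller area, and its checksum is never read.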
2025-07-08 18:03:16 +01:00
static __u32 f2fs_inode_chksum(struct f2fs_sb_info *sbi, struct folio *folio)
2017-07-31 20:19:09 +08:00
{
2025-07-08 18:03:49 +01:00
	struct f2fs_node *node = F2FS_NODE(folio);
2017-07-31 20:19:09 +08:00
	struct f2fs_inode *ri = &node->i;
	__le32 ino = node->footer.ino;
	__le32 gen = ri->i_generation;
	__u32 chksum, chksum_seed;
	__u32 dummy_cs = 0;
	unsigned int offset = offsetof(struct f2fs_inode, i_inode_checksum);
	unsigned int cs_size = sizeof(dummy_cs);

2025-05-12 22:48:25 -07:00
	chksum = f2fs_chksum(sbi->s_chksum_seed, (__u8 *)&ino, sizeof(ino));
	chksum_seed = f2fs_chksum(chksum, (__u8 *)&gen, sizeof(gen));
2017-07-31 20:19:09 +08:00

2025-05-12 22:48:25 -07:00
	chksum = f2fs_chksum(chksum_seed, (__u8 *)ri, offset);
	chksum = f2fs_chksum(chksum, (__u8 *)&dummy_cs, cs_size);
2017-07-31 20:19:09 +08:00
	offset += cs_size;
2025-05-12 22:48:25 -07:00
	chksum = f2fs_chksum(chksum, (__u8 *)ri + offset,
						F2FS_BLKSIZE - offset);
2017-07-31 20:19:09 +08:00
	return chksum;
}

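`f2fs_inode_chksum()` chains one checksum across several pieces. It first derives a per-inode seed from the filesystem seed, the inode number and the generation. It then checksums the block up to the checksum field, four zero bytes in place of that field, and the rest of the block. Because the stored checksum is never hashed, setting it (`f2fs_inode_chksum_set()`) and verifying it (`f2fs_inode_chksum_verify()`) produce the same value. The sketch below reproduces that structure in user space. It assumes a plain bitwise CRC-32 (reflected polynomial 0xEDB88320, caller-supplied seed, no final inversion) as a stand-in for `f2fs_chksum()`. The function names and block size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320, no final inversion.
 * Assumed stand-in for f2fs_chksum(). */
uint32_t crc32_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Checksum blk[0..blk_size) as if the 4-byte field at cs_off were zero,
 * seeded with ino and gen, following the shape of f2fs_inode_chksum().
 * Requires cs_off + 4 <= blk_size.
 */
uint32_t inode_chksum_sketch(uint32_t sb_seed, uint32_t ino, uint32_t gen,
			     const uint8_t *blk, size_t blk_size, size_t cs_off)
{
	const uint32_t dummy_cs = 0;
	uint32_t seed, crc;

	seed = crc32_sketch(sb_seed, (const uint8_t *)&ino, sizeof(ino));
	seed = crc32_sketch(seed, (const uint8_t *)&gen, sizeof(gen));
	crc = crc32_sketch(seed, blk, cs_off);
	crc = crc32_sketch(crc, (const uint8_t *)&dummy_cs, sizeof(dummy_cs));
	return crc32_sketch(crc, blk + cs_off + sizeof(dummy_cs),
			    blk_size - cs_off - sizeof(dummy_cs));
}
```

Two properties follow from this design. The result does not depend on what is currently stored in the checksum field. It also equals a single CRC pass over a copy of the block with that field zeroed, because a CRC without final inversion can be continued piece by piece.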
2025-03-31 21:11:27 +01:00
bool f2fs_inode_chksum_verify(struct f2fs_sb_info *sbi, struct folio *folio)
2017-07-31 20:19:09 +08:00
{
	struct f2fs_inode *ri;
	__u32 provided, calculated;

2018-06-21 13:46:23 -07:00
	if (unlikely(is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN)))
		return true;

2018-03-09 23:10:21 +08:00
#ifdef CONFIG_F2FS_CHECK_FS
2025-07-08 18:03:15 +01:00
	if (!f2fs_enable_inode_chksum(sbi, folio))
2018-03-09 23:10:21 +08:00
#else
2025-07-08 18:03:15 +01:00
	if (!f2fs_enable_inode_chksum(sbi, folio) ||
2025-03-31 21:11:27 +01:00
			folio_test_dirty(folio) ||
			folio_test_writeback(folio))
2018-03-09 23:10:21 +08:00
#endif
2017-07-31 20:19:09 +08:00
		return true;

2025-07-08 18:03:49 +01:00
	ri = &F2FS_NODE(folio)->i;
2017-07-31 20:19:09 +08:00
	provided = le32_to_cpu(ri->i_inode_checksum);
2025-07-08 18:03:16 +01:00
	calculated = f2fs_inode_chksum(sbi, folio);
2017-07-31 20:19:09 +08:00

	if (provided != calculated)
2019-06-18 17:48:42 +08:00
		f2fs_warn(sbi, "checksum invalid, nid = %lu, ino_of_node = %x, %x vs. %x",
2025-07-08 18:03:07 +01:00
			  folio->index, ino_of_node(folio),
2024-08-20 22:55:07 +08:00
			  provided, calculated);
2017-07-31 20:19:09 +08:00

	return provided == calculated;
}

2025-07-08 18:03:14 +01:00
void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct folio *folio)
2017-07-31 20:19:09 +08:00
{
2025-07-08 18:03:49 +01:00
	struct f2fs_inode *ri = &F2FS_NODE(folio)->i;
2017-07-31 20:19:09 +08:00

2025-07-08 18:03:15 +01:00
	if (!f2fs_enable_inode_chksum(sbi, folio))
2017-07-31 20:19:09 +08:00
		return;

2025-07-08 18:03:16 +01:00
	ri->i_inode_checksum = cpu_to_le32(f2fs_inode_chksum(sbi, folio));
2017-07-31 20:19:09 +08:00
}

2023-04-08 02:31:47 +08:00
static bool sanity_check_compress_inode(struct inode *inode,
			struct f2fs_inode *ri)
{
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	unsigned char clevel;

	if (ri->i_compress_algorithm >= COMPRESS_MAX) {
		f2fs_warn(sbi,
			"%s: inode (ino=%lx) has unsupported compress algorithm: %u, run fsck to fix",
			__func__, inode->i_ino, ri->i_compress_algorithm);
2023-08-21 23:22:23 +08:00
		return false;
2023-04-08 02:31:47 +08:00
	}
	if (le64_to_cpu(ri->i_compr_blocks) >
			SECTOR_TO_BLOCK(inode->i_blocks)) {
		f2fs_warn(sbi,
			"%s: inode (ino=%lx) has inconsistent i_compr_blocks:%llu, i_blocks:%llu, run fsck to fix",
			__func__, inode->i_ino, le64_to_cpu(ri->i_compr_blocks),
			SECTOR_TO_BLOCK(inode->i_blocks));
2023-08-21 23:22:23 +08:00
		return false;
2023-04-08 02:31:47 +08:00
	}
	if (ri->i_log_cluster_size < MIN_COMPRESS_LOG_SIZE ||
		ri->i_log_cluster_size > MAX_COMPRESS_LOG_SIZE) {
		f2fs_warn(sbi,
			"%s: inode (ino=%lx) has unsupported log cluster size: %u, run fsck to fix",
			__func__, inode->i_ino, ri->i_log_cluster_size);
2023-08-21 23:22:23 +08:00
		return false;
2023-04-08 02:31:47 +08:00
	}

	clevel = le16_to_cpu(ri->i_compress_flag) >>
				COMPRESS_LEVEL_OFFSET;
	switch (ri->i_compress_algorithm) {
	case COMPRESS_LZO:
#ifdef CONFIG_F2FS_FS_LZO
		if (clevel)
			goto err_level;
#endif
		break;
	case COMPRESS_LZORLE:
#ifdef CONFIG_F2FS_FS_LZORLE
		if (clevel)
			goto err_level;
#endif
		break;
	case COMPRESS_LZ4:
#ifdef CONFIG_F2FS_FS_LZ4
#ifdef CONFIG_F2FS_FS_LZ4HC
		if (clevel &&
		   (clevel < LZ4HC_MIN_CLEVEL || clevel > LZ4HC_MAX_CLEVEL))
			goto err_level;
#else
		if (clevel)
			goto err_level;
#endif
#endif
		break;
	case COMPRESS_ZSTD:
#ifdef CONFIG_F2FS_FS_ZSTD
		if (clevel < zstd_min_clevel() || clevel > zstd_max_clevel())
			goto err_level;
#endif
		break;
	default:
		goto err_level;
	}

	return true;
err_level:
	f2fs_warn(sbi, "%s: inode (ino=%lx) has unsupported compress level: %u, run fsck to fix",
		  __func__, inode->i_ino, clevel);
	return false;
}

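`sanity_check_compress_inode()` takes the compression level from the upper bits of `i_compress_flag` and applies per-algorithm rules. LZO and LZO-RLE must store level 0. LZ4 accepts 0 or, with LZ4HC built in, a level inside the LZ4HC range. Zstd must stay within the range the zstd library reports. Unknown algorithms are rejected. The sketch below assumes every algorithm is compiled in, LZ4HC included. The shift and the LZ4HC bounds are hypothetical placeholders, not the kernel's `COMPRESS_LEVEL_OFFSET` or `LZ4HC_*_CLEVEL` values. The zstd bounds are passed in by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define SKETCH_LEVEL_OFFSET 8
#define SKETCH_LZ4HC_MIN    3
#define SKETCH_LZ4HC_MAX    12

enum sketch_algo { ALGO_LZO, ALGO_LZ4, ALGO_ZSTD, ALGO_LZORLE, ALGO_MAX };

/* The level lives in the upper bits of the 16-bit flag word. */
unsigned int level_from_flags(uint16_t compress_flag)
{
	return compress_flag >> SKETCH_LEVEL_OFFSET;
}

/* Per-algorithm level rules, modelled on the switch above. */
bool level_is_valid(enum sketch_algo algo, unsigned int level,
		    int zstd_min, int zstd_max)
{
	switch (algo) {
	case ALGO_LZO:
	case ALGO_LZORLE:
		return level == 0;     /* no tunable level */
	case ALGO_LZ4:
		return level == 0 ||   /* 0 selects plain LZ4 */
		       (level >= SKETCH_LZ4HC_MIN && level <= SKETCH_LZ4HC_MAX);
	case ALGO_ZSTD:
		return (int)level >= zstd_min && (int)level <= zstd_max;
	default:
		return false;          /* unknown algorithm */
	}
}
```

Rejecting a bad level at inode load time means a fuzzed or corrupted flag word is reported once, with a pointer to fsck. The bad value is never handed to a compressor later.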
2025-07-08 18:03:03 +01:00
static bool sanity_check_inode(struct inode *inode, struct folio *node_folio)
2018-04-24 11:37:18 -06:00
{
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
f2fs: fix to do sanity check with i_extra_isize
If inode.i_extra_isize was fuzzed to an abnormal value, the
calculated inline data size overflows, resulting in access to an
invalid memory area when operating on inline data.
Let's do a sanity check on i_extra_isize during inode loading
to fix this.
https://bugzilla.kernel.org/show_bug.cgi?id=200421
- Reproduce
- POC (poc.c)
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <dirent.h>
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <linux/falloc.h>
#include <linux/loop.h>
static void activity(char *mpoint) {
	char *foo_bar_baz;
	char *foo_baz;
	char *xattr;
	int err;
	err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
	err = asprintf(&foo_baz, "%s/foo/baz", mpoint);
	err = asprintf(&xattr, "%s/foo/bar/xattr", mpoint);
	rename(foo_bar_baz, foo_baz);
	char buf2[113];
	memset(buf2, 0, sizeof(buf2));
	listxattr(xattr, buf2, sizeof(buf2));
	removexattr(xattr, "user.mime_type");
}
int main(int argc, char *argv[]) {
	activity(argv[1]);
	return 0;
}
- Kernel message
Unmounting the image leaves the following message:
[ 2910.995489] F2FS-fs (loop0): Mounted with checkpoint version = 2
[ 2918.416465] ==================================================================
[ 2918.416807] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0xcb9/0x1a80
[ 2918.417009] Read of size 4 at addr ffff88018efc2068 by task a.out/1229
[ 2918.417311] CPU: 1 PID: 1229 Comm: a.out Not tainted 4.17.0+ #1
[ 2918.417314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 2918.417323] Call Trace:
[ 2918.417366] dump_stack+0x71/0xab
[ 2918.417401] print_address_description+0x6b/0x290
[ 2918.417407] kasan_report+0x28e/0x390
[ 2918.417411] ? f2fs_iget+0xcb9/0x1a80
[ 2918.417415] f2fs_iget+0xcb9/0x1a80
[ 2918.417422] ? f2fs_lookup+0x2e7/0x580
[ 2918.417425] f2fs_lookup+0x2e7/0x580
[ 2918.417433] ? __recover_dot_dentries+0x400/0x400
[ 2918.417447] ? legitimize_path.isra.29+0x5a/0xa0
[ 2918.417453] __lookup_slow+0x11c/0x220
[ 2918.417457] ? may_delete+0x2a0/0x2a0
[ 2918.417475] ? deref_stack_reg+0xe0/0xe0
[ 2918.417479] ? __lookup_hash+0xb0/0xb0
[ 2918.417483] lookup_slow+0x3e/0x60
[ 2918.417488] walk_component+0x3ac/0x990
[ 2918.417492] ? generic_permission+0x51/0x1e0
[ 2918.417495] ? inode_permission+0x51/0x1d0
[ 2918.417499] ? pick_link+0x3e0/0x3e0
[ 2918.417502] ? link_path_walk+0x4b1/0x770
[ 2918.417513] ? _raw_spin_lock_irqsave+0x25/0x50
[ 2918.417518] ? walk_component+0x990/0x990
[ 2918.417522] ? path_init+0x2e6/0x580
[ 2918.417526] path_lookupat+0x13f/0x430
[ 2918.417531] ? trailing_symlink+0x3a0/0x3a0
[ 2918.417534] ? do_renameat2+0x270/0x7b0
[ 2918.417538] ? __kasan_slab_free+0x14c/0x190
[ 2918.417541] ? do_renameat2+0x270/0x7b0
[ 2918.417553] ? kmem_cache_free+0x85/0x1e0
[ 2918.417558] ? do_renameat2+0x270/0x7b0
[ 2918.417563] filename_lookup+0x13c/0x280
[ 2918.417567] ? filename_parentat+0x2b0/0x2b0
[ 2918.417572] ? kasan_unpoison_shadow+0x31/0x40
[ 2918.417575] ? kasan_kmalloc+0xa6/0xd0
[ 2918.417593] ? strncpy_from_user+0xaa/0x1c0
[ 2918.417598] ? getname_flags+0x101/0x2b0
[ 2918.417614] ? path_listxattr+0x87/0x110
[ 2918.417619] path_listxattr+0x87/0x110
[ 2918.417623] ? listxattr+0xc0/0xc0
[ 2918.417637] ? mm_fault_error+0x1b0/0x1b0
[ 2918.417654] do_syscall_64+0x73/0x160
[ 2918.417660] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2918.417676] RIP: 0033:0x7f2f3a3480d7
[ 2918.417677] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48
[ 2918.417732] RSP: 002b:00007fff4095b7d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000c2
[ 2918.417744] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2f3a3480d7
[ 2918.417746] RDX: 0000000000000071 RSI: 00007fff4095b810 RDI: 000000000126a0c0
[ 2918.417749] RBP: 00007fff4095b890 R08: 000000000126a010 R09: 0000000000000000
[ 2918.417751] R10: 00000000000001ab R11: 0000000000000206 R12: 00000000004005e0
[ 2918.417753] R13: 00007fff4095b990 R14: 0000000000000000 R15: 0000000000000000
[ 2918.417853] Allocated by task 329:
[ 2918.418002] kasan_kmalloc+0xa6/0xd0
[ 2918.418007] kmem_cache_alloc+0xc8/0x1e0
[ 2918.418023] mempool_init_node+0x194/0x230
[ 2918.418027] mempool_init+0x12/0x20
[ 2918.418042] bioset_init+0x2bd/0x380
[ 2918.418052] blk_alloc_queue_node+0xe9/0x540
[ 2918.418075] dm_create+0x2c0/0x800
[ 2918.418080] dev_create+0xd2/0x530
[ 2918.418083] ctl_ioctl+0x2a3/0x5b0
[ 2918.418087] dm_ctl_ioctl+0xa/0x10
[ 2918.418092] do_vfs_ioctl+0x13e/0x8c0
[ 2918.418095] ksys_ioctl+0x66/0x70
[ 2918.418098] __x64_sys_ioctl+0x3d/0x50
[ 2918.418102] do_syscall_64+0x73/0x160
[ 2918.418106] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2918.418204] Freed by task 0:
[ 2918.418301] (stack is not available)
[ 2918.418521] The buggy address belongs to the object at ffff88018efc0000
which belongs to the cache biovec-max of size 8192
[ 2918.418894] The buggy address is located 104 bytes to the right of
8192-byte region [ffff88018efc0000, ffff88018efc2000)
[ 2918.419257] The buggy address belongs to the page:
[ 2918.419431] page:ffffea00063bf000 count:1 mapcount:0 mapping:ffff8801f2242540 index:0x0 compound_mapcount: 0
[ 2918.419702] flags: 0x17fff8000008100(slab|head)
[ 2918.419879] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801f2242540
[ 2918.420101] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
[ 2918.420322] page dumped because: kasan: bad access detected
[ 2918.420599] Memory state around the buggy address:
[ 2918.420764] ffff88018efc1f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2918.420975] ffff88018efc1f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2918.421194] >ffff88018efc2000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2918.421406] ^
[ 2918.421627] ffff88018efc2080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2918.421838] ffff88018efc2100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2918.422046] ==================================================================
[ 2918.422264] Disabling lock debugging due to kernel taint
[ 2923.901641] BUG: unable to handle kernel paging request at ffff88018f0db000
[ 2923.901884] PGD 22226a067 P4D 22226a067 PUD 222273067 PMD 18e642063 PTE 800000018f0db061
[ 2923.902120] Oops: 0003 [#1] SMP KASAN PTI
[ 2923.902274] CPU: 1 PID: 1231 Comm: umount Tainted: G B 4.17.0+ #1
[ 2923.902490] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 2923.902761] RIP: 0010:__memset+0x24/0x30
[ 2923.902906] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
[ 2923.903446] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206
[ 2923.903622] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb
[ 2923.903833] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc
[ 2923.904062] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c
[ 2923.904273] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000
[ 2923.904485] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80
[ 2923.904693] FS: 00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 2923.904937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2923.910080] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0
[ 2923.914930] Call Trace:
[ 2923.919724] f2fs_truncate_inline_inode+0x114/0x170
[ 2923.924487] f2fs_truncate_blocks+0x11b/0x7c0
[ 2923.929178] ? f2fs_truncate_data_blocks+0x10/0x10
[ 2923.933834] ? dqget+0x670/0x670
[ 2923.938437] ? f2fs_destroy_extent_tree+0xd6/0x270
[ 2923.943107] ? __radix_tree_lookup+0x2f/0x150
[ 2923.947772] f2fs_truncate+0xd4/0x1a0
[ 2923.952491] f2fs_evict_inode+0x5ab/0x610
[ 2923.957204] evict+0x15f/0x280
[ 2923.961898] __dentry_kill+0x161/0x250
[ 2923.966634] shrink_dentry_list+0xf3/0x250
[ 2923.971897] shrink_dcache_parent+0xa9/0x100
[ 2923.976561] ? shrink_dcache_sb+0x1f0/0x1f0
[ 2923.981177] ? wait_for_completion+0x8a/0x210
[ 2923.985781] ? migrate_swap_stop+0x2d0/0x2d0
[ 2923.990332] do_one_tree+0xe/0x40
[ 2923.994735] shrink_dcache_for_umount+0x3a/0xa0
[ 2923.999077] generic_shutdown_super+0x3e/0x1c0
[ 2924.003350] kill_block_super+0x4b/0x70
[ 2924.007619] deactivate_locked_super+0x65/0x90
[ 2924.011812] cleanup_mnt+0x5c/0xa0
[ 2924.015995] task_work_run+0xce/0xf0
[ 2924.020174] exit_to_usermode_loop+0x115/0x120
[ 2924.024293] do_syscall_64+0x12f/0x160
[ 2924.028479] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2924.032709] RIP: 0033:0x7fa8b2868487
[ 2924.036888] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
[ 2924.045750] RSP: 002b:00007ffc39824d58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 2924.050190] RAX: 0000000000000000 RBX: 00000000008ea030 RCX: 00007fa8b2868487
[ 2924.054604] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000008f4360
[ 2924.058940] RBP: 00000000008f4360 R08: 0000000000000000 R09: 0000000000000014
[ 2924.063186] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007fa8b2d7183c
[ 2924.067418] R13: 0000000000000000 R14: 00000000008ea210 R15: 00007ffc39824fe0
[ 2924.071534] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 2924.098044] CR2: ffff88018f0db000
[ 2924.102520] ---[ end trace a8e0d899985faf31 ]---
[ 2924.107012] RIP: 0010:__memset+0x24/0x30
[ 2924.111448] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
[ 2924.120724] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206
[ 2924.125312] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb
[ 2924.129931] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc
[ 2924.134537] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c
[ 2924.139175] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000
[ 2924.143825] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80
[ 2924.148500] FS: 00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 2924.153247] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2924.158003] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0
[ 2924.164641] BUG: Bad rss-counter state mm:00000000fa04621e idx:0 val:4
[ 2924.170007] BUG: Bad rss-counter state mm:00000000fa04621e idx:1 val:2
- Location
https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/inline.c#L78
memset(addr + from, 0, MAX_INLINE_DATA(inode) - from);
Here the length, MAX_INLINE_DATA(inode) - from, can be negative.
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 22:16:55 +08:00
struct f2fs_inode_info *fi = F2FS_I(inode);
2025-07-08 18:03:06 +01:00
struct f2fs_inode *ri = F2FS_INODE(node_folio);
f2fs: fix to do sanity check with node footer and iblocks
This patch adds sanity checks on the following inode fields to
avoid the reported panic:
- node footer
- iblocks
https://bugzilla.kernel.org/show_bug.cgi?id=200223
- Overview
BUG() is triggered in f2fs_truncate_inode_blocks() when unmounting a mounted f2fs image after writing to it.
- Reproduce
- POC (poc.c)
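#define _GNU_SOURCE /* for asprintf() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void activity(char *mpoint) {
char *foo_bar_baz;
static int buf[8192];
memset(buf, 0, sizeof(buf));
if (asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint) < 0)
return;
// open / write (no O_CREAT: foo/bar/baz must already exist in the image)
int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
if (fd >= 0) {
write(fd, (char *)buf, 517);
write(fd, (char *)buf, sizeof(buf));
close(fd);
}
free(foo_bar_baz);
}
int main(int argc, char *argv[]) {
if (argc < 2) {
fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
return 1;
}
activity(argv[1]);
return 0;
}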
- Kernel message
[ 552.479723] F2FS-fs (loop0): Mounted with checkpoint version = 2
[ 556.451891] ------------[ cut here ]------------
[ 556.451899] kernel BUG at fs/f2fs/node.c:987!
[ 556.452920] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 556.453936] CPU: 1 PID: 1310 Comm: umount Not tainted 4.18.0-rc1+ #4
[ 556.455213] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 556.457140] RIP: 0010:f2fs_truncate_inode_blocks+0x4a7/0x6f0
[ 556.458280] Code: e8 ae ea ff ff 41 89 c7 c1 e8 1f 84 c0 74 0a 41 83 ff fe 0f 85 35 ff ff ff 81 85 b0 fe ff ff fb 03 00 00 e9 f7 fd ff ff 0f 0b <0f> 0b e8 62 b7 9a 00 48 8b bd a0 fe ff ff e8 56 54 ae ff 48 8b b5
[ 556.462015] RSP: 0018:ffff8801f292f808 EFLAGS: 00010286
[ 556.463068] RAX: ffffed003e73242d RBX: ffff8801f292f958 RCX: ffffffffb88b81bc
[ 556.464479] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8801f3992164
[ 556.465901] RBP: ffff8801f292f980 R08: ffffed003e73242d R09: ffffed003e73242d
[ 556.467311] R10: 0000000000000001 R11: ffffed003e73242c R12: 00000000fffffc64
[ 556.468706] R13: ffff8801f3992000 R14: 0000000000000058 R15: 00000000ffff8801
[ 556.470117] FS: 00007f8029297840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 556.471702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 556.472838] CR2: 000055f5f57305d8 CR3: 00000001f18b0000 CR4: 00000000000006e0
[ 556.474265] Call Trace:
[ 556.474782] ? f2fs_alloc_nid_failed+0xf0/0xf0
[ 556.475686] ? truncate_nodes+0x980/0x980
[ 556.476516] ? pagecache_get_page+0x21f/0x2f0
[ 556.477412] ? __asan_loadN+0xf/0x20
[ 556.478153] ? __get_node_page+0x331/0x5b0
[ 556.478992] ? reweight_entity+0x1e6/0x3b0
[ 556.479826] f2fs_truncate_blocks+0x55e/0x740
[ 556.480709] ? f2fs_truncate_data_blocks+0x20/0x20
[ 556.481689] ? __radix_tree_lookup+0x34/0x160
[ 556.482630] ? radix_tree_lookup+0xd/0x10
[ 556.483445] f2fs_truncate+0xd4/0x1a0
[ 556.484206] f2fs_evict_inode+0x5ce/0x630
[ 556.485032] evict+0x16f/0x290
[ 556.485664] iput+0x280/0x300
[ 556.486300] dentry_unlink_inode+0x165/0x1e0
[ 556.487169] __dentry_kill+0x16a/0x260
[ 556.487936] dentry_kill+0x70/0x250
[ 556.488651] shrink_dentry_list+0x125/0x260
[ 556.489504] shrink_dcache_parent+0xc1/0x110
[ 556.490379] ? shrink_dcache_sb+0x200/0x200
[ 556.491231] ? bit_wait_timeout+0xc0/0xc0
[ 556.492047] do_one_tree+0x12/0x40
[ 556.492743] shrink_dcache_for_umount+0x3f/0xa0
[ 556.493656] generic_shutdown_super+0x43/0x1c0
[ 556.494561] kill_block_super+0x52/0x80
[ 556.495341] kill_f2fs_super+0x62/0x70
[ 556.496105] deactivate_locked_super+0x6f/0xa0
[ 556.497004] deactivate_super+0x5e/0x80
[ 556.497785] cleanup_mnt+0x61/0xa0
[ 556.498492] __cleanup_mnt+0x12/0x20
[ 556.499218] task_work_run+0xc8/0xf0
[ 556.499949] exit_to_usermode_loop+0x125/0x130
[ 556.500846] do_syscall_64+0x138/0x170
[ 556.501609] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 556.502659] RIP: 0033:0x7f8028b77487
[ 556.503384] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
[ 556.507137] RSP: 002b:00007fff9f2e3598 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 556.508637] RAX: 0000000000000000 RBX: 0000000000ebd030 RCX: 00007f8028b77487
[ 556.510069] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000ec41e0
[ 556.511481] RBP: 0000000000ec41e0 R08: 0000000000000000 R09: 0000000000000014
[ 556.512892] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f802908083c
[ 556.514320] R13: 0000000000000000 R14: 0000000000ebd210 R15: 00007fff9f2e3820
[ 556.515745] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 556.529276] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 556.530340] RIP: 0010:f2fs_truncate_inode_blocks+0x4a7/0x6f0
[ 556.531513] Code: e8 ae ea ff ff 41 89 c7 c1 e8 1f 84 c0 74 0a 41 83 ff fe 0f 85 35 ff ff ff 81 85 b0 fe ff ff fb 03 00 00 e9 f7 fd ff ff 0f 0b <0f> 0b e8 62 b7 9a 00 48 8b bd a0 fe ff ff e8 56 54 ae ff 48 8b b5
[ 556.535330] RSP: 0018:ffff8801f292f808 EFLAGS: 00010286
[ 556.536395] RAX: ffffed003e73242d RBX: ffff8801f292f958 RCX: ffffffffb88b81bc
[ 556.537824] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8801f3992164
[ 556.539290] RBP: ffff8801f292f980 R08: ffffed003e73242d R09: ffffed003e73242d
[ 556.540709] R10: 0000000000000001 R11: ffffed003e73242c R12: 00000000fffffc64
[ 556.542131] R13: ffff8801f3992000 R14: 0000000000000058 R15: 00000000ffff8801
[ 556.543579] FS: 00007f8029297840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 556.545180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 556.546338] CR2: 000055f5f57305d8 CR3: 00000001f18b0000 CR4: 00000000000006e0
[ 556.547809] ==================================================================
[ 556.549248] BUG: KASAN: stack-out-of-bounds in arch_tlb_gather_mmu+0x52/0x170
[ 556.550672] Write of size 8 at addr ffff8801f292fd10 by task umount/1310
[ 556.552338] CPU: 1 PID: 1310 Comm: umount Tainted: G D 4.18.0-rc1+ #4
[ 556.553886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 556.555756] Call Trace:
[ 556.556264] dump_stack+0x7b/0xb5
[ 556.556944] print_address_description+0x70/0x290
[ 556.557903] kasan_report+0x291/0x390
[ 556.558649] ? arch_tlb_gather_mmu+0x52/0x170
[ 556.559537] __asan_store8+0x57/0x90
[ 556.560268] arch_tlb_gather_mmu+0x52/0x170
[ 556.561110] tlb_gather_mmu+0x12/0x40
[ 556.561862] exit_mmap+0x123/0x2a0
[ 556.562555] ? __ia32_sys_munmap+0x50/0x50
[ 556.563384] ? exit_aio+0x98/0x230
[ 556.564079] ? __x32_compat_sys_io_submit+0x260/0x260
[ 556.565099] ? taskstats_exit+0x1f4/0x640
[ 556.565925] ? kasan_check_read+0x11/0x20
[ 556.566739] ? mm_update_next_owner+0x322/0x380
[ 556.567652] mmput+0x8b/0x1d0
[ 556.568260] do_exit+0x43a/0x1390
[ 556.568937] ? mm_update_next_owner+0x380/0x380
[ 556.569855] ? deactivate_super+0x5e/0x80
[ 556.570668] ? cleanup_mnt+0x61/0xa0
[ 556.571395] ? __cleanup_mnt+0x12/0x20
[ 556.572156] ? task_work_run+0xc8/0xf0
[ 556.572917] ? exit_to_usermode_loop+0x125/0x130
[ 556.573861] rewind_stack_do_exit+0x17/0x20
[ 556.574707] RIP: 0033:0x7f8028b77487
[ 556.575428] Code: Bad RIP value.
[ 556.576106] RSP: 002b:00007fff9f2e3598 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 556.577599] RAX: 0000000000000000 RBX: 0000000000ebd030 RCX: 00007f8028b77487
[ 556.579020] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000ec41e0
[ 556.580422] RBP: 0000000000ec41e0 R08: 0000000000000000 R09: 0000000000000014
[ 556.581833] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f802908083c
[ 556.583252] R13: 0000000000000000 R14: 0000000000ebd210 R15: 00007fff9f2e3820
[ 556.584983] The buggy address belongs to the page:
[ 556.585961] page:ffffea0007ca4bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 556.587540] flags: 0x2ffff0000000000()
[ 556.588296] raw: 02ffff0000000000 0000000000000000 dead000000000200 0000000000000000
[ 556.589822] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 556.591359] page dumped because: kasan: bad access detected
[ 556.592786] Memory state around the buggy address:
[ 556.593753] ffff8801f292fc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 556.595191] ffff8801f292fc80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
[ 556.596613] >ffff8801f292fd00: 00 00 f3 00 00 00 00 f3 f3 00 00 00 00 f4 f4 f4
[ 556.598044] ^
[ 556.598797] ffff8801f292fd80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[ 556.600225] ffff8801f292fe00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
[ 556.601647] ==================================================================
- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/node.c#L987
case NODE_DIND_BLOCK:
err = truncate_nodes(&dn, nofs, offset[1], 3);
cont = 0;
break;
default:
BUG(); <---
}
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-29 13:55:22 +08:00
unsigned long long iblocks;
2025-07-08 18:03:06 +01:00
iblocks = le64_to_cpu(F2FS_INODE(node_folio)->i_blocks);
2018-06-29 13:55:22 +08:00
if (!iblocks) {
2019-06-18 17:48:42 +08:00
f2fs_warn(sbi, "%s: corrupted inode i_blocks i_ino=%lx iblocks=%llu, run fsck to fix.",
__func__, inode->i_ino, iblocks);
2018-06-29 13:55:22 +08:00
return false;
}
2025-07-08 18:03:08 +01:00
if (ino_of_node(node_folio) != nid_of_node(node_folio)) {
2019-06-18 17:48:42 +08:00
f2fs_warn(sbi, "%s: corrupted inode footer i_ino=%lx, ino,nid: [%u, %u] run fsck to fix.",
__func__, inode->i_ino,
2025-07-08 18:03:08 +01:00
ino_of_node(node_folio), nid_of_node(node_folio));
[ 556.529276] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 556.530340] RIP: 0010:f2fs_truncate_inode_blocks+0x4a7/0x6f0
[ 556.531513] Code: e8 ae ea ff ff 41 89 c7 c1 e8 1f 84 c0 74 0a 41 83 ff fe 0f 85 35 ff ff ff 81 85 b0 fe ff ff fb 03 00 00 e9 f7 fd ff ff 0f 0b <0f> 0b e8 62 b7 9a 00 48 8b bd a0 fe ff ff e8 56 54 ae ff 48 8b b5
[ 556.535330] RSP: 0018:ffff8801f292f808 EFLAGS: 00010286
[ 556.536395] RAX: ffffed003e73242d RBX: ffff8801f292f958 RCX: ffffffffb88b81bc
[ 556.537824] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8801f3992164
[ 556.539290] RBP: ffff8801f292f980 R08: ffffed003e73242d R09: ffffed003e73242d
[ 556.540709] R10: 0000000000000001 R11: ffffed003e73242c R12: 00000000fffffc64
[ 556.542131] R13: ffff8801f3992000 R14: 0000000000000058 R15: 00000000ffff8801
[ 556.543579] FS: 00007f8029297840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 556.545180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 556.546338] CR2: 000055f5f57305d8 CR3: 00000001f18b0000 CR4: 00000000000006e0
[ 556.547809] ==================================================================
[ 556.549248] BUG: KASAN: stack-out-of-bounds in arch_tlb_gather_mmu+0x52/0x170
[ 556.550672] Write of size 8 at addr ffff8801f292fd10 by task umount/1310
[ 556.552338] CPU: 1 PID: 1310 Comm: umount Tainted: G D 4.18.0-rc1+ #4
[ 556.553886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 556.555756] Call Trace:
[ 556.556264] dump_stack+0x7b/0xb5
[ 556.556944] print_address_description+0x70/0x290
[ 556.557903] kasan_report+0x291/0x390
[ 556.558649] ? arch_tlb_gather_mmu+0x52/0x170
[ 556.559537] __asan_store8+0x57/0x90
[ 556.560268] arch_tlb_gather_mmu+0x52/0x170
[ 556.561110] tlb_gather_mmu+0x12/0x40
[ 556.561862] exit_mmap+0x123/0x2a0
[ 556.562555] ? __ia32_sys_munmap+0x50/0x50
[ 556.563384] ? exit_aio+0x98/0x230
[ 556.564079] ? __x32_compat_sys_io_submit+0x260/0x260
[ 556.565099] ? taskstats_exit+0x1f4/0x640
[ 556.565925] ? kasan_check_read+0x11/0x20
[ 556.566739] ? mm_update_next_owner+0x322/0x380
[ 556.567652] mmput+0x8b/0x1d0
[ 556.568260] do_exit+0x43a/0x1390
[ 556.568937] ? mm_update_next_owner+0x380/0x380
[ 556.569855] ? deactivate_super+0x5e/0x80
[ 556.570668] ? cleanup_mnt+0x61/0xa0
[ 556.571395] ? __cleanup_mnt+0x12/0x20
[ 556.572156] ? task_work_run+0xc8/0xf0
[ 556.572917] ? exit_to_usermode_loop+0x125/0x130
[ 556.573861] rewind_stack_do_exit+0x17/0x20
[ 556.574707] RIP: 0033:0x7f8028b77487
[ 556.575428] Code: Bad RIP value.
[ 556.576106] RSP: 002b:00007fff9f2e3598 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 556.577599] RAX: 0000000000000000 RBX: 0000000000ebd030 RCX: 00007f8028b77487
[ 556.579020] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000ec41e0
[ 556.580422] RBP: 0000000000ec41e0 R08: 0000000000000000 R09: 0000000000000014
[ 556.581833] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f802908083c
[ 556.583252] R13: 0000000000000000 R14: 0000000000ebd210 R15: 00007fff9f2e3820
[ 556.584983] The buggy address belongs to the page:
[ 556.585961] page:ffffea0007ca4bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 556.587540] flags: 0x2ffff0000000000()
[ 556.588296] raw: 02ffff0000000000 0000000000000000 dead000000000200 0000000000000000
[ 556.589822] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 556.591359] page dumped because: kasan: bad access detected
[ 556.592786] Memory state around the buggy address:
[ 556.593753] ffff8801f292fc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 556.595191] ffff8801f292fc80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
[ 556.596613] >ffff8801f292fd00: 00 00 f3 00 00 00 00 f3 f3 00 00 00 00 f4 f4 f4
[ 556.598044] ^
[ 556.598797] ffff8801f292fd80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[ 556.600225] ffff8801f292fe00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
[ 556.601647] ==================================================================
- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/node.c#L987
case NODE_DIND_BLOCK:
err = truncate_nodes(&dn, nofs, offset[1], 3);
cont = 0;
break;
default:
BUG(); <---
}
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-29 13:55:22 +08:00
		return false;
	}
2018-04-24 11:37:18 -06:00

2025-07-08 18:03:07 +01:00
	if (ino_of_node(node_folio) == fi->i_xattr_nid) {
2025-03-24 13:33:39 +08:00
		f2fs_warn(sbi, "%s: corrupted inode i_ino=%lx, xnid=%x, run fsck to fix.",
			  __func__, inode->i_ino, fi->i_xattr_nid);
		return false;
	}

2023-05-31 09:40:55 +08:00
	if (f2fs_has_extra_attr(inode)) {
		if (!f2fs_sb_has_extra_attr(sbi)) {
			f2fs_warn(sbi, "%s: inode (ino=%lx) is with extra_attr, but extra_attr feature is off",
				  __func__, inode->i_ino);
			return false;
		}
		if (fi->i_extra_isize > F2FS_TOTAL_EXTRA_ATTR_SIZE ||
			fi->i_extra_isize < F2FS_MIN_EXTRA_ATTR_SIZE ||
			fi->i_extra_isize % sizeof(__le32)) {
			f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_extra_isize: %d, max: %zu",
				  __func__, inode->i_ino, fi->i_extra_isize,
				  F2FS_TOTAL_EXTRA_ATTR_SIZE);
			return false;
		}
		if (f2fs_sb_has_compression(sbi) &&
			fi->i_flags & F2FS_COMPR_FL &&
			F2FS_FITS_IN_INODE(ri, fi->i_extra_isize,
						i_compress_flag)) {
			if (!sanity_check_compress_inode(inode, ri))
				return false;
		}
2018-04-24 11:37:18 -06:00
	}
f2fs: fix to do sanity check with extra_attr feature
If FI_EXTRA_ATTR is set in an inode by fuzzing, inode.i_addr[0] will be
parsed as inode.i_extra_isize. Then, in __recover_inline_status, the inline
data address will go beyond the page boundary, resulting in access to
invalid memory.
So, while reading the inode page, do a sanity check of the filesystem's
EXTRA_ATTR feature against the inode's extra_attr bit; if they are
inconsistent, refuse to load the inode.
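The check described above can be sketched in userspace C as follows. It uses simplified, hypothetical types (struct sketch_inode, sketch_check_extra_attr) in place of the kernel's f2fs_inode_info and sanity_check_inode(). The sketch rejects an inode that claims extra_attr on a filesystem without that feature. It then applies the i_extra_isize range and 4-byte alignment test that appears in the blame lines above. SKETCH_MAX_EXTRA_ISIZE is a placeholder and not the real F2FS_TOTAL_EXTRA_ATTR_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for F2FS_TOTAL_EXTRA_ATTR_SIZE and
 * F2FS_MIN_EXTRA_ATTR_SIZE; the real values come from the on-disk layout. */
#define SKETCH_MAX_EXTRA_ISIZE 36u
#define SKETCH_MIN_EXTRA_ISIZE ((unsigned)sizeof(uint32_t))

/* Hypothetical, simplified view of the on-disk fields the check needs. */
struct sketch_inode {
	bool has_extra_attr;	/* F2FS_EXTRA_ATTR bit from ri.i_inline */
	unsigned extra_isize;	/* ri.i_extra_isize, meaningful only if the bit is set */
};

/* Returns true if the inode may be loaded, false if it must be rejected. */
static bool sketch_check_extra_attr(bool sb_has_extra_attr,
				    const struct sketch_inode *ri)
{
	if (!ri->has_extra_attr)
		return true;	/* nothing further to validate in this sketch */

	/* The inode claims extra_attr but the feature is off, so
	 * i_addr[0] would be misread as i_extra_isize. */
	if (!sb_has_extra_attr) {
		fprintf(stderr, "inode is with extra_attr, but extra_attr feature is off\n");
		return false;
	}

	/* The size must be within range and a multiple of 4 bytes (__le32). */
	if (ri->extra_isize > SKETCH_MAX_EXTRA_ISIZE ||
	    ri->extra_isize < SKETCH_MIN_EXTRA_ISIZE ||
	    ri->extra_isize % sizeof(uint32_t)) {
		fprintf(stderr, "corrupted i_extra_isize: %u\n", ri->extra_isize);
		return false;
	}
	return true;
}
```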
- Overview
Out-of-bounds access in f2fs_iget() when mounting a corrupted f2fs image
- Reproduce
The following message appears in a KASAN build of the 4.18 upstream kernel.
[ 819.392227] ==================================================================
[ 819.393901] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0x736/0x1530
[ 819.395329] Read of size 4 at addr ffff8801f099c968 by task mount/1292
[ 819.397079] CPU: 1 PID: 1292 Comm: mount Not tainted 4.18.0-rc1+ #4
[ 819.397082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 819.397088] Call Trace:
[ 819.397124] dump_stack+0x7b/0xb5
[ 819.397154] print_address_description+0x70/0x290
[ 819.397159] kasan_report+0x291/0x390
[ 819.397163] ? f2fs_iget+0x736/0x1530
[ 819.397176] check_memory_region+0x139/0x190
[ 819.397182] __asan_loadN+0xf/0x20
[ 819.397185] f2fs_iget+0x736/0x1530
[ 819.397197] f2fs_fill_super+0x1b4f/0x2b40
[ 819.397202] ? f2fs_fill_super+0x1b4f/0x2b40
[ 819.397208] ? f2fs_commit_super+0x1b0/0x1b0
[ 819.397227] ? set_blocksize+0x90/0x140
[ 819.397241] mount_bdev+0x1c5/0x210
[ 819.397245] ? f2fs_commit_super+0x1b0/0x1b0
[ 819.397252] f2fs_mount+0x15/0x20
[ 819.397256] mount_fs+0x60/0x1a0
[ 819.397267] ? alloc_vfsmnt+0x309/0x360
[ 819.397272] vfs_kern_mount+0x6b/0x1a0
[ 819.397282] do_mount+0x34a/0x18c0
[ 819.397300] ? lockref_put_or_lock+0xcf/0x160
[ 819.397306] ? copy_mount_string+0x20/0x20
[ 819.397318] ? memcg_kmem_put_cache+0x1b/0xa0
[ 819.397324] ? kasan_check_write+0x14/0x20
[ 819.397334] ? _copy_from_user+0x6a/0x90
[ 819.397353] ? memdup_user+0x42/0x60
[ 819.397359] ksys_mount+0x83/0xd0
[ 819.397365] __x64_sys_mount+0x67/0x80
[ 819.397388] do_syscall_64+0x78/0x170
[ 819.397403] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 819.397422] RIP: 0033:0x7f54c667cb9a
[ 819.397424] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 819.397483] RSP: 002b:00007ffd8f46cd08 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 819.397496] RAX: ffffffffffffffda RBX: 0000000000dfa030 RCX: 00007f54c667cb9a
[ 819.397498] RDX: 0000000000dfa210 RSI: 0000000000dfbf30 RDI: 0000000000e02ec0
[ 819.397501] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 819.397503] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000000000e02ec0
[ 819.397505] R13: 0000000000dfa210 R14: 0000000000000000 R15: 0000000000000003
[ 819.397866] Allocated by task 139:
[ 819.398702] save_stack+0x46/0xd0
[ 819.398705] kasan_kmalloc+0xad/0xe0
[ 819.398709] kasan_slab_alloc+0x11/0x20
[ 819.398713] kmem_cache_alloc+0xd1/0x1e0
[ 819.398717] dup_fd+0x50/0x4c0
[ 819.398740] copy_process.part.37+0xbed/0x32e0
[ 819.398744] _do_fork+0x16e/0x590
[ 819.398748] __x64_sys_clone+0x69/0x80
[ 819.398752] do_syscall_64+0x78/0x170
[ 819.398756] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 819.399097] Freed by task 159:
[ 819.399743] save_stack+0x46/0xd0
[ 819.399747] __kasan_slab_free+0x13c/0x1a0
[ 819.399750] kasan_slab_free+0xe/0x10
[ 819.399754] kmem_cache_free+0x89/0x1e0
[ 819.399757] put_files_struct+0x132/0x150
[ 819.399761] exit_files+0x62/0x70
[ 819.399766] do_exit+0x47b/0x1390
[ 819.399770] do_group_exit+0x86/0x130
[ 819.399774] __x64_sys_exit_group+0x2c/0x30
[ 819.399778] do_syscall_64+0x78/0x170
[ 819.399782] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 819.400115] The buggy address belongs to the object at ffff8801f099c680
which belongs to the cache files_cache of size 704
[ 819.403234] The buggy address is located 40 bytes to the right of
704-byte region [ffff8801f099c680, ffff8801f099c940)
[ 819.405689] The buggy address belongs to the page:
[ 819.406709] page:ffffea0007c26700 count:1 mapcount:0 mapping:ffff8801f69a3340 index:0xffff8801f099d380 compound_mapcount: 0
[ 819.408984] flags: 0x2ffff0000008100(slab|head)
[ 819.409932] raw: 02ffff0000008100 ffffea00077fb600 0000000200000002 ffff8801f69a3340
[ 819.411514] raw: ffff8801f099d380 0000000080130000 00000001ffffffff 0000000000000000
[ 819.413073] page dumped because: kasan: bad access detected
[ 819.414539] Memory state around the buggy address:
[ 819.415521] ffff8801f099c800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 819.416981] ffff8801f099c880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 819.418454] >ffff8801f099c900: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 819.419921] ^
[ 819.421265] ffff8801f099c980: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 819.422745] ffff8801f099ca00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 819.424206] ==================================================================
[ 819.425668] Disabling lock debugging due to kernel taint
[ 819.457463] F2FS-fs (loop0): Mounted with checkpoint version = 3
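The kernel still mounts the image. Run the following program (poc.c) on the mounted folder mnt:
#define _GNU_SOURCE	/* for asprintf() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void activity(char *mpoint) {
	char *foo_bar_baz;
	static int buf[8192];

	memset(buf, 0, sizeof(buf));
	if (asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint) < 0)
		return;
	/* The path lookup in open() is what reaches the corrupted inline dir. */
	int fd = open(foo_bar_baz, O_RDONLY, 0);
	if (fd >= 0) {
		read(fd, (char *)buf, 11);
		close(fd);
	}
	free(foo_bar_baz);
}
int main(int argc, char *argv[]) {
	if (argc < 2) {
		fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
		return 1;
	}
	activity(argv[1]);
	return 0;
}
The kernel then crashes: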
[ 819.457463] F2FS-fs (loop0): Mounted with checkpoint version = 3
[ 918.028501] BUG: unable to handle kernel paging request at ffffed0048000d82
[ 918.044020] PGD 23ffee067 P4D 23ffee067 PUD 23fbef067 PMD 0
[ 918.045207] Oops: 0000 [#1] SMP KASAN PTI
[ 918.046048] CPU: 0 PID: 1309 Comm: poc Tainted: G B 4.18.0-rc1+ #4
[ 918.047573] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 918.049552] RIP: 0010:check_memory_region+0x5e/0x190
[ 918.050565] Code: f8 49 c1 e8 03 49 89 db 49 c1 eb 03 4d 01 cb 4d 01 c1 4d 8d 63 01 4c 89 c8 4d 89 e2 4d 29 ca 49 83 fa 10 7f 3d 4d 85 d2 74 32 <41> 80 39 00 75 23 48 b8 01 00 00 00 00 fc ff df 4d 01 d1 49 01 c0
[ 918.054322] RSP: 0018:ffff8801e3a1f258 EFLAGS: 00010202
[ 918.055400] RAX: ffffed0048000d82 RBX: ffff880240006c11 RCX: ffffffffb8867d14
[ 918.056832] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880240006c10
[ 918.058253] RBP: ffff8801e3a1f268 R08: 1ffff10048000d82 R09: ffffed0048000d82
[ 918.059717] R10: 0000000000000001 R11: ffffed0048000d82 R12: ffffed0048000d83
[ 918.061159] R13: ffff8801e3a1f390 R14: 0000000000000000 R15: ffff880240006c08
[ 918.062614] FS: 00007fac9732c700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 918.064246] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 918.065412] CR2: ffffed0048000d82 CR3: 00000001df77a000 CR4: 00000000000006f0
[ 918.066882] Call Trace:
[ 918.067410] __asan_loadN+0xf/0x20
[ 918.068149] f2fs_find_target_dentry+0xf4/0x270
[ 918.069083] ? __get_node_page+0x331/0x5b0
[ 918.069925] f2fs_find_in_inline_dir+0x24b/0x310
[ 918.070881] ? f2fs_recover_inline_data+0x4c0/0x4c0
[ 918.071905] ? unwind_next_frame.part.5+0x34f/0x490
[ 918.072901] ? unwind_dump+0x290/0x290
[ 918.073695] ? is_bpf_text_address+0xe/0x20
[ 918.074566] __f2fs_find_entry+0x599/0x670
[ 918.075408] ? kasan_unpoison_shadow+0x36/0x50
[ 918.076315] ? kasan_kmalloc+0xad/0xe0
[ 918.077100] ? memcg_kmem_put_cache+0x55/0xa0
[ 918.077998] ? f2fs_find_target_dentry+0x270/0x270
[ 918.079006] ? d_set_d_op+0x30/0x100
[ 918.079749] ? __d_lookup_rcu+0x69/0x2e0
[ 918.080556] ? __d_alloc+0x275/0x450
[ 918.081297] ? kasan_check_write+0x14/0x20
[ 918.082135] ? memset+0x31/0x40
[ 918.082820] ? fscrypt_setup_filename+0x1ec/0x4c0
[ 918.083782] ? d_alloc_parallel+0x5bb/0x8c0
[ 918.084640] f2fs_find_entry+0xe9/0x110
[ 918.085432] ? __f2fs_find_entry+0x670/0x670
[ 918.086308] ? kasan_check_write+0x14/0x20
[ 918.087163] f2fs_lookup+0x297/0x590
[ 918.087902] ? f2fs_link+0x2b0/0x2b0
[ 918.088646] ? legitimize_path.isra.29+0x61/0xa0
[ 918.089589] __lookup_slow+0x12e/0x240
[ 918.090371] ? may_delete+0x2b0/0x2b0
[ 918.091123] ? __nd_alloc_stack+0xa0/0xa0
[ 918.091944] lookup_slow+0x44/0x60
[ 918.092642] walk_component+0x3ee/0xa40
[ 918.093428] ? is_bpf_text_address+0xe/0x20
[ 918.094283] ? pick_link+0x3e0/0x3e0
[ 918.095047] ? in_group_p+0xa5/0xe0
[ 918.095771] ? generic_permission+0x53/0x1e0
[ 918.096666] ? security_inode_permission+0x1d/0x70
[ 918.097646] ? inode_permission+0x7a/0x1f0
[ 918.098497] link_path_walk+0x2a2/0x7b0
[ 918.099298] ? apparmor_capget+0x3d0/0x3d0
[ 918.100140] ? walk_component+0xa40/0xa40
[ 918.100958] ? path_init+0x2e6/0x580
[ 918.101695] path_openat+0x1bb/0x2160
[ 918.102471] ? __save_stack_trace+0x92/0x100
[ 918.103352] ? save_stack+0xb5/0xd0
[ 918.104070] ? vfs_unlink+0x250/0x250
[ 918.104822] ? save_stack+0x46/0xd0
[ 918.105538] ? kasan_slab_alloc+0x11/0x20
[ 918.106370] ? kmem_cache_alloc+0xd1/0x1e0
[ 918.107213] ? getname_flags+0x76/0x2c0
[ 918.107997] ? getname+0x12/0x20
[ 918.108677] ? do_sys_open+0x14b/0x2c0
[ 918.109450] ? __x64_sys_open+0x4c/0x60
[ 918.110255] ? do_syscall_64+0x78/0x170
[ 918.111083] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 918.112148] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 918.113204] ? f2fs_empty_inline_dir+0x1e0/0x1e0
[ 918.114150] ? timespec64_trunc+0x5c/0x90
[ 918.114993] ? wb_io_lists_depopulated+0x1a/0xc0
[ 918.115937] ? inode_io_list_move_locked+0x102/0x110
[ 918.116949] do_filp_open+0x12b/0x1d0
[ 918.117709] ? may_open_dev+0x50/0x50
[ 918.118475] ? kasan_kmalloc+0xad/0xe0
[ 918.119246] do_sys_open+0x17c/0x2c0
[ 918.119983] ? do_sys_open+0x17c/0x2c0
[ 918.120751] ? filp_open+0x60/0x60
[ 918.121463] ? task_work_run+0x4d/0xf0
[ 918.122237] __x64_sys_open+0x4c/0x60
[ 918.123001] do_syscall_64+0x78/0x170
[ 918.123759] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 918.124802] RIP: 0033:0x7fac96e3e040
[ 918.125537] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 09 27 2d 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e e0 01 00 48 89 04 24
[ 918.129341] RSP: 002b:00007fff1b37f848 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[ 918.130870] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fac96e3e040
[ 918.132295] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000122d080
[ 918.133748] RBP: 00007fff1b37f9b0 R08: 00007fac9710bbd8 R09: 0000000000000001
[ 918.135209] R10: 000000000000069d R11: 0000000000000246 R12: 0000000000400c20
[ 918.136650] R13: 00007fff1b37fab0 R14: 0000000000000000 R15: 0000000000000000
[ 918.138093] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 918.147924] CR2: ffffed0048000d82
[ 918.148619] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 918.149563] RIP: 0010:check_memory_region+0x5e/0x190
[ 918.150576] Code: f8 49 c1 e8 03 49 89 db 49 c1 eb 03 4d 01 cb 4d 01 c1 4d 8d 63 01 4c 89 c8 4d 89 e2 4d 29 ca 49 83 fa 10 7f 3d 4d 85 d2 74 32 <41> 80 39 00 75 23 48 b8 01 00 00 00 00 fc ff df 4d 01 d1 49 01 c0
[ 918.154360] RSP: 0018:ffff8801e3a1f258 EFLAGS: 00010202
[ 918.155411] RAX: ffffed0048000d82 RBX: ffff880240006c11 RCX: ffffffffb8867d14
[ 918.156833] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880240006c10
[ 918.158257] RBP: ffff8801e3a1f268 R08: 1ffff10048000d82 R09: ffffed0048000d82
[ 918.159722] R10: 0000000000000001 R11: ffffed0048000d82 R12: ffffed0048000d83
[ 918.161149] R13: ffff8801e3a1f390 R14: 0000000000000000 R15: ffff880240006c08
[ 918.162587] FS: 00007fac9732c700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 918.164203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 918.165356] CR2: ffffed0048000d82 CR3: 00000001df77a000 CR4: 00000000000006f0
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-25 23:29:49 +08:00
f2fs: fix to do sanity check correctly on i_inline_xattr_size
syzbot reported an out-of-range access issue as below:
UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3292:19
index 18446744073709550491 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
CPU: 0 UID: 0 PID: 5338 Comm: syz.0.0 Not tainted 6.12.0-syzkaller-10689-g7af08b57bcb9 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
read_inline_xattr+0x273/0x280
lookup_all_xattrs fs/f2fs/xattr.c:341 [inline]
f2fs_getxattr+0x57b/0x13b0 fs/f2fs/xattr.c:533
vfs_getxattr_alloc+0x472/0x5c0 fs/xattr.c:393
ima_read_xattr+0x38/0x60 security/integrity/ima/ima_appraise.c:229
process_measurement+0x117a/0x1fb0 security/integrity/ima/ima_main.c:353
ima_file_check+0xd9/0x120 security/integrity/ima/ima_main.c:572
security_file_post_open+0xb9/0x280 security/security.c:3121
do_open fs/namei.c:3830 [inline]
path_openat+0x2ccd/0x3590 fs/namei.c:3987
do_file_open_root+0x3a7/0x720 fs/namei.c:4039
file_open_root+0x247/0x2a0 fs/open.c:1382
do_handle_open+0x85b/0x9d0 fs/fhandle.c:414
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
index: 18446744073709550491 (decimal, unsigned long long)
= 0xfffffffffffffb9b (hexadecimal) = -1125 (decimal, long long)
UBSAN detects that inline_xattr_addr() tries to access .i_addr[-1125].
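The huge index is simply a negative number printed as unsigned. The following small sketch assumes that the inline xattr area begins at i_addr[923 - i_inline_xattr_size], where 923 is the array length in the UBSAN message. That assumption reproduces the reported value exactly, because 923 - 2048 = -1125. The helper names are hypothetical.

```c
/* Assumption: the inline xattr area starts at
 * i_addr[ARRAY_LEN - i_inline_xattr_size]; ARRAY_LEN is the 923 from the
 * UBSAN report. All names here are hypothetical. */
#define SKETCH_ADDRS_PER_INODE 923LL

/* Signed index produced by a given (possibly fuzzed) inline xattr size. */
static long long sketch_inline_xattr_index(long long inline_xattr_size)
{
	return SKETCH_ADDRS_PER_INODE - inline_xattr_size;
}

/* The same index converted to unsigned long long, as UBSAN prints it;
 * the conversion wraps modulo 2^64. */
static unsigned long long sketch_as_unsigned(long long idx)
{
	return (unsigned long long)idx;
}
```

With these helpers, sketch_inline_xattr_index(2048) is -1125, and sketch_as_unsigned(-1125) is 18446744073709550491, i.e. 0xfffffffffffffb9b.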
With the testcase below, this bug can be reproduced easily:
- mkfs.f2fs -f -O extra_attr,flexible_inline_xattr /dev/sdb
- mount -o inline_xattr_size=512 /dev/sdb /mnt/f2fs
- touch /mnt/f2fs/file
- umount /mnt/f2fs
- inject.f2fs --node --mb i_inline --nid 4 --val 0x1 /dev/sdb
- inject.f2fs --node --mb i_inline_xattr_size --nid 4 --val 2048 /dev/sdb
- mount /dev/sdb /mnt/f2fs
- getfattr /mnt/f2fs/file
The root cause: if the filesystem and inode metadata are fuzzed as below:
- extra_attr feature is enabled
- flexible_inline_xattr feature is enabled
- ri.i_inline_xattr_size = 2048
- F2FS_EXTRA_ATTR bit in ri.i_inline is not set
then sanity_check_inode() skips the sanity check on fi->i_inline_xattr_size,
and the invalid inline_xattr_size is used later; fix it.
Meanwhile, also check the lower bound of .i_inline_xattr_size against
MIN_INLINE_XATTR_SIZE, as parse_options() does.
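The corrected condition can be sketched in userspace C. The key point is that the range check depends only on the flexible_inline_xattr feature and the inode's inline-xattr flag, and not on the F2FS_EXTRA_ATTR bit, which was clear in the fuzzed image. The header struct mirrors the 24-byte size given in Qasim's analysis. The MIN macro follows the MIN_INLINE_XATTR_SIZE definition quoted in the clang warning further down. SKETCH_MAX_INLINE_XATTR_SIZE is a hypothetical placeholder, and all names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 24-byte header: magic, refcount, reserved[4]. */
struct sketch_xattr_header {
	uint32_t magic;
	uint32_t refcount;
	uint32_t reserved[4];
};

/* Lower bound in __le32 units, shaped like MIN_INLINE_XATTR_SIZE. */
#define SKETCH_MIN_INLINE_XATTR_SIZE \
	(sizeof(struct sketch_xattr_header) / sizeof(uint32_t))
/* Hypothetical upper bound: keeps the testcase's 512 valid and stays
 * below the 923-entry i_addr[] array; the real value is layout-derived. */
#define SKETCH_MAX_INLINE_XATTR_SIZE 900u

/* Returns false if the inode's inline xattr size must be rejected. */
static bool sketch_inline_xattr_size_ok(bool sb_flexible_inline_xattr,
					bool inode_has_inline_xattr,
					unsigned inline_xattr_size)
{
	/* Deliberately not gated on the inode's extra_attr bit. */
	if (sb_flexible_inline_xattr && inode_has_inline_xattr &&
	    (inline_xattr_size < SKETCH_MIN_INLINE_XATTR_SIZE ||
	     inline_xattr_size > SKETCH_MAX_INLINE_XATTR_SIZE))
		return false;
	return true;
}
```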
syzbot reported a related issue, which Qasim Ijaz analyzed and fixed in a
very similar way [1]. As discussed, we all agree that it is better to do the
sanity check in sanity_check_inode(), so this patch fixes both related bugs.
The commit message from Qasim's patch is included below; many thanks for
his contribution.
"In f2fs_getxattr(), the function lookup_all_xattrs() allocates a 12-byte
(base_size) buffer for an inline extended attribute. However, when
__find_inline_xattr() calls __find_xattr(), it uses the macro
"list_for_each_xattr(entry, addr)", which starts by calling
XATTR_FIRST_ENTRY(addr). This skips a 24-byte struct f2fs_xattr_header
at the beginning of the buffer, causing an immediate out-of-bounds read
in a 12-byte allocation. The subsequent !IS_XATTR_LAST_ENTRY(entry)
check then dereferences memory outside the allocated region, triggering
the slab-out-of-bounds read.
This patch prevents the out-of-bounds read by adding a check to bail
out early if inline_size is too small and does not account for the
header plus the 4-byte value that IS_XATTR_LAST_ENTRY reads."
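The size arithmetic in the quoted analysis can be sketched assuming a 24-byte header (six 32-bit words, matching the size stated above). Walking the inline area needs the header plus one 4-byte word, i.e. 28 bytes, so a 12-byte buffer has to be rejected before list_for_each_xattr() runs. The names are hypothetical, and the sketch does not reproduce Qasim's actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 24-byte stand-in for struct f2fs_xattr_header. */
struct sketch_xattr_header {
	uint32_t magic, refcount, reserved[4];
};

/* Smallest buffer that can be walked safely: the header skipped by
 * XATTR_FIRST_ENTRY(), plus the 4-byte word IS_XATTR_LAST_ENTRY() reads. */
#define SKETCH_MIN_WALKABLE \
	(sizeof(struct sketch_xattr_header) + sizeof(uint32_t))

/* Bail out early if the inline buffer cannot even hold that much. */
static bool sketch_inline_buf_walkable(size_t inline_size)
{
	return inline_size >= SKETCH_MIN_WALKABLE;
}
```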
[1]: https://lore.kernel.org/linux-f2fs-devel/Z32y1rfBY9Qb5ZjM@qasdev.system/
Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size")
Reported-by: syzbot+69f5379a1717a0b982a1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/674f4e7d.050a0220.17bd51.004f.GAE@google.com
Reported-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=f5e74075e096e757bdbf
Tested-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com>
Tested-by: Qasim Ijaz <qasdev00@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-14 20:34:10 +08:00
	if (f2fs_sb_has_flexible_inline_xattr(sbi) &&
		f2fs_has_inline_xattr(inode) &&
		(fi->i_inline_xattr_size < MIN_INLINE_XATTR_SIZE ||
		fi->i_inline_xattr_size > MAX_INLINE_XATTR_SIZE)) {
f2fs: Fix format specifier in sanity_check_inode()
When building for 32-bit platforms, for which 'size_t' is 'unsigned int',
there is a warning due to an incorrect format specifier:
fs/f2fs/inode.c:320:6: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat]
318 | f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_inline_xattr_size: %d, min: %lu, max: %lu",
| ~~~
| %u
319 | __func__, inode->i_ino, fi->i_inline_xattr_size,
320 | MIN_INLINE_XATTR_SIZE, MAX_INLINE_XATTR_SIZE);
| ^~~~~~~~~~~~~~~~~~~~~
fs/f2fs/f2fs.h:1855:46: note: expanded from macro 'f2fs_warn'
1855 | f2fs_printk(sbi, false, KERN_WARNING fmt, ##__VA_ARGS__)
| ~~~ ^~~~~~~~~~~
fs/f2fs/xattr.h:86:31: note: expanded from macro 'MIN_INLINE_XATTR_SIZE'
86 | #define MIN_INLINE_XATTR_SIZE (sizeof(struct f2fs_xattr_header) / sizeof(__le32))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use the format specifier for 'size_t', '%zu', to resolve the warning.
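The following small sketch shows why '%zu' is the portable choice. Any sizeof() quotient, like the one in MIN_INLINE_XATTR_SIZE, has type size_t, and the underlying type of size_t differs between 32-bit and 64-bit ABIs. The struct is a hypothetical 24-byte stand-in for struct f2fs_xattr_header, so the macro evaluates to 6 here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 24-byte stand-in for struct f2fs_xattr_header. */
struct sketch_xattr_header {
	uint32_t magic, refcount, reserved[4];
};

/* Same shape as the macro in the warning: a sizeof() quotient, so its
 * type is size_t (unsigned int on 32-bit, unsigned long on 64-bit). */
#define SKETCH_MIN_INLINE_XATTR_SIZE \
	(sizeof(struct sketch_xattr_header) / sizeof(uint32_t))

/* Formats the lower bound; %zu matches size_t on every ABI. */
static int sketch_format_min(char *buf, size_t len)
{
	return snprintf(buf, len, "min: %zu", SKETCH_MIN_INLINE_XATTR_SIZE);
}
```

sketch_format_min() writes "min: 6" into the buffer and returns 6 on both ABIs, without a -Wformat warning.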
Fixes: 5c1768b67250 ("f2fs: fix to do sanity check correctly on i_inline_xattr_size")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-20 05:59:44 -07:00
		f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_inline_xattr_size: %d, min: %zu, max: %lu",
f2fs: fix to do sanity check correctly on i_inline_xattr_size
2025-01-14 20:34:10 +08:00
|
|
|
__func__, inode->i_ino, fi->i_inline_xattr_size,
|
|
|
|
|
MIN_INLINE_XATTR_SIZE, MAX_INLINE_XATTR_SIZE);
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2023-05-31 09:40:55 +08:00
	if (!f2fs_sb_has_extra_attr(sbi)) {
		if (f2fs_sb_has_project_quota(sbi)) {
			f2fs_warn(sbi, "%s: corrupted inode ino=%lx, wrong feature flag: %u, run fsck to fix.",
				  __func__, inode->i_ino, F2FS_FEATURE_PRJQUOTA);
			return false;
		}
		if (f2fs_sb_has_inode_chksum(sbi)) {
			f2fs_warn(sbi, "%s: corrupted inode ino=%lx, wrong feature flag: %u, run fsck to fix.",
				  __func__, inode->i_ino, F2FS_FEATURE_INODE_CHKSUM);
			return false;
		}
		if (f2fs_sb_has_flexible_inline_xattr(sbi)) {
			f2fs_warn(sbi, "%s: corrupted inode ino=%lx, wrong feature flag: %u, run fsck to fix.",
				  __func__, inode->i_ino, F2FS_FEATURE_FLEXIBLE_INLINE_XATTR);
			return false;
		}
		if (f2fs_sb_has_inode_crtime(sbi)) {
			f2fs_warn(sbi, "%s: corrupted inode ino=%lx, wrong feature flag: %u, run fsck to fix.",
				  __func__, inode->i_ino, F2FS_FEATURE_INODE_CRTIME);
			return false;
		}
		if (f2fs_sb_has_compression(sbi)) {
			f2fs_warn(sbi, "%s: corrupted inode ino=%lx, wrong feature flag: %u, run fsck to fix.",
				  __func__, inode->i_ino, F2FS_FEATURE_COMPRESSION);
			return false;
		}
2019-03-04 17:19:04 +08:00
	}
2025-07-08 18:03:04 +01:00
	if (f2fs_sanity_check_inline_data(inode, node_folio)) {
2019-06-18 17:48:42 +08:00
		f2fs_warn(sbi, "%s: inode (ino=%lx, mode=%u) should not have inline_data, run fsck to fix",
			  __func__, inode->i_ino, inode->i_mode);
f2fs: fix to do sanity check with inline flags
https://bugzilla.kernel.org/show_bug.cgi?id=200221
- Overview
BUG() in clear_inode() when mounting and un-mounting a corrupted f2fs image
- Reproduce
- Kernel message
[ 538.601448] F2FS-fs (loop0): Invalid segment/section count (31, 24 x 1376257)
[ 538.601458] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
[ 538.724091] F2FS-fs (loop0): Try to recover 2th superblock, ret: 0
[ 538.724102] F2FS-fs (loop0): Mounted with checkpoint version = 2
[ 540.970834] ------------[ cut here ]------------
[ 540.970838] kernel BUG at fs/inode.c:512!
[ 540.971750] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 540.972755] CPU: 1 PID: 1305 Comm: umount Not tainted 4.18.0-rc1+ #4
[ 540.974034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 540.982913] RIP: 0010:clear_inode+0xc0/0xd0
[ 540.983774] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[ 540.987570] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[ 540.988636] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[ 540.990063] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[ 540.991499] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[ 540.992923] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[ 540.994360] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[ 540.995786] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 540.997403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 540.998571] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[ 541.000015] Call Trace:
[ 541.000554] f2fs_evict_inode+0x253/0x630
[ 541.001381] evict+0x16f/0x290
[ 541.002015] iput+0x280/0x300
[ 541.002654] dentry_unlink_inode+0x165/0x1e0
[ 541.003528] __dentry_kill+0x16a/0x260
[ 541.004300] dentry_kill+0x70/0x250
[ 541.005018] dput+0x154/0x1d0
[ 541.005635] do_one_tree+0x34/0x40
[ 541.006354] shrink_dcache_for_umount+0x3f/0xa0
[ 541.007285] generic_shutdown_super+0x43/0x1c0
[ 541.008192] kill_block_super+0x52/0x80
[ 541.008978] kill_f2fs_super+0x62/0x70
[ 541.009750] deactivate_locked_super+0x6f/0xa0
[ 541.010664] deactivate_super+0x5e/0x80
[ 541.011450] cleanup_mnt+0x61/0xa0
[ 541.012151] __cleanup_mnt+0x12/0x20
[ 541.012893] task_work_run+0xc8/0xf0
[ 541.013635] exit_to_usermode_loop+0x125/0x130
[ 541.014555] do_syscall_64+0x138/0x170
[ 541.015340] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 541.016375] RIP: 0033:0x7f46624bf487
[ 541.017104] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
[ 541.020923] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 541.022452] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[ 541.023885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[ 541.025318] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[ 541.026755] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[ 541.028186] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30
[ 541.029626] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 541.039445] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 541.040392] RIP: 0010:clear_inode+0xc0/0xd0
[ 541.041240] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[ 541.045042] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[ 541.046099] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[ 541.047537] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[ 541.048965] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[ 541.050402] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[ 541.051832] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[ 541.053263] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 541.054891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 541.056039] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[ 541.058506] ==================================================================
[ 541.059991] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
[ 541.061513] Read of size 8 at addr ffff8801e34a7970 by task umount/1305
[ 541.063302] CPU: 1 PID: 1305 Comm: umount Tainted: G D 4.18.0-rc1+ #4
[ 541.064838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 541.066778] Call Trace:
[ 541.067294] dump_stack+0x7b/0xb5
[ 541.067986] print_address_description+0x70/0x290
[ 541.068941] kasan_report+0x291/0x390
[ 541.069692] ? update_stack_state+0x38c/0x3e0
[ 541.070598] __asan_load8+0x54/0x90
[ 541.071315] update_stack_state+0x38c/0x3e0
[ 541.072172] ? __read_once_size_nocheck.constprop.7+0x20/0x20
[ 541.073340] ? vprintk_func+0x27/0x60
[ 541.074096] ? printk+0xa3/0xd3
[ 541.074762] ? __save_stack_trace+0x5e/0x100
[ 541.075634] unwind_next_frame.part.5+0x18e/0x490
[ 541.076594] ? unwind_dump+0x290/0x290
[ 541.077368] ? __show_regs+0x2c4/0x330
[ 541.078142] __unwind_start+0x106/0x190
[ 541.085422] __save_stack_trace+0x5e/0x100
[ 541.086268] ? __save_stack_trace+0x5e/0x100
[ 541.087161] ? unlink_anon_vmas+0xba/0x2c0
[ 541.087997] save_stack_trace+0x1f/0x30
[ 541.088782] save_stack+0x46/0xd0
[ 541.089475] ? __alloc_pages_slowpath+0x1420/0x1420
[ 541.090477] ? flush_tlb_mm_range+0x15e/0x220
[ 541.091364] ? __dec_node_state+0x24/0xb0
[ 541.092180] ? lock_page_memcg+0x85/0xf0
[ 541.092979] ? unlock_page_memcg+0x16/0x80
[ 541.093812] ? page_remove_rmap+0x198/0x520
[ 541.094674] ? mark_page_accessed+0x133/0x200
[ 541.095559] ? _cond_resched+0x1a/0x50
[ 541.096326] ? unmap_page_range+0xcd4/0xe50
[ 541.097179] ? rb_next+0x58/0x80
[ 541.097845] ? rb_next+0x58/0x80
[ 541.098518] __kasan_slab_free+0x13c/0x1a0
[ 541.099352] ? unlink_anon_vmas+0xba/0x2c0
[ 541.100184] kasan_slab_free+0xe/0x10
[ 541.100934] kmem_cache_free+0x89/0x1e0
[ 541.101724] unlink_anon_vmas+0xba/0x2c0
[ 541.102534] free_pgtables+0x101/0x1b0
[ 541.103299] exit_mmap+0x146/0x2a0
[ 541.103996] ? __ia32_sys_munmap+0x50/0x50
[ 541.104829] ? kasan_check_read+0x11/0x20
[ 541.105649] ? mm_update_next_owner+0x322/0x380
[ 541.106578] mmput+0x8b/0x1d0
[ 541.107191] do_exit+0x43a/0x1390
[ 541.107876] ? mm_update_next_owner+0x380/0x380
[ 541.108791] ? deactivate_super+0x5e/0x80
[ 541.109610] ? cleanup_mnt+0x61/0xa0
[ 541.110351] ? __cleanup_mnt+0x12/0x20
[ 541.111115] ? task_work_run+0xc8/0xf0
[ 541.111879] ? exit_to_usermode_loop+0x125/0x130
[ 541.112817] rewind_stack_do_exit+0x17/0x20
[ 541.113666] RIP: 0033:0x7f46624bf487
[ 541.114404] Code: Bad RIP value.
[ 541.115094] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 541.116605] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[ 541.118034] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[ 541.119472] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[ 541.120890] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[ 541.122321] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30
[ 541.124061] The buggy address belongs to the page:
[ 541.125042] page:ffffea00078d29c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 541.126651] flags: 0x2ffff0000000000()
[ 541.127418] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[ 541.128963] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 541.130516] page dumped because: kasan: bad access detected
[ 541.131954] Memory state around the buggy address:
[ 541.132924] ffff8801e34a7800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[ 541.134378] ffff8801e34a7880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 541.135814] >ffff8801e34a7900: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[ 541.137253] ^
[ 541.138637] ffff8801e34a7980: f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 541.140075] ffff8801e34a7a00: 00 00 00 00 00 00 00 00 f3 00 00 00 00 00 00 00
[ 541.141509] ==================================================================
- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/inode.c#L512
BUG_ON(inode->i_data.nrpages);
The root cause is that the root directory inode is corrupted: it has both
the inline_data and inline_dentry flags, and its nlink is zero. So in
->evict(), after dropping all page cache, it grabs page #0 for inline
data truncation, resulting in a panic in the later clear_inode(), which
checks the inode->i_data.nrpages value.
This patch adds an inline flags check in sanity_check_inode() and, in
addition, sanity-checks the root inode's nlink.
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
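The rules this commit describes can be illustrated with a small user-space model. The sketch below is hypothetical. The `sketch_*` names, the file-type enum, and the flag values are stand-ins, not f2fs definitions. It follows the `!S_ISDIR` check visible in this listing, where inline dentries are valid only for directories. It also assumes that inline data is valid only for regular files and symlinks, and that a root inode with zero links is corrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for S_ISREG/S_ISDIR/S_ISLNK and the inline flag
 * bits; the real definitions live in the kernel and fs/f2fs/f2fs.h. */
enum sketch_type { SKETCH_REG, SKETCH_DIR, SKETCH_LNK, SKETCH_OTHER };
#define SKETCH_INLINE_DATA   0x1u
#define SKETCH_INLINE_DENTRY 0x2u

struct sketch_inode {
	enum sketch_type type;
	uint8_t inline_flags;	/* decoded from the on-disk inline byte */
	uint32_t nlink;
	bool is_root;
};

/* Returns false when the inode should be rejected as corrupted. */
static bool sketch_check_inline_flags(const struct sketch_inode *in)
{
	/* Assumption: inline data is only legal for regular files and symlinks. */
	if ((in->inline_flags & SKETCH_INLINE_DATA) &&
	    in->type != SKETCH_REG && in->type != SKETCH_LNK)
		return false;

	/* Inline dentries are only legal for directories. */
	if ((in->inline_flags & SKETCH_INLINE_DENTRY) && in->type != SKETCH_DIR)
		return false;

	/* A root directory with no links is treated as corrupted. */
	if (in->is_root && in->nlink == 0)
		return false;

	return true;
}
```

Under these rules, the corrupted root from the report is rejected at mount time. It is a directory carrying inline_data, so it never reaches the ->evict() path that tripped BUG_ON().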
2018-06-29 00:19:25 +08:00
		return false;
	}

	if (f2fs_has_inline_dentry(inode) && !S_ISDIR(inode->i_mode)) {
2019-06-18 17:48:42 +08:00
		f2fs_warn(sbi, "%s: inode (ino=%lx, mode=%u) should not have inline_dentry, run fsck to fix",
			  __func__, inode->i_ino, inode->i_mode);
f2fs: fix to do sanity check with inline flags
2018-06-29 00:19:25 +08:00
		return false;
	}
2020-10-08 12:15:22 -07:00
	if ((fi->i_flags & F2FS_CASEFOLD_FL) && !f2fs_sb_has_casefold(sbi)) {
		f2fs_warn(sbi, "%s: inode (ino=%lx) has casefold flag, but casefold feature is off",
			  __func__, inode->i_ino);
		return false;
	}
2024-04-25 16:58:38 +08:00
	if (fi->i_xattr_nid && f2fs_check_nid_range(sbi, fi->i_xattr_nid)) {
		f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_xattr_nid: %u, run fsck to fix.",
			  __func__, inode->i_ino, fi->i_xattr_nid);
		return false;
	}
2024-10-17 10:31:53 -07:00
	if (IS_DEVICE_ALIASING(inode)) {
		if (!f2fs_sb_has_device_alias(sbi)) {
			f2fs_warn(sbi, "%s: inode (ino=%lx) has device alias flag, but the feature is off",
				  __func__, inode->i_ino);
			return false;
		}
		if (!f2fs_is_pinned_file(inode)) {
			f2fs_warn(sbi, "%s: inode (ino=%lx) has device alias flag, but is not pinned",
				  __func__, inode->i_ino);
			return false;
		}
	}
2018-04-24 11:37:18 -06:00
	return true;
}
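Every branch of the `!f2fs_sb_has_extra_attr(sbi)` block in sanity_check_inode() applies the same rule. A feature whose per-inode state lives in the extra attribute area cannot be enabled while extra_attr itself is off. The sketch below is a table-driven version of that rule. The `FEAT_*` bits and `sketch_feature_without_extra_attr()` are hypothetical stand-ins for the `F2FS_FEATURE_*` constants and the `f2fs_sb_has_*()` helpers. The table keeps the order used in the listing, so the same feature is reported first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for F2FS_FEATURE_* values. */
enum {
	FEAT_EXTRA_ATTR        = 1u << 0,
	FEAT_PRJQUOTA          = 1u << 1,
	FEAT_INODE_CHKSUM      = 1u << 2,
	FEAT_FLEX_INLINE_XATTR = 1u << 3,
	FEAT_INODE_CRTIME      = 1u << 4,
	FEAT_COMPRESSION       = 1u << 5,
};

/* Features checked inside the !extra_attr block, in the same order. */
static const uint32_t needs_extra_attr[] = {
	FEAT_PRJQUOTA, FEAT_INODE_CHKSUM, FEAT_FLEX_INLINE_XATTR,
	FEAT_INODE_CRTIME, FEAT_COMPRESSION,
};

/* Return the first feature enabled without extra_attr, or 0 if consistent. */
static uint32_t sketch_feature_without_extra_attr(uint32_t sb_features)
{
	if (sb_features & FEAT_EXTRA_ATTR)
		return 0;
	for (size_t i = 0; i < sizeof(needs_extra_attr) / sizeof(needs_extra_attr[0]); i++)
		if (sb_features & needs_extra_attr[i])
			return needs_extra_attr[i];
	return 0;
}
```

The function returns the offending bit rather than a bool, so the caller can print it, much as the f2fs_warn() calls in the listing print the F2FS_FEATURE_* value.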
2022-08-31 17:48:15 +08:00
static void init_idisk_time(struct inode *inode)
{
	struct f2fs_inode_info *fi = F2FS_I(inode);
2023-10-04 14:52:21 -04:00
	fi->i_disk_time[0] = inode_get_atime(inode);
2023-07-05 15:01:08 -04:00
	fi->i_disk_time[1] = inode_get_ctime(inode);
2023-10-04 14:52:21 -04:00
	fi->i_disk_time[2] = inode_get_mtime(inode);
2022-08-31 17:48:15 +08:00
}
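init_idisk_time() copies the in-memory atime, ctime, and mtime into `fi->i_disk_time[0..2]`. This listing does not show where that snapshot is read. The sketch below assumes it is later compared against the live timestamps to decide whether they still need to be written back. The struct and function names are hypothetical, and `struct timespec` stands in for the kernel's `struct timespec64`.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

struct sketch_inode {
	struct timespec atime, ctime, mtime;	/* live, in-memory times */
	struct timespec disk_time[3];		/* snapshot: atime, ctime, mtime */
};

/* Same slot order as init_idisk_time(): [0]=atime, [1]=ctime, [2]=mtime. */
static void sketch_init_disk_time(struct sketch_inode *in)
{
	in->disk_time[0] = in->atime;
	in->disk_time[1] = in->ctime;
	in->disk_time[2] = in->mtime;
}

static bool sketch_ts_equal(struct timespec a, struct timespec b)
{
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/* True if no timestamp changed since the snapshot was taken. */
static bool sketch_time_consistent(const struct sketch_inode *in)
{
	return sketch_ts_equal(in->atime, in->disk_time[0]) &&
	       sketch_ts_equal(in->ctime, in->disk_time[1]) &&
	       sketch_ts_equal(in->mtime, in->disk_time[2]);
}
```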
2012-11-02 17:10:40 +09:00
static int do_read_inode(struct inode *inode)
{
2014-09-02 15:31:18 -07:00
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
2012-11-02 17:10:40 +09:00
	struct f2fs_inode_info *fi = F2FS_I(inode);
2025-03-31 21:12:07 +01:00
	struct folio *node_folio;
2012-11-02 17:10:40 +09:00
	struct f2fs_inode *ri;
2017-07-26 00:01:41 +08:00
	projid_t i_projid;
2012-11-02 17:10:40 +09:00
	/* Check if ino is within scope */
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds the "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
	if (f2fs_check_nid_range(sbi, inode->i_ino))
2013-03-17 17:27:20 +09:00
		return -EINVAL;
2012-11-02 17:10:40 +09:00
2025-03-31 21:12:07 +01:00
	node_folio = f2fs_get_inode_folio(sbi, inode->i_ino);
	if (IS_ERR(node_folio))
		return PTR_ERR(node_folio);
2012-11-02 17:10:40 +09:00
2025-07-08 18:03:06 +01:00
	ri = F2FS_INODE(node_folio);
2012-11-02 17:10:40 +09:00
	inode->i_mode = le16_to_cpu(ri->i_mode);
	i_uid_write(inode, le32_to_cpu(ri->i_uid));
	i_gid_write(inode, le32_to_cpu(ri->i_gid));
	set_nlink(inode, le32_to_cpu(ri->i_links));
	inode->i_size = le64_to_cpu(ri->i_size);
f2fs: don't count inode block in in-memory inode.i_blocks
Previously, we counted all blocks consumed by an inode, including the
inode block, xattr block, index blocks and data blocks, in i_blocks.
Other generic filesystems don't count the inode block in i_blocks, so
userspace applications or the quota system may see an incorrect block
count from the inode's i_blocks value.
This patch changes inode.i_blocks to count all blocks except the inode
block; on-disk i_blocks keeps counting the inode block for backward
compatibility.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
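The difference between the on-disk and in-memory counters fits in two helpers. This sketch assumes 4 KiB blocks and 512-byte sectors, i.e. 8 sectors per block, so converting blocks to sectors is a left shift by 3. The helper names are hypothetical. Only the read direction, dropping the inode block, mirrors the `do_read_inode()` line that follows. The reverse helper is an assumption about how the count is restored on write, not code from this listing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB blocks and 512-byte sectors, i.e. 8 sectors per block. */
#define SKETCH_LOG_SECTORS_PER_BLOCK 3

/* On-disk i_blocks includes the inode block; in-memory i_blocks (sectors) does not. */
static uint64_t sketch_disk_to_mem_sectors(uint64_t disk_blocks)
{
	assert(disk_blocks >= 1);	/* the inode block is always counted on disk */
	return (disk_blocks - 1) << SKETCH_LOG_SECTORS_PER_BLOCK;
}

/* Assumed inverse: convert sectors back to blocks and re-add the inode block. */
static uint64_t sketch_mem_sectors_to_disk(uint64_t mem_sectors)
{
	return (mem_sectors >> SKETCH_LOG_SECTORS_PER_BLOCK) + 1;
}
```

An inode that owns only its inode block reports 0 sectors to userspace, while the on-disk counter still reads 1.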
2017-07-06 01:11:31 +08:00
	inode->i_blocks = SECTOR_FROM_BLOCK(le64_to_cpu(ri->i_blocks) - 1);
2012-11-02 17:10:40 +09:00
2023-10-04 14:52:21 -04:00
	inode_set_atime(inode, le64_to_cpu(ri->i_atime),
			le32_to_cpu(ri->i_atime_nsec));
2023-07-05 15:01:08 -04:00
	inode_set_ctime(inode, le64_to_cpu(ri->i_ctime),
			le32_to_cpu(ri->i_ctime_nsec));
2023-10-04 14:52:21 -04:00
	inode_set_mtime(inode, le64_to_cpu(ri->i_mtime),
			le32_to_cpu(ri->i_mtime_nsec));
2012-11-02 17:10:40 +09:00
	inode->i_generation = le32_to_cpu(ri->i_generation);
2018-05-07 20:28:52 +08:00
	if (S_ISDIR(inode->i_mode))
		fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
	else if (S_ISREG(inode->i_mode))
2024-05-06 18:45:37 +08:00
		fi->i_gc_failures = le16_to_cpu(ri->i_gc_failures);
2012-11-02 17:10:40 +09:00
	fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
	fi->i_flags = le32_to_cpu(ri->i_flags);
2019-06-13 16:29:53 +09:00
	if (S_ISREG(inode->i_mode))
		fi->i_flags &= ~F2FS_PROJINHERIT_FL;
2020-03-23 11:18:07 +08:00
	bitmap_zero(fi->flags, FI_MAX);
2012-11-02 17:10:40 +09:00
	fi->i_advise = ri->i_advise;
f2fs: fix tracking parent inode number
Previously, f2fs didn't correctly track the parent inode number, which is stored
in each f2fs_inode. In the following scenario, a bug can occur.
Let's suppose there are one directory, "/b", and two files, "/a" and "/b/a".
- pino of "/a" is ROOT_INO.
- pino of "/b/a" is DIR_B_INO.
Then,
# sync
: The inode pages of "/a" and "/b/a" contain the parent inode numbers as
ROOT_INO and DIR_B_INO respectively.
# mv /a /b/a
: The parent inode number of "/a" should be changed to DIR_B_INO, but f2fs
didn't do that. Ref. f2fs_set_link().
In order to fix this clearly, I added i_pino in f2fs_inode_info, and whenever
it needs to be changed like in f2fs_add_link() and f2fs_set_link(), it is
updated temporarily in f2fs_inode_info.
And later, f2fs_write_inode() stores the latest information to the inode pages.
For power-off recovery, f2fs_sync_file() simply triggers f2fs_write_inode().
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
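The fix amounts to a write-back cache for one field. Link and rename update only the in-memory `i_pino`, and the on-disk value catches up when the inode is written. Below is a hypothetical user-space model of that flow. The `sketch_*` types and functions are made up, byte-order conversion (`le32_to_cpu`) is ignored, and the inode numbers in the usage are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_disk_inode {
	uint32_t i_pino;		/* parent ino as stored in the inode block */
};

struct sketch_inode_info {
	uint32_t i_pino;		/* in-memory copy, may be newer than disk */
	struct sketch_disk_inode *raw;	/* backing on-disk inode */
};

/* Inode load: copy the parent ino from disk. */
static void sketch_read_inode(struct sketch_inode_info *fi)
{
	fi->i_pino = fi->raw->i_pino;
}

/* Link/rename: update only the in-memory copy. */
static void sketch_set_parent(struct sketch_inode_info *fi, uint32_t parent)
{
	fi->i_pino = parent;
}

/* Inode writeback: persist the latest parent ino. */
static void sketch_write_inode(struct sketch_inode_info *fi)
{
	fi->raw->i_pino = fi->i_pino;
}
```

Between sketch_set_parent() and sketch_write_inode(), the on-disk value is stale. In the commit's `mv /a /b/a` scenario, this is the window that f2fs_write_inode() closes.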
2012-12-10 17:52:48 +09:00
	fi->i_pino = le32_to_cpu(ri->i_pino);
2014-02-27 18:20:00 +09:00
	fi->i_dir_level = ri->i_dir_level;
2013-10-08 18:01:51 +09:00
2016-05-20 10:13:22 -07:00
	get_inline_info(inode, ri);
2013-10-08 18:01:51 +09:00
2017-07-19 00:19:06 +08:00
	fi->i_extra_isize = f2fs_has_extra_attr(inode) ?
					le16_to_cpu(ri->i_extra_isize) : 0;
2018-10-24 18:34:26 +08:00
	if (f2fs_sb_has_flexible_inline_xattr(sbi)) {
f2fs: support flexible inline xattr size
Now, in products, more and more features based on file encryption are
being introduced, and their demand for xattr space is increasing.
However, inline xattr has a fixed size of 200 bytes; once the inline
xattr space is full, newly added xattr data occupies an additional xattr
block, which may bring more space usage and a performance regression
during persisting.
In order to resolve the above issue, it's better to expand the inline
xattr size flexibly according to the user's requirement.
So this patch introduces a new filesystem feature, 'flexible inline
xattr', and a new mount option, 'inline_xattr_size=%u'; once mkfs
enables the feature, the option can be used to give f2fs a flexible
inline xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size,
to the inode layout, indicating how much space inline xattr borrows from
the block address mapping space in the inode; with it, we can easily
locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
+----------------------+
| .i_mode |
| ... |
| .i_ext |
+----------------------+
| .i_extra_isize |
| .i_inline_xattr_size |-----------+
| ... | |
+----------------------+ |
| .i_addr | |
| - block address or | |
| - inline data | |
+----------------------+<---+ v
| inline xattr | +---inline xattr range
+----------------------+<---+
| .i_nid |
+----------------------+
| node_footer |
| (nid, ino, offset) |
+----------------------+
Note that we have to consider backward compatibility: 200 bytes of
inline_data space were always reserved, as reported by Sheng Yong.
Previous inline data or directory always reserved 200 bytes in inode layout,
even if inline_xattr is disabled. In order to keep inline_dentry's structure
for backward compatibility, we get the space back only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
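The selection logic in `do_read_inode()` and the layout in the diagram above can be modeled together. This sketch is hypothetical and rests on several assumptions:

- Sizes are counted in 4-byte `i_addr` slots.
- The inode has 923 slots in total (`DEF_ADDRS_PER_INODE`).
- Inline xattrs default to 50 slots (200 bytes).
- The extra attribute area takes slots from the front of `i_addr`, and the inline xattr area takes them from the back.

Check the f2fs headers before relying on any of these values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants, counted in 4-byte i_addr slots. */
#define SKETCH_DEF_ADDRS_PER_INODE        923
#define SKETCH_DEFAULT_INLINE_XATTR_ADDRS  50	/* 200 bytes / 4 */

/* Mirrors the branch structure in do_read_inode(): how many i_addr slots
 * the inline xattr area borrows. disk_size is assumed already CPU-endian. */
static unsigned sketch_inline_xattr_addrs(bool sb_flexible, uint16_t disk_size,
					  bool has_inline_xattr, bool has_inline_dentry)
{
	if (sb_flexible)
		return disk_size;
	if (has_inline_xattr || has_inline_dentry)
		return SKETCH_DEFAULT_INLINE_XATTR_ADDRS;
	return 0;
}

/* Index of the first i_addr slot used by the inline xattr area (at the back). */
static unsigned sketch_xattr_start(unsigned xattr_addrs)
{
	return SKETCH_DEF_ADDRS_PER_INODE - xattr_addrs;
}

/* Slots left for block addresses or inline data after removing the extra
 * attribute area (front) and the inline xattr area (back). */
static unsigned sketch_data_addrs(unsigned extra_isize_bytes, unsigned xattr_addrs)
{
	assert(extra_isize_bytes % 4 == 0);
	return SKETCH_DEF_ADDRS_PER_INODE - extra_isize_bytes / 4 - xattr_addrs;
}
```

With a 36-byte extra attribute area and the default 50-slot xattr area, this model leaves 864 slots for block addresses or inline data, and the inline xattr area starts at slot 873.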
2017-09-06 21:59:50 +08:00
		fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size);
	} else if (f2fs_has_inline_xattr(inode) ||
				f2fs_has_inline_dentry(inode)) {
		fi->i_inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
	} else {
		/*
		 * Previous inline data or directory always reserved 200 bytes
		 * in inode layout, even if inline_xattr is disabled. In order
		 * to keep inline_dentry's structure for backward compatibility,
		 * we get the space back only from inline_data.
		 */
		fi->i_inline_xattr_size = 0;
	}
2025-07-08 18:03:03 +01:00
	if (!sanity_check_inode(inode, node_folio)) {
2025-03-31 21:12:07 +01:00
		f2fs_folio_put(node_folio, true);
2023-08-21 23:22:23 +08:00
		set_sbi_flag(sbi, SBI_NEED_FSCK);
2023-07-20 19:29:53 +08:00
		f2fs_handle_error(sbi, ERROR_CORRUPTED_INODE);
		return -EFSCORRUPTED;
	}
2014-10-23 19:48:09 -07:00
	/* check data exist */
	if (f2fs_has_inline_data(inode) && !f2fs_exist_data(inode))
2025-03-31 21:12:32 +01:00
		__recover_inline_status(inode, node_folio);
2014-10-23 19:48:09 -07:00
2018-10-03 22:32:44 +08:00
	/* try to recover cold bit for non-dir inode */
2025-07-08 18:03:28 +01:00
	if (!S_ISDIR(inode->i_mode) && !is_cold_node(node_folio)) {
2025-03-31 21:12:07 +01:00
		f2fs_folio_wait_writeback(node_folio, NODE, true, true);
2025-07-08 18:03:19 +01:00
		set_cold_node(node_folio, false);
2025-03-31 21:12:07 +01:00
		folio_mark_dirty(node_folio);
2018-10-03 22:32:44 +08:00
	}
2013-10-08 18:01:51 +09:00
	/* get rdev by using inline_info */
2025-03-31 21:12:49 +01:00
	__get_inode_rdev(inode, node_folio);
2015-03-17 17:16:35 -07:00
f2fs: clean up symbol namespace
2018-05-30 00:20:41 +08:00
	if (!f2fs_need_inode_block_update(sbi, inode->i_ino))
2016-05-20 20:42:37 -07:00
		fi->last_disk_size = inode->i_size;

2018-04-03 15:08:17 +08:00
	if (fi->i_flags & F2FS_PROJINHERIT_FL)
2017-07-26 00:01:41 +08:00
		set_inode_flag(inode, FI_PROJ_INHERIT);

2018-10-24 18:34:26 +08:00
	if (f2fs_has_extra_attr(inode) && f2fs_sb_has_project_quota(sbi) &&
2017-07-26 00:01:41 +08:00
			F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_projid))
		i_projid = (projid_t)le32_to_cpu(ri->i_projid);
	else
		i_projid = F2FS_DEF_PROJID;
	fi->i_projid = make_kprojid(&init_user_ns, i_projid);

2018-10-24 18:34:26 +08:00
	if (f2fs_has_extra_attr(inode) && f2fs_sb_has_inode_crtime(sbi) &&
2018-01-25 14:54:42 +08:00
			F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_crtime)) {
		fi->i_crtime.tv_sec = le64_to_cpu(ri->i_crtime);
		fi->i_crtime.tv_nsec = le32_to_cpu(ri->i_crtime_nsec);
	}

f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
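The cluster geometry this commit describes can be made concrete with a little arithmetic. The userspace sketch below is an illustration, not f2fs code. It takes the block size and the compress-header size as caller-supplied parameters, because the message names the header fields but not their sizes. It also uses only the simplest acceptance rule the layout implies: a compressed cluster must fit in at most `cluster_pages - 1` blocks, since the first block-address slot carries the compress flag. The real ratio threshold in f2fs may be stricter. All `example_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Logical pages per cluster; do_read_inode() computes BIT(i_log_cluster_size). */
static unsigned int example_cluster_pages(unsigned int log_cluster_size)
{
	/* "4 << n (n >= 0)" pages means log_cluster_size is at least 2 */
	assert(log_cluster_size >= 2 && log_cluster_size < 32);
	return 1u << log_cluster_size;
}

/* Blocks needed for header + compressed payload, rounded up to whole blocks. */
static unsigned int example_compressed_blocks(unsigned int payload_bytes,
					      unsigned int header_bytes,
					      unsigned int block_size)
{
	return (payload_bytes + header_bytes + block_size - 1) / block_size;
}

/*
 * A compressed cluster maps to [1, cluster_pages - 1] physical blocks:
 * the first block-address slot holds the compress flag instead of data.
 */
static bool example_store_compressed(unsigned int log_cluster_size,
				     unsigned int payload_bytes,
				     unsigned int header_bytes,
				     unsigned int block_size)
{
	unsigned int nr = example_compressed_blocks(payload_bytes, header_bytes,
						    block_size);

	return nr >= 1 && nr <= example_cluster_pages(log_cluster_size) - 1;
}
```

Take a 4-page cluster (`log_cluster_size = 2`), 4096-byte blocks and an arbitrary 24-byte header. A 5000-byte payload needs 2 blocks and would be stored compressed. A 15000-byte payload needs 4 blocks, saves nothing, and would stay a normal cluster.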
2019-11-01 18:07:14 +08:00
	if (f2fs_has_extra_attr(inode) && f2fs_sb_has_compression(sbi) &&
					(fi->i_flags & F2FS_COMPR_FL)) {
		if (F2FS_FITS_IN_INODE(ri, fi->i_extra_isize,
2023-05-17 11:41:39 +08:00
					i_compress_flag)) {
2023-01-28 18:30:11 +08:00
			unsigned short compress_flag;

2020-09-08 11:44:10 +09:00
			atomic_set(&fi->i_compr_blocks,
					le64_to_cpu(ri->i_compr_blocks));
2019-11-01 18:07:14 +08:00
			fi->i_compress_algorithm = ri->i_compress_algorithm;
			fi->i_log_cluster_size = ri->i_log_cluster_size;
2023-01-28 18:30:11 +08:00
			compress_flag = le16_to_cpu(ri->i_compress_flag);
			fi->i_compress_level = compress_flag >>
						COMPRESS_LEVEL_OFFSET;
			fi->i_compress_flag = compress_flag &
2023-02-16 21:53:24 +08:00
					GENMASK(COMPRESS_LEVEL_OFFSET - 1, 0);
			fi->i_cluster_size = BIT(fi->i_log_cluster_size);
2019-11-01 18:07:14 +08:00
			set_inode_flag(inode, FI_COMPRESSED_FILE);
		}
	}

2022-08-31 17:48:15 +08:00
	init_idisk_time(inode);
2022-12-02 13:51:09 -08:00

2025-07-08 18:03:02 +01:00
	if (!sanity_check_extent_cache(inode, node_folio)) {
2025-03-31 21:12:07 +01:00
		f2fs_folio_put(node_folio, true);
2023-02-07 21:48:08 +08:00
		f2fs_handle_error(sbi, ERROR_CORRUPTED_INODE);
		return -EFSCORRUPTED;
	}

2024-05-31 10:00:32 +08:00
	/* Need all the flag bits */
2025-03-31 21:12:44 +01:00
	f2fs_init_read_extent_tree(inode, node_folio);
2024-05-31 10:00:32 +08:00
	f2fs_init_age_extent_tree(inode);

2025-03-31 21:12:07 +01:00
	f2fs_folio_put(node_folio, true);
2014-12-05 10:51:50 -08:00

2015-07-15 17:28:53 +08:00
	stat_inc_inline_xattr(inode);
2014-12-05 10:51:50 -08:00
	stat_inc_inline_inode(inode);
	stat_inc_inline_dir(inode);
2019-11-01 18:07:14 +08:00
	stat_inc_compr_inode(inode);
2020-09-08 11:44:10 +09:00
	stat_add_compr_blocks(inode, atomic_read(&fi->i_compr_blocks));
2014-12-05 10:51:50 -08:00

2015-01-06 14:28:43 +08:00
	return 0;
2012-11-02 17:10:40 +09:00
}
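The compression branch of do_read_inode() splits the 16-bit on-disk `i_compress_flag` into two parts:
- a compression level, taken from the bits at and above `COMPRESS_LEVEL_OFFSET`;
- flag bits, taken from the bits below it.

The userspace sketch below shows that split and its inverse under three assumptions:
- `COMPRESS_LEVEL_OFFSET` is 8 (check fs/f2fs/f2fs.h for the real value);
- `GENMASK()` is reimplemented locally;
- the input is already in CPU byte order, so the `le16_to_cpu()` step is skipped.

All `example_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the real constant is defined in fs/f2fs/f2fs.h. */
#define EXAMPLE_COMPRESS_LEVEL_OFFSET 8

/* Userspace stand-in for the kernel's GENMASK(h, l) on 32-bit values. */
#define EXAMPLE_GENMASK(h, l) \
	(((~0u) >> (31 - (h))) & ((~0u) << (l)))

struct example_compress_fields {
	unsigned char level;	/* upper bits of the raw flag word */
	unsigned short flags;	/* lower bits, e.g. feature bits such as checksumming */
};

static struct example_compress_fields example_decode_compress_flag(uint16_t raw)
{
	struct example_compress_fields f;

	f.level = raw >> EXAMPLE_COMPRESS_LEVEL_OFFSET;
	f.flags = raw & EXAMPLE_GENMASK(EXAMPLE_COMPRESS_LEVEL_OFFSET - 1, 0);
	return f;
}

static uint16_t example_encode_compress_flag(unsigned char level,
					     unsigned short flags)
{
	/* flag bits must not spill into the level field */
	assert(flags <= EXAMPLE_GENMASK(EXAMPLE_COMPRESS_LEVEL_OFFSET - 1, 0));
	return (uint16_t)((level << EXAMPLE_COMPRESS_LEVEL_OFFSET) | flags);
}
```

With these assumptions, a raw value of `0x0301` decodes to level 3 with flag bit 0 set. Encoding and then decoding returns the original pair.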

2022-09-13 15:48:12 +08:00
static bool is_meta_ino(struct f2fs_sb_info *sbi, unsigned int ino)
{
	return ino == F2FS_NODE_INO(sbi) || ino == F2FS_META_INO(sbi) ||
		ino == F2FS_COMPRESS_INO(sbi);
}

2012-11-02 17:10:40 +09:00
struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
{
	struct f2fs_sb_info *sbi = F2FS_SB(sb);
	struct inode *inode;
2013-04-20 01:28:40 +09:00
	int ret = 0;
2012-11-02 17:10:40 +09:00

	inode = iget_locked(sb, ino);
	if (!inode)
		return ERR_PTR(-ENOMEM);
2013-04-20 01:28:40 +09:00

	if (!(inode->i_state & I_NEW)) {
2022-09-13 15:48:12 +08:00
		if (is_meta_ino(sbi, ino)) {
			f2fs_err(sbi, "inaccessible inode: %lu, run fsck to repair", ino);
			set_sbi_flag(sbi, SBI_NEED_FSCK);
			ret = -EFSCORRUPTED;
			trace_f2fs_iget_exit(inode, ret);
			iput(inode);
2022-09-28 23:38:54 +08:00
			f2fs_handle_error(sbi, ERROR_CORRUPTED_INODE);
2022-09-13 15:48:12 +08:00
			return ERR_PTR(ret);
		}

2013-04-20 01:28:40 +09:00
		trace_f2fs_iget(inode);
2012-11-02 17:10:40 +09:00
		return inode;
2013-04-20 01:28:40 +09:00
	}
2012-11-02 17:10:40 +09:00

2022-09-13 15:48:12 +08:00
	if (is_meta_ino(sbi, ino))
2021-05-20 19:51:50 +08:00
		goto make_now;

2012-11-02 17:10:40 +09:00
	ret = do_read_inode(inode);
	if (ret)
		goto bad_inode;
make_now:
	if (ino == F2FS_NODE_INO(sbi)) {
		inode->i_mapping->a_ops = &f2fs_node_aops;
2018-04-09 20:25:06 +08:00
		mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
2012-11-02 17:10:40 +09:00
	} else if (ino == F2FS_META_INO(sbi)) {
		inode->i_mapping->a_ops = &f2fs_meta_aops;
2018-04-09 20:25:06 +08:00
		mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
2021-05-20 19:51:50 +08:00
	} else if (ino == F2FS_COMPRESS_INO(sbi)) {
#ifdef CONFIG_F2FS_FS_COMPRESSION
		inode->i_mapping->a_ops = &f2fs_compress_aops;
2021-11-26 18:19:19 +08:00
		/*
2023-11-17 16:14:47 +00:00
		 * generic_error_remove_folio only truncates pages of regular
2021-11-26 18:19:19 +08:00
		 * inode
		 */
		inode->i_mode |= S_IFREG;
2021-05-20 19:51:50 +08:00
#endif
		mapping_set_gfp_mask(inode->i_mapping,
			GFP_NOFS | __GFP_HIGHMEM | __GFP_MOVABLE);
2012-11-02 17:10:40 +09:00
	} else if (S_ISREG(inode->i_mode)) {
		inode->i_op = &f2fs_file_inode_operations;
		inode->i_fop = &f2fs_file_operations;
		inode->i_mapping->a_ops = &f2fs_dblock_aops;
	} else if (S_ISDIR(inode->i_mode)) {
		inode->i_op = &f2fs_dir_inode_operations;
		inode->i_fop = &f2fs_dir_operations;
		inode->i_mapping->a_ops = &f2fs_dblock_aops;
2021-09-07 10:24:21 -07:00
		mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
2012-11-02 17:10:40 +09:00
	} else if (S_ISLNK(inode->i_mode)) {
2018-12-12 15:20:11 +05:30
		if (file_is_encrypt(inode))
2015-04-29 15:10:53 -07:00
			inode->i_op = &f2fs_encrypted_symlink_inode_operations;
		else
			inode->i_op = &f2fs_symlink_inode_operations;
2015-11-17 01:07:57 -05:00
		inode_nohighmem(inode);
2012-11-02 17:10:40 +09:00
		inode->i_mapping->a_ops = &f2fs_dblock_aops;
	} else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
			S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
		inode->i_op = &f2fs_special_inode_operations;
		init_special_inode(inode, inode->i_mode, inode->i_rdev);
	} else {
		ret = -EIO;
		goto bad_inode;
	}
2017-05-16 13:20:16 -07:00
	f2fs_set_inode_flags(inode);
2021-11-12 14:31:16 -08:00

2012-11-02 17:10:40 +09:00
	unlock_new_inode(inode);
2013-04-20 01:28:40 +09:00
	trace_f2fs_iget(inode);
2012-11-02 17:10:40 +09:00
	return inode;

bad_inode:
2019-04-15 15:28:33 +08:00
	f2fs_inode_synced(inode);
2012-11-02 17:10:40 +09:00
	iget_failed(inode);
2013-04-20 01:28:40 +09:00
	trace_f2fs_iget_exit(inode, ret);
2012-11-02 17:10:40 +09:00
	return ERR_PTR(ret);
}

2016-09-09 16:59:39 -07:00
struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino)
{
	struct inode *inode;
retry:
	inode = f2fs_iget(sb, ino);
	if (IS_ERR(inode)) {
		if (PTR_ERR(inode) == -ENOMEM) {
2022-01-14 14:07:14 -08:00
			memalloc_retry_wait(GFP_NOFS);
2016-09-09 16:59:39 -07:00
			goto retry;
		}
	}
	return inode;
}
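f2fs_iget_retry() retries only on `-ENOMEM`, calling `memalloc_retry_wait(GFP_NOFS)` between attempts. Any other result, success or a different error, is returned unchanged. The sketch below reproduces that control flow in userspace with function pointers. The following pieces are hypothetical test scaffolding:
- the lookup and wait callbacks;
- the fake that fails a fixed number of times;
- the `example_*` names.

The wait callback only counts calls instead of sleeping.

```c
#include <assert.h>
#include <errno.h>

/* Returns 0 on success or a negative errno, following the kernel convention. */
typedef int (*example_lookup_fn)(void *ctx);
/* Stand-in for memalloc_retry_wait(); here it does not sleep. */
typedef void (*example_wait_fn)(void *ctx);

static int example_lookup_retry(example_lookup_fn lookup, example_wait_fn wait,
				void *ctx)
{
	int ret;

	for (;;) {
		ret = lookup(ctx);
		if (ret != -ENOMEM)
			return ret;	/* success or a non-transient error */
		wait(ctx);		/* back off, then try again */
	}
}

/* Hypothetical fake: fails with -ENOMEM failures_left times, then returns final_ret. */
struct example_fake {
	int failures_left;
	int final_ret;
	int waits;
};

static int example_fake_lookup(void *p)
{
	struct example_fake *f = p;

	if (f->failures_left > 0) {
		f->failures_left--;
		return -ENOMEM;
	}
	return f->final_ret;
}

static void example_fake_wait(void *p)
{
	struct example_fake *f = p;

	f->waits++;
}
```

As in the kernel loop, `-ENOMEM` is treated as transient and retried without a limit. Every other error is passed straight back to the caller.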

2025-03-31 21:12:46 +01:00
void f2fs_update_inode(struct inode *inode, struct folio *node_folio)
2012-11-02 17:10:40 +09:00
{
2024-06-25 11:16:02 +08:00
	struct f2fs_inode_info *fi = F2FS_I(inode);
2012-11-02 17:10:40 +09:00
	struct f2fs_inode *ri;
2024-06-25 11:16:02 +08:00
	struct extent_tree *et = fi->extent_tree[EX_READ];
2012-11-02 17:10:40 +09:00

2025-03-31 21:12:46 +01:00
	f2fs_folio_wait_writeback(node_folio, NODE, true, true);
	folio_mark_dirty(node_folio);
2017-12-05 12:07:47 +08:00

	f2fs_inode_synced(inode);
2012-11-02 17:10:40 +09:00

2025-07-08 18:03:06 +01:00
	ri = F2FS_INODE(node_folio);
2012-11-02 17:10:40 +09:00

	ri->i_mode = cpu_to_le16(inode->i_mode);
2024-06-25 11:16:02 +08:00
	ri->i_advise = fi->i_advise;
2012-11-02 17:10:40 +09:00
	ri->i_uid = cpu_to_le32(i_uid_read(inode));
	ri->i_gid = cpu_to_le32(i_gid_read(inode));
	ri->i_links = cpu_to_le32(inode->i_nlink);
f2fs: don't count inode block in in-memory inode.i_blocks
Previously, we count all inode consumed blocks including inode block,
xattr block, index block, data block into i_blocks, for other generic
filesystems, they won't count inode block into i_blocks, so for
userspace applications or quota system, they may detect incorrect block
count according to i_blocks value in inode.
This patch changes to count all blocks into inode.i_blocks excluding
inode block, for on-disk i_blocks, we keep counting inode block for
backward compatibility.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
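The accounting rule in this commit can be written as a pair of conversions between two counters:
- the on-disk `i_blocks` counts filesystem blocks and includes the inode block;
- the in-memory `inode->i_blocks` counts 512-byte sectors and excludes it.

That is why f2fs_update_inode() writes `SECTOR_TO_BLOCK(inode->i_blocks) + 1`. The sketch assumes 4 KiB blocks, i.e. 8 sectors per block. The `example_*` helpers are hypothetical userspace stand-ins, not the f2fs macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 4 KiB blocks and 512-byte sectors: 8 sectors per block. */
#define EXAMPLE_LOG_SECTORS_PER_BLOCK 3

/* In-memory sector count -> on-disk block count (adds the inode block). */
static uint64_t example_disk_i_blocks(uint64_t mem_sectors)
{
	return (mem_sectors >> EXAMPLE_LOG_SECTORS_PER_BLOCK) + 1;
}

/* On-disk block count -> in-memory sector count (drops the inode block). */
static uint64_t example_mem_i_blocks(uint64_t disk_blocks)
{
	assert(disk_blocks >= 1);	/* the inode block is always counted */
	return (disk_blocks - 1) << EXAMPLE_LOG_SECTORS_PER_BLOCK;
}
```

An inode with no data or xattr blocks therefore has 0 sectors in memory and 1 block on disk. An inode with two data blocks has 16 sectors in memory and 3 blocks on disk.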
2017-07-06 01:11:31 +08:00
	ri->i_blocks = cpu_to_le64(SECTOR_TO_BLOCK(inode->i_blocks) + 1);
2015-02-05 17:46:29 +08:00

2022-10-31 12:24:15 -07:00
	if (!f2fs_is_atomic_file(inode) ||
			is_inode_flag_set(inode, FI_ATOMIC_COMMITTED))
		ri->i_size = cpu_to_le64(i_size_read(inode));

2016-10-11 22:57:05 +08:00
	if (et) {
		read_lock(&et->lock);
2022-11-30 09:36:43 -08:00
		set_raw_read_extent(&et->largest, &ri->i_ext);
2016-10-11 22:57:05 +08:00
		read_unlock(&et->lock);
	} else {
2015-06-19 17:53:26 -07:00
		memset(&ri->i_ext, 0, sizeof(ri->i_ext));
2016-10-11 22:57:05 +08:00
	}
2016-05-20 10:13:22 -07:00
	set_raw_inline(inode, ri);
2012-11-02 17:10:40 +09:00

2023-10-04 14:52:21 -04:00
	ri->i_atime = cpu_to_le64(inode_get_atime_sec(inode));
	ri->i_ctime = cpu_to_le64(inode_get_ctime_sec(inode));
	ri->i_mtime = cpu_to_le64(inode_get_mtime_sec(inode));
	ri->i_atime_nsec = cpu_to_le32(inode_get_atime_nsec(inode));
	ri->i_ctime_nsec = cpu_to_le32(inode_get_ctime_nsec(inode));
	ri->i_mtime_nsec = cpu_to_le32(inode_get_mtime_nsec(inode));
2018-05-07 20:28:52 +08:00
	if (S_ISDIR(inode->i_mode))
2024-06-25 11:16:02 +08:00
		ri->i_current_depth = cpu_to_le32(fi->i_current_depth);
2018-05-07 20:28:52 +08:00
	else if (S_ISREG(inode->i_mode))
2024-06-25 11:16:02 +08:00
		ri->i_gc_failures = cpu_to_le16(fi->i_gc_failures);
	ri->i_xattr_nid = cpu_to_le32(fi->i_xattr_nid);
	ri->i_flags = cpu_to_le32(fi->i_flags);
	ri->i_pino = cpu_to_le32(fi->i_pino);
2012-11-02 17:10:40 +09:00
	ri->i_generation = cpu_to_le32(inode->i_generation);
2024-06-25 11:16:02 +08:00
	ri->i_dir_level = fi->i_dir_level;
f2fs: save device node number into f2fs_inode
This patch stores inode->i_rdev into on-disk inode structure.
Alun reported that:
aspire tmp # mount -t f2fs /dev/sdb mnt
aspire tmp # mknod mnt/sda1 b 8 1
aspire tmp # mknod mnt/null c 1 3
aspire tmp # mknod mnt/console c 5 1
aspire tmp # ls -l mnt
total 2
crw-r--r-- 1 root root 5, 1 Jan 22 18:44 console
crw-r--r-- 1 root root 1, 3 Jan 22 18:44 null
brw-r--r-- 1 root root 8, 1 Jan 22 18:44 sda1
aspire tmp # umount mnt
aspire tmp # mount -t f2fs /dev/sdb mnt
aspire tmp # ls -l mnt
total 2
crw-r--r-- 1 root root 0, 0 Jan 22 18:44 console
crw-r--r-- 1 root root 0, 0 Jan 22 18:44 null
brw-r--r-- 1 root root 0, 0 Jan 22 18:44 sda1
In this report, f2fs lost the major/minor numbers of device files after umount.
The reason was revealed that f2fs does not store the inode->i_rdev to the
on-disk inode data structure.
So, as the other file systems do, f2fs also stores i_rdev into the i_addr fields
in on-disk inode structure without any on-disk layout changes.
Note that, this bug is limited to device files made by mknod().
Reported-and-Tested-by: Alun Jones <alun.linux@ty-penguin.org.uk>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
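This fix keeps the device number in the block-address slots of the on-disk inode. The helpers that do this are not shown in this listing, so the following is a hedged userspace sketch. It assumes f2fs uses the encodings of the kernel's `old_encode_dev()` and `new_encode_dev()` from include/linux/kdev_t.h, arranged as follows:
- a device whose major and minor both fit in 8 bits goes into slot 0 in the old 16-bit format;
- any other device leaves slot 0 at zero and stores the new 32-bit format in slot 1.

The two-slot array and the `example_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace re-implementations of the encodings in include/linux/kdev_t.h. */
static uint32_t example_old_encode(unsigned int major, unsigned int minor)
{
	return (major << 8) | minor;
}

static uint32_t example_new_encode(unsigned int major, unsigned int minor)
{
	return (minor & 0xff) | (major << 8) | ((minor & ~0xffu) << 12);
}

static void example_new_decode(uint32_t dev, unsigned int *major,
			       unsigned int *minor)
{
	*major = (dev & 0xfff00) >> 8;
	*minor = (dev & 0xff) | ((dev >> 12) & 0xfff00);
}

/* Hypothetical layout: slot 0 = old format, slot 1 = new format. */
static void example_store_rdev(uint32_t addr[2], unsigned int major,
			       unsigned int minor)
{
	/* the 32-bit new format holds a 12-bit major and a 20-bit minor */
	assert(major < 4096 && minor < (1u << 20));
	if (major < 256 && minor < 256) {
		addr[0] = example_old_encode(major, minor);
		addr[1] = 0;
	} else {
		addr[0] = 0;
		addr[1] = example_new_encode(major, minor);
	}
}

static void example_load_rdev(const uint32_t addr[2], unsigned int *major,
			      unsigned int *minor)
{
	if (addr[0]) {
		*major = (addr[0] >> 8) & 0xff;
		*minor = addr[0] & 0xff;
	} else {
		example_new_decode(addr[1], major, minor);
	}
}
```

With this scheme, the `sda1` node from the report (block 8, 1) round-trips through slot 0. A device with minor 300 round-trips through slot 1.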
2013-01-23 09:40:23 +09:00

2017-07-26 00:01:41 +08:00
	if (f2fs_has_extra_attr(inode)) {
2024-06-25 11:16:02 +08:00
		ri->i_extra_isize = cpu_to_le16(fi->i_extra_isize);
2017-07-19 00:19:06 +08:00

2018-10-24 18:34:26 +08:00
		if (f2fs_sb_has_flexible_inline_xattr(F2FS_I_SB(inode)))
f2fs: support flexible inline xattr size
In products, more and more features based on file encryption have been
introduced, and their demand for xattr space keeps increasing. However,
inline xattr has a fixed size of 200 bytes; once the inline xattr space is
full, additional xattr data occupies a separate xattr block, which costs
more space and causes a performance regression when the data is persisted.
To resolve this issue, it is better to let the inline xattr size grow
flexibly according to the user's requirement.
So this patch introduces a new filesystem feature, 'flexible inline xattr',
and a new mount option, 'inline_xattr_size=%u'. Once mkfs enables the
feature, the option can be used to make f2fs support a flexible inline
xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size, to
the inode layout. It indicates how much space inline xattr borrows from the
block address mapping space in the inode layout, so that we can easily
locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
+----------------------+
| .i_mode              |
| ...                  |
| .i_ext               |
+----------------------+
| .i_extra_isize       |
| .i_inline_xattr_size |-----------+
| ...                  |           |
+----------------------+           |
| .i_addr              |           |
|  - block address or  |           |
|  - inline data       |           |
+----------------------+<---+      v
|    inline xattr      |    +---inline xattr range
+----------------------+<---+
| .i_nid               |
+----------------------+
|   node_footer        |
| (nid, ino, offset)   |
+----------------------+
Note that we have to consider backward compatibility with the reserved
inline_data space, which was always 200 bytes, as reported by Sheng Yong.
Previously, inline data or directories always reserved 200 bytes in the
inode layout, even if inline_xattr was disabled. To keep inline_dentry's
structure backward compatible, we get the space back only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-06 21:59:50 +08:00
ri->i_inline_xattr_size =
2024-06-25 11:16:02 +08:00
cpu_to_le16(fi->i_inline_xattr_size);
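The layout diagram describes inline xattr space borrowed from the tail of `i_addr`, with extra attributes overlaying its head. The sketch below does that bookkeeping in 32-bit address slots. The constants (923 slots in a full `i_addr`, and a 200-byte default inline xattr area, i.e. 50 slots) reflect my understanding of f2fs's `DEF_ADDRS_PER_INODE` and `DEFAULT_INLINE_XATTR_ADDRS`. Treat them as assumptions. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values; check include/linux/f2fs_fs.h for the real ones. */
#define SKETCH_ADDRS_PER_INODE     923U /* __le32 slots in i_addr[] */
#define SKETCH_DEFAULT_XATTR_ADDRS 50U  /* 200 bytes / sizeof(__le32) */

/*
 * extra_isize: bytes of extra attributes at the head of i_addr[].
 * xattr_addrs: slots reserved for inline xattr at the tail of i_addr[].
 * Returns the number of slots left for block addresses or inline data.
 */
static uint32_t sketch_data_addrs(uint32_t extra_isize, uint32_t xattr_addrs)
{
	uint32_t extra_addrs = (uint32_t)(extra_isize / sizeof(uint32_t));

	return SKETCH_ADDRS_PER_INODE - extra_addrs - xattr_addrs;
}

/* Index of the first i_addr[] slot occupied by inline xattr data. */
static uint32_t sketch_xattr_start(uint32_t xattr_addrs)
{
	return SKETCH_ADDRS_PER_INODE - xattr_addrs;
}
```

For example, with 36 bytes of extra attributes and the default 50-slot inline xattr area, 923 - 9 - 50 = 864 slots remain for block addresses or inline data.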
2017-09-06 21:59:50 +08:00

2018-10-24 18:34:26 +08:00
if (f2fs_sb_has_project_quota(F2FS_I_SB(inode)) &&
2024-06-25 11:16:02 +08:00
F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_projid)) {
2017-07-26 00:01:41 +08:00
projid_t i_projid;

2024-06-25 11:16:02 +08:00
i_projid = from_kprojid(&init_user_ns, fi->i_projid);
2017-07-26 00:01:41 +08:00
ri->i_projid = cpu_to_le32(i_projid);
}
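`F2FS_FITS_IN_INODE()` guards each extra field so that the field is written only when the inode's `i_extra_isize` is large enough to contain it. Below is a minimal sketch of such a check using `offsetof`. It assumes, as I understand f2fs to do, that extra attributes begin where the fixed "old" attributes end. The struct and macro are hypothetical stand-ins, not the real `struct f2fs_inode`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-in for an on-disk inode. */
struct sketch_inode {
	uint16_t i_mode;        /* fixed "old" attributes ... */
	uint16_t i_pad;
	uint32_t i_flags;
	/* extra attributes start here */
	uint16_t i_extra_isize;
	uint16_t i_inline_xattr_size;
	uint32_t i_projid;
	uint64_t i_crtime;
};

/* Size of the fixed part that precedes the extra attributes. */
#define SKETCH_OLD_ATTR_SIZE offsetof(struct sketch_inode, i_extra_isize)

/* True when 'field' ends within the first extra_isize bytes of extras. */
#define SKETCH_FITS_IN_INODE(extra_isize, field)                \
	(offsetof(struct sketch_inode, field) +                 \
	 sizeof(((struct sketch_inode *)0)->field) <=           \
	 SKETCH_OLD_ATTR_SIZE + (size_t)(extra_isize))
```

Because the check is purely offset-based, an inode created by an older kernel with a smaller `i_extra_isize` simply never has the newer fields read or written.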
2018-01-25 14:54:42 +08:00

2018-10-24 18:34:26 +08:00
if (f2fs_sb_has_inode_crtime(F2FS_I_SB(inode)) &&
2024-06-25 11:16:02 +08:00
F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_crtime)) {
ri->i_crtime = cpu_to_le64(fi->i_crtime.tv_sec);
ri->i_crtime_nsec = cpu_to_le32(fi->i_crtime.tv_nsec);
2018-01-25 14:54:42 +08:00
}
f2fs: support data compression
This patch adds compression support to f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file
is logically divided into multiple clusters. One cluster includes 4 << n
(n >= 0) logical pages, the compression unit equals the cluster size, and
each cluster can be stored either compressed or not.
- In the cluster metadata layout, a special flag indicates whether a cluster
is compressed or normal. For a compressed cluster, the following metadata
maps the cluster to [1, 4 << n - 1] physical blocks, in which f2fs stores
the compress header and the compressed data.
- To eliminate write amplification during overwrite, F2FS only supports
compression on write-once files: data can be compressed only when all
logical blocks in the file are valid and the cluster compress ratio is
lower than a specified threshold.
- There are three ways to enable compression on a regular inode:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueues,
which shows low performance due to the scheduling overhead of multiple
workqueues executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
Note: the lzo backend will be removed if Jaegeuk agrees.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00

if (f2fs_sb_has_compression(F2FS_I_SB(inode)) &&
2024-06-25 11:16:02 +08:00
F2FS_FITS_IN_INODE(ri, fi->i_extra_isize,
2023-05-17 11:41:39 +08:00
i_compress_flag)) {
2023-01-28 18:30:11 +08:00
unsigned short compress_flag;

2024-06-25 11:16:02 +08:00
ri->i_compr_blocks = cpu_to_le64(
atomic_read(&fi->i_compr_blocks));
ri->i_compress_algorithm = fi->i_compress_algorithm;
compress_flag = fi->i_compress_flag |
fi->i_compress_level <<
2023-01-28 18:30:11 +08:00
COMPRESS_LEVEL_OFFSET;
ri->i_compress_flag = cpu_to_le16(compress_flag);
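The code above packs the compression level into the upper bits of the 16-bit on-disk `i_compress_flag`. Below is a minimal sketch of that packing and of the reverse split. It assumes `COMPRESS_LEVEL_OFFSET` is 8, which is my reading of f2fs.h and should be verified before relying on it. The helper names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_COMPRESS_LEVEL_OFFSET 8 /* assumed value of COMPRESS_LEVEL_OFFSET */

/* Flags stay in the low bits; the level is shifted into the high bits. */
static uint16_t sketch_pack_compress_flag(uint16_t flags, uint8_t level)
{
	return (uint16_t)(flags |
			  ((uint16_t)level << SKETCH_COMPRESS_LEVEL_OFFSET));
}

/* Recover the level from the high bits. */
static uint8_t sketch_compress_level(uint16_t packed)
{
	return (uint8_t)(packed >> SKETCH_COMPRESS_LEVEL_OFFSET);
}

/* Recover the flags by masking off everything at or above the offset. */
static uint16_t sketch_compress_flags(uint16_t packed)
{
	return (uint16_t)(packed & ((1U << SKETCH_COMPRESS_LEVEL_OFFSET) - 1));
}
```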
2024-06-25 11:16:02 +08:00
ri->i_log_cluster_size = fi->i_log_cluster_size;
2019-11-01 18:07:14 +08:00
}
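Per the compression commit message, a cluster spans 4 << n pages, and a compressed cluster maps to between 1 and (4 << n) - 1 physical blocks because one slot holds the compress flag. The sketch below is a hedged version of the resulting store-compressed-or-raw decision: compress only when the header plus compressed data needs fewer blocks than the cluster has pages. The 4 KiB block size and the caller-supplied header size are assumptions. The configurable ratio threshold mentioned in the commit message is not modeled, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BLOCK_SIZE 4096U /* assumed block size */

/* Number of pages in a cluster of order n (n >= 0), per "4 << n". */
static unsigned int sketch_cluster_pages(unsigned int n)
{
	return 4U << n;
}

/* Blocks needed to hold the header plus the compressed payload. */
static unsigned int sketch_compressed_blocks(size_t header_size,
					     size_t compressed_size)
{
	size_t total = header_size + compressed_size;

	return (unsigned int)((total + SKETCH_BLOCK_SIZE - 1) /
			      SKETCH_BLOCK_SIZE);
}

/*
 * Store the cluster compressed only if the result fits in at most
 * (cluster pages - 1) blocks, i.e. compression saves at least one block.
 */
static bool sketch_store_compressed(unsigned int n, size_t header_size,
				    size_t compressed_size)
{
	unsigned int blocks = sketch_compressed_blocks(header_size,
						       compressed_size);

	return blocks >= 1 && blocks <= sketch_cluster_pages(n) - 1;
}
```

If compression cannot free at least one block, the cluster is kept as a normal cluster. There is then nothing to gain from paying the decompression cost on read.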
2017-07-26 00:01:41 +08:00
}

2025-03-31 21:12:49 +01:00
__set_inode_rdev(inode, node_folio);
2016-01-07 13:23:12 -08:00

2016-01-25 05:57:05 -08:00
/* deleted inode */
if (inode->i_nlink == 0)
2025-07-08 18:03:35 +01:00
folio_clear_f2fs_inline(node_folio);
2016-01-25 05:57:05 -08:00

2022-08-31 17:48:15 +08:00
init_idisk_time(inode);
2018-03-09 23:10:21 +08:00
#ifdef CONFIG_F2FS_CHECK_FS
2025-07-08 18:03:14 +01:00
f2fs_inode_chksum_set(F2FS_I_SB(inode), node_folio);
2018-03-09 23:10:21 +08:00
#endif
2012-11-02 17:10:40 +09:00
}

f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking at f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using generically named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds the "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
void f2fs_update_inode_page(struct inode *inode)
2012-11-02 17:10:40 +09:00
{
2014-09-02 15:31:18 -07:00
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
2025-03-31 21:12:06 +01:00
struct folio *node_folio;
2023-01-30 15:20:09 -08:00
int count = 0;
2014-01-24 09:42:16 +09:00
retry:
2025-03-31 21:12:06 +01:00
node_folio = f2fs_get_inode_folio(sbi, inode->i_ino);
if (IS_ERR(node_folio)) {
int err = PTR_ERR(node_folio);
2021-04-06 09:47:35 +08:00

2023-01-30 15:20:09 -08:00
/* The node block was truncated. */
if (err == -ENOENT)
return;

f2fs: don't retry IO for corrupted data scenario
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
If the node block is loaded successfully but its content is inconsistent,
there is no need to retry the IO.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-10 15:36:32 +08:00
if (err == -EFSCORRUPTED)
goto stop_checkpoint;

2023-01-30 15:20:09 -08:00
if (err == -ENOMEM || ++count <= DEFAULT_RETRY_IO_COUNT)
2014-01-24 09:42:16 +09:00
goto retry;
2025-02-10 15:36:32 +08:00
stop_checkpoint:
2023-01-30 15:20:09 -08:00
f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_UPDATE_INODE);
2017-12-05 12:07:47 +08:00
return;
2014-01-24 09:42:16 +09:00
}
2025-03-31 21:12:46 +01:00
f2fs_update_inode(inode, node_folio);
2025-03-31 21:12:06 +01:00
f2fs_folio_put(node_folio, true);
2012-11-02 17:10:40 +09:00
}

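The error handling in `f2fs_update_inode_page()` follows a simple policy:

- A truncated node block (-ENOENT) is ignored.
- Corrupted content (-EFSCORRUPTED) stops the checkpoint immediately.
- -ENOMEM is retried indefinitely.
- Any other error is retried up to `DEFAULT_RETRY_IO_COUNT` times.

The user-space sketch below expresses that policy as a pure function. It uses positive errno values, whereas the kernel uses negative ones. The enum, the function name, and the retry limit of 8 are assumptions. The `EUCLEAN` mapping reflects my understanding that f2fs defines `EFSCORRUPTED` as `EUCLEAN`.

```c
#include <errno.h>

#ifdef EUCLEAN
#define SKETCH_EFSCORRUPTED EUCLEAN /* assumed: EFSCORRUPTED == EUCLEAN */
#else
#define SKETCH_EFSCORRUPTED 117
#endif

#define SKETCH_RETRY_LIMIT 8 /* assumed value of DEFAULT_RETRY_IO_COUNT */

enum sketch_action {
	SKETCH_GIVE_UP,         /* block is gone: nothing to update */
	SKETCH_STOP_CHECKPOINT, /* corruption or retries exhausted */
	SKETCH_RETRY,
};

/*
 * err is the positive errno of the failed read; *count tracks retries
 * for errors other than ENOMEM and is bumped exactly like ++count.
 */
static enum sketch_action sketch_on_read_error(int err, int *count)
{
	if (err == ENOENT)
		return SKETCH_GIVE_UP;
	if (err == SKETCH_EFSCORRUPTED)
		return SKETCH_STOP_CHECKPOINT;
	/* Short-circuit: ENOMEM retries never consume the budget. */
	if (err == ENOMEM || ++*count <= SKETCH_RETRY_LIMIT)
		return SKETCH_RETRY;
	return SKETCH_STOP_CHECKPOINT;
}
```

Checking for corruption before the retry test is the point of the "don't retry IO for corrupted data" change. Rereading a block whose content is already known to be inconsistent would only repeat the same log line.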
f2fs: introduce a new global lock scheme
In the previous version, f2fs used global locks according to usage type,
such as directory operations, block allocation, block write, and so on.
Refer to the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};
In that case, performance suffers in a multi-threaded environment,
since each type of operation must be performed one at a time.
To address this problem, let's share the locks globally through a mutex
array, regardless of operation type.
That way, users can grab a mutex and perform their jobs in parallel as
much as possible.
For this, I propose a new global lock scheme as follows.
0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write
1. mutex_lock_op(sbi)
- try to get an available lock from the array.
- returns the index of the acquired lock variable.
2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.
3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.
4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.
5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()
Note that the pairs mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should always be used together.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-22 16:21:29 +09:00
int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
{
2014-09-02 15:31:18 -07:00
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
2012-11-22 16:21:29 +09:00

if (inode->i_ino == F2FS_NODE_INO(sbi) ||
inode->i_ino == F2FS_META_INO(sbi))
return 0;

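The global lock scheme commit describes a lock array in which each operation grabs any free lock and a checkpoint grabs them all. The sketch below shows that idea in C11. To stay self-contained, it substitutes `atomic_flag` spinlocks for kernel mutexes. `NR_SKETCH_LOCKS`, the function names, and the round-robin fallback slot are illustrative assumptions, not the historical `mutex_lock_op()` code. A real implementation would sleep rather than spin.

```c
#include <stdatomic.h>

#define NR_SKETCH_LOCKS 8

struct sketch_sb {
	atomic_flag locks[NR_SKETCH_LOCKS];
	atomic_uint next; /* picks the slot to wait on when all are busy */
};

static void sketch_sb_init(struct sketch_sb *sb)
{
	for (int i = 0; i < NR_SKETCH_LOCKS; i++)
		atomic_flag_clear(&sb->locks[i]);
	atomic_init(&sb->next, 0);
}

/* Like mutex_lock_op(): take any free lock and return its index. */
static int sketch_lock_op(struct sketch_sb *sb)
{
	for (int i = 0; i < NR_SKETCH_LOCKS; i++)
		if (!atomic_flag_test_and_set(&sb->locks[i]))
			return i;

	/* All busy: wait on one slot, chosen round-robin. */
	int idx = (int)(atomic_fetch_add(&sb->next, 1) % NR_SKETCH_LOCKS);
	while (atomic_flag_test_and_set(&sb->locks[idx]))
		; /* spin */
	return idx;
}

/* Like mutex_unlock_op(): release the lock returned by sketch_lock_op(). */
static void sketch_unlock_op(struct sketch_sb *sb, int idx)
{
	atomic_flag_clear(&sb->locks[idx]);
}

/* Like mutex_lock_all(): block every operation before a checkpoint. */
static void sketch_lock_all(struct sketch_sb *sb)
{
	for (int i = 0; i < NR_SKETCH_LOCKS; i++)
		while (atomic_flag_test_and_set(&sb->locks[i]))
			; /* spin until this slot's holder releases it */
}

/* Like mutex_unlock_all(): let operations resume after the checkpoint. */
static void sketch_unlock_all(struct sketch_sb *sb)
{
	for (int i = 0; i < NR_SKETCH_LOCKS; i++)
		atomic_flag_clear(&sb->locks[i]);
}
```

Taking every slot in a fixed order in `sketch_lock_all()` is what lets a checkpoint exclude all in-flight operations without a separate "big" lock.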
f2fs: fix to update time in lazytime mode
generic/018 reports an inconsistent atime status; the testcase is as
follows:
- open file with O_SYNC
- write file to construct fragmented space
- calc md5 of file
- record {a,c,m}time
- defrag file --- do nothing
- umount & mount
- check {a,c,m}time
The root cause is that, since f2fs enables lazytime by default, an atime
update dirties the VFS inode rather than the f2fs inode (it does not set
FI_DIRTY_INODE), so a later f2fs_write_inode() called from the VFS fails
to update the inode page because of this skip:
f2fs_write_inode()
if (!is_inode_flag_set(inode, FI_DIRTY_INODE))
return 0;
So eventually, after evict(), the last atime is lost forever.
To fix this issue, we need to check whether {a,c,m,cr}time is
consistent between the inode cache and the inode page, and only skip
f2fs_update_inode() if the f2fs inode is not dirty and the times are
consistent as well.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-27 18:01:35 +08:00
/*
* atime could be updated without dirtying f2fs inode in lazytime mode
*/
if (f2fs_is_time_consistent(inode) &&
!is_inode_flag_set(inode, FI_DIRTY_INODE))
2013-06-10 09:17:01 +09:00
return 0;

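After the lazytime fix, the inode page is skipped only when the f2fs inode is clean and its cached timestamps still match what was last written to disk. The user-space sketch below expresses that predicate. The struct layout and names are hypothetical and only loosely mirror `f2fs_is_time_consistent()` and `FI_DIRTY_INODE`.

```c
#include <stdbool.h>
#include <time.h>

struct sketch_times {
	struct timespec atime, ctime, mtime, crtime;
};

struct sketch_lazy_inode {
	struct sketch_times cached; /* in-memory (VFS) view */
	struct sketch_times ondisk; /* values last copied to the inode page */
	bool dirty;                 /* stands in for FI_DIRTY_INODE */
};

static bool sketch_ts_equal(struct timespec a, struct timespec b)
{
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/* All four timestamps match between memory and the inode page. */
static bool sketch_time_consistent(const struct sketch_lazy_inode *in)
{
	return sketch_ts_equal(in->cached.atime, in->ondisk.atime) &&
	       sketch_ts_equal(in->cached.ctime, in->ondisk.ctime) &&
	       sketch_ts_equal(in->cached.mtime, in->ondisk.mtime) &&
	       sketch_ts_equal(in->cached.crtime, in->ondisk.crtime);
}

/* Write back unless the inode is clean AND no timestamp has drifted. */
static bool sketch_needs_write(const struct sketch_lazy_inode *in)
{
	return !(sketch_time_consistent(in) && !in->dirty);
}
```

A lazytime atime bump changes only `cached.atime`, so the inode still gets written even though `dirty` stays false. This is the case that previously lost the atime.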
2025-02-24 18:29:23 +08:00
/*
* no need to update inode page, ultimately f2fs_evict_inode() will
* clear dirty status of inode.
*/
if (f2fs_cp_error(sbi))
return -EIO;

2024-09-18 02:44:00 -06:00
if (!f2fs_is_checkpoint_ready(sbi)) {
f2fs_mark_inode_dirty_sync(inode, true);
2018-08-20 19:21:43 -07:00
return -ENOSPC;
2024-09-18 02:44:00 -06:00
}
2018-08-20 19:21:43 -07:00

2012-11-22 16:21:29 +09:00
/*
2015-09-12 11:25:30 -07:00
* We need to balance fs here to prevent from producing dirty node pages
2021-03-25 02:38:11 -04:00
* during the urgent cleaning time when running out of free sections.
2012-11-22 16:21:29 +09:00
*/
2018-05-30 00:20:41 +08:00
f2fs_update_inode_page(inode);
2017-04-20 13:51:57 -07:00
if (wbc && wbc->nr_to_write)
2016-01-07 14:15:04 -08:00
f2fs_balance_fs(sbi, true);
2014-01-24 09:42:16 +09:00
return 0;
f2fs: introduce a new global lock scheme
In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.
Reference the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};
In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.
In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.
For this, I propose a new global lock scheme as follows.
0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write
1. mutex_lock_op(sbi)
- try to get an avaiable lock from the array.
- returns the index of the gottern lock variable.
2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.
3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.
4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.
5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()
Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
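The lock scheme described in this commit can be illustrated with a minimal userspace sketch built on POSIX mutexes. It is not the f2fs implementation: the lock count (NR_GLOBAL_LOCKS = 8), the `struct sb_locks` type, and the round-robin fallback used when every trylock fails are assumptions made for illustration, and `node_write` and block_operations() are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_GLOBAL_LOCKS 8	/* hypothetical size of the lock array */

/* Hypothetical stand-in for the lock fields of f2fs_sb_info. */
struct sb_locks {
	pthread_mutex_t fs_lock[NR_GLOBAL_LOCKS];
	atomic_uint next_lock;	/* round-robin hint for the blocking fallback */
};

void sb_locks_init(struct sb_locks *l)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		pthread_mutex_init(&l->fs_lock[i], NULL);
	atomic_init(&l->next_lock, 0);
}

/* 1. Take any free lock; if all are busy, block on one chosen round-robin. */
int mutex_lock_op(struct sb_locks *l)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		if (pthread_mutex_trylock(&l->fs_lock[i]) == 0)
			return i;

	int idx = (int)(atomic_fetch_add(&l->next_lock, 1u) % NR_GLOBAL_LOCKS);
	pthread_mutex_lock(&l->fs_lock[idx]);
	return idx;
}

/* 2. Release the lock whose index mutex_lock_op() returned. */
void mutex_unlock_op(struct sb_locks *l, int idx)
{
	assert(idx >= 0 && idx < NR_GLOBAL_LOCKS);
	pthread_mutex_unlock(&l->fs_lock[idx]);
}

/* 3. Checkpoint side: take every lock, always in ascending order. */
void mutex_lock_all(struct sb_locks *l)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		pthread_mutex_lock(&l->fs_lock[i]);
}

/* 4. Release all locks once the checkpoint is done. */
void mutex_unlock_all(struct sb_locks *l)
{
	for (int i = NR_GLOBAL_LOCKS - 1; i >= 0; i--)
		pthread_mutex_unlock(&l->fs_lock[i]);
}
```

A regular operation holds at most one lock at a time, and mutex_lock_all() always acquires in ascending index order, so in this model the checkpoint simply waits until every in-flight operation has released its slot.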
2012-11-22 16:21:29 +09:00
}

2025-07-28 21:37:26 +00:00
void f2fs_remove_donate_inode(struct inode *inode)
2025-01-31 22:27:56 +00:00
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);

if (list_empty(&F2FS_I(inode)->gdonate_list))
return;

spin_lock(&sbi->inode_lock[DONATE_INODE]);
list_del_init(&F2FS_I(inode)->gdonate_list);
sbi->donate_files--;
spin_unlock(&sbi->inode_lock[DONATE_INODE]);
}
2012-11-29 13:28:09 +09:00
/*
2012-11-02 17:10:40 +09:00
* Called at the last iput() if i_nlink is zero
*/
void f2fs_evict_inode(struct inode *inode)
{
2014-09-02 15:31:18 -07:00
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
2023-02-09 10:18:19 -08:00
struct f2fs_inode_info *fi = F2FS_I(inode);
nid_t xnid = fi->i_xattr_nid;
2015-08-24 17:40:45 +08:00
int err = 0;
2024-03-22 13:16:39 +09:00
bool freeze_protected = false;
2012-11-02 17:10:40 +09:00

2022-08-04 21:38:21 +08:00
f2fs_abort_atomic_write(inode, true);
2014-10-06 17:39:50 -07:00

2024-07-10 20:51:43 +09:00
if (fi->cow_inode && f2fs_is_cow_file(fi->cow_inode)) {
2023-02-09 10:18:19 -08:00
clear_inode_flag(fi->cow_inode, FI_COW_FILE);
2024-07-10 20:51:43 +09:00
F2FS_I(fi->cow_inode)->atomic_inode = NULL;
2023-02-09 10:18:19 -08:00
iput(fi->cow_inode);
fi->cow_inode = NULL;
}

2013-04-20 01:28:40 +09:00
trace_f2fs_evict_inode(inode);
2014-04-03 14:47:49 -07:00
truncate_inode_pages_final(&inode->i_data);
2012-11-02 17:10:40 +09:00

2021-12-29 17:47:00 +08:00
if ((inode->i_nlink || is_bad_inode(inode)) &&
test_opt(sbi, COMPRESS_CACHE) && f2fs_compressed_file(inode))
2021-05-20 19:51:50 +08:00
f2fs_invalidate_compress_pages(sbi, inode->i_ino);

2012-11-02 17:10:40 +09:00
if (inode->i_ino == F2FS_NODE_INO(sbi) ||
2021-05-20 19:51:50 +08:00
inode->i_ino == F2FS_META_INO(sbi) ||
inode->i_ino == F2FS_COMPRESS_INO(sbi))
f2fs: avoid use invalid mapping of node_inode when evict meta inode
Andrey Tsyvarev reported:
"Using memory error detector reveals the following use-after-free error
in 3.15.0:
AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
[<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
[<ffffffff812359af>] evict+0x15f/0x290
[< inlined >] iput+0x196/0x280 iput_final
[<ffffffff812369a6>] iput+0x196/0x280
[<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
[<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
[<ffffffff812105fd>] kill_block_super+0x4d/0xb0
[<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
[<ffffffff81211c98>] deactivate_super+0x68/0x80
[<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
[< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount
[<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
[<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
Freed by thread T3:
[<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
[< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
[< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
[< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
[< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
[<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
[<ffffffff8107cce2>] __do_softirq+0x142/0x380
[<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
[<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
[<ffffffff810a8238>] kthread+0x148/0x160
[<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0
Allocated by thread T22276:
[<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
[<ffffffff81235e2a>] iget_locked+0x10a/0x230
[<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
[<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
[<ffffffff81211bce>] mount_bdev+0x1de/0x240
[<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
[<ffffffff81212a85>] mount_fs+0x55/0x220
[<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
[< inlined >] do_mount+0x2b4/0x1120 do_new_mount
[<ffffffff812400d4>] do_mount+0x2b4/0x1120
[< inlined >] SyS_mount+0xb2/0x110 SYSC_mount
[<ffffffff812414a2>] SyS_mount+0xb2/0x110
[<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
The buggy address ffff8800587866c8 is located 48 bytes inside
of 680-byte region [ffff880058786698, ffff880058786940)
Memory state around the buggy address:
ffff880058786100: ffffffff ffffffff ffffffff ffffffff
ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
ffff880058786400: ffffffff ffffffff ffffffff ffffffff
ffff880058786500: ffffffff ffffffff ffffffff fffffffr
>ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
^
ffff880058786700: ffffffff ffffffff ffffffff ffffffff
ffff880058786800: ffffffff ffffffff ffffffff ffffffff
ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
ffff880058786a00: ........ ........ ........ ........
ffff880058786b00: ........ ........ ........ ........
Legend:
f - 8 freed bytes
r - 8 redzone bytes
. - 8 allocated bytes
x=1..7 - x allocated bytes + (8-x) redzone bytes
Investigation shows, that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().
It seems that in common usage scenario this use-after-free is benign,
because 'node_inode' remains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests inode from cache,
and formely
'node_inode' of the first filesystem is returned."
Nids for both meta_inode and node_inode are reserved, so it's not necessary
for us to invalidate pages which will never be allocated.
To fix this issue, let's skip needlessly invalidating pages for
{meta,node}_inode in f2fs_evict_inode.
Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-25 12:00:57 +08:00
goto out_clear;
2012-11-02 17:10:40 +09:00

2014-09-12 15:53:45 -07:00
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) indicate that the function is f2fs-specific rather than generic;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
f2fs_remove_dirty_inode(inode);
2025-01-31 22:27:56 +00:00
f2fs_remove_donate_inode(inode);
2012-11-02 17:10:40 +09:00

2024-10-17 10:31:53 -07:00
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
2015-06-19 17:53:26 -07:00

2012-11-02 17:10:40 +09:00
if (inode->i_nlink || is_bad_inode(inode))
goto no_delete;

2021-10-28 21:03:05 +08:00
err = f2fs_dquot_initialize(inode);
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint flush dirty dquot data
and quota file data to guarantee persistence of all quota sysfiles in
the last checkpoint; this way, we can avoid corrupting quota sysfiles
when encountering SPO (sudden power-off).
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are
cached dquot metadata changes in the quota subsystem, and the next checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that a quota
operation failed due to -EIO or -ENOSPC, so later:
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH: in checkpoint, if quota
data updating is very heavy, it may cause a hung task in block_operations().
To avoid this, if the retry count exceeds a threshold, let's just skip
flushing and retry in the next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
if (err) {
err = 0;
set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR);
}
2017-07-09 00:13:07 +08:00

2018-05-30 00:20:41 +08:00
f2fs_remove_ino_entry(sbi, inode->i_ino, APPEND_INO);
f2fs_remove_ino_entry(sbi, inode->i_ino, UPDATE_INO);
f2fs_remove_ino_entry(sbi, inode->i_ino, FLUSH_INO);
2016-11-02 20:43:21 +08:00

2024-03-22 13:16:39 +09:00
if (!is_sbi_flag_set(sbi, SBI_IS_FREEZING)) {
2022-03-04 09:40:05 -08:00
sb_start_intwrite(inode->i_sb);
2024-03-22 13:16:39 +09:00
freeze_protected = true;
}
2016-05-20 10:13:22 -07:00
set_inode_flag(inode, FI_NO_ALLOC);
2012-11-02 17:10:40 +09:00
i_size_write(inode, 0);
2016-05-03 09:22:18 -07:00
retry:
2012-11-02 17:10:40 +09:00
if (F2FS_HAS_BLOCKS(inode))
2016-06-02 13:49:38 -07:00
err = f2fs_truncate(inode);
2012-11-02 17:10:40 +09:00

2022-12-21 02:39:04 +08:00
if (time_to_inject(sbi, FAULT_EVICT_INODE))
2017-03-07 13:32:20 -08:00
err = -EIO;
2018-08-13 23:38:06 +02:00

2015-08-24 17:40:45 +08:00
if (!err) {
f2fs_lock_op(sbi);
2018-05-30 00:20:41 +08:00
err = f2fs_remove_inode_page(inode);
2015-08-24 17:40:45 +08:00
f2fs_unlock_op(sbi);
2022-04-30 21:19:24 +08:00
if (err == -ENOENT) {
2016-10-11 22:56:59 +08:00
err = 0;
2022-04-30 21:19:24 +08:00

/*
* in fuzzed image, another node may have the same
* block address as inode's, if it was truncated
* previously, truncation of inode node will fail.
*/
if (is_inode_flag_set(inode, FI_DIRTY_INODE)) {
f2fs_warn(F2FS_I_SB(inode),
"f2fs_evict_inode: inconsistent node id, ino:%lu",
inode->i_ino);
f2fs_inode_synced(inode);
set_sbi_flag(sbi, SBI_NEED_FSCK);
}
}
2015-08-24 17:40:45 +08:00
}
2012-11-22 16:21:29 +09:00

2016-05-03 09:22:18 -07:00
/* give more chances, if ENOMEM case */
if (err == -ENOMEM) {
err = 0;
goto retry;
}

2024-10-17 10:31:53 -07:00
if (IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);

2018-09-20 20:05:00 +08:00
if (err) {
2018-05-30 00:20:41 +08:00
f2fs_update_inode_page(inode);
2019-07-19 11:51:11 +08:00
if (dquot_initialize_needed(inode))
set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR);
f2fs: fix to avoid panic in f2fs_evict_inode
As syzbot [1] reported as below:
R10: 0000000000000100 R11: 0000000000000206 R12: 00007ffe17473450
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
</TASK>
---[ end trace 0000000000000000 ]---
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff88812d962278 by task syz-executor/564
CPU: 1 PID: 564 Comm: syz-executor Tainted: G W 6.1.129-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
<TASK>
__dump_stack+0x21/0x24 lib/dump_stack.c:88
dump_stack_lvl+0xee/0x158 lib/dump_stack.c:106
print_address_description+0x71/0x210 mm/kasan/report.c:316
print_report+0x4a/0x60 mm/kasan/report.c:427
kasan_report+0x122/0x150 mm/kasan/report.c:531
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:351
__list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
__list_del_entry include/linux/list.h:134 [inline]
list_del_init include/linux/list.h:206 [inline]
f2fs_inode_synced+0xf7/0x2e0 fs/f2fs/super.c:1531
f2fs_update_inode+0x74/0x1c40 fs/f2fs/inode.c:585
f2fs_update_inode_page+0x137/0x170 fs/f2fs/inode.c:703
f2fs_write_inode+0x4ec/0x770 fs/f2fs/inode.c:731
write_inode fs/fs-writeback.c:1460 [inline]
__writeback_single_inode+0x4a0/0xab0 fs/fs-writeback.c:1677
writeback_single_inode+0x221/0x8b0 fs/fs-writeback.c:1733
sync_inode_metadata+0xb6/0x110 fs/fs-writeback.c:2789
f2fs_sync_inode_meta+0x16d/0x2a0 fs/f2fs/checkpoint.c:1159
block_operations fs/f2fs/checkpoint.c:1269 [inline]
f2fs_write_checkpoint+0xca3/0x2100 fs/f2fs/checkpoint.c:1658
kill_f2fs_super+0x231/0x390 fs/f2fs/super.c:4668
deactivate_locked_super+0x98/0x100 fs/super.c:332
deactivate_super+0xaf/0xe0 fs/super.c:363
cleanup_mnt+0x45f/0x4e0 fs/namespace.c:1186
__cleanup_mnt+0x19/0x20 fs/namespace.c:1193
task_work_run+0x1c6/0x230 kernel/task_work.c:203
exit_task_work include/linux/task_work.h:39 [inline]
do_exit+0x9fb/0x2410 kernel/exit.c:871
do_group_exit+0x210/0x2d0 kernel/exit.c:1021
__do_sys_exit_group kernel/exit.c:1032 [inline]
__se_sys_exit_group kernel/exit.c:1030 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1030
x64_sys_call+0x7b4/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:232
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
RIP: 0033:0x7f28b1b8e169
Code: Unable to access opcode bytes at 0x7f28b1b8e13f.
RSP: 002b:00007ffe174710a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f28b1c10879 RCX: 00007f28b1b8e169
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000002 R08: 00007ffe1746ee47 R09: 00007ffe17472360
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffe17472360
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
</TASK>
Allocated by task 569:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_alloc_info+0x25/0x30 mm/kasan/generic.c:505
__kasan_slab_alloc+0x72/0x80 mm/kasan/common.c:328
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook+0x4f/0x2c0 mm/slab.h:737
slab_alloc_node mm/slub.c:3398 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc_lru+0x104/0x220 mm/slub.c:3429
alloc_inode_sb include/linux/fs.h:3245 [inline]
f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x186/0x880 fs/inode.c:1373
f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
f2fs_lookup+0x366/0xab0 fs/f2fs/namei.c:487
__lookup_slow+0x2a3/0x3d0 fs/namei.c:1690
lookup_slow+0x57/0x70 fs/namei.c:1707
walk_component+0x2e6/0x410 fs/namei.c:1998
lookup_last fs/namei.c:2455 [inline]
path_lookupat+0x180/0x490 fs/namei.c:2479
filename_lookup+0x1f0/0x500 fs/namei.c:2508
vfs_statx+0x10b/0x660 fs/stat.c:229
vfs_fstatat fs/stat.c:267 [inline]
vfs_lstat include/linux/fs.h:3424 [inline]
__do_sys_newlstat fs/stat.c:423 [inline]
__se_sys_newlstat+0xd5/0x350 fs/stat.c:417
__x64_sys_newlstat+0x5b/0x70 fs/stat.c:417
x64_sys_call+0x393/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:7
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
Freed by task 13:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_free_info+0x31/0x50 mm/kasan/generic.c:516
____kasan_slab_free+0x132/0x180 mm/kasan/common.c:236
__kasan_slab_free+0x11/0x20 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:177 [inline]
slab_free_hook mm/slub.c:1724 [inline]
slab_free_freelist_hook+0xc2/0x190 mm/slub.c:1750
slab_free mm/slub.c:3661 [inline]
kmem_cache_free+0x12d/0x2a0 mm/slub.c:3683
f2fs_free_inode+0x24/0x30 fs/f2fs/super.c:1562
i_callback+0x4c/0x70 fs/inode.c:250
rcu_do_batch+0x503/0xb80 kernel/rcu/tree.c:2297
rcu_core+0x5a2/0xe70 kernel/rcu/tree.c:2557
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2574
handle_softirqs+0x178/0x500 kernel/softirq.c:578
run_ksoftirqd+0x28/0x30 kernel/softirq.c:945
smpboot_thread_fn+0x45a/0x8c0 kernel/smpboot.c:164
kthread+0x270/0x310 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Last potentially related work creation:
kasan_save_stack+0x3a/0x60 mm/kasan/common.c:45
__kasan_record_aux_stack+0xb6/0xc0 mm/kasan/generic.c:486
kasan_record_aux_stack_noalloc+0xb/0x10 mm/kasan/generic.c:496
call_rcu+0xd4/0xf70 kernel/rcu/tree.c:2845
destroy_inode fs/inode.c:316 [inline]
evict+0x7da/0x870 fs/inode.c:720
iput_final fs/inode.c:1834 [inline]
iput+0x62b/0x830 fs/inode.c:1860
do_unlinkat+0x356/0x540 fs/namei.c:4397
__do_sys_unlink fs/namei.c:4438 [inline]
__se_sys_unlink fs/namei.c:4436 [inline]
__x64_sys_unlink+0x49/0x50 fs/namei.c:4436
x64_sys_call+0x958/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:88
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
The buggy address belongs to the object at ffff88812d961f20
which belongs to the cache f2fs_inode_cache of size 1200
The buggy address is located 856 bytes inside of
1200-byte region [ffff88812d961f20, ffff88812d9623d0)
The buggy address belongs to the physical page:
page:ffffea0004b65800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12d960
head:ffffea0004b65800 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 0000000000000000 dead000000000122 ffff88810a94c500
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Reclaimable, gfp_mask 0x1d2050(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_RECLAIMABLE), pid 569, tgid 568 (syz.2.16), ts 55943246141, free_ts 0
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1d0/0x1f0 mm/page_alloc.c:2532
prep_new_page mm/page_alloc.c:2539 [inline]
get_page_from_freelist+0x2e63/0x2ef0 mm/page_alloc.c:4328
__alloc_pages+0x235/0x4b0 mm/page_alloc.c:5605
alloc_slab_page include/linux/gfp.h:-1 [inline]
allocate_slab mm/slub.c:1939 [inline]
new_slab+0xec/0x4b0 mm/slub.c:1992
___slab_alloc+0x6f6/0xb50 mm/slub.c:3180
__slab_alloc+0x5e/0xa0 mm/slub.c:3279
slab_alloc_node mm/slub.c:3364 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc_lru+0x13f/0x220 mm/slub.c:3429
alloc_inode_sb include/linux/fs.h:3245 [inline]
f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x186/0x880 fs/inode.c:1373
f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
f2fs_fill_super+0x3ad7/0x6bb0 fs/f2fs/super.c:4293
mount_bdev+0x2ae/0x3e0 fs/super.c:1443
f2fs_mount+0x34/0x40 fs/f2fs/super.c:4642
legacy_get_tree+0xea/0x190 fs/fs_context.c:632
vfs_get_tree+0x89/0x260 fs/super.c:1573
do_new_mount+0x25a/0xa20 fs/namespace.c:3056
page_owner free stack trace missing
Memory state around the buggy address:
ffff88812d962100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88812d962180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88812d962200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88812d962280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88812d962300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
[1] https://syzkaller.appspot.com/x/report.txt?x=13448368580000
This bug can be reproduced with the reproducer [2]; once we enable the
CONFIG_F2FS_CHECK_FS config, the reproducer triggers the panic below,
so the direct cause of this bug is the same as the one that patch [3]
fixed.
kernel BUG at fs/f2fs/inode.c:857!
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20
Call Trace:
<TASK>
evict+0x32a/0x7a0
do_unlinkat+0x37b/0x5b0
__x64_sys_unlink+0xad/0x100
do_syscall_64+0x5a/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20
[2] https://syzkaller.appspot.com/x/repro.c?x=17495ccc580000
[3] https://lore.kernel.org/linux-f2fs-devel/20250702120321.1080759-1-chao@kernel.org
Tracepoints before panic:
f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file1
f2fs_unlink_exit: dev = (7,0), ino = 7, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 7, pino = 3, i_mode = 0x81ed, i_size = 10, i_nlink = 0, i_blocks = 0, i_advise = 0x0
f2fs_truncate_node: dev = (7,0), ino = 7, nid = 8, block_address = 0x3c05
f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file3
f2fs_unlink_exit: dev = (7,0), ino = 8, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 9000, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 0, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate_blocks_enter: dev = (7,0), ino = 8, i_size = 0, i_blocks = 24, start file offset = 0
f2fs_truncate_blocks_exit: dev = (7,0), ino = 8, ret = -2
The root cause is: in the fuzzed image, dnode #8 belongs to inode #7;
after inode #7 was evicted, dnode #8 was dropped.
However, there is a dirent that has ino #8, so once we unlink file3, in
f2fs_evict_inode(), both f2fs_truncate() and f2fs_update_inode_page()
fail because node #8 cannot be loaded; as a result, we never call
f2fs_inode_synced() to clear the inode's dirty status.
Let's fix this by calling f2fs_inode_synced() in error path of
f2fs_evict_inode().
PS: As I verified, the reproducer [2] can trigger this bug in v6.1.129,
but not in v6.16-rc4, because there the testcase stops once f2fs detects
other corruption:
F2FS-fs (loop0): inconsistent node block, node_type:2, nid:8, node_footer[nid:8,ino:8,ofs:0,cpver:5013063228981249506,blkaddr:15366]
F2FS-fs (loop0): f2fs_lookup: inode (ino=9) has zero i_nlink
Fixes: 0f18b462b2e5 ("f2fs: flush inode metadata when checkpoint is doing")
Closes: https://syzkaller.appspot.com/x/report.txt?x=13448368580000
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
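The failure reduces to a dirty-list invariant: an inode must not be freed while it is still linked on a superblock-wide dirty list, because the checkpoint later walks that list (f2fs_sync_inode_meta() in the KASAN trace above). A minimal model, assuming a hand-rolled list_head and hypothetical `model_inode`/`model_sb` types, with `writeback_ok` standing in for whether f2fs_update_inode_page() managed to write the inode back:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical, heavily reduced inode and superblock. */
struct model_inode { bool dirty; struct list_head dirty_list; };
struct model_sb { struct list_head dirty_inodes; bool need_fsck; };

void mark_dirty(struct model_sb *sb, struct model_inode *inode)
{
	inode->dirty = true;
	if (list_empty(&inode->dirty_list))
		list_add(&inode->dirty_list, &sb->dirty_inodes);
}

/* Stand-in for f2fs_inode_synced(): clear dirty state, unlink from list. */
void inode_synced(struct model_inode *inode)
{
	inode->dirty = false;
	if (!list_empty(&inode->dirty_list))
		list_del_init(&inode->dirty_list);
}

/* Tail of eviction; afterwards the inode's memory would be freed. */
void evict_tail(struct model_sb *sb, struct model_inode *inode, bool writeback_ok)
{
	if (writeback_ok) {
		inode_synced(inode);	/* a successful update cleans the inode */
		return;
	}
	/* Error path added by the fix: never free a still-listed inode. */
	if (inode->dirty) {
		inode_synced(inode);
		sb->need_fsck = true;
	}
}
```

Without the error-path branch, the second inode in a failing eviction would stay on `dirty_inodes` after being freed, which is the use-after-free pattern KASAN reported in __list_del_entry_valid().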
2025-07-08 17:56:57 +08:00

		/*
		 * If both f2fs_truncate() and f2fs_update_inode_page() failed
		 * due to fuzzed corrupted inode, call f2fs_inode_synced() to
		 * avoid triggering later f2fs_bug_on().
		 */
		if (is_inode_flag_set(inode, FI_DIRTY_INODE)) {
			f2fs_warn(sbi,
				"f2fs_evict_inode: inode is dirty, ino:%lu",
				inode->i_ino);
			f2fs_inode_synced(inode);
			set_sbi_flag(sbi, SBI_NEED_FSCK);
		}
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint flush dirty dquot data and
quota file data to guarantee persistence of all quota sysfiles in the
last checkpoint. This way, we avoid corrupting quota sysfiles on a
sudden power-off (SPO).
The implementation is as below:
1. Add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are
cached dquot metadata changes in the quota subsystem, so the next
checkpoint should:
a) flush dquot metadata into the quota file;
b) flush the quota file to storage to keep file usage consistent.
2. Add a global state SBI_QUOTA_NEED_REPAIR to indicate that a quota
operation failed with -EIO or -ENOSPC, so that later:
a) checkpoint skips syncing dquot metadata;
b) CP_QUOTA_NEED_FSCK_FLAG is set in the last cp pack as a hint for
fsck to repair.
3. Add a global state SBI_QUOTA_SKIP_FLUSH. If quota data updating is
very heavy during checkpoint, it may cause a hung task in block_operation().
To avoid this, once the retry count exceeds a threshold, just skip
flushing and retry in the next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
	}
2024-03-22 13:16:39 +09:00
	if (freeze_protected)
2022-03-04 09:40:05 -08:00
		sb_end_intwrite(inode->i_sb);
2012-11-02 17:10:40 +09:00
no_delete:
2017-07-09 00:13:07 +08:00
	dquot_drop(inode);

2015-07-15 17:28:53 +08:00
	stat_dec_inline_xattr(inode);
2014-10-13 20:00:16 -07:00
	stat_dec_inline_dir(inode);
2014-10-14 10:29:50 -07:00
	stat_dec_inline_inode(inode);
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file
can be logically divided into multiple clusters. One cluster includes
4 << n (n >= 0) logical pages, the compression size equals the cluster
size, and each cluster can be either compressed or not.
- In the cluster metadata layout, a special flag indicates whether a
cluster is compressed or normal. For a compressed cluster, the following
metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in which
f2fs stores the compress header and the compressed data.
- To eliminate write amplification during overwrite, F2FS only supports
compression on write-once files: data can be compressed only when all
logical blocks in the file are valid and the cluster compression ratio is
lower than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
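The cluster arithmetic above is easy to check by hand. This is a minimal sketch under two assumptions:

- Blocks have a fixed size.
- "Saves at least one block" is the compress-or-not test. The real check also compares against the configured ratio threshold.

The helper names are hypothetical.

```c
#include <assert.h>

/* A cluster holds 4 << n pages, i.e. 1 << (n + 2). */
static unsigned int cluster_pages(unsigned int n)
{
	return 4u << n;
}

/* Which cluster a logical page index falls into. */
static unsigned long cluster_index(unsigned long page_index, unsigned int n)
{
	return page_index >> (n + 2);
}

/*
 * A compressed cluster may use 1 .. (4 << n) - 1 physical blocks. Given
 * the compressed payload size (header + data) in bytes, return the number
 * of blocks it needs, or 0 if compressing would not save a block and the
 * cluster should be stored as a normal cluster instead.
 */
static unsigned int compressed_blocks(unsigned int n, unsigned long payload,
				      unsigned int blksize)
{
	unsigned long blocks;

	assert(blksize > 0);
	blocks = (payload + blksize - 1) / blksize;	/* round up */
	if (blocks == 0 || blocks > cluster_pages(n) - 1)
		return 0;
	return (unsigned int)blocks;
}
```

With n = 0 and 4096-byte blocks, a 12288-byte payload fits in 3 blocks and is worth compressing. One extra byte would need all 4 blocks, so that cluster would stay uncompressed.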
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                      .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Currently f2fs processes post read work in multiple workqueues, which
shows low performance due to the scheduling overhead of running those
workqueues one after another.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
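The overwrite table uses a compact slot notation. This hypothetical helper decodes such a pattern. It assumes, as in the layout diagram, that the compress flag can only occupy the first slot of a cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Legend from the changelog: C = compress flag, V = valid, N = NEW_ADDR. */
struct cluster_summary {
	bool compressed;	/* slot 0 carries the compress flag */
	unsigned int valid;	/* slots with a valid block address */
	unsigned int reserved;	/* slots holding NEW_ADDR */
};

/* Returns false for a pattern that is not well formed. */
static bool summarize_cluster(const char *slots, struct cluster_summary *out)
{
	size_t i, len;

	assert(slots != NULL && out != NULL);
	len = strlen(slots);
	memset(out, 0, sizeof(*out));
	for (i = 0; i < len; i++) {
		switch (slots[i]) {
		case 'C':
			if (i != 0)
				return false;	/* flag only valid in slot 0 */
			out->compressed = true;
			break;
		case 'V':
			out->valid++;
			break;
		case 'N':
			out->reserved++;
			break;
		default:
			return false;
		}
	}
	return true;
}
```

For example, "CVNN" decodes to a compressed cluster whose data fits in one valid block, with two slots still reserved as NEW_ADDR.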
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
	stat_dec_compr_inode(inode);
2020-09-08 11:44:10 +09:00
	stat_sub_compr_blocks(inode,
2023-02-09 10:18:19 -08:00
			atomic_read(&fi->i_compr_blocks));
2015-03-19 19:27:51 +08:00

2019-08-15 19:45:35 +08:00
	if (likely(!f2fs_cp_error(sbi) &&
2018-08-20 19:21:43 -07:00
				!is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
2017-09-12 14:04:05 +08:00
		f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
f2fs: fix to avoid UAF in f2fs_sync_inode_meta()
syzbot reported a UAF issue as below: [1] [2]
[1] https://syzkaller.appspot.com/text?tag=CrashReport&x=16594c60580000
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff888100567dc8 by task kworker/u4:0/8
CPU: 1 PID: 8 Comm: kworker/u4:0 Tainted: G W 6.1.129-syzkaller-00017-g642656a36791 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Workqueue: writeback wb_workfn (flush-7:0)
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x151/0x1b7 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:316 [inline]
print_report+0x158/0x4e0 mm/kasan/report.c:427
kasan_report+0x13c/0x170 mm/kasan/report.c:531
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:351
__list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
__list_del_entry include/linux/list.h:134 [inline]
list_del_init include/linux/list.h:206 [inline]
f2fs_inode_synced+0x100/0x2e0 fs/f2fs/super.c:1553
f2fs_update_inode+0x72/0x1c40 fs/f2fs/inode.c:588
f2fs_update_inode_page+0x135/0x170 fs/f2fs/inode.c:706
f2fs_write_inode+0x416/0x790 fs/f2fs/inode.c:734
write_inode fs/fs-writeback.c:1460 [inline]
__writeback_single_inode+0x4cf/0xb80 fs/fs-writeback.c:1677
writeback_sb_inodes+0xb32/0x1910 fs/fs-writeback.c:1903
__writeback_inodes_wb+0x118/0x3f0 fs/fs-writeback.c:1974
wb_writeback+0x3da/0xa00 fs/fs-writeback.c:2081
wb_check_background_flush fs/fs-writeback.c:2151 [inline]
wb_do_writeback fs/fs-writeback.c:2239 [inline]
wb_workfn+0xbba/0x1030 fs/fs-writeback.c:2266
process_one_work+0x73d/0xcb0 kernel/workqueue.c:2299
worker_thread+0xa60/0x1260 kernel/workqueue.c:2446
kthread+0x26d/0x300 kernel/kthread.c:386
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Allocated by task 298:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_alloc_info+0x1f/0x30 mm/kasan/generic.c:505
__kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:333
kasan_slab_alloc include/linux/kasan.h:202 [inline]
slab_post_alloc_hook+0x53/0x2c0 mm/slab.h:768
slab_alloc_node mm/slub.c:3421 [inline]
slab_alloc mm/slub.c:3431 [inline]
__kmem_cache_alloc_lru mm/slub.c:3438 [inline]
kmem_cache_alloc_lru+0x102/0x270 mm/slub.c:3454
alloc_inode_sb include/linux/fs.h:3255 [inline]
f2fs_alloc_inode+0x2d/0x350 fs/f2fs/super.c:1437
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x18c/0x7e0 fs/inode.c:1373
f2fs_iget+0x55/0x4ca0 fs/f2fs/inode.c:486
f2fs_lookup+0x3c1/0xb50 fs/f2fs/namei.c:484
__lookup_slow+0x2b9/0x3e0 fs/namei.c:1689
lookup_slow+0x5a/0x80 fs/namei.c:1706
walk_component+0x2e7/0x410 fs/namei.c:1997
lookup_last fs/namei.c:2454 [inline]
path_lookupat+0x16d/0x450 fs/namei.c:2478
filename_lookup+0x251/0x600 fs/namei.c:2507
vfs_statx+0x107/0x4b0 fs/stat.c:229
vfs_fstatat fs/stat.c:267 [inline]
vfs_lstat include/linux/fs.h:3434 [inline]
__do_sys_newlstat fs/stat.c:423 [inline]
__se_sys_newlstat+0xda/0x7c0 fs/stat.c:417
__x64_sys_newlstat+0x5b/0x70 fs/stat.c:417
x64_sys_call+0x52/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:7
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x3b/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
Freed by task 0:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:516
____kasan_slab_free+0x131/0x180 mm/kasan/common.c:241
__kasan_slab_free+0x11/0x20 mm/kasan/common.c:249
kasan_slab_free include/linux/kasan.h:178 [inline]
slab_free_hook mm/slub.c:1745 [inline]
slab_free_freelist_hook mm/slub.c:1771 [inline]
slab_free mm/slub.c:3686 [inline]
kmem_cache_free+0x291/0x560 mm/slub.c:3711
f2fs_free_inode+0x24/0x30 fs/f2fs/super.c:1584
i_callback+0x4b/0x70 fs/inode.c:250
rcu_do_batch+0x552/0xbe0 kernel/rcu/tree.c:2297
rcu_core+0x502/0xf40 kernel/rcu/tree.c:2557
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2574
handle_softirqs+0x1db/0x650 kernel/softirq.c:624
__do_softirq kernel/softirq.c:662 [inline]
invoke_softirq kernel/softirq.c:479 [inline]
__irq_exit_rcu+0x52/0xf0 kernel/softirq.c:711
irq_exit_rcu+0x9/0x10 kernel/softirq.c:723
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1118 [inline]
sysvec_apic_timer_interrupt+0xa9/0xc0 arch/x86/kernel/apic/apic.c:1118
asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:691
Last potentially related work creation:
kasan_save_stack+0x3b/0x60 mm/kasan/common.c:45
__kasan_record_aux_stack+0xb4/0xc0 mm/kasan/generic.c:486
kasan_record_aux_stack_noalloc+0xb/0x10 mm/kasan/generic.c:496
__call_rcu_common kernel/rcu/tree.c:2807 [inline]
call_rcu+0xdc/0x10f0 kernel/rcu/tree.c:2926
destroy_inode fs/inode.c:316 [inline]
evict+0x87d/0x930 fs/inode.c:720
iput_final fs/inode.c:1834 [inline]
iput+0x616/0x690 fs/inode.c:1860
do_unlinkat+0x4e1/0x920 fs/namei.c:4396
__do_sys_unlink fs/namei.c:4437 [inline]
__se_sys_unlink fs/namei.c:4435 [inline]
__x64_sys_unlink+0x49/0x50 fs/namei.c:4435
x64_sys_call+0x289/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:88
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x3b/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
The buggy address belongs to the object at ffff888100567a10
which belongs to the cache f2fs_inode_cache of size 1360
The buggy address is located 952 bytes inside of
1360-byte region [ffff888100567a10, ffff888100567f60)
The buggy address belongs to the physical page:
page:ffffea0004015800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100560
head:ffffea0004015800 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 0000000000000000 dead000000000122 ffff8881002c4d80
raw: 0000000000000000 0000000080160016 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Reclaimable, gfp_mask 0xd2050(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE), pid 298, tgid 298 (syz-executor330), ts 26489303743, free_ts 0
set_page_owner include/linux/page_owner.h:33 [inline]
post_alloc_hook+0x213/0x220 mm/page_alloc.c:2637
prep_new_page+0x1b/0x110 mm/page_alloc.c:2644
get_page_from_freelist+0x3a98/0x3b10 mm/page_alloc.c:4539
__alloc_pages+0x234/0x610 mm/page_alloc.c:5837
alloc_slab_page+0x6c/0xf0 include/linux/gfp.h:-1
allocate_slab mm/slub.c:1962 [inline]
new_slab+0x90/0x3e0 mm/slub.c:2015
___slab_alloc+0x6f9/0xb80 mm/slub.c:3203
__slab_alloc+0x5d/0xa0 mm/slub.c:3302
slab_alloc_node mm/slub.c:3387 [inline]
slab_alloc mm/slub.c:3431 [inline]
__kmem_cache_alloc_lru mm/slub.c:3438 [inline]
kmem_cache_alloc_lru+0x149/0x270 mm/slub.c:3454
alloc_inode_sb include/linux/fs.h:3255 [inline]
f2fs_alloc_inode+0x2d/0x350 fs/f2fs/super.c:1437
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x18c/0x7e0 fs/inode.c:1373
f2fs_iget+0x55/0x4ca0 fs/f2fs/inode.c:486
f2fs_fill_super+0x5360/0x6dc0 fs/f2fs/super.c:4488
mount_bdev+0x282/0x3b0 fs/super.c:1445
f2fs_mount+0x34/0x40 fs/f2fs/super.c:4743
legacy_get_tree+0xf1/0x190 fs/fs_context.c:632
page_owner free stack trace missing
Memory state around the buggy address:
ffff888100567c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888100567d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888100567d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888100567e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888100567e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
[2] https://syzkaller.appspot.com/text?tag=CrashLog&x=13654c60580000
[ 24.675720][ T28] audit: type=1400 audit(1745327318.732:72): avc: denied { write } for pid=298 comm="syz-executor399" name="/" dev="loop0" ino=3 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.705426][ T296] ------------[ cut here ]------------
[ 24.706608][ T28] audit: type=1400 audit(1745327318.732:73): avc: denied { remove_name } for pid=298 comm="syz-executor399" name="file0" dev="loop0" ino=4 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.711550][ T296] WARNING: CPU: 0 PID: 296 at fs/f2fs/inode.c:847 f2fs_evict_inode+0x1262/0x1540
[ 24.734141][ T28] audit: type=1400 audit(1745327318.732:74): avc: denied { rename } for pid=298 comm="syz-executor399" name="file0" dev="loop0" ino=4 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.742969][ T296] Modules linked in:
[ 24.765201][ T28] audit: type=1400 audit(1745327318.732:75): avc: denied { add_name } for pid=298 comm="syz-executor399" name="bus" scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.768847][ T296] CPU: 0 PID: 296 Comm: syz-executor399 Not tainted 6.1.129-syzkaller-00017-g642656a36791 #0
[ 24.799506][ T296] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
[ 24.809401][ T296] RIP: 0010:f2fs_evict_inode+0x1262/0x1540
[ 24.815018][ T296] Code: 34 70 4a ff eb 0d e8 2d 70 4a ff 4d 89 e5 4c 8b 64 24 18 48 8b 5c 24 28 4c 89 e7 e8 78 38 03 00 e9 84 fc ff ff e8 0e 70 4a ff <0f> 0b 4c 89 f7 be 08 00 00 00 e8 7f 21 92 ff f0 41 80 0e 04 e9 61
[ 24.834584][ T296] RSP: 0018:ffffc90000db7a40 EFLAGS: 00010293
[ 24.840465][ T296] RAX: ffffffff822aca42 RBX: 0000000000000002 RCX: ffff888110948000
[ 24.848291][ T296] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[ 24.856064][ T296] RBP: ffffc90000db7bb0 R08: ffffffff822ac6a8 R09: ffffed10200b005d
[ 24.864073][ T296] R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888100580000
[ 24.871812][ T296] R13: dffffc0000000000 R14: ffff88810fef4078 R15: 1ffff920001b6f5c
The root cause: with a fuzzed image, f2fs may fail to clear the
FI_DIRTY_INODE flag of the target inode. After f2fs_evict_inode(), the
inode is then still linked in the global sbi->inode_list[DIRTY_META] list,
and once a checkpoint is triggered, f2fs_sync_inode_meta() may access the
released inode.
In f2fs_evict_inode(), let's always call f2fs_inode_synced() to clear the
FI_DIRTY_INODE flag and drop the inode from the global dirty list to avoid
this UAF issue.
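The UAF follows a familiar intrusive-list pattern: an object freed while still linked leaves a dangling node that a later list walker dereferences. The sketch below reimplements a tiny list in the style of <linux/list.h> in user space. `struct model_dirty_inode`, `model_mark_dirty()`, `model_inode_synced()` and `model_evict()` are hypothetical stand-ins for the inode, FI_DIRTY_INODE handling, f2fs_inode_synced() and eviction. No locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Circular doubly linked list, reimplemented here for user space. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical inode: 'dirty_list' plays the DIRTY_META linkage. */
struct model_dirty_inode {
	bool dirty;
	struct list_head dirty_list;
};

static void model_mark_dirty(struct model_dirty_inode *inode,
			     struct list_head *dirty_meta)
{
	if (!inode->dirty) {
		inode->dirty = true;
		list_add_tail(&inode->dirty_list, dirty_meta);
	}
}

/* Like f2fs_inode_synced(): clear the flag and unlink from the list. */
static void model_inode_synced(struct model_dirty_inode *inode)
{
	if (inode->dirty) {
		inode->dirty = false;
		list_del_init(&inode->dirty_list);
	}
}

/* Eviction unlinks unconditionally before freeing, as the fix does. */
static void model_evict(struct model_dirty_inode *inode)
{
	assert(inode != NULL);
	model_inode_synced(inode);
	free(inode);
}
```

Without the unconditional unlink in `model_evict()`, the global list would keep a pointer into freed memory. A later walk of the list, the checkpoint's f2fs_sync_inode_meta() in the report, would then read it.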
Fixes: 0f18b462b2e5 ("f2fs: flush inode metadata when checkpoint is doing")
Closes: https://syzkaller.appspot.com/bug?extid=849174b2efaf0d8be6ba
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-08 17:53:39 +08:00

	/*
	 * anyway, it needs to remove the inode from sbi->inode_list[DIRTY_META]
	 * list to avoid UAF in f2fs_sync_inode_meta() during checkpoint.
	 */
	f2fs_inode_synced(inode);
2017-09-12 14:04:05 +08:00

2020-02-27 19:30:05 +08:00
	/* for the case f2fs_new_inode() was failed, .i_ino is zero, skip it */
2017-03-04 13:56:10 -08:00
	if (inode->i_ino)
		invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino,
							inode->i_ino);
2014-08-04 09:54:58 +08:00
	if (xnid)
		invalidate_mapping_pages(NODE_MAPPING(sbi), xnid, xnid);
2016-11-02 20:43:21 +08:00
	if (inode->i_nlink) {
		if (is_inode_flag_set(inode, FI_APPEND_WRITE))
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
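The a)/b) rationale comes down to C linkage. A minimal sketch with made-up function names:

- A `static` helper has internal linkage, so it cannot clash at link time.
- A non-static symbol is visible image-wide, so it carries the `f2fs_` prefix.

```c
#include <assert.h>

/*
 * Internal linkage: visible only in this translation unit, so a generic
 * name cannot collide with another subsystem's symbol at link time.
 */
static int next_value(int x)
{
	return x + 1;
}

/*
 * External linkage: visible to every other object in the image, so it
 * carries the subsystem prefix. The name is made up for this sketch.
 */
int f2fs_demo_next_ino(int ino)
{
	assert(ino >= 0);
	return next_value(ino);
}
```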
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
			f2fs_add_ino_entry(sbi, inode->i_ino, APPEND_INO);
2016-11-02 20:43:21 +08:00
		if (is_inode_flag_set(inode, FI_UPDATE_WRITE))
f2fs: clean up symbol namespace
2018-05-30 00:20:41 +08:00
			f2fs_add_ino_entry(sbi, inode->i_ino, UPDATE_INO);
2016-11-02 20:43:21 +08:00
	}
2016-05-20 10:13:22 -07:00
	if (is_inode_flag_set(inode, FI_FREE_NID)) {
f2fs: clean up symbol namespace
2018-05-30 00:20:41 +08:00
		f2fs_alloc_nid_failed(sbi, inode->i_ino);
2016-05-20 10:13:22 -07:00
		clear_inode_flag(inode, FI_FREE_NID);
2017-06-01 15:39:27 -07:00
	} else {
2018-04-23 23:02:31 -06:00
		/*
		 * If xattr nid is corrupted, we can reach out error condition,
f2fs: clean up symbol namespace
2018-05-30 00:20:41 +08:00
		 * err & !f2fs_exist_written_data(sbi, inode->i_ino, ORPHAN_INO)).
		 * In that case, f2fs_check_nid_range() is enough to give a clue.
2018-04-23 23:02:31 -06:00
		 */
2015-06-23 10:36:08 -07:00
	}
f2fs: avoid use invalid mapping of node_inode when evict meta inode
Andrey Tsyvarev reported:
"Using memory error detector reveals the following use-after-free error
in 3.15.0:
AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
[<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
[<ffffffff812359af>] evict+0x15f/0x290
[< inlined >] iput+0x196/0x280 iput_final
[<ffffffff812369a6>] iput+0x196/0x280
[<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
[<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
[<ffffffff812105fd>] kill_block_super+0x4d/0xb0
[<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
[<ffffffff81211c98>] deactivate_super+0x68/0x80
[<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
[< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount
[<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
[<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
Freed by thread T3:
[<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
[< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
[< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
[< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
[< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
[<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
[<ffffffff8107cce2>] __do_softirq+0x142/0x380
[<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
[<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
[<ffffffff810a8238>] kthread+0x148/0x160
[<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0
Allocated by thread T22276:
[<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
[<ffffffff81235e2a>] iget_locked+0x10a/0x230
[<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
[<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
[<ffffffff81211bce>] mount_bdev+0x1de/0x240
[<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
[<ffffffff81212a85>] mount_fs+0x55/0x220
[<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
[< inlined >] do_mount+0x2b4/0x1120 do_new_mount
[<ffffffff812400d4>] do_mount+0x2b4/0x1120
[< inlined >] SyS_mount+0xb2/0x110 SYSC_mount
[<ffffffff812414a2>] SyS_mount+0xb2/0x110
[<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
The buggy address ffff8800587866c8 is located 48 bytes inside
of 680-byte region [ffff880058786698, ffff880058786940)
Memory state around the buggy address:
ffff880058786100: ffffffff ffffffff ffffffff ffffffff
ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
ffff880058786400: ffffffff ffffffff ffffffff ffffffff
ffff880058786500: ffffffff ffffffff ffffffff fffffffr
>ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
^
ffff880058786700: ffffffff ffffffff ffffffff ffffffff
ffff880058786800: ffffffff ffffffff ffffffff ffffffff
ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
ffff880058786a00: ........ ........ ........ ........
ffff880058786b00: ........ ........ ........ ........
Legend:
f - 8 freed bytes
r - 8 redzone bytes
. - 8 allocated bytes
x=1..7 - x allocated bytes + (8-x) redzone bytes
Investigation shows, that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().
It seems that in common usage scenario this use-after-free is benign,
because 'node_inode' remains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests inode from cache,
and formely
'node_inode' of the first filesystem is returned."
The nids of both meta_inode and node_inode are reserved, so there is no need
to invalidate pages that will never be allocated.
To fix this issue, let's skip the needless page invalidation for
{meta,node}_inode in f2fs_evict_inode.
Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-25 12:00:57 +08:00
out_clear:
2018-01-11 23:30:13 -05:00
	fscrypt_put_encryption_info(inode);
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
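The placement rule can be made concrete. This minimal sketch assumes "the first 64K boundary beyond i_size" means rounding i_size up to a multiple of 65536, so an already aligned i_size maps to itself. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VERITY_ALIGN 65536ULL	/* 64K boundary */

/* Round i_size up to the next multiple of 64K. */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
	return (i_size + VERITY_ALIGN - 1) / VERITY_ALIGN * VERITY_ALIGN;
}

/* A page is fully beyond i_size if its first byte is at or past it. */
static int page_fully_beyond_isize(uint64_t page_index, uint64_t page_size,
				   uint64_t i_size)
{
	assert(page_size > 0);
	return page_index * page_size >= i_size;
}
```

A 70000-byte file, for instance, would have its Merkle tree start at offset 131072. Every page from there on lies beyond i_size and is therefore invisible to userspace reads.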
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-22 09:26:24 -07:00
	fsverity_cleanup_inode(inode);
f2fs: avoid use invalid mapping of node_inode when evict meta inode
ffff880058786400: ffffffff ffffffff ffffffff ffffffff
ffff880058786500: ffffffff ffffffff ffffffff fffffffr
>ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
^
ffff880058786700: ffffffff ffffffff ffffffff ffffffff
ffff880058786800: ffffffff ffffffff ffffffff ffffffff
ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
ffff880058786a00: ........ ........ ........ ........
ffff880058786b00: ........ ........ ........ ........
Legend:
f - 8 freed bytes
r - 8 redzone bytes
. - 8 allocated bytes
x=1..7 - x allocated bytes + (8-x) redzone bytes
Investigation shows that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().
It seems that in the common usage scenario this use-after-free is benign,
because 'node_inode' retains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests an inode from the
cache, and the former 'node_inode' of the first filesystem is returned."
Nids for both meta_inode and node_inode are reserved, so it's not necessary
for us to invalidate pages which will never be allocated.
To fix this issue, let's skip needlessly invalidating pages for
{meta,node}_inode in f2fs_evict_inode.
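The fix amounts to a guard on the inode number before node_inode's pages are invalidated. A minimal sketch, assuming the node and meta inode numbers are passed in as plain integers; should_invalidate_node_pages() is a hypothetical helper, not the function the patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: node_inode and meta_inode use reserved nids whose
 * node pages are never allocated, so evicting either of them can skip
 * invalidating node_inode's mapping. Skipping also avoids touching
 * node_inode after f2fs_put_super() has already released it.
 */
static bool should_invalidate_node_pages(unsigned long ino,
					 unsigned long node_ino,
					 unsigned long meta_ino)
{
	/* The reserved nids are expected to be distinct. */
	assert(node_ino != meta_ino);

	/* Only ordinary inodes need their node pages invalidated. */
	return ino != node_ino && ino != meta_ino;
}
```

With example values node_ino = 1 and meta_ino = 2 (assumed here for illustration only), an ordinary inode such as 3 would still be invalidated, while 1 and 2 would be skipped.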
Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-25 12:00:57 +08:00
clear_inode(inode);
2012-11-02 17:10:40 +09:00
}
2014-09-25 11:55:53 -07:00

/* caller should call f2fs_lock_op() */
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking at f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using generically named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds an "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
void f2fs_handle_failed_inode(struct inode *inode)
2014-09-25 11:55:53 -07:00
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
2016-05-02 12:34:48 -07:00
struct node_info ni;
2018-07-17 00:02:17 +08:00
int err;
2014-09-25 11:55:53 -07:00

2016-10-11 22:56:59 +08:00
/*
* clear nlink of inode in order to release the inode's resources
* immediately.
*/
clear_nlink(inode);

/*
* we must call this to avoid the inode remaining dirty, which would
* result in a panic when flushing dirty inodes in gdirty_list.
*/
2018-05-30 00:20:41 +08:00
f2fs_update_inode_page(inode);
2017-04-11 19:01:26 -07:00
f2fs_inode_synced(inode);
2016-10-11 22:56:59 +08:00

2016-05-02 12:34:48 -07:00
/* don't make bad inode, since it becomes a regular file. */
2014-09-25 11:55:53 -07:00
unlock_new_inode(inode);

2015-08-24 17:40:45 +08:00
/*
* Note: we should add the inode to the orphan list before f2fs_unlock_op()
* so we can prevent losing this orphan when encountering a checkpoint
* followed by a sudden power-off.
*/
2021-12-13 14:16:32 -08:00
err = f2fs_get_node_info(sbi, inode->i_ino, &ni, false);
2018-07-17 00:02:17 +08:00
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
2022-02-11 18:56:46 -08:00
set_inode_flag(inode, FI_FREE_NID);
2019-06-18 17:48:42 +08:00
f2fs_warn(sbi, "May loss orphan inode, run fsck to fix.");
2018-07-17 00:02:17 +08:00
goto out;
}
2016-05-02 12:34:48 -07:00

if (ni.blk_addr != NULL_ADDR) {
2018-07-17 00:02:17 +08:00
err = f2fs_acquire_orphan_inode(sbi);
2016-05-02 12:34:48 -07:00
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
2019-06-18 17:48:42 +08:00
f2fs_warn(sbi, "Too many orphan inodes, run fsck to fix.");
2016-05-02 12:34:48 -07:00
} else {
2018-05-30 00:20:41 +08:00
f2fs_add_orphan_inode(inode);
2016-05-02 12:34:48 -07:00
}
2018-05-30 00:20:41 +08:00
f2fs_alloc_nid_done(sbi, inode->i_ino);
2016-05-02 12:34:48 -07:00
} else {
2016-05-20 10:13:22 -07:00
set_inode_flag(inode, FI_FREE_NID);
2015-08-24 17:40:45 +08:00
}
2014-09-25 11:55:53 -07:00

2018-07-17 00:02:17 +08:00
out:
2014-09-25 11:55:53 -07:00
f2fs_unlock_op(sbi);

/* iput will drop the inode object */
iput(inode);
}
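The branch structure of f2fs_handle_failed_inode() can be summarized as a pure decision function, which makes its three outcomes easier to see: the node info lookup failed, a node block exists on disk, or no node block was allocated. This is only a model under stated assumptions. struct failed_inode_actions, plan_failed_inode() and its boolean inputs are hypothetical stand-ins for the real calls, and the model ignores locking, the warnings, and the earlier clear_nlink()/f2fs_update_inode_page() steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the error path decides to do. */
struct failed_inode_actions {
	bool need_fsck;      /* models set_sbi_flag(sbi, SBI_NEED_FSCK) */
	bool add_orphan;     /* models f2fs_add_orphan_inode(inode) */
	bool alloc_nid_done; /* models f2fs_alloc_nid_done(sbi, ino) */
	bool free_nid;       /* models set_inode_flag(inode, FI_FREE_NID) */
};

/*
 * Hypothetical inputs:
 *   node_info_ok - f2fs_get_node_info() succeeded
 *   has_node_blk - ni.blk_addr != NULL_ADDR
 *   orphan_ok    - f2fs_acquire_orphan_inode() succeeded
 */
static struct failed_inode_actions
plan_failed_inode(bool node_info_ok, bool has_node_blk, bool orphan_ok)
{
	struct failed_inode_actions a = { false, false, false, false };

	if (!node_info_ok) {
		/* Node info unknown: request fsck and mark the nid for freeing. */
		a.need_fsck = true;
		a.free_nid = true;
		return a;
	}

	if (has_node_blk) {
		/* A node block exists, so try to record the inode as an orphan. */
		if (orphan_ok)
			a.add_orphan = true;
		else
			a.need_fsck = true; /* "Too many orphan inodes" case */
		/* Either way, the nid allocation is completed. */
		a.alloc_nid_done = true;
	} else {
		/* No node block was allocated: mark the nid for freeing. */
		a.free_nid = true;
	}
	return a;
}
```

For instance, plan_failed_inode(true, true, false) yields need_fsck and alloc_nid_done but no orphan entry, matching the branch that warns about too many orphan inodes.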