Detailed explanation of EXT series file system formats in Linux

Linux File System

As the figure above shows, a hard disk is divided into multiple tracks, and each track is divided into multiple sectors. A sector is traditionally 512 bytes and is the smallest storage unit of the disk. The operating system, however, combines several sectors into a block, which is the smallest unit it uses for data storage. Usually 8 sectors form one 4K-byte block.
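The sector and block arithmetic can be sketched in a few lines of C. This is only an illustration under the sizes quoted in the text (512-byte sectors, 4K blocks); the names SECTOR_BYTES, FS_BLOCK_BYTES and the helper functions are made up for this example and do not come from the kernel.

```c
#include <stdint.h>

/* Sizes assumed from the text: 512-byte sectors and 4K-byte blocks. */
enum { SECTOR_BYTES = 512, FS_BLOCK_BYTES = 4096 };

/* Number of sectors that make up one file system block. */
static inline uint32_t sectors_per_block(void)
{
    return FS_BLOCK_BYTES / SECTOR_BYTES;
}

/* Index of the block that contains the given sector. */
static inline uint64_t sector_to_block(uint64_t sector)
{
    return sector / sectors_per_block();
}

/* Index of the first sector that belongs to the given block. */
static inline uint64_t block_first_sector(uint64_t block)
{
    return block * sectors_per_block();
}
```

With these sizes, sector 17 falls into block 2, and block 2 starts at sector 16.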
For Linux file systems, there are a few things to consider:

  • The file system needs to have a strict organizational form so that files can be stored in blocks.
  • The file system needs an index area to facilitate finding where the multiple blocks of a file are located.
  • Files that have recently been read or written frequently should be served from a cache layer.
  • Files should be organized into directories for easy management and lookup.
  • The Linux kernel maintains a set of data structures in its own memory to keep track of which files are opened and used by which processes.

Everything in Linux is a file. The file types are as follows (each is shown by the first character of a line of ls -l output):

  • - indicates a regular file
  • d indicates a directory
  • c indicates a character device file
  • b indicates a block device file
  • s indicates a socket file
  • l indicates a soft (symbolic) link
  • p indicates a named pipe (FIFO)
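The mapping from file type to the ls -l character can be reproduced with the POSIX S_IS* macros. The sketch below is an illustration, not the code ls itself uses; file_type_char and path_type_char are hypothetical helper names.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Map an st_mode value to the type character that ls -l prints first. */
static char file_type_char(mode_t mode)
{
    if (S_ISREG(mode))  return '-';   /* regular file */
    if (S_ISDIR(mode))  return 'd';   /* directory */
    if (S_ISCHR(mode))  return 'c';   /* character device */
    if (S_ISBLK(mode))  return 'b';   /* block device */
    if (S_ISSOCK(mode)) return 's';   /* socket */
    if (S_ISLNK(mode))  return 'l';   /* symbolic link */
    if (S_ISFIFO(mode)) return 'p';   /* named pipe */
    return '?';                       /* unknown type */
}

/* Type character for a path; '?' if the path cannot be examined.
 * lstat() is used so that a symbolic link is reported as 'l'
 * instead of being followed to its target. */
static char path_type_char(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return '?';
    return file_type_char(st.st_mode);
}
```

Calling path_type_char("/") should report a directory on any Linux system.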

Inode and block storage

Let's take the EXT family as an example of how a file is laid out on the hard disk. A file is split into blocks, and these blocks may be scattered across the disk. An index structure is needed to find the blocks and to record the file's metadata. This structure is the inode, where the i stands for index. In ext4, the on-disk inode is defined as follows:

struct ext4_inode {
 __le16 i_mode; /* File mode */
 __le16 i_uid; /* Low 16 bits of Owner Uid */
 __le32 i_size_lo; /* Size in bytes */
 __le32 i_atime; /* Access time */
 __le32 i_ctime; /* Inode Change time */
 __le32 i_mtime; /* Modification time */
 __le32 i_dtime; /* Deletion Time */
 __le16 i_gid; /* Low 16 bits of Group Id */
 __le16 i_links_count; /* Links count */
 __le32 i_blocks_lo; /* Blocks count */
 __le32 i_flags; /* File flags */
 union {
  struct {
   __le32 l_i_version;
  }linux1;
  struct {
   __u32 h_i_translator;
  } hurd1;
  struct {
   __u32 m_i_reserved1;
  } masix1;
 } osd1; /* OS dependent 1 */
 __le32 i_block[EXT4_N_BLOCKS];/* Pointers to blocks */
 __le32 i_generation; /* File version (for NFS) */
 __le32 i_file_acl_lo; /* File ACL */
 __le32 i_size_high;
 __le32 i_obso_faddr; /* Obsoleted fragment address */
 union {
  struct {
   __le16 l_i_blocks_high; /* were l_i_reserved1 */
   __le16 l_i_file_acl_high;
   __le16 l_i_uid_high; /* these 2 fields */
   __le16 l_i_gid_high; /* were reserved2[0] */
   __le16 l_i_checksum_lo;/* crc32c(uuid+inum+inode) LE */
   __le16 l_i_reserved;
  }linux2;
  struct {
   __le16 h_i_reserved1; /* Obsoleted fragment number/size which are removed in ext4 */
   __u16 h_i_mode_high;
   __u16 h_i_uid_high;
   __u16 h_i_gid_high;
   __u32 h_i_author;
  } hurd2;
  struct {
   __le16 h_i_reserved1; /* Obsoleted fragment number/size which are removed in ext4 */
   __le16 m_i_file_acl_high;
   __u32 m_i_reserved2[2];
  } masix2;
 } osd2; /* OS dependent 2 */
 __le16 i_extra_isize;
 __le16 i_checksum_hi; /* crc32c(uuid+inum+inode) BE */
 __le32 i_ctime_extra; /* extra Change time (nsec << 2 | epoch) */
 __le32 i_mtime_extra; /* extra Modification time(nsec << 2 | epoch) */
 __le32 i_atime_extra; /* extra Access time (nsec << 2 | epoch) */
 __le32 i_crtime; /* File Creation time */
 __le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
 __le32 i_version_hi; /* high 32 bits for 64-bit version */
 __le32 i_projid; /* Project ID */
};
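Several inode fields are split into a low and a high part, and the *_extra timestamp fields pack two values into one word, as the comments in the structure indicate. The following sketch shows how such fields could be recombined. It assumes the values have already been converted from little-endian disk order to host order, and the helper names are invented for this example rather than taken from the kernel.

```c
#include <stdint.h>

/* Full 64-bit file size from i_size_lo and i_size_high. */
static inline uint64_t inode_size(uint32_t i_size_lo, uint32_t i_size_high)
{
    return ((uint64_t)i_size_high << 32) | i_size_lo;
}

/* Full 32-bit owner ID from i_uid (low 16 bits) and l_i_uid_high. */
static inline uint32_t inode_uid(uint16_t i_uid, uint16_t l_i_uid_high)
{
    return ((uint32_t)l_i_uid_high << 16) | i_uid;
}

/* An *_extra field is laid out as (nsec << 2 | epoch):
 * the upper 30 bits carry nanoseconds ... */
static inline uint32_t extra_nsec(uint32_t extra)
{
    return extra >> 2;
}

/* ... and the low 2 bits extend the 32-bit seconds field. */
static inline uint32_t extra_epoch_bits(uint32_t extra)
{
    return extra & 0x3u;
}
```

For example, i_size_lo = 0 with i_size_high = 1 describes a file of exactly 4G bytes.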

Within this structure, the i_block[EXT4_N_BLOCKS] array holds the references to the file's data blocks. EXT4_N_BLOCKS is defined as follows:

#define EXT4_NDIR_BLOCKS 12
#define EXT4_IND_BLOCK EXT4_NDIR_BLOCKS
#define EXT4_DIND_BLOCK (EXT4_IND_BLOCK + 1)
#define EXT4_TIND_BLOCK (EXT4_DIND_BLOCK + 1)
#define EXT4_N_BLOCKS (EXT4_TIND_BLOCK + 1)

In ext2 and ext3, the first 12 entries of i_block point directly to data blocks. The 13th entry points to an indirect block, which in turn stores the locations of data blocks. Similarly, the 14th entry points to a double-indirect block and the 15th entry points to a triple-indirect block (see the accompanying figure).
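To get a feeling for this scheme, the sketch below computes how many data blocks the 15 entries can address and how many levels of indirection a given logical block needs. It assumes 4-byte block pointers, as in the __le32 i_block array, and ignores other limits a real file system imposes on file size. The names are hypothetical.

```c
#include <stdint.h>

/* 12 direct pointers; every pointer is 4 bytes (__le32). */
enum { N_DIRECT = 12, POINTER_BYTES = 4 };

/* Number of data blocks reachable through direct, single-, double-
 * and triple-indirect pointers for the given block size. */
static uint64_t max_addressable_blocks(uint32_t block_size)
{
    uint64_t ptrs = block_size / POINTER_BYTES; /* pointers per block */

    return N_DIRECT + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
}

/* Levels of indirection (0 to 3) needed to reach logical block n,
 * or -1 if n is beyond what the scheme can address. */
static int indirection_depth(uint64_t n, uint32_t block_size)
{
    uint64_t ptrs = block_size / POINTER_BYTES;

    if (n < N_DIRECT)
        return 0;
    n -= N_DIRECT;
    if (n < ptrs)
        return 1;
    n -= ptrs;
    if (n < ptrs * ptrs)
        return 2;
    n -= ptrs * ptrs;
    if (n < ptrs * ptrs * ptrs)
        return 3;
    return -1;
}
```

With 4K blocks, each indirect block holds 1024 pointers, so logical block 12 already needs one extra read and logical block 1036 needs two.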

For a large file, finding a block this way can require several extra disk reads, because every level of indirection is a separate block. ext4 introduces the extent tree to solve this problem. The core idea is to describe a run of contiguous blocks by its starting position and its number of blocks, instead of recording the position of each block one by one, which saves storage space.

First, the original 15 × 4 = 60 bytes of i_block are reused to hold one extent header (ext4_extent_header) plus 4 extent entries (ext4_extent). Both structures occupy 12 bytes, so 12 + 4 × 12 = 60. The high bit of the 16-bit ee_len field is used to tell whether the extent is initialized, so one extent covers at most 32K blocks. With 4K blocks, one extent entry can therefore map at most 32K × 4K = 128M of data.

If a file is larger than 4 × 128M = 512M, or is stored in more than 4 discontiguous extents, the entries in i_block are no longer sufficient. In that case the entries change from ext4_extent to ext4_extent_idx, each of which points to a 4K-byte block holding further entries. After subtracting the 12 bytes occupied by the header, such a block holds (4096 - 12) / 12 = 340 ext4_extent entries (rounded down), which can map at most 340 × 128M = 42.5G of data. This index structure is very efficient when files are stored in contiguous blocks.

struct ext4_extent_header {
 __le16 eh_magic; /* ext4 extents identifier: 0xF30A */
 __le16 eh_entries; /* Number of valid entries in this node */
 __le16 eh_max; /* Maximum number of entries this node can hold */
 __le16 eh_depth; /* Depth of this node in the tree: 0 is a leaf node, i.e. a data node, >0 is an index node */
 __le32 eh_generation; /* Generation of the tree */
};

/* Entry format in a leaf (data) node */
struct ext4_extent {
 __le32 ee_block; /* Logical number of the first block the extent covers */
 __le16 ee_len; /* Number of blocks contained in the extent */
 __le16 ee_start_hi; /* High 16 bits of the physical address of the extent's first block */
 __le32 ee_start_lo; /* Low 32 bits of the physical address of the extent's first block */
};

/* Entry format in an index node */
struct ext4_extent_idx {
 __le32 ei_block; /* Logical number of the first block of the file range covered by this index */
 __le32 ei_leaf_lo; /* Low 32 bits of the physical address of the block holding the next level */
 __le16 ei_leaf_hi; /* High 16 bits of the physical address of the block holding the next level */
 __u16 ei_unused;
};
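The capacity figures quoted for the extent tree can be checked with a small sketch. The structures below are simplified host-order stand-ins for the on-disk layouts (plain uint16_t and uint32_t instead of __le16 and __le32), and all names are invented for this example. The limit of 32K blocks per extent is taken from the text.

```c
#include <stdint.h>

/* Simplified stand-ins for the on-disk structures. */
struct extent_header { uint16_t magic, entries, max, depth; uint32_t generation; };
struct extent        { uint32_t block; uint16_t len, start_hi; uint32_t start_lo; };
struct extent_idx    { uint32_t block, leaf_lo; uint16_t leaf_hi, unused; };

/* All three layouts are 12 bytes, as the text states. */
_Static_assert(sizeof(struct extent_header) == 12, "header must be 12 bytes");
_Static_assert(sizeof(struct extent) == 12, "extent must be 12 bytes");
_Static_assert(sizeof(struct extent_idx) == 12, "index must be 12 bytes");

/* Largest number of blocks one initialized extent covers (32K). */
enum { MAX_EXTENT_BLOCKS = 1 << 15 };

/* Entries that fit into a region of `bytes` bytes after the header. */
static uint32_t entries_that_fit(uint32_t bytes)
{
    return (uint32_t)((bytes - sizeof(struct extent_header))
                      / sizeof(struct extent));
}

/* Largest number of bytes a single extent can map. */
static uint64_t max_extent_bytes(uint32_t block_size)
{
    return (uint64_t)MAX_EXTENT_BLOCKS * block_size;
}

/* Physical start block: 16 high bits joined with 32 low bits. */
static uint64_t extent_start(const struct extent *e)
{
    return ((uint64_t)e->start_hi << 32) | e->start_lo;
}
```

Here entries_that_fit(60) gives the 4 entries stored in i_block, and entries_that_fit(4096) gives the 340 entries of a 4K leaf block.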

An example of a /var/log/messages file is shown below:

Inode bitmap and block bitmap

Some areas of the hard disk are dedicated to storing data blocks and inodes. To create a new file, the file system needs to know which inodes and which blocks are free. For this purpose, one block stores the inode bitmap and another block stores the block bitmap. In each bitmap, a bit is 1 if the corresponding inode or block is occupied and 0 if it is free. A 4K block holds only 4K × 8 = 32K bits, so one block bitmap can track at most 32K blocks, that is 32K × 4K = 128M of data. To build a larger file system, each such set of blocks is organized into a block group, and the disk is made up of many block groups.
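A bitmap allocator of this kind can be sketched as follows. This is a minimal illustration, not the ext4 allocator: it scans linearly for the first free bit, and it assumes that bit 0 is the least significant bit of byte 0. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if bit n is set (occupied), 0 if it is clear (free). */
static int bitmap_test(const uint8_t *map, size_t n)
{
    return (map[n / 8] >> (n % 8)) & 1;
}

/* Mark bit n as occupied. */
static void bitmap_set(uint8_t *map, size_t n)
{
    map[n / 8] |= (uint8_t)(1u << (n % 8));
}

/* Find the first free bit, mark it occupied and return its index.
 * Returns nbits if every bit is already occupied. */
static size_t bitmap_alloc(uint8_t *map, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        if (!bitmap_test(map, i)) {
            bitmap_set(map, i);
            return i;
        }
    }
    return nbits;
}

/* Bytes of data one block group covers when a single block
 * holds the block bitmap. */
static uint64_t block_group_bytes(uint32_t block_size)
{
    return (uint64_t)block_size * 8 * block_size;
}
```

For a bitmap whose first nine bits are set, bitmap_alloc returns 9 and sets that bit.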

Hard links and soft links

A hard link shares its inode with the original file. An inode number is only meaningful within one file system, so a hard link cannot cross file systems either.

A soft (symbolic) link has its own inode, whose content is the path of another file. Opening the link leads to that other file. Because it stores a path rather than an inode number, a soft link can cross file systems, and it still exists (as a dangling link) after the original file is deleted.
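The difference can be observed with the POSIX calls link(), symlink(), stat() and lstat(). The sketch below creates a scratch directory under /tmp (an assumption about the environment), builds one hard link and one soft link, and records what it sees. The struct link_demo and the function run_link_demo are names invented for this example.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct link_demo {
    int hard_same_inode; /* hard link reports the original's inode */
    int soft_own_inode;  /* soft link has an inode of its own */
    int link_count;      /* st_nlink of the original after link() */
    int hard_survives;   /* hard link still usable after unlink() */
    int soft_dangles;    /* soft link exists but no longer resolves */
};

/* Returns 0 on success and fills *res, or -1 if any step fails. */
static int run_link_demo(struct link_demo *res)
{
    char dir[] = "/tmp/linkdemoXXXXXX";
    char orig[64], hard[64], soft[64];
    struct stat so, sh, ss, tmp;
    int fd, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(orig, sizeof orig, "%s/orig", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);
    snprintf(soft, sizeof soft, "%s/soft", dir);

    fd = open(orig, O_CREAT | O_WRONLY | O_EXCL, 0600);
    if (fd < 0)
        goto cleanup;
    close(fd);
    if (link(orig, hard) != 0 || symlink(orig, soft) != 0)
        goto cleanup;
    /* lstat() examines the link itself instead of following it. */
    if (stat(orig, &so) != 0 || stat(hard, &sh) != 0 || lstat(soft, &ss) != 0)
        goto cleanup;

    res->hard_same_inode = (so.st_dev == sh.st_dev && so.st_ino == sh.st_ino);
    res->soft_own_inode = (ss.st_ino != so.st_ino);
    res->link_count = (int)so.st_nlink;

    if (unlink(orig) != 0)
        goto cleanup;
    res->hard_survives = (stat(hard, &tmp) == 0);
    res->soft_dangles = (lstat(soft, &tmp) == 0 && stat(soft, &tmp) != 0);
    rc = 0;

cleanup:
    unlink(orig);
    unlink(hard);
    unlink(soft);
    rmdir(dir);
    return rc;
}
```

After link(), the original's link count is 2, and removing the original name leaves the data reachable through the hard link while the soft link no longer resolves.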
