Preface These principles are summarized from actual combat. Behind every principle is a bloody lesson These principles are mainly for database developers. Be sure to pay attention to them during the development process. I. Core Principles 1. Try not to do calculations in the database As the saying goes: Don't let your toes think, that's the job of your brain As database developers, we should let the database do more of what it is good at:
Example: Try not to use calculation functions such as md5(), Order by Rand(), etc. in MySQL. 2. Try to control the amount of data in a single table Everyone knows that if the amount of data in a single table is too large, it will affect the efficiency of data query, and in serious cases, it will cause the entire database to be stuck. In general, based on the estimated data volume of a single table within a year:
At the same time, we should try to do a reasonable table partitioning to prevent the single table from being overloaded. Common table partitioning strategies are:
The partition table is mainly applicable to the following scenarios: ① The table is very large and cannot be stored in memory, or there is only hot data at the end of the table, and the rest is historical data; ② The data in the partition table is easier to maintain, and independent operations can be performed on independent partitions; ③ The data of the partition table can be distributed on different machines, thereby efficiently using resources; ④ Partition tables can be used to avoid certain special bottlenecks; ⑤ Independent partitions can be backed up and restored. However, there are also some limitations when using partition tables. Please pay attention to the following when using them: ① A table can have a maximum of 1024 partitions; ② In version 5.1, the partition table expression must be an integer, and 5.5 can use column partitioning; ③ If there are primary key and unique index columns in the partition field, both the primary key column and the unique column must be included; ④ Foreign key constraints cannot be used in partition tables; ⑤ The structure of the existing table needs to be modified; ⑥ All partitions must use the same storage engine; ⑦ There are some restrictions on the functions and expressions that can be used in partition functions; ⑧ Some storage engines do not support partitioning; ⑨ For MyISAM partition tables, load index into cache cannot be used; ⑩ For MyISAM tables, more file descriptors need to be opened when using partitioned tables. 3. Try to control the number of table fields The number of fields in a single table should not be too large. Optimize and adjust according to the business scenario, and try to adjust the number of table fields to be small and precise. This has the following benefits:
So how many fields are appropriate for a single table? The evaluation is based on a single table with a size of 1G and 5 million rows of data:
==> It is recommended to limit the number of fields in a single table to 20~50 4. Balancing Paradigms and Redundancy The design of the database table structure also requires balance. In the past, we often said that we should strictly follow the three major paradigms, so let’s first talk about what the paradigm is: First normal form: a single field cannot be divided any further. Uniqueness. Second normal form: There are no non-primary attributes that only depend on part of the primary key. Eliminate incomplete dependencies. Third paradigm: Eliminate transitive dependencies. To summarize paradigms and redundancy in one sentence: Redundancy is trading storage for performance. The paradigm is to trade performance for storage. Therefore, redundancy is generally more popular in actual work. When designing the model, the specific trade-off between these two aspects must first be based on the computing power and storage resources provided by the enterprise. Secondly, in general, data warehouses are implemented in the Internet industry according to the Kimball model, and modeling is also task-driven, so the trade-off between redundancy and paradigm meets the task requirements. For example, a set of indicator data must be processed before 8 a.m., but the calculation time window is very small. In order to reduce the calculation time of the indicator as much as possible, the multi-table association should be reduced as much as possible during the calculation process, and more redundancy is required when designing the model. 5. Reject 3B Database concurrency is like city traffic, growing nonlinearly This requires us to pay attention to bottlenecks under high concurrency when developing the database to prevent database paralysis due to high concurrency. The rejection of 3B here refers to:
2. Field Class Principles 1. Make good use of numeric field types Three types of numeric values:
Here are some common examples: 1) INT(1) VS INT(11) Many people cannot tell the difference between INT(1) and INT(11). I believe everyone is very curious. In fact, 1 and 11 are just differences in display length. That is, no matter what the value of x in int(x) is, the value range of the stored number is still the value range of the int data type itself, and x is just the length of the data display. 2) BIGINT AUTO_INCREMENT As we all know, the maximum value supported by a signed int is about 2.2 billion, which is far greater than our needs and the performance limit that a single MySQL table can support. For OLTP applications, the size of a single table is generally maintained at tens of millions and will not reach the upper limit of 2.2 billion. If you want to increase the reserved amount, you can change the primary key to an unsigned int with an upper limit of 4.2 billion, which is already sufficient. Using bigint will occupy more disk and memory space. After all, memory space is limited. Ineffective occupation will lead to more data swapping in and out, which will increase IO pressure and be detrimental to performance. Therefore, it is recommended to use the int unsigned type for the auto-increment primary key, but it is not recommended to use bigint. 3) DECIMAL(N,0) When using the DECIMAL data type, the number of decimal places is generally not 0. If the number of decimal places is set to 0, it is recommended to use the INT type. 2. Convert characters to numbers Numeric VS string indexes have more advantages:
Example: Use unsigned INT to store IP instead of CHAR(15) INT UNSIGNED You can use INET_ATON() and INET_NTOA() to convert between IP strings and values. 3. Use ENUM or SET first For some enumerated data, we recommend using ENUM or SET first. Such scenarios are suitable for: 1) String type 2) The possible values are known and finite Storage: 1) ENUM occupies 1 byte and is converted to numerical operation 2) SET depends on the node and occupies up to 8 bytes 3) Single quotes are required when comparing (even for numeric values) Example: `sex` enum('F','M') COMMENT 'Gender'; `c1` enum('0','1','2','3') COMMENT 'Audit'; 4. Avoid using NULL fields Why do we try to add NOT NULL DEFAULT '' when designing database table fields? Here we have to talk about the disadvantages of using NULL fields: Difficult to optimize queries NULL columns plus indexes require extra space Composite indexes containing NULL are invalid Example: 1) `a` char(32) DEFAULT NULL [Not recommended] 2) `b` int(10) NOT NULL [Not recommended] 3) `c` int(10) NOT NULL DEFAULT 0 [Recommended] 5. Use less and split TEXT/BLOB The processing performance of TEXT type is much lower than that of VARCHAR
Try not to use TEXT/BLOB data types If it is necessary for business needs, it is recommended to split it into a separate table Example: CREATE TABLE t1 ( id INT NOT NULL AUTO_INCREMENT, data TEXT NOT NULL, PRIMARY KEY(id) )ENGINE=InnoDB; 6. Don’t store images in the database First picture: It can be seen that if all the images are stored in the database, the database size will increase and the reading and writing speed will slow down. Disadvantages of storing pictures in database:
★Recommended solution: Save the image path in the database Generate a path based on year, month, and day. Whether to generate the path according to year, month, day or year and month depends on your needs (not necessarily on the date). The key is to understand why you need to spread it out into multiple folders. This is explained by a principle: The operating system has a limit on the number of files in a single directory. When there are a large number of files. The speed of getting files from the directory will become slower and slower. Therefore, in order to maintain speed, it is necessary to distribute it into multiple directories according to fixed rules. The images are scattered to disk paths. The database field stores something like "images/2012/09/25/1343287394783.jpg" The original uploaded image file name will be renamed and saved, for example, generated according to the timestamp, 1343287394783.jpg. This is to avoid duplicate file names, which can occur when multiple people upload images to the same directory. No matter what rules you use to name the pictures, as long as the picture name is unique. For example, if the website has a large number of concurrent visitors, the directory generation should be as detailed as possible. For example, if it is accurate to the hour, each hour can be a folder. At the same time, in 0.001 seconds, two users are uploading pictures at the same time (because then the pictures will be stored in the same hourly folder). Because the timestamp is accurate to the second. In order to ensure the uniqueness of the image name and avoid overwriting, milliseconds and microseconds can be added after the timestamp. The rule of summary is that the greater the concurrent access volume, the higher the The more precise the better. Off topic: 1) Why is the saved disk path "images/2012/09/25/1343287394783.jpg" instead of "/images/2012/09/25/1343287394783.jpg" (with a slash at the beginning)? When you need to retrieve the image path to display the image on the page, if it is a relative path, you can use "./" + "images/2012/09/25/1343287394783.jpg" to assemble it. If you need a separate domain name (for example, when doing CDN acceleration), you can use domain names like img1.xxx.com and img2.xxx.com. Directly assemble "http://img1.xxx.com/" + "images/2012/09/25/1343287394783.jpg" 2) Why is the saved disk path "images/2012/09/25/1343287394783.jpg" instead of "http://www.xxx.com/images/2012/09/25/1343287394783.jpg" This actually involves CDN knowledge, and I won’t expand on the specific CDN knowledge here. In short: CDN service: Very suitable for static content. So for product pictures, as the number of visits increases, you only need to upload the pictures to their server when renting a CDN service. Example: Beijing accesses Changsha server, the distance is too far. I can completely put the product pictures on the cloud service in Beijing (I think the cloud storage currently provided to websites is actually CDN, which provides diversion and local access to websites). In this way, when users in Beijing visit the website, the pictures can actually be obtained from nearby locations. No long distance transmission is required. Use a domain name img.xxx.com to load pictures. This domain name resolves to a cloud service in Beijing. Practice: The database stores "images/2012/09/25/1343287394783.jpg". The images are not actually stored on the web server. Upload it to the CDN server in Beijing. I took it out from the database, directly "img.xxx.com/" + "images/2012/09/25/1343287394783.jpg" For example, if there are multiple ones, name them img1.xx.com and img2.xx.com Anyway, you can do whatever you want. So if you save the domain name directly. It becomes very troublesome. Migration troubles. 3. Index Principles 1. Add indexes carefully and reasonably
Example: Do not create an index on the "Gender" column Theoretical articles will tell you that fields with high value duplication rates are not suitable for indexing. Don’t say that the gender field has only two values. Netizens have personally tested that a field uses the first letters of the pinyin as the value, with a total of 26 possibilities. After adding the index, with a data volume of millions, the speed of using the index is slower than not using the index! Why is gender not suitable for indexing? Because you need to pay extra IO overhead to access the index, what you get from the index is just the address. If you want to actually access the data, you still need to perform IO on the table. If you want to retrieve a few data points from a table of 1 million rows, then using the index to quickly locate them and accessing the index is well worth the IO overhead. But if you take 500,000 rows of data from 1 million rows of data, such as the gender field, then you will need to access the index 500,000 times and then access the table 500,000 times. The total cost will not be less than directly scanning the table once. 2. Character fields must have a prefix index Discrimination: Single letter discrimination: 26 4-letter discrimination: 26*26*26*26 = 456,976 5-letter discrimination: 26*26*26*26*26 = 11,881,376 6-letter discrimination: 26*26*26*26*26*26 = 308,915,776 Character fields must have a prefix index, for example: `pinyin` varchar(100) DEFAULT NULL COMMENT 'District Pinyin', KEY `idx_pinyin` (`pinyin`(8)), ) ENGINE=InnoDB 3. Do not perform operations on index columns There are two reasons: 1) The index will not be used. 2) Will result in a full table scan Example: BAD SAMPLE: select * from table WHERE to_days(current_date) – to_days(date_col) <= 10 GOOD SAMPLE: select * from table WHERE date_col >= DATE_SUB('2011-10-22',INTERVAL 10 DAY); 4. Use auto-increment column or global ID as INNODB primary key
5. Avoid foreign keys as much as possible
It is recommended that the program guarantee the constraints For example, our original table creation statement is as follows: CREATE TABLE `user` ( `user_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key', `user_name` varchar(50) NOT NULL DEFAULT '' COMMENT 'User name', PRIMARY KEY (`user_id`) )ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `order` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key', `total_price` decimal(10,2) NOT NULL DEFAULT '0.00', `user_id` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`id`), KEY `for_indx_user_id` (`user_id`), CONSTRAINT `for_indx_user_id` FOREIGN KEY (`user_id`) REFERENCES `user` (`user_id`) )ENGINE=InnoDB DEFAULT CHARSET=utf8; Without foreign key constraints: CREATE TABLE `user` ( `user_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key', `user_name` varchar(50) NOT NULL DEFAULT '' COMMENT 'User name', PRIMARY KEY (`user_id`) )ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `order` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key', `total_price` decimal(10,2) NOT NULL DEFAULT '0.00', `user_id` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`id`) )ENGINE=InnoDB DEFAULT CHARSET=utf8; After the foreign key constraint is not applied, in order to speed up the query we usually add an index to the field that does not establish the foreign key constraint. CREATE TABLE `order` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key', `total_price` decimal(10,2) NOT NULL DEFAULT '0.00', `user_id` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`id`), KEY `idx_user_id` (`user_id`), )ENGINE=InnoDB DEFAULT CHARSET=utf8; In actual development, foreign key constraints are generally not established. 4. SQL Class Principles 1. Keep SQL statements as simple as possible During the development process, we try to keep the SQL statements simple. Let's compare large SQL and multiple simple SQL statements.
Reject large SQL statements and break them down into multiple simple SQL statements
2. Keep transactions (connections) short
Example: 1) Waiting for picture upload when posting 2) A large number of sleep connections 3. Avoid using SP/TRIG/FUNC as much as possible In online OLTP systems, we should:
Leave all of the above tasks to the client program 4. Try not to use SELECT * When using SELECT *, more CPU, memory, IO, and network bandwidth will be consumed. When writing query statements, we should try not to use SELECT * and only take the required data columns:
Example: Not recommended: SELECT * FROM tag WHERE id = 999148 recommend: SELECT keyword FROM tag WHERE id = 999148 5. Rewrite OR to IN() For the same field, rewrite or as in() OR efficiency: O(n) IN efficiency: O(Log n) When n is large, OR will be much slower Pay attention to controlling the number of INs. It is recommended that n be less than 200. Example: Not recommended: Select * from opp WHERE phone='12347856' or phone='42242233' recommend: Select * from opp WHERE phone in ('12347856' , '42242233') 6. Rewrite OR as UNION For different fields, change "or" to "union"
Example: Not recommended: Select * from opp WHERE phone='010-88886666' or cellPhone='13800138000'; recommend: Select * from opp WHERE phone='010-88886666' union Select * from opp WHERE cellPhone='13800138000'; 7. Avoid negative queries and % prefix fuzzy queries In actual development, we should try to avoid negative queries. What are negative queries? They are mainly as follows: NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN, NOT LIKE, etc. At the same time, we must also avoid % prefix fuzzy query, because this will use B+ Tree, and will cause the index to not be used, and will lead to a full table scan, the performance and efficiency can be imagined Example: 8. Reduce COUNT(*) In development, we often use COUNT(*), but we don’t know that this usage will cause a lot of resource waste. Because COUNT(*) has a large resource overhead, we should try to use it as little as possible. For counting statistics, we recommend:
Let's compare COUNT(*) with several other COUNTs: `id` int(10) NOT NULL AUTO_INCREMENT COMMENT 'Company id', `sale_id` int(10) unsigned DEFAULT NULL, in conclusion: COUNT(*)=COUNT(1) COUNT(0)=COUNT(1) COUNT(1)=COUNT(100) COUNT(*)!=COUNT(col) 9. LIMIT efficient paging Traditional paging: Select * from table limit 10000,10; LIMIT principle:
Recommended paging: Select * from table WHERE id>=23423 limit 11; #10+1 (10 items per page) select * from table WHERE id>=23434 limit 11; Paging method 2: Select * from table WHERE id >= ( select id from table limit 10000,1 ) limit 10; Paging method three: SELECT * FROM table INNER JOIN (SELECT id FROM table LIMIT 10000,10) USING (id) ; Paging method four: #First use the program to obtain the ID: select id from table limit 10000,10; #Use in to get the record corresponding to ID Select * from table WHERE id in (123,456…) ; The specific need is to analyze and reorganize the index according to the actual scenario Example: 10. Use UNION ALL instead of UNION If you do not need to deduplicate the results and only want to query and display multiple tables together, use UNION ALL, because UNION has deduplication overhead. Example: MySQL>SELECT * FROM detail20091128 UNION ALL SELECT * FROM detail20110427 UNION ALL SELECT * FROM detail20110426 UNION ALL SELECT * FROM detail20110425 UNION ALL SELECT * FROM detail20110424 UNION ALL SELECT * FROM detail20110423; 11. Decompose the connection to ensure high concurrency It is not recommended to perform JOIN on more than two tables in a high-concurrency DB Proper decomposition of joins ensures high concurrency:
Example: Original SQL: MySQL> Select * from tag JOIN tag_post on tag_post.tag_id=tag.id JOIN post on tag_post.post_id=post.id WHERE tag.tag='Second-hand toys'; Breaking down the SQL: MySQL> Select * from tag WHERE tag='Second-hand toys'; MySQL> Select * from tag_post WHERE tag_id=1321; MySQL> Select * from post WHERE post.id in (123,456,314,141) 12. GROUP BY removes sorting Use GROUP BY to achieve grouping and automatic sorting No sorting required: Order by NULL Specific sorting: Group by DESC/ASC Example: 13. Comparison of column values of the same data type Principle: Numbers for numbers, characters for characters Comparison of numeric columns and character types: Convert to double precision for comparison Comparison between character columns and numeric types: The entire character column is converted to a numeric value, and index queries are not used Example: Field: `remark` varchar(50) NOT NULL COMMENT 'Remarks, empty by default', MySQL>SELECT `id`, `gift_code` FROM gift WHERE `deal_id` = 640 AND remark=115127; 1 row in set (0.14 sec) MySQL>SELECT `id`, `gift_code` FROM pool_gift WHERE `deal_id` = 640 AND remark='115127'; 1 row in set (0.005 sec) 14. Load data Fast batch data import:
Try not to use INSERT ... SELECT. One reason is that there is a delay, and the other is that synchronization errors may occur. 15. Break up large batch updates
Example: update post set tag=1 WHERE id in (1,2,3); sleep 0.01; update post set tag=1 WHERE id in (4,5,6); sleep 0.01; … 16. Know Every SQL As a DBA or even a database developer, we must be very familiar with every SQL of the database. Common commands include:
5. Principles of Agreement 1. Isolation between online and offline Build a database ecosystem to ensure wireless database operation permissions Principle: Online connects online, offline connects offline
2. Prohibit subqueries without DBA confirmation
Example: MySQL> select * from table1 where id in (select id from table2); MySQL> insert into table1 (select * from table2); // may cause replication anomalies 3. Never explicitly lock the program
For consistency issues such as concurrent deductions, we use transactions to handle them and perform a second conflict check before committing. 4. Unified character set is UTF8 5. Unified naming conventions 1) Library and table names are all in lowercase 2) The default index name is "idx_field name" 3) Use abbreviations for library names, preferably between 2 and 7 letters DataSharing ==> ds 4) Avoid using reserved words in naming It is recommended that database developers keep all the above pitfalls in mind. I hope it will be helpful for everyone’s study, and I also hope that everyone will support 123WORDPRESS.COM. You may also be interested in:
|
<<: Solution to the ineffective global style of the mini program custom component
>>: How to use Linux paste command
Table of contents Preface Problem Description Cau...
Sometimes it is necessary to perform simple verif...
Code: <input type="text" class="...
Preface When I was working on a project recently,...
Preface When a 403 cross-origin error occurs No &...
Summary of common operators and operators in java...
Preface The count function is used to count the r...
Preface Vue Router is the official routing manage...
Following are the quick commands to clear disk sp...
Problem Description When using Windows Server 201...
CHAR and VARCHAR types are similar, differing pri...
Purpose Understand the Nginx ngx_http_limit_conn_...
This article shares MYSQL logs and backup and res...
Abstract: This article will demonstrate how to se...
Table of contents illustrate 1. Enable Docker rem...