Database Command Specification
Basic database design specifications

1. All tables must use the InnoDB storage engine. Unless there is a requirement that InnoDB cannot meet (such as column storage or spatial data), every table must use InnoDB. (MyISAM was the default engine before MySQL 5.5; InnoDB has been the default since MySQL 5.5.5.)

2. The character set of databases and tables is uniformly UTF8.

3. All tables and fields must be commented. Use the COMMENT clause to add comments to tables and columns, and maintain the data dictionary from the beginning (a minimal sketch follows this list).

4. Try to control the amount of data in a single table; it is recommended to keep it within 5 million rows. The data volume can be controlled by archiving historical data (suitable for log data), splitting into multiple databases and tables (suitable for business data), and so on.

5. Use MySQL partitioned tables with caution. A partitioned table is physically stored as multiple files but is logically presented as a single table.

6. Try to separate hot and cold data and reduce the width of the table. This uses the cache more effectively and avoids reading cold data that is not needed; columns that are often used together should be placed in the same table (to avoid extra join operations).

7. It is forbidden to create reserved fields in a table. It is hard to give a reserved field a clear and meaningful name; since the data it will eventually store is unknown, an appropriate type cannot be chosen; and changing the type of a reserved field later will lock the table.

8. It is forbidden to store large binary data such as pictures and files in the database. Such files are usually large, which makes the data volume grow rapidly in a short time, and reading them from the database involves a large number of random IO operations that are very time-consuming for large files. Store the files on a file server and keep only the file address in the database.

9. Do not run database stress tests against the production environment.
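The following is a minimal sketch of items 1 to 3 above; the table name user_info and its columns are hypothetical examples, not part of the specification:

CREATE TABLE user_info (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary key',
    user_name VARCHAR(45) NOT NULL COMMENT 'login name',
    city VARCHAR(20) NOT NULL COMMENT 'city the user lives in',
    PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COMMENT = 'basic user information (hypothetical example table)';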
Database field design specifications

1. Give priority to the smallest data type that meets the storage needs.
Reason: the larger a column is, the more space its index requires, so fewer index nodes fit into one page, more IO is needed to traverse the index, and index performance gets worse.
Method: convert strings into numeric types for storage where possible, for example convert an IP address into an integer; INET_ATON converts an IPv4 address into an unsigned integer that fits in 4 bytes (a sketch follows this list). For non-negative data (such as auto-increment IDs and integer IPs), use unsigned integer types: with the same storage size, an unsigned type covers a positive range twice as large as the signed one. For example, SIGNED INT covers -2147483648 to 2147483647, while UNSIGNED INT covers 0 to 4294967295.

2. Avoid the TEXT and BLOB data types. The most common TEXT type can store 64 KB of data. MySQL also has to perform a second lookup for this kind of data, which makes SQL performance very poor. This does not mean such types can never be used: if you must use them, it is recommended to separate the BLOB or TEXT column into a separate extension table, and when querying, never use SELECT * but retrieve only the necessary columns; do not read the TEXT column at all if its data is not needed. TEXT and BLOB columns can only use prefix indexes, because MySQL limits the length of index fields, and TEXT columns cannot have DEFAULT values.

3. Avoid the ENUM type. Modifying an ENUM value requires an ALTER statement; ORDER BY on an ENUM column is inefficient and needs extra work; and it is forbidden to use numeric values as ENUM enumeration values.

4. Define columns as NOT NULL whenever possible.
Reason: indexed NULL columns require extra space to store, and NULL values need special handling, which makes comparisons and calculations more expensive.

5. Use TIMESTAMP (4 bytes) or DATETIME (8 bytes) to store time. TIMESTAMP can only store the range 1970-01-01 00:00:01 to 2038-01-19 03:14:07. People often use strings to store date data, which is an incorrect practice: date functions cannot be used on strings for calculation and comparison, and a string representation of a date takes more space.

6. Financial and monetary data must use the DECIMAL type. The floating-point types FLOAT and DOUBLE are not exact.
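A minimal sketch of item 1, storing an IPv4 address as an unsigned integer; the table access_log and its columns are hypothetical:

CREATE TABLE access_log (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary key',
    client_ip INT UNSIGNED NOT NULL COMMENT 'IPv4 address stored as an unsigned integer (hypothetical column)',
    visited_at DATETIME NOT NULL COMMENT 'visit time',
    PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COMMENT = 'access log (hypothetical example table)';

INSERT INTO access_log (client_ip, visited_at) VALUES (INET_ATON('192.168.1.10'), NOW());
SELECT INET_NTOA(client_ip) AS client_ip, visited_at FROM access_log;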
Index design specifications

1. Limit the number of indexes on each table; it is recommended that a single table have no more than 5 indexes. More indexes are not always better. Indexes can improve query efficiency, but they also reduce the efficiency of inserts and updates, and in some cases even of queries: when the MySQL optimizer chooses how to execute a query, it evaluates every available index based on its statistics to generate the best execution plan, so if many indexes could serve the same query, the optimizer needs more time to generate the plan, which also reduces query performance.

2. Do not create a separate index for every column in the table.

3. Every InnoDB table must have a primary key. InnoDB organizes the table in primary key order. Do not use frequently updated columns as the primary key, and do not use a multi-column primary key (which is equivalent to a joint index).

Recommended columns to index:
- Columns that appear in the WHERE clause of SELECT, UPDATE, and DELETE statements.
- Columns that appear in ORDER BY, GROUP BY, and DISTINCT.
- Do not index every column matched by the two points above; it is usually better to build a joint index on those columns together.
- The join columns of multi-table joins.

How to choose the order of index columns: the purpose of an index is to find data through the index, reduce random IO, and increase query performance; the more rows an index can filter out, the less data needs to be read from disk. Therefore put the most discriminative column on the left side of a joint index (discrimination = number of distinct values in the column / total number of rows in the column); a sketch of estimating this follows this section.

Avoid creating redundant and duplicate indexes (they increase the time the query optimizer needs to generate an execution plan). Example of duplicate indexes: primary key(id), index(id), unique index(id).

For frequent queries, consider covering indexes first. A covering index is an index that contains all the fields of the query (the fields that appear in WHERE, SELECT, ORDER BY, and GROUP BY). Benefits of a covering index:
- It avoids the secondary lookup on InnoDB tables. InnoDB stores data in clustered-index order, and a secondary index stores the primary key value of the row in its leaf nodes, so when data is looked up through a secondary index, a second lookup through the primary key is needed to fetch the row we actually want. With a covering index, all the data can be obtained from the secondary index itself, which avoids the second lookup on the primary key, reduces IO operations, and improves query efficiency.
- It can convert random IO into sequential IO. Because a covering index is stored in key order, for IO-intensive range scans it reads far less data than fetching each row from disk at random, so using a covering index turns random disk reads into sequential index reads.

Index SET specification

Avoid foreign key constraints as much as possible. Foreign key constraints are not recommended, but indexes must still be created on the join columns between tables. Foreign keys can ensure the referential integrity of the data, but it is better to enforce this on the application side, because foreign key constraints affect write operations on both the parent and child tables and therefore reduce performance.
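A minimal sketch of choosing the column order of a joint index by discrimination; the table orders and the columns customer_id and status are hypothetical:

-- Estimate how discriminative each candidate column is (hypothetical table and columns)
SELECT COUNT(DISTINCT customer_id) / COUNT(*) AS customer_id_discrimination,
       COUNT(DISTINCT status) / COUNT(*) AS status_discrimination
FROM orders;

-- Assuming customer_id turns out to be the more discriminative column, place it on the left
ALTER TABLE orders ADD INDEX idx_customer_status (customer_id, status);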
Database SQL development specifications

1. It is recommended to use precompiled (prepared) statements for database operations. Precompiled statements can reuse execution plans, which reduces the time spent compiling SQL, and they solve the SQL injection problem caused by dynamic SQL. Passing only parameters is more efficient than passing the whole SQL statement, and the same statement can be parsed once and executed many times, which improves processing efficiency.

2. Avoid implicit conversion of data types. Implicit conversion can make the index unusable, for example: select name,phone from customer where id = '111';

3. Make full use of the indexes that already exist on the table. Avoid query conditions with a leading wildcard, such as a like '%123%'; if there is only a trailing % and no leading %, the index on the column can still be used. A SQL statement can apply a range condition to only one column of a composite index: for example, with a joint index on columns a, b, and c, if the query has a range condition on column a, the index on columns b and c will not be used, so when defining the joint index, a column used for range queries should be placed on the right side of the joint index. Use LEFT JOIN or NOT EXISTS to optimize NOT IN operations, because NOT IN also usually makes the index unusable.

4. When designing the database, take future expansion into consideration.

5. The program should connect to different databases with different accounts, and cross-database queries should be avoided. This leaves room for database migration and for splitting into multiple databases and tables, reduces business coupling, and avoids the security risks caused by excessive privileges.

6. Do not use SELECT *; always use SELECT <field list> for queries.
Reason: SELECT * consumes more CPU, IO, and network bandwidth, and it cannot use covering indexes; selecting only the needed fields also reduces the impact of table structure changes.

7. Do not use INSERT statements without a field list.
Do not write: insert into t values ('a','b','c');
Write instead: insert into t(c1,c2,c3) values ('a','b','c');

8. Avoid subqueries; a subquery can usually be optimized into a join operation. A subquery can only be converted into an equivalent join when it appears in an IN clause and is a simple SQL statement (one without UNION, GROUP BY, ORDER BY, or LIMIT clauses).
Reasons why subqueries perform poorly: the result set of a subquery cannot use an index and is usually stored in a temporary table, and neither an in-memory temporary table nor an on-disk temporary table has indexes, so query performance suffers to some extent, especially for subqueries that return large result sets. Because subqueries generate large numbers of unindexed temporary tables, they also consume too much CPU and IO and produce many slow queries.

9. Avoid joining too many tables with JOIN. MySQL uses a join buffer for associations, and its size is set by the join_buffer_size parameter. Each additional table joined in the same SQL statement allocates one more join buffer, so the more tables one statement joins, the more memory it occupies. If a program performs many multi-table joins and join_buffer_size is not set reasonably, the server can easily run out of memory, which affects the stability of the database server. Join operations also generate temporary tables, which affects query efficiency. MySQL allows at most 61 tables to be joined; it is recommended not to exceed 5.

10. Reduce the number of interactions with the database. Databases are better suited to batch operations; merging multiple identical operations together improves processing efficiency.

11. When checking the same column against several values with OR, use IN instead. The number of values in an IN list should not exceed 500. IN can make effective use of indexes, while OR in most cases can rarely use them.

12. Do not use ORDER BY RAND() for random sorting. ORDER BY RAND() loads all rows that match the condition into memory, generates a random value for each row, and sorts them all in memory. If the matching data set is large, this consumes a lot of CPU, IO, and memory resources. It is recommended to generate a random value in the program and then fetch the corresponding row from the database.

13. Applying functions or calculations to columns in the WHERE clause is prohibited. When a function is applied to a column, or the column takes part in a calculation, the index on that column cannot be used (a not-recommended / recommended pair is sketched after this list).

14. Use UNION ALL instead of UNION when the result sets obviously contain no duplicate values. UNION puts all the rows of both result sets into a temporary table and then deduplicates them.

15. Split complex, large SQL statements into several small ones. A big SQL statement is logically complex and requires a lot of CPU to compute.
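A minimal sketch of item 13; the table order_info, the column create_time, and the assumption that the goal is to match a single day are hypothetical:

-- Not recommended: the function applied to the column prevents the index from being used
SELECT id FROM order_info WHERE DATE(create_time) = '2019-01-01';

-- Recommended: leave the column bare so an index on create_time can be used
SELECT id FROM order_info WHERE create_time >= '2019-01-01' AND create_time < '2019-01-02';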
Database operation behavior specification

Batch write operations (UPDATE, DELETE, INSERT) affecting more than 1 million rows must be split into multiple smaller batches (a minimal sketch is given at the end of this section), for the following reasons:

1. Large batch operations may cause serious master-slave delay. In a master-slave environment, a large batch write operation usually takes a long time to execute, and the slave databases only start executing it after it has completed on the master, so a long delay builds up between the master and the slaves.

2. When the binlog is in row format, a large number of logs will be generated. Large batch write operations produce a large amount of log, especially with row-format binary logs: since the row format records the modification of every row, the more rows one statement modifies, the more log is generated and the longer it takes to transmit and replay it, which is another cause of master-slave delay.

3. Avoid large transactions. Modifying a large amount of data in a single transaction locks a large number of rows in the table, which causes a great deal of blocking and has a very large impact on MySQL performance. In particular, long blocking can use up all the available connections of the database and prevent other applications in the production environment from connecting to it. Therefore it is important that large batch write operations be performed in batches.

For large tables, use pt-online-schema-change to modify the table structure.
- It avoids the master-slave delay caused by modifying a large table and avoids locking the table while fields are being modified.
- Be cautious when changing the data structure of a large table: it causes serious table locking, which is intolerable, especially in a production environment.
- pt-online-schema-change first creates a new table with the same structure as the original table, applies the structure change to the new table, copies the data of the original table into the new table, and adds triggers to the original table so that rows newly written to the original table are also copied to the new table. After all row data has been copied, it renames the new table to the original table's name and drops the original table. In this way the original DDL operation is broken down into many small batches.

It is forbidden to grant the SUPER privilege to accounts used by programs. When the maximum number of connections is reached, MySQL still allows one extra connection for a user with the SUPER privilege, and this emergency channel should not be occupied by an application account.

Accounts used by programs must follow the principle of least privilege.
- A database account used by a program should only be used within one DB, and cross-database use of program accounts is not allowed.
- In principle, accounts used by programs must not have DROP privileges.
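A minimal sketch of performing a large delete in batches, as required at the start of this section; the table order_history, the date condition, and the batch size of 5000 are hypothetical, and the loop itself would normally live in application code or a script:

-- Delete at most 5000 rows per round (hypothetical table and condition);
-- the application repeats this statement, pausing briefly between rounds,
-- until ROW_COUNT() reports that no more rows were affected
DELETE FROM order_history WHERE created_at < '2019-01-01' LIMIT 5000;
SELECT ROW_COUNT();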