MySQL database development specifications [recommended]

MySQL database development specifications [recommended]

Recently, we have been capturing SQL online for optimization while organizing the development specifications to minimize the number of new problematic SQLs entering the production database. Today we also had a training for the company's development. I won’t upload the PPT here, but there were about ten SQL production cases in it. Because most of the specifications are universal, we also drew on the specifications of Qunar and Ganji. However, in the process of writing this article, each specification was based on a negative example that was used as a reference in work. If time permits, I will single out a portion of it to analyze its principles or illustrate it with cases.

1. Naming conventions

1. Database names, table names, and field names must use lowercase letters and be separated by underscores.

(1)MySQL has a configuration parameter lower_case_table_names=1, which means that the table names are stored in lowercase and are case-insensitive. If it is 0, the library table name is stored as it is, case sensitive; if it is 2, it is stored as it is, but compared in lowercase.

(2) If uppercase and lowercase letters are mixed, multiple tables such as abc, Abc, ABC, etc. may coexist, which may easily lead to confusion.

(3) Field names are displayed as case-sensitive, but are not case-sensitive in actual use. That is, you cannot create two fields with the same name but different case.

(4) To unify standards, use lowercase letters for library names, table names, and field names.

2. The database name starts with d, the table name starts with t, and the field name starts with f_

(1) For example, in table t_crm_relation, the crm in the middle represents the business module name

(2) Views begin with view_, events begin with event_, triggers begin with trig_, stored procedures begin with proc_, and functions begin with func_

(3) The common index is named idx_col1_col2, and the unique index is named uk_col1_col2 (the f_public part can be removed). Such as idx_companyid_corpid_contacttime(f_company_id,f_corp_id,f_contact_time)

3. The library name, table name, and field name must not exceed 32 characters.

The library name, table name, and field name support a maximum of 64 characters, but for the sake of uniformity, easy identification, and reduced transmission volume, no more than 32 characters are allowed

4. Temporary database and table names must be suffixed with tmp plus date

Such as t_crm_relation_tmp0425. The backup table is similar, with the form of _bak20160425.

5. The table divided by date and time must comply with the format of _YYYY[MM][DD]

This is also to prepare for the possibility of separate tables in the future, such as t_crm_ec_record_201403, but t_crm_contact_at201506 breaks this specification.
Those without time characteristics are directly named as t_tbname_001.

2. Basic specifications of database tables

1. Use Innodb storage engine

Starting from version 5.5, the default storage engine of MySQL is InnoDB. Starting from version 5.7, MyISAM is abandoned for system tables.

2. Table character set uniformly uses UTF8

(1) The UTF8 character set takes up 3 bytes to store Chinese characters and 1 byte to store English characters

(2) The proofreading character set uses the default utf8_general_ci

(3) The connected client also uses utf8. Specify charset or SET NAMES UTF8; when establishing a connection. (For those who have been using latin1 in their projects for a long time, there is no way to save them)

(4) If you need to store emojis such as EMOJ, you can apply to use the UTF8MB4 character set.

3. Add comments to all tables

(1) Try to add comments to the fields

(2) Status type must specify the meaning of the main value, such as "0-offline, 1-online"

4. Control the number of fields in a single table

(1) The upper limit of the number of fields in a single table is about 30. If there are more fields, consider vertical splitting the table. First, separate hot and cold data. Second, separate large fields. Third, do not separate columns that are often used together for conditions and return columns.

(2) The table fields are controlled in a concise and precise manner, which can improve IO efficiency and cache more valid data in memory, thereby improving response speed and concurrency capabilities, and subsequent alter table operations are also faster.

5. All tables must explicitly specify a primary key

(1) The primary key should be auto-incremented as much as possible. InnoDB table is actually an index-organized table. Sequential storage can improve access efficiency and make full use of disk space. It is also needed for some complex queries that may require self-join for optimization.

(2) When a globally unique primary key is required, use an external ticket server (under construction)

(3) If there is no primary key or unique index, update/delete locates the row for the operation through all fields, which is equivalent to a full table scan for each row.

(4) In some cases, a joint unique primary key can be used, but this requires consultation with the DBA.

6. Foreign key references are not enforced

Even if the fields of the two tables have a clear foreign key reference relationship, FOREIGN KEY is not used because new records will be checked against the primary key table, affecting performance.

7. Use stored procedures and views appropriately, and prohibit the use of triggers and events

(1) Although stored procedures can simplify business-side code and may be used when writing complex logic in traditional enterprises, changes are very frequent in Internet enterprises. It is quite troublesome to upgrade a stored procedure when there are separate databases and tables. And because it does not record logs, it is not convenient to debug performance issues. If you use a procedure, be sure to consider what happens if the execution fails.

(2) To a certain extent, the use of views is also to reduce the complexity of SQL in the code, but sometimes the universality of the view will sacrifice performance (for example, returning unnecessary fields).

(3) The same applies to triggers, but they should not be used to constrain data consistency. MySQL only supports "row-based triggering", that is, a trigger is always targeted at a record rather than the entire SQL statement. If the data set being changed is very large, the efficiency will be very low. Hiding the work behind a SQL statement will be disastrous if a problem occurs, but it will be difficult to analyze and locate quickly. Furthermore, the pt-osc tool cannot be used when ddl is required. Put it in transaction execution.

(4) Events are also a form of laziness. We have encountered several cases where the business was affected by the failure of scheduled tasks, and MySQL was unable to issue a failure warning for it. Establish a dedicated job scheduler platform.

a. The amount of data in a single table is controlled within 5000w

b. Plain text passwords are not allowed to be stored in the database

3. Field Specification

1. Definition of string types such as char, varchar, text, etc.

(1) For columns with a fixed length, if the column is updated frequently, char is suitable.

(2) Although varchar can store variable-length strings, it cannot be too small or too large. UTF8 can store up to 21844 Chinese characters or 65532 English characters

(3) varbinary(M) stores binary strings. It stores bytes instead of characters, so there is no concept of character set. The length of M is 0-255 (bytes). Only used for case-sensitive types when sorting or comparing, not including password storage

(4) The TEXT type is similar to VARCHAR in that it stores variable lengths and has a maximum limit of 2^16. However, the content after 20 bytes is stored outside the data page (row_format=dynamic). Its use requires one more addressing and has no default value.

It is generally used to store values ​​that have a large average capacity and are not operated as frequently as other fields.

Some articles on the Internet say that you should avoid using text and blob. You should know that if you only use varchar, it may cause row overflow. The effect is similar, but because each row occupies too many bytes, the number of data rows and pages that buffer_pool can cache will decrease. In addition, indexes are generally not created on text and blob. Instead, third-party full-text search engines such as Sphinx are used. If a (prefix) index is indeed created, it will affect performance. Everything depends on the specific scenario.

Also, split text/blob into another table if possible

(5) BLOB is an extended version of varbinary. The content is stored in binary strings, has no character set, and is case-sensitive. There is a scenario that is often mentioned but not used: do not store images in the database.

2. Definition of numeric types such as int, tinyint, decimal, etc.

(1) Use tinyint instead of enum and boolean
When the ENUM type needs to modify or add enumeration values, online DDL is required, which is costly; if the ENUM column value contains numeric types, it may cause confusion about the default value.
Tinyint uses 1 byte and is generally used for status, type, and flag columns.

(2) It is recommended to use UNSIGNED to store non-negative values. Compared with not using unsigned, the range of available values ​​can be doubled.

(3) int uses a fixed 4-byte storage. The difference between int(11) and int(4) is only in the display width.

(4) Use Decimal instead of float/double to store precise floating-point numbers. For types such as currency and amount, use decimal, such as decimal(9,2). By default, float can only be accurate to 6 significant digits.

3.Timestamp and datetime selection

(1) The datetime and timestamp types occupy different storage spaces, 8 bytes for the former and 4 bytes for the latter. As a result, the time ranges that they can represent are different. The former ranges from 1000-01-01 00:00:00 to 9999-12-31 23:59:59, and the latter ranges from 1970-01-01 08:00:01 to 2038-01-19 11:14:07. Therefore, the range supported by TIMESTAMP is smaller than that of DATATIME.

(2) timestamp can automatically update the time field when inserting/updating rows (such as f_set_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP), but a table can only have one such definition.

(3) The timestamp display is related to the time zone and is always stored internally in UTC milliseconds. Also subject to strict mode restrictions

(4) Timestamp is preferred, but datetime is also acceptable.

(5) Do not use time functions on time columns in the where condition

4. It is recommended that all fields be defined as NOT NULL

(1) If it is an index field, it must be defined as not null. Because null values ​​can affect cordinate statistics and affect the optimizer's choice of index.

(2) If you cannot guarantee that there will be a value when inserting, use default '' or 0 when defining it.

5. Field definitions with the same meaning must be the same

For example, if there are f_user_id fields in different tables, then their types and field lengths should be designed to be the same.

4. Index Specification

1. Any new select, update, and delete operations must be explained first to see the index usage.

Try to avoid the appearance of extra columns: Using File Sort, Using Temporary, and be cautious when going online if the number of rows exceeds 1,000.
explain

(1) type: ALL, index, range, ref, eq_ref, const, system, NULL (from left to right, performance from worst to best)

(2) possible_keys: indicates which index MySQL can use to find records in the table. If there is an index on the field involved in the query, the index will be listed, but it may not be used in the query.

(3) key: indicates the key (index) that MySQL actually decides to use
If no index was chosen, key is NULL. To force MySQL to use or ignore the index on the possible_keys column, use FORCE INDEX, USE INDEX, or IGNORE INDEX in your query.

(4) ref: indicates which columns or constants are used to find the values ​​on the index column.

(5) rows: The estimated number of rows needed to find the required records based on table statistics and index selection.

(6) Extra

a.Using temporary: Indicates that MySQL needs to use a temporary table to store the result set, which is common in sorting and grouping queries
b. Using filesort: The sorting operation that cannot be completed using indexes in MySQL is called "file sorting"

1. Limitation of the number of indexes

(1) Indexes are a double-edged sword. They increase the maintenance burden, increase IO pressure, and multiply the space occupied by indexes.

(2) The number of indexes in a single table should be controlled within 5, or should not exceed 20% of the number of table fields. If multiple fields in a single table need to use separate indexes for query requirements, DBA evaluation is required.

2. Avoid redundant indexes

(1.) InnoDB table is an index-organized table. The primary key is a clustered index placed together with the data. The ordinary index ultimately points to the primary key address, so it is redundant to make the primary key the last column. If f_crm_id is used as the primary key, the f_crm_id on the joint index (f_user_id,f_crm_id) is completely redundant.

(2) (a,b,c), (a,b), the latter is a redundant index. Prefix index can be used to achieve acceleration and reduce maintenance burden

3. If there is no special requirement, use the auto-increment id as the primary key

(1.) The primary key is a clustered index and is written sequentially. If the combined unique index is used as the primary key, it is written randomly and is suitable for tables with few writes and many reads.

(2) Primary key updates are not allowed

4. Try to build indexes on columns with high selectivity

(1) Do not create indexes on low-cardinality columns, such as gender and type. But there is one case, idx_feedbackid_type (f_feedback_id,f_type), if f_type=1 is often used for comparison and 90% of the rows can be filtered out, then this composite index is worth creating. Sometimes the same query statement uses different indexes due to different condition values, and the same principle applies.

(2) Index selectivity calculation method (cardinality ÷ number of data rows)

Selectivity = Cardinality / Total Rows = select count(distinct col1)/count(*) from tbname. The closer it is to 1, the better the filtering effect of using the index on col1.

(3) When the number of rows scanned by index exceeds 30%, switch to full table scan

5. Leftmost prefix principle

(1) When MySQL uses a joint index, it matches from left to right. When a disconnection or range query is encountered, the subsequent index columns cannot be used. For example, the index idx_c1_c2_c3 (c1,c2,c3) is equivalent to creating three indexes (c1), (c1,c2), and (c1,c2,c3). The index can be used for field comparisons that include the above three conditions. However, for example, where c1=a and c3=c, only the index of column c1 can be used. For cases such as c2=b and c3=c, this index cannot be used at all.

(2) Index matching will also be stopped when encountering a range query (>, <, between, like). For example, if c1=a and c2 > 2 and c3=c, only the comparison on columns c1 and c2 can use the index, and the index of the arrangement (c1, c2, c3) may all be used.

(3) The order of the fields in the where condition has nothing to do with the index order. The MySQL optimizer will automatically adjust the order.

6. Prefix Index

(1) When creating an index for a column with a length of more than 30 characters, consider using a prefix index. For example, idx_cs_guid2 (f_cs_guid(26)) means to use the first 26 characters as the index. This can improve search efficiency and save space.

(2) Prefix indexes also have their disadvantages. They cannot be used when ORDER BY or GROUP BY is performed on the column, and they cannot be used as covering indexes.

(3) If you create a prefix index on a binary storage column such as varbinary or blob, you need to consider the character set. The number of bytes in brackets is

7. Reasonable use of covering indexes to reduce IO

In the INNODB storage engine, the secondary index (non-primary key index, also known as auxiliary index, secondary index) does not directly store the row address, but stores the primary key value.
If the user needs to query a data column that is not included in the secondary index, the user needs to first find the primary key value through the secondary index, and then query other data columns through the primary key, so the query needs to be performed twice. A covering index can obtain all required data columns in one index, thus avoiding a secondary search in the table, saving IO and thus being more efficient.

For example, SELECT email, uid FROM user_email WHERE uid=xx. If uid is not the primary key, you can add an index as index(uid, email) to improve performance.

8. Try not to create indexes on frequently updated columns

If you do not create an index on the column with ON UPDATE CURRENT_STAMP defined, the maintenance cost will be too high (fortunately, MySQL has an insert buffer, which will merge the index insertion)

5. SQL Design

1. Avoid directly using SELECT * to read all fields

Even if all fields are needed, network bandwidth consumption is reduced, covering indexes can be effectively used, and changes in table structure have little impact on the program

2. When you are sure that there is only one result returned, use limit 1

On the premise of ensuring that the data is correct and the number of result sets can be determined, use limit more often to return the results as quickly as possible.

3. Be careful with implicit type conversions

(1) Conversion rules

a. When at least one of the two parameters is NULL, the result of the comparison is also NULL. The exception is that when using <=> to compare two NULLs, it returns 1. In both cases, no type conversion is required.

b. Both parameters are strings, and will be compared as strings without type conversion

c. Both parameters are integers, so they are compared as integers without type conversion.

d. When a hexadecimal value is compared to a non-digit number, it is treated as a binary string.

e. If one parameter is TIMESTAMP or DATETIME and the other parameter is a constant, the constant will be converted to timestamp

f. If one parameter is of decimal type, if the other parameter is a decimal or integer, the integer will be converted to decimal for comparison. If the other parameter is a floating point number, the decimal will be converted to a floating point number for comparison.

g. In all other cases, both arguments are converted to floating point numbers before comparison.

(2) If an index is built on the string type, and if this field is compared with an int type value, it complies with Section g. For example, if the type of f_phone is defined as varchar, but f_phone in (098890) is used in where, both parameters will be treated as floating-point types. The occurrence of this implicit conversion is not the worst thing. The worst thing is that after the string is converted to float, MySQL cannot use the index, which leads to performance problems. If f_user_id = '1234567', it meets the requirement of clause b, and the numbers are directly compared as strings.

4. Do not use functions on where condition columns

(1) This will cause the index to become invalid, such as lower(email), f_qq % 4. Can be calculated on the constant on the right

(2) When the returned result set is not very large, you can use functions on the returned columns to simplify program development

5. Use like fuzzy matching, and don’t put % first

This will cause the index to fail. If you have this search requirement, consider other solutions, such as sphinx full-text search

6. When it comes to complex SQL, be sure to refer to the existing index design and explain it first.

(1) Simple SQL splitting, not using code processing complexity as an excuse.

(2) For example, in the OR condition: f_phone='10000' or f_mobile='10000', both fields have indexes, but only one of them can be used. It can be split into 2 SQL statements or union all.

(3) The advantage of explaining first is that you can add more query restrictions to utilize the index.

7. When using join, try to use the index on the same table in the where condition

(1) For example, select t1.a,t2.b * from t1,t2 and t1.a=t2.a and t1.b=123 and t2.c= 4 . If the fields t1.c and t2.c are the same, then the index (b,c) on t1 only uses b. At this time, if you change t2.c=4 in the where condition to t1.c=4, you can use the complete index.

(2) This situation may occur when the field design is redundant (anti-normalization).

(3) Correctly select inner join and left join

8. Use less subqueries and more joins

In versions lower than 5.6, subqueries are very inefficient, unlike Oracle, which calculates the subquery first and then the outer query. Version 5.6 began to be optimized

9. Consider using union all, use union less often, and pay attention to deduplication

(1) union all does not remove duplicates, but it does not require sorting operations, so it is faster than union. If there is no need to remove duplicates, union all is preferred.

(2) If limit is used in the UNION result, add limit to each of the two sub-SQL statements if there may be many return values. If there is still order by, please contact the DBA.

10. The content of IN should not exceed 200 words.

Use batch mode for more than 500 values, otherwise one execution will affect the concurrency of the database, because a single SQL can only and always occupy a single CPU, and may cause master-slave replication delays.

11. Say no to big things

For example, if multiple selects and updates are performed in one transaction, if it is a high-frequency transaction, it will seriously affect the concurrency capability of MySQL, because the locks and other resources held by the transaction can only be released when the transaction is rolled back/committed. But at the same time, we must also weigh the consistency of data writing.

12. Avoid using comparisons like is null and is not null

13.order by .. limit

This type of query is more optimized through indexes, but the order by field is important. For example, if the primary key id and f_time are both in ascending order, you can consider ordering by id instead of f_time.

14.c1 < a order by c2

The difference from the above is that there is a range query before the order by. As can be seen from the previous content, an index like (c1, c2) is not needed, but the (c2, c1) index can be used. It can also be rewritten as a join method.

15. Paging Optimization

It is recommended to use a reasonable paging method to improve paging efficiency. Do not use jump paging in the case of large pages.

If there is a paging statement similar to the following:

SELECT FROM table1 ORDER BY ftime DESC LIMIT 10000,10;

This paging method will cause a lot of IO because MySQL uses a read-ahead strategy.

Recommended paging method:

SELECT FROM table1 WHERE ftime < last_time ORDER BY ftime DESC LIMIT 10

That is, pass in the boundary value of the previous paging

SELECT * FROM table as t1 inner JOIN (SELECT id FROM table ORDER BY time LIMIT 10000,10) as t2 ON t1.id=t2.id

16. count

(1) First, there are differences between count(), count(1), and count(col1). Count() indicates the number of records in the entire result set, while count(1) indicates the number of records in the result set that are counted by primary key. In most cases, count() and count(1) have the same effect, but count(col1) indicates the number of records in the col1 column that are NOT null in the result set. Prefer count()

(2) Counting large amounts of data is a resource-consuming operation and may even slow down the entire database. If the query performance problem cannot be solved, the product design should be restructured. For example, when count queries are frequently needed, consider using a summary table.

(3) In the case of distinct, the group by method may be more efficient.

17. Change delete and update statements to select and then explain

Select operations can only cause database slowness, but write operations are the culprit for locking tables.

18. Reduce the number of interactions with the database and try to use batch SQL statements

(1) INSERT ... ON DUPLICATE KEY UPDATE ...: If the inserted row results in a duplicate value in a UNIQUE index or PRIMARY KEY, the old row is UPDATEd. If there is no duplicate, the old row is directly inserted, affecting one row.

(2) REPLACE INTO is similar, but it deletes old rows when there is a conflict. INSERT IGNORE does the opposite, keeping the old rows and discarding the new rows to be inserted.

(3) INSERT INTO VALUES(),(),(), merge insert.

19. Eliminate dangerous SQL

(1) Remove the meaningless or always true condition "where 1=1". This will be terrible if you encounter update/delete or SQL injection.

(2) DDL statements are not allowed in SQL. Generally, permissions such as create/alter are not granted, but Alibaba Cloud RDS only distinguishes between read and write users.

VI. Code of Conduct

(1) It is not allowed to import live network data without the knowledge of the DBA

(2) For large-scale updates, such as repairing data, avoid peak hours and notify the DBA. Direct execution of SQL is done by operation and maintenance or DBA colleagues

(3) Timely processing of SQL statements for offline services

(4) Complex SQL online review

Because there is currently no SQL review mechanism, complex SQL statements such as multi-table join, count, and group by should be proactively reported to the DBA for evaluation.

(5) The database solution selection and design of important projects must be notified to the DBA in advance

Summarize

The above is the full content of this article. I hope it will be helpful to everyone.

refer to:

MySQL prepare principle detailed explanation

Several important MySQL variables

Detailed explanation of the code between the MySQL master library binlog (master-log) and the slave library relay-log

Thank you for reading, and I hope my friends will support this site!

You may also be interested in:
  • Summary of MySQL usage specifications
  • Super detailed MySQL usage specification sharing
  • Summary of MySQL database usage specifications
  • An experienced person will show you how to develop a professional and standardized MySQL startup script
  • Summary of MySQL development standards and usage skills
  • MySQL database naming standards and conventions
  • Detailed explanation of Mysql table creation and index usage specifications
  • MYSQL database naming and design specifications
  • Professional MySQL development design specifications and SQL writing specifications

<<:  Use of VNode in Vue.js

>>:  Summary of Creating and Using Array Methods in Bash Scripts

Recommend

How to use bar charts in Vue and modify the configuration yourself

1. Import echart in HTML file <!-- Import echa...

How to fix the four sides of the table to scroll up, down, left and right

question: When I was doing project statistics rec...

Uncommon but useful tags in Xhtml

Xhtml has many tags that are not commonly used but...

Why Seconds_Behind_Master is still 0 when MySQL synchronization delay occurs

Table of contents Problem Description Principle A...

Summary of the data storage structure of the nginx http module

Starting from this section, we will explain the i...

How to implement DIV's blur function

Use anti-shake to make DIV disappear when the mou...

Linux swap partition (detailed explanation)

Table of contents linux 1. What is SWAP 2. What d...

Sample code using vue-router in html

Introducing vue and vue-router <script src=&qu...

How to quickly build an LNMP environment with Docker (latest)

Preface Tip: Here you can add the approximate con...

Install Python virtual environment in Ubuntu 18.04

For reference only for Python developers using Ub...

Zabbix's psk encryption combined with zabbix_get value

Since Zabbix version 3.0, it has supported encryp...

An article to understand Linux disks and disk partitions

Preface All hardware devices in the Linux system ...

How to quickly build ELK based on Docker

[Abstract] This article quickly builds a complete...

Detailed example of using useState in react

useState useState adds some internal state to a c...