Million-level data processing solution

Data storage structure design
- Table field design
- Index design
- Query statement optimization
Ten-million-level data processing solution

Data storage structure design
At this stage the data itself has great value: besides regular business needs, there is also demand for data analysis. The data structure is now fairly stable, and modifying the original structure is rarely considered. Optimization is generally approached from three directions: partitioning, table sharding, and database sharding.

Partitioning: a table's rows are split across multiple physical partitions (for example by ID or time range) while the table remains a single logical table, so queries and application code do not need to change.
Table sharding: sharding can be horizontal or vertical. Horizontal sharding splits one table into several smaller tables with the same structure (table1, table2, and so on) to relieve read and write pressure on the database. Vertical sharding moves some columns into a new table, so each table has a different structure; this can reduce table-lock contention under high concurrency. As you can imagine, sharding requires changes to application logic, so it is usually planned in the early stages of a project when a large data volume is foreseeable. Sharding an existing system later is not recommended, because the cost is very high.

Database sharding: databases are usually split in master-slave mode, where one master node replicates to one or more slave nodes. The master handles write operations and the slaves handle reads, achieving read/write separation, high availability, and data backup. Master-slave mode also has drawbacks, such as replication lag and problems caused by oversized binlog files, which I will not go into here.

Other: hot/cold table separation. Historical data that is rarely queried can be moved to a separate cold database that only serves queries, reducing the volume of data in the hot table.

Database table primary key design
For primary keys, I personally recommend an auto-increment numeric ID with a time attribute (a distributed auto-increment ID generation algorithm such as Snowflake).
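As a minimal sketch of such a distributed, time-ordered ID generator, here is a Snowflake-style implementation in Python. The class name, custom epoch, and bit layout are illustrative choices, not a standard API, and a production version would also need to handle clock rollback:

```python
import threading
import time


class SnowflakeLikeId:
    """Sketch of a Snowflake-style distributed ID generator.

    64-bit layout: 41 bits of millisecond timestamp, 10 bits of
    worker id, 12 bits of per-millisecond sequence. IDs are
    time-ordered, which keeps InnoDB's clustered index appending
    at the end instead of splitting pages.
    """

    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (illustrative)

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # Sequence exhausted, spin until the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


gen = SnowflakeLikeId(worker_id=1)
ids = [gen.next_id() for _ in range(1000)]
assert ids == sorted(ids)        # time-ordered
assert len(set(ids)) == 1000     # unique
```

Because the timestamp occupies the high bits, IDs generated later always sort higher, giving the "increasing integer" primary key recommended below.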
Why use such an algorithm? It is related to how MySQL stores data.

From a business perspective: when designing a table, you should not pick a business field as the primary key, because such fields are only unique in theory. For example, if a book number is used as the primary key, the book number is unique in theory, but duplicates can occur in practice. It is better to use a business-independent auto-increment ID as the primary key and add a unique constraint on the book number instead.

From a technical perspective:
1. With an auto-increment primary key, each new record is appended after the current last index entry; when a page fills up, a new page is allocated automatically. In general this improves both insert and query performance.
2. In InnoDB, the primary key index stores both the key and the row data in its leaf nodes, which means the data file itself is organized as a B+ tree.
3. If no primary key is defined, InnoDB uses the first non-nullable UNIQUE key as the primary key; if there is none, it generates a hidden 6-byte row ID. In a clustered index, N rows form a page (a page is usually 16 KB). If out-of-order values are inserted, frequent page splits and page moves occur to keep the B+ tree balanced, and inserts become slower. The clustered-index key should therefore be a continuously increasing value rather than a random one (do not use a random string or UUID).
4. In short, an InnoDB primary key should be an integer, and an increasing one; this is efficient for both storage and queries.

MySQL interview questions

Query optimization for tens of millions of rows
The deeper the page requested with LIMIT, the slower the query. This leads to two conclusions:
1. The query time of a LIMIT statement is proportional to the position of the starting record.
2. LIMIT is convenient, but it should not be used directly on tables with many records.

Assume the table uses InnoDB as the storage engine, with id as the auto-increment primary key and the default primary key index:

    SELECT id FROM test LIMIT 9000000,100;

There are two optimization approaches: using id as the query condition with a subquery, and using a join.

1. The id >= (subquery) form:

    SELECT * FROM test WHERE id >= (SELECT id FROM test LIMIT 9000000,1) LIMIT 0,100;

2. The join form:

    SELECT * FROM test a JOIN (SELECT id FROM test LIMIT 9000000,100) b ON a.id = b.id;

The two optimized queries take roughly the same time; they rely on the same principle, so the effects are similar. I personally recommend the join form and minimizing the use of subqueries. Note: these queries start at the ten-million mark; with a smaller offset, say at the million mark, the query is even faster:

    SELECT * FROM test a JOIN (SELECT id FROM test LIMIT 1000000,100) b ON a.id = b.id;

Which MySQL storage engines have you used? What are their characteristics and differences?
This question is often asked in senior developer interviews, and we encounter it in daily development as well. MySQL has many storage engines, but the two used most often are InnoDB and MyISAM, so if an interviewer asks which storage engines MySQL has, those two are the ones to cover. So what are their characteristics and differences?

MyISAM: the traditional default table type, based on ISAM (Indexed Sequential Access Method), a standard method for storing records and files. It is not transaction-safe and does not support foreign keys. If the workload consists mostly of SELECTs and INSERTs, MyISAM is a reasonable fit.

InnoDB: a transaction-safe engine.
Its biggest features are support for foreign keys, row locks, and transactions. For workloads with many updates and inserts, especially under high concurrency and high QPS, InnoDB is recommended. Note: before MySQL 5.5, the default storage engine was MyISAM; from MySQL 5.5 on, it is InnoDB.

Differences between MyISAM and InnoDB:
1. InnoDB supports transactions; MyISAM does not. In InnoDB, each SQL statement is wrapped in a transaction and auto-committed by default, which costs performance, so it is best to group multiple statements between begin and commit as a single transaction.
2. InnoDB supports foreign keys; MyISAM does not.
3. InnoDB uses a clustered index with a B+ tree structure: the data file is bound to the primary key index (the table data file itself is an index structure organized as a B+ tree), a primary key must exist, and primary key lookups are very efficient. MyISAM uses a non-clustered index, also a B+ tree, but the index and data files are separate: the index stores pointers into the data file, and the primary key index and secondary indexes are independent of each other.
4. InnoDB does not store the exact row count of a table, so executing select count(*) from table scans the whole table. MyISAM keeps the row count in a variable, so the same statement only reads that variable and is very fast.
5. MyISAM supports full-text indexes and has higher query efficiency; InnoDB did not support full-text indexes until MySQL 5.6.
6. InnoDB supports both table-level and row-level locks (row-level by default); MyISAM supports only table-level locks.
7. An InnoDB table must have a primary key (if the user does not specify one, InnoDB finds or generates one itself); a MyISAM table does not require one.
8. InnoDB storage files are .frm and .ibd; MyISAM uses .frm, .MYD, and .MYI.
9. InnoDB: .frm is the table definition file, .ibd is the data file.
10. MyISAM: .frm is the table definition file, .MYD is the data file, and .MYI is the index file.

Optimization of complex MySQL query statements
Most complex SQL comes from multi-table joins. So how should this kind of SQL be optimized? There are established routines for it, and we just need to follow them.

Complex SQL optimization checklist:
1. Check the SQL with the EXPLAIN keyword. EXPLAIN helps you analyze the performance bottlenecks of a query or table structure: it shows how indexes and the primary key are used, how the table is scanned and sorted, whether a full table scan occurs, and so on.
2. Use indexed fields in query conditions whenever possible. If a query has multiple conditions, prefer a composite index, and pay attention to the order of its columns.
3. Prefer joins for multi-table queries and reduce the use of subqueries. Join on primary keys where possible, that is, on indexed fields; if a join field is not indexed, consider adding an index.
4. Page through results in batches with LIMIT; do not fetch everything at once.
5. Never use select *; select only the specific fields you need and avoid querying unnecessary columns.
6. Rewrite or conditions as union all where possible.
7. Avoid is null and is not null where possible.
8. Be careful with like: leading-wildcard and double-wildcard patterns cannot use an index.
9. Minimize the use of functions on columns in the WHERE clause, because functions cause the index to be bypassed.
10. Avoid not-equal (!=) comparisons, because they do not use the index.
11.
Use exists instead of in, and not exists instead of not in; they are more efficient.
12. Avoid the HAVING clause where possible. HAVING filters the result set only after all records have been retrieved, which involves sorting, totaling, and other work; limiting rows with the WHERE clause instead reduces that overhead.
13. Never use ORDER BY RAND().

The above is a detailed explanation of querying and storing tens of millions of rows in a MySQL database. For more on this topic, see the related articles on 123WORDPRESS.COM.
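Finally, the deep-pagination rewrite discussed earlier (and tips 3 and 4 above) can be verified end to end. The sketch below uses an in-memory SQLite database as a stand-in for MySQL; the table name, column, and row counts are made up for the demo, but the deferred-join pattern is the same one shown in the SELECT ... JOIN (SELECT id ...) example:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO test (payload) VALUES (?)",
    [(f"row-{i}",) for i in range(10_000)],
)

# Naive deep page: the engine walks past the first 9000 full rows
# just to discard them.
naive = conn.execute(
    "SELECT id, payload FROM test ORDER BY id LIMIT 100 OFFSET 9000"
).fetchall()

# Deferred join: page over the narrow primary-key column first,
# then join back to fetch only the 100 full rows actually needed.
deferred = conn.execute(
    """
    SELECT t.id, t.payload
    FROM test t
    JOIN (SELECT id FROM test ORDER BY id LIMIT 100 OFFSET 9000) p
      ON t.id = p.id
    ORDER BY t.id
    """
).fetchall()

# Both forms return the same page of results.
assert naive == deferred
assert len(deferred) == 100
```

On a real MySQL table at the tens-of-millions scale, the inner SELECT id scan stays inside the clustered index, which is where the speedup observed above comes from.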