This is just a record of a simple data migration. The amount of data is not large; the trouble lies in some of the intermediate processing steps and considerations. There is nothing here about SQL optimization or index optimization, so please be gentle.

Background

The user ophthalmology attribute table holds about 9.86 million records. The goal was to parse eight fields out of the attribute values (JSON format) of about 290,000 of those records into numbers and dump them into a statistics table for chart analysis. The structures and data below are mostly made up, so don't take them literally. The structure of the user ophthalmology attribute table is as follows:

CREATE TABLE `property` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ownerId` int(11) NOT NULL COMMENT 'Record ID or template ID',
  `ownerType` tinyint(4) NOT NULL COMMENT 'Type. 0: record 1: template',
  `recorderId` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Recorder ID',
  `userId` bigint(20) NOT NULL DEFAULT '0' COMMENT 'User ID',
  `roleId` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Role ID',
  `type` tinyint(4) NOT NULL COMMENT 'Field type. 0: Text 1: Option 2: Time 3: Image 4: ICD10 9: New Image',
  `name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Field name',
  `value` mediumtext NOT NULL COMMENT 'Field value',
  PRIMARY KEY (`id`),
  UNIQUE KEY `idxOwnerIdOwnerTypeNameType` (`ownerType`,`ownerId`,`name`,`type`) USING BTREE,
  KEY `idxUserIdRoleIdRecorderIdName` (`userId`,`roleId`,`recorderId`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Attributes';

Problem analysis

1. The attribute value is JSON and needs JSON functions to process

The attribute value is a fairly large JSON document, as shown below, but only 8 field values are needed; they are extracted and classified into different statistical indicators.

{
  ......
  "sight": {
    "nakedEye":   { "left": "0.9", "right": "0.6" },
    "correction": { "left": "1",   "right": "1"   }
  },
  ......
  "axialLength":  { "left": "21", "right": "12" },
  "korneaRadius": { "left": "34", "right": "33" },
  ......
}

So you need the JSON function json_extract(value,'$.key1.key2'). Note that the value this function returns keeps its JSON quoting: running json_extract(value,'$.sight.nakedEye.left') on the record above yields "0.9", quotes included; if the field holds an empty string, the result is "". So the replace function is needed to strip the quotes, and the final extraction expression is replace(json_extract(value,'$.sight.nakedEye.left'),'"',''). The result is NULL if the path does not exist, whether the outer sight is missing or the inner left is.
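A quick sanity check of that quoting behavior on a made-up literal. (Worth noting: JSON_UNQUOTE, or the ->> operator in MySQL 5.7+, strips the quotes without the REPLACE trick; the REPLACE version used in this migration behaves the same on values like these.)

SET @v = '{"sight": {"nakedEye": {"left": "0.9", "right": "0.6"}}}';
SELECT JSON_EXTRACT(@v, '$.sight.nakedEye.left');                    -- "0.9"  (quotes kept)
SELECT REPLACE(JSON_EXTRACT(@v, '$.sight.nakedEye.left'), '"', ''); -- 0.9
SELECT JSON_EXTRACT(@v, '$.axialLength.left');                       -- NULL   (path missing in this literal)
SELECT JSON_UNQUOTE(JSON_EXTRACT(@v, '$.sight.nakedEye.left'));     -- 0.9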
2. The field content is not standardized, and it's messy

Ideally everyone would have entered clean numbers, and after the step above the values could go straight into the new table. Reality is crueler: what people actually typed in was a mess.

There was no way around it except to go through the values with the product and business teams. Fortunately there were not many, just over 4,000, and a quick look was enough to classify them and agree on the handling rules below (a sketch of pulling such a review list follows).
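For reference, the review list might be pulled with the same REGEXP filter the cleanup steps below rely on. (propertytmp, with one column per extracted field, is the intermediate table built later in this article; the sketch assumes extraction has already been run against it.)

-- Pull the distinct non-numeric values of one field for manual review
SELECT DISTINCT nakedEyeLeft
FROM propertytmp
WHERE nakedEyeLeft REGEXP '[^0-9.]'   -- contains a character other than a digit or '.'
  AND nakedEyeLeft != '';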
How to do it specifically?

Step 1: Select the messy rows, excluding normal numeric data and empty data

WHERE `nakedEyeLeft` REGEXP '[^0-9.]' = 1  -- this already excludes NULL
  AND `nakedEyeLeft` != ''

Step 2: If the value contains no digit at all, set it to an empty string

SET nakedEyeLeft = IF(nakedEyeLeft NOT REGEXP '[0-9]', '', nakedEyeLeft)

Step 3: For values starting with a number, keep that leading number; when the cast yields 0, keep the original value for later handling

SET nakedEyeLeft = IF(nakedEyeLeft + 0 = 0, nakedEyeLeft, nakedEyeLeft + 0)

Combined together:

UPDATE property
SET nakedEyeLeft = IF(nakedEyeLeft NOT REGEXP '[0-9]', '',
                      IF(nakedEyeLeft + 0 = 0, nakedEyeLeft, nakedEyeLeft + 0))
WHERE `nakedEyeLeft` REGEXP '[^0-9.]' = 1  -- this already excludes NULL
  AND `nakedEyeLeft` != '';

PS: The SQL for one field looks simple, but because 8 fields are processed in one batch, the combined statement gets very long. Be careful not to mix up the fields.

That leaves the fourth category: mixed text and numbers, just over 40 items. Some look simple and can be handled automatically with regular expressions, such as "<1" or "less than 1". Some record a growth value that would require looking up the previous record, such as "an increase of 10 compared to the last record". The rest are complicated enough to need a human to extract a usable number, such as "BD234/KD23".

Anyone reading this far probably finds it as troublesome as I did. I had gritted my teeth and was ready to do it, but the business side said to just treat those as 0; if a value turns out to be 0 later, it can be re-saved through the page. Then there is no need to check whether the value starts with a number at all; just add + 0. If the value starts with a number, the leading number is kept; otherwise the result is 0. The final formatting SQL is:

UPDATE property
SET nakedEyeLeft = IF(nakedEyeLeft NOT REGEXP '[0-9]', '', nakedEyeLeft + 0)
WHERE `nakedEyeLeft` REGEXP '[^0-9.]' = 1  -- this already excludes NULL
  AND `nakedEyeLeft` != '';
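Why + 0 works: when MySQL casts a string to a number, it consumes the longest numeric prefix and drops the rest (raising a truncation warning). A few probes with made-up inputs:

SELECT '<1' + 0;          -- 0    (no leading digit)
SELECT '0.5' + 0;         -- 0.5
SELECT '12/34' + 0;       -- 12   (leading number kept, rest dropped with a warning)
SELECT 'BD234/KD23' + 0;  -- 0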
3. Extraction and formatting take too long; there are still 9 million+ records

The property table has more than 9 million rows, but the only known conditions for the required records are name, ownerType, and type, and no existing index matches them. Querying directly means a full table scan just to extract and format the data; on top of that, other tables have to be joined to supplement the remaining fields of the statistical indicators. Importing straight into the statistics table would therefore keep the two tables plus the joined tables locked together for a long time, during which no changes or inserts could be made. Not very realistic.

Reduce the number of scanned rows

Method 1: Add an index on name, ownerType, and type, cutting the scanned records to 200,000. The problem is that after indexing 9 million rows, the index also has to be dropped (the business queries don't need it), causing two load spikes; add the table-lock time of the subsequent processing, and the problem is still serious.

Method 2: Use a table with fewer records as the driving table, joined to the target table.

CREATE TABLE `property` (
  `ownerId` int(11) NOT NULL COMMENT 'Record ID or template ID',
  `ownerType` tinyint(4) NOT NULL COMMENT 'Type. 0: record 1: template',
  `type` tinyint(4) NOT NULL COMMENT 'Field type. 0: Text 1: Option 2: Time 3: Image 4: ICD10 9: New Image',
  `name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Field name',
  `value` mediumtext NOT NULL COMMENT 'Field value',
  -- other fields omitted
  UNIQUE KEY `idxOwnerIdOwnerTypeNameType` (`ownerType`,`ownerId`,`name`,`type`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Attributes';

The ownerId column can be joined to the record table; together with the existing conditions on name, ownerType, and type, the join hits the index idxOwnerIdOwnerTypeNameType (ownerType, ownerId, name, type) exactly.

CREATE TABLE `medicalrecord` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL DEFAULT '' COMMENT 'Record name',
  `type` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Record type.',
  -- other fields omitted
  KEY `idxName` (`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='record';

The record table hits the index idxName through name='ophthalmology record' and scans only 20,000 rows; add the 290,000 rows of the attribute table, and the total is about 300,000 scanned rows, roughly a thirtieth of what a full scan of the attribute table would cost!

Avoid table-lock time during extraction and formatting

Because there are 8 fields, each needing extraction, formatting, and judgment in between, the same extraction and formatting expressions would have to be evaluated many times inside one SQL statement. To avoid this, an intermediate table is used to temporarily store the extraction and formatting results.

CREATE TABLE `propertytmp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `value` mediumtext NOT NULL COMMENT 'Field value',
  `nakedEyeLeft` varchar(255) DEFAULT NULL COMMENT 'Vision - naked eye - left eye',
  `nakedEyeRight` varchar(255) DEFAULT NULL COMMENT 'Vision - naked eye - right eye',
  `correctionLeft` varchar(255) DEFAULT NULL COMMENT 'Vision - correction - left eye',
  `correctionRight` varchar(255) DEFAULT NULL COMMENT 'Vision - correction - right eye',
  `axialLengthLeft` varchar(255) DEFAULT NULL COMMENT 'Axial length - left eye',
  `axialLengthRight` varchar(255) DEFAULT NULL COMMENT 'Axial length - right eye',
  `korneaRadiusLeft` varchar(255) DEFAULT NULL COMMENT 'Corneal curvature - left eye',
  `korneaRadiusRight` varchar(255) DEFAULT NULL COMMENT 'Corneal curvature - right eye',
  `updated` datetime NOT NULL COMMENT 'Update time',
  `deleted` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

First import the data into this table, then extract and format against it.

Comparison of final execution results

Data import comparison:
- Full table scan of the attribute table, imported into the intermediate table: 40 s
- New index on the attribute table + import: 6 s + 3 s
- Driving-table association import: 1.4 s (a sketch of what this might have looked like follows)

Because other tables had to be joined as well, the result is not quite as ideal as hoped.
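The article does not show the association import itself; here is a minimal sketch under the DDL above. The property row's field name ('ophthalmologyExam') is a hypothetical placeholder, not from the source.

-- Hypothetical association import: medicalrecord drives, property is probed via its unique index
INSERT INTO propertytmp (value, updated)
SELECT p.value, NOW()
FROM medicalrecord m                     -- ~20,000 rows via idxName
JOIN property p
  ON p.ownerType = 0                     -- 0: record
 AND p.ownerId   = m.id                  -- together these hit idxOwnerIdOwnerTypeNameType
 AND p.name      = 'ophthalmologyExam'   -- hypothetical field name
 AND p.type      = 0                     -- 0: Text
WHERE m.name = 'ophthalmology record';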
Intermediate table data extraction: 7.5 s

UPDATE `propertytmp`
SET nakedEyeLeft      = REPLACE(json_extract(value,'$.sight.nakedEye.left'),'"',''),
    nakedEyeRight     = REPLACE(json_extract(value,'$.sight.nakedEye.right'),'"',''),
    correctionLeft    = REPLACE(json_extract(value,'$.sight.correction.left'),'"',''),
    correctionRight   = REPLACE(json_extract(value,'$.sight.correction.right'),'"',''),
    axialLengthLeft   = REPLACE(json_extract(value,'$.axialLength.left'),'"',''),
    axialLengthRight  = REPLACE(json_extract(value,'$.axialLength.right'),'"',''),
    korneaRadiusLeft  = REPLACE(json_extract(value,'$.korneaRadius.left'),'"',''),
    korneaRadiusRight = REPLACE(json_extract(value,'$.korneaRadius.right'),'"','');

Intermediate table data formatting: 2.3 s. The regex matching was faster than I expected.

UPDATE propertytmp
SET nakedEyeLeft      = IF(nakedEyeLeft NOT REGEXP '[0-9]' AND nakedEyeLeft != '', '', nakedEyeLeft + 0),
    nakedEyeRight     = IF(nakedEyeRight NOT REGEXP '[0-9]' AND nakedEyeRight != '', '', nakedEyeRight + 0),
    correctionLeft    = IF(correctionLeft NOT REGEXP '[0-9]' AND correctionLeft != '', '', correctionLeft + 0),
    correctionRight   = IF(correctionRight NOT REGEXP '[0-9]' AND correctionRight != '', '', correctionRight + 0),
    axialLengthLeft   = IF(axialLengthLeft NOT REGEXP '[0-9]' AND axialLengthLeft != '', '', axialLengthLeft + 0),
    axialLengthRight  = IF(axialLengthRight NOT REGEXP '[0-9]' AND axialLengthRight != '', '', axialLengthRight + 0),
    korneaRadiusLeft  = IF(korneaRadiusLeft NOT REGEXP '[0-9]' AND korneaRadiusLeft != '', '', korneaRadiusLeft + 0),
    korneaRadiusRight = IF(korneaRadiusRight NOT REGEXP '[0-9]' AND korneaRadiusRight != '', '', korneaRadiusRight + 0)
WHERE (`nakedEyeLeft` REGEXP '[^0-9.]' = 1 AND `nakedEyeLeft` != '')
   OR (`nakedEyeRight` REGEXP '[^0-9.]' = 1 AND `nakedEyeRight` != '')
   OR (`correctionLeft` REGEXP '[^0-9.]' = 1 AND `correctionLeft` != '')
   OR (`correctionRight` REGEXP '[^0-9.]' = 1 AND `correctionRight` != '')
   OR (`axialLengthLeft` REGEXP '[^0-9.]' = 1 AND `axialLengthLeft` != '')
   OR (`axialLengthRight` REGEXP '[^0-9.]' = 1 AND `axialLengthRight` != '')
   OR (`korneaRadiusLeft` REGEXP '[^0-9.]' = 1 AND `korneaRadiusLeft` != '')
   OR (`korneaRadiusRight` REGEXP '[^0-9.]' = 1 AND `korneaRadiusRight` != '');

Statistical indicators intermediate table

When actually importing into the statistical indicator table, empty data still has to be excluded and other tables joined for supplementary fields. To reduce the impact on the indicator table, an intermediate table with the same structure was created, with its AUTO_INCREMENT set 10,000 above the target table's. The data from the attribute intermediate table is imported into this indicator intermediate table, and then a direct INSERT ... SELECT copies it over, which is very fast (a sketch of this staging flow follows). Admittedly this step is a bit of an overreaction, but to avoid fluctuations online, it is better to be cautious.
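A minimal sketch of that staging flow, under stated assumptions: the table names (indicator for the statistics table, indicatortmp for its stand-in) and the counter value are illustrative; the article only says the intermediate table mirrors the target and starts its AUTO_INCREMENT 10,000 above it.

-- Clone the structure, then push the ID counter 10,000 past the live table's,
-- so the final bulk copy cannot collide with rows written in the meantime
CREATE TABLE indicatortmp LIKE indicator;
ALTER TABLE indicatortmp AUTO_INCREMENT = 1010000;  -- hypothetical: live table is around 1,000,000

-- Slow work happens against the stand-in: exclude empties, join supplementary tables
-- INSERT INTO indicatortmp (...) SELECT ... FROM propertytmp ... WHERE ...;  (schematic)

-- Fast final step: one bulk copy into the live table
INSERT INTO indicator SELECT * FROM indicatortmp;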
Summary

This is a simple record of a data-migration experience. There is nothing here about index optimization or SQL optimization; the point is simply that performance, and the impact on users, deserve attention even in a one-off migration. This is the end of this article about extracting fields from JSON in MySQL and dumping them as numbers.