We are not discussing PHP, JSP, or .NET here; we look at the problem from an architectural perspective. The implementation language is not the issue: a language's strengths show in how it is applied, not in the language itself. Whatever language you choose, architecture is something you must face.

1. Handling massive data

For relatively small sites the data volume is modest, and plain SELECT and UPDATE statements solve most of the problems we face; the load is light enough that adding a few indexes is usually all it takes. For a large website, however, daily data volume can reach the millions. A poorly designed many-to-many relationship causes no trouble early on, but as users grow the data volume grows exponentially, and at that point even a SELECT or UPDATE on a single table (never mind a multi-table join) becomes very expensive.

2. Concurrent data processing

At some point every Web 2.0 CTO reaches for the same magic sword: the cache. But caching itself becomes a serious problem under high concurrency. The cache is shared globally across the whole application, and when two or more requests try to update the same cache entry at the same time, the application can fall over. A sound concurrency strategy and caching strategy are needed here. There is also database deadlock: we rarely notice it in normal operation, but under high concurrency the probability of deadlock rises sharply, and disk cache contention becomes a major issue as well.

3. File storage

For Web 2.0 sites that support file uploads, even as we welcome ever-larger hard disks, we should think harder about how files are stored and effectively indexed. A common solution is to store files by date and type.
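The "store by date and type" layout just mentioned can be sketched as a path-generation function. This is a minimal illustration, not any particular site's scheme; the upload root and the idea of embedding a content hash in the filename are my own assumptions.

```python
# Sketch of storing uploads by type and date (point 3). The upload
# root and hash-based filename are assumptions for illustration.
import hashlib
import os
from datetime import date

UPLOAD_ROOT = "/data/uploads"  # hypothetical mount point

def storage_path(filename: str, content: bytes, when: date) -> str:
    """Place a file under type/year/month/day, with a content hash
    in the name so duplicate uploads collapse to a single file."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower() or "bin"
    digest = hashlib.sha1(content).hexdigest()
    return os.path.join(
        UPLOAD_ROOT, ext,
        f"{when.year:04d}", f"{when.month:02d}", f"{when.day:02d}",
        f"{digest}.{ext}",
    )

print(storage_path("cat.JPG", b"...jpeg bytes...", date(2009, 5, 1)))
# e.g. /data/uploads/jpg/2009/05/01/<sha1>.jpg
```

Fanning files out across date directories keeps any single directory small, which matters once a disk holds millions of entries.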
However, once the file volume becomes massive (say, a single disk holding 500 GB of small, scattered files), disk I/O becomes a huge problem during maintenance and everyday use. Even if your bandwidth is sufficient, your disks may not keep up, and with uploads in the mix they are easily overwhelmed. RAID and dedicated storage servers may solve the immediate problem, but access from different regions remains: our server may sit in Beijing, so how do we guarantee access speed from Yunnan, Xinjiang, or Tibet? And if we go distributed, how should we plan the file index and architecture? We have to admit that file storage is a genuinely hard problem.

4. Handling data relationships

It is easy to design a database that conforms to third normal form, full of many-to-many relationships, perhaps using GUIDs in place of IDENTITY columns. But in a Web 2.0 world saturated with many-to-many relationships, third normal form is the first thing that should be abandoned: multi-table joins must be cut to a minimum.

5. Data indexing

As everyone knows, an index is the most convenient, cheapest, and easiest way to improve query efficiency. Under heavy UPDATE load, however, the cost of maintaining indexes during updates and deletes can be unimaginably high. I have seen an update on a clustered index take ten minutes to complete, which is essentially unbearable for a live site. Indexes and updates are natural enemies. Points 1, 4, and 5 are the issues we must weigh when designing the architecture, and they may well consume the most time.

6. Distributed processing

For Web 2.0 websites, with their high interactivity, a CDN delivers essentially zero benefit: content changes in real time, so routine CDN caching does not apply. To guarantee access speed everywhere, we face a huge problem: how to synchronize and update data effectively.
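The read/write trade-off in point 5 can be demonstrated on a small scale with SQLite as a stand-in for a production database; table and index names here are invented for the example.

```python
# Sketch of the index trade-off from point 5, using SQLite as a
# stand-in. Table, column, and index names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)")
conn.executemany("INSERT INTO posts (author_id, body) VALUES (?, ?)",
                 [(i % 100, "x") for i in range(10_000)])

# The index makes lookups cheap...
conn.execute("CREATE INDEX idx_posts_author ON posts (author_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE author_id = 7").fetchall()
print(plan[0][-1])  # the plan shows the query searching via the index

# ...but every UPDATE or DELETE that touches author_id must now also
# maintain idx_posts_author, which is where write-heavy sites pay.
conn.execute("UPDATE posts SET author_id = 5 WHERE author_id = 7")
conn.commit()
```

At this toy scale the maintenance cost is invisible; the article's point is that on millions of rows with constant updates, that same bookkeeping can dominate.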
Real-time communication between servers in different locations is an issue that must be considered.

7. The pros and cons of AJAX

AJAX both makes us and breaks us. It has become the mainstream trend, and suddenly POST and GET over XMLHTTP feel effortless: the client gets or posts data to the server, and the server returns data on request. That is a normal AJAX exchange. But run an AJAX conversation through a packet-capture tool and every request and response is laid bare. For AJAX endpoints that trigger heavy computation, an attacker can replay those captured requests with a simple packet sender and easily bring down a web server.

8. Data security

Under HTTP, packets travel in plain text. One might answer that we can encrypt, but a homegrown scheme can be as good as plain text to a determined attacker (QQ's encryption, for example, has famously been reverse-engineered and reimplemented). When your traffic is small, nobody bothers you; as it grows, the so-called plug-ins and bulk-messaging bots follow one after another (the early waves of QQ spam showed how this plays out). We might answer, with some satisfaction, that stronger server-side checks or even HTTPS can handle it. Note, though, that these measures carry a heavy cost in database, I/O, and CPU resources, and against bulk messaging they are largely ineffective; bulk messaging has already been demonstrated against Baidu Space and QQ Space, and anyone willing to try will find it is not that difficult.

9. Data synchronization and clustering

When one of our database servers is overwhelmed, we need database-level load balancing and clustering, and this may be the most troubling problem of all.
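The plaintext-transmission and replay risks in points 7 and 8 are commonly mitigated by signing each request with a server-held secret. The sketch below is a generic HMAC scheme, not any specific site's protocol; the parameter names and secret are made up.

```python
# Generic request-signing sketch for points 7-8. The secret and
# parameter names are assumptions; this is not any real site's scheme.
import hashlib
import hmac

SECRET = b"server-side-secret"  # never shipped to untrusted clients

def sign(params: dict) -> str:
    # Canonicalize: sort keys so both sides hash the same byte string.
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return hmac.new(SECRET, canonical.encode(), hashlib.sha256).hexdigest()

def verify(params: dict, signature: str) -> bool:
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(sign(params), signature)

msg = {"user": "42", "action": "post", "body": "hello"}
sig = sign(msg)
print(verify(msg, sig))               # True: untouched request passes
print(verify(dict(msg, body="spam"), sig))  # False: tampering is caught
```

Signing stops tampering and naive replay-with-modification, but as the article notes, verifying every request still costs CPU, so it is a trade, not a free fix.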
Depending on the database design, transmitting data over the network can introduce delay. This is an ugly but unavoidable fact, so we need other means to keep the system effectively consistent within a delay of a few seconds or even a few minutes, through techniques such as data hashing, segmentation, and content partitioning.

10. Data sharing channels and the OpenAPI trend

OpenAPI has become an inevitable trend. From Google, Facebook, and MySpace to domestic social networks, everyone is considering it: it retains users more effectively, stimulates more of their interest, and lets more outside developers do genuinely useful work for you. An effective data-sharing platform and open-data platform thus become indispensable, and guaranteeing the security and performance of data exposed through open interfaces is another issue we must take seriously.
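Point 9 mentions "data hashing, segmentation" as a way to spread load. A minimal version of that idea is hash-based sharding: route each key deterministically to one database. The shard names below are invented for illustration.

```python
# Minimal hash-based sharding sketch for point 9. Shard names are
# hypothetical; any stable hash function would do.
import hashlib

SHARDS = ["db-beijing-1", "db-beijing-2", "db-shanghai-1", "db-shanghai-2"]

def shard_for(key: str) -> str:
    """Route every key deterministically to one shard, so reads and
    writes for the same user always hit the same database."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# The same key always lands on the same shard:
print(shard_for("user:1001") == shard_for("user:1001"))  # True
```

One known weakness of plain modulo sharding: adding or removing a shard remaps almost every key, forcing a mass data migration. Consistent hashing is the usual refinement when the shard set is expected to change.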