10 issues that must be considered when designing and building large-scale website architecture

We are not discussing PHP, JSP, or .NET here; we are looking at the problem from an architectural perspective, where the implementation language is not the issue. A language's value lies in how it is applied, not in arguments about which one is better. Whatever language you choose, architecture is something you must face.

1. Processing of massive data

As we all know, for relatively small sites the data volume is modest: simple SELECT and UPDATE statements solve most problems, the load itself is light, and adding a few indexes usually takes care of the rest. For a large website, however, millions of rows may arrive every day. A poorly designed many-to-many relationship causes no trouble early on, but as users multiply the data volume grows exponentially, and the cost of a SELECT or UPDATE on a single such table (let alone a multi-table join) becomes very high.
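As a rough illustration (the `follows` and `users` tables, column names, and the DB-API connection `conn` are all hypothetical), one common way to tame a many-to-many query at scale is to replace the join with two narrow, index-friendly queries, each of which is also easy to cache:

```python
# Sketch only: 'conn' is assumed to be any DB-API 2.0 connection;
# the 'follows' / 'users' tables and their columns are hypothetical.

def followed_users_join(conn, user_id, limit=20):
    # Single join: fine on a small site, expensive once both tables
    # hold millions of rows and the join work spills to disk.
    cur = conn.cursor()
    cur.execute(
        """SELECT u.id, u.nickname
             FROM follows f
             JOIN users u ON u.id = f.followee_id
            WHERE f.follower_id = %s
            LIMIT %s""",
        (user_id, limit),
    )
    return cur.fetchall()

def followed_users_split(conn, user_id, limit=20):
    # Two narrow queries instead of one join: fetch ids from the
    # relationship table, then fetch user rows by primary key.
    # Each step is index-only / primary-key work and can be cached.
    cur = conn.cursor()
    cur.execute(
        "SELECT followee_id FROM follows WHERE follower_id = %s LIMIT %s",
        (user_id, limit),
    )
    ids = [row[0] for row in cur.fetchall()]
    if not ids:
        return []
    placeholders = ",".join(["%s"] * len(ids))
    cur.execute(
        f"SELECT id, nickname FROM users WHERE id IN ({placeholders})",
        ids,
    )
    return cur.fetchall()
```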

2. Data concurrency processing

At some point, every Web 2.0 CTO reaches for the same magic sword: the cache. But caching is itself a big problem under high concurrency. The cache is shared globally across the whole application, and when two or more requests try to update the same cached data at the same time, the application can be left in an inconsistent state or crash. A sound data-concurrency strategy and caching strategy are needed here.
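A minimal sketch of one such strategy, assuming a single-process, in-memory cache (the loader function `load_from_db` is a hypothetical placeholder for the expensive query): serialise rebuilds per key so concurrent requests cannot all rebuild, or corrupt, the same entry.

```python
import threading
import time

_cache = {}                  # key -> (value, expires_at)
_locks = {}                  # key -> threading.Lock
_locks_guard = threading.Lock()

def _lock_for(key):
    # One lock per cache key, created lazily under a global guard.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_cached(key, load_from_db, ttl=60):
    entry = _cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                      # fast path: value still fresh
    with _lock_for(key):                     # only one thread rebuilds
        entry = _cache.get(key)              # re-check after acquiring
        if entry and entry[1] > time.time():
            return entry[0]
        value = load_from_db(key)            # expensive work, done once
        _cache[key] = (value, time.time() + ttl)
        return value
```

The same idea (a per-key mutex plus a re-check after acquiring it) carries over to shared caches such as memcached, though there the lock itself has to live in the shared store.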

In addition, there is the problem of database deadlocks. We may never notice them under normal load, but under high concurrency the probability of deadlock becomes very high, and the disk cache becomes another big problem.
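One common way of living with deadlocks, sketched below under the assumption of a generic DB-API connection (the `DeadlockError` class stands in for whatever driver-specific exception or error code your database raises): keep transactions short, touch rows in a fixed order, and retry with a small backoff when a deadlock is reported.

```python
import random
import time

class DeadlockError(Exception):
    """Placeholder for the driver-specific deadlock/serialization error."""

def run_with_retry(conn, work, attempts=3):
    # 'work' performs all statements of one transaction, in a fixed order.
    for attempt in range(attempts):
        try:
            work(conn)
            conn.commit()
            return True
        except DeadlockError:
            conn.rollback()
            # brief randomized backoff before retrying the whole transaction
            time.sleep(random.uniform(0.05, 0.2) * (attempt + 1))
    return False
```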

3. File storage issues

For 2.0 sites that support file uploads, while we can be thankful that hard disks keep getting bigger, we should think harder about how files are stored and effectively indexed. A common solution is to store files by date and type. But when the volume is massive, if a single disk holds 500 GB of scattered small files, disk I/O becomes a huge problem during maintenance and normal use. Even if your bandwidth is sufficient, the disk may not keep up, and once uploads are added on top it is easily overwhelmed.
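A small sketch of the "by date and type" layout, extended with hashed subdirectories so that no single directory ends up holding millions of files (the mount point `ROOT` and the category names are hypothetical):

```python
import hashlib
import os

ROOT = "/data/uploads"      # hypothetical mount point

def storage_path(file_id: str, kind: str, day: str) -> str:
    # kind: e.g. "image" / "audio"; day: e.g. "2009-05-01"
    digest = hashlib.md5(file_id.encode("utf-8")).hexdigest()
    # two levels of 2-hex-character directories -> 65,536 buckets per day/type
    return os.path.join(ROOT, kind, day, digest[:2], digest[2:4], file_id)

def save(file_id: str, kind: str, day: str, data: bytes) -> str:
    path = storage_path(file_id, kind, day)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path      # record this path (or its components) in the file index
```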

Perhaps RAID and dedicated storage servers can solve the immediate problem, but there is still the question of access from different locations. Our servers may be in Beijing, so how do we guarantee access speed from Yunnan or from Xinjiang and Tibet? And if we move to distributed storage, how should the file index and overall architecture be planned?
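One way to plan such an index, shown here as a minimal consistent-hashing sketch (node names are hypothetical): each file id maps deterministically to a storage node, and adding or removing a node only moves a fraction of the files.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, replicas=100):
        self._ring = []                       # sorted (hash, node) points
        for node in nodes:
            for i in range(replicas):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (h, node))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, file_id: str) -> str:
        h = self._hash(file_id)
        idx = bisect.bisect(self._ring, (h, ""))
        if idx == len(self._ring):            # wrap around the ring
            idx = 0
        return self._ring[idx][1]

# Hypothetical storage nodes in two regions:
ring = HashRing(["store-beijing-1", "store-beijing-2", "store-guangzhou-1"])
print(ring.node_for("avatar_123456.jpg"))
```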

So we have to admit that file storage is a very difficult problem.

4. Processing of data relationships

We can easily design a database that conforms to third normal form, full of many-to-many relationships, and even use GUIDs in place of IDENTITY columns. But in a 2.0 world saturated with many-to-many relationships, third normal form is the first thing that should be abandoned: multi-table joins must be reduced to a minimum.
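A minimal sketch of what "abandoning third normal form" often means in practice, using a hypothetical forum schema: keep a redundant `reply_count` column on the topic row and maintain it on write, instead of running an aggregate join on every read.

```python
# Sketch only: 'conn' is any DB-API connection; 'topics' and 'replies'
# are hypothetical tables.

def add_reply(conn, topic_id, user_id, body):
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO replies (topic_id, user_id, body) VALUES (%s, %s, %s)",
        (topic_id, user_id, body),
    )
    # Redundant counter deliberately breaks 3NF: one cheap UPDATE on write
    # saves a COUNT/JOIN on every page view.
    cur.execute(
        "UPDATE topics SET reply_count = reply_count + 1 WHERE id = %s",
        (topic_id,),
    )
    conn.commit()
```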

5. Data indexing issues

As we all know, an index is the most convenient, cheapest, and easiest way to improve query performance. But under an update-heavy workload, the cost of maintaining indexes during UPDATE and DELETE becomes unimaginably high. I have seen a single update against a clustered index take 10 minutes to complete, which is basically unbearable for a live site.
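One mitigation, sketched here under the assumption of MySQL-style `UPDATE ... LIMIT` and a hypothetical `posts` table: break a huge update into small batches so index maintenance and lock time per transaction stay short.

```python
import time

def batched_update(conn, batch=5000, pause=0.1):
    # Repeatedly update at most 'batch' rows per transaction instead of
    # rewriting index entries for millions of rows in one long statement.
    cur = conn.cursor()
    while True:
        cur.execute(
            """UPDATE posts
                  SET score = score * 0.9
                WHERE needs_decay = 1
                LIMIT %s""",
            (batch,),
        )
        conn.commit()
        if cur.rowcount < batch:     # nothing (or little) left to update
            break
        time.sleep(pause)            # let other transactions get a turn
```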

Indexing and updating are natural enemies. Issues 1, 4, and 5 (data volume, data relationships, and indexing) are ones we cannot avoid when designing the architecture, and they may also be the ones that consume the most time.

6. Distributed processing

For 2.0 websites, with their high level of interactivity, a CDN achieves essentially nothing: content is updated in real time, so it cannot be handled in the usual static way. To guarantee access speed in every region we face a huge problem: how to synchronize and update data effectively. Real-time communication between servers in different locations is an issue that must be considered.
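A toy sketch of one propagation scheme (an illustration, not the article's prescribed design): writes go to the primary, change events are queued, and a worker replays them to regional servers. The `send_to_region` call and the region names are hypothetical placeholders for a real RPC/HTTP push.

```python
import json
import queue
import threading

change_log = queue.Queue()
REGIONS = ["beijing", "yunnan", "xinjiang"]

def record_change(table, pk, payload):
    # Called on the primary after every successful write.
    change_log.put({"table": table, "pk": pk, "data": payload})

def send_to_region(region, event):
    # Placeholder for an HTTP/RPC call to the regional server.
    print(region, json.dumps(event, ensure_ascii=False))

def replication_worker():
    while True:
        event = change_log.get()
        for region in REGIONS:
            send_to_region(region, event)    # retries/acks omitted
        change_log.task_done()

threading.Thread(target=replication_worker, daemon=True).start()
record_change("topics", 42, {"title": "hello"})
change_log.join()
```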

7. Analysis of the pros and cons of Ajax

AJAX is both the making and the undoing of these sites. It has become the mainstream, and suddenly POST and GET built on XMLHTTP seem so easy: the client gets or posts data to the server, and the server returns data once it receives the request. That is a normal AJAX exchange. But open a packet-capture tool and the returned data and how it is processed are visible at a glance; for AJAX requests that trigger heavy computation, anyone can build a simple request-replay tool that easily kills a web server.
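One common mitigation (an illustration added here, not something the original text prescribes) is to throttle expensive AJAX endpoints per client. The sketch below is purely in-memory and per-process, so it is only a starting point for a multi-server setup; the window and limit values are arbitrary.

```python
import time
from collections import defaultdict, deque

WINDOW = 60          # seconds
MAX_CALLS = 30       # allowed calls per client per window

_calls = defaultdict(deque)   # client_ip -> timestamps of recent calls

def allow(client_ip: str) -> bool:
    now = time.time()
    q = _calls[client_ip]
    while q and now - q[0] > WINDOW:
        q.popleft()                  # drop calls outside the window
    if len(q) >= MAX_CALLS:
        return False                 # reject or queue the request
    q.append(now)
    return True
```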

8. Analysis of data security

Over plain HTTP, data packets travel in clear text. Perhaps we can say we will encrypt, but for issue 7 (AJAX) the encryption routine itself is effectively exposed (QQ, for example, could easily have its encryption analysed and an identical encrypt/decrypt routine written). When your site's traffic is small nobody will bother you, but as it grows, the bots and the mass-messaging tools follow one after another (the early mass-messaging tools for QQ already showed the pattern). Perhaps we can answer, smugly, that we will add higher-level checks or even move to HTTPS; note, though, that these measures carry a huge cost in database, I/O, and CPU, and against some mass-messaging attacks they are essentially hopeless. The author has already managed bulk posting on Baidu Space and QQ Space; anyone willing to try will find it is not that hard.
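As one example of a "higher-level check" short of full HTTPS (an illustration with hypothetical parameter names, not the article's own method), requests can be signed with a per-user secret and a timestamp so that captured packets cannot simply be altered or replayed; the cost point above still applies, since every request now spends extra CPU on verification.

```python
import hashlib
import hmac
import time

def sign(params: dict, secret: str) -> dict:
    # Add a timestamp, then an HMAC over the sorted parameters.
    params = dict(params, ts=str(int(time.time())))
    message = "&".join(f"{k}={params[k]}" for k in sorted(params))
    params["sig"] = hmac.new(secret.encode(), message.encode(),
                             hashlib.sha256).hexdigest()
    return params

def verify(params: dict, secret: str, max_age=300) -> bool:
    sig = params.pop("sig", "")
    if abs(time.time() - int(params.get("ts", "0"))) > max_age:
        return False                         # stale or missing timestamp
    message = "&".join(f"{k}={params[k]}" for k in sorted(params))
    expected = hmac.new(secret.encode(), message.encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```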

9. Data synchronization and cluster processing issues

When a single database server can no longer cope, we have to move to database-level load balancing and clustering. This may be the most troubling problem of all: depending on the database design, shipping data across the network introduces replication delay. That delay is unpleasant but unavoidable, so we need other means to keep the interaction effective within a lag of a few seconds or even a few minutes, for example data hashing, segmentation (sharding), and content-level handling.
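A minimal sketch of the "data hashing / segmentation" idea, assuming hypothetical shard hosts: route each user's rows to one of several database shards by user id, so no single server carries the whole write load.

```python
SHARDS = [
    {"host": "db0.internal", "db": "site_shard_0"},
    {"host": "db1.internal", "db": "site_shard_1"},
    {"host": "db2.internal", "db": "site_shard_2"},
    {"host": "db3.internal", "db": "site_shard_3"},
]

def shard_for(user_id: int) -> dict:
    # Simple modulo routing; resharding later requires moving data,
    # which is why some sites prefer consistent hashing instead.
    return SHARDS[user_id % len(SHARDS)]

# Example: user 10007 lives on shard 10007 % 4 == 3
print(shard_for(10007)["host"])
```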

10. Data sharing channels and OPENAPI trends

OpenAPI has become an inevitable trend. From Google, Facebook, and MySpace to domestic social networks, everyone is considering it: it retains users more effectively, stimulates more of their interest, and lets more people help you build the most useful features. At that point an effective data-sharing platform and an open data platform become indispensable, and ensuring the security and performance of data exposed through open interfaces is yet another issue we must consider seriously.
