Being an operations and maintenance (O&M) engineer is a hard job in the early stages. You may spend your days repairing computers, crimping network cables, and moving machines, which can make you feel you have no status. Your time is fragmented by all kinds of trivial matters, it is hard to show your personal value, and you may gradually grow confused about the industry and feel it has no future. These boring tasks really can wear you down. From a technical point of view, though, they are basic skills, and they will quietly help your later O&M work. I have been through this myself, so I understand it deeply. During this period, keep a positive attitude and keep learning; I believe the effort will repay you one day.

Okay, let's get to the point. Based on my years of O&M experience, here is a learning path for becoming a senior operations and maintenance engineer.

Primary

1. Linux basics
In the initial stage you need to be familiar with Linux/Windows operating system installation, the directory structure, the boot process, and so on.

2. System management
Focus on Linux. In production, most work is done at the command line, so you need to master dozens of commonly used management commands, covering user management, disk partitioning, package management, file permissions, text processing, process management, performance analysis tools, and more.

3. Network basics
Be familiar with the OSI and TCP/IP models, and know the basic concepts and working principles of switches and routers.

4. Shell scripting basics
Master the basic syntax of the shell and be able to write simple scripts.
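To close out the primary stage, here is a minimal sketch of those shell basics: variables, command substitution, a condition, and a loop. The mount points and the 80% threshold are just example values.

```shell
#!/bin/bash
# A minimal shell-basics example: warn when a filesystem is nearly full.

host=$(uname -n)     # command substitution
threshold=80         # a plain variable (example value)

echo "Disk report for $host"

# Loop over a couple of mount points and check their usage.
for mp in / /tmp; do
    # df -P gives a stable one-line-per-filesystem format;
    # awk grabs the "Use%" column and strips the % sign.
    usage=$(df -P "$mp" | awk 'NR==2 {gsub("%",""); print $5}')
    if [ "$usage" -ge "$threshold" ]; then
        echo "$mp is ${usage}% full - investigate!"
    else
        echo "$mp is at ${usage}% - OK"
    fi
done
```

Run it with `bash check.sh`; once commands like df and awk feel natural, scripts like this become the building blocks of the automation discussed later.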
Intermediate

1. Network services
You must be able to deploy the most commonly used network services: vsftp, nfs, samba, bind, dhcp, and so on. A code version management system is indispensable; learn the mainstream choices, SVN and Git, well enough to deploy and use them. Data is frequently transferred between servers, so you need rsync and scp, and for data synchronization, inotify/sersync. Repetitive tasks can be written into scripts and run on a schedule, so you also need to be able to configure the Linux scheduled-task service, crond.

2. Web services
Almost every company has a website, and making it run requires building a web service platform. If the site is developed in PHP, you usually build a LAMP or LNMP platform. These acronyms name a combination of technologies: you need to be able to deploy Apache, Nginx, MySQL, and PHP. If the site is developed in Java, Tomcat usually runs the project. To increase access speed, Nginx can act as a reverse proxy in front of Tomcat: Nginx serves static pages and Tomcat handles dynamic pages, achieving dynamic/static separation. It is not just about deployment, either; you also need to know how the HTTP protocol works and how to do simple performance tuning.

3. Database
The database of choice is MySQL, the most widely used open-source database in the world, and definitely worth learning. You need simple SQL statements, user management, the common storage engines, and database backup and recovery. To go deeper, learn master-slave replication, performance optimization, and the mainstream cluster solutions such as MHA and MGR. NoSQL is popular enough to be indispensable too; Redis and MongoDB will cover you.

4. Security
Security is very important. Don't wait until the system is hacked to put security policies in place; by then it is too late.
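Back to item 1 for a moment: the scheduled tasks mentioned there end up as crond entries. A sketch of what a crontab might contain (the paths, the backup host, and the script name are made-up examples):

```
# m  h  dom mon dow  command
# Mirror /data to the backup host every night at 02:30.
# -a preserves permissions and timestamps; --delete makes the
# destination an exact mirror of the source.
30 2 *   *   *      rsync -a --delete /data/ backup@backuphost:/backup/data/

# Rotate application logs every Sunday at 03:00 (example script).
0  3 *   *   0      /opt/scripts/rotate_logs.sh
```

Edit entries like these with `crontab -e`, and check what is scheduled with `crontab -l`.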
Therefore, as soon as a server goes online, implement a security access-control policy: use iptables to allow access only from trusted source IPs, and shut down unused services and ports. You must also know the common attack types, otherwise you cannot prescribe the right remedy; for example CC, DDoS, and ARP attacks.

5. Monitoring system
Monitoring is essential; it is your lifeline for detecting and tracing problems in time. You can learn the mainstream open-source monitoring system Zabbix, which is rich in features and meets most basic monitoring needs. Monitoring points include basic server resources, interface status, service performance, PV/UV, logs, and so on. You can also build a dashboard, for example with Grafana, to display key real-time data; it looks very cool.

6. Advanced shell scripting
Shell scripts are Linux's power tool for automating tasks, and you must become proficient at writing them, so go further: functions, arrays, signals, sending email, and so on. The "three musketeers" of text processing (grep, sed, awk) are excellent; you can rely on them for almost any text processing under Linux.

7. Python development basics
Shell scripts only take you so far. For more complex tasks, such as calling APIs or multi-process programs, you need a high-level language. Python is the most widely used language in the O&M field, and it is simple and easy to use; you can't go wrong learning it. At this stage you only need the basics: syntax, file objects, functions, iterable objects, exception handling, sending email, database programming, and so on.

Advanced

1. Web static caching
Users often complain that the website is slow even though server resources are still plentiful. Slow access is not necessarily caused by saturated server resources.
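Before going on, the three musketeers from item 6 deserve a quick demonstration. The access-log line below is fabricated for the example:

```shell
# One fabricated access-log line to play with.
log='10.0.0.8 - - [12/Mar/2024:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 512'

# grep: filter - did the request get a 200 response?
echo "$log" | grep -q ' 200 ' && echo "found a 200 response"

# sed: transform - mask the client IP before sharing the log.
masked=$(echo "$log" | sed -E 's/^[0-9.]+/x.x.x.x/')
echo "$masked"

# awk: extract - pull out the request path (7th whitespace field).
path=$(echo "$log" | awk '{print $7}')
echo "request path: $path"
```

The same three commands, pointed at a real log file instead of an echo, are often all you need for day-to-day log spelunking.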
There are many contributing factors, such as the network and the number of forwarding layers. On the network side there are cross-ISP (north-south) connectivity problems, and traffic between those networks will be slow. A CDN solves this while also caching static pages, intercepting requests as close to the user as possible and reducing backend requests and response time. If you don't use a CDN, you can still use cache services such as Squid, Varnish, or Nginx to cache static pages at the traffic entrance.

2. Clustering
A single server has limited resources and cannot sustain high traffic. The key technology here is the load balancer, which horizontally scales multiple web servers to serve traffic simultaneously and multiplies capacity. The mainstream open-source load balancers are LVS, HAProxy, and Nginx; get familiar with one or two of them. With the web tier solved, the database becomes the critical bottleneck, so use a cluster there too. For example, give MySQL a master-slave architecture with read/write splitting on top: the master handles writes, multiple slaves handle reads, and the slaves scale horizontally. Put a layer-4 load balancer in front and it can carry tens of millions of PVs. You also need high-availability software to avoid single points of failure; the mainstream choices are Keepalived and Heartbeat. And what about all the images on the website? If NFS shared storage cannot keep up and processing is slow, a distributed file system handles it easily: parallel processing, no single point, high reliability, high performance. The mainstream ones are FastDFS, MFS, HDFS, Ceph, and GFS. In the early stages I suggest learning FastDFS, which meets small and medium-sized needs.
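The Nginx side of the load balancing and dynamic/static separation described above might be sketched like this; the upstream name, backend addresses, and paths are all examples, not a production recipe:

```nginx
# Pool of Tomcat backends; Nginx spreads requests across them.
upstream tomcat_pool {
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;
}

server {
    listen 80;

    # Static pages: served directly by Nginx, cached by the browser.
    location ~* \.(html|css|js|png|jpg)$ {
        root    /data/www;
        expires 7d;
    }

    # Everything else is dynamic: proxy it to the Tomcat pool.
    location / {
        proxy_pass       http://tomcat_pool;
        proxy_set_header Host       $host;
        proxy_set_header X-Real-IP  $remote_addr;
    }
}
```

Adding or removing a `server` line in the upstream block is how the web tier scales horizontally.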
3. Virtualization
Hardware server utilization is often very low, which is quite a waste. Idle servers can be virtualized into many virtual machines, each a complete operating system, greatly improving resource utilization. I recommend learning the open-source KVM + OpenStack cloud platform. Virtual machines are fine as a base platform, but they are too heavyweight for elastic scaling of application services: they take minutes to start, and the images are so large that rapid scaling is difficult. That is easy to fix: use containers, whose main features are rapid deployment and environment isolation. Package a service into an image and you can create hundreds of containers in minutes. The mainstream container technology is, of course, Docker. A single Docker host rarely meets production needs on its own; instead, deploy Kubernetes or Swarm to manage containers as a cluster, forming one large, centrally managed resource pool that gives the infrastructure strong support.

4. Automation
Repetitive work does not improve efficiency, nor does it show your value. Standardize all O&M work first: unify environment versions, directory structures, operating systems, and so on. Only on a standardized basis does automation become convenient: a complex task completed with a mouse click or a few commands. How cool is that! So automate every operation you can, reducing human error and improving efficiency. The mainstream centralized server-management tools are Ansible and SaltStack; either one will do. For continuous integration, use Jenkins.

5. Advanced Python development
You can take Python further and master object-oriented programming.
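To make item 4 concrete, here is a minimal Ansible playbook sketch. The host group "webservers", the package, and the file paths are assumptions for illustration (and `yum` assumes a CentOS-style host):

```yaml
# site.yml - ensure Nginx is installed, configured, and running
# on every host in the "webservers" inventory group.
- hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      yum:
        name: nginx
        state: present

    - name: Deploy the site configuration
      copy:
        src: files/site.conf
        dest: /etc/nginx/conf.d/site.conf
      notify: Reload nginx

    - name: Make sure nginx is enabled and running
      service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Reload nginx
      service:
        name: nginx
        state: reloaded
```

One `ansible-playbook site.yml` then brings ten or a hundred machines to the same standardized state, which is exactly the point of the standardization-first argument above.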
It is also best to learn a web framework for building websites, such as Django or Flask, mainly to develop O&M management systems: turn your complex processes into a platform, then integrate your centralized-management tools into a unified O&M management platform.

6. Log analysis system
Logs are also very important. Analyzing them regularly can uncover potential risks and extract valuable information. Learn to deploy and use the open-source logging stack ELK, and give developers a way to view their logs.

7. Performance optimization
Deployment alone is not enough. Performance optimization maximizes a service's capacity. It is also quite difficult, and it is one of the keys to a high salary, so for the sake of the money, put in the effort! Think in terms of four dimensions: the hardware layer, the operating system layer, the software layer, and the architecture layer.

Mindset

1. Persistence
Learning is a very long process, and it is something each of us must keep at for a lifetime. Persistence is valuable, persistence is hard, and success lies in persistence.

2. Goals
Without a goal it is just work; without quantification it is not a goal. Set a goal for each stage. For example, first set a small, achievable goal: earn 100 million!

3. Sharing
Learn to share. The value of technology lies in effectively passing knowledge on so that more people can know it. If everyone contributed a little, imagine the result. If you are heading in the right direction, you need not fear the distance.

Ten pieces of Linux common sense

1. GNU and GPL
The GNU Project is a collaborative free-software project publicly launched by Richard Stallman on September 27, 1983. Its goal is to create a completely free operating system. GNU is also known as the Free Software Project.
The GPL is the GNU General Public License, the "copyleft" (anti-copyright) idea and one of the GNU licenses. Its purpose is to protect the freedom to use, copy, study, modify, and redistribute GNU software, and it requires that the software be released with its source code. The GNU system combined with the Linux kernel forms a complete operating system: a Linux-based GNU system, usually called "GNU/Linux", or simply Linux.

2. Linux distributions
A typical Linux distribution includes the Linux kernel, some GNU libraries and tools, a command-line shell, the graphical X Window System with a desktop environment such as KDE or GNOME, and thousands of applications ranging from office suites and compilers to text editors and scientific tools. Mainstream distributions: Red Hat Enterprise Linux, CentOS, SUSE, Ubuntu, Debian, Fedora, Gentoo.

3. Unix and Linux
Linux is modeled on Unix and is Unix-like. Unix supports multiple users, multitasking, multithreading, and multiple CPU architectures. Linux inherits Unix's network-centric design philosophy and is a stable, multi-user network operating system.

4. Swap partition
The swap partition, also called the swap area, is used when physical memory is insufficient: the system moves data belonging to programs that have not been active for a long time out to this area of the hard disk, freeing memory for currently running programs. When a swapped-out program runs again, its saved data is restored from swap back into memory. Swap space should generally be greater than or equal to the size of physical memory, with a minimum of no less than 64 MB and a maximum of about twice physical memory.
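The sizing rule above is easy to apply with a few lines of shell. The first part runs anywhere; the commands to actually create a swap file are shown in comments only, since they require root (and the 2 GB size is just an example):

```shell
# Read physical memory from /proc/meminfo and apply the rule of thumb:
# swap >= RAM, and at most about 2x RAM.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_mb=$((mem_kb / 1024))
min_swap=$mem_mb
max_swap=$((mem_mb * 2))
echo "RAM: ${mem_mb} MB -> swap between ${min_swap} and ${max_swap} MB"

# Creating and enabling a 2 GB swap file would look like this (root only):
#   dd if=/dev/zero of=/swapfile bs=1M count=2048
#   chmod 600 /swapfile
#   mkswap /swapfile
#   swapon /swapfile
```

`swapon --show` (or `free -m`) then confirms the swap space the kernel is actually using.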
5. GRUB
GNU GRUB (GRand Unified Bootloader, "GRUB" for short) is a multi-operating-system boot manager from the GNU Project. On a computer with several operating systems installed, GRUB lets the user choose which one to run at startup. GRUB can also boot different kernels on a Linux system partition, and it can pass boot parameters to the kernel, for example to enter single-user mode.

6. Buffer and cache
A cache is a small, fast temporary store between the CPU and memory: much smaller than memory but much faster to access, it resolves the mismatch between CPU speed and memory read/write speed and raises the rate of data exchange between the two. In Linux memory accounting, "cache" holds file data blocks read from disk, speeding up repeated access to that data, while "buffer" holds block-device (I/O) data on its way to disk, reducing I/O operations and speeding up exchange between memory and the hard disk or other I/O devices. In short: buffers are data about to be written to disk; cache is data that has been read from disk.

7. TCP three-way handshake
(1) The requester sends a SYN packet (seq = x) and waits for the responder's confirmation.
(2) The responder receives the SYN and returns its own SYN (seq = y) together with an ACK (ack = x + 1).
(3) The requester receives the SYN+ACK and replies with a final ACK (ack = y + 1).
The requester and responder have now established a TCP connection, completing the three-way handshake, and data transmission can begin.
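Looping back to item 6: the kernel reports the two pools separately in /proc/meminfo, so you can look at them directly:

```shell
# Print the kernel's current Buffers (block-device I/O) and
# Cached (file-data page cache) figures, in kB.
awk '/^Buffers:/ {b=$2} /^Cached:/ {c=$2}
     END {printf "Buffers: %d kB, Cached: %d kB\n", b, c}' /proc/meminfo
```

The same numbers appear in the "buff/cache" column of `free -m`; a large cache figure on a busy file server is normal and healthy, not a memory leak.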
8. Linux directory structure
The Linux file system uses a tree-shaped directory structure with links: there is exactly one root directory (usually written "/"), which contains lower-level subdirectories or files, and subdirectories in turn contain their own subdirectories or files.

/: the root of the hierarchy, the entry point and highest-level directory of the entire file system.
/boot: files required by the Linux kernel and the boot process, such as the kernel and initrd; the GRUB boot manager also lives here.
/bin: basic system commands, similar in role to /usr/bin; everything here is executable, including by ordinary users.
/sbin: basic system-maintenance commands, usable only by the superuser.
/etc: all system configuration files.
/dev: device files, such as terminals, disks, and optical drives.
/var: frequently changing data, such as logs and mail.
/home: the default location for ordinary users' home directories.
/opt: third-party software, such as user-built and custom-compiled packages.
/lib: libraries and kernel modules, including all shared libraries required by system programs.

9. Hard links and soft links
Hard link: a hard link shares the same index node (inode number), meaning multiple file names point to the same inode. Hard links cannot link directories and cannot cross partitions. Deleting one hard link does not affect the inode's source file or the other hard links to it.
Soft link (symbolic link): a separate file that stores the path of its target; it can cross partitions and can point to directories, but it is left dangling if the target is deleted.
ln source new-link        (hard link)
ln -s source new-link     (soft link)

10. RAID technology
RAID stands for Redundant Array of Independent (originally Inexpensive) Disks.
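Before diving into RAID, the inode behaviour described in item 9 is easy to verify for yourself:

```shell
# Hard link vs. soft link, demonstrated in a throwaway directory.
cd "$(mktemp -d)"
echo "hello" > source.txt

ln    source.txt hard-link    # same inode, same data blocks
ln -s source.txt soft-link    # a separate file that stores the path

# ls -i shows hard-link sharing source.txt's inode number.
ls -li source.txt hard-link soft-link

# Delete the source: the hard link still reaches the data,
# while the soft link is left dangling.
rm source.txt
cat hard-link                             # still prints "hello"
cat soft-link 2>/dev/null || echo "soft-link is dangling"
```

The inode column printed by `ls -li` makes the difference visible at a glance: the hard link is just another name for the same inode, while the symlink is its own tiny file.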
RAID combines multiple independent physical hard disks in different ways into one hard disk group (a logical disk), providing higher storage performance than a single disk plus data-backup capability. RAID can join multiple disks into one logical volume (disk spanning); it can split data into blocks and write/read multiple disks in parallel to increase access speed; and it can provide fault tolerance through mirroring or parity operations. Which of these you get depends on the RAID level. From the user's perspective, a RAID group looks like a single hard disk that can be partitioned, formatted, and so on, but with far higher speed than a single disk, automatic data backup, and good fault tolerance.

RAID levels. Different disk combinations are classified into RAID levels:

RAID 0: striping. All disks are read and written in parallel. It is the simplest form of disk array: it needs only two or more disks, costs little, and delivers the combined performance and throughput of all the disks. However, RAID 0 provides no redundancy or error recovery, so the failure of a single disk loses all the data. (RAID 0 simply increases capacity and performance without any reliability guarantee; it suits environments where data safety is not a priority.)

RAID 1: mirroring. Data written to one of two disks is mirrored to the other, so the two disks back each other up, and usable capacity equals one disk. When data is written to one disk, a mirror copy is made on the other idle disk, maximizing the system's reliability and repairability without hurting performance.
When the original disk is busy, data can be read directly from the mirror copy (from the faster of the two disks), improving read performance; write speed, by contrast, is somewhat lower. RAID 1 generally supports hot swapping: a disk in the array can be removed or replaced while the system is running, without interruption. RAID 1 has the highest cost per unit of usable disk space in the disk-array family, but it offers high data safety, reliability, and availability: when a disk fails, the system automatically switches reads and writes to the mirror disk without needing to rebuild the lost data.

RAID 0+1: often referred to together with RAID 10 (which nests the two levels in the opposite order), this combines RAID 0 and RAID 1. Data is divided into stripes at the bit or byte level and read/written across multiple disks in parallel, while each disk also has its own physical mirror for redundancy. It tolerates at least one disk failure without affecting data availability while keeping reads and writes fast. RAID 0+1 needs at least four disks to build a stripe set on top of mirrors, and it delivers both high data reliability and high read/write efficiency.

RAID 5: a storage solution that balances performance, data safety, and cost; it can be understood as a compromise between RAID 0 and RAID 1. RAID 5 requires at least three disks. It provides data safety, though to a lesser degree than mirroring, and it uses disk space more efficiently than mirroring. Read speed is close to RAID 0, with only the extra parity information to manage, while write speed is slightly slower than writing to a single disk.
At the same time, because many data blocks share one piece of parity information, RAID 5 uses disk space more efficiently than RAID 1 and has a relatively low storage cost, which makes it one of the most widely used solutions today.
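As a quick sanity check on the levels above, usable capacity for n identical disks of size s works out to: RAID 0 = n x s, RAID 1 (two disks) = s, RAID 5 = (n - 1) x s, and RAID 0+1 = n x s / 2. For example, with four 4 TB disks:

```shell
# Usable capacity (TB) of four 4 TB disks under each RAID level.
n=4; s=4
echo "RAID 0:   $((n * s)) TB"         # striping, no redundancy
echo "RAID 1:   $s TB"                 # pure mirror: one disk's worth
echo "RAID 5:   $(((n - 1) * s)) TB"   # one disk's worth goes to parity
echo "RAID 0+1: $((n * s / 2)) TB"     # half the disks mirror the other half
```

The arithmetic makes the trade-off explicit: every step up in redundancy is paid for in usable capacity.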