1. HBase Overview

1.1 What is HBase

HBase is a distributed NoSQL database built on HDFS. It offers high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes. HBase can store massive amounts of data and keeps query performance high as the data grows: a query against hundreds of millions of rows can return results within seconds.

1.2 Characteristics of HBase Tables

1. Large
2. Schemaless (no fixed schema)
3. Column-oriented
4. Sparse
5. Multiple versions of data
6. Single data type
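
The characteristics listed above can be pictured as a nested, sparse map from rowkey to column to timestamped values. The following Python sketch only illustrates that logical model; it is not how HBase stores data, and the class name `ToyTable` and its methods are hypothetical.

```python
import time
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of an HBase table's logical view (illustration only)."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        # rowkey -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(dict)

    def put(self, rowkey, column, value, ts=None):
        ts = int(time.time() * 1000) if ts is None else ts
        cell = self.rows[rowkey].setdefault(column, [])
        # Single data type: every value is kept as raw bytes.
        cell.append((ts, str(value).encode("utf-8")))
        cell.sort(key=lambda tv: tv[0], reverse=True)
        del cell[self.max_versions:]  # multiple versions, capped per cell

    def get(self, rowkey, column, versions=1):
        # Sparse: a missing cell simply does not exist and costs nothing.
        return self.rows.get(rowkey, {}).get(column, [])[:versions]


table = ToyTable(max_versions=3)
table.put("00001", "base_info:age", 29, ts=1)
table.put("00001", "base_info:age", 30, ts=2)
table.put("00002", "base_info:name", "lisi", ts=3)  # row 00002 has no age cell

print(table.get("00001", "base_info:age", versions=3))  # [(2, b'30'), (1, b'29')]
print(table.get("00002", "base_info:age"))              # []
```

The version cap plays the role of the VERSIONS attribute of a column family, and the empty result for row 00002 shows why a sparse table does not need placeholder values.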
1.3 Logical view of an HBase table

2. HBase cluster structure

1. Client
2. ZooKeeper
The client goes through the ZooKeeper (zk) cluster to operate on HBase table data.
3. HMaster
The master (the "boss") of the entire HBase cluster.
4. HRegionServer
The worker node of the HBase cluster, subordinate to the HMaster.
5. Region
The smallest unit of distributed storage for an HBase table. Its data is stored on HDFS.

3. HBase cluster installation and deployment

Prerequisites: the ZooKeeper and Hadoop clusters are already installed.
1. Download the corresponding installation package
2. Plan the installation directory
3. Upload the installation package to the server
4. Unzip the installation package to the planned directory
5. Rename the extracted directory
6. Modify the configuration files
The two Hadoop configuration files in the etc/hadoop folder under the Hadoop installation directory (typically core-site.xml and hdfs-site.xml) must be copied to the conf folder under the HBase installation directory.

1. vim hbase-env.sh

    # Configure the Java environment variable
    export JAVA_HOME=/export/servers/jdk
    # Let an external zk cluster manage HBase instead of the built-in zk
    export HBASE_MANAGES_ZK=false

2. vim hbase-site.xml
Place these properties inside the <configuration> element of the file.

    <!-- Path where HBase stores its data on HDFS -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://node1:9000/hbase</value>
    </property>
    <!-- Run HBase in distributed mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- Address of zk; separate multiple addresses with "," -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node1:2181,node2:2181,node3:2181</value>
    </property>

3. vim regionservers

    # Nodes that run an HRegionServer, one per line
    node2
    node3

4. vim backup-masters

    # Nodes that run a standby HMaster
    node2

7. Configure the HBase environment variables

    vim /etc/profile
    export HBASE_HOME=/export/servers/hbase
    export PATH=$PATH:$HBASE_HOME/bin

8. Distribute the HBase directory and the environment variables

    scp -r hbase node2:/export/servers
    scp -r hbase node3:/export/servers
    scp /etc/profile node2:/etc
    scp /etc/profile node3:/etc

9. Make the environment variables take effect on all HBase nodes
Execute on all nodes:

    source /etc/profile
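
Before distributing the configuration, it can help to check that hbase-site.xml actually contains the three properties set in step 6. The sketch below is one hedged way to do that with Python's standard library. It works on an inline sample string rather than the real file, and the helper names `load_properties` and `missing_properties` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring the properties from step 6, wrapped in the
# <configuration> root element that hbase-site.xml uses.
HBASE_SITE = """
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://node1:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>node1:2181,node2:2181,node3:2181</value></property>
</configuration>
"""

REQUIRED = ("hbase.rootdir", "hbase.cluster.distributed", "hbase.zookeeper.quorum")


def load_properties(xml_text):
    """Return {name: value} for every <property> in the given XML text."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def missing_properties(props):
    """Return the required property names that are absent or empty."""
    return [name for name in REQUIRED if not props.get(name)]


props = load_properties(HBASE_SITE)
print(missing_properties(props))                    # [] when all three are set
print(props["hbase.zookeeper.quorum"].split(","))   # the individual zk addresses
```

To check a real installation, the same function could be fed the text of conf/hbase-site.xml instead of the sample string.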
4. Start and stop the HBase cluster

1. Start the HBase cluster
Start the zk and Hadoop clusters first, then run:

    hbase/bin/start-hbase.sh

2. Stop the HBase cluster

    hbase/bin/stop-hbase.sh

HBase cluster web management interface
After the HBase cluster has started, open:

    http://<HMaster host name>:16010

5. HBase shell command-line operations

Enter the HBase shell client:

    hbase/bin/hbase shell

1. Create a table

    create 't_user_info','base_info','extra_info'
    create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}

2. List the existing tables

    list

This is similar to "show tables" in MySQL.

3. View the description of a table

    describe 't_user_info'

4. Modify table properties

    # Change the maximum number of versions kept by a column family
    alter 't_user_info', NAME => 'base_info', VERSIONS => 3

5. Add data to a table

    put 't_user_info','00001','base_info:name','zhangsan'
    put 't_user_info','00001','base_info:age','30'
    put 't_user_info','00001','base_info:address','beijing'
    put 't_user_info','00001','extra_info:school','shanghai'
    put 't_user_info','00002','base_info:name','lisi'

6. Query table data

    # Query by condition
    get 't_user_info','00001'
    get 't_user_info','00001', {COLUMN => 'base_info'}
    get 't_user_info','00001', {COLUMN => 'base_info:name'}
    get 't_user_info','00001', {TIMERANGE => [1544243300660,1544243362660]}
    get 't_user_info','00001', {COLUMN => 'base_info:age', VERSIONS => 3}
    # Full table scan
    scan 't_user_info'

7. Delete data

    delete 't_user_info','00001','base_info:name'
    deleteall 't_user_info','00001'

8. Delete a table (it must be disabled first)

    disable 't_user_info'
    drop 't_user_info'

6. Internal principles of HBase
7. HBase addressing mechanism

Finding the RegionServer
-ROOT- table
.META. table
(The -ROOT- table exists only in HBase releases before 0.96. Later releases locate the meta table, renamed hbase:meta, directly through ZooKeeper.)

Contact the RegionServer to query the target data
The RegionServer locates the region that holds the target data and issues the query to it. The region searches the memstore first and returns the result on a hit. On a miss, it scans the storefiles, possibly many of them, which is where the Bloom filter helps. A Bloom filter quickly reports whether the queried rowkey is in a given storefile, but it can be wrong in one direction: if it answers no, the rowkey is definitely not there; if it answers yes, the rowkey may still be absent.

8. HBase advanced applications

Create a table
BLOOMFILTER: defaults to ROW (row-level Bloom filter)
VERSIONS: defaults to 1 data version
COMPRESSION: defaults to NONE (no compression)
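
The one-sided error of the Bloom filter described in the addressing section is easier to see in code. The following is a minimal Python sketch of a row-level Bloom filter and of the "memstore first, then storefiles" read order. It is a simplification under stated assumptions: `BloomFilter`, `ToyStoreFile` and `read_row` are hypothetical names, the filter size and hash scheme are arbitrary, and a real region merges versions across the memstore and storefiles instead of returning the first hit.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: 'no' is always correct, 'yes' may be a false positive."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by hashing the key with different prefixes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyStoreFile:
    """Stand-in for a storefile: its rows plus a Bloom filter over the rowkeys."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for rowkey in self.rows:
            self.bloom.add(rowkey)


def read_row(rowkey, memstore, storefiles):
    """Return (value, number of storefiles that had to be scanned)."""
    if rowkey in memstore:          # 1. the memstore is checked first
        return memstore[rowkey], 0
    scanned = 0
    for sf in storefiles:           # 2. then the storefiles
        if not sf.bloom.might_contain(rowkey):
            continue                # a "no" is definitive, so skip this file
        scanned += 1                # a "yes" still has to be verified
        if rowkey in sf.rows:
            return sf.rows[rowkey], scanned
    return None, scanned


memstore = {"00003": "wangwu"}
storefiles = [ToyStoreFile({"00001": "zhangsan"}), ToyStoreFile({"00002": "lisi"})]

print(read_row("00003", memstore, storefiles))  # served from the memstore
print(read_row("00002", memstore, storefiles))  # found in a storefile
print(read_row("99999", memstore, storefiles))  # not present anywhere
```

The second element of each result shows how many storefiles were actually opened; the filter lets the read skip every file that answers "no".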
disable_all 'toplist.*'
disable_all accepts a regular expression and lists the tables that currently match. drop_all works the same way.

HBase table pre-partitioning (manual partitioning)
One way to speed up batch writes is to create some empty regions in advance. When data is written to HBase, the load is then balanced across the cluster according to the region boundaries. This reduces the time spent on automatic splitting when the data reaches the storefile size limit. Another advantage is that a well-designed rowkey spreads concurrent requests roughly evenly over the regions, which maximizes IO efficiency.

Row key design
Keep the number of column families as small as possible, usually 2-3.

rowkey
It is recommended to use the high-order part of the rowkey as a hash field, randomly generated by the program, and the low-order part as the time field. This raises the probability that data is distributed evenly across the RegionServers, which achieves load balancing.

Rowkey contradiction
Hotspot resolution
You can append Long.MAX_VALUE - timestamp to the end of the key, for example [key][reverse_timestamp]. The latest value of [key] can then be obtained by scanning [key] and taking its first record, because rowkeys in HBase are ordered and the first record is the most recently written one.
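
The rowkey advice above (a hash-based high part, a time-based low part, a reverse timestamp, and pre-created regions) can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed layout: the `|` separator, the two-digit bucket prefix, `NUM_BUCKETS` and all function names are hypothetical. It also swaps the randomly generated prefix mentioned in the text for a prefix derived from a hash of the key, so that a reader can recompute the prefix when looking a row up.

```python
import bisect
import hashlib

LONG_MAX = 2**63 - 1   # value of Java's Long.MAX_VALUE
NUM_BUCKETS = 4        # hypothetical number of pre-created regions


def salted_rowkey(user_id, timestamp_ms):
    """High part: hash-derived bucket prefix. Low part: the time field."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}|{user_id}|{timestamp_ms:013d}"


def reverse_ts_rowkey(key, timestamp_ms):
    """[key][reverse_timestamp]: a newer write sorts before an older one."""
    return f"{key}|{LONG_MAX - timestamp_ms:019d}"


def split_keys(num_buckets):
    """Split points for pre-partitioning: one region per bucket prefix."""
    return [f"{b:02d}" for b in range(1, num_buckets)]


def region_index(rowkey, splits):
    """Index of the pre-created region whose key range contains rowkey."""
    return bisect.bisect_right(splits, rowkey)


splits = split_keys(NUM_BUCKETS)
print(splits)  # ['01', '02', '03']

for uid in ("zhangsan", "lisi", "wangwu"):
    key = salted_rowkey(uid, 1544243300660)
    print(key, "-> region", region_index(key, splits))

older = reverse_ts_rowkey("00001", 1544243300660)
newer = reverse_ts_rowkey("00001", 1544243362660)
print(sorted([older, newer])[0] == newer)  # True: the latest write comes first
```

In the HBase shell, split points like these can be passed when the table is created, for example create 't1', 'f1', SPLITS => ['01', '02', '03']. Zero-padding the numeric fields matters because rowkeys are compared as bytes, not as numbers.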