Abstract: HBase ships with many operation and maintenance tools that provide management, analysis, repair, and debugging functions. This article lists the commonly used ones so that developers and operations staff can use them for the daily management and operation of HBase.

Introduction to HBase components

HBase is a popular, widely used NoSQL database. Its complex architecture and processes make it hard to operate for staff with little big data experience. This article introduces and summarizes the tools that come with HBase.

Notes before we begin:

1) HBase versions differ considerably (for example, the hbck repair options used in this article no longer work in HBase 2.x, which uses HBCK2 instead). All command lines in this article were run on MRS_1.9.3, which corresponds to HBase 1.3.1. Some commands are not supported on HBase 2 (HBase 2 will be covered separately if time permits).
2) All tools mentioned in this article are open source tools; vendor-developed optimization and operation and maintenance tools are not covered.

Canary Tool

HBase Canary detects the current status of an HBase cluster. It uses simple queries to check whether the regions on HBase are available (readable). It has two main modes:

1) Region mode (default): for every region, randomly query one row from each column family (CF), and print whether the query succeeded along with its latency.

#Check the t1 and tsdb-uid tables
hbase org.apache.hadoop.hbase.tool.Canary t1 tsdb-uid
#Note: all regions are scanned when no table is specified

2) Regionserver mode: randomly select one table on each regionserver to query, and print whether the query succeeded along with its latency.
#Check a regionserver
hbase org.apache.hadoop.hbase.tool.Canary -regionserver node-ana-coreQZLQ0002.1432edca-3d6f-4e17-ad52-098f2adde2e6.com
#Note: all regionservers are scanned when no regionserver is specified

Canary also accepts some simple parameters, listed as follows: [table not preserved]
Summary: [not preserved]
HFile Tools

The HBase HFile viewing tool is mainly used to check the content and metadata of a specific HFile. When the business finds that a region cannot be read, a region cannot be opened on the regionserver because of file problems, or an exception occurs when reading a file, you can use this tool to check whether that HFile itself is damaged.

#View the details of one HFile under table t1 and print its KVs
hbase org.apache.hadoop.hbase.io.hfile.HFile -v -m -p -f /hbase/data/default/t1/4dfafe12b749999fdc1e3325f22794d0/cf1/06e102be436c449693734b222b9e9aab

The parameters used above are: -v (verbose output), -m (print the HFile metadata), -p (print the key/value pairs), and -f (path of the HFile to inspect).
Summary: [not preserved]
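The path passed to -f in the command above follows the pattern root directory, data, namespace, table, encoded region name, column family, HFile name. As a minimal sketch in Python (not part of HBase; the helper names are hypothetical, and only the layout seen in that command is assumed), the following splits such a path into its parts, which can help when mapping a failing file back to its table, region, and column family:

```python
from typing import NamedTuple


class HFileLocation(NamedTuple):
    """Components of an HFile path (hypothetical helper type, not an HBase API)."""
    root: str
    namespace: str
    table: str
    region: str
    family: str
    hfile: str


def parse_hfile_path(path: str) -> HFileLocation:
    """Split <root>/data/<namespace>/<table>/<region>/<family>/<hfile>.

    Assumes the layout seen in the example command; other layouts
    (archive, MOB, or temporary directories) are not handled.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 7 or parts[-6] != "data":
        raise ValueError(f"not a recognised HFile path: {path}")
    root = "/" + "/".join(parts[:-6])
    namespace, table, region, family, hfile = parts[-5:]
    return HFileLocation(root, namespace, table, region, family, hfile)


loc = parse_hfile_path(
    "/hbase/data/default/t1/4dfafe12b749999fdc1e3325f22794d0/cf1/"
    "06e102be436c449693734b222b9e9aab"
)
print(loc.table, loc.region, loc.family)
```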
RowCounter and CellCounter Tools

RowCounter is a statistics tool that runs a MapReduce job to count the rows of a table. CellCounter is similar to RowCounter but collects more detailed statistics about the table, including the number of rows, column families, and qualifiers, and how many times each occurs. Both tools accept start and end rows and timestamps so that you can count over a range.

#Use RowCounter to scan table t1
hbase org.apache.hadoop.hbase.mapreduce.RowCounter t1
#Use CellCounter to scan table t1 and write the results to the /tmp/t1.cell directory on HDFS
hbase org.apache.hadoop.hbase.mapreduce.CellCounter t1 /tmp/t1.cell

The parameters used are as follows: [table not preserved]

Summary:
Impact on the cluster: 3 stars (a MapReduce job must be started to scan all regions of the table, which occupies cluster resources)
Practicality: 3 stars (the only dedicated tool HBase provides for counting the rows of its tables; count in the HBase shell is relatively inefficient)

Clean Tool

The clean command clears HBase data from ZooKeeper and HDFS. Use it when you want to clean up or erase all data in the cluster and restore HBase to its initial state.

#Clear all data in HBase
hbase clean --cleanAll

The parameters used are as follows: [table not preserved]

Summary:
Impact on the cluster: 5 stars (deletes all data on the HBase cluster)
Practicality: 2 stars (rarely used, except when HBase data must be reset, for example when switching to HBase on OBS)

HBCK Tools

HBase's hbck tool is the most commonly used tool in daily operation and maintenance. It checks the consistency of the regions on the cluster. Because HBase's region-in-transition (RIT) states are complex and error-prone, problems such as offline or inconsistent regions come up often during daily operation and maintenance. When they do, you can choose the repair command that matches the result of the hbck check.
#Check the region status of table t1
hbase hbck t1
#Fix the meta of table t1 and reassign its regions
hbase hbck -fixMeta -fixAssignments t1

This tool covers too many detailed scenarios to introduce them all here. See the parameter descriptions for how to repair the various abnormal scenarios.

Note: if you do not know the cause of an abnormality, do not run repair commands indiscriminately; doing so may make the problem worse.

The parameters used are as follows: [table not preserved]
Summary: [not preserved]
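The note above suggests a check-first workflow: run hbck without any fix option, read the result, and only then decide on a repair. A minimal sketch of the "read the result" step in Python, assuming that the hbck report ends with a line of the form "N inconsistencies detected." followed by "Status: OK" or "Status: INCONSISTENT" (verify this against the output of your own version; the function name is hypothetical and the sample report is written by hand, not captured output):

```python
import re
from typing import Optional, Tuple


def parse_hbck_summary(report: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (inconsistency_count, status) from hbck report text.

    Either element is None when the corresponding line is not found.
    The line formats matched here are an assumption.
    """
    count = re.search(r"^(\d+) inconsistencies detected\.", report, re.MULTILINE)
    status = re.search(r"^Status:\s*(\w+)", report, re.MULTILINE)
    return (
        int(count.group(1)) if count else None,
        status.group(1) if status else None,
    )


# Illustrative report tail, written by hand for this example.
sample = "Summary:\n2 inconsistencies detected.\nStatus: INCONSISTENT\n"
count, status = parse_hbck_summary(sample)
if status != "OK":
    print(f"{count} inconsistencies found; investigate the cause before any -fix option")
```

In practice the report text would come from the saved output of a plain check such as hbase hbck t1, and the decision to run a repair option stays with the operator.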
RegionSplitter Tool

RegionSplitter is HBase's pre-splitting tool. If you do not configure pre-splitting when creating a table, HBase does not know in advance how to split its regions, which is likely to cause region or regionserver hotspots later. The best approach is to predict the split points and pre-split when creating the table, so that business access is load-balanced from the start. RegionSplitter pre-splits a table at creation time using a specified split algorithm. It ships with two algorithms:

HexStringSplit: splits on 8-character hexadecimal strings; suitable when the rowkey is prefixed with a hexadecimal (ASCII) string.
UniformSplit: splits on byte arrays of length 8, dividing the raw byte value range (0x00 to 0xFF) and right-padding with 00. To put data into a table partitioned this way, you need to transform the rowkey. For example, if the original rowkey is rawStr, take its hashCode, reverse the bytes, and prepend the result to the original rowkey string.

#Create table test_table and pre-split it into 10 regions with the HexStringSplit algorithm
hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1
#Tip: this is equivalent to running create 'test_table', {NAME => 'f1'}, {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit'} in the hbase shell

Summary:
Whichever built-in pre-split algorithm you use, it assumes that the rowkeys of the table data follow the format the algorithm expects. In practice, users still need to design rowkeys around their business and implement their own pre-split algorithm (by implementing the SplitAlgorithm interface).
Impact on the cluster: 1 star (a table creation operation; no impact on other cluster services)
Practicality: 3 stars (real pre-splitting depends on the actual business; for testing, you can build rowkeys in the format expected by HBase's default split algorithms)

FSHLog Tool

FSHLog is the WAL file inspection and splitting tool that comes with HBase. It has two functions:

dump: dump the contents of a WAL file
split: trigger the WAL split operation on a WAL folder

#Dump the contents of a current WAL file
hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --dump /hbase/WALs/node-ana-coreqzlq0002.1432edca-3d6f-4e17-ad52-098f2adde2e6.com,16020,1591846214733/node-ana-coreqzlq0002.1432edca-3d6f-4e17-ad52-098f2adde2e6.com%2C16020%2C1591846214733.1592184625801

Related parameters: [table not preserved]
Summary: [not preserved]
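Returning briefly to the HexStringSplit algorithm from the RegionSplitter section: the idea is to cut the 8-hex-digit key space from 00000000 to FFFFFFFF into equal ranges. The following Python sketch shows that arithmetic. It is not HBase's own code; it assumes the split keys are evenly spaced multiples of (key space size // region count), and the function name is hypothetical:

```python
from typing import List


def hex_string_splits(num_regions: int, first: int = 0x00000000,
                      last: int = 0xFFFFFFFF) -> List[str]:
    """Return the num_regions - 1 split keys as 8-digit lowercase hex strings."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    size = (last - first + 1) // num_regions  # width of each region's key range
    return [format(first + size * i, "08x") for i in range(1, num_regions)]


for key in hex_string_splits(10):
    print(key)
```

Under this assumption, a table pre-split into 10 regions has 9 split keys, and a rowkey lands in a region according to its first 8 hexadecimal characters. This is why the algorithm only balances load when rowkeys really start with evenly distributed hex prefixes.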
WALPlayer Tool

WALPlayer replays the logs in WAL files into HBase. You can replay data for a single table or for all tables, and you can restrict the replay with conditions such as a time range.

#Replay the data of a WAL file into table t1
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /tmp/node-ana-coreqzlq0002.1432edca-3d6f-4e17-ad52-098f2adde2e6.com%2C16020%2C1591846214733.1592184625801 t1

Q&A: Both FSHLog and WALPlayer can restore data in WAL files to HBase. How do they differ?
FSHLog sends a WAL split request to HMaster, which restores all data in the WAL to HBase following HBase's own WAL split process. WALPlayer starts its own MapReduce job to scan the WAL file, then either puts the qualifying data into a specific table or writes HFiles to a specific directory.

Related parameters: [table not preserved]
Summary: [not preserved]
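The WAL file name used in the FSHLog and WALPlayer commands carries information about where the file came from. Judging from the example paths, the name appears to be the URL-encoded server name (host, port, and start code separated by commas, each comma written as %2C) followed by a dot and a numeric suffix that looks like a millisecond timestamp. A sketch in Python that decodes such a name; the function name and the interpretation of the fields are assumptions drawn from the example, not an HBase API, and names with extra suffixes are not handled:

```python
from datetime import datetime, timezone
from typing import NamedTuple
from urllib.parse import unquote


class WalName(NamedTuple):
    """Fields of a WAL file name (hypothetical helper type)."""
    host: str
    port: int
    start_code: int
    suffix: int


def parse_wal_name(file_name: str) -> WalName:
    """Decode '<host>%2C<port>%2C<startcode>.<suffix>' into its fields."""
    encoded_server, _, suffix = file_name.rpartition(".")
    host, port, start_code = unquote(encoded_server).split(",")
    return WalName(host, int(port), int(start_code), int(suffix))


wal = parse_wal_name(
    "node-ana-coreqzlq0002.1432edca-3d6f-4e17-ad52-098f2adde2e6.com"
    "%2C16020%2C1591846214733.1592184625801"
)
print(wal.host, wal.port)
# Interpreting the numeric suffix as epoch milliseconds (an assumption):
print(datetime.fromtimestamp(wal.suffix / 1000, tz=timezone.utc).isoformat())
```

Decoding the name this way is a quick check that a WAL file copied to a directory such as /tmp belongs to the regionserver and time window you intend to replay.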
OfflineMetaRepair

The OfflineMetaRepair tool repairs HBase metadata. It rebuilds the metadata from the region and table metadata that HBase stores on HDFS.

#Rebuild the HBase metadata
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair

Q&A: hbck's -fixMeta can also repair HBase metadata, and it can target specific tables, which is more flexible. Is OfflineMetaRepair still necessary?
hbck is an online repair tool and cannot be used when HBase is not running. OfflineMetaRepair repairs HBase metadata while HBase is offline.

Related parameters: [table not preserved]
Summary: [not preserved]
Sweeper Tool

The Sweeper tool (HBASE-11644) merges small MOB files in an HBase cluster and deletes redundant MOB files. It starts a SweepJob for the given column family to merge that family's MOB files. Note that this tool cannot run at the same time as a MOB major compaction, and multiple Sweeper jobs for the same column family cannot run at the same time.

#Run Sweeper on column family cf1 of table t1
hbase org.apache.hadoop.hbase.mob.mapreduce.Sweeper t1 cf1

Related parameters: [table not preserved]
Summary: [not preserved]
These are all the HBase operation and maintenance tools covered this time. Other tools, such as Bulkload batch import, data migration, and the test-related PE tool, are left for another time. If anything here is wrong, corrections are welcome. Thank you.

Official documentation: https://hbase.apache.org/book.html