Detailed explanation of MLSQL compile-time permission control example

Preface

Put simply, permissions in MLSQL mean that you can do whatever you are authorized to do and nothing beyond that.

Permission control is a lifeline for MLSQL. MLSQL needs to access a wide variety of resources, such as MySQL, Oracle, HDFS, Hive, Kafka, Solr, ElasticSearch, Redis, APIs, and web pages. Different users have different permissions on these data sources, down to individual tables and columns.

In the traditional model, each user needs a proxy user, and that proxy user must then be authorized in every data source. This sounds merely tedious, but in practice it is very hard to carry out: different data sources belong to different teams, so the whole application process can take days or even weeks.

If that is discouraging, the situation is even worse for companies that use Hive as their data warehouse. Hive's authorization model follows the Linux user, that is, whoever starts Spark holds the access rights. This is unworkable for a multi-tenant MLSQL application. For example, Spark is started by sparkUser, but the people actually running scripts may be Zhang San, Li Si, and so on. Hive cannot tell who ran the task; it only sees sparkUser.

There is another pain point that most people will recognize:

We finally finish writing a script, run it for an hour, and then it suddenly fails. It turns out that the data source accessed on line 350 does not grant us sufficient permissions. This is really annoying.

The question

So, can we know whether every resource the script touches is authorized before the script runs?

The answer is: yes.

Off topic: strictly speaking the title is not accurate, because MLSQL is essentially an interpreted language and has no compilation step. A better title would be [Permission Control During Parsing].

If permission verification is turned on, MLSQL first scans the entire script and extracts the necessary information, including details of every data source involved, so that it knows before running whether the script touches any unauthorized databases or tables. So how does MLSQL do it? Let's look at the following script:

connect jdbc where
driver="com.mysql.jdbc.Driver"
and url="jdbc:mysql://${ip}:${port}/db1?${MYSQL_URL_PARAMS}"
and user="${user}"
and password="${password}"
as db1_ref;

load jdbc.`db1_ref.people`
as people;

save append people as jdbc.`db1_ref.spam`;

MLSQL requires that every data source be loaded with a load statement. When it parses the load statement, MLSQL therefore knows that the user is accessing a data source over the JDBC protocol, and from the URL in the connect statement it obtains the following information:

db: db1
table: people
operateType: load
sourceType: mysql
tableType: JDBC
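
How the db and sourceType fields might be read from a connection URL can be sketched in Python. This is only an illustration, not MLSQL's actual implementation: the `parse_jdbc_url` helper and its regular expression are hypothetical, and the host, port, and URL parameter in the example are placeholders.

```python
import re

# Hypothetical pattern: jdbc:<sourceType>://<host[:port]>/<db>[?params]
JDBC_URL = re.compile(r"^jdbc:(?P<source>[a-z0-9]+)://[^/]+/(?P<db>[^?;]+)")


def parse_jdbc_url(url):
    """Return the sourceType and db encoded in a JDBC URL."""
    match = JDBC_URL.match(url)
    if match is None:
        raise ValueError(f"unsupported JDBC URL: {url}")
    return {"sourceType": match.group("source"), "db": match.group("db")}


# Placeholder host, port, and parameter, used only for illustration.
info = parse_jdbc_url("jdbc:mysql://127.0.0.1:3306/db1?useSSL=false")
print(info)
```

The table name and operateType are not in the URL; they come from the load statement itself.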

The script also writes to the spam table, so the same information is extracted for it:

db: db1
table: spam
operateType: save
sourceType: mysql
tableType: JDBC

Together with the temporary table people, the script yields three pieces of table information, which are sent to AuthCenter for a decision. AuthCenter tells MLSQL which tables the current user is not authorized for; if there are any, MLSQL throws an exception immediately. Throughout this process no physical plan is executed; only information is extracted from the script.
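
The scan-then-check flow can be sketched in Python under a few assumptions. The regular expressions, the `TableRef` record, the `connections` mapping (connection name to sourceType and db), and the `is_authorized` callback that stands in for AuthCenter are all hypothetical; the real MLSQL parser and AuthCenter protocol are not shown in this article. For brevity the sketch collects only the two data source tables and skips the temporary table.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    db: str
    table: str
    operate_type: str  # "load" or "save"
    source_type: str
    table_type: str


# load jdbc.`db1_ref.people` ...
LOAD = re.compile(r"^load\s+(\w+)\.`(\w+)\.(\w+)`", re.IGNORECASE)
# save [mode] <temp table> as jdbc.`db1_ref.spam`
SAVE = re.compile(r"^save\s+(?:\w+\s+)?\w+\s+as\s+(\w+)\.`(\w+)\.(\w+)`", re.IGNORECASE)


class UnauthorizedError(Exception):
    pass


def extract_tables(script, connections):
    """Collect every table a script loads or saves, without running it."""
    refs = []
    for stmt in script.split(";"):
        stmt = " ".join(stmt.split())  # collapse newlines and extra spaces
        for pattern, operate_type in ((LOAD, "load"), (SAVE, "save")):
            match = pattern.match(stmt)
            if match:
                fmt, ref, table = match.groups()
                source_type, db = connections[ref]
                refs.append(TableRef(db, table, operate_type, source_type, fmt.upper()))
    return refs


def check_script(script, connections, is_authorized):
    """Raise before execution if any extracted table is not authorized."""
    denied = [t for t in extract_tables(script, connections) if not is_authorized(t)]
    if denied:
        raise UnauthorizedError(f"not authorized: {denied}")


SCRIPT = """
load jdbc.`db1_ref.people`
as people;

save append people as jdbc.`db1_ref.spam`;
"""
CONNECTIONS = {"db1_ref": ("mysql", "db1")}
tables = extract_tables(SCRIPT, CONNECTIONS)
```

Calling `check_script(SCRIPT, CONNECTIONS, lambda t: t.operate_type == "load")` would raise `UnauthorizedError` for the write to spam before any data is touched.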

In MLSQL, a select statement cannot access Hive tables directly; they can only be brought in through a load statement. For example, the following statement reports an error:

select * from public.abc as table1;

The select statement is not allowed to reach the Hive table public.abc on its own. If you need the table, load it first:

load hive.`public.abc` as abc;
select * from abc as table1;
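
One way such a restriction could be checked at parse time is sketched below in Python. It is a deliberately simplistic assumption, not MLSQL's real rule set: `find_illegal_selects` and both regular expressions are hypothetical, and subqueries, comments, and quoted strings are ignored. The idea is that a select may only read names that an earlier load or select registered with `as`.

```python
import re

# Name registered by a trailing "as <name>" on a load or select statement.
AS_NAME = re.compile(r"\bas\s+(\w+)\s*$", re.IGNORECASE)
# Table references that follow "from" or "join" in a select statement.
FROM_REF = re.compile(r"\b(?:from|join)\s+([\w.`]+)", re.IGNORECASE)


def find_illegal_selects(script):
    """Return table references in select statements that were never loaded."""
    known = set()
    errors = []
    for stmt in script.split(";"):
        stmt = " ".join(stmt.split())
        if not stmt:
            continue
        lowered = stmt.lower()
        if lowered.startswith("select"):
            for ref in FROM_REF.findall(stmt):
                if ref not in known:
                    errors.append(ref)
        match = AS_NAME.search(stmt)
        if match and lowered.startswith(("load", "select")):
            known.add(match.group(1))
    return errors


bad = find_illegal_selects("select * from public.abc as table1;")
good = find_illegal_selects(
    "load hive.`public.abc` as abc;\nselect * from abc as table1;"
)
```

Because every physical table then enters the script through a load statement, the load statements alone are enough to enumerate the resources that need authorization.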

How to implement column level control

When MLSQL parses a load statement, it asks which columns of the accessed table the current user is authorized to read. It then rewrites the load statement so that it exposes a new view containing only the columns the user is authorized to access.
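
The article does not show what the rewritten statement looks like, so the following Python sketch is only one plausible shape. It assumes the raw table is loaded under a hidden name and the user's table name is bound to a view over the authorized columns. The `rewrite_load` function, the `__raw` suffix, and the column names `name` and `age` are all hypothetical.

```python
def rewrite_load(source, table_name, authorized_columns):
    """Return MLSQL text that exposes only the authorized columns."""
    if not authorized_columns:
        raise PermissionError(f"no authorized columns on {table_name}")
    raw_name = f"{table_name}__raw"  # hypothetical hidden temp table
    columns = ", ".join(f"`{c}`" for c in authorized_columns)
    return (
        f"load {source} as {raw_name};\n"
        f"select {columns} from {raw_name} as {table_name};"
    )


rewritten = rewrite_load("jdbc.`db1_ref.people`", "people", ["name", "age"])
print(rewritten)
```

Later statements that refer to people would then see only the authorized columns, without the user changing the script.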

Summary

By imposing a few deliberate restrictions, MLSQL can extract all data source information at the syntax parsing stage and send it to the permission center for a decision, which avoids authorization failures at runtime. This is significant: MLSQL no longer depends entirely on the permission control of the underlying systems, which greatly simplifies the problem.
