Reposted from: http://www.cnblogs.com/killkill/archive/2010/09/04/1817266.html
Related posts: (1) http://bbs.csdn.net/topics/350013767
(2) http://www.itpub.net/thread-1325582-1-1.html
A while ago I ran into a NULL trap with NOT IN subqueries. Experienced readers probably already know what is coming; in code, it looks like this:
    -- Create two test tables:
    create table tmp01 as
    with tmp as (
      select 1 as id from dual union all
      select 2 from dual union all
      select 3 from dual union all
      select null from dual
    )
    select * from tmp;

    create table tmp02 as
    with tmp as (
      select 1 as id from dual union all
      select 2 from dual union all
      select null from dual
    )
    select * from tmp;
Now I want to know which id values in tmp01 are not in tmp02, so I casually wrote this statement:
    select id from tmp01
    where id not in ( select id from tmp02 )
The result I expected was:
            ID
    ----------
             3
But the actual result was:
    no rows selected
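Recently I read dinjun123's article 《符合列NULL问题的研究》 (a study of the NULL problem), which finally got me to sit down and think this issue through.
not in / not exists are typically used to obtain the "difference" of two sets, although, as discussed later, the result is slightly different from a true set difference. There are two common ways to write it: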
    select id from tmp01
    where id not in ( select id from tmp02 )

    select id from tmp01
    where not exists ( select 1 from tmp02 where tmp02.id=tmp01.id )
As in the example above, the first statement returns no rows (no rows selected), while the second statement returns:
            ID
    ----------
        (null)
             3
Why does the first statement return nothing?
We can rewrite the first statement as:
    select id from tmp01
    where id<>1 and id<>2 and id<>null
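When id is 1 or 2 the outcome is easy to understand: one of id<>1 or id<>2 is false, so the row is rejected. When id is 3, id<>1 and id<>2 are both true, but id<>null evaluates to UNKNOWN, which is not the same as false. The where clause only accepts true and rejects everything else (the null row fails for the same reason, since every comparison involving it is UNKNOWN). So no value in tmp01 can get true out of the long condition id<>1 and id<>2 and id<>null, and no result set is returned.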
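The reasoning above can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption on my part, so there is no dual table and the tables are created with plain inserts), relying on SQLite applying the same three-valued logic to these comparisons. The helper name `ids` is made up for the example.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption); same data as tmp01 / tmp02.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tmp01 (id integer);
    create table tmp02 (id integer);
    insert into tmp01 values (1), (2), (3), (null);
    insert into tmp02 values (1), (2), (null);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# The original NOT IN query and its hand-expanded equivalent.
not_in = ids("select id from tmp01 where id not in (select id from tmp02)")
expanded = ids("select id from tmp01 where id <> 1 and id <> 2 and id <> null")

# Truth value of the predicate for id = 3: UNKNOWN comes back as NULL (None).
predicate_for_3 = conn.execute(
    "select 3 <> 1 and 3 <> 2 and 3 <> null"
).fetchone()[0]

print(not_in)           # []
print(expanded)         # []
print(predicate_for_3)  # None
```

Both forms return no rows, and the predicate for id = 3 is neither true nor false, which is exactly why the row is filtered out.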
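Then why does the second statement return two rows? The 3 is easy to understand, but why is null in the result set as well, when tmp02 clearly contains a null? Look closely at the subquery's where clause, tmp02.id=tmp01.id, and trace it value by value. Here I use a Cartesian product to get the result: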
    set pagesize 6;
    select tmp01.id "tmp01.id", tmp02.id "tmp02.id",
           ( select case when count(*)>0 then 'Yes' else 'No' end
             from dual
             where tmp01.id=tmp02.id ) "Result Exists?"
    from tmp01,tmp02
    order by 1,2
The result is as follows:
      tmp01.id   tmp02.id Result Exists?
    ---------- ---------- ---------------
             1          1 Yes
             1          2 No
             1     (null) No

      tmp01.id   tmp02.id Result Exists?
    ---------- ---------- ---------------
             2          1 No
             2          2 Yes
             2     (null) No

      tmp01.id   tmp02.id Result Exists?
    ---------- ---------- ---------------
             3          1 No
             3          2 No
             3     (null) No

      tmp01.id   tmp02.id Result Exists?
    ---------- ---------- ---------------
        (null)          1 No
        (null)          2 No
        (null)     (null) No
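The result shows a pattern: whenever null takes part in the comparison, Result Exists? is always No (because the comparison evaluates to UNKNOWN). This is basic knowledge about null, and it explains why the second statement outputs two rows.
From the analysis above we can see that the result of in / not in depends on the result of the "=" equality test; exists / not exists tests whether a set is empty, but the subquery inside usually performs a value comparison.
Now that we know what causes the unexpected result set, we can fix our SQL. To make testing easier, rename the original tables tmp01 and tmp02: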
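To see the NOT EXISTS behaviour concretely, here is a minimal sketch, again using Python's sqlite3 in place of Oracle (an assumption; the data is the same as in tmp01 and tmp02). It runs the NOT EXISTS query and also lists which pairs of the Cartesian product actually satisfy the equality test.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption); same data as tmp01 / tmp02.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tmp01 (id integer);
    create table tmp02 (id integer);
    insert into tmp01 values (1), (2), (3), (null);
    insert into tmp02 values (1), (2), (null);
""")

# The NOT EXISTS form of the query; SQLite sorts NULL first in ascending order.
not_exists = [row[0] for row in conn.execute("""
    select id from tmp01
    where not exists (select 1 from tmp02 where tmp02.id = tmp01.id)
    order by id
""")]

# Of the 12 pairs in the Cartesian product, only these pass the "=" test.
matching_pairs = conn.execute("""
    select tmp01.id, tmp02.id
    from tmp01, tmp02
    where tmp01.id = tmp02.id
    order by 1
""").fetchall()

print(not_exists)      # [None, 3]
print(matching_pairs)  # [(1, 1), (2, 2)]
```

No pair involving null passes the equality test, so the subquery is empty for the null row of tmp01 and NOT EXISTS lets it through.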
    rename tmp01 to tmp01_with_null;
    rename tmp02 to tmp02_with_null;
Here are the test cases:
    test case id  tmp01 has null  tmp02 has null  result has null
    ------------  --------------  --------------  ---------------
    1             true            true            false
    2             true            false           true
    3             false           true            false
    4             false           false           false
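Test case 4 is only there for completeness: as long as the SQL is not written incorrectly, it generally causes no problems.
In the end, the SQL statements are rewritten as: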
    -- Set difference with not in
    with tmp01 as (
      select id from tmp01_with_null  --where id is not null
    ),
    tmp02 as (
      select id from tmp02_with_null  --where id is not null
    )
    -- start here
    select id from tmp01
    where id not in ( select id from tmp02 where id is not null )
    -- the part below is new, to handle test case 2
    union all
    select null from dual
    where exists ( select 1 from tmp01 where id is null )
    and not exists ( select 1 from tmp02 where id is null );

    -- Set difference with not exists
    with tmp01 as (
      select id from tmp01_with_null  --where id is not null
    ),
    tmp02 as (
      select id from tmp02_with_null  --where id is not null
    )
    -- start here
    select id from tmp01
    where not exists (
      select 1 from tmp02
      where (tmp02.id=tmp01.id)
      -- this line is new, to handle test case 1
      or (tmp02.id is null and tmp01.id is null)
    );
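The two rewritten statements can be run against all four test cases. The sketch below uses Python's sqlite3 instead of Oracle (an assumption, so `from dual` is dropped). The data sets for cases 2 to 4 are hypothetical: they are the article's tables with the null rows removed where the case requires it. The names `difference`, `cases`, and `results` are made up for the example.

```python
import sqlite3

# The rewritten NOT IN form; SQLite needs no "from dual" (assumption: SQLite, not Oracle).
NOT_IN_SQL = """
    select id from tmp01
    where id not in (select id from tmp02 where id is not null)
    union all
    select null
    where exists (select 1 from tmp01 where id is null)
      and not exists (select 1 from tmp02 where id is null)
"""

# The rewritten NOT EXISTS form, with the extra null-matches-null branch.
NOT_EXISTS_SQL = """
    select id from tmp01
    where not exists (
        select 1 from tmp02
        where tmp02.id = tmp01.id
           or (tmp02.id is null and tmp01.id is null)
    )
"""

def difference(sql, left, right):
    """Load left/right into fresh tmp01/tmp02 tables and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tmp01 (id integer)")
    conn.execute("create table tmp02 (id integer)")
    conn.executemany("insert into tmp01 values (?)", [(v,) for v in left])
    conn.executemany("insert into tmp02 values (?)", [(v,) for v in right])
    rows = [row[0] for row in conn.execute(sql)]
    conn.close()
    # Sort with None first so results compare equal regardless of row order.
    return sorted(rows, key=lambda v: (v is not None, v if v is not None else 0))

# Hypothetical data for the four test cases (null present or absent per table).
cases = {
    1: ([1, 2, 3, None], [1, 2, None]),
    2: ([1, 2, 3, None], [1, 2]),
    3: ([1, 2, 3], [1, 2, None]),
    4: ([1, 2, 3], [1, 2]),
}

results = {
    n: (difference(NOT_IN_SQL, left, right), difference(NOT_EXISTS_SQL, left, right))
    for n, (left, right) in cases.items()
}

for n, (via_not_in, via_not_exists) in results.items():
    print(n, via_not_in, via_not_exists)
# 1 [3] [3]
# 2 [None, 3] [None, 3]
# 3 [3] [3]
# 4 [3] [3]
```

Under these assumptions both forms agree in every case, and null appears in the result only in test case 2, matching the table of test cases.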
Having written all this, someone will suggest using the minus operator:
    with tmp01 as (
      select id from tmp01_with_null  --where id is not null
    ),
    tmp02 as (
      select id from tmp02_with_null  --where id is not null
    )
    -- start here
    select id from tmp01
    minus
    select id from tmp02
The statement looks simple, but the results are not the same. Consider the following statement:
    with tmp01 as (
      select id from tmp01_with_null  --where id is not null
      union all                       -- note: tmp01 now contains duplicate rows
      select id from tmp01_with_null  -- note: tmp01 now contains duplicate rows
    ),
    tmp02 as (
      select id from tmp02_with_null  --where id is not null
    )
    -- start here
    select 'minus ' as sql_op, id from tmp01
    minus
    select 'minus ', id from tmp02
    union all
    -- not in
    select 'not in', id from tmp01
    where id not in ( select id from tmp02 where id is not null )
    union all
    select 'not in', null from dual
    where exists ( select 1 from tmp01 where id is null )
    and not exists ( select 1 from tmp02 where id is null )
    union all
    -- not exists
    select 'not exists', id from tmp01
    where not exists (
      select 1 from tmp02
      where (tmp02.id=tmp01.id)
      -- this line is new, to handle test case 1
      or (tmp02.id is null and tmp01.id is null)
    );
    SQL_OP             ID
    ---------- ----------
    minus               3
    not in              3
    not in              3
    not exists          3
    not exists          3
minus eliminated the duplicate rows! This is what was meant earlier by saying that not in and not exists do not produce a true set difference.
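The duplicate-row difference can be reproduced with a short sketch. It uses Python's sqlite3 rather than Oracle (an assumption), where the counterpart of minus is spelled except; the CTE name `doubled` is made up for the example.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption); EXCEPT plays the role of MINUS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tmp01 (id integer);
    create table tmp02 (id integer);
    insert into tmp01 values (1), (2), (3), (null);
    insert into tmp02 values (1), (2), (null);
""")

# Every row of tmp01 appears twice in the CTE, as in the article's example.
via_except = [row[0] for row in conn.execute("""
    with doubled as (select id from tmp01 union all select id from tmp01)
    select id from doubled
    except
    select id from tmp02
""")]

via_not_in = [row[0] for row in conn.execute("""
    with doubled as (select id from tmp01 union all select id from tmp01)
    select id from doubled
    where id not in (select id from tmp02 where id is not null)
""")]

print(via_except)  # [3]
print(via_not_in)  # [3, 3]
```

The set operator works on distinct rows and treats the two nulls as matching, while the row-by-row filter keeps every qualifying row of the left input, duplicates included.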