Sunday, January 29, 2012

Statement-level constraint validation: MERGE + MTI

While working on release 3.1 of RuleGen which will, among other things, support the MERGE and Multi-Table-Insert (MTI) statements, I decided to investigate how Oracle deals with validating declarative constraints during execution of these two types of DML statements. I discovered, what I consider bugs in this area, which I will demonstrate in this post. But before I do that, I'd like to discuss another phenomena first: how triggers fire during MERGE and MTI. Row triggers fire as expected: whenever a row is inserted, updated or deleted, the respective before and after row triggers will fire for the row. Statement triggers however fire in what may seem somewhat strange manner. But on second thought is actually the way you would want them to fire.

Statement triggers firing during MERGE

Here's the syntax for MERGE:

MERGE INTO [destination] a
  USING [some source query] drv
    ON ([join-conditions])
  WHEN MATCHED THEN
    UPDATE SET ...
    DELETE WHERE ...
  WHEN NOT MATCHED THEN
    INSERT ... VALUES ...;

As of 10G the various sections above are optional. This gives us five scenarios to consider.

Scenario 1: the merge statement only has the 'WHEN NOT MATCHED' section, like this:
MERGE INTO [destination] a
  USING [some source query] drv
    ON ([join-conditions])
  WHEN NOT MATCHED THEN
    INSERT ... VALUES ...;

In this case only the before and after INSERT statement-level triggers on the destination table will fire. By the way this is irrespective of whether the MERGE will actually insert rows into that table or not. Oracle will determine at parse time which sections are present in the MERGE statement text and from that ensure that these triggers are fired.

Scenario 2: the merge statement has both the 'WHEN NOT MATCHED' and 'WHEN MATCHED' (without the DELETE) sections, like this:
MERGE INTO [destination] a
  USING [some source query] drv
    ON ([join-conditions])
  WHEN MATCHED THEN
    UPDATE SET ...
  WHEN NOT MATCHED THEN
    INSERT ... VALUES ...;

In this case the before and after INSERT and UPDATE statement-level triggers will fire.

Scenario 3: the merge statement has both the 'WHEN NOT MATCHED' and 'WHEN MATCHED', with the DELETE sections, like this:
MERGE INTO [destination] a
  USING [some source query] drv
    ON ([join-conditions])
  WHEN MATCHED THEN
    UPDATE SET ...
    DELETE WHERE ...
  WHEN NOT MATCHED THEN
    INSERT ... VALUES ...;

In this case all three before and after statement-level triggers will fire: INSERT, UPDATE and DELETE.

Scenario 4: the merge statement only has 'WHEN MATCHED' (without the DELETE) section, like this:
MERGE INTO [destination] a
  USING [some source query] drv
    ON ([join-conditions])
  WHEN MATCHED THEN
    UPDATE SET ...;

In this case only the before and after UPDATE statement-level trigger will fire.

Scenario 5: the merge statement only has 'WHEN MATCHED' with the DELETE section, like this:
MERGE INTO [destination] a
  USING [some source query] drv
    ON ([join-conditions])
  WHEN MATCHED THEN
    UPDATE SET ...
    DELETE WHERE ...;

In this case the before and after UPDATE and DELETE statement-level triggers will fire.

So there they are: five possible combinations of statement-level triggers firing during a MERGE statement. All directly dependent upon which sections appear in the statement.

Note that when you've defined a combined statement-level trigger (one created using 'after/before update or delete or insert'), this trigger potentially fires one, two, or three times. Due to a bug however (bug 9865211 see MOS) the three known booleans INSERTING, UPDATING and DELETING that you can inspect to find out why the combined statement-level trigger is firing, all return FALSE. This is supposedly fixed though in the recent 11.2.0.3 patchset. But even then you really cannot tell the difference between whether for instance the insert trigger is firing due to a MERGE or due to a regular INSERT statement. A fourth boolean, say MERGING, would be convenient for that.

Statement triggers firing during MTI

Here's the syntax for Multi-Table-Insert:
INSERT ALL|FIRST
  WHEN [condition] THEN
    INTO [table] VALUES ...
  WHEN [condition THEN
    INTO [table] VALUES ...
  ...
[some query];

As with the MERGE statement above, Oracle will determine during parse time which before and after statement level INSERT triggers will fire. Suppose you've written an MTI statement like this:
INSERT ALL|FIRST
  WHEN [condition] THEN
    INTO EMP VALUES ...
  WHEN [condition] THEN
    INTO DEPT VALUES ...
[some query];

Then the before insert statement triggers of both tables EMP and DEPT will fire, as well as the after insert statement triggers of tables EMP and DEPT. Not too surprising so far. However we can also write an MTI statement like this (two sections insert into EMP, one into DEPT):
INSERT ALL|FIRST
  WHEN [condition] THEN
    INTO EMP VALUES ...
  WHEN [condition] THEN
    INTO EMP VALUES ...
  WHEN [condition] THEN
    INTO DEPT VALUES ...
[some query];

Note that in this case Oracle will fire the before and after statement INSERT trigger of the EMP table twice each!
So timewise something like this will happen:
before-insert-statement trigger EMP fires.
before-insert-statement trigger EMP fires again.
before-insert-statement trigger DEPT fires.
MTI-executes.
after-insert-statement trigger EMP fires.
after-insert-statement trigger EMP fires again.
after-insert-statement trigger DEPT fires.

Bug or feature?
Anyway using this 'feature' we can write an MTI-statement that will fire the before and after INSERT triggers on a table, any number of times: just add more WHEN branches for that table. And the order in which the triggers fire seem to be determined by the order in which the tables appear inside the MTI statement.

Statement-level constraint consistency

RuleGen implements integrity constraints by generating the necessary triggers for them. One of the goals of RuleGen, is that integrity constraints implemented by triggers, should have the same 'features' as declarative constraints have. That is: you should be able to enable/disable them, have them validate immediately or deferred. And in the case of immediate checking, be immune to the order in which rows happen to get changed by the triggering DML statement.

So, having investigated the way triggers fire for these two types of statements, and thinking about how to make this all work correctly and efficiently when generating code with RuleGen, I was struck with the thought of how Oracle deals with key (primary/unique) and foreign key validation during merge and mti.

As some of you might know, when we execute the three regular DML statement types (insert, update and delete) Oracle works as expected: intermediate violations during statement execution are allowed, as long as they are 'fixed' at completion of the DML execution. Here's an example.

Let's setup a simple table with a few constraints:

create table p
(i number not null
,j number not null
,k number not null
,primary key(i)
,foreign key (j) references p(i)
,unique(k));
insert into p values(1,1,1);
insert into p values(2,2,2);
insert into p values(3,3,3);
commit;
Using an update statement we can switch key-values (primary or unique) without being confronted with the intermediate key-violations that is introduced during the statement execution. See below, we are switching key values 1 and 2 of both the primary key and the unique key:


SQL> update p set i = case i
  2                     when 1 then 2
  3                     when 2 then 1
  4                   end
  5  where i in (1,2)
  6  /

2 rows updated.


SQL> update p set k = case k

  2                     when 1 then 2
  3                     when 2 then 1
  4                   end
  5  where k in (1,2)
  6  /

2 rows updated.


SQL>

SQL> select * from p;
More...

         I          J          K

---------- ---------- ----------
         2          1          2
         1          2          1
         3          3          3

3 rows selected.


SQL>


In a similar way Oracle allows intermediate foreign key violations. Below we've contrived an update statement that introduces a new key value of 4 but will first change a foreign key value to 'point to' this new key value.

SQL> rollback;

Rollback complete.


SQL> select * from p;

More...

         I          J          K

---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3

3 rows selected.


SQL> update (select * from p order by i desc)

  2           set j = case j
  3                     when 1 then 4
  4                     when 3 then 4
  5                     else j
  6                   end
  7              ,i = case i
  8                     when 3 then 4
  9                     else i
 10                   end
 11  /

3 rows updated.


SQL> select * from p;

More...

         I          J          K

---------- ---------- ----------
         1          4          1
         2          2          2
         4          4          3

3 rows selected.


SQL>


And again Oracle allows this: we are not confronted with an intermediate foreign key violation due to the order in which we've designed the above update statement to modify the rows.

So what about this 'statement-level constraint consistency' feature when we deal with merge and mti statements?

SQL> rollback;

Rollback complete.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3

3 rows selected.

SQL> merge into p
  2  using (select 1 as x from dual union all
  3         select 2 as x from dual) drv
  4  on (drv.x = p.i)
  5  when matched then
  6     update set i = case i
  7                     when 1 then 2
  8                     when 2 then 1
  9                    end
 10  /
on (drv.x = p.i)
            *
ERROR at line 4:
ORA-38104: Columns referenced in the ON Clause cannot be updated: "P"."I"

SQL> merge into p
  2  using (select 1 as x from dual union all
  3         select 2 as x from dual) drv
  4  on (drv.x = p.i)
  5  when matched then
  6     update set k = case k
  7                     when 1 then 2
  8                     when 2 then 1
  9                    end
 10  /

2 rows merged.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          1          2
         2          2          1
         3          3          3

3 rows selected.

SQL>

So apart from the ORA-38104 which exhibits an obvious restriction of the merge statement, Oracle allows us to switch key values with merge.

And what about foreign key statement level consistency? Here's a merge that introduces a new parent key value 4 and updates an existing foreign key value to 4. The first merge processes the rows in a different order as does the second merge:

SQL> rollback;

Rollback complete.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3

3 rows selected.

SQL> merge into p
  2  using (select 1 as x from dual union all
  3         select 4 as x from dual
  4         order by x asc) drv
  5  on (drv.x = p.i)
  6  when matched then
  7     update set j = 4
  8  when not matched then
  9     insert values(4,4,4)
 10  /
merge into p
*
ERROR at line 1:
ORA-02291: integrity constraint (WORK.SYS_C0019049) violated - parent key not found


SQL> merge into p
  2  using (select 1 as x from dual union all
  3         select 4 as x from dual
  4         order by x desc) drv
  5  on (drv.x = p.i)
  6  when matched then
  7     update set j = 4
  8  when not matched then
  9     insert values(4,4,4)
 10  /

2 rows merged.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          4          1
         2          2          2
         3          3          3
         4          4          4

4 rows selected.

SQL>

Again: bug or feature? The first merge introduces a temporary violation in that it updates an fk-value to 4 prior to the insertion of parent 4. As you can see with the second merge processing the insert before the update, all goes well. Row processing order dependent behavior is not what we want (sounds a lot like why we have the mutating table error, right?).

So it's a bug if you ask me.

Now what about the multi-table-insert? Does it allow intermediate foreign key violations? Here's an example where we add two rows to table p, twice, with different row-orders.

SQL> rollback;

Rollback complete.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3

3 rows selected.

SQL> insert all
  2    into p values(5,4,5)
  3    into p values(4,4,4)
  4  select dummy
  5  from dual
  6  /

2 rows created.

SQL> rollback;

Rollback complete.

SQL> insert all
  2    into p values(4,4,4)
  3    into p values(5,4,5)
  4  select dummy
  5  from dual
  6  /

2 rows created.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3
         4          4          4
         5          4          5

5 rows selected.

SQL>

The former MTI statement introduces a child row under key 4 prior to inserting parent 4. Oracle allows it. To me even more proof that the issue shown with the merge statement is indeed a bug.

But wait a minute what about a foreign key between two different tables? Let's introduce a second table called c.

create table c
(l number not null primary key
,i number not null references p(i))
/

And write an MTI statement that introduces parent 4 and child 4 in table c:

SQL> rollback;

Rollback complete.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3

3 rows selected.

SQL> insert all
  2    into c values(1,4)
  3    into p values(4,4,4)
  4  select dummy
  5  from dual
  6  /
insert all
*
ERROR at line 1:
ORA-02291: integrity constraint (WORK.SYS_C0019053) violated - parent key not found


SQL> rollback;

Rollback complete.

SQL> insert all
  2    into p values(4,4,4)
  3    into c values(1,4)
  4  select dummy
  5  from dual
  6  /

2 rows created.

SQL> select * from p;
More...

         I          J          K
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3
         4          4          4

4 rows selected.

SQL>

Ah! Got you again. Bug or feature? Definitely a bug IMHO.

So much for the fun that I've had the past week investigating how to support Merge and Multi Table Insert with RuleGen.

Sunday, June 14, 2009

Two example constraints

In a post over at the Helsinki declaration blog, I've introduced two rather simple constraints in a two table database design holding a BOOK and ORDERLINE table.
  • A user cannot have more than a (sum) total of fifty books in ORDERLINE.
  • A title cannot be ordered more than a (sum) total of eighty in ORDERLINE.
As assertions we would specify these as follows:
create assertion no_more_than_50_per_user as
check(not exists(select 'a user with more than 50 books'
from (select username,sum(amount) as total
from orderline
group by username)
where total > 50)
);

create assertion no_more_than_80_per_title as
check(not exists(select 'a title with more than 80 ordered'
from (select ean,sum(amount) as total
from orderline
group by ean)
where total > 80)
);
Implementing these two constraints with the RuleGen framework is fairly straightforward. Let's start with the first one, no_more_than_50_per_user. What we first need to tell RuleGen is when this constraints needs to be validated:
  • on inserts?
  • on deletes?
  • on updates?
Now obviously whenever an ORDERLINE row is deleted, there is no need to validate this constraint. When we insert an ORDERLINE the constraint must be validated, as the insert could introduce a total of more than fifty books for a user. Likewise when we update the AMOUNT column and increase its current value, the constraint must be validated too.

The way we tell RuleGen this is by specifying (in this case) two queries:
select username
from inserted_rows;

select new_username as username
from updated_rows
where new_amount > old_amount;
The objects selected from by these queries (inserted_rows and updated_rows) are maintained by the framework and will hold the rows inserted/updated by the current transaction. We select the username value in above queries to tell RuleGen that this constraint is to be validated per username. RuleGen will execute these queries, and upon rows fetched from them, continue with the next step.

Next we need to tell RuleGen how this constraint is to be validated. This is done by specifying yet another query. A query on the tables that are constrained. The goal of this query is to seek a violation of the constraint. Here it is.
select 'Cannot have more than fifty books ordered for user '||p_username||'. Found: '||TOTAL||'.' as MSG
from (select sum(AMOUNT) as TOTAL
from ORDERLINE
where USERNAME = p_username)
where TOTAL > 50;
When this query fetches a row, RuleGen will use the MSG-value to raise an exception, signalling that the constraint is violated.

The second constraint is very much like the one demonstrated above. I'll leave it up to the reader to specify the RuleGen when and how for that one.