INFOSYS 290A / Section 2: XML and Databases |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
SQL/XML |
ID | FIRSTNAME | LASTNAME | SEX | BIRTHDATE |
1 | Sam | Jones | M | 10.29.1958 |
2 | Susan | Johnson | F | 05.26.2001 |
3 | Sally | Johannson | F | 02.09.1962 |
4 | Santiago | Jimenez | M | 09.09.1925 |
5 | Sun | Jiang | F | 11.12.2003 |
Prescriptions
PATIENTID | DOCTORID | MEDICATIONNAME | DOSAGE | QUANTITY | DIAGNOSIS |
1 | 1 | Pink pills | 10 mg | 13 pills | <Diagnosis>Covered in anti-pink splotches.</Diagnosis> |
1 | 1 | Green liquid | 15 ml | 1 liter | <Diagnosis>A <b>very</b> bad case of something needing green liquid.</Diagnosis> |
3 | 1 | Pink and green pills | 15 mg | 27 pills | <Diagnosis>Ewwwww. Yech.</Diagnosis> |
4 | 1 | Red pills | 8 mg | 32 pills | <Diagnosis>Very, very sick.</Diagnosis> |
4 | 2 | Little striped things | 17.3 mg | 14 things | <Diagnosis>Little striped things beats eating worms.</Diagnosis> |
4 | 2 | Pink pills | 8 mg | 53 pills | <Diagnosis>Hyperhypohippopotamoiditus</Diagnosis> |
4 | 1 | Green liquid | 7 ml | .5 liter | <Diagnosis>An abundance of slithy toves.</Diagnosis> |
5 | 3 | Little striped things | 3.71 mg | 27 things | <Diagnosis>Lacking mimsy borogoves.</Diagnosis> |
Doctors
ID | FIRSTNAME | LASTNAME | SPECIALTY |
1 | Sean | Wilson | Hematology |
2 | Kim | Gonzales | Hidrosis |
3 | Devin | Liu | Lachrimation |
PatientsDoctors
PATIENTID | DOCTORID |
1 | 1 |
1 | 2 |
2 | 2 |
3 | 1 |
4 | 1 |
4 | 2 |
5 | 3 |
5 | 2 |
Element and attribute names are specified with SQL identifiers in SQL/XML. As a result, when quoted identifiers are needed -- such as to construct the mixed-case element name Foo -- they must be placed in the quoting character used by the database. In standard SQL, this is a double quote ("). For example:
XMLELEMENT(NAME Patient)
constructs an element named PATIENT, while:
XMLELEMENT(NAME "Patient")
constructs an element named Patient. (String literals, on the other hand, use single quotes (').)
It is not necessary for the white space in your results to exactly match the white space shown in the results listed in each question.
Be very careful when matching the parentheses used by each SQL/XML function. It is very easy to get these wrong.
Be sure to use commas to separate the various items in each SQL/XML function. For example:
XMLELEMENT(NAME "Patient", XMLELEMENT(NAME "FirstName", FirstName) <==== COMMA MISSING!!! XMLELEMENT(NAME "LastName", LastName) )
is incorrect because a comma is missing after the first child XMLELEMENT.
Write a query that creates a result set with two columns. The first column is patient ID and the second column is an XML document with the form:
<Patient> <ID>...</ID> <FirstName>...</FirstName> <LastName>...</LastName> <Sex>...</Sex> <Birthdate>...</Birthdate> </Patient>
Answer:
SELECT ID, XMLELEMENT(NAME "Patient", XMLFOREST(ID AS "ID", FIRSTNAME AS "FirstName", LASTNAME AS "LastName", SEX AS "Sex", BIRTHDATE AS "Birthdate")) FROM PATIENTS;
Same as (1), except that the document uses attributes instead of child elements:
<Patient ID="..." FirstName="..." LastName="..." Sex="..." Birthdate="..." />
Answer:
SELECT ID, XMLELEMENT(NAME "Patient", XMLATTRIBUTES(ID AS "ID", FIRSTNAME AS "FirstName", LASTNAME AS "LastName", SEX AS "Sex", BIRTHDATE AS "Birthdate")) FROM PATIENTS;
Same as (1), except that ID is an attribute:
<Patient ID="..."> <FirstName>...</FirstName> <LastName>...</LastName> <Sex>...</Sex> <Birthdate>...</Birthdate> </Patient>
Answer:
SELECT ID, XMLELEMENT(NAME "Patient", XMLATTRIBUTES(ID AS "ID"), XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName", SEX AS "Sex", BIRTHDATE AS "Birthdate")) FROM PATIENTS;
Same as (2), except that (a) the result set has a single column containing an XML document and (b) the Patient elements are wrapped in a Patients element:
<Patients> <Patient ID="..." FirstName="..." LastName="..." Sex="..." Birthdate="..." /> ... <Patient ID="..." FirstName="..." LastName="..." Sex="..." Birthdate="..." /> </Patients>
Answer:
SELECT XMLELEMENT(NAME "Patients", XMLAGG(XMLELEMENT(NAME "Patient", XMLATTRIBUTES(ID AS "ID", FIRSTNAME AS "FirstName", LASTNAME AS "LastName", SEX AS "Sex", BIRTHDATE AS "Birthdate") ) ) ) FROM PATIENTS;
Same as (3), except that the document also lists all prescriptions for the patient:
<Patient ID="..."> <FirstName>...</FirstName> <LastName>...</LastName> <Sex>...</Sex> <Birthdate>...</Birthdate> <Prescription> <Medication>...</Medication> <Dosage>...</Dosage> <Quantity>...</Quantity> </Prescription> ... </Patient>
Answer:
SELECT ID, XMLELEMENT(NAME "Patient", XMLATTRIBUTES(ID AS "ID"), XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName", SEX AS "Sex", BIRTHDATE AS "Birthdate"), (SELECT XMLAGG(XMLELEMENT(NAME "Prescription", XMLFOREST (MEDICATIONNAME AS "MedicationName", DOSAGE AS "Dosage", QUANTITY AS "Quantity") ) ) FROM PRESCRIPTIONS WHERE PATIENTS.ID = PRESCRIPTIONS.PATIENTID) ) FROM PATIENTS;
Same as (5), except that (a) the document lists the patient's doctors instead of the patient's prescriptions, and (b) that the Doctor elements are wrapped in a Doctors element:
<Patient ID="..."> <FirstName>...</FirstName> <LastName>...</LastName> <Sex>...</Sex> <Birthdate>...</Birthdate> <Doctors> <Doctor> <FirstName>...</FirstName> <LastName>...</LastName> </Doctor> ... </Doctors> </Patient>
Answer:
SELECT ID, XMLELEMENT(NAME "Patient", XMLATTRIBUTES(ID AS "ID"), XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName", SEX AS "Sex", BIRTHDATE AS "Birthdate"), XMLELEMENT (NAME "Doctors", (SELECT XMLAGG(XMLELEMENT(NAME "Doctor", XMLFOREST (FIRSTNAME AS "FirstName", LASTNAME AS "LastName") ) ) FROM PATIENTSDOCTORS, DOCTORS WHERE PATIENTS.ID = PATIENTSDOCTORS.PATIENTID AND DOCTORS.ID = PATIENTSDOCTORS.DOCTORID) ) ) FROM PATIENTS;
Write a query that creates a result set with two columns. The first column is the doctor's ID and the second column is an XML document listing the doctor's name, the prescriptions they wrote, and the patient information for each prescription:
<Doctor ID="..."> <FirstName>...</FirstName> <LastName>...</LastName> <Prescriptions> <Prescription> <Patient> <FirstName>...</FirstName> <LastName>...</LastName> </Patient> <Medication>...</Medication> <Dosage>...</Dosage> <Quantity>...</Quantity> </Prescription> ... </Prescriptions> </Doctor>
Answer:
SELECT ID, XMLELEMENT(NAME "Doctor", XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName"), XMLELEMENT (NAME "Prescriptions", (SELECT XMLAGG(XMLELEMENT(NAME "Prescription", (SELECT XMLELEMENT(NAME "Patient", XMLFOREST (FIRSTNAME AS "FirstName", LASTNAME AS "LastName") ) FROM PATIENTS WHERE PATIENTS.ID = PRESCRIPTIONS.PATIENTID), XMLFOREST (MEDICATIONNAME AS "MedicationName", DOSAGE AS "Dosage", QUANTITY AS "Quantity") ) ) FROM PRESCRIPTIONS WHERE PRESCRIPTIONS.DOCTORID = DOCTORS.ID) ) ) FROM DOCTORS;
Write a query that creates a result set with two columns. The first column is the patient's ID and the second column is the patient's diagnosis:
<Diagnosis>...</Diagnosis>
Answer:
SELECT PATIENTID, DIAGNOSIS FROM PRESCRIPTIONS;
Write a query that creates a result set with two columns. The first column is the patient's ID and the second column is a document that summarizes the diagnosis in prose form:
<Summary PatientID="..."> The patient <FirstName>...</FirstName> <LastName>...</LastName> has been diagnosed as, "<Diagnosis>...</Diagnosis>" </Summary>
Answer:
SELECT PATIENTID, XMLELEMENT(NAME "Summary", XMLATTRIBUTES (Patientid AS "PatientID"), 'The patient ', (SELECT XMLELEMENT(NAME "FirstName", Firstname) FROM PATIENTS WHERE PATIENTS.ID = PRESCRIPTIONS.PATIENTID), ' ', (SELECT XMLELEMENT(NAME "LastName", Lastname) FROM PATIENTS WHERE PATIENTS.ID = PRESCRIPTIONS.PATIENTID), ' has been diagnosed as, "', DIAGNOSIS, '"' ) FROM PRESCRIPTIONS;
Write an update query that replaces the diagnosis for patient 5 with:
<Diagnosis>Gyring and gimbling wabes</Diagnosis>
Answer:
UPDATE PRESCRIPTIONS SET DIAGNOSIS = XMLPARSE(DOCUMENT '<Diagnosis>Gyring and gimbling wabes</Diagnosis>') WHERE PATIENTID = 5;
OPTIONAL (NO CREDIT -- THIS IS JUST FOR FUN). Write a query that creates a result set with two columns. The first column is the doctor's ID and the second column is an XML document listing all of the names of the medications they prescribed. The tricky part is that the list of medication names must not contain any duplicates.
<Doctor ID="..."> <FirstName>...</FirstName> <LastName>...</LastName> <Medication>...</Medication> ... <Medication>...</Medication> </Doctor>
Answer:
SELECT ID, XMLELEMENT(NAME "Doctor", XMLATTRIBUTES (ID AS "ID"), XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName"), (SELECT XMLAGG(XMLELEMENT(NAME "MedicationName", D.M)) FROM (SELECT DISTINCT MEDICATIONNAME AS M FROM DOCTORS, PRESCRIPTIONS WHERE PRESCRIPTIONS.DOCTORID = DOCTORS.ID) AS D ) ) FROM DOCTORS;
Note that DISTINCT cannot directly be applied to the XMLELEMENT function that creates the MedicationName element, which is the obvious thing to try. The reason is that this would require the database to compare XML values, which is not allowed.
OPTIONAL (NO CREDIT -- THIS IS JUST FOR FUN). Same as (8), except that all of the diagnoses for a given patient are placed inside a single <Diagnoses> element with a PatientID attribute:
<Diagnoses PatientID="..."> <Diagnosis>...</Diagnosis> ... <Diagnosis>...</Diagnosis> </Diagnoses>
For a harder variation, only use the Prescriptions table in your query.
Answer:
SELECT ID, XMLELEMENT(NAME "Diagnoses", XMLATTRIBUTES (ID AS "PatientID"), (SELECT XMLAGG(DIAGNOSIS) FROM PRESCRIPTIONS WHERE PRESCRIPTIONS.PATIENTID = PATIENTS.ID) ) FROM PATIENTS;
or:
SELECT P.PATIENTID, XMLELEMENT(NAME "Diagnoses", XMLATTRIBUTES (P.PATIENTID AS "PatientID"), (SELECT XMLAGG(DIAGNOSIS) FROM PRESCRIPTIONS WHERE PRESCRIPTIONS.PATIENTID = P.PATIENTID) ) FROM (SELECT DISTINCT PATIENTID FROM PRESCRIPTIONS) AS P;
OPTIONAL (NO CREDIT -- THIS IS JUST FOR FUN). What rock band do the doctors play in?
There were a number of common problems people made while doing this homework:
In the final version of SQL/XML 1.0, the NAME keyword in XMLELEMENT (before the element name) and the AS keyword in XMLFOREST and XMLATTRIBUTES (between the SQL expression and the XML name) are both required. Judging by the answers people submitted, some products that implement SQL/XML do not require these keywords. Presumably this is because they implemented an earlier version of the specification that did not require them. (This is certainly true with the NAME keyword. I'm not sure about the AS keyword.)
When AS clauses are omitted from XMLFOREST and XMLATTRIBUTES, column names are used as element and attribute names. When unquoted identifiers are used as column names, they are upper-cased by the SQL processor before being used to construct element and attribute names. As a result, the element and attribute names are in upper case. Since case is significant in XML identifiers, if the desired identifier contains any lower case letters, the AS clause must be used. For example, the following call to XMLFOREST:
XMLFOREST(FirstName, LastName)
creates elements named FIRSTNAME and LASTNAME. To create elements named FirstName and LastName, use the call:
XMLFOREST(FirstName AS "FirstName", LastName AS "LastName")
or explicitly upper-case identifiers that would be upper-cased by the system so later programmers are not confused:
XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName")
XMLCONCAT requires at least two arguments. This is not clear from some of the product documentation I read (and might not even be required by some implementations), but is clear from the SQL/XML specification. Furthermore, I can think of no use cases for calling XMLCONCAT with a single argument. In the apparently obvious cases, XMLAGG should be used instead, as XMLCONCAT will not do what is expected. For example, suppose I want to create a document containing multiple PatientName elements inside a Patients element. The following query will not work:
SELECT XMLELEMENT(NAME "Patients", XMLCONCAT(XMLELEMENT(NAME "PatientName", LastName))) <==== Incorrect! FROM PATIENTS
The hope of this query is that the patient names for all rows in the PATIENTS table will be concatenated together. This fails because the select list is evaluated once for each row in the PATIENTS table. That is, for each row, a separate Patients element with a single LastName child will be constructed. (In fact, the query fails with a syntax error because XMLCONCAT only has one argument, but if one argument was allowed, then this is what would happen.)
Similarly, attempts to force all of the patient last names into a Patients element through the use of a subquery that uses XMLCONCAT will also fail. For example, the following query fails:
SELECT XMLELEMENT(NAME "Patients", XMLCONCAT((SELECT XMLELEMENT(NAME "PatientName", LastName) FROM PATIENTS))) <==== Incorrect! FROM PATIENTS
The reason this query fails is that the subquery returns a result set with more than one row, which is not allowed -- subqueries in the select list must return a result set consisting of a single row with a single column. People attempting this are probably hoping that the multiple rows are somehow processed as a list of XML values which can then be concatenated together. In fact, the system treats each row separately.
The correct way to do this is with XMLAGG, which concatenates values across rows. (XMLCONCAT concatenates values within a single row.) For example:
SELECT XMLELEMENT(NAME "Patients", XMLAGG(XMLELEMENT(NAME "PatientName", LastName))) FROM PATIENTS
XMLAGG is commonly used with correlated subqueries when there is a master / subordinate relationship between tables. For example, suppose you want to create an XML document with some header information about a patient and then a list of all the medications for the patient. In this case, the master table is PATIENTS and the subordinate table is PRESCRIPTIONS. XMLAGG is used to aggregate the prescriptions for a particular patient. For example:
SELECT ID, XMLELEMENT(NAME "Patient", XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName"), (SELECT XMLAGG(XMLELEMENT(NAME "MedicationName", MEDICATIONNAME)) FROM PRESCRIPTIONS WHERE PRESCRIPTIONS.PATIENTID = PATIENTS.ID) ) FROM PATIENTS;
Conceptually, what happens here is that a row is retrieved from the PATIENTS table and a Patient element with FirstName and LastName children is constructed from the data in that row. The query processor then executes a SELECT statement against the PRESCRIPTIONS table and retrieves only those rows that apply to the patient (PRESCRIPTIONS.PATIENTID = PATIENTS.ID). (This is why it is called a correlated subquery -- the rows in the subordinate table are correlated with the rows in the master table.) For each row in the subordinate table, it constructs a MedicationName element. It then aggregates these across all the rows that apply to the patient, creating a forest of MedicationName elements which is then inserted into the Patient element.
A common mistake is to include the name of the table used in the outer SELECT in the FROM list of the subquery. For example:
SELECT ID, XMLELEMENT(NAME "Patient", XMLFOREST(FIRSTNAME AS "FirstName", LASTNAME AS "LastName"), (SELECT XMLAGG(XMLELEMENT(NAME "MedicationName", MEDICATIONNAME)) FROM PRESCRIPTIONS, PATIENTS WHERE PRESCRIPTIONS.PATIENTID = PATIENTS.ID) /\ || Incorrect! ) FROM PATIENTS;
The problem with this is that the join condition (PRESCRIPTIONS.PATIENTID = PATIENTS.ID) is then evaluated not for the single row currently being processed by the outer select, but for all rows in the PATIENTS table. This results in the medications for all patients being included in each Patient element.
If you are having trouble conceptualizing this, consider each of the subqueries separately. The first subquery cannot be executed in the absence of the outer select statement, since there is no PATIENTS.ID column that PRESCRIPTIONS.PATIENTID can be compared against. The second subquery, on the other hand, stands by itself: It instructs the system to return a result set containing the names of medications for all patients that have a corresponding row in the PATIENTS table. Assuming that this is always true, the subquery is equivalent to:
SELECT XMLAGG(XMLELEMENT(NAME "MedicationName", MEDICATIONNAME)) FROM PRESCRIPTIONS
Clearly, this is not what is wanted, since we only want to get the prescriptions that apply to the patient currently being processed by the outer select statement.