SQL
Template:About Template:Lead too short Template:Reorganise Template:Infobox programming language
SQL (Template:IPAc-en,<ref name="learningSQL" /> or Template:IPAc-en;<ref name="oed" /> Structured Query Language<ref name="Britannica" /><ref name="oed-US" /><ref name="IBM-SQL" /><ref name="MS-SQL-def" />) is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS).
Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language, data manipulation language, and a data control language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control. Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes procedural elements.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks."<ref name="codd-relational-model" /> Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language.<ref name="SQL-Fundamentals" /><ref name="IBM-sql" />
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987.<ref name="ISO/IEC" /> Since then, the standard has been revised to include a larger set of features. Despite the existence of such standards, though, most SQL code is not completely portable among different database systems without adjustments.
History
SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s.<ref name="chamberlin-boyce-sequel" /> This version, initially called SEQUEL (Structured English QUEry Language), was designed to manipulate and retrieve data stored in IBM's original quasi-relational database management system, System R, which a group at IBM San Jose Research Laboratory had developed during the 1970s.<ref name="chamberlin-boyce-sequel" /> The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.<ref name="oppel-databases" />
In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Codd, Chamberlin, and Boyce, and developed their own SQL-based RDBMS with aspirations of selling it to the U.S. Navy, Central Intelligence Agency, and other U.S. government agencies. In June 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version2) for VAX computers.
After testing SQL at customer test sites to determine the usefulness and practicality of the system, IBM began developing commercial products based on their System R prototype including System/38, SQL/DS, and DB2, which were commercially available in 1979, 1981, and 1983, respectively.<ref name="IBM-history" />
Design
SQL deviates in several ways from its theoretical foundation, the relational model and its tuple calculus. In that model, a table is a set of tuples, while in SQL, tables and query results are lists of rows: the same row may occur multiple times, and the order of rows can be employed in queries (e.g. in the LIMIT clause).
Critics argue that SQL should be replaced with a language that strictly returns to the original foundation: for example, see The Third Manifesto.
Syntax
Language elements
The SQL language is subdivided into several language elements, including:
- Clauses, which are constituent components of statements and queries. (In some cases, these are optional.)<ref name="ANSI/ISO/IEC" />
- Expressions, which can produce either scalar values, or tables consisting of columns and rows of data
- Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) (true/false/unknown) or Boolean truth values and are used to limit the effects of statements and queries, or to change program flow.
- Queries, which retrieve the data based on specific criteria. This is an important element of SQL.
- Statements, which may have a persistent effect on schemata and data, or may control transactions, program flow, connections, sessions, or diagnostics.
- SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
- Insignificant whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
Operators
Operator | Description | Example |
---|---|---|
=
|
Equal to | Template:Code |
<>
|
Not equal to (many DBMSs accept != in addition to <> )
|
Template:Code |
>
|
Greater than | Template:Code |
<
|
Less than | Template:Code |
>=
|
Greater than or equal | Template:Code |
<=
|
Less than or equal | Template:Code |
Template:Code | Between an inclusive range | Template:Code |
[[Where_(SQL)#LIKE|Template:Code]] | Match a character pattern | Template:Code |
Template:Code | Equal to one of multiple possible values | Template:Code |
Template:Code or Template:Code | Compare to null (missing data) | Template:Code |
Template:Code | Is equal to value or both are nulls (missing data) | Template:Code |
Template:Code | Used to change a field name when viewing results | Template:Code |
Other operators have at times been suggested and/or implemented, such as the skyline operator (for finding only those records that are not 'worse' than any others).
SQL has the case/when/then/else/end
expression, which was introduced in SQL-92. In its most general form, which is called a "searched case" in the SQL standard, it works like else if in other programming languages:
<source lang="sql"> CASE WHEN n > 0
THEN 'positive' WHEN n < 0 THEN 'negative' ELSE 'zero'
END </source>
SQL tests WHEN
conditions in the order they appear in the source. If the source does not specify an ELSE
expression, SQL defaults to ELSE NULL
. An abbreviated syntax—called "simple case" in the SQL standard—mirrors switch statements:
<source lang="sql"> CASE n WHEN 1
THEN 'one' WHEN 2 THEN 'two' ELSE 'I cannot count that high'
END </source>
This syntax uses implicit equality comparisons, with the usual caveats for comparing with NULL.
For the Oracle-SQL dialect, the latter can be shortened to an equivalent DECODE
construct:
<source lang="oracle11"> SELECT DECODE(n, 1, 'one',
2, 'two', 'i cannot count that high')
FROM some_table; </source>
The last value is the default; if none is specified, it also defaults to NULL
.
However, unlike the standard's "simple case", Oracle's DECODE
considers two NULL
s equal with each other.<ref name="DECODE" />
Queries
The most common operation in SQL, the query, makes use of the declarative SELECT
statement. SELECT
retrieves data from one or more tables, or expressions. Standard SELECT
statements have no persistent effects on the database. Some non-standard implementations of SELECT
can have persistent effects, such as the SELECT INTO
syntax provided in some databases.<ref name="ms-sql-select-into" />
Queries allow the user to describe desired data, leaving the database management system (DBMS) to carry out planning, optimizing, and performing the physical operations necessary to produce that result as it chooses.
A query includes a list of columns to include in the final result, normally immediately following the SELECT
keyword. An asterisk ("*
") can be used to specify that the query should return all columns of the queried tables. SELECT
is the most complex statement in SQL, with optional keywords and clauses that include:
- The
FROM
clause, which indicates the table(s) to retrieve data from. TheFROM
clause can include optionalJOIN
subclauses to specify the rules for joining tables. - The
WHERE
clause includes a comparison predicate, which restricts the rows returned by the query. TheWHERE
clause eliminates all rows from the result set where the comparison predicate does not evaluate to True. - The
GROUP BY
clause projects rows having common values into a smaller set of rows.GROUP BY
is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. TheWHERE
clause is applied before theGROUP BY
clause. - The
HAVING
clause includes a predicate used to filter rows resulting from theGROUP BY
clause. Because it acts on the results of theGROUP BY
clause, aggregation functions can be used in theHAVING
clause predicate. - The
ORDER BY
clause identifies which column[s] to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without anORDER BY
clause, the order of rows returned by an SQL query is undefined. - The
DISTINCT
keyword<ref>
Template:Cite book </ref> eliminates duplicate data.<ref> Template:Cite book </ref>
The following example of a SELECT
query returns a list of expensive books. The query retrieves all rows from the Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the Book table should be included in the result set.
<source lang="sql"> SELECT *
FROM Book WHERE price > 100.00 ORDER BY title;
</source>
The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.
<source lang="sql"> SELECT Book.title AS Title,
count(*) AS Authors FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn GROUP BY Book.title;
</source>
Example output might resemble the following:
Title Authors ---------------------- ------- SQL Examples and Guide 4 The Joy of SQL 1 An Introduction to SQL 2 Pitfalls of SQL 1
Under the precondition that isbn is the only common column name of the two tables and that a column named title only exists in the Book table, one could re-write the query above in the following form:
<source lang="sql"> SELECT title,
count(*) AS Authors FROM Book NATURAL JOIN Book_author GROUP BY title;
</source>
However, manyTemplate:Quantify vendors either do not support this approach, or require certain column-naming conventions for natural joins to work effectively.
SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the select list to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional sales_tax column containing a sales tax figure calculated at 6% of the price.
<source lang="sql"> SELECT isbn,
title, price, price * 0.06 AS sales_tax FROM Book WHERE price > 100.00 ORDER BY title;
</source>
Subqueries
Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a subquery. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases, the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the aggregation function AVG
receives as input the result of a subquery:
<source lang="sql"> SELECT isbn,
title, price FROM Book WHERE price < (SELECT AVG(price) FROM Book) ORDER BY title;
</source>
A subquery can use values from the outer query, in which case it is known as a correlated subquery.
Since 1999 the SQL standard allows named subqueries called common table expressions (named and designed after the IBM DB2 version 2 implementation; Oracle calls these subquery factoring). CTEs can also be recursive by referring to themselves; the resulting mechanism allows tree or graph traversals (when represented as relations), and more generally fixpoint computations.
Inline view
An Inline view is the use of referencing an SQL subquery in a FROM clause. Essentially, the inline view is a subquery that can be selected from or joined to. Inline View functionality allows the user to reference the subquery as a table. The inline view also is referred to as a derived table or a subselect. Inline view functionality was introduced in Oracle 9i.<ref name="Inline View/Derived Table" />
In the following example, the SQL statement involves a join from the initial Books table to the Inline view "Sales". This inline view captures associated book sales information using the ISBN to join to the Books table. As a result, the inline view provides the result set with additional columns (the number of items sold and the company that sold the books):
<source lang="sql"> SELECT b.isbn, b.title, b.price, sales.items_sold, sales.company_nm FROM Book b
JOIN (SELECT SUM(Items_Sold) Items_Sold, Company_Nm, ISBN FROM Book_Sales GROUP BY Company_Nm, ISBN) sales ON sales.isbn = b.isbn
</source>
Null or three-valued logic (3VL)
The concept of Null was introducedTemplate:By whom into SQL to handle missing information in the relational model. The word NULL
is a reserved keyword in SQL, used to identify the Null special marker. Comparisons with Null, for instance equality (=) in WHERE clauses, results in an Unknown truth value. In SELECT statements SQL returns only results for which the WHERE clause returns a value of True; i.e., it excludes results with values of False and also excludes those whose value is Unknown.
Along with True and False, the Unknown resulting from direct comparisons with Null thus brings a fragment of three-valued logic to SQL. The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and Lukasiewicz three-valued logic (which differ in their definition of implication, however SQL defines no such operation).<ref name="Klein" />
|