|
SQL
|
TODAY'S
TIP: RAID-5
This level of RAID differs from mirroring and is similar to RAID-3 in
that it computes and writes parity information that can be used to
rebuild data in the event of a loss of one drive in the array. The
difference between this and RAID-3 is that the parity information may
be written to any drive in the array, removing the single-drive hot
spot that is encountered with RAID-3.
With
RAID-5, the data is striped across the drives and the parity information for
the strips is written to an additional drive. There is, however, one large
disadvantage to this, referred to as the write-penalty: Whenever a write occurs,
the parity data needs to be
read, re-computed, and written with the related data while the data and parity
drives are locked. RAID-5 is excellent for read-only systems, such as data
warehouses, and are useable for OLTP systems that do not incur a high transaction
volume. RAID-5 uses only
approximately 20 percent overhead (this percentage varies depending on theconfiguration
of the drives) rather than the 100 percent overhead that mirroring incurs.
Hardware-implemented RAID-5 performs much better than software implementations,
and write-cache can decrease that pesky write-penalty.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: RAID-S
This is a modified version of RAID-5. RAID-S is different from RAID-5 in that
it stripes the parity, not the data, and it implements a write-cache to remove
the RAID-5 write penalty. To implement data striping, third-party software
needs to be implemented. Without this extra software, you may experience bottlenecks
when data is being written. The write-cache improves performance by deferring
writes and performing parity calculations at a time when the system is less
busy.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: RAID-7
Another modified version of RAID-5, this one differs by implementing a write-cache.
With this implementation, data is striped and the parity data is not. The
parity data is held on one or more drives. RAID-7 and RAID-S are just two
examples of enhanced performance being applied to the RAID-5 architectures.
Most RAID-5 implementations are now delivering write-cache features.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S TIP: RAID-0+1
This is also referred to as RAID-10. It includes both data striping
(RAID-0) and mirroring (RAID-1). It is an expensive implementation due
to the number of drives that must be purchased to support disk
striping and the mirroring of every drive. It does, however, deliver
the highest degree of performance and reliability. It is the
recommended approach for implementing very high-end OLTP systems.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S TIP: ORACLE: SHOWING HIERARCHIES WITH LPAD AND
LEVEL
In a previous tip, we showed you how to traverse a hierarchy in Oracle using
the connect by and prior keywords. Here's the example we used:
Select parent_column, child_column
From Family_Tree
start with child_column = 'Adam'
connect by parent_column = PRIOR child_column;
If our table contains the following:
Parent_column
child_column
Unknown Adam
Adam Steve
Adam Bill
Bill Mary
Steve Helen
Mary Ann
The
result below will be returned, indented to show the effect of the hierarchy:
Parent_column child_column
Unknown Adam
Adam Steve
Steve Helen
Adam Bill
Bill Mary
Mary Ann
Now for the new stuff: This hierarchy can be indented in SQL using the
Level and lpad keywords. Accordingly, the query that will visually
represent a hierarchy using proprietary Oracle features becomes:
Select parent_column, LPAD(' ',4*(LEVEL-1)) || child_column
From Family_Tree
start with child_column = 'Adam'
connect by parent_column = PRIOR child_column;
In this case, the statement
LPAD(' ',4*(LEVEL-1))
will
pad the child_column (to its left) with four spaces for every level of hierarchy.
Oracle programmers commonly use this trick to display data in a more understandable
manner.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S TIP: SQL SERVER: UNIQUE IDS
In SQL Server, unique IDs for a table can be implemented with a
column's Identity property. There is, however, another option, in the
form of the uniqueidentifier datatype using a ROWGUIDCOL property. You
can have only one of these on a table (as with Identity), and this is
a globally unique identifier (GUID). The column, which takes up 32
hexadecimal characters, is 16 bytes wide. When displayed, it is
separated by four dashes. To insert a new row, use the NEWID()
function and SQL Server will automatically generate a new unique ID,
as shown here:
Create
table Party (party_id uniqueidentifier,
party_name char(50));
Insert
into Party
values NEWID(), 'new party');
Uniqueidentifier
allows null values, so this should be implemented as
Not Null in order to create a primary key on the uniqueidentifier
column.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: STORAGE TRENDS: SAN
There are some interesting new storage trends on the horizon, the
first of which is called Storage Area Networks (SAN). SAN is a network
of storage devices that manage and store data without tying up
servers. A centralized SAN allows you to tie multiple servers to a
single storage system. This arrangement provides you with greater
flexibility in managing your corporate storage. Cabling distances
allow for campus-wide as well as metropolitan areas. Networking speed
and overhead is critical to the success of this type of
implementation.
SAN
technology is still relatively immature but is expected to be a
$12 billion industry by 2002. In the meantime, SANs are relatively
expensive and have a few issues, such as the need for serverless
backup (i.e., a direct backup from storage to tape without need for an
intervening server) and secure access that allows cross-platform
servers to recognize space allocated to them (to prevent problems like
Solaris data overwriting Windows NT data). Some advantages of SANs are
that they perform at 100 Mbytes per second without message traffic,
offer centralized management, and are highly scaleable. The long-term
upside for SANs is high.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S TIP: STORAGE TRENDS: NAS
Storage Area Networks (described in our previous tip) can be compared to Network-Attached
Storage (NAS) boxes, which are storage servers that attach to a LAN. NASs
are connected to the network in the same manner as file servers, and adding
or removing a NAS system is similar to adding and removing a network node.
The positives of NAS are that the technology is here today, they deliver direct
backup, and they may perform fine on gigabit LANs. The downside of NAS is
that scalability is accomplished by adding more servers to your environment--which
incurs the overhead of passing data, messaging, and Ethernet traffic across
your LAN. This is expected to be a $6 billion industry by 2002.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S TIP: NAS AND SAN: THE DIFFERENCES
The terminology for Network-Attached Storage (NAS) and Storage Area
Networks (SAN) can be confusing. Each is often compared to the other,
but both fall under the "storage-network" umbrella. The most important
difference between them is that a SAN is channel attached while a NAS
is network attached. To database designers, this means that a SAN has
the potential to achieve higher performance, particularly when data
from a database is transactional and being retrieved at the block
level. This unique use of storage by databases is different from other
software (such as a word processor) in that it does not necessarily
deal with a complete file of data at any given time.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: ESTIMATING TABLE SIZE: THE ARITHMETIC
Estimating the size of a table is half science and half art. The
scientific part is very precise. You first need to determine the
number of transactions that will insert, update, and delete rows on
the table. This information can be retrieved through already-running
systems that are being replaced by your new table(s) or from
information gathered from your most knowledgeable business users and
data modelers. Once you have this information, perform the following
tasks:
- Determine the growth rate. For example: The table will grow at
10,000 rows per day. This should be done for a reasonable length of
time, such as one year (if that is your budget cycle). You now know
the number of rows that your table will need to accommodate.
- Determine the length of every row: Take the data type of every
column and add them up. (Include null-indicator variables.) Estimate
the average length of every varchar column and be aware of the manner
in which your DBMS stores null values. For example, if a column is
null, is the entire space for the column stored, or is a single-byte
null indicator variable stored? Add this to the row overhead.
- Find out how row updates will increase row-length. This will impact
the amount of space you will want to leave free (pctfree) to allow a
row to increase in size without being moved to another block.
- Determine the number of rows that will fit into a block. To do this,
take the block size (or page size) and subtract the block overhead.
Once you've done this, subtract the amount of space free (pctfree from
Step 3) that you want to maintain. For example, if your block size is
4096, block overhead is 96, and percent free is 10, then your usable
space is 4000 * 0.9 = 3600 bytes. Divide this by row-length (from Step
2), which also includes row overhead. If row-length plus row-overhead
is 100, then 36 rows can be stored on a block. Make sure that this
number is less than the total number of rows that can be stored in a
block. Take the lower of the two numbers.
- Divide the total number of rows (from Step 1) by the rows per block
to find out how many blocks of data will be required.
Once you've gone through all this, you'll have a pretty good idea of
the amount of space your table will need.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: ESTIMATING TABLE SIZE: DOUBLE UP
In our previous tip, I outlined the steps to determine the amount of
space that a table will take up. Armed with this information, you now
need to decide how much space you should budget for. This is where you
need to be creative. You now need to factor in your level of
confidence on the number of rows and the growth predictions you were
given. You also need to determine the best way to size varying length
columns. Should you take the maximum size or the estimated average
size? Will your percent-free estimates be sufficient? If not, your
table may require frequent reorgs or space may be wasted. In either
case, system performance will suffer. How big is the table? If it is
very small, you can easily double or triple your predictions.
When
performing space estimates, play it safe and consider doubling
your estimates. Remember that disk space is cheap, right? Well, you
may find that not to be the case--but still, the additional space will
be needed (and consumed) in short order, so don't shortchange yourself
when it comes to disk space.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S TIP: ESTIMATING INDEX SIZE
The process of determining index size is very similar to that for
table size:
- Determine the number of rows to be held by the index.
- Calculate the index length, including index overhead. Bear in mind
that some databases store varying length columns at their full size in
an index. Also, duplicate index entries may be stored differently than unique
values--so you'll need to know how your database does this and estimate the
number of duplicate values by the appropriate calculation. Unique indexes
may also have different space
requirements.
- Determine the number of index entries that will fit into a block.
- Divide the total number of rows (from Step 1) by the rows per block
to find out how many blocks will be required for a specific index.
Of course, you'll want to follow the calculations outlined for your
DBMS, since they all differ, but these basic steps apply in all cases.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: EXCEPT OPERATOR
To find rows in one set that do not exist in another set, use the
except operator (as defined in SQL-92 and SQL-99). For example, here's
how you find column1 from Table1 that does not exist in column2 of
Table2:
Select column1 from Table1
Except
Select column2 from Table2;
The
except operator will remove duplicates, and a single null value
will be returned in the case of multiple null values.
To return duplicates, use except all.
Keep
in mind, of course, that other proprietary implementations (such
as Minus in Oracle) exist.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: SQL-99: INCONSISTENCY IN ALL AND DISTINCT
You've seen the use of the All and Distinct options in operations such as
Union and Except. When using these operations, the default is Distinct. Therefore,
a statement such as
Select column_list from Table1
Union
Select column_list from Table2;
will
return only distinct (i.e., unique) values. To return duplicates,
code the All option as follows:
Select column_list from Table1
Union All
Select column_list from Table2;
This differs from the All|Distinct option at the beginning of a Select statement where the default is All. Therefore, coding the following:
Select column_list from Table1;
is the same as
Select All column_list from Table1;
Bear
in mind as you write SQL statements that these defaults are
inconsistent with one another.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: DATA INTEGRITY: REFERENTIAL INTEGRITY
In our previous tip, you saw that data integrity is defined in the relational
model through two rules. The first is entity integrity, and
the second is referential integrity. Referential integrity states that every
value of a foreign key must match a value in the primary key--or it must be
null. A foreign key may also match the value in a unique key. In this case,
the not null columns of the foreign key must match the not null columns of
the primary key.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: DATABASE IMPLEMENTED R.I.
Referential integrity (R.I.) may be implemented in one of three ways:
- In the database using declarative R.I. rules
- In the database using triggers
- In application code
Database implemented R.I. through declarative (DDL) rules is the preferred
approach, since it is implemented in the database schema and the rules are
consistent and well understood. There are cases, however, where the required
rules are more complex than declarative R.I. will allow and need to be implemented
in triggers--as are there cases where declarative R.I. does not perform well.
Keep in mind that mixing declarative R.I. with R.I. implemented in triggers can be dangerous since the order in which both rules may be executed needs to be clearly understood. It's important to know exactly when a declarative R.I. rule is being performed in relation to your trigger. You may also incur a performance penalty since your trigger and declarative R.I. may both be accessing the same data. If even one trigger is required to implement declarative R.I., consider implementing all R.I. in the triggers.
The
final option is to implement R.I. in your application code. This is a last
resort since the R.I. is not built into the database. It is
often practiced by software vendors who need to implement R.I. in a consistent
way across diverse DBMS types. It was also practiced in the past by DBMSs
that did not support triggers (like older versions of DB2).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODAY'S
TIP: DECLARATIVE R.I. AND DELETE CASCADE
When defining declarative database R.I., one option is to implement the rule
as Delete Cascade. With this option, a delete of a parent row (the row containing
the primary key) will also delete all foreign key rows. Here is an example
of a Delete Cascade that creates a primary key on the parent table and a foreign
key on the child table:
Create table Table1 (column1 smallint, column2 smallint) Constraint Table1_P Primary Key (column1);
Create table Table2 (column_A smallint, column_B smallint); Alter table Table_2
add constraint Table1_Table2_fk
Foreign Key (column_A) references Table1(column1) on Delete Cascade;
When
implementing this option, it's important that a multilevel Delete has consistent
R.I. actions. If you have a three-level hierarchy and the first level implements
a Delete Cascade and the second level implements Delete Restrict, the Delete
will be rolled back--which may not be your desired result.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
Copyright acontractorsworld.com
©, All rights reserved.
|