Tuesday, December 8, 2015

Performance improvement methods and my understanding of index



Perormance improvement methods and my understanding of index to improve performance of SQL queries:





Some of the terminology worth knowing:
Tablespace
Tablespaces are a logical organization of space.
Tablespaces owns the database’s datafiles.
Database objects are stored within tablespaces.

Database: a logical collection of shared data stored in tablespaces.
File: A physical datafile belonging to a single tablespace.
Segment: A set of one or more extents that contains all the data for a specific structure within a tablespace.  (TABLE, INDEX, CLUSTER, ROLLBACK, TEMP, etc.)
Extent: A set of contiguous data blocks with in a database that make up a segment.
Block: One or more physical file blocks allocated from an existing datafile.

TYPES OF INDEX:

B-TREE index.
Bitmap index.


What is a b-tree index?
In a tree, records are stored in locations called leaves. The starting point is called the root. The maximum number of children per node is called the order of the tree. The maximum number of access operations required to reach the desired leaf (data stored on the leaf) is called the depth (level). Oracle indexes are balanced b-trees; the order is the same at every node and the depth is the same for every leaf.

The bigger the order, the more leaves and nodes you can put at a certain depth. This means that there are fewer levels to traverse to get to the leaf (which contains the data you want). In  all balanced b-trees, the number of hops to a leaf == depth


Characteristic of Bitmap Indexes

For columns with very few unique values (low cardinality)
Columns that have low cardinality are good candidates (if the cardinality of a column is <= 0.1 %  that the column is ideal candidate, consider also 0.2% – 1%)
Tables that have no or little insert/update are good candidates (static data in warehouse)
Stream of bits: each bit relates to a column value in a single row of table
create bitmap index person_region on person (region);        Row     Region   North   East   West   South       1       North        1             0          0         0       2       East          0              1          0        0       3       West         0             0          1         0       4       West         0             0          1         0       5       South        0             0          0         1       6       North        1             0          0         0

Statistics generated include the following:
Table statistics
Number of rows
Number of blocks
Average row length
Column statistics
Number of distinct values (NDV) in column
Number of nulls in column
Data distribution (histogram)
Index statistics
Number of leaf blocks
Levels
Clustering factor
System statistics
I/O performance and utilization
CPU performance and utilization


To view statistics in the data dictionary, query the appropriate data dictionary view. The following list shows the DBA_ views:
DBA_TABLES
DBA_TAB_COL_STATISTICS
DBA_INDEXES



To verify that index statistics are available and decide which are the best indexes to use in an application, query the data dictionary view DBA_INDEXES

SQL> SELECT INDEX_NAME "NAME", NUM_ROWS, DISTINCT_KEYS "DISTINCT",  1  LEAF_BLOCKS, CLUSTERING_FACTOR "CF", BLEVEL "LEVEL",  2  AVG_LEAF_BLOCKS_PER_KEY "ALFBPKEY"  3  FROM DBA_INDEXES  4  WHERE owner = 'SH'  5* ORDER BY INDEX_NAME;


The optimizer uses the following criteria when determining which index to use:
Number of rows in the index (cardinality).
Number of distinct keys. These define the selectivity of the index.
Level or height of the index. This indicates how deeply the data 'probe' must search in order to find the data.
Number of leaf blocks in the index. This is the number of I/Os needed to find the desired rows of data.
Clustering factor (CF). This is the collocation amount of the index block relative to data blocks. The higher the CF, the less likely the optimizer is to select this index.

How index are created?

Automatically : A unique index is created automatically when you define a PRIMARY KEY or UNIQUE KEY constraint in a table definition. The name of index is name given to constraint.
 Manually :     Users can create non unique indexes on columns to speed up access to the rows.

Create an index on one or more columns
    CREATE INDEX index_name
    ON table (column[, column]…);
 Example :
    CREATE INDEX emp_last-name_idx
    ON employees(last_name);
This will speed up query access to the LAST_NAME
 column in the EMPLOYEES table.


Create an index if
 A column contains a wide range of values
 A column contains a large number of null values
 One or more columns are frequently used together in a WHERE clause or a join condition
 The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows


When not to create an Index

It is usually not worth creating an Index if :
 The table is small
 The columns are not often used as a condition in the query
 Most queries are expected to retrieve more than 2 to 4 percent of rows in the table
 The table is updated frequently
 The indexed columns are referenced as part of an expression


Confirming Indexes:
Check the Index creation and its fields in
 The USER_INDEXES data dictionary view contains the name of the index and its uniqueness
 The USER_IND_COLUMNS view contains the index name, the table name, and the column name.
Example:
     SELECT ic.index_name,ic.column_name, ix.uniqueness
     FROM user_indexes ix, user_ind_columns ic
     WHERE ic.index_name=ix.index_name
     AND ic.table_name=‘EMPLOYEES’;
Function-Based Indexes:
The index expression is built from table columns, constants, SQL functions, and user-defined functions.
    CREATE INDEX upper_dept_name_idx
    ON departments(UPPER(department_name));
 A function based index is an index based on expressions


Removing an Index:
Remove an index from the data dictionary by using the DROP INDEX command.
    DROP INDEX index;
 To drop an index, you must be the owner of the index or have the DROP ANY INDEX privilege
Example :
    DROP INDEX upper_last_name-idx;

Index Selectivity:

B*TREE Indexes improve the performance of queries that select a small percentage of rows from a table. As a general guideline, we should create indexes on tables that are often queried for less than 15% of the table's rows. This value may be higher in situations where all data can be retrieved from an index, or where the indexed columns can be used for joining to other tables.
The ratio of the number of distinct values in the indexed column / columns to the number of records in the table represents the selectivity of an index. The ideal selectivity is 1. Such a selectivity can be reached only by unique indexes on NOT NULL columns.


Example with good Selectivity

A table having 100'000 records and one of its indexed column has 88000 distinct values, then the selectivity of this index is 88'000 / 10'0000 = 0.88.
Oracle implicitly creates indexes on the columns of all unique and primary keys that you define with integrity constraints. These indexes are the most selective and the most effective in optimizing performance. The selectivity of an index is the percentage of rows in a table having the same value for the indexed column. An index's selectivity is good if few rows have the same value.

Example with bad Selectivity:

lf an index on a table of 100'000 records had only 500 distinct values, then the index's selectivity is 500 / 100'000 = 0.005 and in this case a query which uses the limitation of such an index will return 100'000 / 500 = 200 records for each distinct value. It is evident that a full table scan is more efficient as using such an index where much more I/O is needed to scan repeatedly the index and the table.


How to Measure Index Selectivity ?

Manually measure index selectivity
The ratio of the number of distinct values to the total number of rows is the selectivity of the columns. This method is useful to estimate the selectivity of an index before creating it.
select count (distinct job) "Distinct Values" from emp;Distinct Values---------------            5
select count(*) "Total Number Rows" from emp;Total Number Rows-----------------           14
Selectivity = Distinct Values / Total Number Rows            = 5 / 14        = 0.35


Automatically measure index selectivity:

We can determine the selectivity of an index by dividing the number of distinct indexed values by the number of rows in the table.
create index idx_emp_job on emp(job); analyze table emp compute statistics;
select distinct_keys from user_indexes where table_name = 'EMP' and index_name = 'IDX_EMP_JOB';
DISTINCT_KEYS-------------          5
select num_rows from user_tables where table_name = 'EMP';
NUM_ROWS---------      14 Selectivity = DISTINCT_KEYS / NUM_ROWS = 0.35


How to choose Composite Indexes ?

A composite index contains more than one key column. Composite indexes can provide additional advantages over single column indexes.

BETTER SELECTIVITY  Sometimes two or more columns, each with poor selectivity, can be combined to form a composite index with good selectivity.


LEADING PORTION OF COMPOSITE INDEX:

An SQL statement can use an access path involving a composite index if the statement contains constructs that use a leading portion of the index. A leading portion of an index is a set of one or more columns that were specified first and consecutively in the list of columns in the CREATE INDEX statement that created the index. Consider this CREATE INDEX statement:
CREATE INDEX idx_composite ON EMP (x, y, z);
These combinations of columns are leading portions of the index: X, XY, and XYZ. These combinations of columns are not leading portions of the index: YZ and Z.


SQL Optimization techniques:

TIP 1: SQL cannot be shared within Oracle unless it is absolutely identical.  Statements must have match exactly in case, white space and underlying schema objects to be shared within Oracle's memory.  Oracle avoids the parsing step for each subsequent use of an identical statement

Writing efficient query:

TIP 2:  Beware of WHERE clauses which do not use indexes at all.  Even if there is an index over a column that is referenced by a WHERE clause included in this section, Oracle will ignore the index.  All of these WHERE clauses can be re-written to use an index while returning the same values.  In other words, don't perform operations on database objects referenced in the WHERE clause.

Do Not Use :
SELECT account_name, trans_date, amount  FROM transaction  WHERE SUBSTR(account_name,1,7) = 'CAPITAL';

In case of above use the below:

SELECT account_name, trans_date, amount  FROM transaction  WHERE account_name LIKE 'CAPITAL%';

Do not use:

SELECT account_name, trans_date, amount  FROM transaction  WHERE TRUNC (trans_date) = TRUNC (SYSDATE);

In case of above use the below:

SELECT account_name, trans_date, amount  FROM transaction  WHERE trans_date BETWEEN TRUNC (SYSDATE) AND TRUNC (SYSDATE) + .99999;

Do not use:
SELECT account_name, trans_date, amount  FROM transaction  WHERE amount + 3000 < 5000;
SELECT account_name, trans_date, amount  FROM transaction  WHERE amount != 0;
SELECT account_name, trans_date, amount  FROM transaction  WHERE amount  NOT  NULL;
 In case of above use the following;
SELECT account_name, trans_date, amount  FROM transaction  WHERE amount < 2000;
SELECT account_name, trans_date, amount  FROM transaction  WHERE amount > 0;
SELECT account_name, trans_date, amount  FROM transaction  WHERE amount > 0;


TIP 3:  Avoid including a HAVING clause in SELECT statements.  The HAVING clause filters selected rows only after all rows have been fetched.  Using a WHERE clause helps reduce overheads in sorting, summing, etc.  HAVING clauses should only be used when columns with summary operations applied to them are restricted by the clause.

Do not use;
SELECT region, AVG (loc_size)  FROM location  GROUP BY region  HAVING region != 'SYDNEY'  AND region != 'PERTH';
Use the below:
SELECT region, AVG (loc_size)  FROM location  WHERE region != 'SYDNEY'  AND region != 'PERTH';  GROUP BY region;

TIP 4:  Minimize the number of table lookups (subquery blocks) in queries, particularly if your statements include subquery SELECTs or multicolumn UPDATEs.

Do not use:

SELECT emp_name  FROM emp  WHERE emp_cat = (SELECT MAX (category)  FROM emp_categories)  AND emp_range = (SELECT MAX (sal_range)  FROM emp_categories)  AND emp_dept = 0020;


Use the below:
SELECT emp_name  FROM emp  WHERE (emp_cat, sal_range)         = (SELECT MAX (category), MAX (sal_range)                    FROM emp_categories)  AND emp_dept = 0020;


TIP 5:  Consider whether a UNION ALL will suffice in place of a UNION.  The UNION clause forces all rows returned by each portion of the UNION to be sorted and merged and duplicates to be filtered before the first row is returned.  A UNION ALL simply returns all rows including duplicates and does not have to perform any sort, merge or filter.  If your tables are mutually exclusive (include no duplicate records), or you don't care if duplicates are returned, the UNION ALL is much more efficient.

UNION
SELECT acct_num, balance_amt  FROM debit_transactions  WHERE tran_date = '31-DEC-95'  UNION  SELECT acct_num, balance_amt  FROM credit_transactions  WHERE tran_date = '31-DEC-95

VS UNION ALL

SELECT acct_num, balance_amt  FROM debit_transactions  WHERE tran_date = '31-DEC-95'  UNION ALL  SELECT acct_num, balance_amt  FROM credit_transactions  WHERE tran_date = '31-DEC-95';

TIP 6:  Oracle automatically performs simple column type conversions (or casting) when it compares columns of different types.  Depending on the type of conversion, indexes may not be used.  Make sure you declare your program variables as the same type as your Oracle columns, if the type is supported in the programming language you are using

Tip 7:Try to avoid to many joins in the where predicate.


Writing efficient query contd:

Note these considerations that apply to the cost-based and rule-based approaches:

The optimizer first determines whether joining two or more of the tables definitely results in a row source containing at most one row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY constraints on the tables. If such a situation exists, the optimizer places these tables first in the join order. The optimizer then optimizes the join of the remaining set of tables.

For join statements with outer join conditions, the table with the outer join operator must come after the other table in the condition in the join order. The optimizer does not consider join orders that violate this rule.

Tip 8: Avoid writing correlated subquery if possible.
With a correlated subquery, the database must run the subquery for each evaluation because it is based on the outer query’s data. 
E.g. select  book_key,store_key,quantity from sales s where quantity < (select max(quantity) from sales where book_key = s.book_key);


Tip 9:JOIN-columns should be indexed. JOIN columns or Foreign Key columns may be indexed since queries based on these columns can be expected to be very frequent.

Tip 10:Do not create indexes for small tables. It increases database overheads

HINTS  are the comments used in a SQL statement to pass instructions to the Oracle optimizer. The optimizer uses these hints as suggestions for choosing an execution plan for the statement.
Hints are compiler directives that change the execution plan for SQL statements. Of course, if the Oracle cost-based optimizer (CBO) and rule-based optimizer (RBO) always made optimal execution plans, hints would not be necessary.


When to use Hints?

You can use hints to specify the following:
The optimization approach for a SQL statement
The goal of the cost-based optimizer for a SQL statement
The access path for a table accessed by the statement
The join order for a join statement
A join operation in a join statement


Classes of Hints:

There are two classes of hints:
General mode hints—This class of hints changes the overall execution mode for Oracle queries. These hints include /*+ rule */ and /*+ first_rows */.
Detailed directive hints—This class of hints directs specific access paths, such as the use of a hash join over a nested loop join, /*+ use_hash */, and the use of a specific index to access a table

No comments: