IBM’s Federated Database Technology:
In large organizations, usually there are many units working independently on different modules of an application. Each unit may use different relational or even a non-relational databases for the development. There must be a technology that addresses the communication issues between these different database management systems. This is exactly where IBM’s Federated Database Technology comes into the picture.
Formally, federation technology can be described briefly as:
1. Federation allows the users and applications to access data from more than one RDBMSs in a single request.
2. Its a special type of distributed database management system that allows the distributed request across the multiple databases at the same time.
3. Federation consists of a DB2 instance, a database or federated database and one or more remote databases of same or different type.
To get more insights on federation, developerWorks article IBM Federated Database Technology is recommended.
Working of Federated System:
Applications can communicate with a federated server with any supported interface like JDBC or ODBC etc.
The federated server communicates with the data sources by means of software modules called Wrappers. Once the wrapper is installed federated system need to told where the wrapper is located this is done by using the create wrapper statement.
db2 “create wrapper net8″
For the multiple sources of same type only single wrapper is needed. For example multiple oracle data source can use the single wrapper for the communication.
Every remote data source is identified by the federated system by using the create server statement. For multiple oracle data sources we will need multiple create server statement.
db2 “create server ora_server type oracle version 11.1.0 wrapper net8 options (node ‘ORCL’)”
Once the wrapper and server is created the data at the remote data source can be accessed by creating the nicknames for the equivalent table in the remote data source in case of RDBMS.
Understanding the Federated Query Execution:
Before understanding the query performance we need to look at the compilation steps included in the Query compilation.
a) The input query is parsed and semantics are checked for the query.
b) Based on the remote data source information available in the wrappers and using the push down analysis, optimizer re-writes the query and then chose the plan with lowest cost.
c) Remote SQL generation for the data source takes place. This is important because filtering or grouping of the data at the remote data source is always critical for the performance rather than to bring whole data to the local server and then performing the filtering operation which causes bottlenecks due to the network constraints.
d) Once the remote SQL is generated, local data is gathered using the local SQL generation.
Performance issues are very common in the federated database. Following are the few bottlenecks:
1) Network performance:
This is one of the severe bottleneck in the performance of a federated query since the data is fetched from the remote data source and is brought to the local federated system over the network.
We can check the round trip time of the database by using the ping command from the federated database.
2) Federated Server Performance:
Performance of the federated server in terms of the CPU utilization, I/O Utilization, paging space utilization needs to be checked. If high, it can hamper the performance.
3) Size of the data in remote data source:
Size of the data in the remote data source is huge then the time required for the filtering, grouping and bringing it on local will be expensive.
4) Operators that are not pushed down can also significantly affect query performance:
If operations such as GROUP BY/ORDER BY can not be performed on the remote data source by any case then we will need to bring whole data on local server and then perform those operation which will be very expensive. This might occur due to the limitations like some functionality is not provided by the remote data source in that case you need to bring whole data set to the federated system over the network.
How to improve the Performance:
There are different ways to improve the performance of the federated database application.
1) Improving performance at query level
2) Improving performance at configuration level
In this post, I am focusing on the Query Level performance improvements. configuration level performance tips will be discussed in the next edition of the same topic.
1. Use of MQTs:
Using MQTs for caching the data locally will help to improve the performance of the query to a great extent. It will also ensure the data availability in case if the remote server is down for some reason.
2. Filtering and grouping results at the data source:
As mentioned earlier, Filtering and grouping the data at the remote data source and bringing only the required data instead of full data will save the network overhead and will help to improve the performance of the query.
3. Index Information/Statistics:
Updated statistics of the remote objects as well as the local will help the optimizer to choose the most effective plan for execution.
4. Tuning of the sort heap and the bufferpools associated with the temporary tablespace:
Higher the value, more is the need to investigate into the Sorting related setting of the federated database. When sort overflows occur, data is written to the temporary tablespaces that are associated with a buffer pool. When the buffer pool fills, data is overflowed to disk. This can result in poor query performance. This can result in poor query performance. For this we need to Tune the SORTHEAP and/or SHEAPTHRES configuration parameters to minimize the percent of sort overflows.
More on Using MQTs:
Benefits of Using MQTs:
A materialized query table (MQT) is a table that materializes the pre-computed result of a query involving one or more tables or nicknames.
Caching data using MQTs and allowing the optimizer to make the cost based decision to use the MQT can be useful to improve the performance of the Query in the federated system. Creating the MQT which involves nicknames makes the remote data locally available which consequently reduces the network cost of the bringing the data from the remote host over the network. If the remote data source from where the data is needed is unavailable, in that case also we can execute the query since the data is locally populated in MQT. Use of MQTs also reduces the consumption of the CPU and also avoids the overloading of the data on the local federated server.
Tips for Using MQTs:
1. Use of the design advisor (db2advis) to create the MQT that will help the query workload to improve the performance.
db2advis –d -i –m M –o
2. When we create MQT, it goes in the check-pending state. Use set integrity statement to bring the MQT out of check-pending state.
set integrity for all immediate unchecked;
3. REFRESH IMMEDIATE option will not be available for the MQTs which are referring to the nicknames.
4. Do not create too many MQTs. because it will take more time for the optimizer to choose the MQT from the pool and will increase the execution time. Sometimes optimizer may not choose the best MQT for the execution.
5. If the MQT is created at the lower isolation level than that of the query’s isolation level, then isolation level at which MQT is created will be ignored.
Enable MQT to be considered for the optimization:
1. Use CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION Special Register
2. DFT_MTTB_TYPES database configuration parameter.
3. Set the REFRESH AGE special register or setting the DFT_REFRESH_AGE database configuration parameter