Statistics – Understanding Basic Concepts and Dispersion

In the last post, Statistics – Understanding the Levels of Measurement, we saw what variables are and how we measure them at the different levels of measurement. In this post, we will cover some basic concepts that are important for getting started with statistics, and then dive deep into the concept of dispersion.

fig: Histogram and Distribution Curve

Histogram:

A histogram is a graphical representation of the distribution of numerical data. We know the basic bar graph, but in a histogram the bars are connected, or touch each other – meaning there are no gaps between them. For example, given some data points (i.e., values of the variable we measured), we create the histogram by plotting the data points against their frequency of occurrence in our random sample. We then draw the distribution curve by connecting the midpoints of the bars in the histogram. The important point to remember here is that there are bars sitting below that curve, and the process of drawing the distribution curve is: numbers -> bars -> curve.
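To make the numbers -> bars -> curve process concrete, here is a minimal sketch in Python using numpy and matplotlib (the data values are randomly generated, purely for illustration):

import numpy as np
import matplotlib.pyplot as plt

# Step 1: numbers – values of the variable measured on a random sample
data = np.random.normal(loc=50, scale=10, size=1000)

# Step 2: bars – plot the data points against their frequency of occurrence
counts, edges, _ = plt.hist(data, bins=20, edgecolor="black")

# Step 3: curve – connect the midpoints of the bars
midpoints = (edges[:-1] + edges[1:]) / 2
plt.plot(midpoints, counts, marker="o")

plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram and Distribution Curve")
plt.show()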

Continue reading

Statistics – Understanding the Levels of Measurement

One of the most important and basic steps in learning statistics is understanding the levels of measurement for variables. Let's take a step back and first ask: what is a variable? A variable is any quantity that can be measured and whose value varies across the population. For example, if we consider a population of students, the student's nationality, marks, grades, etc., are all variables defined for the entity student, and their values will differ for each student. Looking at the larger picture, if we want to compute the average salary of US citizens, we can go out and record the salary of each and every person, or we can choose a random sample from the entire population, compute the average salary for that sample, and then use statistical tests to draw conclusions about the wider population.
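As a toy illustration of that sampling idea, here is a small Python sketch (the salary figures are made up, not real census data):

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of salaries – made-up numbers for illustration
population = rng.lognormal(mean=11, sigma=0.5, size=1_000_000)

# Record the salary of a random sample instead of every person
sample = rng.choice(population, size=1_000, replace=False)

print(f"Sample mean:     {sample.mean():,.0f}")
print(f"Population mean: {population.mean():,.0f}")  # close, without surveying everyone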

The type of statistical test that can be used to draw a conclusion about the wider population depends on the level of measurement of the variable under consideration. The level of measurement of a variable is simply the mathematical nature of the variable, or how the variable is measured.

Broadly, there are four levels of measurement for variables –

Continue reading

Working with DB2 Database Grants

As a database administrator or developer, one often needs to restore databases from one database server to another. In DB2, when a database is created, the DB2 instance owner becomes the owner of the database by default. So when you restore a database backup across database servers where the source and target servers have the same instance owner name, it works fine. But when the source and target servers have different instance owners, the database restores successfully, yet you can hit privilege/authorization issues when you try to access the database objects – most commonly SQL0551N and SQL1092N.

This problem can be solved by creating the source server's instance owner on the target database server, connecting to the database as that user, and then granting DBADM authority to the new instance owner. But there is a simpler, more elegant way to do this using the DB2_RESTORE_GRANT_ADMIN_AUTHORITIES DB2 registry variable.
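As a rough sketch of the registry-variable approach (scripted here in Python with subprocess; the database name and backup path are hypothetical, and the same commands can just as well be run directly from the instance owner's shell):

import subprocess

def run(cmd):
    # Echo and run a command in the instance owner's shell
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# Have DB2 grant the admin authorities to the restoring instance owner
run("db2set DB2_RESTORE_GRANT_ADMIN_AUTHORITIES=ON")

# Recycle the instance so the registry variable takes effect
run("db2stop")
run("db2start")

# Restore the backup taken on the source server (hypothetical name and path)
run("db2 restore database SAMPLEDB from /backups taken at 20150101120000")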

Continue reading

IDUG NA 2015: Lessons from a First-Time Speaker

Finally, there it was – the most exciting and awaited week of the year. I got a chance to live my dream: to be at the International DB2 User Group (IDUG) Tech Conference in Philadelphia. I must confess, it's not just about being at a conference, it's about the whole new experience I will carry with me ever after. As a first-time attendee and speaker, the blogs on the IDUG experience from Ember Crooks (at db2commerce.com) and Pavan Kristipati (at db2talk.com) really helped me. Slide decks from past IDUG conferences also served as a good reference and a starting point. I reached out for tips to the people I admire from the DB2 community – Mike Krafick, Ember Crooks and Pavan Kristipati. They happily extended a helping hand, and Mike actually reviewed my presentation and helped me improve it. Thank you Mike! Here is a summary of the tips these experts offered – I hope it will be of help to first-time attendees and speakers.

Continue reading

Federated Database Performance (Part-2)

In part-1 of Federated Database Performance, we discussed what federation is, how it works, and a few query-level tips to improve federated database performance.

In part-2, we will discuss a few configuration parameters that help with performance tuning of federated systems.

The performance of a query that accesses data on remote databases depends mainly on the PUSHDOWN mechanism. Almost all the parameters IBM provides for improving the performance of the SHIP operation depend on this mechanism.

PUSHDOWN Mechanism:

Using the pushdown mechanism, the DB2 optimizer tries to push as many operations as possible to the remote database server, so that the data is filtered at the remote server and only the required rows are transferred to the federated server. In other words, pushdown lets us filter the data remotely instead of bringing all of it to the local DB2 server and filtering it there. This improves the performance of the database system by reducing network traffic and the cost and time of data transfer.
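To illustrate the difference, here is a sketch using the ibm_db Python driver against a hypothetical nickname REMOTE_ORDERS (the connection string, table and column names are all made up):

import ibm_db

# Connect to the federated database (connection details are placeholders)
conn = ibm_db.connect("DATABASE=FEDDB;HOSTNAME=localhost;PORT=50000;"
                      "PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;", "", "")

# With pushdown, the WHERE clause is shipped to the remote server, so only
# the matching rows travel over the network to the federated server.
stmt = ibm_db.exec_immediate(
    conn, "SELECT order_id, amount FROM remote_orders WHERE amount > 10000")

row = ibm_db.fetch_assoc(stmt)
while row:
    print(row)
    row = ibm_db.fetch_assoc(stmt)

# Without pushdown, every row of REMOTE_ORDERS would be pulled to the
# federated server first and filtered locally – far more network traffic.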

The SYSCAT.SERVEROPTIONS catalog view holds some interesting options that can be helpful in federated environments. It catalogs the federated server settings for each remote data source; each row specifies a server-specific option and its corresponding value.
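For example, the options currently set for each server can be listed with a query like the following (sketched with the ibm_db Python driver; the connection details are placeholders):

import ibm_db

conn = ibm_db.connect("DATABASE=FEDDB;HOSTNAME=localhost;PORT=50000;"
                      "PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;", "", "")

# One row per server-specific option and its current value
stmt = ibm_db.exec_immediate(
    conn, "SELECT servername, option, setting FROM syscat.serveroptions "
          "ORDER BY servername, option")

row = ibm_db.fetch_tuple(stmt)
while row:
    print(row)
    row = ibm_db.fetch_tuple(stmt)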

A few basic and important options are discussed below.

Continue reading

Federated Database Performance (Part-1)

IBM’s Federated Database Technology:

In large organizations, there are usually many units working independently on different modules of an application. Each unit may use a different relational, or even non-relational, database for development. A technology is needed to address communication between these different database management systems. This is exactly where IBM's Federated Database Technology comes into the picture.

Briefly, federation technology can be described as follows:

1. Federation allows users and applications to access data from more than one RDBMS in a single request (see the sketch after this list).

2. It is a special type of distributed database management system that allows a distributed request across multiple databases at the same time.

3. A federated system consists of a DB2 instance, a database that serves as the federated database, and one or more remote data sources of the same or different types.
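Here is what point 1 looks like in practice – a sketch with the ibm_db Python driver, where LOCAL_CUSTOMERS is an ordinary table in the federated database and REMOTE_ORDERS is a nickname for a table on another data source (all names and connection details are hypothetical):

import ibm_db

conn = ibm_db.connect("DATABASE=FEDDB;HOSTNAME=localhost;PORT=50000;"
                      "PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;", "", "")

# A single distributed request joining a local table with a remote nickname
sql = ("SELECT c.cust_name, o.amount "
       "FROM local_customers c "
       "JOIN remote_orders o ON o.cust_id = c.cust_id")
stmt = ibm_db.exec_immediate(conn, sql)

row = ibm_db.fetch_assoc(stmt)
while row:
    print(row)
    row = ibm_db.fetch_assoc(stmt)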

For more insight into federation, the developerWorks article IBM Federated Database Technology is recommended.

Continue reading

Manipulating bulk data using simple Stored Procedures

As a DB2 LUW developer, I have written many migration scripts that perform data manipulation on bulk data. The best-known problem with such operations is that, most of the time, we run into the transaction-log-full error:

“SQL0964C The transaction log for the database is full”

This problem occurs frequently when the amount of data is huge. For example, consider a performance or production database table holding millions of customers, where we need to update some value across all the customer data.

To resolve this problem, DBAs usually tune database parameters such as LOGBUFSZ, LOGFILSIZ, LOGPRIMARY and LOGSECOND. This is a trial-and-error approach where we set particular values and retry the operation. Understanding these parameters is beyond the scope of this post.

You can get a brief overview in Ember Crooks's post Managing DB2 Transaction Log Files, which covers the different parameters, settings and potential issues.

Instead, I will discuss ways to perform such bulk operations using PL/SQL stored procedures.
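As a preview of the batching idea behind those procedures (the post builds it into a PL/SQL stored procedure; here it is sketched in Python with the ibm_db driver against a hypothetical CUSTOMERS table), the key is to commit after every batch so the active log never has to hold the whole operation:

import ibm_db

conn = ibm_db.connect("DATABASE=PRODDB;HOSTNAME=localhost;PORT=50000;"
                      "PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;", "", "")
ibm_db.autocommit(conn, ibm_db.SQL_AUTOCOMMIT_OFF)

BATCH_SIZE = 10000

while True:
    # Update at most BATCH_SIZE rows per transaction, so the active log
    # only ever holds one batch instead of the whole table.
    stmt = ibm_db.exec_immediate(conn, """
        UPDATE (SELECT status FROM customers
                WHERE status = 'OLD'
                FETCH FIRST {} ROWS ONLY)
        SET status = 'NEW'
    """.format(BATCH_SIZE))
    updated = ibm_db.num_rows(stmt)
    ibm_db.commit(conn)   # release the log space used by this batch
    if updated == 0:
        break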

Continue reading