Friday, May 28, 2010

Research Methodology

MASTER OF BUSINESS ADMINISTRATION
(INDUSTRY INTEGRATED)

TWO YEAR FULL TIME INDUSTRY INTEGRATED
M.B.A PROGRAMME

RESEARCH METHODOLOGY










Detailed Curriculum
Annamalai University Courses

Lesson – 1
NATURE AND SCOPE OF RESEARCH
Research - Meaning - Types - Nature and scope of research - Problem formulation - Statement of research objectives - Value and cost of information - Decision theory - Organizational structure of research. Research process - Research designs - Exploratory, descriptive and experimental research.
OBJECTIVE
To equip the students with the basic understanding of the research methodology and provide an insight into the application of modern analytical tools and techniques for the purpose of management decision making.
STRUCTURE
 Value of Research
 Scope of Research
 Types of Research
 Structure of Research
LEARNING OBJECTIVES
• To understand the importance of business research as a management decision making tool
• To define business research
• To understand the difference between basic and applied research
• To understand when business research is needed and when it should be conducted
• To identify various topics for business research.
INTRODUCTION
The task of business research is to generate accurate information for use in decision making. The emphasis of business research is on shifting decision makers away from intuitive information gathering based on personal judgment and toward systematic and objective investigation.
DEFINITION
Business research is defined as the systematic and objective process of gathering, recording and analyzing data to aid in making business decisions. Literally, research means to "search again". It connotes patient study and scientific investigation wherein the researcher takes a more careful look to discover all there is to know about the subject of study. If the data collected and analyzed are to be accurate, the research needs to be very objective. Thus, the role of the researcher is to be detached and impersonal rather than engaging in a biased attempt; without objectivity the research is useless. The definition restricts research to decisions in the aspects of business alone. Research generates and provides the necessary qualitative and quantitative information upon which, as a base, decisions are taken. This information reduces the uncertainty of decisions and reduces the risk of making wrong decisions. However, research should be an "aid" to managerial judgment, not a substitute for it. There is more to management than research. Applying research remains a managerial art.
The study of research methods provides managers with the knowledge and skills needed to solve problems and meet the challenges of a fast-paced decision-making environment. Two important factors stimulate an interest in a scientific approach to decision making. They are as follows:
1. There is an increased need for more and better information, and
2. The availability of technical tools to meet this need.
During the last decade, we have witnessed dramatic changes in the business environment. These changes have created new knowledge needs for the manager to consider when evaluating any decision. The trend toward complexity has increased the risk associated with business decisions, making it more important to have a sound information base. The following are a few reasons which make researchers look out for newer and better information on which decisions can be based.
• There are more variables to consider in every decision
• More knowledge exists in every field of management
• The quality of theories and models to explain tactical and strategic results is improving.
• Better arrangement of information
• Advances in computing have allowed the creation of better databases.
• The power and ease of use of today's computers have given researchers the capability to analyze data to solve managerial problems.
The development of the scientific method in business research lags behind similar developments in physical science research, which is more rigorous and much more advanced. Business research is of recent origin, and moreover its findings cannot be patented the way those of physical science research can. Business research normally deals with topics such as human attitudes, behavior and performance. Even with these hindrances, business research is making strides in the scientific arena. Hence, managers who are not prepared for this scientific application in business research will be at a severe disadvantage.
VALUE OF BUSINESS RESEARCH
The prime value of business research is that it reduces uncertainty by providing information that improves the decision-making process. The decision-making process associated with the development and implementation of a strategy involves the following:

1. IDENTIFYING PROBLEMS OR OPPORTUNITIES
Before any strategy can be developed, an organization must determine where it wants to go and how it will get there. Business research can help managers plan strategies by determining the nature of situations or by identifying the existence of problems or opportunities present in the organisation. Business research may be used as a scanning activity to provide information about what is happening in the business or its environment. Once problems and opportunities are defined and indicated, managers may evaluate alternatives easily and clearly enough to make a decision.
2. DIAGNOSING AND ASSESSING PROBLEMS AND OPPORTUNITIES
The important aspect of business research is the provision of diagnostic information that clarifies the situation. If there is a problem, managers need to specify what happened and why. If an opportunity exists, they need to explore, clarify and refine the nature of the opportunity. This will help in developing alternative courses of action that are practical.
3. SELECTING AND IMPLEMENTING A COURSE OF ACTION
After the alternative courses of action have been clearly identified, business research is conducted to obtain scientific information which will aid in evaluating the alternatives and selecting the best course of action.
NEED FOR RESEARCH
When a manager is faced with two or more possible courses of action, the researcher needs to decide carefully whether or not to conduct research. The following determinants throw light on this decision.
1. TIME CONSTRAINTS
In most business environments, decisions must be made immediately, but conducting research systematically takes time, and there may not be enough time to rely on research. As a consequence, decisions are sometimes made without adequate information and a thorough understanding of the situation.
2. AVAILABILITY OF DATA
Often managers possess enough information to decide without research. When they lack adequate information, research must be considered. Managers should ask whether the research will be able to generate the information needed to answer the basic question about which the decision is to be taken.
3. NATURE OF INFORMATION
The value of research will depend upon the nature of the decision to be made. A routine decision does not require substantial information or warrant any research. However, for important and strategic decisions, research is more likely to be needed.
4. BENEFITS VS. COSTS
The decision to conduct research boils down to these important questions:
1. Will the rate of return be worth the investment?
2. Will the information improve the quality of the decision? and
3. Is the research expenditure the best use of available funds?
Thus, the cost of information should not exceed the benefits i.e., value of information.
WHAT IS GOOD RESEARCH?
Good research generates dependable data that can be used reliably for making managerial decisions. The following are the hallmarks of good research:
• Purpose clearly defined, i.e., understanding problems clearly
• The process described in sufficient details
• The design carefully planned to yield results
• Careful consideration must be given and maintain high ethical standards
• Limitations properly revealed
• Adequate analysis of the data and appropriate tools used
• Presentation of data should be comprehensive, easily understood and unambiguous
• Conclusions should be based on the data obtained and justified.
SCOPE OF RESEARCH
The scope of research on management is limited to business. Research in the production, finance, marketing or HR areas certainly counts as business research. A researcher conducting research within an organization may be referred to as a "marketing researcher" or an "organizational researcher"; although business researchers are specialized, the term encompasses all of the above functional areas. The different functional areas may investigate different phenomena, but they are comparable to one another because they use similar research methods. Many kinds of topics arise in the business environment, such as forecasting, trends, the environment, capital formation, portfolio analysis, cost analysis, risk analysis, TQM, job satisfaction, organizational effectiveness, climate, culture, market potential, segmentation, sales analysis, distribution channels, computer information needs analysis and social values.
TYPES OF RESEARCH
The purpose of research is to develop and evaluate concepts and theories. In the broader sense, research can be classified as:
1) BASIC RESEARCH OR PURE RESEARCH
It does not directly involve the solution to a particular problem. Although basic research generally cannot be implemented directly, it is conducted to verify the acceptability of a given theory or to learn more about a certain concept.
2) APPLIED RESEARCH
It is conducted when a decision must be made about a specific real-life problem. It encompasses those studies undertaken to answer questions about specific problems or to make decisions about a particular course of action.
However, the procedures and techniques utilized by both kinds of researcher do not differ substantially. Both employ the scientific method to answer questions. Broadly, the scientific method refers to techniques and procedures that help the researcher to know and understand business phenomena. The scientific method requires systematic analysis and logical interpretation of empirical evidence (facts from observation or experimentation) to confirm or refute prior conceptions. Basic research first tests prior conceptions, assumptions or hypotheses and then makes inferences and conclusions. In applied research, the use of the scientific method assures objectivity in gathering facts and taking decisions.
At the outset it may be noted that there are several ways of studying and tackling a problem. There is no single perfect design. Research designs have been classified by authors in different ways. Different types of research designs have emerged on account of the different perspectives from which the problem or opportunity is viewed. However, research designs are broadly classified into three categories: exploratory, descriptive and causal research. Research can also be classified on the basis of either technique or function. Experiments, surveys and observation are a few common techniques, and a technique may be qualitative or quantitative. Based on the nature of the problem or the purpose of the study, the above three designs are used invariably in management parlance.
3) EXPLORATORY RESEARCH
The focus is mainly on the discovery of ideas. Exploratory research is generally based on secondary data that are already available. This type of study is conducted to clarify ambiguous problems. These studies provide information to use in analyzing situations, and they help to crystallize a problem and identify information needs for further research. The purpose of exploration is usually to develop hypotheses or questions for further research. Exploration may be accomplished with different techniques. Both qualitative and quantitative techniques are applicable, although exploratory studies rely more heavily on qualitative techniques such as experience surveys and focus groups.
4) DESCRIPTIVE RESEARCH
The major purpose is to describe the characteristics of a population or phenomenon. Descriptive research seeks to answer who, what, when, where and how questions. Unlike exploratory studies, these studies are based on some previous understanding of the nature of the research problem. Descriptive studies can be divided into two broad categories: cross-sectional and longitudinal. The former type is more frequently used. A cross-sectional study is concerned with a sample of elements from a given population; it is carried out once and represents one point in time. Longitudinal studies are based on panel data or panel methods. A panel is a sample of respondents who are interviewed and then re-interviewed from time to time. That is, longitudinal studies are repeated over an extended period.
5) CAUSAL RESEARCH
The main goal of causal research is to identify cause-and-effect relationships among variables. It attempts to establish that when we do one thing, another thing will follow. Normally, exploratory and descriptive studies precede causal research.
However, based on the breadth and depth of study, another method frequently used in management is called case study research. This places more emphasis on a full contextual analysis of fewer events or conditions and their interrelations. An emphasis on detail provides valuable insight for problem solving, evaluation and strategy. The detail is gathered from multiple sources of information.
VALUE OF INFORMATION AND COST
Over the past decade many cultural, technological and competitive factors have created a variety of new challenges, problems and opportunities for today's decision makers in business. First, the rapid advances in interactive marketing communication technologies have increased the need for database management skills. Moreover, advancements associated with the so-called information superhighway have created greater emphasis on secondary data collection, analysis and interpretation. Second, there is a growing movement emphasizing quality improvement, which has placed more importance on cross-sectional information than ever before. Third is the expansion of global markets, which introduces a new set of multicultural problems and questions.
These three factors influence the research process and the steps taken in seeking new information from a management perspective. There may be situations where management is sufficiently clear that no additional information is likely to change its decision. In such cases, it is obvious that the value of information is negligible. In contrast, there are situations where decisions call for information that is not easily available. If the information collected does not lead to a change or modification of a decision, the information has no value. Generally, information is most valuable in cases i) where the decision maker is unsure of what is to be done and ii) where extreme profits or losses are involved. A pertinent question is: how much information should be collected in a given situation? Since collecting information involves a cost, it is necessary to ensure that the benefit from the information is more than the cost involved in its collection.
DECISION THEORY
With reference to the above discussion, an attempt is needed to see how information can be evaluated for setting up a limit. The concept of probability is the basis of decision making under conditions of uncertainty. There are three basic sources of assigning probabilities:
1) Based on a logic / deduction: For e.g., when a coin is tossed, the probability of getting a head or tail is 0.5.
2) Past experience / Empirical evidence: The experience gained in the process of resolving similar problems in the past. On the basis of its past experience, an organization may be in a better position to estimate the probabilities for new decisions.
3) Subjective estimate: The most frequently used method; it is based on the knowledge and information available to the researcher for the probability estimates.
The above discussion was confined to single-stage problems, wherein the researcher is required to select the best course of action on the basis of information available at a point of time. However, there are problems with multiple stages, wherein a sequence of decisions is involved. Each decision leads to a chance event which in turn influences the next decision. In those cases, a decision tree analysis, i.e., a graphical device depicting the sequence of action-event combinations, will be useful in making a choice between two alternatives. If the decision tree is not helpful, a more sophisticated technique known as Bayesian analysis can be used. Here, the probabilities can be revised on account of the availability of new information using prior, posterior and pre-posterior analysis.
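To illustrate the mechanics of this prior-to-posterior revision, consider the following sketch in Python. The figures here are purely hypothetical and serve only to show how Bayes' theorem revises a prior probability once new information arrives.

# Hypothetical illustration of Bayesian revision of probabilities.
# A manager believes demand will be strong with prior probability 0.4.
# A test survey is run; assume it historically reports "favourable" for
# 80% of strong-demand products and 30% of weak-demand products.
prior_strong = 0.4
prior_weak = 0.6
p_fav_given_strong = 0.8
p_fav_given_weak = 0.3

# Total probability of observing a favourable survey result
p_fav = prior_strong * p_fav_given_strong + prior_weak * p_fav_given_weak  # 0.50

# Posterior (revised) probability of strong demand, by Bayes' theorem
posterior_strong = (prior_strong * p_fav_given_strong) / p_fav

print(posterior_strong)  # 0.64 -- the favourable result raises 0.4 to 0.64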
There is a great deal of interplay between budgeting and value assessment in the management decision to conduct research. An appropriate research study should help managers avoid losses and increase sales or profits; otherwise, research can be wasteful. The decision maker wants a cost estimate for a research project and an equally precise assurance that useful information will result from the research. Even if the researcher can give good cost and information estimates, the decision maker or manager still must judge whether the benefits outweigh the costs.
Conceptually, the value of research information is not difficult to determine. In a business situation, research should provide added revenues or reduced expenses. The value of research information may be judged in terms of "the difference between the results of decisions made with the information and the results of decisions made without it". While this is simple to state, in actual application it presents difficult measurement problems.
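A small numerical sketch may make this definition concrete. The payoffs below are hypothetical; the point is simply that the value of research information is the difference between the expected result with the information and the expected result without it.

# Hypothetical payoffs: launching a product earns 100 if demand is strong
# and loses 60 if demand is weak; not launching earns 0. P(strong) = 0.5.
p_strong = 0.5
ev_launch = p_strong * 100 + (1 - p_strong) * (-60)   # expected value = 20
ev_no_launch = 0.0
ev_without_info = max(ev_launch, ev_no_launch)        # best blind choice: 20

# With perfect information the manager launches only when demand is strong
ev_with_info = p_strong * 100 + (1 - p_strong) * 0    # expected value = 50

value_of_information = ev_with_info - ev_without_info  # 30
# Research costing more than 30 would exceed the value of the information.
print(value_of_information)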
GUIDELINES FOR APPROXIMATING THE COST-TO-VALUE OF RESEARCH
1. Focus on the most important issues of the project: Identify certain issues as important and others as peripheral to the problem. Unimportant issues only drain resources.
2. Never try to do too much: There is a limit to the amount of information that can be collected. The researcher must make a trade-off between the number of issues that can be dealt with and the depth of each issue. Therefore it is necessary to focus on those issues of greatest potential value.
3. Determine whether secondary information, primary information or a combination is needed: The most appropriate source must be selected to address the stated problem.
4. Analyze all potential methods of collecting information: Alternative data sources and research designs are available that will allow detailed investigation of issues at a relatively low cost.
5. Subjectively assess the value of information: The researcher needs to ask some fundamental questions relating to the objectives. For example:
a) Can the information be collected at all?
b) Can the information tell us something more than what we already have?
c) Will the information provide significant insights?
d) What benefits will be derived from this information?
STRUCTURE OF RESEARCH
Business research can take many forms, but systematic inquiry is a common thread. Systematic inquiry requires an orderly investigation. Business research is a sequence of highly interrelated activities, and the steps of the research process overlap continuously. Nevertheless, research on management often follows a general pattern. The stages are:
1. Defining the problem
2. Planning a research design
3. Planning a sample
4. Collecting data
5. Analyzing the data
6. Formulating the conclusions and preparing the report.
SUMMARY
This lesson outlined the importance of business research. The difference between basic and applied research has been dealt with in detail. This chapter has given the meaning, scope, types and structure of research.
KEY TERMS
• Research
• Value of research
• Need for research
• Good Research
• Scope of research
• Types of research
• Basic research
• Applied research
• Scientific method
• Exploratory research
• Descriptive research
• Cross - sectional and longitudinal
• Causal research
• Decision theory
• Structure
QUESTIONS
1. What are some examples of business research in your particular field of interest?
2. What do you mean by research? Explain its significance in modern times.
3. What is the difference between applied and basic research?
4. What is good research? Discuss the structure of business research.
5. Discuss: Explorative, descriptive and causal research.
6. Discuss the value of information and cost using decision theory.





LESSON – 2
DEFINING RESEARCH PROBLEM
OBJECTIVES
• To discuss the nature of the decision maker's objectives and the role they play in defining the research
• To understand that proper problem definition is essential for effective business research
• To discuss the influence of the statement of the business problem on the specific research objectives
• To state research problem in terms of clear and precise research objectives.
STRUCTURE
 Problem definition
 Situation analysis
 Measurable Symptoms
 Unit of analysis
 Hypothesis and Research objectives
PROBLEM DEFINITION
Before choosing a research design, the manager and researcher need a sense of direction for the investigation. It is extremely important to define the business problem carefully, because the definition determines the purposes of the research and ultimately the research design. A well-defined problem is half solved; hence, the researcher must understand how to define a problem. The formal quantitative research process should not begin until the problem has been clearly defined. However, properly and completely defining a business problem is easier said than done.
Determination of the research problem consists of three important tasks, namely:
• Clarifying the decision maker's information needs,
• Redefining the decision problem as a research problem, and
• Establishing hypotheses and research objectives.
Step 1: To ensure that appropriate information is created through this process, the researcher must assist the decision maker in making sure the problem or opportunity has been clearly defined and that the decision maker is aware of the information requirements. This includes the following activities, namely:
i) Purpose
ii) Understanding the situation
iii) Identifying and separating measurable symptoms
iv) Determining unit of analysis and
v) Determining relevant variables.
I) PURPOSE
Here, the decision maker holds the responsibility of addressing a recognized decision problem or opportunity. The researcher begins the process by asking the decision maker to express his or her reasons for thinking there is a need to undertake research. Through this questioning process, the researcher can develop insights into what the decision maker believes the problems to be. One method that might be employed is to familiarize the decision maker with the iceberg principle: the dangerous part of many problems, like the submerged portion of an iceberg, is neither visible nor understood by managers. If the submerged portion of the problem is omitted from the problem definition, the result may be less than optimal.
II) SITUATION ANALYSIS
To gain a complete understanding, both parties should perform a basic situation analysis of the circumstances surrounding the problem area. A situation analysis is a popular tool that focuses on the informal gathering of background information to become familiar with the overall complexity of the decision. A situation analysis attempts to identify the events and factors that have led to the current decision problem situation. To objectively understand the client's domain (i.e., industry, competition, product line, markets, etc.), the researcher should not rely only on the information provided by the client but also on other sources. In short, the researcher must develop expertise in the client's business.
III) MEASURABLE SYMPTOMS
Once the researcher understands the overall problem situation, they must work with the decision maker to separate the problems from the observable and measurable symptoms that may have been initially perceived as being the decision problem.
IV) UNIT OF ANALYSIS
The researcher must be able to specify whether data should be collected about individuals, households, organizations, departments, geographical areas, specific objects or some combination of these. The unit of analysis will provide direction in later activities such as scale measurement development and drawing an appropriate sample of respondents.
V) VARIABLES
Here, the focus is on identifying the different independent and dependent variables. It is the determination of the type of information needed (i.e., facts, estimates, predictions, relationships) and the specific constructs (i.e., concepts or ideas about an object, attribute or phenomenon that are worthy of measurement).
STEP 2
Once the problem is understood and specific information requirements are identified, the researcher must redefine the problem in more specific terms. In reframing the problems and questions as research questions, researchers must use their scientific knowledge and expertise. Establishing research questions specific to the problems will force the decision maker to provide additional information that is relevant to the actual problems. In other situations, redefining problems as research problems can lead to the establishment of research hypotheses rather than questions.
STEP 3: (HYPOTHESIS & RESEARCH OBJECTIVE)
A hypothesis is basically an unproven statement of a research question in a testable format. Hypothetical statements can be formulated about any variable and can express a possible relationship between two or more variables. While research questions and hypotheses are similar in their intent to express relationships, hypotheses tend to be more specific and declarative, whereas research questions are more interrogative. In other words, hypotheses are statements that can be empirically tested.
Research objectives are precise statements of what a research project will attempt to achieve. It indirectly represents a blueprint of research activities. Research objectives allow the researcher to document concise, measurable and realistic events that either increase or decrease the magnitude of management problems. More importantly it allows for the specification of information required to assist management decision making capabilities.
SUMMARY
The nature of the decision maker's objectives and the role they play in defining the research have been dealt with in detail. This chapter has given the steps involved in defining the problem.
KEY TERMS
• Problem definition
• Iceberg principle
• Situation analysis
• Unit of analysis
• Variables
• Hypotheses
• Research objectives
QUESTIONS
1. What is the task of problem definition?
2. What is the iceberg principle?
3. State a problem in your field of interest, and list some variables that might be investigated to solve this problem.
4. What do you mean by hypothesis?
5. What is a research objective?




LESSON – 3
RESEARCH PROCESS
OBJECTIVES
• To list the stages in the business research process
• To identify and briefly discuss the various decision alternatives available to the researcher during each stage of the research process.
• To classify business research as exploratory, descriptive, and causal research
• To discuss categories of research under exploratory, descriptive and causal research.
STRUCTURE
 Research process
 Research design
 Types of research design
 Exploratory research
 Descriptive research
 Causal research
RESEARCH PROCESS
Before discussing the phases and specific steps of the research process, it is important to emphasize the need for information and the circumstances under which research should or should not be conducted. In this context, the research process may be called the information research process, which is more appropriate in business parlance. The term information research is used to reflect the evolving changes occurring within management and the rapid changes facing many decision makers regarding how firms conduct both internal and external activities. Hence, understanding the process of transforming raw data into usable information and expanding the applicability of the research process to solving business problems and opportunities is very important.
Overview: The research process has been described as having anywhere from 6 to 11 standardized stages. Here, the process consists of four distinct interrelated phases that have a logical, hierarchical ordering, depicted below.
Diagram: Four Phases of the Research Process
PHASE I: DETERMINATION OF RESEARCH PROBLEM →
PHASE II: DEVELOPMENT OF APPROPRIATE RESEARCH DESIGN →
PHASE III: EXECUTION OF THE RESEARCH DESIGN →
PHASE IV: COMMUNICATION OF RESULTS
However, each phase should be viewed as a separate process that consists of a combination of integrated steps and specific procedures. The four phases and their corresponding steps are guided by the principles of the scientific method, which involves formalized research procedures that can be characterized as logical, objective, systematic, reliable, valid and ongoing.
INTEGRATIVE STEPS WITHIN RESEARCH PROCESS
The following exhibit presents the interrelated steps of the four phases of the research process. Generally, researchers should follow the steps in order. However, the complexity of the problem, the level of risk involved and management's needs will determine the exact order of the process.
EXHIBIT
PHASE 1: DETERMINATION OF RESEARCH PROBLEM
Step 1: Determining Management information needs
Step 2: Redefining the decision problem as research problem.
Step 3: Establishing Research Objectives.
PHASE 2: DEVELOPMENT OF RESEARCH DESIGN
Step 4: Determining and evaluating the research design.
Step 5: Determining the Data Source.
Step 6: Determining the sample plan and sample size.
Step 7: Determining the measurement scales.
PHASE 3: EXECUTION OF THE RESEARCH DESIGN
Step 8: Data Collection and Processing Data.
Step 9: Analysing the data.
PHASE 4: COMMUNICATION OF THE RESULTS.
Step 10: Preparing and presenting the final report to management.
STEP 1
Before the researcher becomes involved, the decision maker usually has to make a formal statement of what they believe is the issue. At this point, the researcher's responsibility is to make sure management has clearly and correctly specified the opportunity or question. It is important for the decision maker and the researcher to agree on the definition of the problem so that the research process will produce useful information. The researcher should assist the decision maker in determining whether the stated problem is really the problem, or just a symptom of a yet-unidentified problem. Finally, the researcher lists the factors that could have a direct or indirect impact on the defined problem or opportunity.

STEP 2
Once the researcher and decision maker have identified the specific information needs, the researcher must redefine the problem in scientific terms, since researchers feel more comfortable using a scientific framework. This step is very critical because it influences the other steps. It is the researcher's responsibility to state the initial variables associated with the problem in the form of one or more question formats (how, what, where, when or why). In addition, the researcher needs to focus on determining what specific information is required (i.e., facts, estimates, predictions, relationships or some combination) and also the quality of information needed, which includes the value of the information.
STEP 3
The research objectives should follow from the definition of the research problem established in Step 2. Formally stated objectives provide the guidelines for determining the other steps to be undertaken. The underlying assumption is that, if the objectives are achieved, the decision maker will have the information needed to solve the problem.
STEP 4
The research design serves as a master plan of the methods and procedures that should be used to collect and analyze the data needed by the decision maker. In this master plan, the researcher must consider the design technique (survey, observation, experiment), the sampling methodology and procedures, the schedule and the budget. Although every problem is unique, most objectives can be met using one of three types of research designs: exploratory, descriptive and causal. Exploratory research focuses on collecting either secondary or primary data and using unstructured formal or informal procedures to interpret them. It is often used simply to clarify problems or opportunities, and it is not intended to provide conclusive information. Some examples of exploratory studies are focus group interviews, experience surveys and pilot studies. Descriptive studies describe existing characteristics; they generally allow inferences to be drawn and can lead to a course of action. Causal studies are designed to collect information about cause-and-effect relationships between two or more variables.
STEP 5
Data sources can be classified as either secondary or primary. Secondary data can usually be gathered faster and at less cost than primary data. Secondary data are historical data previously collected and assembled for some research problem other than the current situation. In contrast, primary data represent firsthand data that are yet to receive meaningful interpretation; collecting them employs either survey or observation.
STEP 6
To make inferences or predictions about any phenomenon, we need to understand who or what is supplying the raw data and how representative those data are. Therefore, researchers need to identify the relevant defined target population. The researcher can choose between a sample (a subset of the population) and a census (the entire population). To achieve the research objective, the researcher needs to develop an explicit sampling plan which will serve as a blueprint for defining the target population. Sampling plans can be classified into two general types: probability (equal chance) and non-probability. Since sample size affects quality and generalizability, researchers must think carefully about how many people to include or how many objects to investigate.
STEP 7
This step focuses on determining the dimensions of the factors being investigated and measuring the variables that underlie the defined problem. This determines how much raw data can be collected and the amount of data to be collected. The level of measurement (nominal, ordinal, interval and ratio), the reliability, the validity and the dimensionality (uni- vs. multidimensional) will determine the measurement process.
STEP 8
There are two fundamental approaches to gathering raw data. One is to ask questions about variables and phenomena using trained interviewers or questionnaires. The other is to observe variables or phenomena using professional observers or high-tech mechanical devices. Self-administered surveys, personal interviews, computer simulations and telephone interviews are some of the tools used to collect data. Questioning allows a wider variety of data to be collected, covering not only the past and present but also states of mind or intentions. Observation can be characterized as natural or contrived, disguised or undisguised, structured or unstructured, direct or indirect, human or mechanical, and uses devices such as video cameras, tape recorders, audiometers, eye cameras, psychogalvanometers or pupilometers. After the raw data are collected, a coding scheme is needed so that the raw data can be entered into computers; coding means assigning logical numerical descriptions to all response categories. The researcher must then clean the raw data of any coding or data-entry errors.

STEP 9
Using a variety of data analysis techniques, the researcher can create new, complex data structures by combining two or more variables into indexes, ratios, constructs and so on. Analysis can vary from simple frequency distributions (percentages) to simple statistical measures (mode, median, mean, standard deviation, standard error) to multivariate data analysis.
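As an illustration, the simpler of these measures can be computed directly with Python's standard library; the data here are made up for demonstration.

import statistics

data = [12, 15, 15, 18, 20, 22, 25]        # hypothetical raw observations

mean = statistics.mean(data)               # arithmetic mean
median = statistics.median(data)           # middle value
mode = statistics.mode(data)               # most frequent value
stdev = statistics.stdev(data)             # sample standard deviation
std_error = stdev / len(data) ** 0.5       # standard error of the mean

print(mean, median, mode, round(stdev, 2), round(std_error, 2))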
STEP 10
This step is to prepare and present the final research report to management. The report should contain an executive summary, introduction, problem definition and objectives, methodology, analysis, results and findings, and finally suggestions and recommendations; it also includes appendices. The researcher is expected not only to submit a well-produced written report but also to give an oral presentation.






LESSON -4
SAMPLING METHODS
OBJECTIVES
• To define the terms sampling, sample and population
• To describe the various sampling designs
• To discuss the importance of confidence and precision
• To identify and use the appropriate sampling designs for different research purposes.
STRUCTURE
 Universe/ population
 Sampling
 Sampling techniques
 Importance of sample design and sample size.
INTRODUCTION
Sampling is the process of selecting a sufficient number of elements from the population. It can also be understood as the process of obtaining information about an entire population by examining only a part of it. A sample can thus be defined as a subset of the population; in other words, some, but not all, elements of the population form the sample.
Basically, sampling helps a researcher in a variety of ways:
 It saves time and money; a sample study is usually less expensive than a population survey.
 It also helps the researcher to obtain accurate results.
 Sampling is the only practical way when the population is very large.
 It enables the sampling error to be estimated, which assists in obtaining information concerning the characteristics of the population.
To understand the sampling process the researcher also should understand the following terms.
1. UNIVERSE / POPULATION
In any research, the interest of the researcher lies mainly in studying the various characteristics relating to items or individuals belonging to a particular group. This group of individuals under study is known as the population or universe.
2. SAMPLING
A finite subset selected from a population with the objective of investigating its properties is called a sample. The number of units in the sample is known as the sample size. Sampling plays an important role in research because it enables conclusions to be drawn about the characteristics of the population.
3. PARAMETERS & STATISTICS
The statistical constants used for further analysis of the data collected, such as the mean (µ), variance (σ²), skewness (β₁), kurtosis (β₂) and correlation (ρ), can be computed for the sample drawn from the population.
Sampling is an important part of any research, preceding data collection. So the sampling process should be carried out carefully to obtain an appropriate sample, and sample size, from the population on which the research is to be done.
Example: A researcher who would like to study customer satisfaction with a health drink, namely Horlicks, should identify the population that consumes Horlicks. If the consumers vary in age and gender across the state or country, he should be able to decide which particular consumers are going to be focused on. Again, if the number is too large to survey in full, he has to decide how many individuals to target for his study.
HENCE AN EFFECTIVE SAMPLING PROCESS SHOULD INCLUDE THE FOLLOWING STEPS:
Define the population
(Elements, units, extent, and time)

Specify the sample frame
(The means of representing the elements of the population, e.g., a map or city directory)

Specify the sampling unit
(The unit containing one or more population elements)

Specify the sampling method
(The method by which the sampling units are to be selected)

Determine the sample size
(The number of population elements to be included in the sample)

Develop the sampling plan
(The procedure for selecting the sampling units)

Select the sample
(The office and field work needed for the selection of the sample)
Hence a sample design is a definite plan for obtaining a sample from a given population. So whenever a sample has to be decided upon for a study, the following should be considered:
 Outline the universe
 Define a sampling unit
 Sampling frame
 Size of the sample.
SAMPLING TECHNIQUES
Sampling Techniques can be divided into two types:
1. Probability or representative sampling
2. Non-probability or judgmental sampling.
There are many types of Probability or representative sampling.
1. Simple random sampling
2. Stratified random sampling
3. Systematic sampling
4. Cluster sampling
5. Multistage sampling
Non-probability or judgment sampling includes quota and purposive sampling.
OTHER METHODS
• Snowball sampling
• Spatial sampling
• Saturation sampling
PROBABILITY SAMPLING
This is a scientific technique of drawing samples from the population according to the laws of chance, in which each unit in the universe has some definite pre-assigned probability of being selected in the sample.
SIMPLE RANDOM SAMPLING
In this technique, the sample is drawn in such a way that every element or unit in the population has an equal and independent chance of being included in the sample.
If the unit selected in any draw is not replaced in the population before making the next draw, the sampling plan is known as simple random sampling without replacement.
If the unit is replaced back before making the next draw, the sampling plan is called simple random sampling with replacement.
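The distinction between the two plans can be illustrated with a short Python sketch using a hypothetical population of 100 numbered units.

import random

population = list(range(1, 101))    # hypothetical population of 100 units

# Without replacement: each unit can appear in the sample at most once
srs_without = random.sample(population, 10)

# With replacement: a drawn unit is returned before the next draw,
# so the same unit may be selected more than once
srs_with = random.choices(population, k=10)

print(srs_without)
print(srs_with)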
STRATIFIED RANDOM SAMPLING
When the population is heterogeneous with respect to the variable or characteristic under study, this sampling method is used. Stratification means division into homogeneous layers or groups. Stratified random sampling involves stratifying the given population into a number of sub-groups or sub-populations known as strata.
THE CHARACTERISTICS OF STRATIFIED SAMPLES ARE AS FOLLOWS
 The units within each stratum are as homogenous as possible.
 The differences between various strata are as marked as possible.
 Each and every unit in the population belongs to one and only one stratum.
The population can be stratified according to geographical, sociological or economic characteristics. Some of the commonly used stratifying factors are age, sex, income, occupation, education level, geographic area and economic status. The number of items drawn from the different strata is usually kept proportional to the sizes of the strata.
Example: If Pi represents the proportion of the population included in stratum i, and n represents the total sample size, then the number of elements selected from stratum i is n × Pi.
Example: Suppose we need a sample of size n = 30 to be drawn from a population of size N = 6000 which is divided into three strata of sizes N1 = 3000, N2 = 1800 and N3 = 1200.
For the stratum with N1 = 3000: P1 = 3000/6000 = 0.5, hence n1 = n × P1 = 30 × (3000/6000) = 15
For the stratum with N2 = 1800: P2 = 1800/6000 = 0.3, hence n2 = n × P2 = 30 × (3/10) = 9
For the stratum with N3 = 1200: P3 = 1200/6000 = 0.2, hence n3 = n × P3 = 30 × (1/5) = 6
Hence, using proportional allocation, the sample sizes for the different strata are 15, 9 and 6, which are proportionate to the strata sizes of 3000, 1800 and 1200.
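The same proportional allocation can be computed directly, as in this short Python sketch of the worked example above.

# Proportional allocation for the example above
strata_sizes = [3000, 1800, 1200]            # N1, N2, N3; total N = 6000
n = 30                                       # total sample size

N_total = sum(strata_sizes)
allocation = [round(n * Ni / N_total) for Ni in strata_sizes]
print(allocation)                            # [15, 9, 6]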
SYSTEMATIC SAMPLING
This sampling is a slight variation of simple random sampling, in which only the first sample unit is selected at random while the remaining units are selected automatically in a definite sequence at equal spacing from one another. This kind of sampling is recommended only if a complete and up-to-date list of sampling units is available and the units are arranged in a systematic order such as alphabetical, chronological or geographical.
Systematic sampling can be taken as an improvement over simple random sampling, since it spreads more evenly over the entire population. This method is one of the easiest and least costly methods of sampling and can be conveniently used in the case of large populations.
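A minimal sketch of systematic selection, assuming an ordered list of 1,000 hypothetical units and a desired sample of 50:

import random

population = list(range(1, 1001))   # hypothetical ordered list of 1000 units
n = 50                              # desired sample size
k = len(population) // n            # sampling interval, here 20

start = random.randint(0, k - 1)    # only the first unit is chosen at random
systematic_sample = population[start::k]

print(len(systematic_sample))       # 50
print(systematic_sample[:5])        # the first five selected units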
CLUSTER SAMPLING
If the total area of interest happens to be a big one, a convenient way in which a sample can be taken is to divide the area into a number of smaller non-overlapping areas and then to randomly select a number of these smaller areas.
In cluster sampling, the total population is divided into a number of relatively small subdivisions, which are themselves clusters of still smaller units, and some of these clusters are randomly selected for inclusion in the overall sample. Cluster sampling reduces cost by concentrating surveys in selected clusters, but this type of sampling is less accurate than random sampling.
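The idea can be sketched as follows; the clusters and their sizes here are hypothetical.

import random

# Hypothetical population grouped into 20 clusters of unequal size
clusters = {c: ["unit-%d-%d" % (c, i) for i in range(random.randint(20, 40))]
            for c in range(20)}

# Randomly select a few whole clusters, then survey every unit in them
chosen = random.sample(list(clusters), 4)
sample = [unit for c in chosen for unit in clusters[c]]

print(chosen, len(sample))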
MULTISTAGE SAMPLING
It is a further development of the principle of cluster sampling. Suppose we want to investigate the working efficiency of nationalized banks in India, and a sample of a few banks is to be taken for this purpose.
As a first step, the main states in the country are selected; from those states, the districts from which the banks will be drawn are then selected. This represents two-stage sampling. Going further, certain towns may be selected within the districts, and the banks chosen from those towns, representing three-stage sampling.
Thereafter, instead of taking a census of all the banks in the selected towns, the banks themselves may be selected randomly for the survey. Random selection at all the various stages is known as a multistage random sampling design.
SEQUENTIAL SAMPLING
This is one of the more complex sampling designs. The ultimate size of the sample in this technique is not fixed in advance but is determined according to mathematical decision rules on the basis of information yielded as the survey progresses.
This method is adopted when the sampling plan is accepted in the context of statistical quality control.
Example: When a lot is to be accepted or rejected on the basis of a single sample, it is known as single sampling; when the decision is to be taken on the basis of two samples, it is known as double sampling; and when the decision is based on more than two samples, but the number of samples is certain and decided in advance, the sampling is known as multiple sampling. When the number of samples is more than two but is neither certain nor decided in advance, the system is referred to as sequential sampling. In sequential sampling, one can go on taking samples one after another as long as one desires to do so.

QUOTA SAMPLING
This is stratified-cum-purposive or judgment sampling and thus enjoys the benefits of both. It aims at making the best use of stratification without incurring the high costs involved in probabilistic methods. There is considerable saving in time and money, as the sample units may be selected so that they are close together. If carried out by skilled and experienced investigators who are aware of the limitations of judgment sampling, and if proper controls are imposed on the investigators, this sampling method may give reliable results.
PURPOSIVE OR JUDGMENT SAMPLING
A desired number of sampling units are selected deliberately so that only important items representing the true characteristics of the population are included in the sample. A major disadvantage of this sampling is that it is highly subjective, since the selection of the sample depends entirely on the personal convenience and beliefs of the investigator.
Ex: Consider a socio-economic survey on the standard of living of people in Chennai. If the researcher wants to show that the standard has gone down, he may include only individuals from the low-income stratum of society in the sample and exclude people from rich areas.
OTHER FORMS OF SAMPLING
A) SNOWBALL SAMPLING
This method is used in cases where information about units in the population is not available. If a researcher wants to study the problems of the weavers in a particular region, he may contact the weavers who are known to him. From them, he may collect the addresses of other weavers in the various parts of the selected region. From those weavers, again, he may collect information on other weavers known to them. By repeating this process several times, he will be able to identify and contact the majority of weavers in the selected region, and he can then draw a sample from this group. This method is useful only when individuals in the target group have contact with one another and are willing to reveal the names of others in the group.
B) SPATIAL SAMPLING
Some populations are not static; they move from place to place but stay in one place while an event is taking place.
In such cases, the whole population present in a particular place is taken into the sample and studied. Ex: The number of people living in Dubai may vary depending on many factors.
C) SATURATION SAMPLING
Sometimes all members of a population need to be studied so as to get a picture of the entire population. The sampling method that requires a study of the entire population is called saturation sampling. This technique is familiar in sociometric studies, where distorted results will be produced even if one person is left out.
Example: In the case of analyzing the behavior of the students of one particular classroom, all the students in the classroom must be examined.
From the above discussion on sampling methods, one may normally resort to simple random sampling, since bias is generally eliminated in this type of sampling.
At the same time, purposive sampling is considered more appropriate when the universe happens to be small and a known characteristic of it is to be studied intensively. In situations where random sampling is not possible, it is advisable to use a sampling design other than random sampling.
DETERMINATION OF SAMPLE SIZE
Determination of the appropriate sample size is a crucial part of any business research. The decision on a proper sample size requires the use of statistical theory. When a business research report is being evaluated, the evaluation often starts with the question: how big is the sample?
Having discussed various sampling designs, it is important to focus attention on sample size. Suppose we select a sample of size 30 from a population of 3000 through a simple random sampling procedure. Will we be able to generalize the findings to the population with confidence? What sample size would be required to carry out the research?
It is a known fact that the larger the sample size, the more accurate the research is; this follows from statistical theory. Increasing the sample size decreases the width of the confidence interval at a given confidence level. When the standard deviation of the population is unknown, a confidence interval for the population mean is calculated using the formula
µ = X̄ ± K·Sx̄
where Sx̄ = S / √n is the standard error of the mean, S is the sample standard deviation, n is the sample size, and K is the critical value for the desired confidence level.
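As a quick sketch, the interval can be computed as follows; the sample figures here are hypothetical.

import math

x_bar = 50       # sample mean
s = 10           # sample standard deviation
n = 100          # sample size
K = 1.96         # critical value for 95% confidence

std_error = s / math.sqrt(n)      # S / sqrt(n) = 1.0
lower = x_bar - K * std_error     # 48.04
upper = x_bar + K * std_error     # 51.96

print((round(lower, 2), round(upper, 2)))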
In sum, choosing the appropriate sampling plan is one of the important research design decisions the researcher has to make. The choice of a specific design will depend broadly on the goal of research, the characteristics of the population, and considerations of cost.
ISSUES OF PRECISION AND CONFIDENCE IN DETERMINING SAMPLE SIZE
We now need to focus attention on the second aspect of the sampling design issue—the sample size. Suppose we select 30 people from a population of 3,000 through a simple random sampling procedure. Will we be able to generalize our findings to the population with confidence? What is the sample size that would be required to make reasonably precise generalizations with confidence? What do precision and confidence mean?
A reliable and valid sample should enable us to generalize the findings from the sample to the population under investigation. No sample statistic (X̄, for instance) is going to be exactly the same as the population parameter (µ), no matter how sophisticated the probability sampling design is. Remember that the very reason for a probability design is to increase the probability that the sample statistics will be as close as possible to the population parameters!
PRECISION
Precision refers to how close our estimate is to the true population characteristic. Usually, we would estimate the population parameter to fall within a range, based on the sample estimate.
Example: From a study of a simple random sample of 50 of the total 300 employees in a workshop, we find that the average daily production rate per person is 50 pieces of a particular product (X̄ = 50). We might then (by doing certain calculations, as we shall see later) be able to say that the true average daily production of the product (µ) would lie anywhere between 40 and 60 for the population of employees in the workshop. In saying this, we offer an interval estimate, within which we expect the true population mean production to be (µ = 50 ± 10). The narrower this interval, the greater the precision. For instance, if we are able to estimate that the population mean would fall anywhere between 45 and 55 pieces of production (µ = 50 ± 5) rather than 40 and 60 (µ = 50 ± 10), then we would have more precision. That is, we would now estimate the mean to lie within a narrower range, which in turn means that we estimate with greater exactitude or precision.
Precision is a function of the range of variability in the sampling distribution of the sample mean. That is, if we take a number of different samples from a population, and take the mean of each of these, we will usually find that they are all different, are normally distributed, and have a dispersion associated with them. Even if we take only one sample of 30 subjects from the population, we will still be able to estimate the variability of the sampling distribution of the sample mean. This variability is called the standard error, denoted by Sx̄. The standard error is calculated by the following formula:
Sx̄ = S / √n
where S is the standard deviation of the sample, n is the sample size, and Sx̄ indicates the standard error, or the extent of precision offered by the sample.
In sum, the closer we want our sample results to reflect the population characteristics, the greater will be the precision we would aim at. The greater the precision required, the larger is the sample size needed, especially when the variability in the population itself is large.
CONFIDENCE
Whereas precision denotes how closely we estimate the population parameter based on the sample statistic, confidence denotes how certain we are that our estimates will really hold true for the population. In the previous example of production rate, we are more precise when we estimate the true mean production (µ) to fall somewhere between 45 and 55 pieces than somewhere between 40 and 60.
In essence, confidence reflects the level of certainty with which we can state that our estimates of the population parameters, based on our sample statistics, will hold true. The level of confidence can range from 0 to 100%. A 95% confidence level is the conventionally accepted level for most business research, most commonly expressed by denoting the significance level as p = .05. In other words, we say that at least 95 times out of 100, our estimate will reflect the true population characteristic.
IN SUM, THE SAMPLE SIZE, n, IS A FUNCTION OF
1. The variability in the population
2. Precision or accuracy needed
3. Confidence level desired
4. Type of sampling plan used—for example, simple random sampling versus stratified random sampling
It thus becomes necessary for researchers to consider at least four points while making decisions on the sample size needed to do the research
(1) How much precision is really needed in estimating the population characteristics of interest—that is, what is the margin of allowable error?
(2) How much confidence is really needed—that is, how much chance can we take of making errors in estimating the population parameters?
(3) To what extent is there variability in the population on the characteristics investigated?
(4) What is the cost—benefit analysis of increasing the sample size?
DETERMINING THE SAMPLE SIZE
Now that we are aware of the fact that the sample size is governed by the extent of precision and confidence desired, how do we determine the sample size required for our research? The procedure can be illustrated through an example.
Suppose a manager wants to be 95% confident that the expected monthly withdrawals in a bank will be within a confidence interval of ±$500. A study of a sample of clients indicates that the average withdrawals made by them have a standard deviation of $3,500. What would be the sample size needed in this case?
We noted earlier that the population mean can be estimated by using the formula:
µ = X̄ ± K·Sx̄
Since the confidence level needed here is 95%, the applicable K value is 1.96 (from the t table). The interval estimate of ±$500 will have to encompass a dispersion of (1.96 × standard error). That is,
500 = 1.96 × Sx̄
Sx̄ = 500/1.96 = 255.10
We already know that
Sx̄ = S / √n
255.10 = 3500/√n
n = 188
The sample size needed in the above example was 188. Let us say that this bank has a total clientele of only 185. This means we cannot sample 188 clients. We can in this case apply the correction formula and see what sample size would be needed to have the same level of precision and confidence, given that we have a total of only 185 clients. The correction formula is as follows:
S_x̄ = (S/√n) × √((N − n)/(N − 1))
Where N is the total number of elements in the population, n is the sample size to be estimated, S_x̄ is the standard error of estimate of the mean, and S is the standard deviation of the sample.
Applying the correction formula, we find that
255.10 = (3500/√n) × √((185 − n)/184)
n ≈ 94
We would now sample 94 of the total 185 clients.
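Solving the correction formula for n gives the closed form n = n0 × N / (N − 1 + n0), where n0 is the uncorrected sample size of 188.24; a sketch continuing the previous one:

    def corrected_n(n0, N):
        # Finite-population correction, obtained by solving
        # S_x = (S / sqrt(n)) * sqrt((N - n) / (N - 1)) for n
        return n0 * N / (N - 1 + n0)

    n0 = (1.96 * 3500 / 500) ** 2        # 188.24, from the previous step
    print(round(corrected_n(n0, 185)))   # -> 94 of the 185 clients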
To understand the impact of precision and/or confidence on the sample size, let us try changing the confidence level required in the bank withdrawal exercise which needed a sample size of 188 for a confidence level of 95%. Let us say that the bank manager now wants to be 99% sure that the expected monthly withdrawals will be within the interval of ±$500. What will be the sample size now needed?
S_x̄ will now be 500/2.576 = 194.099 (the K value for 99% confidence being 2.576).
194.099 = 3500/√n
n = (3500/194.099)² ≈ 325
The sample thus has to be increased 1.73 times (from 188 to 325) to raise the confidence level from 95% to 99%. It is hence a good idea to think through how much precision and confidence one really needs before determining the sample size for the research project.
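The 1.73 multiple is simply the square of the ratio of the two K values, (2.576/1.96)² ≈ 1.73, so the cost of extra confidence can be previewed before any data are collected; a short sketch:

    def required_n(s, allowable_error, k):
        return (k * s / allowable_error) ** 2

    for level, k in (("95%", 1.96), ("99%", 2.576)):
        print(level, round(required_n(3500, 500, k)))
    # 95% -> 188, 99% -> 325; 325/188 is about 1.73, i.e., (2.576/1.96) ** 2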
So far we have discussed sample size in the context of precision and confidence with respect to one variable only. However, in research, the theoretical framework has several variables of interest, and the question arises how one should come up with a sample size when all the factors are taken into account.
Krejcie and Morgan (1970) greatly simplified the sample size decision by providing a table that ensures a good decision model. The table below provides generalized scientific guidelines for sample size decisions. The interested student is advised to read Krejcie and Morgan (1970) as well as Cohen (1969) for decisions on sample size.
TABLE ON SAMPLE SIZE FOR A GIVEN POPULATION SIZE
(N = population size; S = recommended sample size)

N        S        N        S        N          S
10       10       220      140      1200       291
15       14       230      144      1300       297
20       19       240      148      1400       302
25       24       250      152      1500       306
30       28       260      155      1600       310
35       32       270      159      1700       313
40       36       280      162      1800       317
45       40       290      165      1900       320
50       44       300      169      2000       322
55       48       320      175      2200       327
60       52       340      181      2400       331
65       56       360      186      2600       335
70       59       380      191      2800       338
75       63       400      196      3000       341
80       66       420      201      3500       346
85       70       440      205      4000       351
90       73       460      210      4500       354
95       76       480      214      5000       357
100      80       500      217      6000       361
110      86       550      226      7000       364
120      92       600      234      8000       367
130      97       650      242      9000       368
140      103      700      248      10000      370
150      108      750      254      15000      375
160      113      800      260      20000      377
170      118      850      265      30000      379
180      123      900      269      40000      380
190      127      950      274      50000      381
200      132      1000     278      75000      382
210      136      1100     285      1000000    384
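The table need not be treated as a black box: Krejcie and Morgan generate it from the single formula S = χ²NP(1 − P) / (d²(N − 1) + χ²P(1 − P)), with χ² = 3.841 (the chi-square value for 1 degree of freedom at the 95% confidence level), P = 0.5, and d = 0.05. The Python sketch below reproduces the table entries:

    import math

    def krejcie_morgan(N, chi2=3.841, P=0.5, d=0.05):
        # Recommended sample size S for a population of size N
        return math.ceil(chi2 * N * P * (1 - P) / (d ** 2 * (N - 1) + chi2 * P * (1 - P)))

    for N in (100, 1000, 10000, 1000000):
        print(N, krejcie_morgan(N))
    # -> 80, 278, 370, 384, matching the table above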
IMPORTANCE OF SAMPLING DESIGN AND SAMPLE SIZE
It is now possible to see how both sampling design and the sample size are important to establish the representativeness of the sample for generalizability. If the appropriate sampling design is not used, a large sample size will not, in itself, allow the findings to be generalized to the population. Likewise, unless the sample size is adequate for the desired level of precision and confidence, no sampling design, however sophisticated, can be useful to the researcher in meeting the objectives of the study.
Hence, sampling decisions should consider both the sampling design and the sample size. Too large a sample size, however (say, over 500), could also become a problem inasmuch as we would be prone to committing Type II errors. Hence, neither too large nor too small a sample size helps research projects.
ROSCOE (1975) PROPOSES THE FOLLOWING RULES OF THUMB FOR DETERMINING SAMPLE SIZE
1. Sample sizes larger than 30 and less than 500 are appropriate for most research.
2. Where samples are to be broken into subsamples (males/females, juniors/seniors, etc.), a minimum sample size of 30 for each category is necessary.
3. In multivariate research (including multiple regression analyses), the sample size should be several times (preferably 10 times or more) as large as the number of variables in the study.
4. For simple experimental research with tight experimental controls (matched pairs, etc.), successful research is possible with samples as small as 10 to 20 in size.
KEY TERMS
• Sampling Design
• Sample size
• Non-Probability Sampling
• Probability Sampling
• Universe
• Population
• Precision
• Confidence
QUESTIONS
1. What is a sample design? What points are to be considered in developing a sample design?
2. Explain the various sampling methods under probability sampling.
3. Discuss the non-probability sampling methods.
4. What is the importance of sample size and sampling design?
5. Discuss the other sampling methods.
6. Explain why cluster sampling is a probability sampling design.
7. What are the advantages and disadvantages of cluster sampling?
8. Explain what precision and confidence are and how they influence sample size.
9. The use of a convenience sample in organizational research is justified because all members share the same organizational stimuli and go through almost the same kinds of experiences in their organizational life. Comment.
10. “A sample of 5,000 is not necessarily better than one of 500.” How would you react to this statement?
11. Non-probability sampling designs ought to be preferred to probability sampling designs in some cases. Explain with an example.






LESSON – 5
SOURCES OF DATA

SOURCES OF DATA – PRIMARY – SECONDARY DATA

OBJECTIVES
• To explain the difference between secondary and primary data
• To discuss the advantages and disadvantages of secondary data
• To learn the nature of secondary data
• To understand the evaluation of secondary data sources
• To learn the sources of secondary data
STRUCTURE
 Value of secondary data
 Disadvantages of secondary data
 Nature and scope of secondary data
 Sources of secondary data
INTRODUCTION
The availability of data sources is essential for solving a research problem, and there are many ways in which data are collected. The task of data collection begins after a research problem has been defined and the research design prepared. The data to be collected can be classified as either secondary or primary. The determination of the data source is based on three fundamental dimensions, as given below:
1. The extent to which the data already exist in some form,
2. The degree to which the data have been interpreted by someone, and
3. The extent to which the researcher or decision maker understands why the data were collected and researched.
Primary data are data gathered and assembled specifically for the project at hand. They are raw data that have yet to receive any type of meaningful interpretation; since they are collected for the first time, they are original in character. Secondary data, on the other hand, are data that have already been collected by someone else and have passed through statistical processing and interpretation. Secondary data are historical data structures of variables previously collected and assembled for some research problem other than the current situation.
The sources of primary data tend to be the output of conducting some type of exploratory, descriptive, or causal research that employs surveys, experiments, and/or observation as techniques of collecting the needed data. Greater insight into primary data will be provided in the chapter “Methods of Data Collection”, where the pros and cons of primary data are also discussed with reference to the various techniques involved in the process.
Sources of secondary data can be found inside a company, at public libraries and universities, and on World Wide Web (WWW) sites, or can be purchased from a firm specializing in providing secondary information. Here, the evaluation and sources of secondary data are discussed.
THE VALUE / ADVANTAGES OF SECONDARY DATA
More and more companies are interested in using existing data as a major tool in management decision making. As more such data become available, many companies are realizing that they can be used to make sound decisions. Data of this nature are more readily available, often more highly valid, and usually less expensive to secure than primary data.
“Nowhere in science do we start from scratch” – this quote explains the value of secondary data. Researchers are able to build on past research – a body of business knowledge. Researchers use others’ experience and data when these are available as secondary data. The primary advantage of secondary data is that they are almost always less expensive to obtain, and they can usually be obtained rapidly. The major advantages and disadvantages are discussed hereunder.
ADVANTAGES
1. It is more economical, as the cost of collecting original data is saved. The collection of primary data requires a good deal of effort: data collection forms must be prepared, designed, and printed; persons must be appointed to collect the data, which in turn involves travel; and the data collected must then be verified and tabulated. All of this requires large funds, which can be utilized elsewhere if secondary data can serve the purpose.
2. The use of secondary data saves much of the researcher’s time, which also leads to prompt completion of the research project.
3. Secondary data are helpful not only in themselves; familiarity with them also reveals deficiencies and gaps. As a result, the researcher can make the primary data collection more specific and more relevant to the study.
4. They also help in gaining new insights into the problem, which can then be used to fine-tune the research hypotheses and objectives.
5. Finally, secondary data can be used as a basis of comparison with the primary data collected for the study.
DISADVANTAGES OF SECONDARY DATA
An inherent disadvantage of secondary data is that they were not designed specifically to meet the researcher’s needs. Secondary data quickly become outdated in our rapidly changing environments, and since the purpose of most studies is to predict the future, secondary data must be timely. The most common problems with secondary data are:
1. Outdated information.
2. Variation in the definition of terms or classifications. The units of measurement may cause problems if they are not identical to the researcher’s needs; even though the original units were comparable, the aggregated or adjusted units of measurement may not be suitable for the present study. When the data are reported in a format that does not exactly meet the researcher’s needs, data conversion may be necessary.
3. The user has no control over the accuracy of secondary data; even though they are timely and pertinent, they may be inaccurate.
THE NATURE AND SCOPE OF SECONDARY DATA
Focusing on the particular business or management problem, the researcher needs to determine whether useful information already exists and, if it exists, how relevant that information is. Existing information is more widespread than one might expect. Secondary data exist in three forms:
1. Internal secondary data: data collected by the individual company for some purpose and reported periodically. These are also called primary sources. Primary sources are original works of research or raw data without interpretation, or pronouncements that represent an official opinion or position: memos, complete interviews, speeches, laws, regulations, court decisions, standards, and most government data, including census, economic, and labor data. Primary sources are always the most authoritative because the information has not been filtered. Internal secondary data also include inventory records, personnel records, process charts, and similar data.
2. External secondary data: data collected by outside agencies such as governments, trade associations, or periodicals. These are called secondary sources. Encyclopedias, textbooks, handbooks, magazine and newspaper articles, and most newscasts are considered secondary sources. Indeed, all reference materials fall into this category.
3. Computerized data sources: internal and external data, usually collected by specific companies, available through online information sources. These can be called tertiary sources, represented by indexes, bibliographies, and other finding aids, e.g., Internet search engines.
EVALUATION OF SECONDARY DATA SOURCES
The emphasis on secondary data will increase if an attempt is made to establish a set of procedures to evaluate the quality of information obtained via secondary data sources. Specifically, if secondary data are to be used to assist in the decision process, they should be assessed according to the following principles.
1. Purpose: Since most secondary data are collected for purposes other than the one at hand, the data must be carefully evaluated on how they relate to the current research objectives. Many times the original collection of data is not consistent with the particular research study; these inconsistencies usually result from the methods and units of measurement.
2. Accuracy: When observing secondary data, researchers need to keep in mind what was actually measured. For example, if actual purchases in a test market were measured, did they measure first-time trial purchases or repeat purchases? Researchers must also assess the generalizability of the data.
3. This raises questions such as: (i) were the data collected from certain groups only, or randomly? (ii) were the measures developed properly? (iii) were the data presented as the total of responses from all respondents, or were they categorized by age, sex, or socioeconomic status?
4. In addition to the above dimensions, researchers must assess when the data were collected. Outdated data not only suffer in accuracy but may also be useless for interpretation. Researchers must likewise keep in mind that flaws in the original research design and methods will carry over into the current research.
5. Consistency: When evaluating any source of secondary data, a good strategy is to seek out multiple sources of the same data to ensure consistency. For example, when evaluating the economic characteristics of a foreign market, a researcher may try to gather the same information from government sources, private business publications, and specialty import or export trade publications.
6. Credibility: Researchers should always question the credibility of the secondary data source. Technical competence, service quality, reputation, and the training and expertise of personnel representing the organization are some measures of credibility.
7. Methodology: The quality of secondary data is only as good as the methodology employed to gather them. Flaws in methodological procedures could produce results that are invalid, unreliable, or not generalizable beyond the study itself. Therefore, researchers must evaluate the size and description of the sample, the response rate, the questionnaire, and the overall procedure for collecting the data (telephone, mail, or personal interview).
8. Bias: Researchers must try to determine the underlying motivation or hidden agenda, if any, behind the secondary data. It is not uncommon to find secondary data sources published to advance the interests of commercial, political, or other interest groups. Researchers should try to determine whether the organization publishing the data is motivated by a particular purpose.


SOURCES OF SECONDARY DATA
A. INTERNAL SOURCES
Generally, internal data consist of sales or cost information. Data of this type are found in internal accounting or financial records. The two most useful sources of information are sales invoices and accounts receivable reports; quarterly sales reports and sales activity reports are also useful.
The major sources of internal secondary data are given below.
1. Sales invoices
2. Accounts receivable reports
3. Quarterly sales reports
4. Sales activity reports
5. Other types
a. customer letters
b. Customer comment cards
c. Mail order forms
d. Credit applications
e. Cash register receipts
f. Sales person expense reports
g. Employee exit interviews
h. Warranty cards
i. Past marketing research studies.
B. EXTERNAL SOURCES
When undertaking the search for secondary data, researchers must remember that the number of resources is extremely large. The researcher needs to connect the sources by a common theme.
The key variables most often sought by the researchers are given below
 Demographic dimensions
 Employment characteristics
 Economic characteristics
 Competitive characteristics
 Supply characteristics
 Regulation characteristics
 International market characteristics
External secondary data do not originate in the firm and are obtained from outside sources. It may be noted that secondary data can be collected from the originating sources or from secondary sources. For example, the Office of the Economic Adviser, Government of India, is the originating source for wholesale prices; in contrast, a publication such as the RBI Bulletin reporting wholesale prices is a secondary source.
These data may be available through Government publications, non-governmental publications or syndicated services. Some examples are given below:
A) GOVERNMENT PUBLICATIONS
1. Census by Registrar General of India
2. National income by the Central Statistical Organisation (also the Statistical Abstract and the Annual Survey of Industries).
3. Foreign trade by Director General of Commercial Intelligence.
4. Wholesale price index by Office of Economic Advisor
5. Economic Survey – Dept of Economic Affairs.
6. RBI Bulletin –RBI
7. Agri. Situation in India – Ministry of Agriculture
8. Indian labor year book – Labor Bureau
9. National Sample Survey – Ministry of Planning
B) NON- GOVERNMENT PUBLICATIONS
Besides official agencies, there are a number of private organizations which bring out statistics in one form or another on a periodical basis. Industry and trade associations are important among these, as in the following examples:
1. Indian cotton mills federation OR Confederation of Indian Textile Industry – about textile industry.
2. Bombay mill owners association – statistics of workers of mills.
3. Bombay stock exchange – on financial accounts & ratios.
4. Coffee board – coffee statistics
5. Coir board – coir & coir goods
6. Rubber board – Rubber statistics
7. Federation of Indian chambers of commerce & Industry (FICCI)
C) SYNDICATED SERVICES
Syndicated services are provided by certain organizations which collect and tabulate information on a continuous basis. Reports based on marketing information are sent periodically to subscribers. A number of research agencies also offer customized research services to their clients, such as consumer research, advertising research, etc.
D) PUBLICATION BY INTERNATIONAL ORGANIZATION
There are several International organizations that publish statistics in their respective areas.
SUMMARY
In this chapter, the importance of secondary data has been outlined, the disadvantages of secondary data have been dealt with in detail, and the sources of secondary data have been described.
KEY TERMS
• Primary data
• Secondary data
• Advantages and disadvantages
• Evaluation of secondary data
• Sources of secondary data
• Governmental publications
• Syndicated services
QUESTIONS
1. Discuss the difference between primary and secondary data.
2. Explain the advantages and disadvantages of secondary data.
3. Write short notes on nature and scope of secondary data.
4. How will you evaluate the secondary data sources?
5. Discuss the internal and external sources of secondary data.






LESSON – 6
METHODS OF DATA COLLECTION
OBJECTIVES
• To Know the different types of Data and the sources of the same
• To Learn the different data collection methods and their merits and demerits
• To understand the difference between Questionnaire and Interview Schedule.
• To Apply the suitable data collection method for the research
STRUCTURE
 Primary data
 Secondary data
 Interview
 Questionnaire
 Observation
INTRODUCTION
After defining the research problem and drawing up the research design, the important task of the researcher is data collection. While deciding on the research method, the method of data collection to be used for the study should also be planned.
The source of information and the manner in which data are collected could well make a big difference to the effectiveness of the research project.
In data collection, the researcher should be very clear on what type of data is to be used for the research. There are two types of data, namely primary data and secondary data. The methods of collecting primary and secondary data differ, since primary data are to be originally collected, while in the case of secondary data the work is merely one of compiling the available data.
PRIMARY DATA
Primary data are, generally, information gathered by the researcher for the purpose of the project at hand. When data are collected for the first time, using experiments or surveys, they are known as primary data. In the case of primary data, it is always the responsibility of the researcher to decide on the further processing of the data.
There are several methods of data collection each with its advantages and disadvantages.
The data collection methods include the following:
1. Interview – face-to-face interviews, telephone interviews, computer-assisted interviews, and interviews through electronic media.
2. Questionnaire – personally administered, sent through the mail, or electronically administered.
3. Observation – of individuals and events, with or without videotaping or audio recording.
Interviews, questionnaires, and observation are thus the three main data collection methods.
SOME OF THE OTHER DATA COLLECTION METHODS USED TO COLLECT PRIMARY DATA ARE:
1. Warranty cards
2. Distributor Audits
3. Pantry Audits
4. Consumer Panels
SECONDARY DATA
As already mentioned, it is the researcher who decides to collect secondary data for the research, and these can be collected through various sources. In the case of secondary data, the researcher may not face the severe problems usually associated with primary data collection.
Secondary data may be either published or unpublished. Published data may be available from the following sources:
• Various publications of the central, state or local governments.
• Various publications of foreign governments or of international bodies.
• Technical and Trade Journals.
• Books, Magazines, Newspapers.
• Reports and publication from various associations connected with industry and business.
• Public records and statistics.
• Historical documents.
Though there are various sources of secondary data, it is the responsibility of the researcher to scrutinize the data minutely in order to ensure that they are suitable and adequate for the study.
INTERVIEWS
An interview is a purposeful discussion between two or more people. Interviews can help to gather valid and reliable data. There are several types of interviews.
TYPES OF INTERVIEWS
Interviews may be conducted in a very formal manner, using a structured and standardized questionnaire for each respondent.
Interviews also may be conducted informally through unstructured conversation.
Based on formality and structure, interviews are classified as follows:
1. Structured interviews
2. Semi-structured interviews
3. Unstructured interviews
STRUCTURED INTERVIEWS
These interviews involve the use of a set of predetermined questions and highly standardized techniques of recording. So, in this type of interview, a rigid, predetermined procedure is followed.
SEMI STRUCTURED INTERVIEWS
These interviews may use a structured questionnaire, but the technique of interviewing does not follow a rigidly fixed procedure. These interviews give more scope for discussion and for recording the respondent’s opinions and views.
UNSTRUCTURED INTERVIEWS
These interviews neither follow a system of predetermined questions nor a standardized technique of recording information. Unstructured interviews, however, demand in-depth knowledge and greater skill on the part of the interviewers.
All three types of interviews may be conducted by the interviewer asking questions, generally in face-to-face contact. These interviews may take the form of direct personal investigation or indirect oral investigation. In the case of direct personal investigation, the interviewer has to collect the information personally; it is thus the duty of the interviewer to be on the spot to meet the respondents and collect the data.
When this is not possible, the interviewer may have to cross-examine others who are supposed to have knowledge about the problem, and the information given by them may be recorded.
Example: commissions and committees appointed by the government.
Depending on the approaches of the interviewer, the interviews may be classified as:
1. Non directive interviews
2. Directive interviews
3. Focused interviews
4. In depth interviews
NON – DIRECTIVE INTERVIEWS
In these types of interviews, the interviewer is free to arrange the form and order of the questions. The questionnaire for these kinds of interviews may also contain open-ended questions, wherein the respondent feels free to respond to the questions.
DIRECTIVE INTERVIEW
This is also a type of structured interview. In this method a predetermined questionnaire is used, and the respondent is expected to limit the answers to the questions asked. Market surveys and interviews by newspaper correspondents are suitable examples.
FOCUSED INTERVIEWS
These interviews fall between directive and non-directive: the methods are neither fully standardized nor fully unstandardized. The objective is to focus the attention of the respondents on a specific issue or point. Example: a detective questioning a person regarding a crime committed in an area.
IN DEPTH INTERVIEWS
In these interview methods, the respondent is encouraged to express his or her thoughts on the topic of the study. In-depth interviews are conducted to get at important aspects of psycho-social situations which are otherwise not readily evident.
The major strength of these kinds of interviews is their capacity to uncover the basic and complete answers to the questions asked.
ADVANTAGES & DISADVANTAGES OF INTERVIEWS
ADVANTAGES
Despite the variations in interview techniques, the following are the advantages of interviews.
1. Interviews may help to collect more information, and also in-depth information.
2. Unstructured interviews are more advantageous, since there is always an opportunity for the interviewer to restructure the questions.
3. Since the respondents are contacted personally for the information, there is a greater opportunity to create rapport and to collect personal information as well.
4. Interviews help the researcher to collect all the necessary information; here the incidence of non-response will be very low.
5. It is also possible for the interviewer to collect additional information about the environment and about the nature, behavior, and attitude of the respondents.
DISADVANTAGES
1. Interviews are an expensive method, especially when widely spread geographical samples are taken.
2. There is a possibility of bias on the part of both the interviewer and the respondent.
3. The method is also time-consuming, especially when large samples are taken for the study.
4. There is a possibility that the respondent may hide his or her real opinion, so genuine data may not be collected.
5. Sometimes there will be great difficulty in adopting the interview method, because fixing an appointment with the respondent itself may not be possible.
Hence, for successful implementation of the interview method, the interviewer should be carefully selected, trained and briefed.
HOW TO MAKE INTERVIEWS SUCCESSFUL?
1. As mentioned above, the interviewers must be carefully selected and trained properly.
2. The interviewer should have the skill of probing to collect the needed information from the respondent.
3. The honesty and integrity of the interviewer also determine the outcome of the interview.
4. Rapport with the respondent should be created by the interviewer.
5. Qualities such as politeness, courtesy, friendliness, and a conversational manner are necessary to make the interview successful.
TELEPHONIC INTERVIEWS
Apart from all the above, telephonic interviews are also conducted to collect data: respondents are contacted over the phone to gather the data.
Telephonic interviews are more flexible in timing, faster than other methods, and also cheaper. For these sorts of interviews no field staff is required, and information can be recorded without causing embarrassment to respondents, especially when very personal questions are asked. But this method is restricted to respondents who have a telephone facility; the possibility of biased replies is relatively high; and since there is no personal contact between the two parties, there is a greater possibility of unanswered questions.
COLLECTION OF DATA THROUGH QUESTIONNAIRES
Questionnaires are widely used for data collection in social science research, particularly in surveys. The questionnaire has been accepted as a reliable tool for gathering data from large, diverse, and scattered social groups. Bogardus describes the questionnaire as a list of questions sent to a number of persons for their answers, which obtains standardized results that can be tabulated and treated statistically.
There are two types of questionnaires: structured and unstructured.
The design of the questionnaire may vary based on the way it is administered.
Questionnaire methods are most extensively used in economic and business surveys.
STRUCTURED QUESTIONNAIRE
These contain definite, concrete, and preordained questions. The answers collected using a structured questionnaire are very precise, with no vagueness or ambiguity.
The structured questionnaire may have the following types:
1. Closed-form questionnaire: Questions are set in such a way as to leave only a few alternative answers. Ex: yes-or-no type questions.
2. Open-ended questionnaire: Here the respondent has the choice of using his own style, expression, language, length, and perception. The respondent is not restricted in his replies.
UNSTRUCTURED QUESTIONNAIRE
The questions in this questionnaire are not structured in advance. These sorts of questionnaires give more scope for a variety of answers; they are used mainly to conduct interviews in which different responses are expected.
The researcher should be very clear on when to use a questionnaire. Questionnaires are mostly used in descriptive and explanatory types of research.
Ex: Questionnaires on attitudes, opinions, and organizational practices enable the researcher to identify and describe variability.
THE ADVANTAGES OF THE QUESTIONNAIRE
• The cost involved is low, even in the case of a widely spread geographical sample.
• It is freer from the subjectivity of the interviewer.
• Respondents may also find adequate time to give well-thought-out answers.
• It is more advantageous when respondents are not otherwise reachable.
But the rate of return of questionnaires and the adequacy of the data obtained for the study may be doubtful. This method can be used only when the respondents are educated and can read the language in which the questionnaire is prepared. The possibilities of ambiguous replies and omission of replies are greater, and the method is more time-consuming.
Before sending the final questionnaire to the respondents, it is always important to conduct a pilot study to test the questionnaire. A pilot study is just a rehearsal of the main survey; conducting it with the help of experts brings more strength to the questionnaire.
DATA COLLECTION THROUGH SCHEDULES
This method is very much like data collection through questionnaires. Schedules are proformas containing a set of questions which are filled in by enumerators specially appointed for the purpose. The enumerators are expected to perform well: they must be knowledgeable and must possess the capacity for cross-examination in order to find the truth. This method is usually expensive and is adopted by larger organizations and government agencies.






DIFFERENCE BETWEEN QUESTIONNAIRE AND SCHEDULE
Usage: A questionnaire is usually filled up by the respondents themselves; in a schedule, the answers are recorded by the enumerators.
Cost: The questionnaire is relatively cheaper, since it is mostly sent through the mail; the schedule is costlier in terms of appointing and training enumerators and meeting the expenses of reaching the respondents.
Degree of response: With a questionnaire, all the respondents may not respond; the schedule secures a relatively good degree of response, as the enumerator pursues the respondents.
Quality of response: With a questionnaire, quality is not as good, as the answers depend on how the questions are understood; with a schedule, quality is better, since the enumerator can clarify the questions.
Time: The questionnaire is more time-consuming, with no control over time; with a schedule it is possible to control the time, since the personal involvement of the enumerator is there.
Personal contact: The questionnaire involves no personal contact; the schedule involves more personal contact.
Sample coverage: The questionnaire makes it possible to cover a wide range of samples; the schedule can cover only a smaller number of samples in a particular time period.
OBSERVATIONAL METHODS OF DATA COLLECTION
Observation is one of the cheaper and more effective techniques of data collection. Observation is understood as a systematic process of recording the behavioral patterns of people, objects, and occurrences as they are witnessed.
Using the observational method of data collection, data can be gathered on movements, work habits, statements made and meetings conducted by human beings, as well as facial expressions, body language, and other emotions such as joy, anger, and sorrow.
Other environmental factors, including layout, workflow patterns, and physical arrangements, can also be noted. In this method of data collection, the investigator collects the data without interacting with the respondents.
Example: instead of asking consumers which brands of shirts they buy or which programs they watch, the researcher simply observes their pattern of behavior.
Observation may be classified, based on the role of the researcher, as:
1. Non participant observer
2. Participant observer
It can also be classified, based on the method, as:
1. Structured observation
2. Unstructured observation
NON PARTICIPANT OBSERVER
The role of the investigator here is that of an external agent who observes from a distance, without interacting with the subjects being observed.
PARTICIPANT OBSERVER
The participant observer joins the group and works along with them in the way the work is done, but may not ask questions related to the research/investigation.
STRUCTURED OBSERVATION
In this case the researcher may have a predetermined set of categories of activities or phenomena planned to be studied.
Example: observing the behavior pattern of individuals when they go shopping, planned in such a way that the frequency of purchasing, the interest shown during the purchase, and the way goods are preferred/selected are all recorded. The researcher has a plan for the observations to be made.
UNSTRUCTURED OBSERVATION
Here the researcher does not collect data according to a predetermined set of categories. These sorts of methods are used more in qualitative research studies.
MERITS OF OBSERVATIONAL STUDIES
• The data collected by this method are generally reliable and free from respondent bias.
• It is easier to observe busy people than to meet them and collect the data.
DEMERITS
• The observer has to be present in the situation where the data are to be collected.
• The method is very slow and expensive.
• Since the observer collects the data by spending considerable time observing the sample, there is a possibility of observer bias creeping into the information.
SURVEY METHOD
A survey is a popular method of “fact-finding” study which involves the collection of data directly from the population at a particular time. A survey can be defined as a research technique in which information is gathered from a sample of people by the use of a questionnaire or interview.
A survey is considered a field study, conducted in a natural setting, which collects information directly from the respondents. Surveys are conducted for many purposes: population censuses, socio-economic surveys, expenditure surveys, and marketing surveys. The purpose of these surveys is to provide information to government planners or business enterprises. Surveys are also conducted to explain phenomena, wherein the causal relationship between two variables is to be assessed.
Surveys are used to compare demographic groups, for example to compare high- and low-income groups, or to compare preferences based on age. Such surveys are conducted in the case of descriptive research, in which large samples are the focus.
Surveys are most appropriate in the social and behavioral sciences. They are concerned with formulating hypotheses and testing the relationships between non-manipulated variables. Survey research requires skillful workers to gather the data.
The subjects for surveys may be classified as social surveys and economic surveys.
SOCIAL SURVEYS, WHICH INCLUDE
• Demographic characteristics of a group of people.
• The social environment of people
• People’s opinion & attitudes.
ECONOMIC SURVEYS
• Economic conditions of people.
• Operations of economic system.
THE IMPORTANT STAGES IN SURVEY METHODS
1. Selecting the universe of the field
2. Choosing samples from the universe
3. Deciding on the tools used for data collection
4. Analyzing the data
OTHER METHODS OF DATA COLLECTION
WARRANTY CARDS
A type of postcard with a few focused, typed questions may be used by dealers/retailers to collect information from customers. The dealers/researchers may request customers to fill in the required data.
DISTRIBUTOR AUDITS
These sorts of data may be collected by distributors to estimate market size, market share, and seasonal purchasing patterns. The information is collected by observational methods.
Example: auditing provision stores and collecting data on inventories by copying information from store records.
PANTRY AUDITS
This method is used to estimate the consumption of a basket of goods at the consumer level. Here the researcher collects an inventory of the types, quantities, and prices of commodities consumed. Thus pantry audit data are recorded from the contents of consumers’ pantries. An important limitation of the pantry audit is that sometimes the audit data alone may not be sufficient to identify consumers’ preferences.
CONSUMER PANELS
An extension of the pantry audit approach on a regular basis is known as the consumer panel. A set of consumers agree to maintain daily records of their consumption, and the same are made available to the researcher on demand.
In other words, a consumer panel is a sample of consumers interviewed repeatedly over a period of time.
FIELD WORK
To collect primary data, the researcher or investigator may use different field methods, going from door to door or using the telephone to collect the data.
SUMMARY
Since there are various methods of data collection, the researcher must select the appropriate one. Hence, the following factors are to be kept in mind by the researcher:
1. Nature, Scope and Object of the Enquiry
2. Availability of funds
3. Time factor
4. Precision required
KEY TERMS
• Primary data
• Secondary data
• Interview
• Questionnaire
• Schedule
• Observation
• Survey
• Warranty cards
• Pantry Audits
• Distributor Audits
• Consumer Panels
QUESTIONS
1. Describe the different data sources, explaining their advantages and disadvantages.
2. What is bias, and how can it be reduced during interviews?
3. “Every data collection method has its own built-in biases. Therefore, resorting to multiple methods of data collection is only going to compound the biases.” How would you evaluate this statement?
4. Discuss the role of technology in data collection.
5. What is your view on using warranty cards and distributor audits in data collection?
6. Differentiate between the questionnaire and the interview schedule to decide which is the better one.
7. Discuss the main purposes for which Survey methods are used.






LESSON – 7
QUESTIONNAIRE DESIGN
OBJECTIVES
• To recognize the importance and relevance of questionnaire design
• To recognize that the type of information will influence the structure of questionnaire
• To understand the role data collection method in designing questionnaire
• To understand how to plan and design without mistakes, and improve its layout
• To know importance of pretesting.
STRUCTURE
 Questionnaire design
 Phrasing question
 Act of asking question
 Layout of traditional questionnaires.
Many experts in survey research believe that improving the wording of questions can contribute far more to accuracy than can improvements in sampling. Experiments have shown that the range of error due to vague questions or use of imprecise words may be as high as 20 or 30 percent. Consider the following example, which illustrates the critical importance of selecting the word with the right meaning. The following questions differ only in the use of the words should, could, and might:
• Do you think anything should be done to make it easier for people to pay doctor or hospital bills?
• Do you think anything could be done to make it easier for people to pay doctor or hospital bills?
• Do you think anything might be done to make it easier for people to pay doctor or hospital bills?
The results from the matched samples: 82 percent replied something should be done, 77 percent replied something could be done, and 63 percent replied something might be done. Thus, a 19 percentage point difference occurred between the two extremes, should and might. Ironically, this is the same percentage point error as in the Literary Digest Poll, a frequently cited example of error associated with sampling.
The chapter outlines procedure for questionnaire design and illustrates that a little bit of research knowledge can be a dangerous thing.
A SURVEY IS ONLY AS GOOD AS THE QUESTIONS IT ASKS
Each stage of the business research process is important because of its interdependence with other stages of the process. However, a survey is only as good as the questions it asks. The importance of wording questions is easily overlooked, but questionnaire design is one of the most critical stages in the survey research process.
“A good questionnaire appears as easy to compose as does a good poem. But it is usually the result of long, painstaking work.” Business people who are inexperienced in business research frequently believe that they can construct a questionnaire in a matter of hours. Unfortunately, newcomers who naively believe that common sense and good grammar are all that are needed to construct a questionnaire generally learn that their hasty efforts are inadequate.
While common sense and good grammar are important in question writing, more is required in the art of questionnaire design. To assume that people will understand the questions is a common error. People simply may not know what is being asked. They may be unaware of the product or topic of interest, they may confuse the subject with something else, or the question may not mean the same thing to everyone interviewed. Respondents may refuse to answer personal questions. Further, properly wording the questionnaire is crucial, as some problems may be minimized or avoided altogether if a skilled researcher composes the questions.
QUESTIONNAIRE DESIGN: AN OVERVIEW OF THE MAJOR DECISIONS
Relevance and accuracy are the two basic criteria a questionnaire must meet if it is to achieve the researcher’s purpose. To achieve these ends, a researcher who systematically plans a questionnaire’s design will be required to make several decisions, typically, but not necessarily, in the order listed below:
1. What should be asked?
2. How should each question be phrased?
3. In what sequence should the questions be arranged?
4. What questionnaire layout will best serve the research objectives?
5. How should the questionnaire be pretested? Does the questionnaire need to be revised?
WHAT SHOULD BE ASKED?
During the early stages of the research process, certain decisions will have been made that will influence the questionnaire design. The preceding chapters stressed the need to have a good problem definition and clear objectives for the study. The problem definition will indicate which type of information must be collected to answer the manager’s questions; different types of questions may be better at obtaining certain types of information than others. Further, the communication medium used for data collection (telephone interview, personal interview, or self-administered survey) will have been determined. This decision is another forward linkage that influences the structure and content of the questionnaire. The specific questions to be asked will be a function of the previous decisions. Later stages of the research process also have an important impact on questionnaire wording. For example, determination of the questions that should be asked will be influenced by the requirements for data analysis. As the questionnaire is being designed, the researcher should be thinking about the types of statistical analysis that will be conducted.
QUESTIONNAIRE RELEVANCY
A questionnaire is relevant if no unnecessary information is collected and if the information that is needed to solve the business problem is obtained. Asking the wrong or an irrelevant question is a pitfall to be avoided. If the task is to pinpoint compensation problems, for example, questions asking for general information about morale may be inappropriate. To ensure information relevancy, the researcher must be specific about data needs, and there should be a rationale for each item of information.
After conducting surveys, many disappointed researchers have discovered that some important questions were omitted. Thus, when planning the questionnaire design, it is essential to think about possible omissions. Is information being collected on the relevant demographic and psychographic variables? Are there any questions that might clarify the answers to other questions? Will the results of the study provide the solution to the manager’s problem?
QUESTIONNAIRE ACCURACY
Once the researcher has decided what should be asked, the criterion of accuracy becomes the primary concern. Accuracy means that the information is reliable and valid. While experienced researchers generally believe that one should use simple, understandable, unbiased, unambiguous, nonirritating words, no step-by-step procedure to ensure accuracy in question writing can be generalized across projects. Obtaining accurate answers from respondents is strongly influenced by the researcher’s ability to design a questionnaire that facilitates recall and that motivates the respondent to cooperate.
Respondents tend to be most cooperative when the subject of the research is interesting. Also, if questions are not lengthy, difficult to answer, or ego-threatening, there is a higher probability of obtaining unbiased answers. Question wording and sequence substantially influence accuracy; these topics are treated in subsequent sections of this chapter.
PHRASING QUESTIONS
There are many ways to phrase questions, and many standard question formats have been developed in previous research studies. This section presents a classification of question types and provides some helpful guidelines for researchers who must write questions.
OPEN-ENDED RESPONSE VERSUS FIXED-ALTERNATIVE QUESTIONS
Questions may be categorized as either of two basic types, according to the amount of freedom respondents are given in answering them. Open-ended response questions pose some problem or topic and ask the respondent to answer in his or her own words. For example:
What things do you like most about your job?
What names of local banks can you think of offhand?
What comes to mind when you look at this advertisement?
Do you think that there are some ways in which life in the United States is getting worse? How is that?
If the question is asked in a personal interview, the interviewer may probe for more information by asking such questions as: Anything else? Or could you tell me more about your thinking on that? Open-ended response questions are free-answer questions. They may be contrasted to the fixed-alternative question, sometimes called a “closed question,” in which the respondent is given specific, limited-alternative responses and asked to choose the one closest to his or her own viewpoint. For example:
Did you work overtime or at more than one job last week?
Yes ________ No_________
Compared to ten years ago, would you say that the quality of most products made in Japan is higher, about the same, or not as good?
Higher __________ About the same __________ Not as good _____
Open-ended response questions are most beneficial when the researcher is conducting exploratory research, especially if the range of responses is not known. Open-ended questions can be used to learn what words and phrases people spontaneously give to the free-response questions. Respondents are free to answer with whatever is uppermost in their thinking. By gaining free and uninhibited responses, a researcher may find some unanticipated reaction toward the topic. As the responses have the “flavor” of the conversational language that people use in talking about products or jobs, responses to these questions may be a source for effective communication.
Open-ended response questions are especially valuable at the beginning of an interview. They are good first questions because they allow respondents to warm up to the questioning process.
The cost of open-ended response questions is substantially greater than that of fixed-alternative questions, because the job of coding, editing, and analyzing the data is quite extensive. As each respondent’s answer is somewhat unique, there is some difficulty in categorizing and summarizing the answers. The process requires an editor to go over a sample of questions to classify the responses into some sort of scheme; then all the answers are reviewed and coded according to the classification scheme.
Another potential disadvantage of the open-ended response question is that interviewer bias may influence the responses. While most instructions state that the interviewer is to record answers verbatim, rarely can even the best interviewer get every word spoken by the respondent. Interviewers tend to take shortcuts in recording answers, but changing even a few of the respondent’s words may substantially influence the results. Thus, the final answer often is a combination of the respondent’s and the interviewer’s ideas rather than the respondent’s ideas alone.
The simple-dichotomy, or dichotomous-alternative, question requires the respondent to choose one of two alternatives. The answer can be a simple “yes” or “no” or a choice between “this” and “that.” For example:
Did you make any long-distance calls last week?
Yes No
Several types of questions provide the respondent with multiple-choice alternatives. The determinant-choice question requires the respondent to choose one, and only one, response from among several possible alternatives. For example:
Please give us some information about your flight. In which section of the aircraft did you sit?
First class Business class Coach class
The frequency-determination question is a determinant-choice question that asks for an answer about the general frequency of occurrence. For example:
How frequently do you watch the MTV television channel?
Every day ………………………………………………………..
5 – 6 times a week ………………………………………………
2 – 4 times a week ……………………………………………..
Once a week …………………………………………………….
Less than once a week ………………………………………..
Never …………………………………………………………….
Attitude rating scales, such as the Likert scale, semantic differential, and Stapel scale, are also fixed-alternative questions.
The checklist question allows the respondent to provide multiple answers to a single question. The respondent indicates past experience, preference, and the like merely by checking off an item. In many cases the choices are adjectives that describe a particular object. A typical checklist follows:
Please check which of the following sources of information about investments you regularly use, if any.

Personal advice of your brokers(s)
Brokerage newsletters
Brokerage research reports
Investment advisory service(s)
Conversations with other investors
Reports on the Internet
None of these
Other (please specify) ________________________
Most questionnaires include a mixture of open-ended and closed questions. Each form has unique benefits; in addition, a change of pace can eliminate respondent boredom and fatigue.
PHRASING QUESTIONS FOR SELF-ADMINISTERED, TELEPHONE, AND PERSONAL INTERVIEW SURVEYS
The means of data collection (personal interview, telephone, mail, or Internet questionnaire) will influence the question format and question phrasing. In general, questions for mail and telephone surveys must be less complex than those utilized in personal interviews. Questionnaires for telephone and personal interviews should be written in a conversational style. Consider the following question from a personal interview:
There has been a lot of discussion about the potential health threat to nonsmokers from tobacco smoke in public buildings, restaurants, and business offices. How serious a health threat to you personally is the inhaling of this secondhand smoke, often called passive smoking? Is it a very serious health threat, somewhat serious, not too serious, or not serious at all?
1. Very serious
2. Somewhat serious
3. Not too serious
4. Not serious at all
5. (Don’t know)
THE ART OF ASKING QUESTIONS
In developing a questionnaire, there are no hard-and-fast rules. Fortunately, however, some guidelines that help to prevent the most common mistakes have been developed from research experience.
1. AVOID COMPLEXITY: USE SIMPLE, CONVERSATIONAL LANGUAGE
Words used in questionnaires should be readily understandable to all respondents. The researcher usually has the difficult task of adopting the conversational language of people from the lower educational levels without talking down to better-educated respondents. Remember, not all people have the vocabulary of a college student; a substantial number of Americans never go beyond high school.
Respondents can probably tell an interviewer whether they are married, single, divorced, separated, or widowed, but providing their “marital status” may present a problem. Also, the technical jargon of corporate executives should be avoided when surveying retailers, factory employees, or industrial users. “Marginal analysis,” “decision support systems,” and other words from the language of the corporate staff will not have the same meaning to, or be understood by, a store owner-operator in a retail survey. The vocabulary in the following question (from an attitude survey on social problems) is probably confusing for many respondents:
When effluents from a paper mill can be drunk and exhaust from factory smokestacks can be breathed, then humankind will have done a good job in saving the environment … Don’t you agree that what we want is zero toxicity: no effluents?
This lengthy question is also a leading question.
2. AVOID LEADING AND LOADED QUESTIONS
Leading and loaded questions are a major source of bias in question wording. Leading Questions suggest or imply certain answers. In a study of the dry-cleaning industry, this question was asked:
Many people are using dry cleaning less because of improved wash-and-wear clothes. How do you feel wash-and-wear clothes have affected your use of dry-cleaning facilities in the past 4 years?
_____ Use less ______ No change ______ use more
The potential “bandwagon effect” implied in this question threatens the study’s validity.
Loaded questions suggest a socially desirable answer or are emotionally charged. Consider the following:
In light of today’s farm crisis, it would be in the public’s best interest to have the federal government require labeling of imported meat.
_____ Strongly Agree ______ Agree ______ Uncertain ______ Disagree ______ Strongly Disagree
Answers might be different if the loaded portion of the statement, “farm crisis” had another wording suggesting a problem of less magnitude than a crisis. A television station produced the following 10-second spot asking for viewer feedback:
We are happy when you like programs on Channel 7. We are sad when you dislike programs on Channel 7. Write us and let us know what you think of our programming.
Most people do not wish to make others sad. This question is likely to elicit only positive comments. Some answers to certain questions are more socially desirable than others. For example, a truthful answer to the following classification question might be painful:

Where did you rank academically in your high school graduating class?
Top quarter 2nd quarter 3rd quarter 4th quarter
When taking personality tests, respondents frequently are able to determine which answers are most socially acceptable, even though those answers do not portray their true feelings.
3. AVOID AMBIGUITY: BE AS SPECIFIC AS POSSIBLE
Items on questionnaires are often ambiguous because they are too general. Consider indefinite words such as often, usually, regularly, frequently, many, good, fair, and poor. Each of these words has many meanings. For one person, frequent reading of Fortune magazine may be reading six or seven issues a year; for another it may be two issues a year. The word fair likewise has a great variety of meanings; the same is true for many indefinite words.
Questions such as the following should be interpreted with care:
How often do you feel that you can consider all of the alternatives before making a decision to follow a specific course of action?
_______ Always ________ Fairly Often _________ Occasionally _________ Seldom _______ Never
In addition to utilizing words like occasionally, this question asks respondents to generalize about their decision-making behavior. The question is not specific. What does consider mean? The respondents may have a tendency to provide stereotyped “good” management responses rather than to describe their actual behavior. People’s memories are not perfect. We tend to remember the good and forget the bad.
4. AVOID DOUBLE- BARRELED ITEMS
A question covering several issues at once is referred to as double-barreled and should always be avoided. It's easy to make the mistake of asking two questions rather than one. For example, "Please indicate if you agree or disagree with the following statement: 'I have called in sick or left work to golf.'" Which reason is it: calling in sick or leaving work (perhaps with permission) to play golf?
When multiple questions are asked in one question, the results may be exceedingly difficult to interpret. For example, consider the following question from a magazine survey entitled “How Do You Feel about Being a Woman?”
Between you and your husband, who does the housework (cleaning, cooking, dishwashing, laundry) over and above that done by any hired help?
I do all of it
I do almost all of it
I do over half of it
We split the work fifty-fifty
My husband does over half of it
The answers to this question do not tell us if the wife cooks and the husband dries the dishes.
5. AVOID MAKING ASSUMPTIONS
Consider the following question:
Should Mary’s continue its excellent gift-wrapping program?
Yes No
The question contains the implicit assumption that people believe the gift-wrapping program is excellent. By answering yes, the respondent implies that the program is, in fact, excellent and that things are just fine as they are. By answering no, he or she implies that the store should discontinue the gift wrapping. The researcher should not place the respondent in that sort of bind by including an implicit assumption in the question.
6. AVOID BURDENSOME QUESTIONS THAT MAY TAX THE RESPONDENT’S MEMORY
A simple fact of human life is that people forget. Researchers writing questions about past behavior or events should recognize that certain questions may make serious demands on the respondent's memory. Writing questions about prior events requires a conscientious attempt to minimize the problems associated with forgetting.
In many situations, respondents cannot recall the answer to a question. For example, a telephone survey conducted during the 24-hour period following the airing of the Super Bowl might establish whether the respondent watched the Super Bowl and then ask: "Do you recall any commercials on that program?" If the answer is positive, the interviewer might ask: "What brands were advertised?" These two questions measure unaided recall, because they give the respondent no clue as to the brand of interest.
WHAT IS THE BEST QUESTION SEQUENCE?
The order of questions, or the question sequence, may serve several functions for the researcher. If the opening questions are interesting, simple to comprehend, and easy to answer, respondents' cooperation and involvement can be maintained throughout the questionnaire. Asking easy-to-answer questions teaches respondents their role and builds their confidence; they learn that this is a researcher and not another salesperson posing as an interviewer. If respondents' curiosity is not aroused at the outset, they can become disinterested and terminate the interview. A mail research expert reports that a mail survey among department store buyers drew an extremely poor return. However, when some introductory questions related to the advisability of congressional action on pending legislation of great importance to these buyers were placed first on the questionnaire, a substantial improvement in response rate occurred. Respondents completed all the questions, not only those in the opening section.
In their attempts to "warm up" respondents toward the questionnaire, researchers frequently ask demographic or classification questions at the beginning of the questionnaire. This is generally not advisable, because such questions may embarrass or threaten respondents. It is generally better to ask embarrassing questions in the middle or at the end of the questionnaire, after rapport has been established between respondent and interviewer.
Sequencing specific questions before asking about broader issues is a common cause of order bias. For example, bias may arise if questions about a specific clothing store are asked prior to those concerning the general criteria for selecting a clothing store. Suppose a respondent indicates in the first portion of a questionnaire that she shops at a store where parking needs to be improved. Later in the questionnaire, to avoid appearing inconsistent, she may state that parking is a less important factor than she really believes it is. Specific questions may thus influence the more general ones. Therefore, it is advisable to ask general questions before specific ones to obtain the freest open-ended responses. This procedure, known as the funnel technique, allows the researcher to understand the respondent's frame of reference before asking more specific questions about the level of the respondent's information and the intensity of his or her opinions.
One advantage of Internet surveys is the ability to reduce order bias by having the computer randomly order questions and/or response alternatives. With complete randomization, question order is random and respondents see response alternatives in random positions. Asking a question that does not apply to the respondent or that the respondent is not qualified to answer may be irritating or may cause a biased response, because the respondent wishes to please the interviewer or to avoid embarrassment. Including a filter question minimizes the chance of asking questions that are inapplicable. Asking "Where do you generally have check-cashing problems in Delhi?" may elicit a response even though the respondent has not had any check-cashing problems and may simply wish to please the interviewer with an answer. A filter question such as
Do you ever have a problem cashing a check in Delhi?
_____ Yes ______ No
would screen out the people who are not qualified to answer. Exhibit 15.2 gives an example of a flowchart plan for a questionnaire that uses filter questions.
Another form of filter question, the pivot question, can be used to obtain income information and other data that respondents may be reluctant to provide. For example, a respondent is asked:
"Is your total family income over Rs. 50,000?" IF UNDER, ASK:
"Is it over or under Rs. 25,000?" IF OVER, ASK: "Is it over or under Rs. 75,000?"
1. Under Rs. 25,000
2. Rs. 25,001 - Rs. 50,000
3. Rs. 50,001 - Rs. 75,000
4. Over Rs. 75,000
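The branching logic of a pivot question is mechanical enough to express in a few lines of code. The following Python sketch is illustrative only: the ask() helper is hypothetical and stands in for however an interviewing system actually collects a yes/no answer, and the prompts mirror the example above.

# Minimal sketch of pivot-question branching for income classification.
# ask() is a hypothetical stand-in for collecting a yes/no answer.
def ask(prompt):
    return input(prompt + " [y/n]: ").strip().lower().startswith("y")

def classify_income():
    if ask("Is your total family income over Rs. 50,000?"):
        if ask("Is it over Rs. 75,000?"):
            return "4. Over Rs. 75,000"
        return "3. Rs. 50,001 - Rs. 75,000"
    if ask("Is it over Rs. 25,000?"):
        return "2. Rs. 25,001 - Rs. 50,000"
    return "1. Under Rs. 25,000"

print(classify_income())

Note that the respondent is never asked for an exact figure; each branch merely narrows the income range, which is why the pivot question is useful for sensitive data.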
Structuring the order of questions so that they are logical will help to ensure the respondent’s cooperation and eliminate confusion or indecision. The researcher maintains legitimacy by making sure that the respondent can comprehend the relationship between a given question and the overall purpose of the study.
WHAT IS THE BEST LAYOUT?
Good layout and physical attractiveness are crucial in mail, Internet, and other self-administered questionnaires. For different reasons it is also important to have a good layout in questionnaires designed for personal and telephone interviews.
LAYOUT OF TRADITIONAL QUESTIONNAIRES
The layout should be neat and attractive, and the instructions for the interviewer (set in boldface capital letters) should be easy to follow. Often the rate of return can be increased by using money that might have been spent on an incentive to improve the attractiveness and quality of the questionnaire. Mail questionnaires should never be overcrowded. Margins should be of decent size, white space should be used to separate blocks of print, and any unavoidable columns of multiple boxes should be kept to a minimum. Questionnaires should be designed to appear as brief and small as possible. Sometimes it is advisable to use a booklet form of questionnaire rather than a large number of pages stapled together. In situations where it is necessary to conserve space on the questionnaire or to facilitate entering or tabulating the data, a multiple-grid layout may be used. In this type of layout, a question is followed by corresponding response alternatives arranged in a grid or matrix format.
Experienced researchers have found that it pays to phrase the title of the questionnaire carefully. In self-administered and mail questionnaires a carefully constructed title may by itself capture the respondent's interest, underline the importance of the research ("Nationwide Study of Blood Donors"), emphasize the interesting nature of the study ("Study of Internet Usage"), appeal to the respondent's ego ("Survey among Top Executives"), or emphasize the confidential nature of the study ("A Confidential Survey among . . ."). The researcher should take steps to ensure that the wording of the title will not bias the respondent in the same way that a leading question might.
When an interviewer is to administer the questionnaire, the analyst can design the questionnaire to make the job of following interconnected questions much easier by utilizing instructions, directional arrows, special question formats, and other tricks of the trade.
SUMMARY
Many novelists write, rewrite, and revise certain chapters, paragraphs, and even sentences of their books. The research analyst lives in a similar world. Rarely does one write only a first draft of a questionnaire. Usually, the questionnaire is tried out on a group that is selected on a convenience basis and that is similar in makeup to the one that ultimately will be sampled. Researchers should select a group that is not too divergent from the actual respondents (e.g., business students as surrogates for businesspeople), but it is not necessary to get a statistical sample for pretesting. The pretesting process allows the researchers to determine whether the respondents have any difficulty understanding the questionnaire and whether there are any ambiguous or biased questions. This process is exceedingly beneficial. Making a mistake with 25 or 50 subjects can avert the disaster of administering an invalid questionnaire to several hundred individuals.
KEY TERMS
• Open ended response questions
• Fixed alternative questions
• Leading question
• Loaded question
• Double-barreled question
• Funnel technique
• Filter question
• Pretesting
QUESTIONS
1. What is the difference between a leading question and a loaded question?
2. Design an open-ended question to measure reaction to a particular advertisement.
3. Design a complete questionnaire to evaluate job satisfaction.
4. Develop a checklist of issues to consider in questionnaire construction.


LESSON – 8
TRAINING OF THE FIELD INVESTIGATORS
OBJECTIVES
• To recognize that field work can be performed by many different parties.
• To understand the importance of training for new interviewers
• To understand the principal tactics of asking questions
STRUCTURE
 Interviewing
 Training for interviewing
 The major principles for asking questions
 Probing
 Recording the response
INTRODUCTION
A personal interviewer administering a questionnaire door to door, a telephone interviewer calling from a central location, an observer counting pedestrians in a shopping mall, and others involved in the collection of data and the supervision of that process are all fieldworkers. The activities of fieldworkers vary in nature. This lesson helps us to understand the interviewing methods used in the data collection process of research and the management of fieldwork.
WHO CONDUCTS THE FIELD WORK?
Data collection is rarely carried out by the person who designs the research project. However, the data-collecting stage is crucial, because the research project is no better than the data collected in the field. Therefore, it is important that the research administrator select capable people who may be entrusted to collect the data. An irony of business research is that highly educated and trained individuals design the research, but the people who collect the data typically have little research training or experience. Knowing the vital importance of data collected in the field, research administrators must concentrate on carefully selecting field workers.
INTERVIEWING
The interviewing process begins with establishing rapport with the respondent. Interviewer bias may enter in if the fieldworker's clothing or physical appearance is unattractive or unusual. Suppose that a male interviewer, wearing a dirty T-shirt, interviews subjects in an upper-income neighborhood. Respondents may consider the interviewer slovenly and be less cooperative than they would be with a person dressed more appropriately.
Interviewers and other fieldworkers are generally paid an hourly rate or a per-interview fee. Often interviewers are part-time workers—housewives, graduate students, secondary school teachers—from diverse backgrounds. Primary and secondary school teachers are an excellent source for temporary interviewers during the summer, especially when they conduct interviews outside the school districts where they teach. Teachers’ educational backgrounds and experiences with the public make them excellent candidates for fieldwork.
TRAINING FOR INTERVIEWERS
The objective of training is to ensure that the data collection instrument is administered uniformly by all field investigators. The goal of training sessions is to ensure that each respondent is provided with common information. If the data are collected in a uniform manner from all respondents, the training session will have been a success. After personnel are recruited and selected, they must be trained.
Ex: A woman who has just sent her youngest child off to first grade is hired by an interviewing firm. She has decided to become a working mother by becoming a professional interviewer. The training that she will receive after being selected by a company may vary from virtually no training to a 3-day program if she is selected by one of the larger survey research agencies. Almost always there will be a briefing session on the particular project. Typically, the recruits will record answers on a practice questionnaire during a simulated training interview.
MORE EXTENSIVE TRAINING PROGRAMS ARE LIKELY TO COVER THE FOLLOWING TOPICS
1. How to make initial contact with the respondent and secure the interview
2. How to ask survey questions
3. How to probe
4. How to record responses
5. How to terminate the interview
MAKING INITIAL CONTACT AND SECURING THE INTERVIEW
Interviewers are trained to make appropriate opening remarks that will convince the person that his or her cooperation is important.
FOR EXAMPLE
GOOD AFTERNOON, MY NAME IS _____ AND I’M FROM A NATIONAL SURVEY RESEARCH COMPANY. WE ARE CONDUCTING A SURVEY CONCERNING ____. I WOULD LIKE TO GET A FEW OF YOUR IDEAS.
Much fieldwork is conducted by research suppliers who specialize in data collection. When a second party is employed, the job of the study designer in the parent firm is not only to hire a research supplier but also to establish supervisory controls over the field service.
In some cases a third party is employed. For example, a firm may contact a survey research firm, which in turn subcontracts the fieldwork to a field service. Under these circumstances it is still desirable to know the problems that might occur in the field and the managerial practices that can minimize them.
ASKING THE QUESTIONS
The purpose of the interview is, of course, to have the interviewer ask questions and record the respondent's answers. Training in the art of stating questions can be extremely beneficial, because interviewer bias can be a source of considerable error in survey research.
THERE ARE FIVE MAJOR PRINCIPLES FOR ASKING QUESTIONS
1. Ask the questions exactly as they are worded in the questionnaire.
2. Read each question very slowly.
3. Ask the questions in the order in which they are presented in the questionnaire.
4. Ask every question specified in the questionnaire.
5. Repeat questions that are misunderstood or misinterpreted.
Although interviewers are generally trained in these procedures, many do not follow them exactly when working in the field. Inexperienced interviewers may not understand the importance of strict adherence to the instructions. Even professional interviewers take shortcuts when the task becomes monotonous. Interviewers may shorten or rephrase questions unconsciously when they rely on their memory of the question rather than reading the question as it is worded. Even the slightest change in wording can distort the meaning of the question and cause bias to enter into a study. By reading the question, the interviewer is reminded to concentrate on avoiding slight variations in tone of voice on particular words or phrases in the question.
PROBING
General training of interviewers should include instructions on how to probe when respondents give no answer, incomplete answers, or answers that require clarification. Probing may be needed for two types of situations. First, it is necessary when the respondent must be motivated to enlarge on, clarify or explain his or her answer. It is the interviewer’s job to probe for complete, unambiguous answers. The interviewer must encourage the respondent to clarify or expand on answers by providing a stimulus that will not suggest the interviewer’s own ideas or attitudes. The ability to probe with neutral stimuli is the mark of an experienced interviewer. Second, probing may be necessary in situations in which the respondent begins to ramble or lose track of the question. In such cases the respondent must be led to focus on the specific content of the interview and to avoid irrelevant and unnecessary information.
THE INTERVIEWER HAS SEVERAL POSSIBLE PROBING TACTICS TO CHOOSE FROM, DEPENDING ON THE SITUATION
Repetition of the question: The respondent who remains completely silent may not have understood the question or may not have decided how to answer it. Mere repetition may encourage the respondent to answer in such cases. For example, if the question is “What is there that you do not like about your supervisor?” and the respondent does not answer, the interviewer may probe: “Just to check, is there anything you do not like about your supervisor?”
An expectant pause: If the interviewer believes the respondent has more to say, the "silent probe," accompanied by an expectant look, may motivate the respondent to gather his or her thoughts and give a complete response. Of course, the interviewer must be sensitive to the respondent so that the silent probe does not become an embarrassed silence.
Repetition of the respondent's reply: Sometimes the interviewer may repeat the respondent's reply verbatim. This may help the respondent to expand on the answer.
RECORDING THE RESPONSES
The analyst who fails to instruct fieldworkers in the techniques of recording answers for one study rarely forgets to do so in the second study. Although the concept of recording an answer seems extremely simple, mistakes can be made in the recording phase of the research. All fieldworkers should use the same mechanics of recording.
Example: It may appear insignificant to the interviewer whether she uses a pen or pencil, but to the editor who must erase and rewrite illegible words, using a pencil is extremely important.
The rules for recording responses to closed questionnaires vary with the specific questionnaire. The general rule, however, is to place a check in the box that correctly reflects the respondent’s answer. All too often interviewers don’t bother recording the answer to a filter question because they believe that the subsequent answer will make the answer to the filter question obvious. However, editors and coders do not know how the respondent actually answered a question.
The general instruction for recording answers to open-ended-response questions is to record the answer verbatim, a task that is difficult for most people. Inexperienced interviewers should be given the opportunity to practice.
The Interviewer's Manual of the Survey Research Center provides instructions on the recording of interviews. Some of its suggestions for recording answers to open-ended-response questions follow:
• Record the responses during the interview.
• Use the respondent's own words.
• Do not summarize or paraphrase the respondent's answer.
• Include everything that pertains to the question objectives.
• Include all of your probes.
THE BASICS OF EFFECTIVE INTERVIEWING
Interviewing is a skilled occupation; not everyone can do it, and even fewer can do it extremely well. A good interviewer observes the following principles:
1. Have integrity and be honest.
2. Have patience and tact.
3. Pay attention to accuracy and detail.
4. Exhibit a real interest in the inquiry at hand, but keep your own opinions to yourself.
5. Be a good listener.
6. Keep the inquiry and the respondent's responses confidential. Respect others' rights.
TERMINATING THE INTERVIEW
The final aspect of training deals with instructing the interviewers on how to close the interview. Fieldworkers should not close the interview before all pertinent information has been secured. The interviewer whose departure is hasty will not be able to record the spontaneous comments respondents sometimes offer after all the formal questions have been asked. Avoiding hasty departures is also a matter of courtesy.
Fieldworkers should also answer, to the best of their ability, any questions the respondent has concerning the nature and purpose of the study. Because the fieldworker may be required to reinterview the respondent at some future time, he or she should leave the respondent with a positive feeling about having cooperated in a worthwhile undertaking. It is extremely important to thank the respondent for his or her cooperation.
FIELDWORK MANAGEMENT
Managers of the field operation select, train, supervise, and control fieldworkers. Our discussion of fieldwork principles mentioned selection and training. This section investigates the tasks of the fieldwork manager in greater detail.
BRIEFING SESSION FOR EXPERIENCED INTERVIEWERS
After interviewers have been trained in fundamentals, and even when they have become experienced, it is always necessary to inform workers about the individual project. Both experienced and inexperienced fieldworkers must be instructed on the background of the sponsoring organization, sampling techniques, asking questions, callback procedures, and other matters specific to the project.
If there are special instructions—for example, about using show cards or video equipment or about restricted interviewing times—these should also be covered during the briefing session. Instructions for handling certain key questions are always important. For example, a survey of institutional investors who make buy-and-sell decisions about stocks for banks, pension funds, and the like required special fieldworker instructions for handling its key questions.
A briefing session for experienced interviewers might go like this: All interviewers report to the central office, where the background of the firm and the general aims of the study are briefly explained. Interviewers are not provided with too much information about the purpose of the study, thus ensuring that they will not transmit any preconceived notions to respondents. For example, in a survey about the banks in a community, the interviewers would be told that the research is a banking study, but not the name of the sponsoring bank. To train the interviewers on the questionnaire, a field supervisor conducts an interview with another field supervisor who acts as a respondent. The trainees observe the interviewing process, after which they each interview and record the responses of another field supervisor. Additional instructions are given to the trainees after the practice interview.
TRAINING TO AVOID PROCEDURAL ERRORS IN SAMPLE SELECTION
The briefing session also covers the sampling procedure. A number of research projects allow the interviewer to be at least partially responsible for selection of the sample. When the fieldworker has some discretion in the selection of respondents, the potential for selection bias exists. This is obvious in the case of quota sampling, but less obvious in other cases.
Example: in probability sampling where every nth house is selected, the fieldworker uses his or her discretion in identifying housing units. Avoiding selection error may not be as simple as it sounds.
Example: in an older, exclusive neighborhood, a mansion's coach house or servants' quarters may have been converted into an apartment that should be identified as a housing unit. This type of dwelling and other unusual housing units (apartments with alley entrances only, lake cottages, rooming houses) may be overlooked, giving rise to selection error. Errors may also occur in the selection of random-digit dialing samples. Considerable effort should be expended in training and supervisory control to minimize these errors.
The activities involved in collecting data in the field may be performed by the organization needing information, by research suppliers, or by third-party field service organizations. Proper execution of fieldwork is essential for producing research results without substantial error.
Proper control of fieldwork begins with interviewer selection. Fieldworkers should generally be healthy, outgoing, and well groomed. New fieldworkers must be trained in opening the interview, asking the questions, probing for additional information, recording the responses, and terminating the interview. Experienced fieldworkers are briefed for each new project so that they are familiar with its specific requirements. A particular concern of the briefing session is reminding fieldworkers to adhere closely to the prescribed sampling procedures.
Careful supervision of fieldworkers is also necessary. Supervisors gather and edit questionnaires each day. They check to see that field procedures are properly followed and that interviews are on schedule. They also check to be sure that the proper sampling units are used and that the proper people are responding in the study. Finally, supervisors check for interviewer cheating and verify a portion of the interviews by reinterviewing a certain percentage of each fieldworker’s respondents.
SUMMARY
This lesson outlined the importance of training for new interviewers and dealt in detail with the five major principles for asking questions.
KEY TERMS
• Field worker
• Probing
• Field interviewing
• Briefing session
• Training Interview
• Reinterviewing
QUESTIONS
1. What qualities should a fieldworker possess?
2. What is the proper method of asking questions?
3. When should an interviewer probe? Give examples of how probing should be done.
4. How should an interviewer terminate the interview?
5. What qualities make an interviewer more effective?






LESSON – 9
TABULATION OF DATA
STRUCTURE
 Table
 Relative frequency tables
 Cross tabulation and stub-and-banner tables
 Guidelines for cross tabulation
INTRODUCTION
To get meaningful information from data, the data are arranged in tabular form. Frequency tables and histograms are simple forms of tabular presentation.
FREQUENCY TABLES
A frequency table, or frequency distribution, is a better way to arrange data. It helps in compressing data. Though some information is lost, compressed data show a pattern clearly. To construct a frequency table, the data are divided into groups of similar values (classes), and the number of observations that fall in each group is recorded.
TABLE 5.2.1 FREQUENCY TABLE ON AGE-WISE CLASSIFICATION OF RESPONDENTS
AGE OF RESPONDENTS
CLASS (GROUP OF SIMILAR VALUES)    NUMBER OF RESPONDENTS (FREQUENCY)
21 TO 30                           15
31 TO 40                           28
41 TO 50                           45
51 TO 60                           62
The number of classes can be increased by reducing the size of each class. The choice of class intervals is mostly guided by practical considerations rather than by rules. Class intervals should be chosen in such a way that measurements are uniformly distributed over the class and the interval is not very large; otherwise, the mid-value will either overestimate or underestimate the measurement.
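Because the construction of a frequency table is purely mechanical, it can be sketched in a few lines of code. The following Python fragment is illustrative only; the ages list is hypothetical raw data, and the class boundaries follow Table 5.2.1.

# Sketch: binning raw ages into the classes of Table 5.2.1.
classes = [(21, 30), (31, 40), (41, 50), (51, 60)]
ages = [23, 35, 44, 58, 29, 52, 47, 41, 60, 33]  # hypothetical raw data

freq = {c: 0 for c in classes}
for age in ages:
    for lo, hi in classes:
        if lo <= age <= hi:
            freq[(lo, hi)] += 1  # count the observation in its class
            break

for (lo, hi), f in freq.items():
    print(f"{lo} TO {hi}: {f}")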
RELATIVE FREQUENCY TABLES
The frequency of a class is the total number of data points that fall within that class. The frequency of each value can also be expressed as a fraction or percentage of the total number of observations. Frequencies expressed in percentage terms are known as relative frequencies. A relative frequency distribution is presented in Table 5.2.2.
It may be observed that the sum of all relative frequencies is 1.00 or 100 percent because frequency of each class has been expressed as a percentage of the total data.
TABLE 5.2.2 RELATIVE FREQUENCY TABLE ON OCCUPATION-WISE CLASSIFICATION OF RESPONDENTS
OCCUPATION      FREQUENCY    RELATIVE FREQUENCY
STUDENT              6            0.24
EMPLOYEE            12            0.48
ENTREPRENEUR         4            0.16
OTHERS               3            0.12
TOTAL               25            1.00
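The relative frequencies in the table can be verified with a short computation: each frequency is divided by the total number of observations. A minimal Python sketch using the occupation counts from Table 5.2.2:

# Sketch: relative frequencies from the occupation counts above.
counts = {"STUDENT": 6, "EMPLOYEE": 12, "ENTREPRENEUR": 4, "OTHERS": 3}
total = sum(counts.values())  # 25
for occupation, f in counts.items():
    print(f"{occupation:<12} {f:>3} {f / total:.2f}")  # e.g. STUDENT 6 0.24
print(f"{'TOTAL':<12} {total:>3} 1.00")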
CUMULATIVE FREQUENCY TABLES
Frequency or one-way tables represent the simplest method for analyzing categorical data. They are often used as one of the exploratory procedures to review how different categories of values are distributed in the sample.
For example, in a survey of spectator interest in different sports, we could summarize the respondents' interest in watching football in a frequency table as follows:
TABLE: CUMULATIVE FREQUENCY TABLE ON STATISTICS ABOUT FOOTBALL WATCHERS
FOOTBALL: "WATCHING FOOTBALL"
CATEGORY                          FREQUENCY   CUMULATIVE FREQUENCY   PERCENTAGE   CUMULATIVE PERCENTAGE
ALWAYS: ALWAYS INTERESTED             39              39                39.00            39.00
USUALLY: USUALLY INTERESTED           16              55                16.00            55.00
SOMETIMES: SOMETIMES INTERESTED       26              81                26.00            81.00
NEVER: NEVER INTERESTED               19             100                19.00           100.00
The table above shows the number, proportion, and cumulative proportion of respondents who characterized their interest in watching football as either (1) Always interested, (2) Usually interested, (3) Sometimes interested, or (4) Never interested.
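The cumulative columns are obtained simply by keeping a running total of the frequencies and percentages. A short Python sketch, assuming the four football-interest counts from the table above (n = 100):

# Sketch: cumulative frequency and percentage columns from raw counts.
categories = [("ALWAYS", 39), ("USUALLY", 16), ("SOMETIMES", 26), ("NEVER", 19)]
n = sum(f for _, f in categories)  # 100

running = 0
for name, f in categories:
    running += f  # cumulative frequency
    print(f"{name:<10} {f:>3} {running:>4} {100 * f / n:>7.2f} {100 * running / n:>7.2f}")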
APPLICATIONS
In practically every research project, a first "look" at the data usually includes frequency tables. For example, in survey research, frequency tables can show the number of males and females who participated in the survey, the number of respondents from particular ethnic and racial backgrounds, and so on. Responses on some labeled attitude measurement scales (e.g., interest in watching football) can also be nicely summarized via the frequency table. In medical research, one may tabulate the number of patients displaying specific symptoms; in industrial research one may tabulate the frequency of different causes leading to catastrophic failure of products during stress tests (e.g., which parts are actually responsible for the complete malfunction of television sets under extreme temperatures?). Customarily, if a data set includes any categorical data, then one of the first steps in the data analysis is to compute a frequency table for those categorical variables.
CROSS TABULATION AND STUB-AND-BANNER TABLES
Managers and researchers frequently are interested in gaining a better understanding of the differences that exist between two or more subgroups. Whenever they try to identify characteristics common to one subgroup but not to the others, they are trying to explain differences between the subgroups. Cross tables are used to explain these differences.
Cross tabulation is a combination of two (or more) frequency tables arranged such that each cell in the resulting table represents a unique combination of specific values of cross tabulated variables. Thus, cross tabulation allows us to examine frequencies of observations that belong to specific categories on more than one variable.
By examining these frequencies, we can identify relations between cross tabulated variables. Only categorical variables or variables with a relatively small number of different meaningful values should be cross tabulated. Note that in the cases where we do want to include a continuous variable in a cross tabulation (e.g., income), we can first recode it into a particular number of distinct ranges (e.g., low, medium, high).
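Recoding a continuous variable into ranges before cross tabulating is a simple operation. A minimal Python sketch with hypothetical income cut-offs:

# Sketch: recoding a continuous income into LOW / MEDIUM / HIGH
# categories before cross tabulation. The cut-offs are hypothetical.
def recode_income(income):
    if income < 25000:
        return "LOW"
    if income < 75000:
        return "MEDIUM"
    return "HIGH"

incomes = [12000, 30000, 80000, 55000]  # hypothetical values
print([recode_income(i) for i in incomes])  # ['LOW', 'MEDIUM', 'HIGH', 'MEDIUM']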
GUIDELINES FOR CROSS TABULATION
The most commonly used method of data analysis is cross tabulation. The following guidelines will be helpful in designing proper cross tabulations.
1. THE DATA SHOULD BE IN CATEGORICAL FORM
Cross tabulation is applicable to data in which both the dependent and the independent variables appear in categorical form. There are two types of categorical data.
One type (say type A) consists of variables that can be measured only in classes or categories, such as marital status, gender, or occupation; the categories are not quantifiable (there is no measurable number). Another type (say type B) consists of variables that can be measured in numbers, such as age or income. For this type the different categories are associated with quantifiable numbers that show a progression from smaller values to larger values.
Cross tabulation is used on both types of categorical variables. However, when constructing a cross tabulation using type B categorical variables, researchers find it helpful to take several special steps to make such cross tabulations more effective analysis tools.
2. CROSS TABULATE AN IMPORTANT DEPENDENT VARIABLE WITH ONE OR MORE ‘EXPLAINING’ INDEPENDENT VARIABLES
Researchers typically cross tabulate a dependent variable of importance to the objectives of the research project (such as heavy user versus light user or positive attitude versus negative attitude) with one or more independent variables that the researchers believe can help explain the variation observed in the dependent variable. Any two variables can be used in a cross tabulation so long as they both are in categorical form, and they both appear to be logically related to one another as dependent and independent variables consistent with the purpose and objectives of the research project.
3. SHOW PERCENTAGE IN A CROSS TABULATION
In a cross tabulation researchers typically show the percentages as well as the actual counts of the number of responses falling into the different cells of the table. The percentages more effectively reveal the relative sizes of the actual counts associated with the different cells and make it easier for researchers to visualize the patterns of differences that exist in the data.
CONSTRUCTING AND INTERPRETING A CROSS TABULATION
After drawing the cross table, interpretations have to be drawn from it. They should convey the meaning of and findings from the table. In management research, interpretation carries great value: from the interpretations and findings, managers take decisions.
2X2 TABLES
The simplest form of cross tabulation is the 2 by 2 table where two variables are "crossed," and each variable has only two distinct values. For example, suppose we conduct a simple study in which males and females are asked to choose one of two different brands of soda pop (brand A and brand B); the data file can be arranged like this:
TABLE ON SUMMARY OF RAW DATA
CASE      GENDER    SODA
CASE 1    MALE      A
CASE 2    FEMALE    B
CASE 3    FEMALE    B
CASE 4    FEMALE    A
CASE 5    MALE      B
...       ...       ...
The resulting cross tabulation could look as follows.
TABLE CROSS TABULATION ABOUT THE PREFERENCE OF SODA
                  SODA: A     SODA: B     TOTAL
GENDER: MALE      20 (40%)    30 (60%)    50 (50%)
GENDER: FEMALE    30 (60%)    20 (40%)    50 (50%)
TOTAL             50 (50%)    50 (50%)    100 (100%)
Each cell represents a unique combination of values of the two cross tabulated variables (row variable Gender and column variable Soda), and the numbers in each cell tell us how many observations fall into each combination of values. In general, this table shows us that more females than males chose the soda pop brand A, and that more males than females chose soda B. Thus, gender and preference for a particular brand of soda may be related (later we will see how this relationship can be measured).
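The mechanics of building such a cross tabulation can be sketched briefly. The following Python fragment is illustrative; the case records are hypothetical but are generated so that the cell counts reproduce the table above.

# Sketch: building the gender-by-soda cross tabulation from case records.
from collections import Counter

records = ([("MALE", "A")] * 20 + [("MALE", "B")] * 30
           + [("FEMALE", "A")] * 30 + [("FEMALE", "B")] * 20)

cells = Counter(records)  # each key is a (gender, brand) cell
for gender in ("MALE", "FEMALE"):
    row_total = cells[(gender, "A")] + cells[(gender, "B")]
    row = "  ".join(
        f"{brand}: {cells[(gender, brand)]} ({100 * cells[(gender, brand)] // row_total}%)"
        for brand in ("A", "B"))
    print(f"{gender:<7} {row}  TOTAL {row_total}")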
MARGINAL FREQUENCIES
The values in the margins of the table are simply one-way (frequency) tables for all values in the table. They are important in that they help us to evaluate the arrangement of frequencies in individual columns or rows. For example, the frequencies of 40% and 60% of males and females (respectively) who chose soda A (see the first column of the above table), would not indicate any relationship between Gender and Soda if the marginal frequencies for Gender were also 40% and 60%; in that case they would simply reflect the different proportions of males and females in the study. Thus, the differences between the distributions of frequencies in individual rows (or columns) and in the respective margins inform us about the relationship between the cross tabulated variables.
COLUMN, ROW, AND TOTAL PERCENTAGES
The example in the previous paragraph demonstrates that in order to evaluate relationships between cross tabulated variables, we need to compare the proportions of marginal and individual column or row frequencies. Such comparisons are easiest to perform when the frequencies are presented as percentages.
EVALUATING THE CROSS TABLE
Researchers find it useful to answer the following three questions when evaluating a cross tabulation that appears to explain differences in a dependent variable.
1. Does the cross tabulation show a valid or a spurious relationship?
2. How many independent variables should be used in the cross tabulation?
3. Are the differences seen in the cross tabulation statistically significant, or could they have occurred by chance due to sampling variation?
Each of these is discussed below.
DOES THE CROSS TABULATION SHOW A VALID EXPLANATION?
If it is logical to believe that changes in the independent variables can cause changes in the dependent variables, then the explanation revealed by the cross tabulation is thought to be a valid one.
DOES THE CROSS TABULATION SHOW A VALID OR A SPURIOUS RELATIONSHIP?
An explanation is thought to be a spurious one if the implied relationship between the dependent and independent variables does not seem to be logical.
Example: family size and income appear to be logically related to the household consumption of certain basic food products. However, it may not be logical to relate the number of automobiles owned to the brand of toothpaste preferred, or to relate the type of family pet to the occupation of the head of the family. If the independent variable does not logically have an effect or influence on the dependent variable, the relationship that a cross tabulation seems to show may not be a valid cause-and-effect relationship and therefore may be spurious.
HOW MANY INDEPENDENT VARIABLES SHOULD BE USED?
When cross tabulating an independent variable that seems logically related to the dependent variable, what should researchers do if the results do not reveal a clear-cut relationship?
Two possible courses of actions are available.
1. Try another cross tabulation, this time using one of the other independent variables hypothesized to be important when the study was designed.
2. A preferred course of action is to introduce each additional independent variable simultaneously with rather than as an alternative to the first independent variable tried in the cross tabulation. By doing so it is possible to study the interrelationship between the dependent variable and two or more independent variables.
SUMMARY
Data can be summarized in the form of tables. Cross tables give meaningful information drawn from raw data. The ways of constructing, interpreting, and evaluating cross tables are therefore very important.
KEY WORDS
Class
Frequency
Relative frequency
Cumulative frequency
Marginal frequency
Interval
Cross table
REVIEW QUESTIONS
1. Why do we use cross tables?
2. How do you evaluate a cross table?
3. State the guidelines for constructing a cross table.






LESSON – 10
STATISTICAL TECHNIQUES
OBJECTIVES
• To know the nature of statistical study
• To recognize the importance of Statistics as also its limitations
• To differentiate descriptive Statistics from inferential Statistics.
STRUCTURE
 Major characteristics of statistics
 Descriptive statistics
 Inferential statistics
 Central tendency of data
 Uses of different averages
 Types of frequency distribution
 Measures of dispersion
INTRODUCTION
Business researchers edit and code data to provide input that results in tabulated information that will answer research questions. With this input, the results can be produced statistically and logically. Aspects of Statistics are important if quantitative data are to serve their purpose. If Statistics, as a subject, is inadequate and consists of poor methodology, we will not know the right procedure to extract from the data the information they contain. On the other hand, if our figures are defective in the sense that they are inadequate or inaccurate, we will not reach the right conclusions even though our subject is well developed. With this brief introduction, let us first see how Statistics has been defined.
MAJOR CHARACTERISTICS OF STATISTICS
1. Statistics are aggregates of facts. This means that a single figure is not Statistics. For example, national income of a country for a single year is not Statistics but the same for two or more years is.
2. Statistics are affected by a number of factors. For example, sale of a product depends on a number of factors such as its price, quality, competition, the income of the consumers, and so on.
3. Statistics must be reasonably accurate. Wrong figures, if analyzed, will lead to erroneous conclusions. Hence, it is necessary that conclusions must be based on accurate figures.
4. Statistics must be collected in a systematic manner. If data are collected in a haphazard manner, they will not be reliable and will lead to misleading conclusions.
5. Finally, Statistics should be placed in relation to each other. If one collects data unrelated to each other, then such data will be confusing and will not lead to any logical conclusions. Data should be comparable over time and over space.
SUBDIVISIONS IN STATISTICS
Statisticians commonly classify the subject into two broad categories: descriptive statistics and inferential statistics.
DESCRIPTIVE STATISTICS
As the name suggests, descriptive statistics includes any treatment designed to describe or summarize the given data, bringing out their important features. These statistics do not go beyond this: no attempt is made to infer anything that pertains to more than the data themselves. Thus, if someone compiles the necessary data and reports that during the financial year 2000-2001 there were 1,500 public limited companies in India, of which 1,215 earned profits and the remaining 285 sustained losses, the study belongs to the domain of descriptive Statistics. He may further calculate the average profit earned per company as also the average loss sustained per company. This set of calculations, too, is a part of descriptive Statistics.
Methods used in descriptive Statistics may be called descriptive methods. Under descriptive methods, we learn frequency distributions and measures of central tendency (that is, averages), dispersion, and skewness.
INFERENTIAL STATISTICS
Although descriptive Statistics is an important branch of Statistics and it continues to be so, its recent growth indicates a shift in emphasis towards the methods of Statistical inference. A few examples may be given here. The methods of Statistical inference are required to predict the demand for a product such as tea or coffee for a company for a specified year or years. Inferential Statistics are also necessary while comparing the effectiveness of a given medicine in the treatment of any disease.
Again, while determining the nature and extent of relationship between two or more variables like the number of hours studied by students and their performance in their examinations, one has to take recourse to inferential Statistics.
Each of these examples is subject to uncertainty on account of partial, incomplete, or indirect information. In such cases, the Statistician has to judge the merits of all possible alternatives in order to make the most realistic prediction or to suggest the most effective medicine or to establish a dependable relationship and the reasons for the same. In this text, we shall first discuss various aspects of descriptive Statistics. This will be followed by the discussion on different topics in inferential Statistics. The latter will understandably be far more comprehensive than the former.
CENTRAL TENDENCY OF DATA
In many frequency distributions, the tabulated values show small frequencies at the beginning and at the end and very high frequencies in the middle of the distribution. This indicates that the typical values of the variable lie near the central part of the distribution and the other values cluster around these central values. This behavior of the data, with values concentrated in the central part of the distribution, is called the central tendency of the data. We shall measure this central tendency with the help of mathematical quantities. A central value which enables one 'to comprehend in a single effort the significance of the whole' is known as a Statistical Average, or simply an Average. In fact, an average of a statistical series is the value of the variable which is representative of the entire distribution and, therefore, gives a measure of central tendency.
MEASURES OF CENTRAL TENDENCY
There are three common measures of central tendency:
I. Mean
II. Median
III. Mode
The most common and useful measure of central tendency is, however, the Mean. In the following articles the methods of calculation of the various measures of central tendency will be discussed. In all such discussions we need a very useful notation known as Summation.
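As a quick illustration before turning to the formulas, Python's standard library computes each of the three measures directly; the data values below are hypothetical.

# Sketch: the three common measures of central tendency.
from statistics import mean, median, mode

data = [5, 7, 7, 8, 9, 10, 12]               # hypothetical values
print(mean(data), median(data), mode(data))  # 8.285..., 8, 7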
CHOICE OF A SUITABLE AVERAGE
The different statistical averages have different characteristics. There is no all-purpose average. The choice of a particular average is usually determined by the purpose of the investigation. Within the framework of descriptive statistics, the main requirement is to know what each average means and then select the one that fulfils the purpose at hand. The nature of the distribution also determines the type of average to be used.
GENERALLY THE FOLLOWING POINTS SHOULD BE KEPT IN MIND WHILE MAKING A CHOICE OF AVERAGE FOR USE
1. OBJECT
The average should be chosen according to the object of the enquiry. If all the values in a series are to be given equal importance, the arithmetic mean will be a suitable choice. To determine the most stylish or most frequently occurring item, the mode should be found. If the object is to determine an average that indicates its position or ranking in relation to all the values, the median should naturally be the choice. If small items are to be given greater importance than big items, the geometric mean is the best choice.
2. REPRESENTATIVE
The average chosen should be such that it represents the basic characteristics of the distribution.
3. NATURE OF FORM OF DATA
If the frequency distribution is symmetrical or nearly symmetrical, the mean, median, or mode may be used almost interchangeably. If there are open-end class intervals, the mean cannot be calculated definitely. In a closed frequency distribution with unequal class intervals, it is impossible to determine the mode accurately. If there are only a few values, it may not be possible to determine the mode. The mean will not give a representative picture if there are a few extremely large or small values at either end of the array and yet the great majority of the values concentrate around a narrow band. For a variable of non-continuous type, the median or mode may give a value that actually exists in the data.
Davies' Test: The arithmetic mean is considered an appropriate average for data which have a symmetrical distribution, or even a moderate degree of asymmetry. Prof. George Davies has devised a test, which is:

If this coefficient works out to be more than + 0.20, the distribution is symmetrical enough to use arithmetic mean.
4. CHARACTERISTICS OF AVERAGE
While choosing a suitable average for a purpose, the merits and demerits of various averages should always be considered and that average which fits into the purpose most should be preferred over others. The following points should be given due consideration in the process of selection of an average.
(i) In certain commonly encountered applications, the mean is subject to less sampling variability than the median or mode.
(ii) Given only the original observations, the median is sometimes easiest to calculate. Sometimes when there is no strong advantage for the mean, this advantage is enough to indicate the use of the median.
(iii) Once a frequency distribution has been formed, the mode and the median are more quickly calculated than the mean. Moreover, when some classes are open-ended, the mean cannot be calculated from the frequency distribution.
(iv) The median is not a good measure when there are very few possible values for the observations as with number of children or size of family.
(v) The mode and the median are relatively little affected by ‘extreme’ observations.
(vi) Calculation of the geometric mean and the harmonic mean is difficult, as it involves the knowledge of logarithms and reciprocals.
Hence “the justification of employing them (averages) must be determined by an appeal to all the facts and in the light of the peculiar characteristics of the different types”.
USES OF DIFFERENT AVERAGES
Different averages, due to their inherent characteristics, are appropriate in different circumstances. Thus, their use may be guided by the purpose at hand or the circumstances in which one is placed. Here a brief discussion is made of the uses of the different statistical averages:
1. ARITHMETIC AVERAGE
The arithmetic average is used in the study of social, economic, or commercial problems like production, income, prices, imports, exports, etc. The central tendency of these phenomena can best be studied by taking out an arithmetic average. Whenever we talk of an 'average income' or 'average production' or 'average price' we always mean the arithmetic average of these things. Whenever there is no indication about the type of average to be used, the arithmetic average is computed.
2. WEIGHTED ARITHMETIC AVERAGE
When it is desirable to give relative importance to the different items of a series, the weighted arithmetic average is computed. If it is desired to compute the per capita consumption of a family, due weights should be assigned to children, males, and females. This average is also useful in constructing index numbers. The weighted average should be used in the following cases:
a) If it is desired to have an average of the whole group, which is divided into a number of sub-classes widely divergent from each other.
b) When items falling in various sub-classes change in such a way that the proportion which the items bear among themselves also undergoes a change.
c) When combined average has to be computed.
d) When it is desired to find an average of ratios, percentages, or rates.
3. MEDIAN
Median is especially applicable to cases which are not capable of precise quantitative study, such as intelligence, honesty, etc. It is less applicable in economic or business statistics, because there is a lack of stability in such data.
4. MODE
The utility of the mode is being appreciated more and more day by day. In the sciences of Biology and Meteorology it has been found to be of great value.
In commerce and industry it is gaining very great importance. Whenever a shopkeeper wants to stock the goods he sells, he always looks to the modal size of those goods. Modal size is of great importance to the businessman dealing in ready-made garments or shoes. Many problems of production are related to the mode. Many business establishments these days are engaging their attention in keeping statistics of their sales to ascertain the particulars of the modal articles sold.
5. GEOMETRIC MEAN
Geometric mean can advantageously be used in the construction of index numbers. It makes index numbers reversible and gives equal weight to equal ratios of change. This average is also useful in measuring the growth of population, because population increases in geometric progression. When there is wide dispersion in a series, geometric mean is a useful average.
6. HARMONIC MEAN
This average is useful in cases where time, rates, and prices are involved. When it is desired to give the largest weight to the smallest item, this average is used.
Summation Notation (Σ)
The symbol Σ (read: sigma) means summation.
If x1, x2, x3, …, xn are the n values of a variable x, then their sum x1 + x2 + x3 + … + xn is shortly written as Σxi, or simply Σx.
Similarly, the sum w1x1 + w2x2 + … + wnxn is denoted by Σwixi, or simply Σwx.
SOME IMPORTANT RESULTS
1. Σ(x + y) = Σx + Σy
2. ΣA = A + A + … + A (N terms) = NA (A is a constant)
3. ΣAx = Ax1 + Ax2 + … + Axn = A(x1 + x2 + … + xn) = AΣx
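These results are easy to verify numerically. A quick Python check with hypothetical values of x, y, and the constant A:

# Numeric check of the three summation results above.
x = [1, 2, 3]
y = [4, 5, 6]
A, N = 10, len(x)

assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)  # result 1
assert sum(A for _ in range(N)) == N * A                        # result 2
assert sum(A * xi for xi in x) == A * sum(x)                    # result 3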
There are three types of mean
1. Arithmetic Mean (A.M.)
2. Geometric Mean (G.M.)
3. Harmonic Mean (H.M.)
Of the three means, Arithmetic Mean is most commonly used. In fact, if no specific mention be made, by Mean we shall always refer to Arithmetic Mean (A.M.) and calculate accordingly.
SIMPLE ARITHMETIC MEAN
Definition: The Arithmetic Mean (X̄) of a given series of values, say x1, x2, …, xn, is defined as the sum of these values divided by their total number; thus
X̄ = (x1 + x2 + … + xn) / n = Σx / n
Ex 1. Find the Arithmetic Mean of 3, 6, 24, and 48.
Sol: The required A.M. = (3 + 6 + 24 + 48) / 4 = 81 / 4 = 20.25
WEIGHTED ARITHMETIC MEAN
Definition: If x1, x2, …, xn are n values of a variable x and f1, f2, …, fn are their respective weights (or frequencies), then the weighted arithmetic mean (X̄) is defined by
X̄ = (f1x1 + f2x2 + … + fnxn) / N = Σfx / N
where N = Σf = total frequency.
SKEWNESS
A frequency distribution is said to be symmetrical when the values of the variable equidistant from their mean have equal frequencies. If a frequency distribution is not symmetrical, it is said to be asymmetrical or skewed. Any deviation from symmetry is called skewness.
In the words of Riggleman and Frisbee: “Skewness is the lack of symmetry. When a frequency distribution is plotted on a chart, skewness present in the items tends to be dispersed more on one side of the mean than on the other.”
Skewness may be positive or negative. A distribution is said to be positively skewed if the frequency curve has a longer tail towards the higher values of x, i.e., if the frequency curve gradually slopes down towards the high values of x. For a positively skewed distribution,
MEAN > MEDIAN > MODE
(M) (ME) (MO)
A distribution is said to be negatively skewed if the frequency curve has a longer tail towards the lower values of x. For a negatively skewed distribution,
MEAN < MEDIAN < MODE
For a symmetrical distribution,
MEAN = MEDIAN = MODE
MEASURES OF SKEWNESS
The degree of skewness is measured by its coefficient. The common measures of skewness are:
1. Pearson's first measure
Skewness = (Mean − Mode) / Standard Deviation
2. Pearson's second measure
Skewness = 3(Mean − Median) / Standard Deviation
3. Bowley's measure
Skewness = (Q3 + Q1 − 2Q2) / (Q3 − Q1)
where Q1, Q2, Q3 are the first, second, and third quartiles respectively.
4. Moment measure
Skewness = m3 / σ³
where m3 is the third central moment and σ is the S.D. (the square root of the second central moment m2).
All four measures of skewness defined above are independent of the units of measurement.
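Each of these measures can be computed directly from raw data. The following Python sketch is illustrative; the data list is hypothetical, and the use of the population standard deviation and exclusive quartiles are assumptions of the sketch.

# Sketch: the four skewness measures for a small, right-skewed sample.
from statistics import mean, median, mode, pstdev, quantiles

data = [2, 3, 3, 3, 4, 5, 6, 8, 12]  # hypothetical values
m, s = mean(data), pstdev(data)
q1, q2, q3 = quantiles(data, n=4)    # Q1, Q2 (median), Q3

pearson1 = (m - mode(data)) / s                   # Pearson's first measure
pearson2 = 3 * (m - median(data)) / s             # Pearson's second measure
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)           # Bowley's measure
m3 = sum((x - m) ** 3 for x in data) / len(data)  # third central moment
moment = m3 / s ** 3                              # moment measure

print(pearson1, pearson2, bowley, moment)         # all positive: right skew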
Ex: Calculate Pearson's measure of skewness on the basis of Mean, Mode, and Standard Deviation for the following data:
X: 14.5 15.5 16.5 17.5 18.5 19.5 20.5 21.5
F: 35 40 48 100 125 87 43 22
KEY
Pearson’s first measure of skewness is
Skewness = (Mean - Mode)/S.D.
Here mid-values of the class- intervals are given.
Assuming a continuous series, we construct the following table.
CLASS INTERVALS X F D = X - 18.5 FD FD²
14-15 14.5 35 -4 -140 560
15-16 15.5 40 -3 -120 360
16-17 16.5 48 -2 -96 192
17-18 17.5 100 -1 -100 100
18-19 18.5 125 0 0 0
19-20 19.5 87 1 87 87
20-21 20.5 43 2 86 172
21-22 21.5 22 3 66 198
TOTAL - 500 = N - -217 = ΣFD 1669 = ΣFD²
Here A = 18.5 (the assumed mean) and N = 500.
Mean = A + ΣFD/N = 18.5 + (-217)/500 = 18.5 - 0.43 = 18.07
S.D. = √[ΣFD²/N - (ΣFD/N)²] = √[1669/500 - (-0.43)²]
= √(3.338 - 0.188) = √3.150 = 1.775
Mode = 18.5, the mid-value with the highest frequency (125).
∴ Skewness = (Mean - Mode)/S.D. = (18.07 - 18.5)/1.775 = -0.43/1.775 = -0.24
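The whole computation can be reproduced with a few lines of Python (an illustrative sketch, not part of the lesson); the mid-values and frequencies are those of the table above.

import math

x = [14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5]   # mid-values
f = [35, 40, 48, 100, 125, 87, 43, 22]                  # frequencies

n = sum(f)                                              # 500
mean = sum(fi * xi for xi, fi in zip(x, f)) / n
sd = math.sqrt(sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / n)
mode = x[f.index(max(f))]      # mid-value with the highest frequency (18.5)

print(round(mean, 2), round(sd, 3))   # 18.07 1.775
print(round((mean - mode) / sd, 2))   # Pearson's first measure: -0.24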
TYPES OF FREQUENCY DISTRIBUTIONS
In general, frequency distributions that form a balanced pattern are called symmetrical distributions, and those that form an unbalanced pattern are called skewed or asymmetrical distributions. In a symmetrical distribution frequencies go on increasing up to a point and then begin to decrease in the same fashion. A special kind of symmetrical distribution is the normal distribution; the pattern formed is not only symmetrical but also has the shape of a bell.
In a symmetrical distribution, mean, median and mode coincide and they lie at the centre of the distribution. As the distribution departs from symmetry these three values are pulled apart. A distribution in which more than half of the area under the curve is to the right side of the mode is a positively skewed distribution. In a positively skewed distribution, the right tail is longer than the left tail. Under such a distribution the mean is greater than the median, and the median is greater than the mode (X̄ > M > Mo), and the difference between the upper quartile and the median is greater than the difference between the median and the lower quartile (Q3 - M > M - Q1). In a negatively skewed distribution, more than half of the area under the distribution curve is to the left side of the mode. In such a distribution the elongated tail is to the left, the mean is less than the median, and the median is less than the mode (X̄ < M < Mo).
The following table will also show these facts.
POSITION OF AVERAGE ON VARIOUS DISTRIBUTIONS
SIZE FREQUENCY (A) FREQUENCY (B) FREQUENCY (C)
5 1 1 1
10 3 9 2
15 5 5 3
20 7 4 4
25 5 3 5
30 3 2 9
35 1 1 1
SKEWNESS NO SKEWNESS (SYMMETRY) POSITIVE SKEWNESS NEGATIVE SKEWNESS
AVERAGES X̄ = M = Mo = 20 X̄ > M > Mo (16.8 > 15 > 10) X̄ < M < Mo (23.2 < 25 < 30)
QUARTILES (Q3 - M) = (M - Q1) (Q3 - M) > (M - Q1) (Q3 - M) < (M - Q1)
CURVE NORMAL SKEWED TO THE RIGHT SKEWED TO THE LEFT
TEST OF SKEWNESS
In order to find out whether a distribution is symmetrical or skewed, the following facts should be noticed:
1. RELATIONSHIP BETWEEN AVERAGES
If in a distribution mean, median and mode are not identical, then it is a skewed distribution. The greater is the difference between mean and mode more will be the skewness in the distribution.
2. TOTAL OF DEVIATIONS
If the sum of positive deviations from the median or mode is equal to the sum of negative deviations, there is no skewness in the distribution. The extent of difference between the sums of positive and negative deviations from the median or mode will determine the extent of skewness in the data.
3. THE DISTANCE OF PARTITION VALUES FROM MEDIAN
In a symmetrical distribution Q1 and Q3, D1 and D9, and P10 and P90 are equidistant from the median. In an asymmetrical distribution it is not so.
4. THE FREQUENCIES ON EITHER SIDE OF THE MODE
In an asymmetrical distribution, frequencies on either side of the mode are not equal.
5. THE CURVE
When the data are plotted on a graph paper, the curve will not be bell-shaped, or, when cut along a vertical line through the centre, the two halves will not be identical.
Conversely stated, in the absence of skewness in the distribution:
(i) Values of mean, median and mode will coincide.
(ii) Sum of the positive deviations from the median or mode will be equal to the sum of negative deviations.
(iii) The two quartiles, decile one and nine, and percentile ten and ninety will be equidistant from the median.
(iv) Frequencies on either side of the mode will be equal.
(v) Data when plotted on a graph paper will take a bell-shaped form.
MEASURES OF SKEWNESS
To find out the direction and the extent of symmetry in a series statistical measures of skewness are calculated, these measures can be absolute or relative. Absolute measures of skewness tell us the extent of asymmetry and whether it is positive or negative. The absolute skewness can be known by taking the, difference between mean and mode. Symbolically,
Absolute SK = X̄ - Mo
If the value of the mean is greater than the mode (X̄ > Mo), skewness will be positive. In case the value of the mean is less than the mode (X̄ < Mo), skewness will be negative. Thus, the difference between the mean and the mode, whether positive or negative, indicates that the distribution is asymmetrical. However, such an absolute measure of skewness is unsatisfactory, because:
(1) It cannot be used for comparison of skewness in two distributions if they are in different units, because the difference between the mean and the mode will be in terms of the units of the distribution.
(2) The difference between the mean and mode may be more in one series and less in another, yet the frequency curves of the two distributions may be similarly skewed. For comparison, the absolute measures of skewness are changed to relative measures, which are called Coefficient of Skewness.
There are four measures of relative skewness. They are:
1. The Karl Pearson’s Coefficient of Skewness.
2. The Bowley’s Coefficient of Skewness.
3. The Kelly’s Coefficient of Skewness.
4. Measure of skewness based on moments.
1. KARL PEARSON’S COEFFICIENT OF SKEWNESS
Karl Pearson has given a formula, for relative measure of Skewness. It is known as Karl Pearson’s Coefficient of Skewness or Pearsonian Coefficient of Skewness. The formula is based on the difference between the mean and mode, divided by the standard deviation. The coefficient is represented by J.
J = (Mean - Mode)/σ or J = (X̄ - Mo)/σ
If in a particular frequency distribution it is difficult to determine the mode precisely, ie, the mode is ill-defined, the coefficient of skewness can be determined by the following changed formula:
J = 3(Mean - Median)/σ or J = 3(X̄ - M)/σ
This is based on the relationship between different averages in a moderately asymmetrical distribution. In such a distribution:
Mo = X̄ + 3(M - X̄) or
Mo = X̄ - 3(X̄ - M)
As Mo = X̄ - 3(X̄ - M),
(X̄ - Mo) = 3(X̄ - M); that is why (X̄ - Mo) is replaced by 3(X̄ - M).
The Pearsonian coefficient of skewness has the interesting characteristic that it will be positive when the mean is larger than the mode or median, and will be negative when the arithmetic mean is smaller than the mode or median. In a symmetrical distribution, the value of the Pearsonian coefficient of skewness will be zero.
There is no theoretical limit to this measure; however, in practice the value given by this formula is rarely very high and usually lies between +1 and -1. The direction of the skewness is given by the algebraic sign of the measure: if it is plus, the skewness is positive; if it is minus, the skewness is negative. The degree of skewness is given by the numerical figure, such as 0.9, 0.4, etc.
Thus this formula gives both the direction as well as the degree of skewness. There is another relative measure of skewness, also based on the position of averages, in which the difference between two averages is divided by the mean deviation. The formula is:
Skewness = (Mean - Mode)/Mean Deviation, or 3(Mean - Median)/Mean Deviation
These formulas are not very much used in practice, because of the demerits of the mean deviation.
DISPERSION
MEASURES OF DISPERSION
An average may give a good idea of the type of data, but it alone cannot reveal all the characteristics of the data. It cannot tell us in what manner all the values of the variable are scattered or dispersed about the average.
MEANING OF DISPERSION
The variation or scattering or deviation of the different values of a variable from their average is known as dispersion. Dispersion indicates the extent to which the values vary among themselves. Prof. W.I. King defines the term ‘Dispersion’ as being used to indicate the fact that, within a given group, the items differ from one another in size; in other words, there is a lack of uniformity in their sizes. The extent of variability in a given set of data is measured by comparing the individual values of the variable with the average of all the values and then calculating the average of all the individual differences.
OBJECTIVES OF MEASURING VARIATIONS
1. To serve as a basis for control of the variability itself.
2. To gauge the reliability of an average.
TYPES OF MEASURES OF DISPERSION
There are two types of measures of dispersion. The first, which may be referred to as ‘Distance Measures’, describe the spread of data in terms of distance between the values of selected observations. The second are those which are in terms of an average deviation from some measure of central tendency.
ABSOLUTE AND RELATIVE MEASURES OF DISPERSION
Measures of absolute dispersion are in the same units as the data whose scatter they measure. For example, the dispersion of salaries about an average is measured in rupees, and the variation of the time required for workers to do a job is measured in minutes or hours. Measures of absolute dispersion cannot be used to compare the scatter in one distribution with that in another distribution when the averages of the distributions differ in size or the units of measure differ in kind. Measures of relative dispersion show some measure of scatter as a percentage or coefficient of the absolute measure of dispersion. They are generally used to compare the scatter in one distribution with the scatter in another. A relative measure of dispersion is called a coefficient of dispersion.
METHODS OF MEASURING DISPERSION
There are two meanings of dispersion, as explained above. On the basis of these two meanings, there are two mathematical methods of finding dispersion, i.e. methods of limits and methods of moment. Dispersion can also be studied graphically. Thus, the following are the methods of measuring dispersion:
I. Numerical Methods
1. Methods of Limits
i. The Range
ii. The Inter-Quartile Range
iii. The Percentile Range
2. Methods of Moments
i. The first moment of dispersion, or mean deviation
ii. The second moment of dispersion, from which standard deviation is computed
iii. The third moment of dispersion
3. Quartile Deviation
II. Graphic Method
Lorenz Curve
RANGE
The simplest measure of dispersion is the range of the data. The range is determined by the two extreme values of the observations and it is simply the differences between the largest and the smallest value in a distribution. Symbolically,
Range (R) = Largest value (L) - Smallest value (S)
Or R = L - S
The corresponding relative measure is
Coefficient of Range = (L - S)/(L + S)
Quartile Deviation or Semi-Interquartile Range
Definition: Quartile Deviation (Q) is an absolute measure of dispersion and is defined by the formula
Q = (Q3 - Q1)/2
Where Q1 and Q3 are the first (or lower) and the third (or upper) quartiles respectively. Here Q3 - Q1 is the interquartile range, and hence the quartile deviation is called the Semi-interquartile Range.
As it is based only on the Q1 and Q3, it does not take into account the variability of all the values and hence it is not very much used for practical purposes.
Find the quartile deviation of the following frequency distribution.
Daily wages : 10-15 15-20 20-25 25-30 30-35
No of workers: 6 12 18 10 4
KEY
CLASS BOUNDARY POINTS CUMULATIVE FREQUENCY (“LESS THAN”)
10 0
15 6
20 18
25 36
30 46
35 50 = N
N/4 = 50/4 = 12.5 and 3N/4 = 37.5
Q1 therefore lies in the class 15-20 (the cumulative frequency rises from 6 to 18), and Q3 lies in the class 25-30 (the cumulative frequency rises from 36 to 46).
By simple interpolation,
(Q1 - 15)/(20 - 15) = (12.5 - 6)/(18 - 6)
Or, Q1 = 15 + 5 × 6.5/12 = 15 + 2.71 = 17.71
Similarly,
(Q3 - 25)/(30 - 25) = (37.5 - 36)/(46 - 36)
Or, Q3 = 25 + 5 × 1.5/10 = 25 + 0.75 = 25.75
HENCE QUARTILE DEVIATION
Q = (Q3 - Q1)/2 = (25.75 - 17.71)/2 = 8.04/2 = 4.02
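This interpolation can also be expressed as a small Python sketch (our own illustration; the class boundaries and “less than” cumulative frequencies are those of the table above).

def quartile(boundaries, cum_freq, position):
    # Locate the class whose cumulative frequency first reaches `position`,
    # then interpolate linearly within that class.
    for i in range(1, len(cum_freq)):
        if cum_freq[i] >= position:
            width = boundaries[i] - boundaries[i - 1]
            step = (position - cum_freq[i - 1]) / (cum_freq[i] - cum_freq[i - 1])
            return boundaries[i - 1] + width * step

boundaries = [10, 15, 20, 25, 30, 35]
cum_freq = [0, 6, 18, 36, 46, 50]
n = 50

q1 = quartile(boundaries, cum_freq, n / 4)      # 17.71
q3 = quartile(boundaries, cum_freq, 3 * n / 4)  # 25.75
print(round((q3 - q1) / 2, 2))                  # quartile deviation = 4.02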
MEAN DEVIATION (OR AVERAGE DEVIATION OR MEAN ABSOLUTE DEVIATION)
Definition: Mean Deviation of a series of values of a variable is the arithmetic mean of all the absolute deviations (ie, differences without regard to sign) from any one of its averages (Mean, Median or Mode, but usually Mean or Median). It is an absolute measure of dispersion.
Mean Deviation of a set of n values x1, x2, ……, xn about their A.M. is defined by
M.D. = (1/n) Σ|d| = (1/n) Σ|x - x̄|
Where x̄ = A.M. and |d| = |x - x̄| = the absolute deviation of x from x̄.
For a frequency distribution,
M.D. = (1/N) Σf|d| = (1/N) Σf|x - x̄|
Where x = value or mid-value according as the data are ungrouped or grouped, N = Σf, and x̄ = Mean.
Mean Deviation about the Median = (1/N) Σf|d|
Where M = Median and d = x - M = value (or mid-value) - Median.
Similarly, we can define Mean Deviation about the Mode.
Note: The expression |d| is read as “mod d” and gives only the numerical or absolute value of d without regard to sign. Thus
|-3| = 3, |4| = 4, |-0.56| = 0.56.
The reason for taking only the absolute and not the algebraic values of the deviations is that the algebraic sum of the deviations of the values from their mean is zero.
Find the Mean Deviation about the Arithmetic Mean of the numbers 31, 35, 29, 63, 55, 72, 37.
KEY
x̄ = (31 + 35 + 29 + 63 + 55 + 72 + 37)/7 = 322/7 = 46
CALCULATION OF ABSOLUTE DEVIATIONS
VALUE X DEVIATION FROM MEAN D = X - 46 ABSOLUTE DEVIATION |D|
31 -15 15
35 -11 11
29 -17 17
63 17 17
55 9 9
72 26 26
37 -9 9
TOTAL - 104 = Σ|D|
The required Mean Deviation about the Mean
M.D. = Σ|D|/n = 104/7 = 14.86
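A two-line Python check of this example (illustrative sketch only):

values = [31, 35, 29, 63, 55, 72, 37]
mean = sum(values) / len(values)                       # 322 / 7 = 46.0
md = sum(abs(x - mean) for x in values) / len(values)  # 104 / 7
print(round(md, 2))                                    # 14.86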
ADVANTAGES
Mean deviation is based on all the values of the variable and sometimes gives fairly good results as a measure of dispersion. However, the practice of neglecting signs and taking absolute deviations for the calculation of the mean deviation seems rather unjustified, and this makes algebraic treatment difficult.
STANDARD DEVIATION
It is the most important absolute measure of dispersion. The Standard Deviation of a set of values of a variable is defined as the positive square root of the arithmetic mean of the squares of all deviations of the values from their arithmetic mean. In short, it is the square root of the mean of the squares of deviations from the mean.
If x1, x2, ……, xn be a series of values of a variable and x̄ their A.M., then the S.D. (σ) is defined by
σ = √[(1/n) Σ(x - x̄)²]
The square of the Standard Deviation is known as the Variance, ie,
Variance = σ² = (S.D.)²
S.D. is often defined as the positive square root of the Variance.
Find the standard deviation of the following numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9.
KEY
The A.M. of the numbers is (1 + 2 + …… + 9)/9 = 45/9 = 5.
The deviations of the numbers from the A.M. 5 are respectively -4, -3, -2, -1, 0, 1, 2, 3, 4.
The squares of the deviations from the A.M. are 16, 9, 4, 1, 0, 1, 4, 9, 16.
σ = √[Σ(x - x̄)²/n]
= √(60/9) = √6.67 = 2.58
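The definition translates directly into a short Python sketch (illustrative only):

import math

def standard_deviation(values):
    # positive square root of the mean of squared deviations from the A.M.
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

print(round(standard_deviation([1, 2, 3, 4, 5, 6, 7, 8, 9]), 2))  # 2.58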
ADVANTAGES AND DISADVANTAGES
Standard Deviation is the most important and widely used among the measures of dispersion and it possesses almost all the requisites of a good measure of dispersion. It is rigidly defined and based on all the values of the variable.
It is suitable for algebraic treatment. S.D. is less affected by sampling fluctuations than any other absolute measure of dispersion.
S.D. is difficult to understand. The process of squaring the deviations from their A.M. and then taking the square-root of the A. M. of these squared deviations is a complicated affair.
The calculation of S.D. can be made easier by changing the origin and the scale conveniently.
RELATIVE MEASURES OF DISPERSION
Absolute measures expressed as a percentage of a measure of central tendency give relative measures of dispersion. Relative measures are independent of the units of measurement and hence they are used for the comparison of the dispersion of two or more distributions given in different units.
CO-EFFICIENT OF VARIATION
Co-efficient of variation is the first important relative measure of dispersion and is defined by the following formula:
C.V. = (σ/x̄) × 100
Co-efficient of variation is thus the ratio of the Standard deviation to the mean, expressed as a percentage. In the words of Karl Pearson, Co-efficient of Variation is the percentage variation in the mean.
COEFFICIENT OF QUARTILE DEVIATION
Co-efficient of quartile deviation is a relative measure of dispersion and is defined by
Coefficient of Quartile Deviation = (Q3 - Q1)/(Q3 + Q1)
COEFFICIENT OF MEAN DEVIATION
It is a relative measure of dispersion. The Coefficient of Mean Deviation is defined by
Coefficient of M.D. = M.D./Mean (or M.D./Median, when the deviations are taken from the median)
4. Find the Mean Deviation about the Median in respect of the following numbers.
46,79,26,85,39,65,99,29,56,72
Find also the Co efficient of Mean Deviation.
KEY
By arranging the given numbers in ascending order of magnitude, we obtain
26,29,39,46,56,65,72,79,85,99
Median = the ((n + 1)/2)th value = (11/2)th value = the 5.5th value
= (5th value + 6th value)/2
= (56 + 65)/2 = 121/2 = 60.5
The absolute deviations of the values from the median 60.5 are respectively
34.5, 31.5, 21.5, 14.5, 4.5, 4.5, 11.5, 18.5, 24.5, 38.5
∴ M.D. about the Median
= (34.5 + 31.5 + 21.5 + 14.5 + 4.5 + 4.5 + 11.5 + 18.5 + 24.5 + 38.5)/10
= 204/10 = 20.4
Coefficient of Mean Deviation = M.D./Median
= 20.4/60.5 = 0.3372
= 33.72%
KURTOSIS
Kurtosis in Greek means ‘bulginess’. The degree of kurtosis of a distribution is measured relative to the peakedness of a normal curve. The measure of kurtosis indicates whether the curve of a frequency distribution is flat or peaked.
Kurtosis is the peakedness of the frequency curve. Of two or more distributions having the same average, dispersion and skewness, one may have a high concentration of values near the mode, and in this case its frequency curve will show a sharper peak than the others. This characteristic of a frequency distribution is known as kurtosis.
Kurtosis is measured by the coefficient β2, which is defined by the formula
β2 = m4/m2² = m4/σ⁴, or by γ2 = β2 - 3, where m2 and m4 are the 2nd and the 4th central moments and σ = S.D.
A distribution is said to be Platykurtic, Mesokurtic or Leptokurtic according as
β2 < 3, β2 = 3, or β2 > 3,
ie, according as m4/σ⁴ < 3, = 3, or > 3.
Karl Pearson in 1905 introduced the terms MESOKURTIC, LEPTOKURTIC, and PLATYKURTIC. A peaked curve is called “leptokurtic” and a flat-topped curve is termed “platykurtic”. These are evaluated by comparison with an intermediate (mesokurtic) peaked curve. The three curves differ widely in regard to convexity.
CALCULATE THE MEASURE OF KURTOSIS FOR THE FOLLOWING DISTRIBUTION.
AGE IN YEARS NO. OF EMPLOYEES AGE IN YEARS NO. OF EMPLOYEES
20-25 6 45-50 15
25-30 8 50-55 11
30-35 11 55-60 9
35-40 14 60-65 5
40-45 21
TOTAL 100
KEY
CLASS MID-VALUE M F D′ FD′ FD′² FD′³ FD′⁴
(Here D′ = (M - A)/I, with assumed mean A = 42.5 and class width I = 5)
20-25 22.5 6 -4 -24 96 -384 1536
25-30 27.5 8 -3 -24 72 -216 648
30-35 32.5 11 -2 -22 44 -88 176
35-40 37.5 14 -1 -14 14 -14 14
40-45 42.5 21 0 0 0 0 0
45-50 47.5 15 1 15 15 15 15
50-55 52.5 11 2 22 44 88 176
55-60 57.5 9 3 27 81 243 729
60-65 62.5 5 4 20 80 320 1280
TOTAL - 100 - 0 446 -36 4574
V1 = ΣFD′/N = 0/100 = 0,
V2 = ΣFD′²/N = 446/100 = 4.46,
V3 = ΣFD′³/N = -36/100 = -0.36,
V4 = ΣFD′⁴/N = 4574/100 = 45.74,
μ4 = V4 - 4V1V3 + 6V1²V2 - 3V1⁴
= 45.74 - (4 × 0 × -0.36) + (6 × 0² × 4.46) - (3 × 0⁴) = 45.74
μ2 = V2 - V1² = 4.46
∴ β2 = μ4/μ2² = 45.74/(4.46)² = 45.74/19.89 = 2.3
The value of β2 is less than 3; hence the curve is platykurtic.
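The step-deviation moment calculation above can be checked with the following Python sketch (illustrative only); A = 42.5 and I = 5 are the assumed mean and class width from the table, and β2 is unit-free, so working in D′ units suffices.

mid = [22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5]
freq = [6, 8, 11, 14, 21, 15, 11, 9, 5]
A, I = 42.5, 5

n = sum(freq)
d = [(m - A) / I for m in mid]                 # step deviations d'
v1, v2, v3, v4 = (sum(f * di ** r for f, di in zip(freq, d)) / n
                  for r in (1, 2, 3, 4))

mu2 = v2 - v1 ** 2                             # 2nd central moment (d' units)
mu4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
print(round(mu4 / mu2 ** 2, 1))                # beta2 = 2.3 -> platykurtic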
SUMMARY
This chapter helps us to know the nature of a statistical study. It recognizes the importance of statistics and also its limitations. The differences between descriptive statistics and inferential statistics are dealt with in detail.
KEY WORDS
• Descriptive Statistics
• Inferential Statistics
• Mean
• Median
• Mode
• Arithmetic Mean
• Geometric Mean
• Averages
• Standard Deviation
• Mean Deviation
• Dispersion
• Skewness
• Kurtosis
REVIEW QUESTIONS
1. Explain the special features of measures of central tendency.
2. How will you choose an ‘average’?
3. What is dispersion? State its objectives. Explain the various types of measures of dispersion.
4. Explain the various methods of measuring dispersion.
5. Differentiate standard deviation from mean deviation.
6. Define ‘skewness’. How will you measure it?
7. Explain the application of averages in research.
LESSON – 11
MEASURES OF RELATIONSHIP
OBJECTIVES
To study simple, partial and multiple correlation and their application in research.
STRUCTURE
 Measure of relationship
 Correlation
 Properties of correlation co-efficient
 Methods of studying correlations
 Application of correlation
MEASURES OF RELATIONSHIP
The following statistical tools measure the relationship between the variables analyzed in social science research:
i) Correlation
• Simple correlation
• Partial correlation
• Multiple correlation
ii) Regression
• Simple regression
• Multiple regression
iii) Association of Attributes
CORRELATION
Correlation measures the relationship (positive or negative, perfect or imperfect) between two variables. Regression analysis considers the relationship between variables and estimates the value of one variable from the known value of another. Association of attributes attempts to ascertain the extent of association between two variables.
Aggarwal Y.P, in his book ‘Statistical Methods’ has defined coefficient of correlation as, “a single number that tells us to what extent two variables or things are related and to what extent variations in one variable go with variations in the other”.
Richard I. Levin in his book ‘Statistics for Management’ has defined correlation analysis as “the statistical tool that we can use to describe the degree to which one variable is linearly related to another”. He has further stated that “frequently correlation analysis is used in conjunction with regression analysis to measure how well the regression line explains the variation of the dependent variable, Y. Correlation can also be used by itself, however, to measure the degree of association between two variables”.
Srivastava U.K., Shenoy G.V. and Sharma S.C., in their book ‘Quantitative Techniques for Managerial Decision’, have stated that “correlation analysis is the statistical technique that is used to describe the degree to which one variable is related to another. Frequently correlation analysis is also used along with the regression analysis to measure how well the regression line explains the variations of the dependent variable. The correlation coefficient is the statistical tool that is used to measure the mutual relationship between the two variables”.
Coefficient of correlation is denoted by r.
The sign of ‘r’ shows the direction of the relationship between the two variables X and Y. A positive value of r reveals a positive relationship between the two variables; a negative value reveals a negative relationship. Levin states that if an inverse relationship exists, that is, if Y decreases as X increases, then r will fall between 0 and -1. Likewise, if there is a direct relationship (if Y increases as X increases), then r will be a value within the range of 0 to 1.
Aggarwal Y.P, has highlighted in his book ‘Statistical Methods’, the properties of correlation and factors influencing the size of the correlation coefficient. The details are given below:
PROPERTIES OF THE CORRELATION COEFFICIENT
The range of the correlation coefficient is from -1 through 0 to +1. The values r = -1 and r = +1 reveal a case of perfect relationship, though the direction of the relationship is negative in the first case and positive in the second.
The correlation coefficient can be interpreted in terms of r², which is known as the coefficient of determination. It may be considered the variance interpretation of r².
Example:
r = 0.5
r² = 0.5 × 0.5 = 0.25
In terms of percentage: 0.25 × 100 = 25%
This means that 25 percent of the variance in the Y scores has been accounted for by the variance in X.
The correlation coefficient does not change if every score in either or both distributions is increased or multiplied by a constant.
Causality cannot be inferred solely on the basis of a correlation between two variables. It can be inferred only after conducting controlled experiments.
The direction of the relation is indicated by the sign (+ or -) of the correlation.
The degree of relationship is indicated by the numerical value of the correlation. A value near 1 indicates a nearly perfect relation, and a value near 0 indicates no relationship.
In a positive relationship both variables tend to change in the same direction: as x increases, y also tends to increase.
The Pearson correlation measures linear (straight line) relationship.
A correlation between x and y should not be interpreted as a cause-effect relationship. Two variables can be related without one having a direct effect on the other.
FACTORS INFLUENCING THE SIZE OF CORRELATION COEFFICIENT
1. The size of r is very much dependent upon the variability of measured values in the correlated sample. The greater the variability, the higher will be the correlation, everything else being equal.
2. The size of r is altered when researchers select extreme groups of subjects in order to compare these groups with respect to certain behaviors. Selecting extreme groups on one variable increases the size of r over what would be obtained with more random sampling.
3. Combining two groups which differ in their mean values on one of the variables is not likely to faithfully represent the true situation as far as the correlation is concerned.
4. Addition of an extreme case (and conversely dropping of an extreme case) can lead to changes in the amount of correlation. Dropping of such a case leads to reduction in the correlation while the converse is also true.
TYPES OF CORRELATION
a) Positive or Negative
b) Simple, Partial and Multiple
c) Linear and Non-linear
A) POSITIVE CORRELATION
Both the variables (X and Y) will vary in the same direction. If variable X increases, variable Y also will increase; If variable X decreases, variable Y also will decrease.
NEGATIVE CORRELATION
The given variables will vary in opposite directions. If one variable increases, the other variable will decrease.
B) SIMPLE, PARTIAL AND MULTIPLE CORRELATION
In simple correlation, the relationship between two variables is studied. In partial and multiple correlation three or more variables are studied. Three or more variables are simultaneously studied in multiple correlation. In partial correlation more than two variables are studied, but the effect of one variable is kept constant and the relationship between the other two variables is studied.
Linear and Non-Linear Correlation: It depends upon the constancy of the ratio of change between the variables. In linear correlation the percentage change in one variable will be equal to the percentage change in another variable. It is not so in non-linear correlation.
METHODS OF STUDYING CORRELATION
a) Scatter Diagram Method
b) Graphic Method
c) Karl Pearson’s Coefficient of Correlation
d) Concurrent Deviation Method
e) Method of Least Squares
KARL PEARSON’S COEFFICIENT OF CORRELATION
r = Σxy/√(Σx² × Σy²)
where x = X - X̄; y = Y - Ȳ
PROCEDURE
• Compute mean of the X series data
• Compute mean of the Y series data
• Compute deviations of the X series from the mean of X. They are denoted as x.
• Square the deviations and total them; the total is denoted as Σx².
• Compute deviations of the Y series from the mean of Y. They are denoted as y.
• Square the deviations and total them; the total is denoted as Σy².
• Multiply the deviations (X series, Y series) and compute the total. It is denoted as Σxy.
The above values can be applied in the formula and correlation can be computed.
Karl Pearson’s Coefficient of Correlation – formula (assumed mean)
r = (NΣdxdy - Σdx × Σdy)/[√(NΣdx² - (Σdx)²) × √(NΣdy² - (Σdy)²)]
Σdx = sum of the deviations of the X series from the assumed mean
Σdy = sum of the deviations of the Y series from the assumed mean
Σdxdy = total of the products of the deviations (X and Y series)
Σdx² = sum of the squared deviations of the X series from the assumed mean
Σdy² = sum of the squared deviations of the Y series from the assumed mean
N = Number of items
The above values can be applied in the above formula and correlation can be computed.
Correlation for the grouped data can be computed with the help of the following formula:
r = (NΣfdxdy - Σfdx × Σfdy)/[√(NΣfdx² - (Σfdx)²) × √(NΣfdy² - (Σfdy)²)]
In the above formula, deviations are multiplied by the frequencies. Other steps are the same.
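The deviation-from-actual-mean version of the formula can be sketched in a few lines of Python (illustrative only); the five pairs below are the ones used in the raw score example that follows, and the result agrees with it.

import math

def pearson_r(xs, ys):
    # r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    # with x = X - mean(X) and y = Y - mean(Y)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r([20, 16, 12, 8, 4], [22, 14, 4, 12, 8]), 1))  # 0.7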
CALCULATION OF CORRELATION
RAW SCORE METHOD
X 20 16 12 8 4
Y 22 14 4 12 8
X Y X² Y² XY
20 22 400 484 440
16 14 256 196 224
12 4 144 16 48
8 12 64 144 96
4 8 16 64 32
ΣX = 60 ΣY = 60 ΣX² = 880 ΣY² = 904 ΣXY = 840
r = (NΣXY - ΣX × ΣY)/[√(NΣX² - (ΣX)²) × √(NΣY² - (ΣY)²)]
= (5 × 840 - 60 × 60)/[√(5 × 880 - 60²) × √(5 × 904 - 60²)]
= 600/(√800 × √920)
r = 0.7
DEVIATION SCORE METHOD (USING ACTUAL MEAN)
Calculate Karl Pearson Coefficient of Correlation from the following data:
YEAR 1985 1986 1987 1988 1989 1990 1991 1992
INDEX OF PRODUCTION 100 102 104 107 105 112 103 99
NUMBER OF UNEMPLOYED 15 12 13 11 12 12 19 26
SOLUTION
Calculation of Karl Pearson’s Correlation Coefficients
YEAR INDEX OF PRODUCTION X x = X - X̄ x² NO. OF UNEMPLOYED Y y = Y - Ȳ y² xy
1985 100 -4 16 15 0 0 0
1986 102 -2 4 12 -3 9 +6
1987 104 0 0 13 -2 4 0
1988 107 +3 9 11 -4 16 -12
1989 105 +1 1 12 -3 9 -3
1990 112 +8 64 12 -3 9 -24
1991 103 -1 1 19 +4 16 -4
1992 99 -5 25 26 +11 121 -55
TOTAL ΣX = 832 Σx = 0 Σx² = 120 ΣY = 120 Σy = 0 Σy² = 184 Σxy = -92
X̄ = ΣX/N = 832/8 = 104
Ȳ = ΣY/N = 120/8 = 15
r = Σxy/√(Σx² × Σy²)
Σxy = -92, Σx² = 120, Σy² = 184
r = -92/√(120 × 184) = -92/148.6 = -0.619
The correlation between the index of production and the number of unemployed is negative.
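A self-contained Python check of this table (illustrative sketch only):

import math

X = [100, 102, 104, 107, 105, 112, 103, 99]            # index of production
Y = [15, 12, 13, 11, 12, 12, 19, 26]                   # number of unemployed
mx, my = sum(X) / len(X), sum(Y) / len(Y)              # 104 and 15
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # -92
sxx = sum((x - mx) ** 2 for x in X)                    # 120
syy = sum((y - my) ** 2 for y in Y)                    # 184
print(round(sxy / math.sqrt(sxx * syy), 3))            # -0.619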
CALCULATION OF CORRELATION COEFFICIENT (USING ASSUMED MEAN)
Calculate Correlation Coefficient from the following data
X 50 60 58 47 49 33 65 43 46 68
Y 48 65 50 48 55 58 63 48 50 70
Calculation of Karl Pearson Coefficient of Correlation
X dx = X - 50 dx² Y dy = Y - 55 dy² dxdy
50 0 0 48 -7 49 0
60 +10 100 65 +10 100 +100
58 +8 64 50 -5 25 -40
47 -3 9 48 -7 49 +21
49 -1 1 55 0 0 0
33 -17 289 58 +3 9 -51
65 +15 225 63 +8 64 +120
43 -7 49 48 -7 49 +49
46 -4 16 50 -5 25 +20
68 +18 324 70 +15 225 +270
TOTAL 519 +19 1077 535 +5 595 489
r = (NΣdxdy - Σdx × Σdy)/[√(NΣdx² - (Σdx)²) × √(NΣdy² - (Σdy)²)]
r = (10 × 489 - 19 × 5)/[√(10 × 1077 - 19²) × √(10 × 595 - 5²)]
r = 4795/(√10409 × √5925) = 4795/(102.02 × 76.97) = 4795/7852.5 = 0.61
RANK CORRELATION
It is a method of ascertaining covariability, or the lack of it, between two variables. The rank correlation method was developed by the British psychologist Charles Edward Spearman in 1904. Gupta S.P. has stated that “the rank correlation method is used when quantitative measures for certain factors cannot be fixed, but the individuals in the group can be arranged in order, thereby obtaining for each individual a number indicating his/her rank in the group”.
The formula for Rank Correlation is
r = 1 - 6ΣD²/[N(N² - 1)]
Rank-Difference Coefficient of Correlation (Case of No Ties)
STUDENT SCORE ON TEST I (X) SCORE ON TEST II (Y) RANK OF TEST I (R1) RANK OF TEST II (R2) DIFFERENCE BETWEEN RANKS (D) DIFFERENCE SQUARED (D²)
A 16 8 2 5 -3 9
B 14 14 3 3 0 0
C 18 12 1 4 -3 9
D 10 16 4 2 2 4
E 2 20 5 1 4 16
N = 5, ΣD² = 38
RANK CORRELATION
r = 1 - 6ΣD²/[N(N² - 1)]
r = 1 - (6 × 38)/[5(25 - 1)]
r = 1 - 228/120 = 1 - 1.9 = -0.9
The relationship between X and Y, that is, between the scores on Test I and Test II, is very high and inverse.
PROCEDURE FOR ASSIGNING RANKS
First rank is given to the student who secured the highest score. [In Test I below, student F is given the first rank; his score, 40, is the highest.] Second rank is given to the next highest score. [In Test I, student E is given the second rank.]
Students A and G have equal scores of 20 each, and they stand for the 6th and 7th ranks. Instead of giving either the 6th or the 7th rank to both students, the average of the two ranks [(6 + 7) ÷ 2 = 6.5] is given to each of them. The same procedure is followed to assign ranks to the scores secured by the students in Test II.
CALCULATION OF RANK CORRELATION WHEN TIED RANKS EXIST
Rank-Difference Coefficient of Correlation
STUDENT SCORE ON TEST I (X) SCORE ON TEST II (Y) RANK OF TEST I (R1) RANK OF TEST II (R2) DIFFERENCE BETWEEN RANKS (D) DIFFERENCE SQUARED (D²)
A 20 32 6.5 5.5 1.0 1.00
B 30 32 3 5.5 -2.5 6.25
C 22 48 5 1.5 3.5 12.25
D 28 36 4 4 0 0
E 32 44 2 3 -1.0 1.00
F 40 48 1 1.5 -0.5 0.25
G 20 28 6.5 7.5 -1.0 1.00
H 16 20 9 10 -1.0 1.00
I 14 24 10 9 1.0 1.00
J 18 28 8 7.5 0.5 0.25
N = 10, ΣD² = 24
r = 1 - 6ΣD²/[N(N² - 1)]
r = 1 - (6 × 24)/[10(100 - 1)]
r = 1 - 144/990
r = 1 – 0.145
r = 0.855
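The averaging of tied ranks described above can be automated; the following Python sketch (illustrative only) reproduces ΣD² = 24 and r = 0.855 for the Test I / Test II scores.

def average_ranks(scores):
    # Rank 1 for the highest score; tied scores share the average rank.
    order = sorted(scores, reverse=True)
    return [(2 * order.index(s) + 1 + order.count(s)) / 2 for s in scores]

def spearman_r(xs, ys):
    r1, r2 = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

test1 = [20, 30, 22, 28, 32, 40, 20, 16, 14, 18]
test2 = [32, 32, 48, 36, 44, 48, 28, 20, 24, 28]
print(round(spearman_r(test1, test2), 3))  # 0.855 (sum of D squared = 24)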
APPLICATION OF CORRELATION
Karl Pearson Coefficient of Correlation can be used to assess the extent of relationship between motivation of export incentive schemes and utilization of such schemes by exporters.
Motivation and Utilization of Export Incentive Schemes – Correlation Analysis
SL. NO. EXPORTERS CORRELATION
1. AGRICULTURE EXPORTERS +0.642
2. SEAFOOD EXPORTERS +0.423
3. TEXTILE EXPORTERS +0.125
4. GEMS AND JEWELLERY EXPORTERS +0.898
Opinion scores of the various categories of exporters towards motivation and utilization of export incentive schemes can be recorded and correlated by using Karl Pearson Coefficient of Correlation and appropriate interpretation may be given based on the value of correlation.
TESTING OF CORRELATION
The ‘t’ test is used to test the significance of a correlation coefficient.
Height and weight of a random sample of six adults.
HEIGHT (CM) 170 175 176 178 183 185
WEIGHT (KG) 57 64 70 76 71 82
It is reasonable to assume that these variables are normally distributed, so the Karl Pearson Correlation coefficient is the appropriate measure of the degree of association between height and weight.
r = 0.875
Hypothesis test for Pearson’s population correlation coefficient
H0: ρ = 0; this implies no correlation between the variables in the population
H1: ρ > 0; this implies that there is positive correlation in the population (increasing height is associated with increasing weight)
5% significance level
Test statistic: t = r√[(n - 2)/(1 - r²)]
t = 0.875 × √[(6 - 2)/(1 - 0.875²)] = 3.61
Table value at the 5% significance level, with n - 2 = 6 - 2 = 4 degrees of freedom: 2.132
Calculated value is more than the table value. Null hypothesis is rejected. There is significant positive correlation between height and weight.
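A small Python sketch of this test (illustrative only; the critical value 2.132 is read from a t table for 4 degrees of freedom):

import math

def t_statistic(r, n):
    # t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t = t_statistic(0.875, 6)
print(round(t, 2))    # 3.61
print(t > 2.132)      # True -> reject H0 at the 5% level (one-tailed, df = 4)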
PARTIAL CORRELATION
Partial correlation is used in situations where three or four variables are involved. Suppose three variables, such as age, height and weight, are given. Here partial correlation is applied. The correlation between height and weight can be computed by keeping age constant, since age may be an important factor influencing the strength of the relationship between height and weight. Partial correlation is used to keep the effect of age constant; the effect of one variable is partialled out from the correlation between the other two variables. This statistical technique is known as partial correlation.
Correlation between variables x and y is denoted as rxy
Partial correlation is denoted by the symbol r12.3. Here the correlation between variables 1 and 2 is computed, keeping the 3rd variable constant.
r12.3 = (r12 - r13 × r23)/√[(1 - r13²)(1 - r23²)]
r12.3 = partial correlation between variables 1 and 2, with variable 3 held constant
r12 = correlation between variables 1 and 2
r13 = correlation between variables 1 and 3
r23 = correlation between variables 2 and 3
Similarly,
r13.2 = (r13 - r12 × r23)/√[(1 - r12²)(1 - r23²)]
r23.1 = (r23 - r12 × r13)/√[(1 - r12²)(1 - r13²)]
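As a sketch, the first-order formula above can be coded directly in Python; the three pairwise correlations used in the example call are hypothetical.

import math

def partial_r(r12, r13, r23):
    # r12.3: correlation of variables 1 and 2 with variable 3 held constant
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# hypothetical values: 1 = height, 2 = weight, 3 = age
print(round(partial_r(0.80, 0.70, 0.60), 3))  # 0.665, age held constant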
MULTIPLE CORRELATION
Three or more variables are involved in multiple correlation. The dependent variable is denoted by X1 and the other variables are denoted by X2, X3, etc. Gupta S.P. has expressed that “the coefficient of multiple linear correlation is represented by R1 and it is common to add subscripts designating the variables involved. Thus R1.234 would represent the coefficient of multiple linear correlation between X1 on the one hand, and X2, X3 and X4 on the other. The subscript of the dependent variable is always to the left of the point”.
The coefficients of multiple correlation for r12, r13 and r23 can be expressed as follows:
R1.23 = √[(r12² + r13² - 2 r12 r13 r23)/(1 - r23²)]
R2.13 = √[(r12² + r23² - 2 r12 r13 r23)/(1 - r13²)]
R3.12 = √[(r13² + r23² - 2 r12 r13 r23)/(1 - r12²)]
The coefficient of multiple correlation R1.23 is the same as R1.32. A coefficient of multiple correlation lies between 0 and 1. If the coefficient of multiple correlation is 1, it shows that the correlation is perfect. If it is 0, it shows that there is no linear relationship between the variables. The coefficients of multiple correlation are always positive in sign and range from 0 to +1.
The coefficient of multiple determination can be obtained by squaring R1.23.
Multiple correlation analysis measures the relationship between the given variables. In this analysis the degree of association is measured between one variable, considered as the dependent variable, and a group of other variables, considered as the independent variables.
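A minimal Python sketch of R1.23 from the pairwise correlations (the input values are hypothetical):

import math

def multiple_R(r12, r13, r23):
    # R1.23 = sqrt((r12^2 + r13^2 - 2*r12*r13*r23) / (1 - r23^2))
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

R = multiple_R(0.80, 0.70, 0.60)
print(round(R, 3))       # 0.846, always between 0 and 1
print(round(R ** 2, 3))  # coefficient of multiple determination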
SUMMARY
This chapter outlined the significance of measuring relationships. It discussed the factors affecting correlation. The different applications of correlation have been dealt with in detail.
KEY WORDS
• Measures of Relationship
• Correlation
• Simple correlation
• Partial correlation
• Multiple correlation
• Regression
• Simple regression
• Multiple regression
• Association of Attributes
• Scatter Diagram Method
• Graphic Method
• Karl Pearson’s Coefficient of Correlation
• Concurrent Deviation Method
• Method of Least Squares
REVIEW QUESTIONS
1. What are the different measures and their significance in measuring Relationship?
2. Discuss the factors affecting Correlation.
3. What are the applications of Correlation?
4. Discuss in detail on different types of Correlation.
REFERENCE BOOKS
7. Robert Ferber, Marketing Research, New York: McGraw Hill Inc., 1976.
8. Chaturvedhi, J.C., Mathematical Statistics, Agra: Nok Jhonk Karyalaya, 1953.
9. Emory, C. William, Business Research Methods, Homewood, Illinois: Irwin, 1976.
LESSON – 12
MEASUREMENT
OBJECTIVES
• To know what is to be measured
• To define the operation definition and scale measurement
• To distinguish among nominal, ordinal, interval, and ratio scales
• To understand the criteria of good measurement
• To discuss the various methods of determining reliability
• To discuss the various methods of assessing validity
STRUCTURE
 Measurement
 Scale measurement
 Types of scale
 Criteria for good measurement
MEASUREMENT
Measurement is an integral part of the modern world. Today we have progressed in the physical sciences to such an extent that we are now able to measure the rotation of a distant star, altitude in micro-inches, and so on. Such precise physical measurement is very critical today. In many business situations, the majority of the measurements are applied to things that are much more abstract than altitude or time. Accurate measurement is essential for effective decision making. The purpose of this chapter is to provide a basic understanding of the measurement process and the rules needed for developing sound scale measurements.
In management research, measurement is viewed as the integrative process of determining the amount (intensity) of information about constructs, concepts or objects of interest and their relationship to a defined problem or opportunity. It is important to understand the two aspects of measurement. One is construct development, which provides the necessary and precise definitions; the part of the research process called problem definition in turn determines what specific data should be collected. The other is scale measurement, which concerns how the information is collected with reference to the construct. In other words, the goal of construct development is to precisely identify and define what is to be measured, including its dimensions. In turn, the goal of scale measurement is to determine how to precisely measure the constructs.
Regardless of whether the researcher is attempting to collect primary data or secondary data, all data can be logically classified as under.
A) STATE-OF-BEING DATA
When the problem requires collecting responses that are pertinent to the physical, demographic or socioeconomic characteristics of individuals, objects or organizations, the resulting raw data are considered state-of-being data. These data represent factual characteristics that can be verified through several sources other than the person providing the information.
B) STATE-OF-MIND DATA
This represents the mental attributes of individuals that are not directly observable or available through some other external source. It exists only within the minds of people. The researcher has to ask a person to respond to the stated questions. Examples are personality traits, attitudes, feelings, perceptions, beliefs, awareness levels, preferences, images, etc.
C) STATE-OF-BEHAVIOR DATA
This represents an individual's or organization's current observable actions or reactions, or recorded past actions. A person may be asked categorically about past behavior. This can be checked using external secondary sources, but that is a very difficult process in terms of time, effort and accuracy.
D) STATE-OF-INTENTION DATA
This represents an individual's or organization's expressed plans of future behavior. Again, this is also collected by asking carefully defined questions. Like the above data, this is also very difficult to verify through external secondary sources, but verification is possible.
With the background information about the type of data which are collected, the following pages will be very useful in understanding the concepts of scale measurement.
SCALE MEASUREMENT
Scale measurement can be defined as the process of assigning a set of descriptors to represent the range of possible responses to a question about a particular object or construct. Scale measurement directly determines the amount of raw data that can be ascertained from a given questioning or observation method. It attempts to assign designated degrees of intensity to the responses, which are commonly referred to as scale points. The researcher can control the amount of raw data that can be obtained from asking questions by incorporating scaling properties or assumptions into the scale points. There are four scaling properties that a researcher can use in developing scales, namely assignment, order, distance and origin.
1. Assignment, also referred to as the description or category property, is the researcher's employment of unique descriptors to identify each object within a set, e.g., the use of numbers, colors, yes & no responses.
2. Order refers to the relative magnitude between the raw responses. It establishes and creates hierarchical rank-order relationships among objects, e.g., 1st place is better than 4th place.
3. Distance is the measurement that expresses the exact difference between two responses. This allows the researcher and respondent to identify, understand and accurately express the absolute difference between objects, e.g., Family A has 3 children and Family B has 6 children.
4. Origin refers to the use of a unique starting point designated as a “true zero”, e.g., asking a respondent his or her weight or current age, or the market share of a specific brand.
TYPES OF SCALES
While scaling properties determine the amount of raw data that can be obtained from any scale design, all questions and scale measurements can be logically and accurately classified as one of four basic scale types: nominal, ordinal, interval or ratio. A scale may be defined as any series of items that are arranged progressively according to value or magnitude, into which an item can be placed according to its quantification.
The following table represents the relationship between types of scales & scaling properties.
TYPES OF SCALE SCALING PROPERTIES
- ASSIGNMENT ORDER DISTANCE ORIGIN
NOMINAL YES NO NO NO
ORDINAL YES YES NO NO
INTERVAL YES YES YES NO
RATIO YES YES YES YES
1. NOMINAL SCALE
In business research, nominal data are probably more widely collected than any other. It is the simplest type of scale and also the most basic of the four types of scale design. In such a scale, the numbers serve as labels to identify persons, objects or events. Nominal scales are the least powerful of the four data types. They suggest no order or distance relationship and have no arithmetic origin. This scale allows the researcher only to categorize the raw responses into mutually exclusive and collectively exhaustive categories. With a nominal scale, the only permissible operation is the counting of the numbers in each group. An example of a typical nominal scale in business research is the coding of males as 1 and females as 2.
E.g. 1: Please indicate your current marital status.
Married – Single – Never married – Widowed
E.g. 2: How do you classify yourself?
Indian – American – Asian – Black
2. ORDINAL SCALE
Ordinal scales, as the name implies, are ranking scales. Ordinal data include the characteristics of nominal data plus an indicator of order; that is, these data activate both the assignment and order scaling properties. The researcher can rank-order the raw responses into a hierarchical pattern. The use of an ordinal scale implies a statement of “greater than” or “less than” without stating how much greater or less. Examples of ordinal data include opinion and preference scales. A typical ordinal scale in business research asks respondents to rate career opportunities, brands, companies etc., as excellent, good, fair or poor.
E.g. 1: Which one of the following categories best describes your knowledge about computers?
1) Excellent, 2) Good, 3) Basic, 4) Little, 5) No knowledge
E.g. 2: Among the modes listed below, please indicate your top three preferences, using 1, 2, 3 as your choices in the respective spaces provided.
By post - By courier
By telephone - By speed post
By internet - By person
Also, it is to be noted that individual rankings can be combined to obtain a collective ranking for a group.
3. INTERVAL SCALES
The structure of this scale shows not only the assignment and order scaling properties but also the distance property. With an interval scale, researchers can identify not only some type of hierarchical order among the raw data but also the specific differences between the data. The classic example of this scale is the Fahrenheit temperature scale. If the temperature is 80 degrees, it cannot be said that it is twice as hot as 40 degrees; the reason is that 0 degrees does not represent the lack of temperature, but a relative point on the Fahrenheit scale. Similarly, when this scale is used to measure psychological attributes, the researcher can comment on the magnitude of differences or compare the average differences but cannot determine the actual strength of attitude toward an object. However, many attitude scales are presumed to be interval scales. Interval scales are more powerful than nominal and ordinal scales. Also, they are quicker to complete and convenient for the researcher.
E.g. 1: Into which of the following categories does your income fall?
1. below 5000 2. 5000 – 10,000 3. 10,000 – 15,000
4. 15,000 – 25,000 5. above 25,000
E.g. 2: Approximately how long have you lived at your current address?
1. less than 1 year 2. 1 – 3 year 3. 4 – 6 year 4. more than 6 year
4. RATIO SCALES
This is the only scale that simultaneously activates all four scaling properties. A ratio scale tends to be the most sophisticated scale in the sense that it allows the researcher not only to identify the absolute differences between responses but also to make absolute comparisons.
Examples of ratio scales are the commonly used physical dimensions such as height, weight, distance, money value and population counts. It is necessary to remember that ratio scale structures are designed to allow a “zero” or “true state of nothing” response to be a valid raw response to the question. Normally, the ratio scale requests that respondents give a specific singular numerical value as their response, regardless of whether or not a set of scale points is used. The following are examples of ratio scales.
E.g. 1: Please circle the number of children under 18 years of age in your house.
0 1 2 3 4 5
E.g. 2: In the past seven days, how many times did you go to a retail shop?
Number of times __________
MATHEMATICAL AND STATISTICAL ANALYSIS OF SCALES
The type of scale that is utilized in business research will determine the form of statistical analysis. For example, certain operations can be conducted only if a scale is of a particular nature. The following table shows the relationship between scale types and measures of central tendency and dispersion.
MEASUREMENT NOMINAL ORDINAL INTERVAL RATIO
1. CENTRAL TENDENCY
A) MODE A A A A
B) MEDIAN I A MORE A A A
C) MEAN I A I A MOST A MOST A
2. DISPERSION
A) FREQUENCY DISTRIBUTION A A A A
B) RANGE I A MORE A A A
C) STANDARD DEVIATION I A I A MOST A MOST A
Here
A – Appropriate
More A – More appropriate
Most A – Most appropriate
I A – Inappropriate
CRITERIA FOR GOOD MEASUREMENT
There are four major criteria for evaluating measurement: reliability, validity, sensitivity and practicality.
1. RELIABILITY
It refers to the extent to which a scale can reproduce the same measurement results in repeated trials. Reliability applies to a measure when similar results are obtained over time and across situations. Broadly defined, reliability is the degree to which measures are free from error and therefore yield consistent results. As discussed in the earlier chapter, error in scale measurements leads to lower scale reliability. Two dimensions underlie the concept of reliability: one is repeatability and the other is internal consistency.
First, the test-retest method involves administering the same scale or measure to the same respondents at two separate times to test for stability. If the measure is stable over time, the test, administered under the same conditions each time, should obtain similar results. A high stability correlation or consistency between the two measures at time 1 and time 2 indicates a high degree of reliability.
The second dimension of reliability concerns the homogeneity of the measure. The Split-Half Technique can be used when the measuring tool has many similar questions or statements to which subjects can respond. The instrument is administered and the results are separated by item into even and odd numbers, or into randomly selected halves. When the two halves are correlated, if the resulting correlation is high, the instrument is said to have high reliability in terms of internal consistency.
The Spearman-Brown Correction Formula is used to adjust for the effect of test length and to estimate the reliability of the whole set. But this approach may be influenced by the way in which the test is split. To overcome this, the Kuder-Richardson Formula (KR 20) and Cronbach's Coefficient Alpha are two frequently used alternatives. KR 20 is the method from which alpha was generalized and is used to estimate reliability for dichotomous items. Cronbach's alpha has the most utility for multi-item scales at the interval level of measurement.
The third perspective on reliability considers how much error may be introduced by different investigators or different samples of items being studied. In other words, the researcher creates two similar yet different scale measurements for the given construct. An example of this is the scoring of Olympic skaters by a panel of judges.
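The split-half idea with the Spearman-Brown correction can be sketched as below (illustrative Python only; the respondent-by-item scores are hypothetical).

import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(rows):
    odd = [sum(row[0::2]) for row in rows]    # odd-numbered items
    even = [sum(row[1::2]) for row in rows]   # even-numbered items
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)                    # Spearman-Brown correction

responses = [[4, 5, 4, 4], [2, 2, 3, 2], [5, 4, 5, 5], [3, 3, 2, 3], [4, 4, 4, 5]]
print(round(split_half_reliability(responses), 2))  # about 0.94 for this sample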
2. VALIDITY
The purpose of measurement is to measure what is intended to be measured, but this is not as simple as it sounds at first. Validity is the ability of a measure to measure what it is supposed to measure. If it does not measure what it is designated to measure, there will be problems. To assess validity there are several ways, which are discussed hereunder.
Face validity, or content validity, refers to the subjective agreement among professionals that a scale logically appears to reflect accurately what it intends to measure. When it appears evident to experts that the measure provides adequate coverage of the concept, the measure has face validity.
Criterion – Related validity reflects the success of measures used for prediction or estimation. Criterion validity may be classified as either concurrent validity or predictive validity, depending on the time sequence in which the ‘new’ measurement scale and the criterion measure are correlated. If the new measure is taken at the same time as the criterion measure and shown to be valid, then it has concurrent validity. Predictive validity is established when a new measure predicts a future event. These two measures differ only on the basis of time.
Construct validity is established by the degree to which a measure confirms the hypotheses generated from theory based on concepts. It implies the empirical evidence generated by a measure with the theoretical logic. To achieve this validity, the researcher may use convergent validity (should converge with similar measure) or discriminant validity (when it has low correlation with the measures of dissimilar concepts.)
3. SENSITIVITY
It is an important concept, particularly when changes in attitude or other hypothetical constructs are under investigation. Sensitivity refers to an instrument's ability to accurately measure variability in stimuli or responses. A dichotomous response category, such as “agree” or “disagree”, does not allow the recording of subtle attitude changes. But a scale running from “strongly agree” through “agree”, “neither agree nor disagree” and “disagree” to “strongly disagree” increases the sensitivity.
4. PRACTICALITY
Practicality can be defined in terms of economy, convenience and interpretability. The scientific requirements of the measurement process are that it be reliable and valid, while the operational requirements are that it be practical in terms of the three aspects mentioned above.
SUMMARY
This chapter outlined the importance of measurement. The different types of scales have been dealt with in detail. The chapter has also given the criteria for good measurement.
KEY TERMS
• Nominal scale
• Ordinal scale
• Interval scale
• Ratio scale
• Reliability
• Split-Half Technique
• Spearman-Brown Correction Formula
• Kuder-Richardson Formula (KR 20)
• Cronbach's Coefficient Alpha
• Validity
• Face validity
• Content validity
• Criterion-related validity: concurrent validity and predictive validity
• Construct validity
• Sensitivity
• Practicality
QUESTIONS
1. What are the different types of data that could be collected in attitude measurement?
2. Discuss the measurement scaling properties.
3. Explain the different scales of measurement.
4. Is the statistical analysis based on the type of scale? Explain.
5. What do you mean by good measurement?
6. Explain various methods of reliability and validity.
LESSON – 13
ATTITUDE MEASUREMENT AND SCALING TECHNIQUES
OBJECTIVES
• To understand the definition of attitude
• To learn the techniques for measuring attitudes
STRUCTURE
 Techniques for measuring attitude
 Physiological measures of attitude
 Summated rating method
 Numerical scale
 Graphic rating scale
ATTITUDE DEFINED
There are many definitions for the term attitude. An attitude is usually viewed as an enduring disposition to respond consistently in a given manner to various aspects of the world, including persons, events, and objects. One conception of attitude is reflected in this brief statement: “Sally loves working at Sam’s. She believes it’s clean, conveniently located, and has the best wages in town. She intends to work there until she retires.” In this short description are three components of attitude: the affective, the cognitive, and the behavioral.
The affective component reflects an individual’s general feelings or emotion toward an object. Statements such as “I love my job,” “I liked that book, A Corporate Bestiary,” and “I hate apple juice” reflect the emotional character of attitudes.
The way one feels about a product, a person, or an object is usually tied to one’s beliefs or cognitions. The cognitive component represents one’s awareness of and knowledge about an object. A woman might feel happy about her job because she “believes that the pay is great” or because she knows “that my job is the biggest challenge in India.”
The third component of an attitude is the behavioral component. Intention and behavioral expectations are included in this component, which therefore reflects a predisposition to action.
TECHNIQUES FOR MEASURING ATTITUDES
A remarkable variety of techniques has been devised to measure attitudes. In part, this diversity stems from the lack of consensus about the exact definition of the concept. Further, the affective, cognitive, and behavioral components of an attitude may be measured by different means. For example, sympathetic nervous system responses may be recorded using physiological measures to measure affect, but they are not good measures of behavioral intentions. Direct verbal statements concerning affect, belief, or behavior are utilized to measure behavioral intent. However, attitudes may also be measured indirectly by using qualitative, exploratory techniques. Obtaining verbal statements from respondents generally requires that the respondent perform a task such as ranking, rating, sorting, or making a choice or a comparison.
A ranking task requires that the respondents rank order a small number of items on the basis of overall preference or some characteristic of the stimulus. Rating asks the respondents to estimate the magnitude of a characteristic or quality that an object possesses. Quantitative scores, along a continuum that has been supplied to the respondents, are used to estimate the strength of the attitude or belief. In other words, the respondents indicate the position, on a scale, where they would rate the object.
A sorting technique might present respondents with several product concepts, printed on cards, and require that the respondents arrange the cards into a number of piles or otherwise classify the product concepts. The choice technique, choosing one of two or more alternatives, is another type of attitude measurement. If a respondent chooses one object over another, the researcher can assume that the respondent prefers the chosen object over the other.
The most popular techniques for measuring attitudes are presented in this chapter.
PHYSIOLOGICAL MEASURES OF ATTITUDES
Measures of galvanic skin response, blood pressure, pupil dilation, and other physiological measures may be utilized to assess the affective component of attitudes. They provide a means of measuring attitudes without verbally questioning the respondent. In general, they can provide a gross measure of like or dislike, but they are not sensitive measures for identifying gradients of an attitude.
ATTITUDE RATING SCALES
Using rating scales to measure attitudes is perhaps the most common practice in business research. This section discusses many rating scales designed to enable respondents to report the intensity of their attitudes.
SIMPLE ATTITUDE SCALES
In its most basic form, attitude scaling requires that an individual agree or disagree with a statement or respond to a single question. For example, respondents in a political poll may be asked whether they agree or disagree with the statement "The president should run for re-election," or an individual might be asked to indicate whether he likes or dislikes labor unions. Because this type of self-rating scale merely classifies respondents into one of two categories, it has only the properties of a nominal scale. This, of course, limits the type of mathematical analysis that may be utilized with this basic scale. Despite the disadvantages, simple attitude scaling may be used when questionnaires are extremely long, when respondents have little education, or for other specific reasons.
Most attitude theorists believe that attitudes vary along continua. An early attitude researcher pioneered the view that the task of attitude scaling is to measure the distance from "good" to "bad", "low" to "high", "like" to "dislike", and so on. Thus the purpose of an attitude scale is to find an individual's position on the continuum. Simple scales do not allow for making fine distinctions in attitudes. Several scales have been developed to help make more precise measurements.
CATEGORY SCALES
Some rating scales have only two response categories: agree and disagree. Expanding the response categories provides the respondent more flexibility in the rating task. Even more information is provided if the categories are ordered according to a descriptive or evaluative dimension. Consider the questions below:
How often is your supervisor courteous and friendly to you?
 Never
 Rarely
 Sometimes
 Often
 Very often
Each of these category scales is a more sensitive measure than a scale with only two response categories. Each provides more information.
Wording is an extremely important factor in the usefulness of these scales. Exhibit 14.1 shows some common wordings for category scales.
EXHIBIT 14.1 SELECTED CATEGORY SCALES
QUALITY: EXCELLENT / GOOD / FAIR / POOR
IMPORTANCE: VERY IMPORTANT / FAIRLY IMPORTANT / NEUTRAL / NOT SO IMPORTANT / NOT AT ALL IMPORTANT
INTEREST: VERY INTERESTED / SOMEWHAT INTERESTED / NOT VERY INTERESTED
SATISFACTION: VERY SATISFIED / SOMEWHAT SATISFIED / NEITHER SATISFIED NOR DISSATISFIED / SOMEWHAT DISSATISFIED / VERY DISSATISFIED
FREQUENCY: ALL OF THE TIME / VERY OFTEN / OFTEN / SOMETIMES / HARDLY EVER / NEVER
TRUTH: VERY TRUE / SOMEWHAT TRUE / NOT VERY TRUE / NOT AT ALL TRUE
SUMMATED RATINGS METHOD: THE LIKERT SCALE
Business researchers’ adaptation of the summated ratings method, developed by Rensis Likert, is extremely popular for measuring attitudes because the method is simple to administer. With the Likert scale, respondents indicate their attitudes by checking how strongly they agree or disagree with carefully constructed statements that range from very positive to very negative toward the attitudinal object. Individuals generally choose from five alternatives: strongly agree, agree, uncertain, disagree, and strongly disagree; but the number of alternatives may range from three to nine.
Consider the following example from a study on mergers and acquisitions:
Mergers and acquisitions provide a faster means of growth than internal expansions.
STRONGLY DISAGREE (1)   DISAGREE (2)   UNCERTAIN (3)   AGREE (4)   STRONGLY AGREE (5)
To measure the attitude, researchers assign scores or weights to the alternative responses. In this example, weights of 5, 4, 3, 2, and 1 are assigned to the answers. (The weights, shown in parentheses, would not be printed on the questionnaire.) Because the statement used as an example is positive toward the attitude, strong agreement indicates the most favorable attitude on the statement and is assigned a weight of 5. If a statement negative toward the object (such as "Your access to copy machines is limited") were given, the weights would be reversed, and "strongly disagree" would be assigned the weight of 5. A single scale item on a summated rating scale is an ordinal scale.
A Likert scale may include several scale items to form an index. Each statement is assumed to represent an aspect of a common attitudinal domain. For example, Exhibit 14.2 shows the items in a Likert scale to measure attitudes toward a management by objectives program. The total score is the summation of the weights assigned to an individual’s response.
For example, consider the following Likert scale.
Here are some statements that describe how employees might feel about the MBO (management-by-objectives) form of management. Please indicate your agreement or disagreement with each statement by circling the appropriate number to indicate whether you:
1 – STRONGLY AGREE 2 – AGREE 3 – NEUTRAL 4 – DISAGREE
5 – STRONGLY DISAGREE
Circle one and only one answer for each statement. There are no right or wrong answers to these questions. Just give your opinion.
STRONGLY AGREE AGREE NEUTRAL DISAGREE STRONGLY DISAGREE
1. MBO IS AN EFFECTIVE WAY OF PLANNING AND ORGANIZING THE WORK FOR WHICH I AM RESPONSIBLE. 1 2 3 4 5
2. MBO PROVIDES AN EFFECTIVE WAY OF EVALUATING MY WORK PERFORMANCE. 1 2 3 4 5
3. MBO MOTIVATES ME TO DO THE VERY BEST ON MY JOB. 1 2 3 4 5
In Likert's original procedure, a large number of statements are generated and then an item analysis is performed. The purpose of the item analysis is to ensure that the final items evoke a wide response and discriminate among those with positive and negative attitudes. Items that are poor because they lack clarity or elicit mixed response patterns are eliminated from the final statement list. However, many business researchers do not follow the exact procedure prescribed by Likert. A disadvantage of the Likert-type summated rating method is that it is difficult to know what a single summated score means. Many patterns of response to the various statements can produce the same total score. Thus, identical total scores may reflect different "attitudes" because respondents endorsed different combinations of statements.
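To make summated scoring concrete, here is a short Python sketch (hypothetical data and item numbering, not from the lesson) that totals a respondent's item scores and reverses the weights for negatively worded statements, as described above.

def summated_score(raw_scores, reversed_items, points=5):
    # raw_scores: one score per statement, 1..points, where `points` means "strongly agree"
    # reversed_items: indices of negatively worded statements whose weights must flip
    return sum((points + 1 - s) if i in reversed_items else s
               for i, s in enumerate(raw_scores))

# Five statements; the third (e.g. "Your access to copy machines is limited") is negative
print(summated_score([5, 4, 2, 5, 3], reversed_items={2}))  # prints 21

Because many response patterns yield the same total, the single score 21 by itself cannot reveal which statements the respondent endorsed, which is exactly the disadvantage noted above.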
SEMANTIC DIFFERENTIAL
The semantic differential is a series of attitude scales. This popular attitude-measurement technique consists of presenting an identification of a company, product, brand, job, or other concept, followed by a series of seven-point bipolar rating scales. Bipolar adjectives, such as “good” and “bad”, “modern” and “old-fashioned”, or “clean” and “dirty,” anchor the beginning and end (or poles) of the scale.
MODERN _____:_____:_____:_____:______:______:______OLD-FASHIONED.
The subject makes repeated judgments of the concept under investigation on each of the scales.
The scoring of the semantic differential can be illustrated by using the scale bounded by the anchors “modern” and “old-fashioned.” Respondents are instructed to check the place that indicates the nearest appropriate adjective. From left to right, the scale intervals are interpreted as extremely modern, very modern, slightly modern, both modern and old-fashioned, slightly old-fashioned, very old-fashioned, and extremely old-fashioned. A weight is assigned to each position on the rating scale. Traditionally, scores are 7, 6, 5, 4, 3, 2, 1, or +3, +2, +1, 0, -1, -2, -3.
Many researchers find it desirable to assume that the semantic differential provides interval data. This assumption, although widely accepted, has its critics, who argue that the data have only ordinal properties because the weights are arbitrary. Depending on whether the data are assumed to be interval or ordinal, the arithmetic mean or the median is utilized to plot the profile of one concept, product, unit, etc., compared with another concept, product, or units.
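The interval-versus-ordinal choice determines which summary statistic is plotted in a profile. A minimal Python sketch with hypothetical ratings shows both options:

import statistics

# Seven respondents rate one concept on the 7-point modern (7) ... old-fashioned (1) scale
ratings = [6, 7, 5, 6, 4, 7, 6]
print(statistics.mean(ratings))    # about 5.86, used if the data are treated as interval
print(statistics.median(ratings))  # 6, used if the data are treated as ordinal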
The semantic differential technique was originally developed by Charles Osgood and others as a method for measuring the meaning of objects or the "semantic space" of interpersonal experience. Business researchers have found the semantic differential versatile and have modified it for business applications.
NUMERICAL SCALES
Numerical scales have numbers, rather than “semantic space” or verbal descriptions, as response options, to identify categories (response positions). If the scale items have five response positions, the scale is called a 5-point numerical scale; with seven response positions, it is called a 7-point numerical scale; and so on.
Consider the following numerical scale
Now that you've had your automobile for about one year, please tell us how satisfied you are with your Ford Icon.
EXTREMELY SATISFIED 7 6 5 4 3 2 1 EXTREMELY DISSATISFIED
This numerical scale utilizes bipolar adjectives in the same manner as the semantic differential.
CONSTANT-SUM SCALE
Suppose a parcel service company wishes to determine the importance of the attributes of accurate invoicing, delivery as promised, and price to organizations that use its service in business-to-business marketing. Respondents might be asked to divide a constant sum to indicate the relative importance of the attributes. For example:
Divide 100 points among the following characteristics of a delivery service according to how important each characteristic is to you when selecting a delivery company.
Accurate invoicing_____________
Delivery as promised _______________
Lower price______________
The constant-sum scale works best with respondents who have high educational levels. If respondents follow the instructions correctly, the results approximate interval measures. As in the paired-comparison method, this technique becomes more complex as the number of stimuli increases.
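A short Python sketch (with a hypothetical allocation) of how a constant-sum response is typically checked and converted into relative-importance weights:

allocation = {"accurate invoicing": 40, "delivery as promised": 45, "lower price": 15}
total = sum(allocation.values())
assert total == 100, "respondent did not allocate exactly 100 points"
weights = {attribute: points / total for attribute, points in allocation.items()}
print(weights)  # {'accurate invoicing': 0.4, 'delivery as promised': 0.45, 'lower price': 0.15}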
STAPEL SCALE
The Stapel scale was originally developed in the 1950s to measure the direction and intensity of an attitude simultaneously. Modern versions of the scale use a single adjective as a substitute for the semantic differential when it is difficult to create pairs of bipolar adjectives. The modified Stapel scale places a single adjective in the center of an even number of numerical values (for example, ranging from +3 to -3). It measures how close to or how distant from the adjective a given stimulus is perceived to be.
The advantages and disadvantages of the Stapel scale are very similar to those of the semantic differential. However, the Stapel scale is markedly easier to administer, especially over the telephone. Because the Stapel scale does not require bipolar adjectives, as the semantic differential does, it is also easier to construct. Research comparing the semantic differential with the Stapel scale indicates that results from the two techniques are largely the same.
GRAPHIC RATING SCALES
A graphic rating scale presents respondents with a graphic continuum. The respondents are allowed to choose any point on the continuum to indicate their attitudes. Typically, a respondent's score is determined by measuring the length (in millimeters) from one end of the graphic continuum to the point marked by the respondent. Many researchers believe scoring in this manner strengthens the assumption that graphic rating scales of this type are interval scales. Alternatively, the researcher may divide the line into predetermined scoring categories (lengths) and record respondents' marks accordingly. In other words, the graphic rating scale has the advantage of allowing the researchers to choose any interval they wish for purposes of scoring. The disadvantage of the graphic rating scale is that there are no standard answers.
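As an illustration, the following Python sketch (hypothetical line length and mark position) converts a measured mark into either a continuous score or one of a chosen number of categories, mirroring the two scoring options just described:

def graphic_score(mark_mm, line_mm=100.0, categories=None):
    position = mark_mm / line_mm                 # 0.0 .. 1.0 along the continuum
    if categories is None:
        return round(100 * position)             # continuous-style 0..100 score
    return min(int(position * categories) + 1, categories)  # category 1..k

print(graphic_score(62.0))                # 62
print(graphic_score(62.0, categories=5))  # category 4 of 5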
THURSTONE EQUAL-APPEARING INTERVAL SCALE
In 1927, Louis Thurstone, an early pioneer in attitude research, developed the concept that attitudes vary along continua and should be measured accordingly. Construction of a Thurstone scale is a rather complex process that requires two stages. The first stage is a ranking operation, performed by judges, who assign scale values to attitudinal statements. The second stage consists of asking subjects to respond to the attitudinal statements.
The Thurstone method is time-consuming and costly. It is valuable from a historical perspective, but its current popularity is low and it is rarely utilized in applied business research.
EXHIBIT 14.9 SUMMARY OF THE ADVANTAGES AND DISADVANTAGES OF RATING SCALES
RATING MEASURE | SUBJECT MUST | ADVANTAGES | DISADVANTAGES
CATEGORY SCALE | INDICATE A RESPONSE CATEGORY | FLEXIBLE, EASY TO RESPOND TO | ITEMS MAY BE AMBIGUOUS; WITH FEW CATEGORIES ONLY GROSS DISTINCTIONS CAN BE MADE
LIKERT SCALE | EVALUATE STATEMENTS ON A SCALE THAT TYPICALLY CONTAINS FIVE ALTERNATIVES | EASIEST SCALE TO CONSTRUCT | HARD TO JUDGE WHAT A SINGLE SCORE MEANS
SEMANTIC DIFFERENTIAL AND NUMERICAL SCALES | CHOOSE POINTS BETWEEN BIPOLAR ADJECTIVES ON RELEVANT DIMENSIONS | EASY TO CONSTRUCT; NORMS EXIST FOR COMPARISON | BIPOLAR ADJECTIVES MUST BE FOUND; DATA MAY BE ORDINAL, NOT INTERVAL
CONSTANT-SUM SCALE | DIVIDE A CONSTANT SUM AMONG RESPONSE ALTERNATIVES | APPROXIMATES INTERVAL MEASURE | DIFFICULT FOR RESPONDENTS WITH LOW EDUCATION LEVELS
STAPEL SCALE | CHOOSE POINTS ON A SCALE WITH A SINGLE ADJECTIVE IN THE CENTRE | EASIER TO CONSTRUCT THAN SEMANTIC DIFFERENTIAL; EASY TO ADMINISTER | ENDPOINTS ARE NUMERICAL, NOT VERBAL, LABELS
GRAPHIC SCALE | CHOOSE A POINT ON A CONTINUUM | VISUAL IMPACT; UNLIMITED SCALE POINTS | NO STANDARD ANSWERS
GRAPHIC SCALE WITH PICTURE RESPONSE CATEGORIES | CHOOSE A VISUAL PICTURE | VISUAL IMPACT | HARD TO ATTACH VERBAL EXPLANATION TO RESPONSE
SCALES MEASURING BEHAVIORAL INTENTIONS AND EXPECTATIONS
The behavioral component of an attitude involves the behavioral expectations of an individual toward an attitudinal object. Typically, this represents an intention or a tendency to seek additional information. Category scales that measure the behavioral component of an attitude attempt to determine a respondent's "likelihood" of action or intention to perform some future action, as in the following examples:
How likely is it that you will change jobs in the next six months?
 I definitely will change.
 I probably will change.
 I might change.
 I probably will not change.
 I definitely will not change.
I would write a letter to my congressman or other government official in support of this company if it were in a dispute with the government.
 Extremely likely
 Very likely
 Somewhat likely
 Likely, about 50-50 chance
 Somewhat unlikely
 Very unlikely
 Extremely unlikely
BEHAVIORAL DIFFERENTIAL
A general instrument, the behavioral differential, has been developed to measure the behavioral intentions of subjects toward an object or category of objects. As in the semantic differential, a description of the object to be judged is placed on the top of a sheet, and the subjects indicate their behavioral intentions toward this object on a series of scales. For example, one item might be
A 25-YEAR-OLD FEMALE COMMODITY BROKER
WOULD _____: _____: _____: _____: _____: _____: _____ WOULD NOT
ASK THIS PERSON FOR ADVICE.
RANKING
People often rank order their preferences. An ordinal scale may be developed by asking respondents to rank order (from most preferred to least preferred) a set of objects or attributes. It is not difficult for respondents to understand the task of rank ordering the importance of fringe benefits or arranging a set of job tasks according to preference.
PAIRED COMPARISONS
The following question is the typical format for asking about paired comparisons.
I would like to know your overall opinion of two brands of adhesive bandages, the Curad brand and the Band-Aid brand. Overall, which of these two brands - Curad or Band-Aid - do you think is the better one? Or are both the same?
Curad is better __________
Band-Aid is better __________
They are the same __________
Ranking objects with respect to one attribute is not difficult if only a few concepts or items are compared. As the number of items increases, the number of comparisons increases geometrically. If the number of comparisons is too great, respondents may become fatigued and no longer carefully discriminate among them.
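The growth in the number of comparisons is easy to see with a short Python sketch (hypothetical brand list): each pair must be judged once, so n items require n(n - 1)/2 comparisons.

from itertools import combinations

brands = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
pairs = list(combinations(brands, 2))
print(len(pairs))  # 45 comparisons for 10 brands; 20 brands would need 190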
SORTING
A sorting task requires that respondents indicate their attitudes or beliefs by arranging items.
SUMMARY
This lesson described the major techniques for measuring attitudes and outlined the importance of attitude measurement in business research.
KEY TERMS
• Attitude
• Affective component
• Cognitive component
• Behavioral component
• Ranking
• Rating
• Category scale
• Likert scale
• Semantic differential scale
• Numerical scale
• Constant sum scale
• Stapel scale
• Graphic rating scale
• Paired comparison
QUESTIONS
1. What is an attitude?
2. Distinguish between rating and ranking. Which is a better attitude measurement? Why?
3. Describe the different methods of scale construction, pointing out the merits and demerits of each.
4. What advantages do numerical scales have over semantic differential scales?
LESSON – 14
LESSONS ON ADVANCED DATA TECHNIQUES
OBJECTIVES
• To understand the procedures and applications of following statistical analysis
o Discriminant analysis
o ANOVA
o Multi dimensional Scaling
o Cluster analysis
STRUCTURE
 Analysis of variance
 Condition for ANOVA
 ANOVA model
 Discriminant analysis
 Factor analysis
 Cluster analysis
 Ward’s method
ANALYSIS OF VARIANCE (ANOVA)
Analysis of variance (ANOVA) is used to test hypotheses about differences between two or more means. The t-test, based on the standard error of the difference between two means, can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using t-tests. However, conducting multiple t-tests can lead to severe inflation of the Type I error rate. Analysis of variance can be used to test differences among several means for significance without increasing the Type I error rate. This lesson covers designs with between-subjects variables.
The statistical method for testing the null hypothesis that the means of several populations are equal is analysis of variance. It uses a single-factor, fixed-effects model to compare the effects of one factor (brands of coffee, varieties of residential housing, types of retail stores) on a continuous dependent variable. In a fixed-effects model, the levels of the factor are established in advance and the results are not generalizable to other levels of treatment.
Consider a hypothetical experiment on the effect of the intensity of distracting background noise on reading comprehension. Subjects were randomly assigned to one of three groups. Subjects in Group 1 were given 30 minutes to read a story without any background noise. Subjects in Group 2 read the story with moderate background noise, and subjects in Group 3 read the story in the presence of loud background noise.
The first question the experimenter was interested in was whether background noise has any effect at all; that is, whether the null hypothesis µ1 = µ2 = µ3 is true, where µ1 is the population mean for the "no noise" condition, µ2 is the population mean for the "moderate noise" condition, and µ3 is the population mean for the "loud noise" condition. The experimental design therefore has one factor (noise intensity) and this factor has three levels: no noise, moderate noise, and loud noise.
Analysis of variance can be used to provide a significance test of the null hypothesis that these three population means are equal. If the test is significant, then the null hypothesis can be rejected and it can be concluded that background noise has an effect.
In a one-factor between-subjects ANOVA, the letter "a" is used to indicate the number of levels of the factor (a = 3 for the noise intensity example). The number of subjects assigned to condition 1 is designated as n1; the number of subjects assigned to condition 2 is designated by n2, etc.
If the sample size is the same for all of the treatment groups, then the letter "n" (without a subscript) is used to indicate the number of subjects in each group. The total number of subjects across all groups is indicated by "N." If the sample sizes are equal, then N = (a)(n); otherwise,
N = n1 + n2 + ... + na.
Some experiments have more than one between-subjects factor. For instance, consider a hypothetical experiment in which two age groups (8-year olds and 12-year olds) are asked to perform a task either with or without distracting background noise. The two factors are age and distraction.
ASSUMPTIONS
Analysis of variance assumes normal distributions and homogeneity of variance. Therefore, in a one-factor ANOVA, it is assumed that each of the populations is normally distributed with the same variance (σ²). In between-subjects analyses, it is assumed that each score is sampled randomly and independently. Research has shown that ANOVA is "robust" to violations of its assumptions.
This means that the probability values computed in an ANOVA are satisfactorily accurate even if the assumptions are violated. Moreover, ANOVA tends to be conservative when its assumptions are violated. This means that although power is decreased, the probability of a Type I error is as low or lower than it would be if its assumptions were met. There are exceptions to this rule. For example, a combination of unequal sample sizes and a violation of the assumption of homogeneity of variance can lead to an inflated Type I error rate.
CONDITIONS FOR ANOVA
1. The sample must be randomly selected from normal populations
2. The populations should have equal variances
3. The distance from one value to its group's mean should be independent of the distances of other values to that mean (independence of error).
4. Minor variations from normality and equal variances are tolerable. Nevertheless, the analyst should check the assumptions with the diagnostic techniques.
Analysis of variance breaks down, or partitions, total variability into component parts. Unlike the t-test, which uses the sample standard deviations, ANOVA uses squared deviations of the variance so that the distances of the individual data points from their own mean or from the grand mean can be summed.
In the ANOVA model, each group has its own mean and values that deviate from that mean. Similarly, all the data points from all of the groups produce an overall grand mean. The total deviation is the sum of the squared differences between each data point and the overall grand mean.
The total deviation of any particular data point may be partitioned into between-groups variance and within-groups variance. The between-groups variance represents the effect of the treatment or factor. Differences between the group means imply that each group was treated differently, and the treatment will appear as deviations of the sample means from the grand mean. Even if this were not so, there would still be some natural variability among subjects and some variability attributable to sampling. The within-groups variance describes the deviations of the data points within each group from the sample mean. This results from variability among subjects and from random variation. It is often called error.
When the variability attributable to the treatment exceeds the variability arising from error and random fluctuations, the viability of the null hypothesis begins to diminish. This is exactly the way the test statistic for analysis of variance works.
The test statistic for ANOVA is the F ratio. It compares the variance from these two sources:

F = Mean square between groups / Mean square within groups

where

Mean square between groups = Sum of squares between groups / (k - 1)
Mean square within groups = Sum of squares within groups / (n - k)
To compute the F ratio, the sums of the squared deviations for the numerator and denominator are divided by their respective degrees of freedom. Dividing computes the variances as an average, or mean; hence the term mean square. The degrees of freedom for the numerator, the mean square between groups, are one less than the number of groups (k - 1). The degrees of freedom for the denominator, the mean square within groups, are the total number of observations minus the number of groups (n - k).
If the null hypothesis is true, there should be no difference between the populations, and the ratio should be close to 1. If the population means are not equal, the numerator should manifest this difference, and the F ratio should be greater than 1. The F distribution determines the size of the ratio necessary to reject the null hypothesis for a particular sample size and level of significance.
ANOVA MODEL
To illustrate one-way ANOVA, consider the following hypothetical example. To find the number one business school in India, 20 business magnates were randomly selected and asked to rate the top 3 B-schools. The ratings are given below.
DATA
PERSON RATING OF B-SCHOOL 1 RATING OF B-SCHOOL 2 RATING OF B-SCHOOL 3
1 40 56 92
2 28 48 56
3 36 64 64
4 32 56 72
5 60 28 48
6 12 32 52
7 32 42 64
8 36 40 68
9 44 61 76
10 36 58 56
11 40 52 88
12 68 70 79
13 20 73 92
14 33 72 88
15 65 73 73
16 40 71 68
17 51 55 81
18 25 68 95
19 37 81 68
20 44 78 78
Let’s apply the one way ANOVA test on this example.
STEP 1: NULL HYPOTHESIS
H0: A1=A2 =A3
HA : A1A2 A3
STEP 2: STATISTICAL TEST
The F test is chosen because the example has independent samples, the assumptions of analysis of variance are accepted, and the data are interval.
STEP 3: SIGNIFICANCE LEVEL
Let α = 0.05 and degrees of freedom = [numerator (k - 1) = (3 - 1) = 2], [denominator (n - k) = (60 - 3) = 57], i.e. (2, 57).
STEP 4: CALCULATED VALUE
F = Mean square between groups / Mean square within groups = 5822.017 / 205.695 = 28.304, with degrees of freedom (2, 57).
STEP 5: CRITICAL TEST VALUE
From the F-distribution table with degrees of freedom (2, 57) and α = 0.05, the critical value is 3.16.
STEP 6: DECISION
Since the calculated value is greater than the critical value (28.3 > 3.16), the null hypothesis is rejected. The conclusion is that there is a statistically significant difference between two or more pairs of means. The following table shows that the p value equals 0.0001. Since the p value (0.0001) is less than the significance level (0.05), this provides a second basis for rejecting the null hypothesis.
The ANOVA model summary given in the following table is the standard way of summarizing the results of analysis of variance. It contains the sources of variation, degrees of freedom, sums of squares, mean squares, and the calculated F value. The probability value column reports the exact significance level for the F ratio being tested.
TABLE: ONE WAY ANOVA MODEL SUMMARY
SOURCE | DEGREES OF FREEDOM | SUM OF SQUARES | MEAN SQUARES | F VALUE | 'P' VALUE
BETWEEN GROUPS | 2 | 11644.033 | 5822.017 | 28.304 | 0.0001
WITHIN GROUPS | 57 | 11724.550 | 205.694
TOTAL | 59 | 23368.583
TABLE ON MEANS
BUSINESS SCHOOL PERSONS MEAN STANDARD DEVIATION STANDARD ERROR
1 20 38.95 14.01 3.13
2 20 58.90 15.09 3.37
3 20 72.90 13.90 3.11
TABLE ON SCHEFFE'S MULTIPLE COMPARISON PROCEDURE
BUSINESS SCHOOL COMPARISON DIFFERENCE CRITICAL DIFFERENCE ‘P’ VALUE
1 & 2 19.95 11.40 0.0002 S
1 & 3 33.95 11.40 0.0001 S
2 & 3 14.00 11.40 0.0122 S
S = Significantly different at this level. Significance level: 0.05
All data are hypothetical
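The calculation above can be verified with a short Python sketch, assuming the SciPy library is available; it reruns the one-way ANOVA on the rating data given earlier.

from scipy import stats

school1 = [40, 28, 36, 32, 60, 12, 32, 36, 44, 36, 40, 68, 20, 33, 65, 40, 51, 25, 37, 44]
school2 = [56, 48, 64, 56, 28, 32, 42, 40, 61, 58, 52, 70, 73, 72, 73, 71, 55, 68, 81, 78]
school3 = [92, 56, 64, 72, 48, 52, 64, 68, 76, 56, 88, 79, 92, 88, 73, 68, 81, 95, 68, 78]

f_value, p_value = stats.f_oneway(school1, school2, school3)
print(round(f_value, 3), p_value)  # F is approximately 28.3; p is far below 0.05, so reject H0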
FIGURE: ONE-WAY ANALYSIS OF VARIANCE PLOTS (figures omitted)
DISCRIMINANT ANALYSIS
Discriminant function analysis is used to determine which variables discriminate between two or more naturally occurring groups. For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.
A medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a Discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.
COMPUTATIONAL APPROACH
Let us consider a simple example. Suppose we measure height in a random sample of 50 males and 50 females. Females are, on the average, not as tall as males, and this difference will be reflected in the difference in means (for the variable Height). Therefore, variable height allows us to discriminate between males and females with a better than chance probability: if a person is tall, then he is likely to be a male, if a person is short, then she is likely to be a female.
We can generalize this reasoning to groups and variables that are less "trivial." For example, suppose we have two groups of high school graduates: Those who choose to attend college after graduation and those who do not. We could have measured students' stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counselors to provide the appropriate guidance to the respective students).
To summarize the discussion so far, the basic idea underlying discriminant function analysis is to determine whether groups differ with regard to the mean of a variable, and then to use that variable to predict group membership (e.g., of new cases).
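A minimal Python sketch of this basic idea, using hypothetical scores and the scikit-learn library's linear discriminant analysis (an assumption; any statistical package offers an equivalent):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical cases: [achievement motivation, grade average] with known group membership
X = [[8, 3.6], [9, 3.8], [7, 3.4], [3, 2.1], [4, 2.4], [2, 2.0]]
y = ["college", "college", "college",
     "no further study", "no further study", "no further study"]

lda = LinearDiscriminantAnalysis().fit(X, y)  # learn how the groups differ on the measures
print(lda.predict([[6, 3.0]]))                # predicted group membership for a new case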
STEPWISE DISCRIMINANT ANALYSIS
Probably the most common application of Discriminant function analysis is to include many measures in the study, in order to determine the ones that discriminate between groups. For example, an educational researcher interested in predicting high school graduates' choices for further education would probably include as many measures of personality, achievement motivation, academic performance, etc. as possible in order to learn which one(s) offer the best prediction.
Model. Put another way, we want to build a "model" of how we can best predict to which group a case belongs. In the following discussion we will use the term "in the model" in order to refer to variables that are included in the prediction of group membership, and we will refer to variables as being "not in the model" if they are not included.
Forward stepwise analysis. In stepwise discriminant function analysis, a model of discrimination is built step-by-step. Specifically, at each step all variables are reviewed and evaluated to determine which one will contribute most to the discrimination between groups. That variable will then be included in the model, and the process starts again.
Backward stepwise analysis. One can also step backwards; in that case all variables are included in the model and then, at each step, the variable that contributes least to the prediction of group membership is eliminated. Thus, as the result of a successful discriminant function analysis, one would only keep the "important" variables in the model, that is, those variables that contribute the most to the discrimination between groups.
F to enter, F to remove. The stepwise procedure is "guided" by the respective F to enter and F to remove values. The F value for a variable indicates its statistical significance in the discrimination between groups, that is, it is a measure of the extent to which a variable makes a unique contribution to the prediction of group membership.
Capitalizing on chance. A common misinterpretation of the results of stepwise discriminant analysis is to take statistical significance levels at face value. By nature, the stepwise procedures will capitalize on chance because they "pick and choose" the variables to be included in the model so as to yield maximum discrimination. Thus, when using the stepwise approach the researcher should be aware that the significance levels do not reflect the true alpha error rate, that is, the probability of erroneously rejecting H0 (the null hypothesis that there is no discrimination between groups).
MULTI DIMENSIONAL SCALING
Multidimensional scaling (MDS) is a set of related statistical techniques often used in data visualisation for exploring similarities or dissimilarities in data. An MDS algorithm starts with a matrix of item-item similarities, and then assigns a location of each item in a low-dimensional space, suitable for graphing or 3D visualisation.
CATEGORIZATION OF MDS
MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:
• Classical multidimensional scaling -- also often called metric multidimensional scaling -- assumes the input matrix is just an item-item distance matrix. Analogous to principal components analysis, an eigenvector problem is solved to find the locations that minimize distortions to the distance matrix. Its goal is to find a Euclidean distance approximating a given distance. It can be generalized to handle 3-way distance problems (the generalization is known as DISTATIS).
• Metric multidimensional scaling -- A superset of classical MDS that assumes a known parametric relationship between the elements of the item-item dissimilarity matrix and the Euclidean distance between the items.
• Generalized multidimensional scaling (GMDS) -- A superset of metric MDS that allows for the target distances to be non-Euclidean.
• Non-metric multidimensional scaling -- In contrast to metric MDS, non-metric MDS both finds a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distance between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression.
MULTIDIMENSIONAL SCALING PROCEDURE
There are several steps in conducting MDS research:
1. Formulating the problem - What brands do you want to compare? How many brands do you want to compare? More than 20 is cumbersome. Less than 8 (4 pairs) will not give valid results. What purpose is the study to be used for?
2. Obtaining Input Data - Respondents are asked a series of questions. For each product pair they are asked to rate similarity (usually on a 7 point Likert scale from very similar to very dissimilar). The first question could be for Coke/Pepsi for example, the next for Coke/Hires rootbeer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires rootbeer, etc. The number of questions is a function of the number of brands and can be calculated as Q = N (N - 1) / 2 where Q is the number of questions and N is the number of brands. This approach is referred to as the “Perception data : direct approach”. There are two other approaches. There is the “Perception data : derived approach” in which products are decomposed into attributes which are rated on a semantic differential scale. The other is the “Preference data approach” in which respondents are asked their preference rather than similarity.
3. Running the MDS statistical program - Software for running the procedure is available in most of the better statistical applications programs. Often there is a choice between Metric MDS (which deals with interval or ratio level data), and Nonmetric MDS (which deals with ordinal data). The researchers must decide on the number of dimensions they want the computer to create. The more dimensions, the better the statistical fit, but the more difficult it is to interpret the results.
4. Mapping the results and defining the dimensions - The statistical program (or a related module) will map the results. The map will plot each product (usually in two dimensional space). The proximity of products to each other indicate either how similar they are or how preferred they are, depending on which approach was used. The dimensions must be labelled by the researcher. This requires subjective judgment and is often very challenging. The results must be interpreted
5. Test the results for reliability and validity - Compute R-squared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-square of .6 is considered the minimum acceptable level. Other possible tests are Kruskal's stress, split data tests, data stability tests (i.e., eliminating one brand), and test-retest reliability.
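Step 3 of the procedure above can be sketched in Python, assuming the scikit-learn library is available; the input is a hypothetical brand-by-brand dissimilarity matrix, and the output is a two-dimensional configuration for the map plus its stress.

import numpy as np
from sklearn.manifold import MDS

# Hypothetical dissimilarities among four brands (0 = identical, 7 = very dissimilar)
d = np.array([[0, 2, 6, 5],
              [2, 0, 5, 6],
              [6, 5, 0, 2],
              [5, 6, 2, 0]], dtype=float)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(d)  # one (x, y) point per brand for the perceptual map
print(coords)
print(mds.stress_)             # lower stress indicates a better fit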
INPUT DATA
The input to MDS is a square, symmetric 1-mode matrix indicating relationships among a set of items. By convention, such matrices are categorized as either similarities or dissimilarities, which are opposite poles of the same continuum. A matrix is a similarity matrix if larger numbers indicate more similarity between items, rather than less. A matrix is a dissimilarity matrix if larger numbers indicate less similarity. The distinction is somewhat misleading, however, because similarity is not the only relationship among items that can be measured and analyzed using MDS. Hence, many input matrices are neither similarities nor dissimilarities.
However, the distinction is still used as a means of indicating whether larger numbers in the input data should mean that a given pair of items should be placed near each other on the map, or far apart. Calling the data "similarities" indicates a negative or descending relationship between input values and corresponding map distances, while calling the data "dissimilarities" or "distances" indicates a positive or ascending relationship.
A typical example of an input matrix is the aggregate proximity matrix derived from a pilesort task. Each cell xij of such a matrix records the number (or proportion) of respondents who placed items i and j into the same pile. It is assumed that the number of respondents placing two items into the same pile is an indicator of the degree to which they are similar. An MDS map of such data would put items close together which were often sorted into the same piles.
Another typical example of an input matrix is a matrix of correlations among variables. Treating these data as similarities (as one normally would), would cause the MDS program to put variables with high positive correlations near each other, and variables with strong negative correlations far apart.
Another type of input matrix is a flow matrix. For example, a dataset might consist of the number of business transactions occurring during a given period between a set of corporations. Running this data through MDS might reveal clusters of corporations whose members trade more heavily with one another than with outsiders. Although technically neither similarities nor dissimilarities, these data should be classified as similarities in order to have companies who trade heavily with each other show up close to each other on the map.
DIMENSIONALITY
Normally, MDS is used to provide a visual representation of a complex set of relationships that can be scanned at a glance. Since maps on paper are two-dimensional objects, this translates technically to finding an optimal configuration of points in 2-dimensional space. However, the best possible configuration in two dimensions may be a very poor, highly distorted, representation of your data. If so, this will be reflected in a high stress value. When this happens, you have two choices: you can either abandon MDS as a method of representing your data, or you can increase the number of dimensions.
There are two difficulties with increasing the number of dimensions. The first is that even 3 dimensions are difficult to display on paper and are significantly more difficult to comprehend. Four or more dimensions render MDS virtually useless as a method of making complex data more accessible to the human mind.
The second problem is that with increasing dimensions, you must estimate an increasing number of parameters to obtain a decreasing improvement in stress. The result is a model of the data that is nearly as complex as the data itself.
On the other hand, there are some applications of MDS for which high dimensionality is not a problem. For instance, MDS can be viewed as a mathematical operation that converts an item-by-item matrix into an item-by-variable matrix. Suppose, for example, that you have a person-by-person matrix of similarities in attitudes. You would like to explain the pattern of similarities in terms of simple personal characteristics such as age, sex, income and education. The trouble is, these two kinds of data are not conformable. The person-by-person matrix in particular is not the sort of data you can use in a regression to predict age (or vice versa). However, if you run the data through MDS (using very high dimensionality in order to achieve perfect stress), you can create a person-by-dimension matrix which is similar to the person-by-demographics matrix that you are trying to compare it to.
MDS AND FACTOR ANALYSIS
Even though there are similarities in the type of research questions to which these two procedures can be applied, MDS and factor analysis are fundamentally different methods. Factor analysis requires that the underlying data be distributed as multivariate normal, and that the relationships be linear. MDS imposes no such restrictions. As long as the rank-ordering of distances (or similarities) in the matrix is meaningful, MDS can be used. In terms of resultant differences, factor analysis tends to extract more factors (dimensions) than MDS; as a result, MDS often yields more readily interpretable solutions. Most importantly, however, MDS can be applied to any kind of distances or similarities, while factor analysis requires us to first compute a correlation matrix. MDS can be based on subjects' direct assessment of similarities between stimuli, while factor analysis requires subjects to rate those stimuli on some list of attributes (for which the factor analysis is performed).
In summary, MDS methods are applicable to a wide variety of research designs because distance measures can be obtained in any number of ways.
APPLICATIONS
MARKETING
In marketing, MDS is a statistical technique for taking the preferences and perceptions of respondents and representing them on a visual grid. These grids, called perceptual maps, are usually two-dimensional, but they can represent more than two dimensions.
Potential customers are asked to compare pairs of products and make judgements about their similarity. Whereas other techniques obtain underlying dimensions from responses to product attributes identified by the researcher, MDS obtains the underlying dimensions from respondents' judgements about the similarity of products. This is an important advantage. It does not depend on researchers' judgments. It does not require a list of attributes to be shown to the respondents. The underlying dimensions come from respondents' judgements about pairs of products. Because of these advantages, MDS is the most common technique used in perceptual mapping.
The "beauty" of MDS is that we can analyze any kind of distance or similarity matrix. These similarities can represent people's ratings of similarities between objects, the percent agreement between judges, the number of times a subjects fails to discriminate between stimuli, etc. For example, MDS methods used to be very popular in psychological research on person perception where similarities between trait descriptors were analyzed to uncover the underlying dimensionality of people's perceptions of traits (see, for example Rosenberg, 1977). They are also very popular in marketing research, in order to detect the number and nature of dimensions underlying the perceptions of different brands or products & Carmone, 1970).
In general, MDS methods allow the researcher to ask relatively unobtrusive questions ("how similar is brand A to brand B?") and to derive from those questions underlying dimensions without the respondents ever knowing what the researcher's real interest is.
CLUSTER ANALYSIS
The term cluster analysis (first used by Tryon, 1939) encompasses a number of different algorithms and methods for grouping objects of similar kind into respective categories. A general question facing researchers in many areas of inquiry is how to organize observed data into meaningful structures, that is, to develop taxonomies. In other words cluster analysis is an exploratory data analysis tool which aims at sorting different objects into groups in a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise. Given the above, cluster analysis can be used to discover structures in data without providing an explanation/interpretation. In other words, cluster analysis simply discovers structures in data without explaining why they exist.
We deal with clustering in almost every aspect of daily life. For example, a group of diners sharing the same table in a restaurant may be regarded as a cluster of people. In food stores, items of a similar nature, such as different types of meat or vegetables, are displayed in the same or nearby locations. There is a countless number of examples in which clustering plays an important role. For instance, biologists have to organize the different species of animals before a meaningful description of the differences between animals is possible. According to the modern system employed in biology, man belongs to the primates, the mammals, the amniotes, the vertebrates, and the animals. Note how in this classification, the higher the level of aggregation, the less similar are the members in the respective class. Man has more in common with all other primates (e.g., apes) than he does with the more "distant" members of the mammals (e.g., dogs), etc. For a review of the general categories of cluster analysis methods, see Joining (Tree Clustering), Two-way Joining (Block Clustering), and k-Means Clustering.
Cluster Analysis (CA) is a classification method that is used to arrange a set of cases into clusters. The aim is to establish a set of clusters such that cases within a cluster are more similar to each other than they are to cases in other clusters.
Imagine a set of cases (e.g. patients, animals, quadrats, etc.). For each case we have a score on two variables. The scattergram shown below was obtained. Some obvious initial clusters of points have been labelled.
Cluster analysis is an exploratory data analysis tool for solving classification problems. Its object is to sort cases (people, things, events, etc) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. Each cluster thus describes, in terms of the data collected, the class to which its members belong; and this description may be abstracted through use from the particular to the general class or type.
Cluster analysis is thus a tool of discovery. It may reveal associations and structure in data which, though not previously evident, nevertheless are sensible and useful once found. The results of cluster analysis may contribute to the definition of a formal classification scheme, such as a taxonomy for related animals, insects or plants; or suggest statistical models with which to describe populations; or indicate rules for assigning new cases to classes for identification and diagnostic purposes; or provide measures of definition, size and change in what previously were only broad concepts; or find exemplars to represent classes.
Whatever business you're in, the chances are that sooner or later you will run into a classification problem. Cluster analysis might provide the methodology to help you solve it.
PROCEDURE FOR CLUSTER ANALYSIS
1. Formulate the problem - select the variables that you wish to apply the clustering technique to
2. Select a distance measure - various ways of computing distance:
• Squared Euclidean distance - the sum of the squared differences in value for each variable; the ordinary Euclidean distance is its square root (both are computed in the sketch following this procedure)
• Manhattan distance - the sum of the absolute differences in value for any variable
• Chebychev distance - the maximum absolute difference in values for any variable
3. Select a clustering procedure (see below)
4. Decide on the number of clusters
5. Map and interpret clusters - draw conclusions - illustrative techniques like perceptual maps, icicle plots, and dendrograms are useful
6. Assess reliability and validity - various methods:
• repeat analysis but use different distance measure
• repeat analysis but use different clustering technique
• split the data randomly into two halves and analyze each part separately
• repeat analysis several times, deleting one variable each time
• repeat analysis several times, using a different order each time
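The distance measures from step 2 can be computed directly; here is a short Python sketch with two hypothetical cases scored on three variables:

import numpy as np

a = np.array([2.0, 7.0, 4.0])
b = np.array([5.0, 3.0, 4.0])

squared_euclidean = ((a - b) ** 2).sum()  # 9 + 16 + 0 = 25
euclidean = np.sqrt(squared_euclidean)    # 5.0
manhattan = np.abs(a - b).sum()           # 3 + 4 + 0 = 7.0
chebyshev = np.abs(a - b).max()           # max(3, 4, 0) = 4.0
print(euclidean, manhattan, chebyshev)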
FIGURE SHOWING THREE PAIRS OF CLUSTERS
Three pairs of clusters are obvious
• (AB)
• (DE)
• (FG)
Beyond these we can see that (AB) & (C) and (DE) are more similar to each other than to (FG).
Hence we could construct the following dendrogram (hierarchical classification).
FIGURE: DENDROGRAM (figure omitted)
Note that the clusters are joined (fused) at increasing levels of 'dissimilarity'.
The actual measure of dissimilarity will depend upon the method used. It may be a similarity measure or a distance measure. Distances between points can be calculated by using an extension of Pythagoras' theorem (these are Euclidean distances). These measures of 'dissimilarity' can be extended to more than 2 variables (dimensions) without difficulty.
CLUSTERING ALGORITHMS
Having selected how we will measure similarity (the distance measure) we must now choose the clustering algorithm, i.e. the rules which govern between which points distances are measured to determine cluster membership. There are many methods available, the criteria used differ and hence different classifications may be obtained for the same data. This is important since it tells us that although cluster analysis may provide an objective method for the clustering of cases there can be subjectivity in the choice of method. Five algorithms, available within SPSS, are described.
• AVERAGE LINKAGE CLUSTERING
• COMPLETE LINKAGE CLUSTERING
• SINGLE LINKAGE CLUSTERING
• WITHIN GROUPS CLUSTERING
• WARD'S METHOD
AVERAGE LINKAGE CLUSTERING
The dissimilarity between clusters is calculated using cluster average values; of course there are many ways of calculating an average. The most common (and recommended if there is no reason for using other methods) is UPGMA - Unweighted Pair-Groups Method Average. SPSS also provides two other methods based on averages, CENTROID and MEDIAN. Centroid, or UPGMC (Unweighted Pair-Groups Method Centroid), uses the group centroid as the average. The centroid is defined as the centre of a cloud of points. A problem with the centroid method is that some switching and reversal may take place, for example as the agglomeration proceeds some cases may need to be switched from their original clusters.
COMPLETE LINKAGE CLUSTERING
(Maximum or Furthest-Neighbour Method): The dissimilarity between 2 groups is equal to the greatest dissimilarity between a member of cluster i and a member of cluster j. This method tends to produce very tight clusters of similar cases.
SINGLE LINKAGE CLUSTERING (MINIMUM OR NEAREST-NEIGHBOUR METHOD)
The dissimilarity between 2 clusters is the minimum dissimilarity between members of the two clusters. This method produces long chains, which form loose, straggly clusters. It has been widely used in numerical taxonomy.
WITHIN GROUPS CLUSTERING
This is similar to UPGMA except clusters are fused so that within cluster variance is minimised. This tends to produce tighter clusters than the UPGMA method.
WARD'S METHOD
Cluster membership is assessed by calculating the total sum of squared deviations from the mean of a cluster. The criterion for fusion is that it should produce the smallest possible increase in the error sum of squares.
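A minimal Python sketch of Ward's method, assuming the SciPy library and hypothetical cases scored on two variables; the linkage table records each fusion (smallest increase in the error sum of squares first), and the tree is then cut into three clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six hypothetical cases forming three visible groups
X = np.array([[1, 2], [2, 1], [8, 9], [9, 8], [15, 2], [16, 1]], dtype=float)

Z = linkage(X, method="ward")                    # fusion history of the hierarchy
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
print(labels)                                    # e.g. [1 1 2 2 3 3]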
Cluster analysis is the statistical method of partitioning a sample into homogeneous classes to produce an operational classification. Such a classification may help:
• formulate hypotheses concerning the origin of the sample, e.g. in evolution studies
• describe a sample in terms of a typology, e.g. for market analysis or administrative purposes
• predict the future behaviour of population types, e.g. in modelling economic prospects for different industry sectors
• optimize functional processes, e.g. business site locations or product design
• assist in identification, e.g. in diagnosing diseases
• measure the different effects of treatments on classes within the population, e.g. with analysis of variance
SUMMARY
The complete process of generalised hierarchical clustering can be summarised as follows:
1. Calculate the distance between all initial clusters. In most analyses initial clusters will be made up of individual cases.
2. Fuse the two most similar clusters and recalculate the distances.
3. Repeat step 2 until all cases are in one cluster.
One of the biggest problems with hierarchical cluster analysis is identifying the optimum number of clusters. As the fusion process continues, increasingly dissimilar clusters must be fused, i.e. the classification becomes increasingly artificial. Deciding upon the optimum number of clusters is largely subjective, although looking at a graph of the level of similarity at fusion versus the number of clusters may help. There will be sudden jumps in the level of similarity as dissimilar groups are fused.
KEY TERMS
• SPSS
• Tabulation
• Cross-tabulation
• ANOVA
• Discriminant analysis
• Factor analysis
• Conjoint analysis
• MDS
• Cluster analysis
REVIEW QUESTIONS
10. What do you mean by cross-tabulation?
11. Write short notes on statistical packages .
12. Explain the step wise procedure for doing Discriminant Analysis.
13. Write short notes on ANOVA.
14. Explain the application of Factor analysis in Marketing.
15. What do you mean by conjoint analysis?
16. Explain the procedure of performing Multi Dimensional Scaling.
17. What are the applications of MDS?
18. Describe the different types of cluster analysis.
19. Explain the marketing situations in which the above said tools will be used.
LESSON – 15
FACTOR ANALYSIS
OBJECTIVES
• To learn the basic concepts of factor analysis.
• To understand the procedure for performing factor analysis.
• To identify the applications of factor analysis.
STRUCTURE
 Evolution of factor analysis
 Steps involved in conducting the factor analysis
 Process involved in factor analysis
 Output of factor analysis
 Limitation of factor analysis
INTRODUCTION TO FACTOR ANALYSIS
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization. In marketing research, there may be a large number of variables, most of which are correlated and which must be reduced to a manageable level. Relationships among sets of many interrelated variables are examined and represented in terms of a few underlying factors. For example, store image may be measured by asking respondents to evaluate stores on a series of items on a semantic differential scale. The item evaluations may then be analyzed to determine the factors underlying store image.
In analysis of variance, multiple regression, and discriminant analysis, one variable is considered as the dependent or criterion variable, and the others as independent or predictor variables. However, no such distinction is made in factor analysis. Rather, factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined.
FACTOR ANALYSIS IS USED IN THE FOLLOWING CIRCUMSTANCES
To identify underlying dimensions, or factors, that explain the correlations among a set of variables. For example, a set of lifestyle statements may be used to measure the psychographic profiles of consumers. These statements may then be factor analyzed to identify the underlying psychographic factors, as illustrated in the department store example.
To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis). For example, the psychographic factors identified may be used as independent variables to explain the differences between loyal and non-loyal consumers.
To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis. For example, a few of the original lifestyle statements that correlate highly with the identified factors may be used as independent variables to explain the differences between the loyal and non-loyal users.
DEFINITION
Factor analysis is a class of procedures primarily used for data reduction and summarization. It is an interdependence technique in that an entire set of interdependent relationships is examined. A factor is defined as an underlying dimension that explains the correlations among a set of variables.
EVOLUTION OF FACTOR ANALYSIS
Charles Spearman first used factor analysis as a technique of indirect measurement. When psychologists test human personality and intelligence, they develop a set of questions and tests for the purpose, believing that a person given this set of questions and tests will respond on the basis of some structure that exists in his or her mind, so that the responses form a certain pattern. This approach rests on the assumption that the underlying structure in answering the questions is the same for different respondents.
Even though factor analysis had its beginnings in the field of psychology, it has since been applied to problems in many other areas, including marketing. Its use has become far more frequent as a result of the introduction of specialized software packages such as SPSS and SAS.
APPLICATION OF FACTOR ANALYSIS
• It can be used in market segmentation for identifying the underlying variables on which to group customers. New-car buyers might be grouped based on the relative emphasis they place on economy, convenience, performance, comfort, and luxury. This might result in five segments: economy seekers, convenience seekers, performance seekers, comfort seekers, and luxury seekers.
• In product research, factor analysis can be employed to determine the brand attributes that influence consumer choice. Toothpaste brands might be evaluated in terms of protection against cavities, whiteness of teeth, taste, fresh breath, and price.
• In advertising studies, factor analysis can be used to understand the media consumption habits of the target market. The users of frozen foods may be heavy viewers of cable TV, see a lot of movies, and listen to country music.
• In pricing studies, it can be used to identify the characteristics of price-sensitive consumers. For example, these consumers might be methodical, economy-minded, and home-centered.
• It can bring out the hidden or latent dimensions in the relationships among product preferences. Factor analysis is typically used to study a complex product or service in order to identify the major characteristics (or factors) considered important by consumers of the product or service. For example, researchers for a two-wheeler company may ask a large sample of potential buyers to report (using rating scales) the extent of their agreement or disagreement with a number of statements such as "A motorbike's brakes are its most crucial part" and "Seats should be comfortable for two riders". Researchers apply factor analysis to such a set of data to identify which factors - such as "safety", "exterior styling" and "economy of operation" - are considered important by potential customers. If this information is available, it can be used to guide the overall characteristics to be designed into the product or to identify advertising themes that potential buyers would consider important.
STEPS INVOLVED IN CONDUCTING THE FACTOR ANALYSIS
1. FORMULATE THE PROBLEM
2. CONSTRUCT THE CORRELATION MATRIX
3. DETERMINE THE METHOD OF FACTOR ANALYSIS
4. DETERMINE THE NUMBER OF FACTORS
5. ROTATE THE FACTORS
6. INTERPRET THE FACTORS
7. CALCULATE THE FACTOR SCORES or SELECT THE SURROGATE VARIABLES
8. DETERMINE THE MODEL FIT
STATISTICS ASSOCIATED WITH FACTOR ANALYSIS
The key statistics associated with factor analysis are as follows:
Bartlett's test of sphericity. A test statistic used to examine the hypothesis that the variables are uncorrelated in the population. In other words, the population correlation matrix is an identity matrix; each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0).
Correlation matrix. A lower triangular matrix showing the simple correlations, r, between all possible pairs of variables included in the analysis. The diagonal elements, which are all 1, are usually omitted.
Communality. The amount of variance a variable shares with all the other variables being considered. This is also the proportion of variance explained by the common factors.
Eigenvalue. Represents the total variance explained by each factor.
Factor loadings. Simple correlations between the variables and the factors.
Factor loading plot. A plot of the original variables using the factor loadings as coordinates.
Factor matrix. Contains the factor loadings of all the variables on all the factors extracted.
Factor scores. Composite scores estimated for each respondent on the derived factors.
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. An index used to examine the appropriateness of factor analysis. High values (between .5 and 1.0) indicate factor analysis is appropriate. Values below .5 imply that factor analysis may not be appropriate.
Percentage of variance. The percentage of the total variance attributed to each factor.
Residuals. The differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
Scree plot. A plot of the eigenvalues against the number of factors in order of extraction. We describe the uses of these statistics in the next section, in the context of the procedure for conducting factor analysis.
PROCESS INVOLVED IN FACTOR ANALYSIS
Factor analysis applies an advanced form of correlation analysis to the responses to a number of statements. The purpose of this analysis is to determine whether the responses to several of the statements are highly correlated. If the responses to three or more statements are highly correlated, it is believed that the statements measure some factor common to all of them.
The statements in any one set are highly correlated with each other but are not highly correlated with the statements in any of the other sets.
For each set of highly correlated statements, the researchers use their own judgment to determine what single "theme" or "factor" ties the statements together in the minds of the respondents. For example, in the automobile study mentioned above, researchers may find high correlations among the responses to the following three statements: "Mileage per liter should be high"; "Maintenance cost should be low"; "Mileage should be consistent on all types of roads". The researchers may then make the judgment that agreement with this set of statements indicates an underlying concern with the factor of "economy of operation".
DETERMINE THE METHOD OF FACTOR ANALYSIS
Once it has been determined that factor analysis is an appropriate technique for analyzing the data, an appropriate method must be selected. The approach used to derive the weights, or factor score coefficients, differentiates the various methods of factor analysis. The two basic approaches are principal components analysis and common factor analysis. In principal components analysis, the total variance in the data is considered. The diagonal of the correlation matrix consists of unities, and the full variance is brought into the factor matrix. Principal components analysis is recommended when the primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components.
In common factor analysis, the factors are estimated based only on the common variance. Communalities are inserted in the diagonal of the correlation matrix. This method is appropriate when the primary concern is to identify the underlying dimensions and the common variance is of interest. This method is also known as principal axis factoring.
Other approaches for estimating the common factors are also available. These include the methods of unweighted least squares, generalized least squares, maximum likelihood, the alpha method, and image factoring. These methods are complex and are not recommended for inexperienced users.
DETERMINE THE NUMBER OF FACTORS
It is possible to compute as many principal components as there are variables, but in doing so, no parsimony is gained. In order to summarize the information contained in the original variables, a smaller number of factors should be extracted. The question is, how many? Several procedures have been suggested for determining the number of factors. These include a priori determination and approaches based on eigenvalues, scree plot, percentage of variance accounted for, split-half reliability, and significance tests.
A Priori Determination. Sometimes, because of prior knowledge, the researcher knows how many factors to expect and thus can specify the number of factors to be extracted beforehand. The extraction of factors ceases when the desired number of factors has been extracted. Most computer programs allow the user to specify the number of factors, allowing for an easy implementation of this approach.
Determination Based on Eigenvalues. In this approach, only factors with eigenvalues greater than 1.0 are retained; the other factors are not included in the model. An eigenvalue represents the amount of variance associated with the factor. Hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, because, due to standardization, each variable has a variance of 1.0. If the number of variables is less than 20, this approach will result in a conservative number of factors.
Determination Based on Scree Plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction. The shape of the plot is used to determine the number of factors. Typically, the plot has a distinct break between the steep slope of factors with large eigenvalues and a gradual trailing off associated with the rest of the factors. This gradual trailing off is referred to as the scree. Experimental evidence indicates that the point at which the scree begins denotes the true number of factors. Generally, the number of factors determined by a scree plot will be one or a few more than that determined by the eigenvalue criterion.
Determination Based on Percentage of Variance. In this approach the number of factors extracted is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. What level of variance is satisfactory depends upon the problem. However, it is recommended that the factors extracted should account for at least 60 percent of the variance.
Determination Based on Split-Half Reliability. The sample is split in half and factor analysis is performed on each half. Only factors with high correspondence of factor loadings across the two subsamples are retained.
Determination Based on Significance Tests. It is possible to determine the statistical significance of the separate eigenvalues and retain only those factors that are statistically significant. A drawback is that with large samples (size greater than 200), many factors are likely to be statistically significant, although from a practical viewpoint many of them account for only a small proportion of the total variance.
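The eigenvalue (Kaiser) criterion and the scree plot can be illustrated with a short Python sketch; the data here are randomly generated stand-ins for real survey responses.

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(300, 6))            # 300 respondents, 6 standardized items
R = np.corrcoef(data, rowvar=False)         # correlation matrix of the items
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted largest first

# Kaiser criterion: retain factors whose eigenvalue (variance) exceeds 1.0
n_factors = int(np.sum(eigenvalues > 1.0))
print(eigenvalues, n_factors)
# For a scree plot, plot these eigenvalues against the factor number and look
# for the point where the steep slope gives way to a gradual trailing off.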
ILLUSTRATION
A manufacturer of motorcycles wanted to know which motorcycle characteristics were considered very important by customers. The company identified 100 statements that related to all the characteristics of motorcycles that it believed important. 300 potential customers of motorcycles were selected on a probability basis and were asked to rate the 100 statements, five of which are listed below. They were asked to report on a 5-point scale the extent to which they agreed or disagreed with each statement.
• Brakes are the most important parts of a motorcycle.
• Appearance of motorcycle should be masculine
• Mileage per liter should be high
• Maintenance cost should be low
• Mileage should be consistent on all types of roads.
This resulted in a set of data in which each of 300 individuals gave a response to each of 100 statements. For any given statement, some individuals were found to agree strongly, some were found to disagree slightly, some neither agreed nor disagreed with the statement, and so on. Thus, for each statement, there was a distribution of 300 responses on a 5-point scale.
THREE IMPORTANT MEASURES
There are three important measures used in the factor analysis.
1. Variance
2. Standardized scores of an individual’s responses
3. Correlation coefficient.
VARIANCE
A factor analysis is somewhat like regression analysis in that it tries to “best fit” factors to a scatter diagram of the data in such a way that the factors explain the variance associated with the responses to each statement.
STANDARDIZED SCORES OF AN INDIVIDUAL’S RESPONSES
To facilitate comparisons of the responses from such different scales, researchers standardize all of the answers from all of the respondents on all statements and questions.
INDIVIDUAL'S STANDARDIZED SCORE ON THE STATEMENT = ([INDIVIDUAL'S ACTUAL RESPONSE TO THE STATEMENT] - [MEAN OF ALL 300 RESPONSES TO THE STATEMENT]) / [STANDARD DEVIATION OF ALL 300 RESPONSES TO THAT STATEMENT]
Thus, an individual’s standardized score is nothing more than an actual response measured in terms of the number of standard deviations (+ or -) it lies away from the mean. Therefore, each standardized score is likely to be a value somewhere in the range of +3.00 and –3.00, with +3.00 typically being equated to the “agree very strongly” response and –3.00 typically being equated to the “disagree very strongly” response.
CORRELATION COEFFICIENT
The third measure used is the correlation coefficient associated with the standardized scores of the responses to each pair of statements. The matrix of correlation coefficients is a very important part of factor analysis.
The factor analysis searches through a large set of data to locate two or more sets of statements, which have highly correlated responses. The responses to the statements in one set will all be highly correlated with each other, but they will also be quite uncorrelated with the responses to the statements in other sets. Since the different sets of statements are relatively uncorrelated with each other, a separate and distinct factor relative to motorcycles is associated with each set.
As already noted, variance is one of the three important measures used in factor analysis, along with the standardized responses to each statement used in the study. Factor analysis selects one factor at a time, using procedures that "best fit" each factor to the data. The first factor selected is the one that fits the data in such a way that it explains more of the variance in the entire set of standardized scores than any other possible factor. Each factor selected after the first must be uncorrelated with the factors already selected. This process continues until the procedure cannot find additional factors that significantly reduce the unexplained variance in the standardized scores.
OUTPUT OF FACTOR ANALYSIS
Here, for simplicity, only the following six statements (or variables) will be used to explain the output of a factor analysis:
X1 = Mileage per liter should be high
X2 = Maintenance cost should be low
X3 = Mileage should be consistent on all types of roads
X4 = Appearance of motorcycle should be masculine
X5 = Multiple colors should be available
X6 = Brakes are the most important parts of a motorcycle
The results of a factor analysis of these six statements appear in the form shown in the following table, which can be used to illustrate the three important outputs of a factor analysis and how they can be of use to researchers.
TABLE 1: FACTOR ANALYSIS OUTPUT OF THE MOTORCYCLE STUDY

STATEMENT                        F1       F2       F3       COMMUNALITY
X1                               0.86     0.12     0.04     0.76
X2                               0.84     0.18     0.10     0.75
X3                               0.68     0.24     0.15     0.54
X4                               0.10     0.92     0.05     0.86
X5                               0.06     0.94     0.08     0.89
X6                               0.12     0.14     0.89     0.83
EIGENVALUES                      1.9356   1.8540   0.8351
EIGENVALUE / NO. OF STATEMENTS   0.3226   0.3090   0.1391
FACTOR LOADINGS
The top six rows of the table are associated with the six statements listed above. The table shows that the factor analysis has identified three factors (F1, F2, and F3), and the first three columns are associated with those factors. For example, the first factor can be written as
F1 = 0.86X1 + 0.84X2 + 0.68X3 + 0.10X4 + 0.06X5 + 0.12X6
The 18 numbers located in the six rows and three columns are called factor loadings, and they are one of the three useful outputs obtained from a factor analysis. The factor loading associated with a specific factor and a specific statement is simply the correlation between that factor and that statement's standardized response scores.
Thus, Table 1 shows that factor 1 is highly correlated with the responses to statement 1 (0.86 correlation) and also with the responses to statement 2 (0.84 correlation).
Table 1 also shows that statements 1 and 2 are not highly correlated with factor 2 (loadings of 0.12 and 0.18 respectively). Thus a factor loading is a measure of how well the factor "fits" the standardized responses to a statement.
NAMING THE FACTORS
From the table it is clear that factor F1 is a good fit on the data from statements 1, 2 and 3, but a poor fit on the other statements. This indicates that statements 1, 2 and 3 are probably measuring the same basic attitude or value system, and it is this finding that provides the researchers with evidence that a factor exists.
Using their knowledge of the industry and the contents of statements 1, 2 and 3, researchers from the motorcycle company subjectively concluded from these results that "economy of operation" was the factor that tied these statements together in the minds of the respondents.
The researchers next wanted to know whether the 300 respondents participating in the study mostly agreed or disagreed with statements 1, 2 and 3. To answer this question, they looked at the 300 standardized responses to each of these statements. They found that the means of these responses were +0.97, +1.32, and +1.18 for statements 1, 2 and 3 respectively, indicating that most respondents agreed with the three statements (see the discussion of standardized scores above). Since a majority of respondents agreed with these statements, the researchers also concluded that the factor of "economy of operation" was important in the minds of potential motorcycle customers.
The table also shows that F2 is a good fit on statements 4 and 5 but a poor fit on the other statements. This factor is clearly measuring something different from statements 1, 2, 3 and 6. Factor F3 is a good fit only on statement 6, so it is clearly measuring something not being measured by statements 1-5. The researchers again subjectively concluded that the factor underlying statements 4 and 5 was "comfort" and that statement 6 was related to "safety".
FIT BETWEEN DATA AND FACTOR
The researcher also has to find how well all of the identified factors fit the data obtained from all of the respondents on any given statement. The communality for each statement indicates the proportion of the variance in the responses to that statement which is explained by the three identified factors.
For example, the three factors explain 0.89 (89%) of the variance in the responses to statement 5, but only 0.54 (54%) of the variance in the responses to statement 3. The table shows that the three factors explain 75% or more of the variance associated with statements 1, 2, 4, 5, and 6, but only about half of statement 3's variance. Researchers can use these communalities to judge how well the factors fit the data; since the three factors account for most of the variance associated with each of the six statements in this example, they fit the data quite well.
How well any given factor fits the data from all of the respondents on all of the statements is indicated by its eigenvalue. There is an eigenvalue associated with each of the factors.
When a factor's eigenvalue is divided by the number of statements used in the factor analysis (here, 6), the resulting figure is the proportion of the variance in the entire set of standardized response scores that is explained by that factor. For example, factor F1 explains 0.3226 (32.26%) of the variance of the standardized response scores from all of the respondents on all six statements. Adding these figures for the three factors, the factors together explain 0.3226 + 0.3090 + 0.1391 = 0.7707 (77.07%) of the variance in the entire set of response data. This figure can be used as a measure of how well, overall, the identified factors fit the data. In general, a factor analysis that accounts for 60-70% or more of the total variance can be considered a good fit to the data.
LIMITATIONS
The utility of this technique depends to a large extent on the judgment of the researcher, who has to make a number of decisions that affect how the factor analysis will come out. Even with a given set of decisions, different results will emerge from different groups of respondents, different mixes of data, and different ways of collecting the data. In other words, factor analysis is unable to give a unique solution or result.
As with any other method of analysis, a factor analysis will be of little use if the appropriate variables have not been measured, if the measurements are inaccurate, or if the relationships in the data are non-linear.
In view of the foregoing limitations, the exploratory nature of factor analysis becomes clear. As Thurstone notes, factor analysis should not be used where fundamental and fruitful concepts are already well formulated and tested. It may be used especially in those domains where basic and fruitful concepts are essentially lacking and where crucial experiments have been difficult to conceive.
SUMMARY
This chapter has given an overview of factor analysis. Factor analysis is used to find latent variables, or factors, among observed variables. With factor analysis you can produce a small number of factors from a large number of variables. The derived factors can also be used in further analysis.

KEY WORDS
Factor analysis
Communalities
Factor loading
Correlation matrix
Eigenvalue

IMPORTANT QUESTIONS
20. What are the applications of factor analysis?
21. What is the significance of factor loading in factor analysis?
22. What do you mean by eigenvalue?




LESSON – 16
CONJOINT ANALYSIS
STRUCTURE
 Conjoint analysis
 Basics of conjoint analysis
 Steps involved in conjoint analysis
 Application of conjoint analysis
Conjoint analysis, also called multi-attribute compositional modelling, is a statistical technique that originated in mathematical psychology and was developed by marketing professor Paul Green at the Wharton School of the University of Pennsylvania. Today it is used in many of the social sciences and applied sciences, including marketing, product management, and operations research. The objective of conjoint analysis is to determine what combination of a limited number of attributes is most preferred by respondents. It is used frequently in testing customer acceptance of new product designs and in assessing the appeal of advertisements. It has been used in product positioning, but there are some problems with this application of the technique. Recently, new alternatives such as genetic algorithms have been used in market research.
THE BASICS OF CONJOINT ANALYSIS
The basics of conjoint analysis are easy to understand. It should only take about 20 minutes to introduce this topic so you can appreciate what conjoint analysis has to offer.
In order to understand conjoint analysis, let's look at a simple example. Suppose you wanted to book an airline flight and you had a choice of spending Rs.400 or Rs.700 for a ticket. If this were the only consideration then the choice is clear: the lower priced ticket is preferable. What if the only consideration in booking a flight was sitting in a regular or extra-wide seat? If seat size was the only consideration then you would probably prefer an extra-wide seat. Finally, suppose you can take either a direct flight which takes three hours or a flight that stops once and takes five hours. Virtually everyone would prefer the direct flight.
Conjoint analysis attempts to determine the relative importance consumers attach to salient attributes and the utilities they attach to the levels of attributes. This information is derived from consumers' evaluations of brands, or of brand profiles composed of these attributes and their levels. The respondents are presented with stimuli that consist of combinations of attribute levels. They are asked to evaluate these stimuli in terms of their desirability. Conjoint procedures attempt to assign values to the levels of each attribute, so that the resulting values or utilities attached to the stimuli match, as closely as possible, the input evaluations provided by the respondents. The underlying assumption is that any set of stimuli, such as products, brands, or stores, is evaluated as a bundle of attributes.
Conjoint Analysis is a technique that attempts to determine the relative importance consumers attach to salient attributes and the utilities they attach to the levels of attributes.
In a real purchase situation, however, consumers do not make choices based on a single attribute like comfort. Consumers examine a range of features or attributes and then make judgments or trade-offs to determine their final purchase choice. Conjoint analysis examines these trade-offs to determine the combination of attributes that will be most satisfying to the consumer. In other words, by using conjoint analysis a company can determine the optimal features for their product or service. In addition, conjoint analysis will identify the best advertising message by identifying the features that are most important in product choice.
Like multidimensional scaling, conjoint analysis relies on respondents' subjective evaluations. However, in MDS the stimuli are products or brands, whereas in conjoint analysis the stimuli are combinations of attribute levels determined by the researcher. The goal in MDS is to develop a spatial map depicting the stimuli in a multidimensional perceptual or preference space. Conjoint analysis, on the other hand, seeks to develop the part-worth or utility functions describing the utility consumers attach to the levels of each attribute. The two techniques are complementary.
In sum, the value of conjoint analysis is that it predicts what products or services people will choose and assesses the weight people give to various factors that underlie their decisions. As such, it is one of the most powerful, versatile and strategically important research techniques available.
STATISTICS AND TERMS ASSOCIATED WITH CONJOINT ANALYSIS
The important statistics and terms associated with conjoint analysis include:
• Part-worth functions. Also called utility functions, these describe the utility consumers attach to the levels of each attribute.
• Relative importance weights. Estimated weights that indicate which attributes are important in influencing consumer choice.
• Attribute levels. The values assumed by the attributes.
• Full profiles. Full or complete profiles of brands are constructed in terms of all the attributes, using the attribute levels specified by the design.
• Pairwise tables. The respondents evaluate two attributes at a time until all the required pairs of attributes have been evaluated.
• Cyclical designs. Designs employed to reduce the number of paired comparisons.
• Fractional factorial designs. Designs employed to reduce the number of stimulus profiles to be evaluated in the full-profile approach.
• Orthogonal arrays. A special class of fractional designs that enable the efficient estimation of all main effects.
• Internal validity. This involves correlations of the predicted evaluations for the holdout or validation stimuli with those obtained from the respondents.
CONDUCTING CONJOINT ANALYSIS
The following list shows the steps in conjoint analysis. Formulating the problem involves identifying the salient attributes and their levels. These attributes and levels are used for constructing the stimuli to be used in a conjoint evaluation task.
1. FORMULATE THE PROBLEM
2. CONSTRUCT THE STIMULI
3. DECIDE ON THE FORM OF INPUT DATA
4. SELECT A CONJOINT ANALYSIS PROCEDURE
5. INTERPRET THE RESULTS
6. ASSESS RELIABILITY AND VALIDITY
STEPS INVOLVED IN CONJOINT ANALYSIS
The Basic steps are:
• Select features to be tested
• Show product feature combinations to potential customers.
• Have respondents rank, rate, or choose between the combinations.
• Input the data from a representative sample of potential customers into a statistical software program and choose the conjoint analysis procedure. The software will produce utility functions for each of the features.
• Incorporate the most preferred features into a new product or advertisement.
Any number of algorithms may be used to estimate utility functions. The original methods were monotonic analysis of variance or linear programming techniques, but these are largely obsolete in contemporary marketing research practice. Far more popular are Hierarchical Bayesian procedures that operate on choice data. These utility functions indicate the perceived value of the feature and how sensitive consumer perceptions and preferences are to changes in product features.
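As a simple illustration of the estimation step, the sketch below recovers part-worths by ordinary least squares on dummy-coded attributes. This is a deliberately simplified stand-in for the procedures named above, and the ratings are hypothetical; the profiles correspond to the flight example discussed in the next section.

import numpy as np

# Eight flight profiles, dummy-coded:
# extra-wide seat (1/0), price Rs400 (1/0), 3-hour duration (1/0)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
              [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1]], dtype=float)
ratings = np.array([4, 6, 7, 10, 1, 3, 5, 8], dtype=float)   # hypothetical preference ratings

A = np.column_stack([np.ones(len(X)), X])          # add an intercept column
coef, *_ = np.linalg.lstsq(A, ratings, rcond=None)
print(coef[1:])   # estimated part-worths for wide seat, low price, short duration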
A PRACTICAL EXAMPLE OF CONJOINT ANALYSIS
Conjoint analysis presents choice alternatives between products/services defined by sets of attributes. This is illustrated by the following choice: would you prefer a flight with regular seats, that costs Rs.400 and takes 5 hours, or a flight which costs Rs.700, has extra-wide seats and takes 3 hours?
Extending this, we see that if seat comfort, price and duration are the only relevant attributes, there are potentially eight flight choices.
CHOICE SEAT COMFORT PRICE DURATION
1 EXTRA-WIDE RS700 5 HOURS
2 EXTRA-WIDE RS700 3 HOURS
3 EXTRA-WIDE RS400 5 HOURS
4 EXTRA-WIDE RS400 3 HOURS
5 REGULAR RS700 5 HOURS
6 REGULAR RS700 3 HOURS
7 REGULAR RS400 5 HOURS
8 REGULAR RS400 3 HOURS
Given the above alternatives, product 4 is very likely the most preferred choice while product 5 is probably the least preferred product. The preference for the other choices is determined by what is important to that individual.
Conjoint analysis can be used to determine the relative importance of each attribute, attribute level, and combinations of attributes. If the most preferable product is not feasible for some reason (perhaps the airline simply cannot provide extra-wide seats and a 3 hour arrival time at a price of Rs400) then the conjoint analysis will identify the next most preferred alternative. If you have other information on travelers, such as background demographics, you might be able to identify market segments for which distinct products may be appealing. For example, the business traveller and the vacation traveller may have very different preferences which could be met by distinct flight offerings.
You can now see the value of conjoint analysis. Conjoint analysis allows the researcher to examine the trade-offs that people make in purchasing a product. This allows the researcher to design products/services that will be most appealing to a specific market. In addition, because conjoint analysis identifies important attributes, it can be used to create advertising messages that will be most persuasive.
In evaluating products, consumers will always make trade-offs. A traveller may like the comfort and arrival time of a particular flight, but reject purchase due to the cost. In this case, cost has a high utility value. Utility can be defined as a number which represents the value that consumers place on an attribute. In other words, it represents the relative "worth" of the attribute. A low utility indicates less value; a high utility indicates more value.
The following figure presents a list of hypothetical utilities for an individual consumer:
DURATION UTILITY
3 HOURS 42
5 HOURS 22

COMFORT UTILITY
EXTRA-WIDE SEATS 15
REGULAR SEATS 12

COST UTILITY
RS. 400 61
RS. 700 5
Based on these utilities, we can make the following conclusions:
• This consumer places a greater value on a 3 hour flight (the utility is 42) than on a 5 hour flight (utility is 22).
• This consumer does not differ much in the value that he or she places on comfort. That is, the utilities are quite close (12 vs. 15).
• This consumer places a much higher value on a price of Rs.400 than a price of Rs.700.
• The preceding example depicts an individual's utilities. Average utilities can be calculated for all consumers or for specific subgroups of consumers.
These utilities also tell us the extent to which each of these attributes drives the decision to choose a particular flight. The importance of an attribute can be calculated by examining the range of utilities (that is, the difference between the lowest and highest utilities) across all levels of the attribute. That range represents the maximum impact that the attribute can contribute to a product.
Using the hypothetical utilities presented earlier, we can calculate the relative importance of each of the three attributes. The range for each attribute is given below:
• Duration: Range = 20 (42-22)
• Comfort : Range = 3 (15-12)
• Cost : Range = 56 (61-5)
These ranges tell us the relative importance of each attribute. Cost is the most important factor in product purchase as it has the highest range of utility values. Cost is followed in importance by the duration of the flight. Based on the range and value of the utilities, we can see that seat comfort is relatively unimportant to this consumer. Therefore, advertising which emphasizes seat comfort would be ineffective. This person will make his or her purchase choice based mainly on cost and then on the duration of the flight.
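The range-based importance calculation can be reproduced in a few lines of Python, using the hypothetical utilities above.

utilities = {"duration": (42, 22), "comfort": (15, 12), "cost": (61, 5)}
ranges = {a: max(u) - min(u) for a, u in utilities.items()}        # 20, 3, 56
total = sum(ranges.values())
importance = {a: round(100 * r / total, 1) for a, r in ranges.items()}
print(importance)   # relative importance (%): cost > duration > comfort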
Marketers can use the information from utility values to design products and/or services which come closest to satisfying important consumer segments. Conjoint analysis will identify the relative contributions of each feature to the choice process. This technique, therefore, can be used to identify market opportunities by exploring the potential of product feature combinations that are not currently available.
CHOICE SIMULATIONS
In addition to providing information on the importance of product features, conjoint analysis provides the opportunity to conduct computer choice simulations. Choice simulations reveal consumer preference for specific products defined by the researcher. In this case, simulations will identify successful and unsuccessful flight packages before they are introduced to the market!
For example, let's say that the researcher defined three flights as follows:
FLIGHT 1: RS.300 5 HOURS 2 STOPS MEAL
FLIGHT 2: RS.400 4 HOURS 1 STOP SNACK
FLIGHT 3: RS.500 3 HOURS DIRECT NO MEAL
The conjoint simulation will indicate the percentage of consumers that prefer each of the three flights. The simulation might show that consumers are willing to travel longer if they can pay less and are provided a meal. Simulations allow the researcher to estimate preference, sales and share for new flights before they come to market.
Simulations can be done interactively on a computer to quickly and easily look at all possible options. The researcher may, for example, want to determine whether a price change of Rs50, Rs100, or Rs150 will influence consumers' choice. Conjoint analysis will also let the researcher look at interactions among attributes. For example, consumers may be willing to pay Rs50 more for a flight on the condition that they are provided with a hot meal rather than a snack.
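One common way to run such a simulation is a "first choice" rule: each respondent is assigned to the flight whose total utility is highest, and shares are tallied across respondents. The sketch below uses randomly generated stand-ins for estimated respondent utilities.

import numpy as np

rng = np.random.default_rng(3)
# total utility of each of the three defined flights, per respondent (hypothetical)
utilities = rng.normal(loc=[50, 55, 45], scale=10, size=(400, 3))
choices = utilities.argmax(axis=1)                    # each respondent's preferred flight
shares = np.bincount(choices, minlength=3) / len(choices)
print(shares)                                         # estimated preference share per flight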
DATA COLLECTION
Respondents are shown a set of products, prototypes, mock-ups or pictures. Each example is similar enough that consumers will see them as close substitutes, but dissimilar enough that respondents can clearly determine a preference. Each example is composed of a unique combination of product features. The data may consist of individual ratings, rank-orders, or preferences among alternative combinations. The latter is referred to as "choice based conjoint" or "discrete choice analysis."
In order to conduct a conjoint analysis, information must be collected from a sample of consumers. This data can be conveniently collected in locations such as shopping centers or by the Internet. In the previous example, data collection could take place at a booth located in an airport or in the office of a travel agent.
A sample size of 400 is generally sufficient to provide reliable data for consumer products or services. Data collection involves showing respondents a series of cards that contain a written description of the product or service. If a consumer product is being tested then a picture of the product can be included along with a written description. A typical card examining the business traveller might look like the following:
"ON YOUR NEXT BUSINESS FLIGHT OVERSEAS, HOW LIKELY WOULD YOU BE TO CHOOSE A FLIGHT THAT HAS ALL THE FOLLOWING CHARACTERISTICS? PLEASE CIRCLE THE APPROPRIATE NUMBER FROM 1 TO 10 TO INDICATE YOUR FEELINGS."
• ONE STOP EN ROUTE
• EXTRA-WIDE SEATS
• DEPARTURE TIME: BEFORE 8:00 AM
• "DOUBLE" MILEAGE POINTS
• RS200 FEE TO CHANGE TICKET

WOULD NEVER CHOOSE THIS FLIGHT   1  2  3  4  5  6  7  8  9  10   WOULD DEFINITELY CHOOSE THIS FLIGHT

Readers might be worried at this point about the total number of cards that need to be rated by a single respondent. Fortunately, we are able to use statistical manipulations (such as the fractional factorial designs mentioned earlier) to cut down on the number of cards. In a typical conjoint study, respondents only need to rate between 10 and 20 cards.
This data would be input to the conjoint analysis. Utilities can then be calculated and simulations can be performed to identify which products will be successful and which should be changed. Price simulations can also be conducted to determine sensitivity of the consumer to changes in prices.
A wide variety of companies and service organizations have successfully used conjoint analysis. For example, a natural gas utility used conjoint analysis to evaluate which advertising message would be most effective in convincing consumers to switch from other energy sources to natural gas. Previous research had failed to discover customers' specific priorities; it appeared that the trade-offs people made were quite subtle.
A conjoint analysis was developed using a number of attributes such as saving on energy bills, efficiency rating of equipment, safety record of energy source, and dependability of energy source. The conjoint analysis identified that cost savings and efficiency were the main reasons for converting appliances to gas; the third most important reason was cleanliness of energy source. This information was used in marketing campaigns in order to have the greatest effect.
ADVANTAGES
• Able to use physical objects
• Measures preferences at the individual level
• Estimates the psychological trade-offs that consumers make when evaluating several attributes together
DISADVANTAGES
• Only a limited set of features can be used, because the number of combinations increases very quickly as more features are added.
• The information-gathering stage is complex.
• It is difficult to use for product positioning research, because there is no procedure for converting perceptions about actual features into perceptions about a reduced set of underlying features.
• Respondents may be unable to articulate attitudes toward new categories.
APPLICATIONS OF CONJOINT ANALYSIS
Conjoint analysis has been used in marketing for a variety of purposes, including:
Determining the relative importance of attributes in the consumer choice process. A standard output from conjoint analysis consists of derived relative importance weights for all the attributes used to construct the stimuli used in the evaluation task. The relative importance weights indicate which attributes are important in influencing consumer choice.

Estimating market share of brands that differ in attribute levels. The utilities derived from conjoint analysis can be used as input into a choice simulator to determine the share of choices, and hence the market share, of different brands.
Determining the composition of the most-preferred brand. The brand features can be varied in terms of attribute levels and the corresponding utilities determined. The brand features that yield the highest utility indicate the composition of the most-preferred brand.
Segmenting the market based on similarity of preferences for attribute levels. The part-worth functions derived for the attributes may be used as a basis for clustering respondents to arrive at homogeneous preference segments.
Applications of conjoint analysis have been made in consumer goods, industrial goods, and financial and other services. Moreover, these applications have spanned all areas of marketing. A recent survey of conjoint analysis reported applications in the areas of new product/concept identification, competitive analysis, pricing, market segmentation, advertising, and distribution.
SUMMARY
This chapter has given an overview of conjoint analysis. Conjoint analysis is also called multi-attribute compositional modelling. Today it is used in many of the social sciences and applied sciences, including marketing, product management, and operations research.
KEY TERMS
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy
• Residuals
• Scree plot
• Conjoint analysis
IMPORTANT QUESTIONS
1. What are the applications of conjoint analysis?
2. Explain the procedure of performing Conjoint analysis with one practical example.
REFERENCE BOOKS
23. Robert Ferber, Marketing Research, New York: McGraw Hill, Inc. 1948.
24. Dennis Child, The Essentials of Factor Analysis, New York, 1973.
25. Cooley, William. W., and Lohnes, Paul R., Multivariate Data Analysis, New York: John Wiley and Sons. 1971.





LESSON – 17
STATISTICAL SOFTWARE
OBJECTIVES
• To learn the application of various statistical packages used for the management research process
• To understand the procedures for performing the tests using SPSS
STRUCTURE
 Statistical packages
 Statistical analysis using SPSS
 t-test, F-test, chi-square test, Anova
 Factor analysis
STATISTICAL PACKAGES
The following statistical software packages are widely used:
• STATA,
• SPSS,
• SAS.
STATA
Stata, created in 1985 by StataCorp, is a statistical program used by many businesses and academic institutions around the world. Most of its users work in research, especially in the fields of economics, sociology, political science, and epidemiology.
Stata's full range of capabilities includes:
• Data management
• Statistical analysis
• Graphics
• Simulations
• Custom programming
SPSS
The computer program SPSS (originally, Statistical Package for the Social Sciences) was released in its first version in 1968, and is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, and others. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation are features of the base software.
The many features of SPSS are accessible via pull-down menus or can be programmed with a proprietary 4GL "command syntax language". Command syntax programming has the benefits of reproducibility and of handling complex data manipulations and analyses.
Solve business and research problems using SPSS for Windows, a statistical and data management package for analysts and researchers.
SPSS for Windows provides you with a broad range of capabilities for the entire analytical process. With SPSS, you can generate decision-making information quickly using powerful statistics, understand and effectively present the results with high-quality tabular and graphical output, and share the results with others using a variety of reporting methods, including secure Web publishing. Results from the data analysis enable you to make smarter decisions more quickly by uncovering key facts, patterns, and trends. An optional server version delivers enterprise-strength scalability, additional tools, security, and enhanced performance
SPSS can be used for Windows in a variety of areas, including:
• Survey and market research and direct marketing
• Academia
• Administrative research
• Medical, scientific, clinical, and social science research
• Planning and forecasting
• Quality improvement
• Reporting and ad hoc decision making
• Enterprise-level analytic application development
In particular, apply SPSS statistics software to gain greater insight into the actions, attributes, and attitudes of people—the customers, employees, students, or citizens.
ADD MORE FUNCTIONALITY AS YOU NEED IT
SPSS for Windows is a modular, tightly integrated, full-featured product line for the analytical process—planning, data collecting, data access, data management and preparation, data analysis, reporting, and deployment. Using a combination of add-on modules and stand-alone software that work seamlessly with SPSS Base enhances the capabilities of this statistics software. The intuitive interface makes it easy to use—yet it gives you all of the data management, statistics, and reporting methods you need to do a wide range of analysis.
GAIN UNLIMITED PROGRAMMING CAPABILITIES
Dramatically increase the power and capabilities of SPSS for Windows by using the SPSS Programmability Extension. This feature enables analytic and application developers to extend the SPSS command syntax language to create procedures and applications—and perform even the most complex jobs—within SPSS. The SPSS Programmability Extension is included with SPSS Base, making this statistics software an even more powerful solution.
MAXIMIZE MARKET OPPORTUNITIES
The more competitive and challenging the business environment, the more you need market research. Market research is the systematic and objective gathering, analysis, and interpretation of information. It helps the organization identify problems and opportunities and allows for better-informed, lower-risk decisions.
For decades, solutions from SPSS Inc. have added value for those involved in market research. SPSS solutions support the efficient gathering of market research information through many different methods, and make it easier to analyze and interpret this information and provide it to decision makers.
We offer solutions both to companies that specialize in providing market research services and to organizations that conduct their own market research.
SPSS market research solutions help you:
• Understand the market perception of the brand
• Conduct effective category management
• Confidently develop product features
• Perform competitive analysis
With this insight, you or the clients can confidently make decisions about developing and marketing the products and enhancing the brand.
The heart of SPSS' market research solution is the Dimensions product family. Through Dimensions, the organization can centralize the creation and fielding of surveys in any mode and in any language, as well as the analysis and reporting phases of the research.
Dimensions data can be directly accessed using SPSS for Windows, which enables the analysts to use SPSS’ advanced statistical and graphing capabilities to explore the survey data. Add-on modules and integrated stand-alone products extend SPSS’ analytical and reporting capabilities. For example, analyze responses to open-ended survey questions with SPSS Text Analysis for Surveys.
Maximize the value the organization receives from its Dimensions data by using an enterprise feedback management (EFM) solution from SPSS. EFM provides you with a continuous means of incorporating regular customer insight into the business operations. Engage with current or prospective customers through targeted feedback programs or by asking questions during naturally occurring events. Then use the resulting insights to drive business improvement across the organization. SPSS’ EFM solution also enables you to integrate the survey data with transactional and operational data, so you gain a more accurate, complete understanding of customer preferences, motivations, and intentions.
Thanks to the integration among SPSS offerings, you can incorporate insights gained through survey research in the predictive models created by the data mining tools. You can then deploy predictive insight and recommendations to people and to automated systems through any of the predictive analytics applications.
SAS
The SAS System, originally Statistical Analysis System, is an integrated system of software products provided by SAS Institute that enables the programmer to perform:
• Data entry, retrieval, management, and mining
• Report writing and graphics
• Statistical and mathematical analysis
• Business planning, forecasting, and decision support
• Operations research and project management
• Quality improvement
• Applications development
• Warehousing (extract, transform, load)
• Platform independent and remote computing
In addition, the SAS System integrates with many SAS business solutions that enable large scale software solutions for areas such as human resource management, financial management, business intelligence, customer relationship management and more.
STATISTICAL ANALYSES USING SPSS
INTRODUCTION
This section shows how to perform a number of statistical tests using SPSS. Each section gives a brief description of the aim of the statistical test, when it is used, an example showing the SPSS commands and SPSS (often abbreviated) output with a brief interpretation of the output. In deciding which test is appropriate to use, it is important to consider the type of variables that you have (i.e., whether your variables are categorical, ordinal or interval and whether they are normally distributed).
STATISTICAL METHODS USING SPSS
ONE SAMPLE T-TEST
A one sample t-test allows us to test whether a sample mean (of a normally distributed interval variable) significantly differs from a hypothesized value. For example, using the data file, say we wish to test whether the average writing score (write) differs significantly from 50. We can do this as shown below.
t-test
/testval = 50
/variable = write.


The mean of the variable write for this particular sample of students is 52.775, which is statistically significantly different from the test value of 50. We would conclude that this group of students has a significantly higher mean on the writing test than 50.
ONE SAMPLE MEDIAN TEST
A one sample median test allows us to test whether a sample median differs significantly from a hypothesized value. We will use the same variable, write, as we did in the one sample t-test example above, but we do not need to assume that it is interval and normally distributed (we only need to assume that write is an ordinal variable). However, we are unaware of how to perform this test in SPSS.
BINOMIAL TEST
A one sample binomial test allows us to test whether the proportion of successes on a two-level categorical dependent variable significantly differs from a hypothesized value. For example, using the data file, say we wish to test whether the proportion of females (female) differs significantly from 50%, i.e., from .5. We can do this as shown below.
npar tests
/binomial (.5) = female.

The results indicate that there is no statistically significant difference (p = .229). In other words, the proportion of females in this sample does not significantly differ from the hypothesized value of 50%.
CHI-SQUARE GOODNESS OF FIT
A chi-square goodness of fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions. For example, let's suppose that we believe that the general population consists of 10% Hispanic, 10% Asian, 10% African American and 70% White folks. We want to test whether the observed proportions from our sample differ significantly from these hypothesized proportions.
npar test
/chisquare = race
/expected = 10 10 10 70.


These results show that racial composition in our sample does not differ significantly from the hypothesized values that we supplied (chi-square with three degrees of freedom = 5.029, p = .170).
TWO INDEPENDENT SAMPLES T-TEST
An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups. For example, using the data file, say we wish to test whether the mean of write is the same for males and females.
t-test groups = female(0 1)
/variables = write.

The results indicate that there is a statistically significant difference between the mean writing score for males and females (t = -3.734, p = .000). In other words, females have a statistically significantly higher mean score on writing (54.99) than males (50.12).
CHI-SQUARE TEST
A chi-square test is used when you want to see if there is a relationship between two categorical variables. In SPSS, the chi-square statistic is requested through the CROSSTABS procedure, which produces the test statistic and its associated p-value. Let's see if there is a relationship between the type of school attended (schtyp) and students' gender (female). Remember that the chi-square test assumes that the expected value for each cell is five or higher. This assumption is easily met in the examples below. However, if this assumption is not met in your data, please see the section on Fisher's exact test below.


These results indicate that there is no statistically significant relationship between the type of school attended and gender (chi-square with one degree of freedom = 0.047, p = 0.828).
Let's look at another example, this time looking at the linear relationship between gender (female) and socio-economic status (ses). The point of this example is that one (or both) variables may have more than two levels, and that the variables do not have to have the same number of levels. In this example, female has two levels (male and female) and ses has three levels (low, medium and high).


Again we find that there is no statistically significant relationship between the variables (chi-square with two degrees of freedom = 4.577, p = 0.101).
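Since the SPSS commands and output are not reproduced above, a roughly equivalent computation can be sketched in Python with SciPy; the cell counts here are hypothetical.

from scipy.stats import chi2_contingency

table = [[45, 60], [50, 45]]                  # e.g., school type (rows) by gender (columns)
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p, dof)                           # a relationship is indicated only if p < .05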
FISHER'S EXACT TEST
Fisher's exact test is used when you want to conduct a chi-square test but one or more of your cells has an expected frequency of five or less. Remember that the chi-square test assumes that each cell has an expected frequency of five or more, whereas Fisher's exact test has no such assumption and can be used regardless of how small the expected frequency is. In SPSS, unless you have the SPSS Exact Tests module, you can only perform Fisher's exact test on a 2x2 table, and these results are presented by default. Please see the results from the chi-square example above.
ONE-WAY ANOVA
A one-way analysis of variance (ANOVA) is used when you have a categorical independent variable (with two or more categories) and a normally distributed interval dependent variable and you wish to test for differences in the means of the dependent variable broken down by the levels of the independent variable. For example, using the data file, say we wish to test whether the mean of write differs between the three program types (prog).

The mean of the dependent variable differs significantly among the levels of program type. However, we do not know if the difference is between only two of the levels or all three of the levels. (The F test for the Model is the same as the F test for prog because prog was the only variable entered into the model. If other variables had also been entered, the F test for the Model would have been different from prog.) To see the mean of write for each level of program type, the group means can be displayed (output not reproduced here).
From this we can see that the students in the academic program have the highest mean writing score, while students in the vocational program have the lowest.
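A compact Python illustration of the one-way ANOVA follows; the three groups are synthetic stand-ins for the program types, with assumed means and group sizes.

import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
general = rng.normal(51, 9, 45)      # hypothetical 'write' scores, general program
academic = rng.normal(56, 8, 105)    # hypothetical scores, academic program
vocation = rng.normal(47, 9, 50)     # hypothetical scores, vocational program

F, p = f_oneway(general, academic, vocation)
print(f"F = {F:.3f}, p = {p:.4f}")
for name, group in [("general", general), ("academic", academic), ("vocation", vocation)]:
    print(name, round(group.mean(), 2))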
DISCRIMINANT ANALYSIS
Discriminant analysis is used when you have one or more normally distributed interval independent variables and a categorical dependent variable. It is a multivariate technique that considers the latent dimensions in the independent variables for predicting group membership in the categorical dependent variable.



Clearly, the SPSS output for this procedure is quite lengthy, and it is beyond the scope of this page to explain all of it. However, the main point is that two canonical variables are identified by the analysis, the first of which seems to be more related to program type than the second.
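As a rough stand-in for the SPSS procedure, scikit-learn's linear discriminant analysis can extract the two canonical functions; the scores below are random placeholders, so the output illustrates only the mechanics, not a real solution.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = rng.normal(52, 9, size=(200, 5))     # placeholder read/write/math/science/socst scores
y = rng.integers(1, 4, size=200)         # placeholder program type: 1, 2 or 3

lda = LinearDiscriminantAnalysis(n_components=2)   # two canonical discriminant functions
scores = lda.fit_transform(X, y)
print(scores[:5])                         # cases projected onto the two functions
print(lda.explained_variance_ratio_)      # share of discrimination carried by each function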
FACTOR ANALYSIS
Factor analysis is a form of exploratory multivariate analysis that is used to either reduce the number of variables in a model or to detect relationships among variables. All variables involved in the factor analysis need to be interval and are assumed to be normally distributed. The goal of the analysis is to try to identify factors which underlie the variables. There may be fewer factors than variables, but there may not be more factors than variables. For our example, let's suppose that we think that there are some common factors underlying the various test scores. We will include subcommands for varimax rotation and a plot of the eigenvalues. We will use a principal components extraction and will retain two factors. (Using these options makes our results compatible with those from SAS and Stata; they are not necessarily the options that you will want to use.)

Communality (which is the opposite of uniqueness) is the proportion of variance of the variable (i.e., read) that is accounted for by all of the factors taken together, and a very low communality can indicate that a variable may not belong with any of the factors. The scree plot may be useful in determining how many factors to retain. From the component matrix table, we can see that all five of the test scores load onto the first factor, while all five tend to load not so heavily on the second factor. The purpose of rotating the factors is to get the variables to load either very high or very low on each factor. In this example, because all of the variables loaded onto factor 1 and not on factor 2, the rotation did not aid in the interpretation. Instead, it made the results even more difficult to interpret.
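For comparison, a hedged Python sketch of a two-factor solution with varimax rotation is given below. Note that scikit-learn's FactorAnalysis (version 0.24 or later for the rotation argument) uses a maximum-likelihood style extraction rather than principal components, so its loadings will differ somewhat from the SPSS output described above; the data here are synthetic.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 1))                         # one synthetic underlying factor
loadings = rng.normal(1.0, 0.2, size=(1, 5))
X = latent @ loadings + rng.normal(0, 0.5, size=(200, 5))  # five correlated test scores

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
print(fa.components_.T)                        # loadings of the five scores on two factors
print((fa.components_ ** 2).sum(axis=0))       # communality: variance explained per score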
SUMMARY
The statistical packages applied in the management research process are SPSS, SAS and Stata. These packages make the research process more efficient: they reduce the time taken for analysis, and even large datasets can be analyzed easily.
The chapter also gave detailed procedures for, and interpretations of, SPSS output for common statistical tests.
KEY TERMS
• SPSS
• SAS
• STATA
• Dependent variable
• Independent variable
REVIEW QUESTIONS
1. Define SPSS
2. What do you mean by STATA and SAS
3. List out the application of statistical software to the market research
4. Define dependent variable
5. Define independent variable
6. Differentiate the relative frequency and cumulative frequency with suitable examples.

LESSON – 18
APPLICATION OF RESEARCH TOOLS
OBJECTIVES
• To identify the areas of management in which Research tools can be used
• To understand the application of various research tools in the following domains of Management
Marketing Management
Operations Management
Human Resources Management
STRUCTURE
 Application of marketing research
 Concept of market potential
 Techniques of perceptual mapping
 Limitation of Marketing Research
 Methods of Forecasting
 Statistical Methods
INTRODUCTION
Research methodology has become a necessary tool in all functional areas of management, such as Marketing, Human Resources and Operations Management. There is an increasing realisation of its importance in various quarters, reflected in the growing use of research methodology across management domains. A brief description of typical applications is given below:
APPLICATIONS OF MARKETING RESEARCH
Applications of marketing research can be divided into two broad areas:
1. Strategic
2. Tactical
Among the strategic areas, marketing research applications would be demand forecasting, sales forecasting, segmentation studies, identification of target markets for a given product, and positioning strategies identification.
In the second area of tactical applications, we would have applications such as product testing, pricing research, advertising research, promotional research, distribution and logistics related research. In other words, it would include research related to all the 'P's of marketing: how much to price the product, how to distribute it, whether to package it in one way or another, what time to offer a service, consumer satisfaction with respect to the different elements of the marketing mix (product, price, promotion, distribution), and so on. In general, we would find more tactical applications than strategic applications because these areas can be fine-tuned more easily, based on the marketing research findings. Obviously, strategic changes are likely to be fewer than tactical changes. Therefore, the need for information would be in proportion to the frequency of changes.
The following list is a snapshot of the kind of studies that have actually been done in India.
• A study of consumer buying habits for detergents - frequency, pack size, effect of promotions, brand loyalty and so forth
• To find out the potential demand for ready-to-eat chapattis in Mumbai city
• To determine which of three proposed ingredients - tulsi, coconut oil or neem - the consumer would like to have in a toilet soap
• To find out what factors would affect the sales of Flue Gas Desulphurization equipment (industrial pollution control equipment)
• To find out the effectiveness of the advertising campaign for a car brand
• To determine brand awareness and brand loyalty for a branded PC (Personal Computer)
• To determine the appropriate product mix, price level, and target market for a new restaurant
• To find the customer satisfaction level among consumers of an Internet service provider
• To determine the factors which influenced consumers in choosing a brand of cellular phone handset
• To find out the TV viewing preferences of the target audience in specific time slots in early and late evenings
As the list shows, marketing research tackles a wide variety of subjects. The list is only indicative, and the applications of marketing research in reality can be useful for almost any major decision related to marketing. The next sections discuss some typical application areas.
CONCEPT RESEARCH
During a new product launch, there would be several stages-for example, concept development, concept testing, prototype development and testing, test marketing in a designated city or region, estimation of total market size based on the test marketing, and then a national rollout or withdrawal of the product based on the results.
The first stage is the development of a concept and its testing. The concept for a new product may come from several sources-the idea may be from a brain-storming session consisting of company employees, a focus group conducted among consumers, or the brainwave of a top executive. Whatever may be its source, it is generally researched further through what is termed as concept testing, before it goes into prototype or product development stages.
A concept test takes the form of developing a description of the product, its benefits, how to use it, and so on, in about a paragraph, and then asking potential consumers to rate how much they like the concept, how much they would be willing to pay for the product if introduced, and similar questions. As an example, the concept statement for a fabric softener may read as follows:
This fabric softener cum whitener is to be added to the wash cycle in a machine or to the bucket of detergent in which clothes are soaked. Only a few drops of this liquid will be needed per wash to whiten white clothes and also soften them by eliminating static charge. It will be particularly useful for woollens, undergarments and baby's or children's clothes. It will have a fresh fragrance, and will be sold in handy 200 ml bottles to last about a month. It can also replace all existing 'blues' with the added benefit of a softener.
This statement can be used to survey existing customers of 'blues' and whiteners, and we could ask customers for their reactions on pack size, pricing, colour of the liquid, ease of use, and whether or not they would buy such a product. More complex concept tests can be done using Conjoint Analysis where specific levels of price or product/service features to be offered are pre-determined and reactions of consumers are in the form of ratings given to each product concept combining various features. This is then used to make predictions about which product concepts would provide the highest utility to the consumer, and to estimate market shares of each concept. The technique of Conjoint Analysis is discussed with an example in Part II of the book.
PRODUCT RESEARCH
Apart from product concepts, research helps to identify which alternative packaging is most preferred, or what drives a consumer to buy a brand or product category itself, and specifics of satisfaction or dissatisfaction with elements of a product. These days, service elements are as important as product features, because competition is bringing most products on par with each other.
An example of product research would be to find out the reactions of consumers to manual cameras versus automatic cameras. In addition to specific likes or dislikes for each product category, brand preferences within the category could form a part of the research. The objectives may be to find out what type of camera to launch and how strong the brand salience for the sponsor's brand is.
Another example of product research could be to find out from existing users of photocopiers (both commercial and corporate), whether after-sales service is satisfactory, whether spare parts are reasonably priced, and easily available, and any other service improvement ideas-for instance, service contracts, leasing options or buy-backs and trade-ins.
The scope of product research is immense, and includes products or brands at various stages of the product life cycle-introduction, growth, maturity, and decline. One particularly interesting category of research is into the subject of brand positioning. The most commonly used technique for brand positioning studies (though not the only one) is called Multidimensional Scaling. This is covered in more detail with an example and case studies in Part II as a separate chapter.
PRICING RESEARCH
Pricing is an important part of the marketing plan. In the late nineties in India, some interesting changes have been tried by marketers of various goods and services. Newer varieties of discounting practices including buy-backs, exchange offers, and straight discounts have been offered by many consumer durable manufacturers-notably AKAI and AIWA brands of TVs. Most FMCG (fast moving consumer goods) manufacturers/marketers of toothpaste, toothbrush, toilet soap, talcum powder have offered a variety of price-offs or premium-based offers which affect the effective consumer price of a product.
Pricing research can delve into questions such as appropriate pricing levels from the customers' point of view, or the dealer's point of view. It could try to find out how the current price of a product is perceived, whether it is a barrier for purchase, how a brand is perceived with respect to its price and relative to other brands' prices (price positioning). Here, it is worth remembering that price has a functional role as well as a psychological role. For instance, a high price may be an indicator of high quality or high esteem value for certain customer segments. Therefore, questions regarding price may need careful framing, and careful interpretation during the analysis.
Associating price with value is a delicate task, which may require indirect methods of research at times. A bland question such as
"Do you think the price of Brand A of refrigerators is appropriate?" mayor may not elicit true responses from customers. It is also not easy for a customer to articulate the price he would be willing to pay for convenience of use, easy product availability, good after-sales service, and other elements of the marketing mix. It may require experience of several pricing-related studies before one begins to appreciate the nuances of consumer behaviour related to price as a functional and psychological measure of the value of a product offering.
An interesting area of research into pricing has been determining price elasticity at various price points for a given brand through experiments or simulations. Price framing, or what the consumer compares (frames) price against, is another area of research. For example, one consumer may compare the price of a car against an expensive two-wheeler (his frame of reference), whereas another may compare it with an investment in the stock market or real estate. Another example might be the interest earned from a fixed deposit, which serves as a benchmark for one person before he decides to invest in a mutual fund, whereas for another, the investment may be a substitute for buying gold, which earns no interest. In many cases, therefore, it is the frame of reference used by the customer which determines 'value' for him of a given product. There are tangible as well as intangible (and sometimes not discernible) aspects to a consumer's evaluation of price. Some of the case studies at the end of Part I include pricing or price-related issues as part of the case.
DISTRIBUTION RESEARCH
Traditionally, most marketing research focuses on consumers or buyers. Sometimes this extends to potential buyers or those who were buyers but have switched to other brands. But right now, there is a renewed interest in the entire area of logistics, supply chain, and customer service at dealer locations. There is also increasing standardisation from the point of view of brand building, in displays at the retail level, promotions done at the distribution points. Distribution research focuses on various issues related to the distribution of products including service levels provided by current channels, frequency of salespeople visits to distribution points, routing/transport related issues for deliveries to and from distribution points throughout the channel, testing of new channels, channel displays, linkages between displays and sales performance, and so on. As an example, a biscuit manufacturer wanted to know how it could increase sales of a particular brand of biscuits in cinema theatres. Should it use existing concessionaires selling assorted goods in theatres, or work out some exclusive arrangements? Similarly, a soft drink manufacturer may want to know where to set up vending machines. Potential sites could include roadside stalls, shopping malls, educational institutions, and cinema theatres. Research would help identify factors that would make a particular location a success.
In many service businesses where a customer has to visit the location, it becomes very important to research the location itself. For example, a big hotel or a specialty restaurant may want to know where to locate themselves for better visibility and occupancy rates. Distribution research helps answer many of these questions and thereby make better marketing decisions.
ADVERTISING RESEARCH
The two major categories of research in advertising are:
1. Copy
2. Media
COPY TESTING
This is a broad term that includes research into all aspects of advertising-brand awareness, brand recall, copy recall (at various time periods such as day after recall, week after recall), recall of different parts of the advertisement such as the headline for print ads, slogan or jingle for TV ads, the star in an endorsement and so on. Other applications include testing alternative ad copies (copy is the name given to text or words used in the advertisement, and the person in the advertising agency responsible for writing the words is known as the copy writer) for a single ad, alternative layouts (a layout is the way all the elements of the advertisement are laid out in a print advertisement) with the same copy, testing of concepts or storyboards (a storyboard is a scene-by-scene drawing of a TV commercial which is like a rough version before the ad is actually shot on film) of TV commercials to test for positive/negative reactions, and many others. Some of these applications appear in our discussion of Analysis of Variance (ANOVA) in Part II and some case studies elsewhere in the book.
A particular class of advertising research is known as Tracking Studies. When an advertising campaign is running, periodic sample surveys known as tracking studies can be conducted to evaluate the effect of the campaign over a long period of time such as six months or one year, or even longer. This may allow marketers to alter the advertising theme, content, media selection or frequency of airing / releasing advertisements and evaluate the effects. As opposed to a snapshot provided by a one-time survey, tracking studies may provide a continuous or near-continuous monitoring mechanism. But here, one should be careful in assessing the impact of the advertising on sales, because other factors could change along with time. For example, the marketing programmes of the sponsor and the competitors could vary over time. The impact on sales could be due to the combined effect of several factors.
MEDIA RESEARCH
The major activity under this category is research into viewership of specific television programmes on various TV channels. There are specialised agencies like A.C. Nielsen worldwide which offer viewership data on a syndicated basis (i.e., to anyone who wants to buy the data). In India, both ORG-MARG and IMRB offer this service. They provide peoplemeter data under the brand names TAM and INTAM, which is used by advertising agencies when they draw up media plans for their clients. Research could also focus on print media and their readership. Here again, readership surveys such as the National Readership Survey (NRS) and the Indian Readership Survey (IRS) provide syndicated readership data. These surveys are now conducted almost on a continuous basis in India and are helpful to find out circulation and readership figures of major print media. ABC (Audit Bureau of Circulations) is an autonomous body which provides audited figures on the paid circulation (number of copies printed and sold) of each newspaper and magazine which is a member of ABC.
Media research can also focus on demographic details of people reached by each medium, and also attempt to correlate consumption habits of these groups with their media preferences. Advertising research is used at all stages of advertising, from conception to release of ads, and thereafter to measure advertising effectiveness based on various parameters. It is a very important area of research for brands that rely a lot on advertising. The top rated programmes in India are usually cricket matches and film based programmes.
SALES ANALYSIS BY PRODUCT
Sales analysis by product will enable a company to identify its strong or weak products. It is advisable to undertake an analysis on the basis of a detailed break-up of products such as product variation by size, colour, etc. This is because if an analysis is based on a broad break-up, it may not reveal important variations.
When a company finds that a particular product is doing poorly, two options are open to it. One is, it may concentrate on that product to ensure improved sales. Or alternatively, it may gradually withdraw the product and eventually drop it altogether. However, it is advisable to decide on the latter course on the basis of additional information such as trends in the market share, contribution margin, effect of sales volume on product profitability, etc. In case the product in question has complementarity with other items sold by the company, the decision to abandon the product must be made with care and caution.
Combining sales analysis by product with that by territory will further help in providing information on which products are doing better in which areas.
SALES ANALYSIS BY CUSTOMERS
Another way to analyse sales data is by customers. Such an analysis would normally indicate that a relatively small number of customers accounts for a large proportion of sales. To put it differently: a large percentage of customers accounts for a relatively small percentage of aggregate sales. One may compare the data with the proportion of time spent on the customers, i.e. the number of sales calls. An analysis of this type will enable the company to devote relatively more time to those customers who collectively account for proportionately larger sales.
Sales analysis by customer can also be combined with analysis both by area and product. Such an analysis will prove to be more revealing. For example, it may indicate that in some areas sales are not increasing with a particular type of customer though they have grown fast in other areas. Information of this type will be extremely useful to the company as it identifies the weak spots where greater effort is called for.
SALES ANALYSIS BY SIZE OF ORDER
Sales analysis by size of order may show that a large volume of sales is accompanied by low profit and vice versa. In case cost accounting data are available by size of order, this would help in identifying sales where the costs are relatively high and the company is incurring a loss. Sales analysis by size of order can also be combined with that by products, areas and types of customers. Such a perceptive analysis would reveal useful information to the company and enable it to make a more rational and effective effort in maximizing its return from sales.
THE CONCEPT OF MARKET POTENTIAL
Market potential has been defined as "the maximum demand response possible for a given group of customers within a well-defined geographic area for a given product or service over a specified period of time under well-defined competitive and environmental conditions."
We will elaborate this comprehensive definition. First, market potential is the maximum demand response under certain assumptions. It denotes a meaningful boundary condition on ultimate demand.
Another condition on which the concept of market potential depends is a set of relevant consumers of the product or service. It is not merely the present consumer who is to be included but also the potential consumer as maximum possible demand is to be achieved. Market potential will vary depending on which particular group of consumers is of interest.
Further, the geographic area for which market potential is to be determined should be well-defined. It should be divided into mutually exclusive subsets of consumers so that the management can assign a sales force and supervise and control the activities in different territories without much difficulty.
Another relevant aspect in understanding the concept of market potential is to clearly know the product or service for which market potential is to be estimated. Especially in those cases where the product in question can be substituted by another, it is desirable to estimate market potential for the product class rather than that particular product. For example, tea is subject to a high degree of cross-elasticity of demand with coffee.
It is necessary to specify the time period for which market potential is to be estimated. The time period should be so chosen that it coincides with planning periods in a firm. Both short and long-time periods can be used depending on the requirements of the firm.
Finally, a clear understanding of environmental and competitive conditions relevant in case of a particular product or service is necessary if market potential is to be useful. What is likely to be the external environment? What is likely to be the nature and extent of competition? These are relevant questions in the context of any estimate of market potential since these are the factors over which the firm has no control.
It may be emphasised that market potential is not the same thing as sales potential and sales forecast. It is only when a market is saturated that "the industry sales forecast can be considered equivalent to market potential." Such a condition is possible in case of well established and mature products. Generally, the industry sales forecast will be less than the market potential. Likewise, a company's sales forecast will be less than its sales potential. The former is a point estimate of future sales, while the latter represents a boundary condition which the sales might reach in an ideal situation. "In the latter sense, sales potential is to a firm what market potential is to an industry or product class: both represent maximum demand response and are boundary conditions."
BRAND POSITIONING
Brand positioning is a relatively new concept in marketing. The concept owes its origin to the idea that each brand occupies a particular space in the consumer's mind, signifying his perception of the brand in question in relation to other brands. While product or brand positioning has been defined by various authors in different ways, the underlying meaning conveyed through these definitions seems to be the same. Instead of giving several definitions, we may give one here. According to Green and Tull,
"Brand positioning and market segmentation appear to be the hallmarks of today's marketing research. Brand (or service) positioning deals with measuring the perceptions that buyers hold about alternative offerings."
From this definition it is evident that the term 'position' reflects the essence of a brand as perceived by the target consumer in relation to other brands. In view of this, the management's ability to position its product or brand appropriately in the market can be a major source of company's profits. This seems to be an important reason for the emergence of product or brand positioning as a major area in marketing research.
COMPONENTS OF POSITIONING
Positioning comprises four components. The first component is the product class or the structure of the market in which a company's brand will compete. The second component is consumer segmentation. One cannot think of positioning a brand without considering the segment in which it is to be offered. Positioning and segmentation are inseparable. The third component is the consumer's perception of the company's brand in relation to those of the competitors. Perceptual mapping is the device by which the company can know this. Finally, the fourth component of positioning is the benefit offered by the company's brand. A consumer can allot a position in his mind to a brand only when it is beneficial to him. The benefits may be expressed as attributes or dimensions in a chart where brands are 'fitted' to indicate the consumer's perceptions.
As perceptual maps are used to indicate brand positioning, blank spaces in such maps show that a company can position its brand in one or more of such spaces.
TECHNIQUES FOR PERCEPTUAL MAPPING
There are a number of techniques for measuring product positioning. Some of the important ones are:
• Image profile analysis
• Factor analysis
• Cluster analysis
• Multi-dimensional scaling
We will not go into the detailed mechanism of these techniques. All the same, we will briefly explain the techniques.
IMAGE PROFILE ANALYSIS
This technique is the oldest and most frequently used for measuring the consumer's perceptions of competitive brands or services. Normally, a 5 or 7 point numerical scale is used. A number of functional and psychological attributes are selected. The respondent is asked to show his perception of each brand in respect of each attribute on the 5 or 7 point scale.
It will be seen that the figure provides some insight as to which brands are competing with each other and on what attribute(s). This technique has some limitations. First, if the number of brands is large, it may not be possible to plot all the brands in a single figure. Second, there is an implicit assumption in this technique that all attributes are equally important and independent of each other. This is usually not true. However, this limitation can be overcome by using the technique of factor analysis.
FACTOR ANALYSIS
As regards factor analysis, it may be pointed out that its main object is to reduce a large number of variables into a small number of factors or dimensions. In Chapter 17, two examples have been given to illustrate the use of factor analysis. The discussion also brings out some major limitations of the method.
CLUSTER ANALYSIS
Cluster analysis is used to classify consumers or objects into a small number of mutually exclusive and exhaustive groups. With the help of cluster analysis, it is possible to separate brands into clusters or groups so that the brand within a cluster is similar to other brands belonging to the same cluster and is very different from brands included in other clusters. This method has been discussed in Chapter 17.
MULTI-DIMENSIONAL SCALING
Multi-dimensional scaling too has been discussed in Chapter 17, pointing out how perceptual maps can be developed on the basis of responses from consumers. In this connection, two illustrations of perceptual maps were given. The first illustration related to selected Business Schools based on hypothetical data. On the basis of two criteria, viz. how prestigious and how quantitative an MBA course is, different Business Schools have been shown in the map. It will be seen that the MBA course of Business School 'C' is extremely different from that offered by Business School 'G'. Points which are close to each other indicate similarity of the MBA courses in the students' perception. The second illustration related to four brands of washing soaps based on survey data from Calcutta. This is a non-attribute based example where a paired comparison of four high- and medium-priced detergents - Surf, Sunlight, Gnat and Key - was undertaken. As mentioned there, Sunlight and Surf are closest, and Surf and Key are farthest. In other words, the first two brands are most similar and the remaining two are most dissimilar. How the points in the figures for the four brands have been arrived at has been explained at length in that chapter and so is not repeated here.
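A minimal perceptual-mapping sketch in Python is given below: metric multi-dimensional scaling applied to a brand-dissimilarity matrix. The brand names echo the example above, but the distances are invented purely for illustration.

import numpy as np
from sklearn.manifold import MDS

brands = ["Surf", "Sunlight", "Gnat", "Key"]
dissimilarity = np.array([          # hypothetical paired-comparison dissimilarities
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.5, 4.5],
    [4.0, 3.5, 0.0, 2.0],
    [5.0, 4.5, 2.0, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)
for brand, (x, y) in zip(brands, coords):
    print(f"{brand}: ({x:.2f}, {y:.2f})")   # points close together = similar brands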
Subroto Sengupta has discussed product positioning at length in his book. While explaining different techniques of product positioning, he has shown how the concept of positioning can be used to improve the image of the concerned product or brand. He has given a number of examples covering a wide variety of products such as coffee, soft drinks, washing soaps, toilet soaps, shampoos and magazines. As Sengupta points out, the perceptual maps of a product class also indicate holes or vacant positions in the market. These open spaces can be helpful to the management in suggesting new product opportunities as also possibilities for repositioning of old products. While it is true that the management does get clues on preferred attributes of the product in question, it is unable to know all the relevant features of the new product such as its form, package and price. This problem can be overcome through the application of conjoint analysis. In addition, Sengupta has discussed some research studies in respect of advertising positioning.
We now give a detailed version of a study indicating how a brand which was putting up a poor performance in the market was repositioned. As a result, it improved its image and contributed to increased market share and profits.
WHEN TO DO MARKETING RESEARCH?
Marketing research can be done when:
• There is an information gap which can be filled by doing research.
• The cost of filling the gap through marketing research is less than the cost of taking a wrong decision without doing the research.
• The time taken for the research does not delay decision-making beyond reasonable limits.
A delay can have many undesirable effects, like competitors becoming aware of strategies or tactics being contemplated, consumer opinion changing between the beginning and end of the study, and so forth.
LIMITATIONS OF MARKETING RESEARCH
It must be kept in mind that marketing research, though very useful most of the time, is not the only input for decision-making. For example, many small businesses work without doing marketing research, and some of them are quite successful. It is obviously some other model of informal perceptions about consumer behaviour, needs, and expectations that is at work in such cases. Many businessmen and managers base their work on judgement, intuition, and perceptions rather than numerical data.
There is a famous example in India, where a company commissioned a marketing research company to find out if there was adequate demand for launching a new camera. This was in pre-liberalised India of the early 1980s. The finding of the research study was that there was no demand, and that the camera would not succeed, if launched. The company went ahead and launched it anyway, and it was a huge success. The camera was Hot Shot. It was able to tap into the need of consumers at that time for an easy-to-use camera at an affordable price.
Thus marketing research is not always the best or only source of information to be used for making decisions. It works best when combined with judgement, intuition, experience, and passion. For instance, even if marketing research were to show there was demand for a certain type of product, it still depends on the design and implementation of the appropriate marketing plans to make it succeed. Further, competitors could take actions which were not foreseen when marketing research was undertaken. This also leads us to conclude that the time taken for research should be the minimum possible, if we expect the conditions to be dynamic, or fast-changing.
DIFFERENCES IN METHODOLOGY
The reader may be familiar with research studies or opinion polls conducted by different agencies showing different results. One of the reasons why results differ is that the methodology followed by each agency is usually different. The sampling method used, the sample size itself, the representativeness of the population, the quality of field staff who conduct interviews, and conceptual skills in design and interpretation all differ from agency to agency. Minor differences are to be expected in sample surveys done by different people, but major differences should be examined for the cause, which will usually lead us to the different methodologies adopted by them. Based on the credibility of the agency doing the research and the appropriateness of the methodology followed, the user decides which result to rely upon. A judgement of which methodology is more appropriate for the research on hand comes from experience of doing a variety of research.
To summarise, it is important to understand the limitations of marketing research, and to use it in such a way that we minimise its limitations.
COMPLEMENTARY INPUTS FOR DECISION-MAKING
Along with marketing research, marketing managers may need to look into other information while making a decision. For example, our corporate policy may dictate that a premium image must be maintained in all activities of our company. On the other hand, marketing research may tell us that consumers want a value-for-money product. This creates a dilemma for the basic corporate policy, which has to be balanced with consumer perception as measured by marketing research.
Other inputs for decision-making could be growth strategies for the brand or product, competitors' strategies, and regulatory moves by the government and others. Some of these are available internally-for example, corporate policy and growth plans may be documented internally. Some other inputs may come from a marketing intelligence cell if the company has one. In any case, marketing decisions would be based on many of these complementary inputs, and not on the marketing research results alone.
SECONDARY AND PRIMARY RESEARCH
One of the most basic differentiations is between secondary and primary research. Secondary research is any information we may use, but which has not been specifically collected for the current marketing research. This includes published sources of data, periodicals, newspaper reports, and nowadays, the Internet. It is sometimes possible to do a lot of good secondary research and get useful information. But marketing research typically requires a lot of current data which is not available from secondary sources. For example, the customer satisfaction level for a product or brand may not be reported anywhere. The effectiveness of a particular advertisement may be evident from the sales which follow. But why people liked the advertisement may not be obvious, and can only be ascertained through interviews with consumers. Also, the methodology for the secondary data already collected may be unknown, and therefore we may be unable to judge the reliability and validity of the data.
Primary research is what we will be dealing with throughout this book. It can be defined as research which involves collecting information specifically for the study on hand, from the actual sources such as consumers, dealers or other entities involved in the research. The obvious advantages of primary research are that it is timely, focused, and involves no unnecessary data collection, which could be a wasted effort. The disadvantage could be that it is expensive to collect primary data. But when an information gap exists, the cost could be more than compensated by better decisions, which are taken with the collected data.
Thus research has become a significant element of a successful marketing system. The following section discusses the application of research in the Operations/Production Management domain.
APPLICATIONS OF RESEARCH METHODOLOGY IN OPERATIONS MANAGEMENT
METHODS OF ESTIMATING CURRENT DEMAND
There are two types of estimates of current demand which may be helpful to a company. These are: total market potential and territory potential. "Total market potential is the maximum amount of sales that might be available to all the firms in an industry during a given period under a given level of industry marketing effort and given environmental conditions."
Symbolically, total market potential is:
Q=n x q x p
where Q = total market potential
n = number of buyers in the specific product/market under the given assumptions
q = quantity purchased by an average buyer
p = price of an average unit
Of the three components n, q, and p in the above formula, the most difficult to estimate is n. One can start with a broad concept of n and gradually narrow it down. For example, if we are thinking of readymade shirts for home consumption, we may first take the total male population, eliminating that in rural areas. From the total male urban population, we may eliminate the age groups which are not likely to buy readymade shirts. Thus, the number of boys below 20 may be eliminated. Further eliminations on account of low income may be made. In this way we can arrive at the 'prospect pool' of those who are likely to buy shirts.
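A small worked illustration of the formula, with invented figures for the readymade-shirt example, is given below.

n = 2_000_000   # assumed eligible urban male buyers after the eliminations above
q = 3           # assumed shirts purchased per buyer per year
p = 400         # assumed average price per shirt, in rupees

Q = n * q * p
print(f"Total market potential = Rs {Q:,}")   # Rs 2,400,000,000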
The concept of market potential is helpful to the firm as it provides a benchmark against which actual performance can be measured. In addition, it can be used as a basis for allocation decisions regarding marketing effort.
The estimate of total market potential is helpful to the company when it is in a dilemma whether to introduce a new product or drop an existing one. Such an estimate will indicate whether the prospective market is large enough to justify the company's entering it.
Since it is impossible for a company to have the global market exclusively to itself, it has to select those territories where it can sell its products well. This means that companies should know the territorial potentials so that they can select markets most suited to them, channelise their marketing effort optimally among these markets and also evaluate their sale performance in such markets.
There are two methods for estimating territorial potentials: (i) market-buildup method, and (ii) index-of-buying-power method. In the first method, several steps are involved. First, identify all the potential buyers for the product in each market. Second, estimate potential purchases by each potential buyer. Third, sum up the individual figures in step (ii) above. However, in reality the estimation is not that simple as it is difficult to identify all potential buyers. When the product in question is an industrial product, directories of manufacturers of a particular product or group of products are used. Alternatively, the Standard Industrial Classification of Manufacturers of a particular product or group of products is used.
The second method involves the use of a straightforward index. Suppose a textile manufacturing company is interested in knowing the territorial potential for its cloth in a certain territory. Symbolically,
Bi = 0.5yi + 0.3ri + 0.2pi
where Bi = percentage of total national buying power in territory i
yi = percentage of national disposable personal income originating in territory i
ri = percentage of national retail sales in territory i
pi = percentage of national population living in territory i
It may be noted that such estimates indicate potential for the industry as a whole rather than for an individual company. In order to arrive at a company potential, the concerned company has to make certain adjustments in the above estimate on the basis of one or more factors that have not been covered in the estimation of territorial potential. These factors could be the company's brand share, number of salespersons, number and type of competitors, etc.
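The index calculation itself is simple arithmetic, as the hypothetical figures below show.

y_i = 6.0   # assumed % of national disposable personal income in territory i
r_i = 5.0   # assumed % of national retail sales in territory i
p_i = 7.0   # assumed % of national population in territory i

B_i = 0.5 * y_i + 0.3 * r_i + 0.2 * p_i
print(f"Territory i commands {B_i:.2f}% of national buying power")   # 5.90%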
FORECASTING PROCESS
After having described the methods of estimating the current demand, we now turn to forecasting.
There are five steps involved in the forecasting process. These are mentioned below.
First, one has to decide the objective of the forecast. The marketing researcher should know what use will be made of the forecast he is going to make.
Second, the time period for which the forecast is to be made should be selected. Is the forecast short-term, medium-term or long-term? Why should a particular period of forecast be selected?
Third, the method or technique of forecasting should be selected. One should be clear as to why a particular technique from amongst several techniques should be used.
Fourth, the necessary data should be collected. The need for specific data will depend on the forecasting technique to be used.
Finally, the forecast is to be made. This will involve the use of computational procedures.
In order to ensure that the forecast is really useful to the company, there should be good understanding between management and research. The management should clearly spell out the purpose of the forecast and how it is going to help the company. It should also ensure that the researcher has a proper understanding of the operations of the company, its environment, past performance in terms of key indicators and their relevance to the future trend. If the researcher is well-informed with respect to these aspects, then he is likely to make a more realistic and more useful forecast for the management.
METHODS OF FORECASTING
The methods of forecasting can be divided into two broad categories, viz. subjective or qualitative methods and objective or quantitative methods. These can be further divided into several methods. Each of these methods is discussed below.
SUBJECTIVE METHODS
There are four subjective methods-field sales force, jury of executives, users' expectations and delphi. These are discussed here briefly.
FIELD SALES FORCE
Some companies ask their salesmen to indicate the most likely sales for a specified period in the future. Usually the salesman is asked to indicate anticipated sales for each account in his territory. These forecasts are checked by district managers who forward them to the company's head office. Different territory forecasts are then combined into a composite forecast at the head office. This method is more suitable when a short-term forecast is to be made as there would be no major changes in this short period affecting the forecast. Another advantage of this method is that it involves the entire sales force which realises its responsibility to achieve the target it has set for itself. A major limitation of this method is that sales force would not take an overall or broad perspective and hence may overlook some vital factors influencing the sales. Another limitation is that salesmen may give somewhat low figures in their forecasts thinking that it may be easier for them to achieve those targets. However, this can be offset to a certain extent by district managers who are supposed to check the forecasts.
JURY OF EXECUTIVES
Some companies prefer to assign the task of sales forecasting to executives instead of a sales force. Given this task each executive makes his forecast for the next period. Since each has his own assessment of the environment and other relevant factors, one forecast is likely to be different from the other. In view of this, it becomes necessary to have an average of these varying forecasts. Alternatively, steps should be taken to narrow down the difference in the forecasts. Sometimes this is done by organising a discussion between the executives so that they can arrive at a common forecast. In case this is not possible, the chief executive may have to decide which of these forecasts is acceptable as a representative one.
This method is simple. At the same time, it is based on a number of different viewpoints as opinions of different executives are sought. One major limitation of this method is that the executives' opinions are likely to be influenced in one direction on the basis of general business conditions.
USERS' EXPECTATIONS
Forecasts can be based on users' expectations or intentions to purchase goods and services. It is difficult to use this method when the number of users is large. Another limitation of this method is that though it indicates users' 'intentions' to buy, the actual purchases may be far less at a subsequent period. It is most suitable when the number of buyers is small such as in case of industrial products.
THE DELPHI METHOD
This method too is based on experts' opinions. Here, each expert has access to the same information. A feedback system generally keeps them informed of each other's forecasts, but no majority opinion is disclosed to them. However, the experts are not brought together. This is to ensure that one or more vocal experts do not dominate other experts.
The experts are given an opportunity to compare their own previous forecasts with those of the others and revise them. After three or four rounds, the group of experts arrives at a final forecast.
The method may involve a large number of experts and this may delay the forecast considerably. Generally it involves a small number of participants ranging from 10 to 40.
It will be seen that both the jury of executive opinion and the Delphi method are based on a group of experts. They differ in that in the former the group of experts meet, discuss the forecasts, and try to arrive at a commonly agreed forecast, while in the latter the group of experts never meet. As mentioned earlier, this is to ensure that no one person dominates the discussion, thus influencing the forecast. In other words, the Delphi method retains the wisdom of a group and at the same time reduces the effect of group pressure. An approach of this type is more appropriate when long-term forecasts are involved.
In the subjective methods, judgement is an important ingredient. Before attempting a forecast, the basic assumptions regarding environmental conditions as also competitive behaviour must be provided to the people involved in forecasting. An important advantage of subjective methods is that they are easily understood. Another advantage is that the cost involved in forecasting is quite low.
As against these advantages, subjective methods have certain limitations also. One major limitation is the varying perceptions of people involved in forecasting. As a result, wide variance is found in forecasts. Subjective methods are suitable when forecasts are to be made for highly technical products which have a limited number of customers. Generally, such methods are used for industrial products. Also when cost of forecasting is to be kept minimum, subjective methods may be more suitable.
QUANTITATIVE OR STATISTICAL METHODS
Based on statistical analysis, these methods enable the researcher to make forecasts on a more objective basis. It is difficult to make a wholly accurate forecast for there is always an element of uncertainty regarding the future. Even so, statistical methods are likely to be more useful as they are more scientific and hence more objective.
TIME SERIES
In time-series forecasting, the past sales data are extrapolated as a linear or a curvilinear trend. If such data are plotted on a graph, one can extrapolate visually for the desired time period; extrapolation can also be made with the help of statistical techniques.
It may be noted that time-series forecasting is most suitable in stable situations where the future will largely be an extension of the past. Further, the past sales data should have a trend that is distinguishable from the random error component for time-series forecasting to be suitable.
Before using the time-series forecasting one has to decide how far back in the past one can go. It may be desirable to use the more recent data as conditions might have been different in the remote past. Another issue pertains to weighting of time-series data. In other words, should equal weight be given to each time period or should greater weightage be given to more recent data? Finally, should data be decomposed into different components, viz. trend, cycle, season and error? We now discuss methods, viz. moving averages, exponential smoothing and decomposition of time series.
MOVING AVERAGE
This method uses the last 'n' data points to compute a series of averages in such a way that each time the latest figure is added and the earliest one dropped. For example, when we have to calculate a five-monthly moving average, we first calculate the average of January, February, March, April and May by adding the figures of these months and dividing the sum by five. This gives one figure. In the next calculation, the figure for June is included and that for January dropped, giving a new average. Thus a series of averages is computed. The method is called a 'moving' average because it takes in a new data point each time and drops the earliest one.
In a short-term forecast, the random fluctuations in the data are of major concern. One method of minimizing the influence of random error is to use an average of several past data points. This is achieved by the moving average method. It may be noted that in a 12-month moving average, the effect of seasonality is removed from the forecast as data points for every season are included before computing the moving average.
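A short pandas sketch of a five-monthly moving average follows; the sales figures are hypothetical.

import pandas as pd

sales = pd.Series([120, 132, 118, 141, 150, 160, 155, 148, 162, 170, 158, 165],
                  index=pd.period_range("2023-01", periods=12, freq="M"))

ma5 = sales.rolling(window=5).mean()   # each value averages the latest five months
print(ma5.dropna())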
EXPONENTIAL SMOOTHING
A method which has been receiving increasing attention in recent years is known as exponential smoothing. It is a type of moving average that 'smooths' the time-series. When a large number of forecasts are to be made for a number of items, exponential smoothing is particularly suitable as it combines the advantages of simplicity of computation and flexibility.
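The computation can be sketched in a few lines of Python. Each smoothed value blends the latest observation with the previous smoothed value; the smoothing constant alpha (here 0.3) is an assumption the analyst must choose.

def exponential_smoothing(series, alpha=0.3):
    smoothed = [series[0]]                        # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [120, 132, 118, 141, 150, 160, 155, 148, 162, 170]   # hypothetical sales
print([round(s, 1) for s in exponential_smoothing(sales)])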
TIME-SERIES DECOMPOSITION
This method consists of measuring the four components of a time-series: (i) trend, (ii) cycle, (iii) season, and (iv) erratic movement. The trend component indicates long-term effects on sales that are caused by such factors as income, population, industrialisation and technology. The time period of a trend function varies considerably from product to product. However, it is usually taken as any period in excess of the time required for a business cycle (which averages 4-5 years).
The cyclical component indicates some sort of periodicity in general economic activity. When data are plotted, they yield a curve with peaks and troughs, indicating rises and falls in the series with a certain periodicity. A careful study must be made of the impact of the business cycle on the sales of each product. Cyclical forecasts are likely to be more accurate for the long term than for the short term. The seasonal component reflects changes in sales levels due to factors such as weather, festivals, holidays, etc.; there is a consistent pattern of sales for periods within a year. Finally, erratic movements in the data arise on account of events such as strikes, lockouts, price wars, etc. The decomposition of a time-series enables identification of the error component as distinct from the trend, cycle and season, which are systematic components.
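A hedged Python sketch of such a decomposition, using statsmodels on a synthetic monthly series with a built-in trend and seasonal pattern, is shown below.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(4)
months = pd.date_range("2020-01-01", periods=48, freq="MS")
trend = np.linspace(100, 160, 48)                       # synthetic long-term growth
season = 10 * np.sin(2 * np.pi * months.month / 12)     # synthetic seasonal swing
sales = pd.Series(trend + season + rng.normal(0, 3, 48), index=months)

parts = seasonal_decompose(sales, model="additive", period=12)
print(parts.trend.dropna().head())      # trend component
print(parts.seasonal.head(12))          # one full seasonal cycle; parts.resid is the error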
CAUSAL OR EXPLANATORY METHODS
Causal or explanatory methods are regarded as the most sophisticated methods of forecasting sales. These methods yield realistic forecasts provided relevant data are available on the major variables influencing changes in sales. There are three distinct advantages of causal methods. First, turning points in sales can be predicted more accurately by these methods than by time-series methods. Second, their use reduces the magnitude of the random component far more than may be possible with time-series methods. Third, they provide greater insight into causal relationships, which helps management in marketing decision making; isolated sales forecasts made on the basis of time-series methods would not be helpful in this regard.
Causal methods can be either (i) leading indicators or (ii) regression models. These are briefly discussed here.
LEADING INDICATORS
Sometimes one finds that changes in sales of a particular product or service are preceded by changes in one or more leading indicators. In such cases, it is necessary to identify the leading indicators and to observe changes in them closely. One example of a leading indicator is the construction of new houses, which the demand for various household appliances follows. Likewise, the demand for many durables is preceded by an increase in disposable income. Yet another example is the number of births: the demand for baby food and other goods needed by infants can be ascertained from the number of births in a territory. It may be possible to include leading indicators in regression models.
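One simple way to check for a lead-lag relationship is to correlate sales with lagged values of the candidate indicator, as in the hypothetical pandas sketch below, where sales are constructed to follow the indicator by two periods.

import pandas as pd

housing_starts = pd.Series([90, 95, 100, 110, 105, 115, 120, 125, 130, 128])
appliance_sales = housing_starts.shift(2) * 1.5 + 10   # sales trail starts by two periods

for lag in range(4):
    corr = appliance_sales.corr(housing_starts.shift(lag))
    print(f"lag {lag}: correlation = {corr:.2f}")       # peaks at lag 2 by construction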
REGRESSION MODELS
Linear regression analysis is perhaps the most frequently used and the most powerful of the causal methods. As we have discussed regression analysis in detail in the preceding chapters on Bivariate Analysis and Multivariate Analysis, we shall dwell on only a few relevant points.
First, regression models indicate linear relationships within the range of observations and at the times when they were made. For example, if a regression analysis of sales is attempted on the basis of independent variables of population sizes of 15 million to 30 million and per capita income of Rs 1000 to Rs 2500, the regression model shows the relationships that existed between these extremes in the two independent variables. If the sales forecast is to be made on the basis of values of the independent variables falling outside these ranges, then the relationships expressed by the regression model may not hold good. Second, sometimes there may be a lagged relationship between the dependent and independent variables. In such cases, the values of the dependent variable are to be related to those of the independent variables for the preceding month or year, as the case may be. The search for factors with a lead-lag relationship to the sales of a particular product is rather difficult, and one should try out several indicators before selecting the one which is most satisfactory. Third, it may happen that the data required to establish the ideal relationship do not exist, are inaccessible or, if available, are not useful. Therefore, the researcher has to be careful in using the data. He should be quite familiar with the varied sources and types of data that can be used in forecasting, and should also know their strengths and limitations. Finally, a regression model reflects the association among variables; the causal interpretation is made by the researcher on the basis of his understanding of that association. As such, he should be extremely careful in choosing the variables so that a real causative relationship can be established among the variables chosen.
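A minimal regression-forecast sketch is given below: sales regressed on population and per capita income, with invented observations lying within the ranges mentioned above, and a forecast made only inside those ranges.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[15, 1000], [18, 1300], [21, 1500],      # population (millions),
              [24, 1800], [27, 2100], [30, 2500]])     # per capita income (Rs)
sales = np.array([50, 62, 71, 85, 96, 110])            # hypothetical sales units

model = LinearRegression().fit(X, sales)
print(model.predict([[25, 2000]]))   # forecast within the observed ranges only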
INPUT-OUTPUT ANALYSIS
Another method that is widely used for forecasting is input-output analysis. Here, the researcher takes into consideration a large number of factors which affect the outputs he is trying to forecast. For this purpose, an input-output table is prepared in which the inputs are shown as the column headings and the outputs as the row stubs. It may be mentioned that, by themselves, input-output flows are of little direct use to the researcher. It is the application of an assumption as to how the output of an industry is related to its use of various inputs that makes input-output analysis a good method of forecasting. The assumption states that as the level of an industry's output changes, its use of inputs will change proportionately, implying that there is no substitution in production among the various inputs. This may or may not hold good.
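A minimal sketch of this proportionality assumption at work, using an invented two-industry flow table: technical coefficients are computed from the flows, and the outputs required to meet a given final demand are then solved from them.

import numpy as np

# inter-industry flows: rows = producing industry, columns = using industry
flows = np.array([[20.0, 30.0],
                  [40.0, 10.0]])
output = np.array([100.0, 150.0])      # total output of each industry

# technical coefficients a_ij = flow_ij / output_j (the proportionality assumption)
A = flows / output
final_demand = np.array([60.0, 80.0])

# solve (I - A) x = d for the output x needed to satisfy final demand d
x = np.linalg.solve(np.eye(2) - A, final_demand)
print("required outputs:", x)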
The use of input-output analysis in sales forecasting is appropriate for products sold to governmental, institutional and industrial markets, as these have distinct patterns of usage. It is seldom used for consumer products and services. It is most appropriate when the levels and kinds of inputs required to achieve certain levels of outputs need to be known.
A major constraint in the use of this method is that it needs extensive data for a large number of items which may not be easily available. Large business organisations may be in a position to collect such data on a continuing basis so that they can use input-output analysis for forecasting. However, this is not possible in case of small industrial organisations on account of excessive costs involved in the collection of comprehensive data. It is for this reason that input-output analysis is less widely used than most analysts initially expected. A detailed discussion of input-output analysis is beyond the scope of this book.
ECONOMETRIC MODEL
Econometrics is concerned with the use of statistical and mathematical techniques to verify hypotheses emerging from economic theory. An econometric model incorporates functional relationships estimated by these techniques into an internally consistent and logically self-contained framework. Econometric models use both exogenous and endogenous variables. Exogenous variables are used as inputs into the model, but they themselves are determined outside it; they include policy variables and uncontrolled events. In contrast, endogenous variables are those which are determined within the system.
The use of econometric models is generally found at the macro level, such as forecasting national income and its components. Such models show how the economy, or any specific segment of it, operates. Compared with an ordinary regression equation, they bring out the causalities involved more distinctly, and this merit enables them to predict turning points more accurately. However, their use at the micro level for forecasting has so far been extremely limited.
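As a very small-scale illustration of the exogenous/endogenous distinction, the sketch below estimates a consumption function by least squares and then solves the income identity for the endogenous variables, treating investment as exogenous. All figures are invented; a real econometric model would involve many equations.

import numpy as np

Y = np.array([100.0, 120.0, 140.0, 160.0, 180.0])   # national income
C = np.array([85.0, 100.0, 114.0, 130.0, 144.0])    # consumption

# estimate C = a + b*Y by ordinary least squares
X = np.column_stack([np.ones_like(Y), Y])
(a, b), *_ = np.linalg.lstsq(X, C, rcond=None)

I = 40.0                            # exogenous investment, set outside the model
Y_forecast = (a + I) / (1 - b)      # solve Y = a + b*Y + I for endogenous Y
C_forecast = a + b * Y_forecast     # endogenous consumption follows
print(f"a={a:.2f}, b={b:.2f}, Y={Y_forecast:.1f}, C={C_forecast:.1f}")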
APPLICATIONS OF RESEARCH METHODOLOGY IN HUMAN RESOURCES MANAGEMENT
Research methodology is widely used in the domain of Human Resources Management, where its application is known as Human Resources Metrics (HR Metrics).
To move to the center of the organization, HR must be able to talk in quantitative, objective terms. Organizations are managed by data.
Unquestionably, managers at times make decisions based on emotions as much as on facts. Nevertheless, day-to-day operations are discussed, planned and evaluated in hard-data terms.
Perhaps the most crucial advantage of a sound HR metrics programme is that it enables HR to converse with senior management in the language of business. Operational decisions taken by HR are then based on cold, hard facts rather than gut feeling, the figures being used to back up business cases and requests for resources. The HR function is transformed from a bastion of ‘soft’ intangibles into something more ‘scientific’, better able to punch its weight in the organisation. In addition, the value added by HR becomes more visible. This will become increasingly important as more and more functions attempt to justify their status as strategic business partners rather than merely cost centres.
The five key practices of the Human Capital Index are as follows:
1. Recruiting excellence
2. Clear rewards and accountability
3. Prudent use of resources
4. Communications integrity
5. Collegial, flexible workplace
• These practices require the capture of metrics for their very definition. Metrics help quantify and demonstrate the value of HR
• Metrics can help guide workforce strategies and maximize return on HR investments
• Metrics provide measurement standards
• Metrics help show what HR contributes to overall business results
SUMMARY
The above lesson has given a brief introduction to the application of various research techniques in management, and has identified the appropriate tools for the various domains of management.
KEY TERMS
• Marketing research
• Brand positioning
• Image profile analysis
• Market Potential
• Demand Measurement
• Delphi Method
• Time Series analysis
• Moving average
• HR Metrics
IMPORTANT QUESTIONS
26. What are the various domains in which research tools can be used?
27. Explain the application of image profile analysis with an example.
28. Differentiate between primary and secondary research.
29. What are the limitations of marketing research?
30. Describe the method for finding out the market potential.
31. Explain the various methods used to estimate demand.
32. What do you mean by HR metrics?
33. Note down the five key practices of the Human Capital Index.
 
LESSON – 19
REPORT PREPARATIONS
OBJECTIVES
• To learn the Structure of Professional Research report.
• To understand the application of following diagrams
o Area Chart
o Line graph
o Bar chart
o Pie chart
o Radar diagram
o Surface diagram
o Scatter diagram
STRUCTURE
 Research report
 Report format
 Data presentation
 Pareto chart
RESEARCH REPORT
The final step in the research process is the preparation and presentation of the research report. A research report can be defined as the presentation of the research findings directed to a specific audience to accomplish a specific purpose.
IMPORTANCE OF REPORT
The research report is important for the following reasons,
1. The results of research can be effectively communicated to management.
2. The report is often the only aspect of the study to which executives are exposed, and their subsequent evaluation of the project rests on the effectiveness of the written and oral presentation.
3. The report presentation is typically the basis on which the project's worth is judged, so the communication effectiveness and usefulness of the information provided play a crucial role in determining whether such projects will be continued in future.
STEPS IN REPORT PREPARATION
Preparing a research report involves three steps
1. Understanding the research
2. Organizing the information
3. Writing with effectiveness
GUIDELINES
The general guidelines that should be followed for any report are as follows,
1. Consider the audience: The information resulting from research is ultimately of importance to managers, who will use the results to make decisions. Decision makers are interested in a clear, concise, accurate and interesting report which directly focuses on their information needs with a minimum of technical jargon. The report has to be understood by them; it should not be too technical and should not use too much jargon. This is a particular difficulty when reporting the results of statistical analysis, where there is a high probability that few, if any, of the target audience have a grasp of statistical concepts. Hence, for example, there is a need to translate terms such as standard deviation, significance level and confidence interval into everyday language.
2. Be concise, but precise: The real skill of the researcher is tested in fulfilling this requirement. The report must be concise and must focus on the crucial elements of the project; it should not include unimportant issues. The researcher should know how much emphasis to give each area.
3. Be objective, yet effective: The research report must be an objective presentation of the research findings. The researcher violates the standard of objectivity if the findings are presented in a distorted or slanted manner. The findings can be presented in a manner, which is objective, yet effective. The writing style of the report should be interesting, with the sentence structure short and to the point.
4. Understand the results and draw conclusions: The managers who read the report expect to see interpretive conclusions in it. The researcher should understand the results and be able to interpret them effectively for management. Simply reiterating the facts will not do; implications have to be drawn by asking the “so what” question of the results.
REPORT FORMAT
Every person has a different style of writing. There is not really one right style, but the following outline is generally accepted as the basic format for most research projects.
1. Title Page
2. Table of contents
3. Executive summary
4. Introduction
5. Problem statement
6. Research objective
7. Background
8. Methodology
9. Sampling design
10. Research design
11. Data collection
12. Data analysis
13. Limitation
14. Findings
15. Conclusions
16. Summary and conclusions
17. Recommendations
18. Appendices
19. Bibliography
TITLE PAGE
The title page should contain a title which conveys the essence of the study, the date, the name of the organization submitting the report, and the name of the organization for whom it is prepared. If the research report is confidential, the individuals who are to receive the report should be specified on the title page.
TABLE OF CONTENTS
As a rough guide, any report of several sections that totals more than 6 to 10 pages should have a table of contents. The table of contents lists the essence of topics covered in the report, along with page references. Its purpose is to aid readers in finding a particular section in the report. If there are many tables, charts, or other exhibits, they should also be listed after the table of contents in a separate table of illustrations.
EXECUTIVE SUMMARY
An executive summary can serve two purposes. It may be a report in miniature, covering all the aspects in the body of the report but in abbreviated form; or it may be a concise summary of the major findings and conclusions, including recommendations.
Two pages are generally sufficient for an executive summary. Write this section after the report is finished. It must contain no new information, but it may require graphics to present a particular conclusion.
Expect the summary to contain a high density of significant terms, since it repeats the highlights of the report. A good summary helps the decision maker and is designed to be action oriented.
INTRODUCTION
The introduction prepares the reader for the report by describing the parts of the project: the problem statement, research objectives and background material.
The introduction must clearly explain the nature of decision problem. It should review the previous research done on the problem.
PROBLEM STATEMENT
The problem statement contains the need for the research project. The problem is usually represented by a management question. It is followed by a more detailed set of objectives.
RESEARCH OBJECTIVES
The research objectives address the purpose of the project. These may be research question(s) and associated investigative questions.
BACKGROUND
The Background material may be of two types. It may be the preliminary results of exploration from an experience survey, focus group, or another source. Alternatively it could be secondary data from the literature review. Background material may be placed before the problem statement or after the research objectives. It contains information pertinent to the management problem or the situation that led to the study.
METHODOLOGY
The purpose of the methodology section is to describe the nature of the research design, the sampling plan, and the data collection and analysis procedures. Enough detail must be conveyed so that the reader can appreciate the nature of the methodology used, yet the presentation must be neither boring nor overpowering. The use of technical jargon must be avoided.
RESEARCH DESIGN
The coverage of the design must be adapted to the purpose. The type of research adopted, and the reason for adopting that particular type, should be explained.
SAMPLING DESIGN
The researcher explicitly defines the target population being studied and the sampling methods used. The report has to explain the sampling frame, the sampling method adopted and the sample size. Explanation of the sampling method, uniqueness of the chosen parameters, and other relevant points that need explanation should be covered with brevity. The calculation of the sample size can be placed either in this part or in an appendix.
DATA COLLECTION
This part of the report describes the specifics of gathering the data; its content depends on the selected design. The data collection instruments (questionnaire or interview schedule) and field instructions can be placed in the appendix.
DATA ANALYSIS
This section summarizes the methods used to analyze the data. Describe data handling, preliminary analysis, statistical tests, computer programs and other technical information. The rationale for the choice of analysis approaches should be clear. A brief commentary on assumptions and appropriateness of use should be presented.
LIMITATIONS
Every project has weaknesses, which need to be communicated in a clear and concise manner. In this process, the researcher should avoid belaboring minor study weaknesses. The purpose of this section is not to disparage the quality of the research project, but rather to enable the reader to judge the validity of the study results.
Generally, limitations occur in sampling, in non-response inadequacies and in methodological weaknesses. It is the researcher’s professional responsibility to clearly inform the reader of these limitations.
FINDINGS
The objective of this part is to explain the data rather than draw conclusions. When quantitative data can be presented, this should be done as simply as possible with charts, graphics and tables.
The findings can be presented in a small table or chart on the same page. While this arrangement adds to the bulk of the report, it is convenient for the reader.
CONCLUSIONS
This part can be further divided into two sections: summary and conclusions, and recommendations.
SUMMARY AND CONCLUSIONS
The summary is a brief statement of the essential findings. The conclusions should clearly link the research findings with the information needs, and, based on this linkage, recommendations for action can be formulated. In some research works the conclusions are presented in tabular form for easy reading and reference. The research questions and objectives are answered sharply in this part.
RECOMMENDATIONS
The researcher’s recommendations should rest on the research findings and typically consist of a few ideas about corrective actions. Recommendations are given for managerial action rather than further research. The researcher may offer several alternatives with justifications.
APPENDICES
The purpose of appendix is to provide a place for material, which is not absolutely essential to the body of the report. The material is typically more specialized and complex than material presented in the main report, and it is designed to serve the needs of the technically oriented reader. The appendix will frequently contain copies of the data collection forms, details of the sampling plan, estimates of statistical error, interviewer instructions and detailed statistical tables associated with the data analysis process. The reader who wishes to learn the technical aspects of the study and to look at statistical breakdowns will want a complete appendix.
BIBLIOGRAPHY
The use of secondary data requires a bibliography. Proper citation style and format are specific to the purpose of the report; the instructor, program, institution, or client often specifies style requirements. Citations may be given in footnote or endnote format. The author's name, title, publication, year and page number are the important elements of a bibliography entry.
DATA PRESENTATION
The research data can be presented in tabular and graphic form.
TABLES
The tabular form consists of the numerical presentation of the data. Tables should contain the following elements:
1. Table number: this permits easy location in the report
2. Title: the title should clearly indicate the contents of the table or figure.
3. Box head and stub: the box head contains the captions or labels for the columns in a table, while the stub contains the labels for the rows.
4. Footnote: footnote explains the particular section or item in the table or figure.
TABLE NO 6.1. SALES STATUS IN MARKET

CITY         SALES IN $   % OF SALES
DELHI           4500          16
MUMBAI          8100          28
CHENNAI         6900          24
CALCUTTA        2300           8
BANGALORE       7000          24
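A minimal sketch showing how the "% of sales" column of Table 6.1 is derived from the raw sales figures:

sales = {"Delhi": 4500, "Mumbai": 8100, "Chennai": 6900,
         "Calcutta": 2300, "Bangalore": 7000}
total = sum(sales.values())
for city, amount in sales.items():
    # each city's share of total sales, rounded to a whole percent
    print(f"{city:<10} {amount:>6} {round(100 * amount / total):>4}")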
GRAPHICS
The graphical form involves the presentation of data in terms of visually interpreted sizes. Graphs should contain the following elements:
1. graph or figure number
2. title
3. footnote
4. sub heads in the axis
BAR CHART
A bar chart depicts magnitudes of the data by the length of various bars which have been laid out with reference to a horizontal or vertical scale. The bar chart is easy to construct and can be readily interpreted.
FIGURE 6.1 SALES STATUS IN MARKET

COLUMN CHART
FIGURE 6.2. SALES STATUS IN MARKET

These graphs compare the sizes and amounts of categories, usually for the same point in time. Usually the categories are placed on the X-axis and the values on the Y-axis.
PIE CHART
The pie chart is a circle divided into sections such that the size of each section corresponds to a portion of the total. It shows the relationship of parts to the whole; the wedges represent the raw values of the data. It is one form of area chart and is often used with business data.
FIGURE 6.3 SALES STATUS IN MARKET

LINE GRAPH
Line graphs are used chiefly for time series and frequency distributions. There are several guidelines for designing a line graph, illustrated in the sketch after this list.
• Put the independent variable in the horizontal axis
• When showing more than one line, use different line types
• Try not to put more than four lines on one chart
• Use a solid line for the primary data.
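A minimal sketch of a line graph that follows these guidelines, with invented series; the solid line marks the primary data.

import matplotlib.pyplot as plt

years = [2006, 2007, 2008, 2009, 2010]   # independent variable on the X-axis
primary = [45, 52, 61, 58, 70]
secondary = [30, 34, 33, 40, 44]

plt.plot(years, primary, "-", label="Sales")      # solid line for primary data
plt.plot(years, secondary, "--", label="Exports") # different line type
plt.xlabel("Year")
plt.ylabel("Rs lakh")
plt.legend()
plt.show()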
FIGURE 6.4 SALES STATUS IN MARKET

RADAR DIAGRAM
In a radar diagram the radiating lines are categories, and values are distances from the center. It can be applied where multiple variables are used.
FIGURE 6.5 SALES STATUS IN MARKET

AREA (SURFACE) DIAGRAM
An area chart is also used for a time series. Like line charts it compares changing values, but it emphasizes the relative value of each series.
FIGURE 6.6 SALES STATUS IN MARKET

SCATTER DIAGRAM
A scatter diagram shows whether the relationship between two variables follows a pattern. It may also be used with one variable measured at different times.

FIGURE 6.7 SALES STATUS IN MARKET

PURPOSE OF A HISTOGRAM
A histogram is used to graphically summarize and display the distribution of a univariate data set, such as data from a process.
The histogram graphically shows the following:
1. center (i.e., the location) of the data;
2. spread (i.e., the scale) of the data;
3. skewness of the data;
4. presence of outliers; and
5. presence of multiple modes in the data.
These features provide strong indications of the proper distributional model for the data. The probability plot or a goodness-of-fit test can be used to verify the distributional model.
SAMPLE HISTOGRAM DEPICTION

HOW TO CONSTRUCT A HISTOGRAM
A histogram can be constructed by segmenting the range of the data into equal-sized bins (also called segments, groups or classes). For example, if your data range from 1.1 to 1.8, you could have equal bins of width 0.1 consisting of 1.1 to 1.2, 1.2 to 1.3, 1.3 to 1.4, and so on.
The vertical axis of the histogram is labeled Frequency (the number of counts for each bin), and the horizontal axis of the histogram is labeled with the range of your response variable.
The most common form of the histogram is obtained by splitting the range of the data into equal-sized bins (called classes). Then, for each bin, the number of points from the data set that fall into it is counted. That is:
• Vertical axis: Frequency (i.e., counts for each bin)
• Horizontal axis: Response variable
The classes can either be defined arbitrarily by the user or via some systematic rule. A number of theoretically derived rules have been proposed by Scott (Scott 1992).
The cumulative histogram is a variation of the histogram in which the vertical axis gives not just the counts for a single bin, but rather gives the counts for that bin plus all bins for smaller values of the response variable.
Both the histogram and cumulative histogram have an additional variant whereby the counts are replaced by the normalized counts. The names for these variants are the relative histogram and the relative cumulative histogram.
There are two common ways to normalize the counts.
1. The normalized count is the count in a class divided by the total number of observations. In this case the relative counts are normalized to sum to one (or 100 if a percentage scale is used). This is the intuitive case where the height of the histogram bar represents the proportion of the data in each class.
2. The normalized count is the count in the class divided by the number of observations times the class width. For this normalization, the area (or integral) under the histogram is equal to one. From a probabilistic point of view, this normalization results in a relative histogram that is most akin to the probability density function and a relative cumulative histogram that is most akin to the cumulative distribution function. If you want to overlay a probability density or cumulative distribution function on top of the histogram, use this normalization. Although this normalization is less intuitive (relative frequencies greater than 1 are quite permissible), it is the appropriate normalization if you are using the histogram to model a probability density function.
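A minimal sketch contrasting the ordinary frequency histogram with the density-normalized variant described above (area under the bars equal to one), using randomly generated data for illustration.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.default_rng(0).normal(loc=1.45, scale=0.12, size=200)
bins = np.linspace(1.1, 1.8, 8)   # equal-sized bins of width 0.1

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(data, bins=bins)                 # vertical axis: counts per bin
ax1.set(title="Frequency", xlabel="Response variable", ylabel="Frequency")
ax2.hist(data, bins=bins, density=True)   # area under the bars equals one
ax2.set(title="Relative (density)", xlabel="Response variable")
plt.show()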
WHAT QUESTIONS THE HISTOGRAM ANSWERS
• What is the most common system response?
• What distribution (center, variation and shape) does the data have?
• Does the data look symmetric or is it skewed to the left or right?
• Does the data contain outliers?
PARETO CHART
A Pareto chart is used to graphically summarize and display the relative importance of the differences between groups of data. It is a bar graph used to arrange information in such a way that priorities for process improvement can be established.
SAMPLE PARETO CHART DEPICTION

PURPOSES
 To display the relative importance of data.
 To direct efforts to the biggest improvement opportunity by highlighting the vital few in contrast to the useful many.
Pareto diagrams are named after Vilfredo Pareto, an Italian sociologist and economist, who invented this method of information presentation toward the end of the 19th century. The chart is similar to the histogram or bar chart, except that the bars are arranged in decreasing order from left to right along the abscissa. The fundamental idea behind the use of Pareto diagrams for quality improvement is that the first few (as presented on the diagram) contributing causes to a problem usually account for the majority of the result. Thus, targeting these "major causes" for elimination results in the most cost-effective improvement scheme.
HOW TO CONSTRUCT
• Determine the categories and the units for comparison of the data, such as frequency, cost, or time.
• Total the raw data in each category, then determine the grand total by adding the totals of each category. Re-order the categories from largest to smallest.
• Determine the cumulative percent of each category (i.e., the sum of each category plus all categories that precede it in the rank order, divided by the grand total and multiplied by 100).
• Draw and label the left-hand vertical axis with the unit of comparison, such as frequency, cost or time.
• Draw and label the horizontal axis with the categories. List from left to right in rank order.
• Draw and label the right-hand vertical axis from 0 to 100 percent. The 100 percent should line up with the grand total on the left-hand vertical axis.
• Beginning with the largest category, draw in bars for each category representing the total for that category.
• Draw a line graph beginning at the right-hand corner of the first bar to represent the cumulative percent for each category as measured on the right-hand axis.
• Analyze the chart. Usually the top 20% of the categories will comprise roughly 80% of the cumulative total.
GUIDELINES FOR EFFECTIVE APPLICATION OF PARETO ANALYSIS
 Create before and after comparisons of Pareto charts to show impact of improvement efforts.
 Construct Pareto charts using different measurement scales, frequency, cost or time.
 Pareto charts are useful displays of data for presentations.
 Use objective data to perform Pareto analysis rather than team members' opinions.
 If there is no clear distinction between the categories -- if all bars are roughly the same height or half of the categories are required to account for 60 percent of the effect -- consider organizing the data in a different manner and repeating Pareto analysis.
WHAT QUESTIONS THE PARETO CHART ANSWERS
• What are the largest issues facing our team or business?
• What 20% of sources are causing 80% of the problems (80/20 Rule)?
• Where should we focus our efforts to achieve the greatest improvements?
EXAMPLE FOR CONSTRUCTING THE PARETO CHART
The following table shows the reasons for medication errors affecting patients in a hospital.
CATEGORY         FREQUENCY   PERCENT OF TOTAL   CUMULATIVE %
WRONG DOSE          100            50.0             50.0
WRONG TIME           70            35.0             85.0
WRONG MEDICINE       15             7.5             92.5
WRONG PATIENT         8             4.0             96.5
MEDICINE DC'D         4             2.0             98.5
MISSED DOSE           3             1.5            100.0
GRAND TOTAL         200           100.0
Pareto chart for above details can be drawn as follows:
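A minimal sketch that draws this Pareto chart from the tabulated data, with bars in decreasing order and the cumulative percentage plotted on a right-hand axis scaled 0 to 100:

import matplotlib.pyplot as plt

categories = ["Wrong dose", "Wrong time", "Wrong medicine",
              "Wrong patient", "Medicine DC'd", "Missed dose"]
counts = [100, 70, 15, 8, 4, 3]   # already ordered largest to smallest

total = sum(counts)
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(100 * running / total)   # cumulative percent

fig, ax = plt.subplots()
ax.bar(categories, counts)
ax.set_ylabel("Frequency")
ax.tick_params(axis="x", labelrotation=45)

ax2 = ax.twinx()                  # right-hand axis, 0 to 100 percent
ax2.plot(categories, cumulative, marker="o", color="black")
ax2.set_ylim(0, 100)
ax2.set_ylabel("Cumulative %")
plt.tight_layout()
plt.show()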

WHEN TO USE A PARETO CHART
Pareto charts are typically used to prioritize competing or conflicting "problems," so that resources are allocated to the most significant areas. In general, though, they can be used to determine which of several classifications have the most "count" or cost associated with them: for instance, the number of people using the various ATMs versus each of the indoor teller locations, or the profit generated from each of twenty product lines. The important limitation is that the data must be in terms of either counts or costs; the data cannot be in terms that cannot be added, such as percent yields or error rates.
PICTOGRAPH
A pictograph is used to present statistics in a popular yet less statistical way to those who are not familiar with charts that contain numerical scales. This type of chart presents data in the form of pictures drawn to represent comparative sizes, scales or areas.
Again as with every chart, the pictograph needs a title to describe what is being presented and how the data are classified as well as the time period and the source of the data.
Examples of these types of charts appear below.
PICTOGRAPHS

A pictograph uses picture symbols to convey the meaning of statistical information. Pictographs should be used carefully because, if not drawn carefully, they may misrepresent the data either accidentally or deliberately. This is why a graph should be visually accurate.
STEMPLOTS
In statistics, a stemplot (or stem-and-leaf plot) is a graphical display of quantitative data that is similar to a histogram and is useful in visualizing the shape of a distribution. Stemplots are generally associated with the Exploratory Data Analysis (EDA) ideas of John Tukey and the Open University course Statistics in Society (NDST242), although Arthur Bowley did something very similar in the early 1900s.
Unlike histograms, stemplots:
• retain the original data (at least the most important digits)
• put the data in order - thereby easing the move to order-based inference and non-parametric statistics.
A basic stemplot contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves. The ease with which histograms can now be generated on computers has meant that stemplots are less used today than in the 1980s, when they first became widely used.
To construct a stemplot, the observations must first be sorted in ascending order. Here is the sorted set of data values that will be used in the example:
54 56 57 59 63 64 66 68 68 72 72 75 76 81 84 88 106
Next, it must be determined what the stems will represent and what the leaves will represent. Typically, the leaf contains the last digit of the number and the stem contains all of the other digits. In the case of very large or very small numbers, the data values may be rounded to a particular place value (such as the hundreds place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stems.
In this example, the leaf represents the ones place and the stem will represent the rest of the number (tens place and higher).
The stemplot is drawn with two columns separated by a vertical line. The stems are listed to the left of the vertical line. It is important that each stem is listed only once and that no numbers are skipped, even if it means that some stems have no leaves. The leaves are listed in increasing order in a row to the right of each stem.
5 | 4 6 7 9
6 | 3 4 6 8 8
7 | 2 2 5 6
8 | 1 4 8
9 |
10 | 6
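A minimal sketch that reproduces the display above, using the ones digit as the leaf and the remaining digits as the stem, and listing every stem in range even when it has no leaves:

data = [54, 56, 57, 59, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]

stems = {}
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # tens-and-up digits | ones digit
    stems.setdefault(stem, []).append(leaf)

# list every stem in range, even those with no leaves (e.g. stem 9 here)
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem:>2} | {leaves}")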
DOUBLE STEM PLOTS (STEM AND LEAF PLOT)
Splitting stems and the back-to-back stem plot are two distinct types of double stem plots, which are a variation of the basic stem plot.
SPLITTING STEMS
Splitting each of the stems in a data set into two or five stems may better illustrate the shape of the distribution. When splitting stems, it is important to split all stems and to split them equally. When splitting each stem into two stems, one stem contains leaves 0-4 and the other contains leaves 5-9. When splitting each stem into five stems, one stem contains leaves 0-1, the next 2-3, the next 4-5, the next 6-7, and the last 8-9. Here is an example of a split stem plot (using the same data set as the example above) in which each stem is split into two:
5 | 4
5 | 6 7 9
6 | 3 4
6 | 6 8 8
7 | 2 2
7 | 5 6
8 | 1 4
8 | 8
9 |
9 |
10 |
10 | 6
SUMMARY
A research report can be defined as the presentation of the research findings directed to a specific audience to accomplish a specific purpose.
General guidelines followed to write the report are 1) Consider the audience, 2) Be concise, but precise 3) Be objective, yet effective & 4) Understand the results and draw conclusions.
The main elements of a report are the title page, table of contents, executive summary, introduction, methodology, findings, conclusions, appendices, and bibliography.
Tables and graphs are used for the presentation of data. Different types of graphs are available, such as the bar chart, pie chart, line chart, area diagram, radar diagram and scatter diagram. According to the nature of the data and the requirement, the appropriate type of graph can be selected and used effectively.
KEY TERMS
• Executive summary
• Sampling
• Bibliography
• Appendix
• Interview schedule
• Area chart
• Line graph
• Bar chart
• Pie chart
• Scatter diagram
• Radar diagram
• Surface diagram
• Pareto Chart
• Pictograph
• Stem-and-leaf plot
IMPORTANT QUESTIONS
34. What do you mean by research report?
35. Why is the research report important?
36. Explain the general guidelines that exist for writing a report.
37. What are the preparations required for writing the report?
38. What components are typically included in a research report?
39. What are the alternative means of displaying data graphically?
40. Explain the application of Pareto Chart with example
41. What are the applications of Pictograph?
42. What is the procedure to draw a stem-and-leaf plot?
REFERENCE BOOKS
43. Ramanuj Majumdar, Marketing Research, Wiley Eastern Ltd., New Delhi, 1991.
44. Harper W. Boyd, Jr. et al., Marketing Research, Richard D. Irwin Inc., USA, 1990.
45. Paul E. Green et al., Research for Marketing Decisions, Prentice Hall of India Pvt. Ltd., New Delhi, 2004.