What is Data Mining?
Data mining is defined as extracting information
from huge sets of data. In other words, data mining is the
procedure of mining knowledge from data. The information or knowledge extracted
in this way can be used for any of the following applications:
- Market Analysis
- Fraud Detection
- Customer Retention
- Production Control
- Science Exploration
Data Mining Applications
Data mining is highly useful in the following
domains −
- Market Analysis and Management
- Corporate Analysis & Risk Management
- Fraud Detection
Apart from these, data mining can also be used
in the areas of production control, customer retention, science exploration,
sports, astrology, and Internet Web Surf-Aid.
Market Analysis and Management
Listed below are the various areas of marketing
where data mining is used −
- Customer Profiling − Data mining helps determine what kind of people buy what kind of products.
- Identifying Customer Requirements − Data mining helps in identifying the best products for different customers. It uses prediction to find the factors that may attract new customers.
- Cross Market Analysis − Data mining performs association/correlation analysis between product sales.
- Target Marketing − Data mining helps to find clusters of model customers who share the same characteristics, such as interests, spending habits, and income.
- Determining Customer Purchasing Patterns − Data mining helps in determining customer purchasing patterns.
- Providing Summary Information − Data mining provides various multidimensional summary reports.
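As a rough illustration of a multidimensional summary report, the Python sketch below totals hypothetical sales records along two dimensions (region and product) and then rolls the totals up to a single dimension. The records and the function names `summarize` and `roll_up` are illustrative assumptions, not part of any particular data mining tool.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount).
sales = [
    ("North", "Laptop", 1200.0),
    ("North", "Printer", 300.0),
    ("South", "Laptop", 950.0),
    ("South", "Laptop", 1100.0),
    ("South", "Printer", 250.0),
]


def summarize(records):
    """Total the sales amount for every (region, product) combination."""
    totals = defaultdict(float)
    for region, product, amount in records:
        totals[(region, product)] += amount
    return dict(totals)


def roll_up(cube, axis):
    """Aggregate the two-dimensional summary down to one dimension.

    axis=0 keeps the region, axis=1 keeps the product.
    """
    totals = defaultdict(float)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)


report = summarize(sales)
for (region, product), total in sorted(report.items()):
    print(f"{region:<6} {product:<8} {total:>8.2f}")
print(roll_up(report, axis=0))
```

Each roll-up is just another group-by over the same summary, which is the basic idea behind viewing sales "by region", "by product", or both.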
Corporate Analysis and Risk Management
Data mining is used in the following fields of
the corporate sector −
- Finance Planning and Asset Evaluation − It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.
- Resource Planning − It involves summarizing and comparing resources and spending.
- Competition − It involves monitoring competitors and market directions.
Fraud Detection
Data mining is also used in the fields of credit
card services and telecommunication to detect fraud. For fraudulent telephone
calls, it helps to find the destination of the call, the duration of the call,
the time of the day or week, etc. It also analyzes patterns that deviate from
expected norms.
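The idea of flagging behavior that deviates from expected norms can be sketched with a simple z-score test. This Python example assumes hypothetical call durations for a single subscriber and an arbitrary threshold of three standard deviations; a real fraud detection system would combine many features, such as the destination and time of day mentioned above, rather than duration alone.

```python
import statistics

# Hypothetical call durations (minutes) from one subscriber's history.
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 3.0]
# Hypothetical new calls to screen.
new_calls = [4.0, 95.0, 2.5]


def flag_unusual(history, calls, threshold=3.0):
    """Return calls whose duration lies more than `threshold` standard
    deviations away from the subscriber's historical mean (z-score test)."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    if std == 0:
        # No variation in the history, so z-scores are undefined.
        return []
    return [c for c in calls if abs(c - mean) / std > threshold]


print(flag_unusual(history, new_calls))
```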
Data mining deals with the kinds of patterns that can be
mined. On the basis of the kind of data to be mined, there are two categories
of functions involved in data mining −
- Descriptive
- Classification and Prediction
Descriptive Function
The descriptive function deals with the general properties
of data in the database. Here is the list of descriptive functions −
- Class/Concept Description
- Mining of Frequent Patterns
- Mining of Associations
- Mining of Correlations
- Mining of Clusters
Class/Concept Description
Class/Concept refers to the data to be associated with the
classes or concepts. For example, in a company, the classes of items for sale
include computers and printers, and concepts of customers include big spenders
and budget spenders. Descriptions of individual classes or concepts in
summarized, concise terms are called class/concept descriptions. These
descriptions can be derived in the following two ways −
- Data Characterization − This refers to summarizing the data of the class under study. The class under study is called the target class.
- Data Discrimination − It refers to comparing the target class with some predefined contrasting group or class.
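The two ways of deriving a class description can be contrasted in a small Python sketch: characterization summarizes one class on its own, while discrimination compares that summary against a contrasting class. The customer records, the chosen features (age and annual spend), and the use of means as the summary are all hypothetical simplifications.

```python
import statistics

# Hypothetical customer records: (class label, age, annual spend).
customers = [
    ("big spender", 45, 9000),
    ("big spender", 52, 12000),
    ("big spender", 38, 8000),
    ("budget spender", 24, 900),
    ("budget spender", 31, 1500),
    ("budget spender", 27, 1200),
]


def characterize(records, label):
    """Data characterization: summarize the general features of one class."""
    rows = [r for r in records if r[0] == label]
    return {
        "count": len(rows),
        "mean_age": statistics.mean(r[1] for r in rows),
        "mean_spend": statistics.mean(r[2] for r in rows),
    }


def discriminate(records, target, contrast):
    """Data discrimination: compare the target class with a contrasting
    class by reporting the difference in each summarized feature."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: t[key] - c[key] for key in ("mean_age", "mean_spend")}


print(characterize(customers, "big spender"))
print(discriminate(customers, "big spender", "budget spender"))
```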
Mining of Frequent Patterns
Frequent patterns are those patterns that occur frequently
in transactional data. Here are the kinds of frequent patterns −
- Frequent Item Set − It refers to a set of items that frequently appear together, for example, milk and bread.
- Frequent Subsequence − A sequence of patterns that occurs frequently, such as purchasing a camera followed by a memory card.
- Frequent Substructure − Substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with item sets or subsequences.
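A frequent item set can be found by counting how often each combination of items appears and keeping the combinations above a minimum support. The Python sketch below is a deliberately naive version that enumerates every item set of one or two items over hypothetical baskets; algorithms such as Apriori or FP-growth avoid this exhaustive enumeration on real data. It covers item sets only, not subsequences or substructures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "biscuits"},
    {"milk", "bread", "biscuits"},
    {"milk", "eggs"},
]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every item set of up to `max_size` items and keep those whose
    support (fraction of transactions containing them) is >= min_support.

    Brute-force enumeration; Apriori-style pruning is omitted for clarity.
    """
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            # Sorting gives each item set a single canonical tuple form.
            for itemset in combinations(sorted(t), size):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


for itemset, support in sorted(frequent_itemsets(transactions, 0.4).items()):
    print(itemset, support)
```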
Mining of Associations
Associations are used in retail sales to identify items
that are frequently purchased together. This refers to the process of
uncovering the relationships among data and determining association rules.
For example, a retailer generates an association rule
showing that 70% of the time milk is sold with bread and only 30% of the time
biscuits are sold with bread.
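The percentages in this example are most naturally read as the confidence of the rules bread -> milk and bread -> biscuits, that is, the share of bread purchases that also include the other item; that reading is an interpretation, since the text does not name the measure. The Python sketch below computes support and confidence on hypothetical baskets built to reproduce the 70% and 30% figures.

```python
# Hypothetical baskets chosen so that 10 contain bread:
# 7 of those also contain milk and 3 also contain biscuits.
transactions = (
    [{"bread", "milk"}] * 6
    + [{"bread", "milk", "biscuits"}]
    + [{"bread", "biscuits"}] * 2
    + [{"bread"}]
    + [{"milk"}] * 2
)


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent: the fraction of transactions
    containing the antecedent that also contain the consequent."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)


for item in ("milk", "biscuits"):
    sup = support({"bread", item}, transactions)
    conf = confidence({"bread"}, {item}, transactions)
    print(f"bread -> {item}: support {sup:.2f}, confidence {conf:.0%}")
```

Running it should report confidences of 70% for milk and 30% for biscuits, while the support values show how common each pair is across all twelve baskets.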
Mining of Correlations
It is a kind of additional analysis performed to uncover
interesting statistical correlations between associated attribute-value pairs
or between two item sets, to determine whether they have a positive, negative,
or no effect on each other.
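One common way to decide whether two item sets are positively correlated, negatively correlated, or independent is the lift measure, lift(A, B) = P(A and B) / (P(A) P(B)). The text does not say which measure it has in mind, so lift is an assumption here (chi-square tests are another common choice). The Python sketch uses hypothetical baskets.

```python
# Hypothetical baskets.
transactions = [
    {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
    {"milk"}, {"bread"},
    {"biscuits"}, {"biscuits"}, {"biscuits", "milk"},
    {"tea"}, {"tea"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def lift(a, b, transactions):
    """lift(A, B) = P(A and B) / (P(A) * P(B)),
    written with raw counts as count(A and B) * n / (count(A) * count(B))."""
    n = len(transactions)
    joint = count(a | b, transactions)
    return joint * n / (count(a, transactions) * count(b, transactions))


def correlation_kind(value, tol=1e-9):
    """Lift above 1: positive; below 1: negative; about 1: no correlation."""
    if value > 1 + tol:
        return "positive"
    if value < 1 - tol:
        return "negative"
    return "none"


for other in ("bread", "biscuits"):
    value = lift({"milk"}, {other}, transactions)
    print(f"milk & {other}: lift {value:.2f} ({correlation_kind(value)})")
```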
Mining of Clusters
A cluster refers to a group of similar objects.
Cluster analysis refers to forming groups of objects that are very similar to
each other but highly different from the objects in other clusters.
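The text does not name a clustering algorithm; k-means is one widely used choice and is sketched below in plain Python as an assumption. Each point is assigned to its nearest centroid, each centroid is moved to the mean of its points, and the loop stops when the centroids no longer change. The customer points and the naive initialization with the first k points are illustrative; practical implementations usually use better seeding, such as k-means++.

```python
import math

# Hypothetical customers as (income score, spending score) points.
points = [
    (1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
    (8.5, 9.0), (1.0, 1.5), (9.0, 8.0),
]


def kmeans(points, k, max_iterations=100):
    """Plain k-means clustering; returns (centroids, clusters)."""
    # Naive initialization: use the first k points as starting centroids.
    centroids = list(points[:k])
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```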
Classification and Prediction
Classification is the process of finding a model that
describes the data classes or concepts. The purpose is to be able to use this
model to predict the class of objects whose class label is unknown. This
derived model is based on the analysis of sets of training data. The derived
model can be presented in the following forms −
- Classification (IF-THEN) Rules
- Decision Trees
- Mathematical Formulae
- Neural Networks
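As a minimal illustration of deriving a model from training data, the Python sketch below learns a decision stump, a one-level decision tree that can also be read as a single IF-THEN rule: IF income >= t THEN "big spender" ELSE "budget spender". The training data, the single income attribute, and the use of observed values as candidate thresholds are assumptions made for brevity; real decision-tree learners choose splits with measures such as information gain or the Gini index.

```python
# Hypothetical training data: (annual income in thousands, known class label).
training = [
    (25, "budget spender"), (32, "budget spender"), (40, "budget spender"),
    (45, "big spender"), (75, "big spender"), (90, "big spender"),
    (110, "big spender"),
]


def learn_stump(data):
    """Pick the threshold t that classifies the most training examples
    correctly with the rule IF income >= t THEN big spender ELSE budget spender."""
    best_t, best_correct = None, -1
    for t, _ in data:  # candidate thresholds: the observed income values
        correct = sum((x >= t) == (label == "big spender") for x, label in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def classify(income, threshold):
    """Apply the learned IF-THEN rule to an object whose label is unknown."""
    return "big spender" if income >= threshold else "budget spender"


threshold = learn_stump(training)
print(f"IF income >= {threshold} THEN big spender ELSE budget spender")
print(classify(60, threshold))
```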
The functions involved in these processes are as
follows −
- Classification − It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e., data objects whose class labels are known.
- Prediction − It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on available data.
- Outlier Analysis − Outliers may be defined as the data objects that do not comply with the general behavior or model of the data available.
- Evolution Analysis − Evolution analysis refers to describing and modeling regularities or trends for objects whose behavior changes over time.
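Since regression analysis is named as the usual tool for prediction, here is a minimal ordinary least-squares sketch in Python that fits a straight line to hypothetical monthly sales figures and predicts an unavailable value. It implements only simple (one-variable) linear regression; the data and the function name `fit_line` are illustrative.

```python
# Hypothetical monthly sales: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.5, 13.5, 16.0, 18.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, sales)
# Predict the (unavailable) value for month 6.
prediction = slope * 6 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, month 6 -> {prediction:.2f}")
```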
BY
S.SANGEETHA
Asst Professor
Dept of Computer Applications