What is Data Mining?

By Melissa Rudy
Updated: May 09, 2012

You may have noticed that sometimes, the Internet seems almost psychic. Facebook knows who your friends are before you add them, and Google ads suggest products and services you actually need. You may visit a website for the first time, only to find that the sidebar ads know where you live and are suggesting restaurant deals in your area.

Contrary to appearances, these companies don't have crystal balls. They use data mining to apply the information they do have about you and make extraordinarily well-educated guesses.

What is data mining?

The details of data mining are pretty complex, but at its core, it’s the process of gathering vast amounts of data and then extracting useful patterns from it. The algorithms involved draw on statistics and machine learning, and the results can be marketing gold for businesses.

Data mining gathers and sorts through thousands, millions, or even billions of data points. This large-scale information discovery can be either descriptive or predictive, and it typically relies on one or more of the following techniques:

  • Anomaly detection
  • Association learning
  • Classification
  • Cluster detection
  • Regression

One of these things is not like the others

Anomaly detection looks for data points that deviate from an established norm. This type of data mining is often used as part of fraud defense: credit card companies use it to flag suspicious transactions, which can then be verified with the cardholder before processing.

While anomaly detection isn't commonly used from a marketing standpoint, it's definitely a useful tool for protection. It makes it easier to pinpoint suspicious activity before it does any damage.
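To make the idea concrete, here is a minimal Python sketch of one simple approach: a z-score test that flags any amount sitting far from the average. The transaction amounts, the function name flag_anomalies, and the threshold of 2 are all made up for illustration. Real fraud systems are far more sophisticated, and this is not a description of how any card issuer actually works.

```python
from statistics import mean, stdev

def flag_anomalies(amounts, threshold=2.0):
    """Return the amounts whose z-score is larger than the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # every amount is identical, so nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical purchase history: eight everyday charges and one very large one.
history = [12.50, 40.00, 23.75, 18.20, 35.10, 27.40, 22.00, 31.60, 2500.00]
suspicious = flag_anomalies(history)
print(suspicious)
```

Only the 2,500.00 charge is far enough from the rest to be flagged. A simple test like this can miss things when several extreme values appear at once, which is one reason production systems use more robust methods.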

If you like this, try that

Anyone who's bought something from Amazon is familiar with the effects of association learning through data mining. Though Amazon doesn't disclose its algorithms (and probably encourages the rumors that it has a team of programmers changing them every 30 minutes or so), the merchant giant uses association learning to make personalized online recommendations.

Even without Amazon's zealously guarded algorithm secrets, the "if you like X, you'll like Y" formula that can be derived from association learning can benefit any business. With a plethora of products to choose from, consumers often appreciate a nudge in a direction that interests them.
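The "if you like X, you'll like Y" idea can be sketched in a few lines of Python. The example below counts how often items appear together in shopping baskets and keeps the rules whose confidence (the share of baskets containing X that also contain Y) clears a cutoff. The baskets, the function name association_rules, and the 0.6 cutoff are invented for illustration, and this is a textbook simplification, not Amazon's method.

```python
from collections import Counter
from itertools import permutations

def association_rules(baskets, min_confidence=0.6):
    """Return {(x, y): confidence} for rules 'bought x, so likely to buy y'."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # ordered pairs (x, y)
    rules = {}
    for (x, y), together in pair_counts.items():
        confidence = together / item_counts[x]
        if confidence >= min_confidence:
            rules[(x, y)] = confidence
    return rules

# Hypothetical purchase data from a camping store.
baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["sleeping bag", "pillow"],
    ["tent", "sleeping bag", "pillow"],
]
rules = association_rules(baskets)
for (x, y), confidence in sorted(rules.items()):
    print(f"bought {x} -> suggest {y} ({confidence:.0%})")
```

Everyone here who bought a lantern also bought a tent, so that rule gets full confidence. The reverse rule does not make the cut, because only half of the tent buyers picked up a lantern.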

People who buy car insurance like coffee mugs

With oceans of data to sort through, businesses rely on cluster detection to recognize sub-categories or distinct clusters that people reading through piles of reports would otherwise miss. This type of data mining can point out purchasing habits among certain groups, providing an excellent basis for targeted marketing.
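One common way to find clusters is an algorithm called k-means, which repeatedly assigns each data point to its nearest group center and then moves each center to the middle of its group. The Python sketch below is a bare-bones version under stated assumptions: the customer figures (age, purchases per month) are invented, the function name k_means is hypothetical, and the starting centers are simply the first k points so the result is repeatable.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    # Assumption: start from the first k points so the result is reproducible.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, purchases per month).
customers = [(22, 9), (61, 2), (25, 8), (24, 10), (58, 1), (63, 3)]
centroids, clusters = k_means(customers, k=2)
for center, members in zip(centroids, clusters):
    print(center, members)
```

With this data the customers split into a younger group that buys often and an older group that buys rarely, which is the kind of pattern a marketer could then target.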

Separating the wheat from the chaff

Classification sorts new data into pre-determined categories, using an existing structure that is typically learned from examples that have already been labeled. This type of data mining makes things like automated email folder routing possible. For example, spam filters use sophisticated classification algorithms to weed out messages asking you to buy Viagra or donate large sums of money to Nigerian princes.
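A toy spam filter shows how this works in practice. The Python sketch below uses naive Bayes, a classic text-classification technique: it learns how often each word appears in messages already labeled "spam" or "inbox", then picks the more likely label for a new message. The training messages, labels, and function names are all made up for illustration, and real spam filters use much larger data sets and many more signals.

```python
from collections import Counter
from math import log

def train(examples):
    """examples: list of (text, label). Returns (word counts per label, message counts per label)."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes score for the text."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        # Start with how common the label is overall.
        score = log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words don't zero out the score.
            score += log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical, already-labeled training messages.
examples = [
    ("claim your free prize money now", "spam"),
    ("send money to claim your inheritance", "spam"),
    ("meeting moved to friday afternoon", "inbox"),
    ("lunch on friday with the project team", "inbox"),
]
word_counts, label_counts = train(examples)
print(classify("free money prize", word_counts, label_counts))
print(classify("project meeting friday", word_counts, label_counts))
```

The first message lands in spam and the second in the inbox, because their words match what the filter saw in each labeled category.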

Learning from the past

With regression, data from past behavior is collected and used to model how certain factors relate to an outcome, so that your future actions can be predicted. Again, the algorithms are complex, but Facebook uses regression data mining to weigh certain factors and pinpoint new behaviors to encourage or features to offer (though there might have been an element of anomaly detection behind the decision to introduce Timeline).
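The simplest form of regression fits a straight line through past data and extends it forward. The Python sketch below does this with ordinary least squares. The weekly login counts and the function name fit_line are invented for illustration, and nothing here reflects how Facebook or any other company actually models behavior.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: how many times a user logged in during each of five weeks.
weeks = [1, 2, 3, 4, 5]
logins = [3, 5, 7, 9, 11]

slope, intercept = fit_line(weeks, logins)
predicted_week_6 = slope * 6 + intercept
print(f"Expected logins in week 6: {predicted_week_6:.1f}")
```

The fitted line rises by two logins a week, so the sketch predicts 13 logins for week 6. Real behavior is rarely this tidy, and practical models weigh many factors at once.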

How does data mining factor into your life?

If you spend any time online, whether for business or pleasure, you're affected by data mining. Companies that want your business use your information in various ways, including:

  • Targeted advertising, such as related products and geographical information
  • Spam that is sent to your email address when you sign up for related services
  • Phone calls from survey companies or lead generation firms using data harvested online
  • Snail mail, including offers related to things you've expressed interest in online
  • Friend suggestions through social media sites like Facebook, Twitter, and Google+
  • Police and security profiling, which sometimes relies on Internet data to identify suspicious activity like credit card fraud and illegal downloading

Data mining practices represent a good reason to protect your privacy online. Never give out personal information to an untrusted source, and avoid posting your email address, phone number, or mailing address on public websites. This can help you avoid spam, junk mail, and other forms of targeted advertising—including those eerily prescient banner ads.
