There are two types of fact findings in any analysis that ultimately assist a Decision Support System (DSS): 1) known facts and 2) unknown facts. Data mining techniques extract known facts, because they are applied only to structured datasets, and in structured data we mostly know what is there, such as the relationships between tables and the relationships of data within the same table. But that's not the whole truth: sometimes, even in structured data, there are unknown facts, and that's where Data Dredging, Snooping, and Fishing techniques come into action.

These techniques sometimes bypass Data Mining techniques and come up with immature conclusions. On occasion, they show more detail about something than it actually contains. In other words, Data Dredging, Snooping, p-hacking, and Fishing share results that require further investigation.

The general view of Data Dredging, Snooping, p-hacking, and Fishing is that they are the dark side of data mining, but for me they are techniques for finding both the known and the unknown. Well, if you ask me, I am more interested in knowing the unknown than the known. These techniques have become more famous in the Big Data era because of the huge size of semi-structured and unstructured datasets.

Let's take a simple example: there are Date of Birth (DOB), Gender, and Age fields, where Age is derived from the DOB column. As per the business rule, the Age field is loaded only for Male/Female customers aged between 1 and 99. Please note that when loading the data, we have already defined its structure. Now, your management asks you to extract abnormal age behavior. Here Data Dredging, Snooping, p-hacking, and Fishing techniques come into action: they search for the age KPI everywhere and may end up showing people who have no gender defined in the system and are aged more than 500 years. This is very critical information for customer KYC (Know Your Customer).
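To make the age example concrete, here is a minimal Python sketch, assuming a hypothetical in-memory list of customer records with `dob` and `gender` fields (the field names, sample rows, and fixed reference date are illustrative, not from any real system). Instead of trusting the already-filtered Age field, it derives age from DOB for every record and flags any record that breaks the business rule (gender not Male/Female, or age outside 1 to 99).

```python
from datetime import date

# Hypothetical customer records; field names and values are illustrative only.
CUSTOMERS = [
    {"id": 1, "dob": date(1985, 4, 12), "gender": "Female"},
    {"id": 2, "dob": date(1990, 7, 1), "gender": "Male"},
    {"id": 3, "dob": date(1500, 1, 1), "gender": None},    # bad DOB, no gender
    {"id": 4, "dob": date(2024, 6, 1), "gender": "Male"},  # younger than 1 year
]

VALID_GENDERS = {"Male", "Female"}
MIN_AGE, MAX_AGE = 1, 99  # business rule for loading the Age field


def age_on(dob: date, today: date) -> int:
    """Whole years between dob and today, i.e. Age derived from DOB."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(before_birthday)


def abnormal_ages(records, today):
    """Return (id, age, gender) for every record that breaks the loading rule."""
    flagged = []
    for rec in records:
        age = age_on(rec["dob"], today)
        if rec["gender"] not in VALID_GENDERS or not (MIN_AGE <= age <= MAX_AGE):
            flagged.append((rec["id"], age, rec["gender"]))
    return flagged


if __name__ == "__main__":
    # A fixed reference date keeps the output reproducible.
    for row in abnormal_ages(CUSTOMERS, today=date(2024, 12, 31)):
        print(row)
```

Running it prints `(3, 524, None)` and `(4, 0, 'Male')`: the kind of undefined-gender, 500-plus-year-old record the example describes, plus an infant that the 1 to 99 rule would also keep out of the Age field.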
Data Dredging, Snooping, and Fishing all refer to the same behavior: analyzing data, but without a proper hypothesis or known relationship among datasets. Data mining finds results based on the correlation of data in large datasets, whereas Data Dredging, Snooping, p-hacking, and Fishing find results based on chance.
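The difference between correlation-based mining and results found by chance can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from this post: it generates one random target and 200 unrelated random features (pure noise, with sizes chosen arbitrarily), tests each feature's Pearson correlation with the target using Fisher's z-approximation at roughly a 5% significance level, and collects the ones that look "significant".

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def looks_significant(r, n, z_crit=1.96):
    """Approximate two-sided test of r == 0 via Fisher's z-transform (alpha ~ 0.05)."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def dredge(n_rows=50, n_features=200, seed=0):
    """Fish through random features; every 'hit' is a false positive by construction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = []
    for j in range(n_features):
        # Pure noise, generated independently of the target.
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson_r(feature, target)
        if looks_significant(r, n_rows):
            hits.append((j, r))
    return hits


if __name__ == "__main__":
    hits = dredge()
    print(f"{len(hits)} of 200 noise features look 'significant'")
```

Because every feature is noise, any hit is a false positive; with 200 tests at about 5% you should expect around ten of them, and the exact count depends on the seed. Fishing through enough columns without a hypothesis will almost always turn something up, which is why such results need further investigation.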