Data Mining

Considerations on Data Mining & Predictive Analytics

This week, we are sharing another article on what we should care about in data mining and predictive analytics.  For businesses, the aim is to gain a competitive advantage over peers in the same market.  In this article, the writer shares the fundamentals to consider when investing in analytics.

Original Article by Vikash Kumar:

We believe that a basic understanding of the core of data analytics is important for everyone nowadays.



Data Science & Personal Data Protection

Another day, another article worth reading.  This week, we would like to highlight the importance of privacy in data collection.  Even though GDPR now applies in EU countries, there is still room for improvement in handling data privacy.  Many data analysts and data scientists collect too much detail, including unnecessary personal data, for their projects.

Article from ITPRO (

The article cites Microsoft as a bad example for collecting personal information from Office 365.

As an experienced data team, we put “ethical data science” first.  Masked personal information and anonymized data should be enough for most data analytics cases.  Institutions and governments should therefore keep refining their guidelines and rules on data collection so that personal data protection keeps pace with technology development.
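To make the idea of masking concrete, here is a minimal sketch in Python.  The field names, the sample record and the secret key are all hypothetical, and the approach shown (a keyed hash plus coarse banding) is only one possible way to do it.  Note that a keyed hash is pseudonymization rather than full anonymization, so the key must be kept away from the analysts.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored outside the dataset.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-dataset"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash so records can still be joined."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def mask_record(record: dict) -> dict:
    """Return a reduced copy of the record; the name is dropped entirely."""
    return {
        "customer_key": pseudonymize(record["email"]),
        "email_masked": mask_email(record["email"]),
        "age_band": f"{record['age'] // 10 * 10}s",  # e.g. 34 becomes "30s"
        "spend": record["spend"],
    }


# Hypothetical sample record, for illustration only.
record = {"name": "Chan Tai Man", "email": "taiman@example.com", "age": 34, "spend": 1280.5}
masked = mask_record(record)
print(masked)
```

The analysis can still group spending by customer key and age band, while the name and the full email address never leave the source system.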


ShamShuiPo - Gentrification

Identifying Gentrification and Predicting Demand Changes

There is an article discussing gentrification.  In the US, the term originally refers to higher-income (white) people moving their homes and businesses into low-income minority neighborhoods.  However, a similar situation is found in developed cities like Hong Kong, where professionals and the middle class are moving into Shamshuipo (a district with the lowest average household income) because urban renewal has introduced new tall residential buildings.  This changes the business environment.  Ten years ago, you could not find a single café in this area.  Now there are 5 different “luxury” cafés in a district of about 1047 hectares.

Original Article:

To cope with the dynamics of a city, businesses need data to identify market trends in home building and to estimate the demand for commercial space by category of activity.  Data science and data analytics help to explore more possibilities through insights and predictions drawn from that data.


Potential DDoS Attack from Hadoop

There is a piece of bad news for anyone using Hadoop for data analytics.  According to the Radware Research team (25 Oct 2018), a new botnet has been found targeting Hadoop clusters in order to perform DDoS attacks.

(NOTE: Hadoop is an open source distributed processing framework that allows for the distributed processing of massive amounts of data and computation across clusters of computers using simple programming models.)

News – Original Version

Security teams should also look to invest in mitigation tools and services that specialize in defending against DDoS attacks.  It is also important to consider whether the Hadoop cluster needs to be connected to the Internet at all, since keeping it off the Internet reduces its exposure to attack.
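As a first check on exposure, a team could test from a machine outside its own network whether the cluster's service ports accept connections.  The sketch below is only an illustration and not a substitute for a proper security review.  The host name in the usage comment is hypothetical, and the ports are common Hadoop defaults (8088 for the YARN ResourceManager web interface, 9870 for the Hadoop 3 NameNode web interface) that may differ in a given installation.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or the host name could not be resolved.
        return False


def exposed_ports(host: str, ports: list, timeout: float = 2.0) -> list:
    """Return the subset of ports that accept connections on the host."""
    return [port for port in ports if is_port_open(host, port, timeout)]


# Example usage (hypothetical host name), to be run from outside the network:
#   exposed_ports("hadoop-master.example.com", [8088, 9870])
```

An empty list from an outside machine suggests these ports are not reachable from the Internet; any port returned deserves a closer look at firewall and authentication settings.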

Big Data Trends in 2019

Top 7 Big Data Trends in 2019

This week, we would like to suggest reading an article published on “AnalyticsInsight.Net” about Big Data trends in 2019.  To be honest, most of them were already happening in 2018, and some are already running in our data science lab at Smart Data Institute Limited and Clear Data Science Limited.

Original Article:

Top 7 Big Data Analytics Trends For 2019

The trends are listed below with our additional notes and views:

  1. Fast Growing IoT Devices – this has been hot since 2017 and is keeping up
  2. Predictive Analytics – this topic is not new, and most people have not done it well in the past
  3. Dark Data – only a small percentage of data is being analyzed, so there is still plenty of room to grow
  4. CDOs in Demand – not only CDOs, but all roles in data science are in high demand
  5. Quantum Computing – it sounds promising for solving complex problems, but the integration of tools still needs to improve
  6. Open Source – SMBs without any analytics experience should start here
  7. Edge Computing – it is closely related to IoT (in our opinion, not that of the article's writer): computing power is placed close to the end user to take over part of the workload from centralized computing.  It can relieve the network and the servers (including the cloud) by moving workload to local computers near the users.  Thus, it fits the characteristics of IoT.

Finally, we would like to say: “do not just read other people’s ideas, turn them into actions for your own experience”.  This has always been the belief of our learning team and our committed team members.

Inventory Management by Intelligence

Inventory management is extremely critical for retail, wholesale and manufacturing.  Customer satisfaction, cash flow and profitability are all closely tied to inventory management.  Another area to consider is logistics management.

Thus, many people are trying to minimize costs by predicting damaged stock, healthy stock levels, best-selling channels and retailers, etc.

We would like to share an article on this vital topic for many businesses:

If the predictions are accurate, the results should show immediately.  Previously, SDi team members participated in a project with a battery manufacturer to provide expiry alerts and predictions of battery demand by retailer.  Their revenue improved by >20%.  Another real example is a luxury retailer that improved its fashion inventory by predicting sales.  The stock levels in different stores are better managed with the monitoring and prediction systems.  They have cut wastage (disposal) by >10% within their chains operating in China.  These are real cases led by our Data Science Evangelist – Samuel Sum.
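The two building blocks mentioned above, expiry alerts and demand prediction, can be sketched in a few lines of Python.  This is not the system used in those projects.  The batch data, the 30-day warning window and the 3-period moving average are all assumptions chosen to keep the illustration small.

```python
from datetime import date, timedelta


def expiring_batches(batches, today, warn_days=30):
    """Return batches that are expired or expire within warn_days, soonest first."""
    cutoff = today + timedelta(days=warn_days)
    due = [batch for batch in batches if batch["expiry"] <= cutoff]
    return sorted(due, key=lambda batch: batch["expiry"])


def moving_average_forecast(sales, window=3):
    """Naive demand forecast: the mean of the most recent `window` periods."""
    recent = sales[-window:]
    return sum(recent) / len(recent)


# Hypothetical stock batches and monthly unit sales, for illustration only.
batches = [
    {"sku": "AA-4PK", "expiry": date(2019, 3, 1)},
    {"sku": "AAA-8PK", "expiry": date(2019, 1, 20)},
    {"sku": "9V-1PK", "expiry": date(2019, 1, 5)},
]
monthly_sales = [120, 130, 110, 140, 150]

alerts = expiring_batches(batches, today=date(2019, 1, 1))
forecast = moving_average_forecast(monthly_sales)
print([batch["sku"] for batch in alerts], round(forecast, 1))
```

A real system would replace the moving average with a model that accounts for seasonality and promotions, but the structure stays the same: flag stock at risk, then compare it with expected demand.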

Inventory Management

A Concise Introduction about Data Mining

This week, we would like to share an article from a British magazine.  It gives concise definitions and explanations of data mining jargon.  There are still many so-called data experts who cannot tell when a problem calls for classification and when it calls for regression.  Please get it right by understanding more…
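The difference between the two can be shown with a very small Python sketch.  Regression predicts a number, while classification predicts a category.  The data below is made up, and the two methods (a least-squares line and a 1-nearest-neighbour rule) were picked only because they are short; they are not taken from the recommended article.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def nearest_neighbour_label(xs, labels, x_new):
    """Classification: return the label of the closest known example."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return labels[closest]


# Hypothetical data: advertising spend versus sales for four campaigns.
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]                      # a number, so this is a regression target
outcome = ["low", "low", "high", "high"]  # a category, so this is a classification target

slope, intercept = fit_line(spend, sales)
predicted_sales = slope * 5 + intercept
predicted_outcome = nearest_neighbour_label(spend, outcome, 3.6)
print(predicted_sales, predicted_outcome)
```

The question "how much will we sell?" is a regression problem, while "will this campaign be a low or a high performer?" is a classification problem, even though both start from the same input.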

Recommended Article:

Data Mining Example

Recommended Article: How Does AI Training Work?

This week, we would like to share an article discussing the basic concepts of AI and the role and importance of training.  It is absolutely crucial that everyone involved in the development of your model understands how it works.

Original Article:

The author of the article uses the example of a basket of songs in English and Spanish, which the machine has to categorize by language.  He points out a number of common problems, such as:

  1. Data quality for training
  2. Overfitting to the Sample
  3. Testing considerations

In our view, he is discussing machine learning, a subset of AI, rather than the whole picture of AI.  Nevertheless, he illustrates the fundamental steps and concerns of the “machine learning” process.  It is still an article worth reading for beginners.
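The points about overfitting and testing can be illustrated with a toy version of the song example.  The lyrics below are made up and the two "models" are deliberately simplistic; this is our own sketch, not the method from the article.  The first model memorizes the training lyrics, which is an extreme form of overfitting, while the second learns which words belong to which language.

```python
def train_memorizer(examples):
    """Overfit on purpose: remember every training lyric exactly."""
    return dict(examples)


def predict_memorizer(model, lyric):
    """Return the stored label, or None for any lyric not seen in training."""
    return model.get(lyric)


def train_word_votes(examples):
    """Record which words were seen for each language."""
    vocab = {}
    for lyric, lang in examples:
        vocab.setdefault(lang, set()).update(lyric.split())
    return vocab


def predict_word_votes(vocab, lyric):
    """Pick the language whose known words overlap most with the lyric."""
    words = set(lyric.split())
    return max(vocab, key=lambda lang: len(words & vocab[lang]))


def accuracy(predict, model, examples):
    """Share of examples for which the prediction matches the true label."""
    hits = sum(predict(model, lyric) == lang for lyric, lang in examples)
    return hits / len(examples)


# Hypothetical lyric snippets: one set for training, one held out for testing.
train = [
    ("love you baby", "en"),
    ("dancing in the rain", "en"),
    ("te quiero mucho", "es"),
    ("bailando bajo la lluvia", "es"),
]
test = [("i love the rain", "en"), ("la lluvia y mi corazon", "es")]

memorizer = train_memorizer(train)
word_votes = train_word_votes(train)
for name, predict, model in [
    ("memorizer", predict_memorizer, memorizer),
    ("word votes", predict_word_votes, word_votes),
]:
    print(name, accuracy(predict, model, train), accuracy(predict, model, test))
```

Both models look perfect on the training lyrics, but only the second one still works on the held-out lyrics.  This is why a model should always be judged on data it has not seen during training.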

Machine Learning

Business Value Growth

Recommended Article: How Data Management is an Opportunity to Create Business Value

Businesses are taking investment in data management, such as data warehouses and data lakes, more seriously. However, business owners should put the expected return on the investment first. In our team's experience, the value is created mainly by cutting costs, improving existing workflows, helping to generate new sales, etc.
In the article below from InsideBigData, Alika Cooper shares her views on where business value can be increased.

How Data Management is an Opportunity to Create Business Value?
