Many technologists are promoting the replacement of the data warehouse with the data lake. From the viewpoint of our data science team, a modernized data warehouse still carries its own value for analytics today.
An article on TDWI.org shares the view of Neil Carson, CEO of Yellowbrick, on which technologies and tools are vital.
His opinion on Hadoop aligns with that of our data science evangelist, Samuel Sum. Hadoop is a large-scale data store, but in many situations it is not easy to access and manage. A structured data store such as a data warehouse is generally better for easy user access. Another key point of a successful data analytics environment is the speed of the storage holding the data. With SSDs (flash memory), data warehouse operations (both ETL and access) are now several times faster than before.
Finally, the editor of this webpage suggests reading Samuel Sum’s article on the data lake (Data Lake VS Data Warehouse).
This week, we would like to share an article highlighting the keys to success for machine learning.
In the original article, the author(s) list the keys to success for machine learning:
- Start small with Machine Learning – as with all other data analytics projects, a smaller scope makes for better management.
- Machine Learning Must Have Data Quality to Succeed – data is the most important ingredient of data analytics, so its quality is critical.
- No Universal Machine Learning Algorithms – machine learning is an approach to solving one single “specific” problem, and the algorithms should be chosen to fit the corresponding use case.
Original Article at DataVersity.net
Sharon Di, assistant professor of civil engineering and engineering mechanics at Columbia Engineering, has discovered that travel patterns are highly related to types of people. The research is based on data collected by the University of Michigan Transportation Research Institute (UMTRI), covering one year of continuous mobile traces from 349 vehicles (19,130 travel activities). The types of travelers described include:
- Seniors, who travel to a wider variety of places in a day
- Workers, who stay mostly at work or at home
- Parents, who visit more individual places in a day
News Shared by insideBigData.com
Recently, the mainland China technology giant Tencent published a big data research report on usage patterns in WeChat (a mobile app similar to WhatsApp). It found that people born in the 90s are the most stressed, while people born in the 70s have the most leisure time.
In our opinion, the findings should be valuable for social planning. However, privacy should be maintained, and only “masked” identities should be used for analysis.
As 2019 has just started, it is time to share different experts’ viewpoints on the trends in data science.
It is interesting that what they say is “not too surprising”, and many of the trends are already happening in reality. For example, management at large corporations like Oracle is still talking about Artificial Intelligence (AI) and Machine Learning (ML), as in an interview with technopedia.com.
Article by Technopedia.com
However, another article consolidates different sources to find common ground on the data science trends for 2019.
Article by DataVersity.net
This article covers a wider range of areas, such as Virtual Reality and Information Security.
To sum up, data science has matured and now supports more solutions. We are moving from data analytics to intelligent automation.
Our team leader and founder, Samuel Sum, has written an article on his blog about the data lake and the data warehouse. Many people are trying to drop their data warehouse, but Samuel gives his viewpoint on the value it still holds. He also discusses his suggested data lake architecture, drawing on real-world experience with our professional services team.
We regularly visit our clients to discuss the value of data analytics. One point of debate is that data analytics fails to provide insights for exceptionally big deals in B2B business. This is a real situation for a consulting business like ours, where 1 to 3 mega-deals at the scale of millions contribute as much as 30% to 40% of total revenue.
Most people think that data analytics only works well for “regular” deals rather than “mega-deal(s)” because of data availability. Nevertheless, we have won some projects through extensive research and development based on our own KPI knowledge base, open data, and data from the Census and Statistics Department.
Original Viewpoint from McKinsey:
The McKinsey article shares the viewpoint that high-quality “small” data could be the key to making a mega-deal.
As a learning company, we are always conducting research on data science topics. Every week, our team shares some recommended articles.
One article discusses big data governance and the role of metadata. For instance, data tagging is used in a Hadoop environment, and business glossary tools are used in a data warehouse to trace the lineage from source data to BI reports or dashboards.
Our team always makes metadata a top priority, with a data dictionary and other related documentation in every project delivery. Metadata management and data governance are also included in the skill transfer sessions and courses we deliver.
Finally, we would like to highlight the importance of data governance: a better understanding of the data within the organisation facilitates end-user data consumption and decision support.
This week, we are sharing another article on what we should care about in data mining and/or predictive analytics. For a business, the aim is to improve its competitive advantage over peers in the same market. In this article, the writer shares the fundamentals to consider when investing in analytics.
Original Article by Vikash Kumar:
We believe that a basic understanding of the core of data analytics is important for everyone nowadays.
Another day, another article worth reading. This week, we would like to highlight the importance of privacy in data collection. Even though GDPR now applies in EU countries, there is still room for improvement in handling data privacy. Many data analysts and data scientists collect too much detail, including unnecessary personal data, for their projects.
Article from ITPRO (itpro.co.uk)
The article shares the bad example of Microsoft collecting personal information from Office 365.
As an experienced data team, we put “ethical data science” as our first priority. Masked personal information and anonymised data should be enough for most data analytics cases. Institutions and governments should therefore keep refining their guidelines and rules on data collection so that personal data protection keeps pace with technology development.
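As a rough illustration of what “masking” can look like in practice, here is a minimal Python sketch. It is only one possible approach under stated assumptions, not a description of any specific project: the field names (`user_id`, `name`, `email`, `phone`), the secret key, and the helper functions `mask_value` and `mask_record` are all hypothetical. The sketch drops fields the analysis does not need and replaces the identifier with a keyed hash using the standard `hmac` and `hashlib` modules.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the dataset.
SECRET_KEY = b"replace-with-a-secret-key"

# Hypothetical field names, chosen for illustration only.
ID_FIELDS = {"user_id"}                   # identifiers to replace with a keyed hash
DROP_FIELDS = {"name", "email", "phone"}  # personal details the analysis does not need


def mask_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest so the raw identifier is not stored."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict) -> dict:
    """Drop unnecessary personal fields and mask identifier fields."""
    masked = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # do not carry unnecessary personal data forward
        if field in ID_FIELDS:
            masked[field] = mask_value(str(value))
        else:
            masked[field] = value
    return masked


if __name__ == "__main__":
    raw = {
        "user_id": "u-1001",
        "name": "Example Person",
        "email": "person@example.com",
        "age_group": "30-39",
        "district": "Shamshuipo",
    }
    print(mask_record(raw))
```

The same input always maps to the same masked value, so records can still be joined and counted per person without exposing the original identifier. Note that a keyed hash is pseudonymisation rather than full anonymisation, so the key and the remaining fields still need to be protected.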
An article on InsideBigData.com talks about gentrification. In the US, the term originally refers to higher-income (white) people moving their homes and businesses into low-income minority neighborhoods. However, a similar situation is found in developed cities like Hong Kong, where professionals and the middle class are moving into Shamshuipo (a district with the lowest average household income) because of urban renewal that has introduced new tall residential buildings. This changes the business environment. You couldn’t find any café in this area 10 years ago, but there are now 5 different “luxury” cafés in the district of about 1047 hectares.
To cope with the dynamics of a city, data is very important for businesses to identify market trends in home building and to estimate the demand for commercial space by category of activity. Data science and data analytics help to explore more possibilities through insights and predictions from data.