Many technologists are promoting the replacement of the data warehouse with the data lake. In the view of our data science team, a modernised data warehouse still carries its own value for analytics today.
An article on TDWI.org shares the viewpoint of Yellowbrick's CEO, Neil Carson, on the vital technologies and tools being used.
His opinion aligns with that of our data science evangelist, Samuel Sum, on Hadoop: it is a large-scale data store, but it is not easy to access and manage in many situations. It is often better to have a structured data store, such as a data warehouse, for easy user access. Another key to a successful data analytics environment is storage speed. With SSDs (flash memory), data warehouse workloads (both ETL and access) are now several times faster than before.
Finally, the editor of this page suggests reading an article by Samuel Sum on the data lake (Data Lake vs Data Warehouse).
As a learning company, we continually conduct research on data science related topics. Each week, our team shares a selection of suggested articles.
One article discusses big data governance and the role of metadata. For instance, data tagging is used in Hadoop environments, and business glossary tools are used in data warehouses to trace lineage from source data to BI reports or dashboards.
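To make the lineage idea concrete, here is a minimal sketch of how such metadata might be recorded and queried. The table, report, and tag names are hypothetical, invented only for illustration; real tools (business glossaries, Hadoop taggers) store far richer metadata.

```python
from dataclasses import dataclass, field

@dataclass
class LineageRecord:
    """One hop on the path from a source dataset to a BI artefact (illustrative)."""
    source: str
    target: str
    transformation: str
    tags: list = field(default_factory=list)

# Hypothetical lineage: a raw sales table flows through the warehouse to a dashboard.
lineage = [
    LineageRecord("raw.sales_2018", "dw.fact_sales",
                  "ETL: cleanse and conform", ["finance"]),
    LineageRecord("dw.fact_sales", "bi.sales_dashboard",
                  "aggregate by region and month", ["finance"]),
]

def upstream_sources(target, records):
    """Walk the lineage backwards to find the original sources feeding a target."""
    parents = [r.source for r in records if r.target == target]
    if not parents:
        return [target]  # no parents recorded: this is an original source
    return [s for p in parents for s in upstream_sources(p, records)]

print(upstream_sources("bi.sales_dashboard", lineage))  # ['raw.sales_2018']
```

Even this toy structure shows why metadata matters: given any dashboard, a user can trace it back to its source data without reading the ETL code.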
Our team always puts metadata at top priority, delivering a data dictionary and related documentation with every project. Metadata management and data governance are also included in the skill transfer sessions and courses we deliver.
Finally, we would like to highlight the importance of data governance: a better understanding of the data within the organisation facilitates end-user data consumption and decision support.
There is bad news for anyone using Hadoop for data analytics. According to the Radware research team (25 Oct 2018), a new botnet has been found targeting Hadoop clusters in order to perform DDoS attacks.
(NOTE: Hadoop is an open-source distributed processing framework that allows for the distributed processing of massive amounts of data and computation across clusters of computers using simple programming models.)
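The "simple programming models" the note refers to are typified by MapReduce. The sketch below simulates a MapReduce word count locally in plain Python; it is illustrative only, not an actual Hadoop job, and the input lines are made up.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for each word, as a Hadoop map task would.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum all counts emitted for one key.
    return word, sum(counts)

def run_job(lines):
    # Shuffle phase: group intermediate pairs by key before reducing.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(w, c) for w, c in grouped.items())

print(run_job(["big data big clusters", "data everywhere"]))
# {'big': 2, 'data': 2, 'clusters': 1, 'everywhere': 1}
```

In a real cluster, Hadoop distributes the map and reduce tasks across many machines and handles the shuffle over the network; the programmer still only writes the two small functions above.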
News – Original Version
Security teams should also look to invest in mitigation tools and services that specialise in defending against DDoS attacks. It is also important to consider whether the Hadoop cluster really needs to be connected to the Internet, to reduce its exposure to attack.