Hello everyone! I'd like to recommend an excellent podcast, Iterative Venture, and share the highlights of EP4: Building Data-Driven Products with Data Science Founders Panel. It is well worth a read, full of learning and personal reflections. If you are interested after reading, please give the episode a listen and support the founder's Facebook page: https://www.facebook.com/richardcheniterativeventure/

The podcast is updated monthly; feel free to subscribe and recommend it to your friends. Most of the interview guests have worked at Facebook, and they share technical and growth topics, startup experience, and first-hand information about working in Silicon Valley tech.

The podcast's founder previously worked at Facebook in Silicon Valley on Data Science, Data Engineering, and Backend Software Engineering. He was also honored to be interviewed by the podcast 『矽谷為什麼』 on how to enter the data science field and become a data scientist, and he shares in-depth data topics through his fan page.

Media reports have called data scientist "the sexiest job of the 21st century". What do data scientists actually do? What skills do they need, from statistics to building predictive models? Why do we need data, and how can historical data lead to more precise and effective decisions? Over the next decade, how should data science handle the challenge of assessing risk in decision-making? And how do we build better data analytics whose design shapes how a company analyzes its products, use data to help the business reach its goals and surface new recommendations, and develop the insight needed to solve problems across different business models?

【Podcast Key Topics】
The panelists shared their perspectives on the current landscape of the data science startup ecosystem, the importance of a data-driven decision-making culture in building products, and the tools the founders on the panel are building, such as Statsig's product experimentation tool.

【Discussion】

【Iterative Venture Newsletter: Data Landscape in the 2020s】
(From our podcast EP4 episode discussing the latest data trends.)

"First reckon, then risk." - Von Moltke

Why do we need data, and a historical perspective

Oftentimes we overhear that a company got acquired for its "data" or its data platform, or that a team has clean (or unclean) "data". What does that actually mean? Why would a company be acquired purely for its data? What is data, and why do we need it anyway?

The way we understand data is that data points are samples, much like the experiences we have in life. The more we see of the world, the more data points we have, and the better equipped we are to make decisions (hopefully good ones) based on those experiences. It is no different for a company. Data is essential knowledge for the company, and our brains are just very complex and efficient data infrastructures that let us store data, make decisions, explain those decisions, and iterate after incorporating new information.

To make good decisions, we therefore need many data points to learn from and efficient ways to access them. This is what led Facebook to create Hive, Presto, Scuba, and the like, each suited to a different problem: quick data access for debugging and simple insights (Scuba), a cost-effective big-data warehouse for heavy data crunching (Hive), and a mixture of the two (Presto), so that there can be more "citizen data scientists", as per Ashu.

The new decade

Humanizing AI

At the turn of the new decade, along with rising optimism about what machine learning models and AI can bring, comes the challenge of explaining how such models make decisions. This is where Krishna's company, Fiddler.ai, comes in. (Fiddler.ai: https://www.fiddler.ai/)

Complex machine learning models such as random forests, XGBoost, and neural networks can have so many intertwined interactions that explanation is nearly impossible; there is no single formula they boil down to. (Picture 2, source: IBM — a high-level illustration of how a neural network makes a decision, with each circle representing a variable.) Just how does it work? Many of us simply treat it as a black box.
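The episode does not go into how Fiddler explains models under the hood, so as a generic illustration of what post-hoc model explanation can look like, here is a minimal sketch using the open-source shap library to break a tree model's prediction into per-feature contributions. The dataset and feature names are invented for the example, and this is not Fiddler.ai's API:

```python
# A minimal sketch of post-hoc model explanation with SHAP values.
# Generic illustration only -- not Fiddler.ai's API; the data and feature names are invented.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy "credit risk score" data: income, debt ratio, and years of credit history.
feature_names = ["income", "debt_ratio", "credit_history_years"]
X = rng.normal(size=(500, 3))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer decomposes each prediction into additive per-feature contributions,
# one common way to turn a "black box" prediction into something a human can audit.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])  # shape: (1, n_features)

print("baseline prediction:", explainer.expected_value)
for name, contribution in zip(feature_names, shap_values[0]):
    print(f"{name:>22}: {contribution:+.3f}")
```

Platforms in this space wrap this kind of attribution with monitoring, dashboards, and reporting; the sketch above only shows the core idea of per-feature attribution.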
We spoke with Krishna about this a while ago (Podcast EP1), and he said the reason he is solving this problem is that there is a monumental shift underway in traditional industries such as banking and insurance. Namely, there is a paradigm shift in how risks are assessed. In the past, models were deterministic: a complex if-else switch statement evaluated whether someone was a risk and therefore whether their credit card application should be denied. The upside is that we know why we rejected someone; the downside is that the models are fixed and the importance of each criterion is predetermined. Machine learning models, on the other hand, can easily incorporate new data and be swapped for other models, offering a dynamic solution to the problem. Check out the podcast episode with Krishna to learn more about how Fiddler is tackling it.

Faster Iterative Loop

Another challenge tech companies face in the new decade is an ever more competitive landscape with more and more entrants. For reference, the number of apps hitting the Apple App Store has grown exponentially. (Picture 3, source: Statista; note the time scale changes in 2020.)

In this landscape, differentiation and the ability to learn and incorporate new changes become absolutely key. There is a saying in Silicon Valley that if two startups are competing against each other, the one that iterates faster will win. This is where Vijaye and Statsig (short for "statistical significance") come in. Statsig aims to let any tech company easily set up a gatekeeper mechanism (showing a feature being rolled out to some users and not others, for comparison) and a dashboard that collects the relevant statistics for the control and test groups, in order to understand whether a product launch achieved its intended purpose. Statsig also lets companies gradually roll out features in a controlled fashion. This not only lets teams rely on one platform for releases and data collection, it also tells the relevant product teams the effect of every release, inherently building a data-driven culture, since every release can now be backed by statistics and data.
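The episode does not describe the statistics behind such a dashboard, but the core comparison is typically something like a two-proportion test between the control and test groups. Here is a minimal, hypothetical sketch of that calculation; the conversion counts are made up and this is not Statsig's SDK or methodology:

```python
# A minimal sketch of the control-vs-test comparison behind an experimentation dashboard.
# Hypothetical numbers and a plain two-proportion z-test -- not Statsig's SDK or methodology.
from math import sqrt
from scipy.stats import norm

# Made-up results: users exposed to each variant and how many of them converted.
control_users, control_conversions = 10_000, 1_150
test_users, test_conversions = 10_000, 1_240

p_control = control_conversions / control_users
p_test = test_conversions / test_users

# Pooled two-proportion z-test for H0: the two conversion rates are equal.
p_pooled = (control_conversions + test_conversions) / (control_users + test_users)
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / control_users + 1 / test_users))
z = (p_test - p_control) / standard_error
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided

print(f"control rate = {p_control:.3%}, test rate = {p_test:.3%}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

An experimentation platform automates this kind of check per metric and per rollout stage, which is how every release ends up backed by data rather than by a bespoke analysis.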
Democratizing Data Technology

Taking a broader view, and from the investors' perspective, there is also a general democratization of data technology, along with vertical integration for efficiency, as per Ashu and Ravi. As more folks coming out of companies such as Google, Amazon, and Facebook spread the data-driven culture, more people are realizing the power of data and are seeking to democratize its use, allowing for more "citizen data scientists". One angle is to lower the technical bar required to achieve the same result. A prominent example we came across in the past is Looker, acquired by Google in 2019 for $2.6 billion: Looker treats data as objects, letting users create LookML models (data objects) that can be joined with other datasets to enrich the data in a drag-and-drop fashion, reducing the need to write complex joins via SQL queries. Vertically integrated platforms, on the other hand, mean fewer people are needed to run the same system, since it can be centrally configured without companies writing custom software.

One prominent example of this is the rise of cloud technology: where a company once had to set up its own data centers, everything can now be configured elastically via AWS, GCP, or Microsoft Azure. This trend is only accelerating as buzzwords such as DataOps and MLOps enter common use.

Just a matter of time

With these trends emerging, the challenge for many companies, especially older ones, is that a lot of data still resides in on-premise data centers, as per Ashu. Because of archaic technologies, those data centers do not lend themselves to transferring data easily, so people have to physically remove the hard drives in order to copy the data over. Meanwhile, although these trends are taking hold in Silicon Valley and other tech hubs, many companies elsewhere (a good number of them legacy ones) may not be able to offer the compensation that would draw tech employees over and spread the data-driven culture beyond Silicon Valley. Still, as with most problems, it is only a matter of time before this culture spreads beyond the Valley, and I have no doubt that talented folks will come up with new solutions that are cheaper, more accessible, and easier to use, so that companies of all kinds can adopt more data-driven solutions and, ultimately, shape a more data-driven culture.

Podcast links:
Spotify: https://lnkd.in/gYSm-Vvm
Apple: https://lnkd.in/g6kYiitq
Google: https://lnkd.in/gB_p7gMR
Any feedback would be appreciated.

Thank you very much to everyone who has read this far. If this post helped you learn something, please like and follow the Facebook page: Richard Chen - Iterative Venture. If you have any questions, feel free to message the fan page or leave a comment below :)

--
※ Sent from: 批踢踢實業坊 (ptt.cc), from 36.231.29.166 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/Tech_Job/M.1634829810.A.908.html
eduishappy : Thanks for sharing 10/22 03:18