Roaring Elephant
- Author: Various
- Narrator: Various
- Publisher: Podcast
- Duration: 305:42:25
Information:
Synopsis
Bite-Sized Big Data
Episodes
-
Episode 119 – Knowage: The Open Source Business Analytics Suite
18/12/2018 Duration: 48min
This time we are joined by Paolo from Knowage, who gives us a high-level overview of Knowage: a totally open source suite for Business Analytics. The Knowage suite is composed of several modules, each one conceived for a specific analytical domain. They can be used individually or combined with one another to ensure full coverage of users' requirements, allowing you to build a tailored product.
Thank you to our guest: Paolo Raineri, Business Developer (LinkedIn)
https://www.knowage-suite.com
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 118 – Roaring News
11/12/2018 Duration: 32min
In this Big Data News episode, we use an article on how some disgruntled open source projects tried to force the "net giants" to give back as an excuse to talk about open source ethics. The second article for today comes from Noel Sharkey and covers possible deception in modern robotics.
Time for Net Giants to Pay Fairly for the Open Source on Which They Depend
https://www.linuxjournal.com/content/time-net-giants-pay-fairly-open-source-which-they-depend
Mama Mia It's Sophia: A Show Robot Or Dangerous Platform To Mislead?
https://www.forbes.com/sites/noelsharkey/2018/11/17/mama-mia-its-sophia-a-show-robot-or-dangerous-platform-to-mislead
Artificial Intelligence: A Modern Approach (Third edition) by Stuart Russell and Peter Norvig
http://aima.cs.berkeley.edu/
-
Episode 117 – Big Data Disaster Recovery
04/12/2018 Duration: 53min
When Big Data projects mature from R&D projects to business-critical components, it becomes important to look at how your environment can survive and recover from catastrophic failures. Considering the significant cost of a good Disaster Recovery plan, it pays to take a close look at your deployment and carefully weigh the pros and cons at a granular level.
Here is the link to the slideshare presentation: Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
-
Episode 116 – Roaring News
27/11/2018 Duration: 27min
This Machine Learning-heavy edition of Big Data News covers Boston school bus schedules and model interpretation using LIME. As a bonus, we have a great source of NiFi knowledge for you!
What the Boston School Bus Schedule Can Teach Us About AI
https://www.wired.com/story/joi-ito-ai-and-bus-routes/
Understanding model predictions with LIME
https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b
Introduction to Local Interpretable Model-Agnostic Explanations (LIME)
https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
Locally Interpretable Models and Effects based on Supervised Partitioning (LIME-SUP)
https://arxiv.org/abs/1806.00663
Best of NiFi
https://pierrevillard.com/best-of-nifi/
-
Episode 115 – Anniversary three: I guess we’re in it for the long run now!
20/11/2018 Duration: 59min
It's been three years since we started this podcast and, as we've done in previous years, we invited the wonderful people who were guests on our show in the past twelve months and made our little podcast so much better for our listeners!
Our thanks to the guests who celebrated our three-year anniversary with us:
Ward Bekker (LinkedIn), Pre-Sales Solutions Engineer II at Hortonworks, talking about Apache Metron
Rohit Jain (LinkedIn), Chief Technology Officer at Esgyn, talking about Esgyn, Trafodion and cloud vs on-premise vs hybrid
Sanjeev Kulkarni (LinkedIn), Co-Founder at Streamlio, talking about Apache Pulsar
Phillip Radley (LinkedIn), Chief Data Architect at BT, talking about future predictions made years ago
-
Episode 114 – Roaring News
13/11/2018 Duration: 26min
In this serving of bite-sized Big Data News we talk about the IBM takeover of Red Hat, a new botnet going after unprotected Hadoop nodes and a somewhat disappointing Cloudera blog post.
IBM To Acquire Red Hat
https://investors.redhat.com/news-and-events/press-releases/2018/10-28-2018-184027500
https://newsroom.ibm.com/2018-10-28-IBM-To-Acquire-Red-Hat-Completely-Changing-The-Cloud-Landscape-And-Becoming-Worlds-1-Hybrid-Cloud-Provider
New DDoS botnet goes after Hadoop enterprise servers
https://www.zdnet.com/article/new-ddos-botnet-goes-after-hadoop-enterprise-servers/
(remember Dr. Who? https://medium.com/@neerajsabharwal/hadoop-yarn-hack-9a72cc1328b6)
New in Cloudera Enterprise 6: Apache Hive 2.1 (By the Cloudera Hive Team)
http://blog.cloudera.com/blog/2018/10/new-in-cloudera-enterprise-6-apache-hive-2-1/
https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_601_unsupported_features.html#hive_c6_unsupported_features
https://hive.apache.org/downloads
-
Episode 113 – H2OAIWorld London 2018 Roaring Report
06/11/2018 Duration: 01h02min
Here is our H2O.ai World conference London Roaring Report. We had a blast and we hope that this episode can give you a good taste of what was going on.
The sessions are now available online:
https://www.youtube.com/playlist?list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm
-
Episode 112 – Roaring News
30/10/2018 Duration: 26min
In this last Big Data News episode for the month of October, we look forward to the H2O World event next week in London and we have articles on BI Maturity and the upcoming Apache Ozone project that will supplant HDFS in future Hadoop clusters soon(TM).
BI Maturity: You can’t get there from here!
http://makingdatameaningful.com/bi-maturity/
Introducing Apache Hadoop Ozone: An Object Store for Apache Hadoop
https://hortonworks.com/blog/introducing-apache-hadoop-ozone-object-store-apache-hadoop/
Katacoda example further down on this page
https://hadoop.apache.org/ozone
-
Episode 111 – How Public Cloud changed Big Data
23/10/2018 Duration: 51min
No interview this time, just Dave and Jhon talking about how public cloud changed Big Data. Current news has brought this topic back to the foreground and we thought it was a good idea to give our views on the subject. Along the way, we go over the different deployment strategies for Hadoop across on-premise, private and public cloud and, of course, hybrid environments.
-
Episode 110 – Roaring News
16/10/2018 Duration: 38min
Another week, another Big Data News episode. After going over all the event ticket giveaways that are currently running, we have an article that covers the basics of ETL vs ELT and have some fun with R graphs in the style of the XKCD web comic. We finish with an in-depth article on columnar data stores and a quick shout-out to Apache NiFi.
Breaking News
Our thanks to our guest from H2O.ai: John Spooner, Director of Solution Engineering, h2o.ai
Dave:
XKCD Curve Fitting in R
http://blog.revolutionanalytics.com/2018/09/curve-fitting.html
Artificial intelligence, data will be the differentiator in the marketplace
https://www.information-age.com/artificial-intelligence-data-123475102/
Jhon:
Scaling ETL: How data pipelines evolve as your business grows
https://bytes.grubhub.com/scaling-etl-how-data-pipelines-evolve-as-your-business-grows-72ff6c744e6e
The design and implementation of modern column-oriented database systems
https://blog.acolyer.org/2018/09/26/the-desig
-
Episode 109 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 2
09/10/2018 Duration: 52min
In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this second part, we discuss the ins and outs of good data stewardship and how companies can adopt, implement and contribute.
Mandy Chessell, Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering
https://www.linkedin.com/in/mandy-chessell-a4989722/
ODPi Blog post on Egeria: First Release of ODPi Egeria is Here
ODPi github projects:
Egeria - Open Metadata and Governance
https://github.com/odpi/egeria
Data-governance companion project
https://github.com/odpi/data-governance
-
Episode 108 – Roaring News
02/10/2018 Duration: 55min
Another episode of Big Data News, and not just another episode, but an episode packed with items. Before we do our regular article reviews, we are doing raffles for not one, not two but three different events! And as if that was not enough, our friends from Pulsar dropped in with their big Apache top-level project announcement. So not very bite-sized this time, but chock-full of delicious Big Data news!
Breaking News
Our thanks to our guests:
Solix Empower: Sai Gundavelli, Founder/CEO, Solix Technologies
Streamlio: Sanjeev Kulkarni, Co-Founder at Streamlio, and Sijie Guo, Co-Founder at Streamlio
Free Big Data Event ticket giveaways:
DataWorks Summit Asia Pacific: Singapore Oct 11, 2018 - Tokyo Oct 16, 2018 - Melbourne Feb 06, 2018
To enter the raffle, send email to dws18apac@roaringelephant.org Tell us what event you want to attend! (Singapore, Tokyo, Melbourne)
Solix Empower New York 2018: New York November 01, 2018
To enter the raffle, send email to SolixE
-
Episode 107 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 1
25/09/2018 Duration: 41min
In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this first part, the focus is more on Mandy herself and we lay the groundwork for the second part that will go live in episode 109.
Mandy Chessell, Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering
https://www.linkedin.com/in/mandy-chessell-a4989722/
ODPi Blog post on Egeria: First Release of ODPi Egeria is Here
ODPi github projects:
Egeria - Open Metadata and Governance
https://github.com/odpi/egeria
Data-governance companion project
https://github.com/odpi/data-governance
-
Episode 106 – Roaring News
18/09/2018 Duration: 39min
In this edition of Big Data News, we take the pulse of Machine Learning adoption and talk about Big Data online learning by IBM on Coursera and by Columbia University on edX. We round the episode off with a look at MR3 and the evil that are benchmarks.
Breaking News
Data Science Professional Certificate
https://cognitiveclass.ai/blog/data-science-professional-certificate/
Taking the pulse of machine learning adoption
https://www.zdnet.com/article/taking-the-pulse-of-machine-learning-adoption/
Performance Comparison of HDP LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark
https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/
Join Jhon on Artificial Intelligence (AI) & Robotics by ColumbiaX on edX
https://www.edx.org/micromasters/columbiax-artificial-intelligence
https://www.edx.org/course/robotics-columbiax-csmm-103x-4
https://www.edx.org/course/artificial-intelligence-ai-columbiax-csmm-101x-4
-
Episode 105 – Big Data at British Telecom with Phillip Radley
11/09/2018 Duration: 01h06min
In this episode we welcome Phil Radley, Chief Data Architect at BT, to talk about the Big Data deployment at BT.
Phillip Radley (LinkedIn), Chief Data Architect @ BT
https://home.bt.com/
-
Episode 104 – Roaring News
04/09/2018 Duration: 36min
In this Big Data News episode, we discuss an article with guidelines on how to arrange your data gathering projects with the customer in mind. Dave brings a matrix of visualization products.
Breaking News
The five Cs: Five framing guidelines to help you think about building data products.
https://www.oreilly.com/ideas/the-five-cs?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content
The Chartmaker Directory
http://chartmaker.visualisingdata.com/
-
Episode 103 – Apache Pulsar version 2.0 with Matteo and Sijie from Streamlio
28/08/2018 Duration: 43min
Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts, the first of which was published in episode 101. Here is the second part, with information on version 2.0 and the future of the Apache Pulsar project.
The first subject taken on by Sijie is Pulsar Functions, followed by Matteo talking about the new schema registry and Topic Compaction. With a new major version being released, users will probably want to upgrade, so we asked the guys about the upgrade path. For the rest of the episode, Matteo and Sijie share what they can regarding the future Pulsar roadmap.
Matteo Merli (https://www.linkedin.com/in/matteomerli/), Co-Founder - Software Engineer
Sijie Guo (https://www.linkedin.com/in/samuelguo/), Co-Founder
Apache Pulsar (incubating)
https://pulsar.apache.org/
-
Episode 102 – Roaring News
21/08/2018 Duration: 22min
Big Data News at the end of the summer is not easy to find, but we did end up with three topics to discuss: from isolating GPUs in Hadoop 3.x to replicating big data (to the cloud) and quick tips from Adam's blog.
Breaking News
First Class GPUs support in Apache Hadoop 3.1, YARN & HDP 3.0
https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/
Replicating big datasets in the cloud
https://medium.com/hotels-com-technology/replicating-big-datasets-in-the-cloud-c0db388f6ba2
https://dataworkssummit.com/berlin-2018/session/tools-and-approaches-for-migrating-big-datasets-to-the-cloud/
https://www.slideshare.net/Hadoop_Summit/tools-and-approaches-for-migrating-big-datasets-to-the-cloud
Quick Tip: The easiest way to grab data out of a web page in Python
https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58
-
Episode 101 – Apache Pulsar update with Matteo and Sijie from Streamlio
14/08/2018 Duration: 01h05min
Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts. Here is the first part, where they introduce Apache Pulsar, go in depth on the correct deployment scaling of a stable Pulsar cluster and clarify Pulsar's "at least once vs exactly once" strategy. Part two will go in more depth on what's new. Stay tuned!
Matteo Merli (https://www.linkedin.com/in/matteomerli/), Co-Founder - Software Engineer
Sijie Guo (https://www.linkedin.com/in/samuelguo/), Co-Founder
Apache Pulsar (incubating)
https://pulsar.apache.org/
-
Episode 100 – Celebrating our Centennial with the history of Hadoop
07/08/2018 Duration: 01h07min
100 Big Data episodes! We made it, in no small part thanks to our audience: you are what keeps us going! In this episode we celebrate our centennial by going over the history of Hadoop releases, highlighting the most noteworthy events along the way. Join us down the twisty paths of our memory lanes!
The blockchain-related LinkedIn post Jhon liked
The sources for this episode:
http://hadoop.apache.org/releases.html
https://en.wikipedia.org/wiki/Apache_Hadoop
Debate over which company had contributed more to Hadoop:
http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/
Thank you for being part of the ride and now on to episode 200!