Roaring Elephant
- Author: Various
- Narrator: Various
- Publisher: Podcast
- Duration: 305:42:25
Information:
Synopsis
Bite-Sized Big Data
Episodes
-
Episode 59 – Roaring News
31/10/2017 Duration: 35min
It's another installment of Roaring News! This time, we talk about the ensemble recommendation system allegedly used by Spotify, the not-so-new-kid-on-the-block-after-all Apache Pulsar, and the ever-so-popular "Hadoop is dead" claim, and we end with a quick shout-out to the Tokyo Data Platform Conference.
Dave Apache Pulsar https://pulsar.apache.org/ https://www.slideshare.net/ydn/october-2016-hug-pulsar-a-highly-scalable-low-latency-pubsub-messaging-system https://streaml.io/blog/apache-pulsar-geo-replication/ https://streaml.io/blog/geo-replication-patterns-practices/ https://news.ycombinator.com/item?id=12453080 Data Platform Conference Tokyo http://dataplatform.jp/ Jhon Spotify’s Discover Weekly: How machine learning finds your new music https://hackernoon.com/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe Hadoop Was Hard to Find at Strata This Week https://www.datanami.com/2017/09/29/hadoop-hard-find-strata-week/ Please use the Contact Form
-
Episode 58 – Big Data Roles: The data scientist
24/10/2017 Duration: 01h09min
In this entry in our long-running "roles in Big Data" series, we talk to Eduardo Barbaro, a Sr. Data Scientist at Mobiquity. It is no exaggeration to say that the data scientist is a pivotal person in any big data or advanced analytics project, and we are really grateful to Eduardo for spending some time on the podcast to give us his views and recount his experiences.
Eduardo Barbaro Sr. Data Scientist at Mobiquity, Inc - Europe https://www.linkedin.com/in/edbarbaro/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 57 – Dataworks Summit Sydney recap by Dave – Part 2
17/10/2017 Duration: 57min
In this second part of Dave's tale of the Sydney Dataworks Summit, the subjects range from Apache Metron and a talk by Telstra, Australia's leading mobile provider, to YARN 3.0 and Apache Zeppelin.
Solving Cyber at Scale - Simon Ball https://www.slideshare.net/Hadoop_Summit/solving-cyber-at-scale-80187657 Implementing greenfield Apache Metron SOC – Telstra - Saad Ayad Slides not available :( Yarn past present future - Rohith Sharma KS - Sunil G https://www.slideshare.net/Hadoop_Summit/yarn-past-present-future Model as a service - Casey Stella https://www.slideshare.net/Hadoop_Summit/maas-model-as-a-service-modern-streaming-data-science-with-apache-metron-incubating Protecting your Critical Hadoop Clusters against Disasters - Jeff Sposetti / Sankar Hariappan https://www.slideshare.net/Hadoop_Summit/protecting-your-critical-hadoop-clusters-against-disasters Running Zeppelin in the Enterprise https://www.slideshare.net/Hadoop_Summit/running-zeppelin-in-enterprise-80
-
Episode 56 – Dataworks Summit Sydney recap by Dave – Part 1
10/10/2017 Duration: 01h02min
Dave attended the Dataworks Summit in Sydney and we go over the different sessions he attended there. In this first of two episodes, the focus lies on the new goodness that Hadoop 3.0 will bring us soon.
Hadoop 3.0 – Sanjay Radia https://www.slideshare.net/Hadoop_Summit/apache-hadoop-30-community-update-79999467 JDK 8+ Port number changes Class-path isolation HDFS – 3 node Namenode, intra data node balancer for balanced storage within a node, erasure coding 10TB node recovering in a few hours on a large cluster (3000 nodes) Erasure coding 2012, 2013, 2014 Erasure coding methods, blocks or stripes Surprisingly little performance difference for EC, what’s not shown is the network bandwidth cost, which is significantly higher Yarn 3.0 Scheduler, priorities within a queue Q – Inter queue priorities Long running services, dynamic container configuration, cpu and io easy, hard to do memory Service discovery in YARN via zookeeper, dns Elastic resource model, graceful decommissi
-
Episode 55 – Roaring News
03/10/2017 Duration: 46min
In this edition of Roaring News, Dave covers the release of the Apache Metron-based HCP 1.3 and an HBase vs Cassandra benchmark battle. Jhon talks about some Spark tuning and the scheduler's inner workings and finishes with a tale of a compliance kettle...
Dave HCP 1.3 release https://hortonworks.com/blog/hortonworks-cybersecurity-platform-big-data-cybersecurity-solution/ https://docs.hortonworks.com/HDPDocuments/HCP1/HCP-1.3.0/bk_release-notes/content/ch01.html Battle of the Apache NoSQL heavyweights https://hortonworks.com/blog/hbase-cassandra-benchmark/ Jhon Spark Performance Tuning: A Checklist https://medium.com/zero-gravity-labs/spark-performance-tuning-a-checklist-abb3c80efb44 How the Spark Scheduler Work http://www.russellspitzer.com/2017/09/01/Spark-Locality/ A tale of a compliance kettle… https://cupfighter.net/2017/09/a-tale-of-a-compliance-kettle Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future
-
Episode 54 – Hadoop sizing part 1: One big cluster, or many small ones
26/09/2017 Duration: 52min
In this episode, we take an online article by Chris Riccomini and give our view on the debate between having a single big cluster and having many smaller ones. If you are architecting a Hadoop cluster and are faced with this choice, this episode should give you a lot of information on the subject.
One big cluster, or many small ones? by Chris Riccomini https://medium.com/@criccomini/one-big-cluster-or-many-small-ones-5f3126ed7045 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 53 – Roaring News
19/09/2017 Duration: 26min
In this episode of Roaring News, Dave brings up the newly released HDP 2.6.2, which incorporates IBM's move from their proprietary IOP to HDP. Jhon brings an update on the MLeap story for productionizing your Spark model. We finish off by discussing the newly released Apache Atlas version 0.8.1.
Dave HDP and IBM HDP 2.6.2 https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_release-notes/content/ch_relnotes.html Jhon MLeap: Providing (Near) Real-time Data Science with Apache Spark https://medium.com/rv-data/mleap-providing-near-real-time-data-science-with-apache-spark-c34e7df093ca The Apache Atlas team is happy to announce the release of Apache Atlas - version 0.8.1. https://lists.apache.org/thread.html/82337a63dd216dbfa4f4609f76ceaef30de79e68dcbf726a673539b9@%3Cannounce.apache.org%3E Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 52 – Big data in travel
12/09/2017 Duration: 01h16min
Over the summer, while your hosts enjoyed a well-earned vacation (well, we like to think we earned it), we could not stop being Big Data nerds, and in this episode we talk about the Hadoop opportunities we spotted. You will hear us talk about how Big Data does, could or should improve many aspects of vacationing. We talk about review sites, preventive maintenance on rental cars, IoT tracking of beer levels, social media privacy issues and much, much more. We really tried to make this a "new-style" short episode, but clearly, we still need some training... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 51 – Roaring News
05/09/2017 Duration: 38min
In this news episode (our very first one), Dave goes all-out on Artificial Intelligence and its use in naming "stuff"; for some subjects it apparently works very well, for others not so much... Jhon brings a blog on deploying new Kerberos functionality and a tutorial on Kafka Connect for those who have not really looked at it. The ensuing discussion on NiFi vs Kafka is purely coincidental.
Dave AI naming Paint (May 2017) http://lewisandquark.tumblr.com/post/160776374467/new-paint-colors-invented-by-neural-network https://arstechnica.co.uk/information-technology/2017/05/ai-paint-colour-names/ Guinea Pigs (June 2017) http://gizmodo.com/this-is-what-happens-when-you-teach-an-ai-to-name-guine-1796172891 Improved Paint (July 2017) https://arstechnica.co.uk/information-technology/2017/07/ai-paint-colours-reprogrammed/ British sounding place names (July 2017) http://www.telegraph.co.uk/technology/2017/07/20/ai-trained-generate-incredibly-british-place-names/ Bee
-
Episode 50 – Alan Gates Wrap Up (Part 4)
29/08/2017 Duration: 34min
This is the final part of our long interview with Alan Gates. In this part, Alan talks more about ODPi, Cloud First, Apache Flink and Apache Pig, and we finish off with a little bit of philosophy. A big thank you to Alan for sharing his pearls of wisdom with us! [Image from Linux.com]
00:00 Recent events Our vacation is almost over, but this episode too was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 02:10 Alan Gates Wrap Up (Part 4) 34:37 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 49 – Thomas Henson on IoT architectures
15/08/2017 Duration: 52min
In this episode we have an interview with Thomas Henson for you. Thomas is an Isilon Data Lake Evangelist at Dell/EMC, but in this episode he talks about IoT architectures, following on from his talk at the DataWorks Summit San Jose 2017.
00:00 Recent events Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 02:14 Thomas Henson on IoT architectures You can find Thomas Henson's blog on Big Data at https://www.thomashenson.com/ 52:45 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 48 – Alan Gates on the DataWorks Summit (Part 3)
01/08/2017 Duration: 35min
In this third part of our interview with Alan Gates, PMC member for various Apache projects including Apache Hive and co-founder of Hortonworks, we talk about his sessions at the DataWorks Summits and about the Summits in general. [Image taken from Linux.com]
00:00 Recent events Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 02:38 Alan Gates on the DataWorks Summit (Part 3) Since this part of the interview goes public after the San Jose Summit, it is too late to submit abstracts for that particular summit. However, the Australian version is in a couple of months, so please go to the DataWorks website for more information about that one. 35:35 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 47 – Deep dive into Kudu
18/07/2017 Duration: 01h11min
We've been interested in Kudu for a while, but it's something that neither of your hosts has been exposed to very much. Apache Kudu went from incubation to top-level project in record time, and now seemed like the right time to dig into this piece of antelope. Mike Percy, PMC member and committer on the Apache Kudu project and software engineer at Cloudera, was only too glad to come on the podcast and answer all our questions!
00:00 Recent events Since both Dave and Jhon are currently on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 01:40 Deep dive into Kudu Special guest today is Mike Percy, PMC member and committer on the Apache Kudu project. 01:11:54 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 46 – San Jose DataWorks Summit 2017 in Review
04/07/2017 Duration: 01h54min
Dave joined our free ticket raffle winner Pitt at the DataWorks Summit in sunny San Jose last month and they came back with almost two hours' worth of exciting stories! Thanks again to Hortonworks for providing the free ticket for our raffle that Pitt won.
San Jose DataWorks Summit 2017 in Review 00:01:20 Keynotes 00:31:20 Day 1 sessions 01:10:00 Day 2&3 sessions 01:54:55 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 45 – Modern Day Airships
20/06/2017 Duration: 01h09min
Breaking up our series of insights from Alan Gates, we switch gears to another really interesting topic (and guest!): we talk about the new visualisation features coming in Apache Zeppelin, and we get it straight from the brains behind the new code, Bernhard Walter.
Recent events 03:03 Jhon: Churn Prediction with Apache Spark Machine Learning by Carol McDonald (@caroljmcdonald) @mapr https://mapr.com/blog/churn-prediction-sparkml/ 12:12 Dave: HDFS Maintenance State by Manoj Govindassamy @cloudera https://blog.cloudera.com/blog/2017/05/hdfs-maintenance-state/ https://issues.apache.org/jira/browse/HDFS-7877 https://issues.apache.org/jira/browse/HDFS-6729 https://issues.apache.org/jira/browse/HDFS-7541 30:50 Modern Day Airships Bernhard Walter talks about the new visualisation options in Zeppelin with some of the what, why and how. 01:09:00 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you woul
-
Episode 44 – Suicidal Spark
06/06/2017 Duration: 01h11min
In this episode we're joined by Youen Chéné and Aurélien Vandel from Saagie, who talk to us about their experiences deploying Spark Streaming workloads in production (based on their Dataworks Summit talk): what worked well, what didn't, and what they'd recommend you do if you follow in their footsteps. Enjoy!
00:00 Recent events Dave Big Data Videos http://www.kdnuggets.com/2017/05/top-recent-big-data-videos-youtube.html https://www.youtube.com/watch?v=RQ9czRAdmMs https://www.youtube.com/watch?v=hsoKlE67rTw Jhon InsightOut: The role of Apache Atlas in the open metadata ecosystem http://www.ibmbigdatahub.com/blog/insightout-role-apache-atlas-open-metadata-ecosystem https://www.youtube.com/watch?v=yQvmoDtGgbo Apache Atlas API Version 2 https://atlas.incubator.apache.org/api/v2/index.html Cloud giants 'ran out' of fast GPUs for AI boffins https://www.theregister.co.uk/2017/05/22/cloud_providers_ai_researchers/ Benchmark: Sub-Second Analytics with
-
Episode 43 – Alan Gates talks Hive (Part 2)
23/05/2017 Duration: 54min
In this episode we discuss the maturity of the Hadoop ecosystem and how hard it still is to get value out of data. In the main section, we have the second part of the interview with Alan Gates, this time talking about the place Hive has in the ecosystem. We still have more from Alan, so stay tuned for more Hive goodness in future episodes!
00:00 Recent events Dave PredictionIO 0.11 release https://github.com/apache/incubator-predictionio/blob/v0.11.0-incubating/RELEASE.md http://predictionio.incubator.apache.org/ http://predictionio.incubator.apache.org/start/ http://predictionio.incubator.apache.org/system/ http://predictionio.incubator.apache.org/gallery/template-gallery/ https://techcrunch.com/2016/02/19/salesforce-acquires-predictionio-to-build-up-its-machine-learning-muscle/ Jhon Ultra-fast OLAP Analytics with Apache Hive and Druid – Part 1 of 3 https://hortonworks.com/blog/apache-hive-druid-part-1-3/ Why Big Data Hasn’t Yet Made a Dent on Farms
-
Episode 42 – Alan Gates talks Hive (Part 1)
09/05/2017 Duration: 01h04min
Welcome to the "life, the universe and everything" episode of the Roaring Elephant Podcast. We talk some news, and this episode got a little bit ranty... Apologies for that; to balance it out, we have a chat with Alan Gates about Hive for you. There was so much Alan Gates goodness that we've split it over a few sessions, and here's part one...
07:00 Recent events Dave Metron graduates to Apache TLP status https://blogs.apache.org/foundation/entry/apache-software-foundation-announces-apache https://hortonworks.com/blog/congratulations-apache-metron-tlp/ 2017 Big Data Landscape https://www.linkedin.com/pulse/firing-all-cylinders-2017-big-data-landscape-matt-turck You’re doing Hadoop and Spark wrong and they will probably fail https://www.theregister.co.uk/2017/02/21/hadoop_and_spark_risks_and_opportunities/ Jhon Apache Impala Leads Traditional Analytic Database http://blog.cloudera.com/blog/2017/04/apache-impala-leads-traditional-analytic-database/ Cloudera Data Science
-
Episode 41 – News, news and some more news
25/04/2017 Duration: 33min
Because we blew our recording space budget on the Dataworks Summit day-by-day episodes (39 and 40; if you've not listened yet, go and do so!), we're bringing you just a short episode this time with news, all the news that's new and approved by the Roaring Elephants!
05:10 Recent events Superset: benefits and limitations of the open source data visualization tool by Airbnb https://indatalabs.com/blog/data-strategy/open-source-data-visualization-tool-superset http://airbnb.io/superset/index.html Even artificial intelligence can acquire biases against race and gender http://www.sciencemag.org/news/2017/04/even-artificial-intelligence-can-acquire-biases-against-race-and-gender Building a cognitive data lake with ODPi-compliant Hadoop http://www.ibmbigdatahub.com/blog/building-cognitive-data-lake-odpi-compliant-hadoop Top 5 Performance Boosters with Apache Hive LLAP https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/ Integrate SparkR and
-
Episode 40 – Dataworks Summit Europe – Day 2
06/04/2017 Duration: 01h07min
In this episode of the Roaring Elephant podcast, Dave and I continue to share our Dataworks Summit experience, meet yet more listeners, sit in on a few more sessions and give our overall view of the day and the summit as a whole! It will make you wish you were here.
00:00:00 Intro Roaring Elephant Roadshow Day 2 - The night after the party! 00:04:14 Session Discussions Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards: Keynote Meet HBase 2.0 Bridle your Flying Islands and Castles in the Sky HBase in Practice Solving Cyber at Scale Achieving Realtime Ingestion and Analysis of Security Events through Kafka and Metron Row/Column-Level Security in SQL for Apache Spark Apache Kafka Best Practices Mool - Automated Log Analysis using Data Science and ML Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and Spark Backup and Disaster Recovery in Hadoop 01:02:15 Wrap up Some final overall observations and lo