Roaring Elephant
- Author: Various
- Narrator: Various
- Publisher: Podcast
- Duration: 305:42:25
Information:
Synopsis
Bite-Sized Big Data
Episodes
-
Episode 39 – Dataworks Summit Europe – Day 1
05/04/2017 Duration: 01h34min
In this episode of the Roaring Elephant podcast, Dave and I attend the Dataworks Summit, meet listeners, sit in on sessions and give our overall view of the day! It's the next best thing to being here. If you ARE here, then look out for us, we'll exchange limited edition Roaring Elephant stickers for audio clips.
00:00 Intro
Roaring Elephant Roadshow Day 1 - Direct from Munich!
03:25 Session Discussions
Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards: Keynote; An Apache Hive Based Data Warehouse; Interactive Analytics at Scale in Apache Hive using Druid; Hadoop 3.0 in a Nutshell; Running Services on YARN; Streamline - Stream Analytics for Everyone (AKA SAM: Streaming Analytics Manager); Apache Atlas: Governance for your Data; File Format Benchmark - Avro, JSON, ORC and Parquet; An Approach for Multi-Tenancy through Apache Knox
01:27:00 Wrap up
Some final overall observations and looking forward to day 2!
01:34:31 End
Please
-
Episode 38 – Dataworks Summit 2017 – Preview
28/03/2017 Duration: 01h42min
This week, your hosts go over what we consider to be our pick of the sessions that will be presented during the Dataworks Summit (formerly Hadoop Summit) in Munich next week. The Roaring Elephant will be in attendance, look out for the two guys in distinctive yellow fleeces with the Roaring Elephant logo on the back, we hope to see you there!
00:00 Recent events
Dave: DS Model Lifecycle https://www.svds.com/models-lab-factory/ Stitchfix Algorithm Tour http://algorithms-tour.stitchfix.com/ Cloudera Data Science Workbench http://vision.cloudera.com/cloudera-data-science-workbench-self-service-data-science-for-the-enterprise/ http://www.dbms2.com/2017/03/19/cloudera-data-science-workbench/
Jhon: Yarn 3 Data Lake 3.0: The EZ button to deploy in minutes and cut TCO by half https://hortonworks.com/blog/data-lake-3-0-deploy-minutes-cut-tco-half/ Data Lake 3.0 Part 2 – A multi colored YARN https://hortonworks.com/blog/data-lake-3-0-part-2-multi-colored-yarn/ Data Lake 3.0 Pa
-
Episode 37 – Big Data Roles: The starter
14/03/2017 Duration: 01h22min
In this episode, we start a new series on the different roles in Big Data. Purely by coincidence, it turns out that the winner of our raffle started a new job as a Data Engineer at the beginning of this month, so naturally we decided to invite Marcel-Jan on the show to talk about the how and why of his career move.
00:00 Recent events
Dave: It’s morphing time: Apache Ranger graduates to a Top Level Project https://hortonworks.com/blog/morphing-time-apache-ranger-graduates-top-level-project-part-1/ https://hortonworks.com/blog/morphing-time-apache-ranger-graduates-top-level-project-part-2/ Data-Driven User Engagement https://www.svds.com/data-driven-user-engagement/ Driving Product Engagement with User Behaviour Analytics https://www.svds.com/driving-product-engagement-user-behavior-analytics/
Jhon: Using Apache Spark for large-scale language model training https://code.facebook.com/posts/678403995666478/using-apache-spark-for-large-scale-language-model-training/ Big d
-
Episode 36 – Use-case: Single View
28/02/2017 Duration: 01h02min
No guests today, just Dave and Jhon talking, so brace yourselves! This time we're actually going to explain what we mean by "single view of customer", go through an example of a use-case and discuss how you might implement such a thing. Enjoy.
00:00 Recent events
Dave: Faster Spark! http://www.zdnet.com/article/spark-gets-faster-for-streaming-analytics/ If you’re interested in reading/watching more then check out the site for Spark Summit East, the session slides and videos appear to all be live now https://spark-summit.org/east-2016/schedule/ Getting Started with Deep Learning/Speech Recognition http://www.svds.com/getting-started-deep-learning/ http://svds.com/open-source-toolkits-speech-recognition/ Data Driven Depression http://rcharlie.com/2017-02-16-fitteR-happieR/ http://blog.revolutionanalytics.com/2017/02/finding-radioheads-most-depressing-song-with-r.html
Jhon: IoT Calamity: the Panda Monium http://www.verizonenterprise.com/resources/repor
-
Episode 35 – What do people get wrong when deploying Hadoop? – Part 2
14/02/2017 Duration: 01h12min
Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this second part of a two-part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster.
00:00 Recent events
Dave: TensorKart: self-driving MarioKart with TensorFlow http://kevinhughes.ca/blog/tensor-kart What is Data Engineering? https://www.dataquest.io/blog/what-is-a-data-engineer/
Jhon: Machine Learning is Fun (parts 1-6) https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-recognition-with-deep-learning-28293c162f7a#.vv1lh5755 Performance comparison of different file formats and storage engines in the Hadoop ecosystem https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines How to write code using the Spark Dataframe API: a focus on composability and testing https://blog.godatadr
-
Episode 34 – What do people get wrong when deploying Hadoop? – Part 1
31/01/2017 Duration: 01h45s
Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this first part of a two-part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster.
00:00 Recent events
Dave: Apache Beam becomes a top level project! https://beam.apache.org/ https://beam.apache.org/get-started/beam-overview/ https://github.com/eljefe6a/beamexample/blob/master/BeamTutorial/slides.pdf https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective Four Types of Data Analytics http://insights.principa.co.za/4-types-of-data-analytics-descriptive-diagnostic-predictive-prescriptive MapR claims open source victory with patent http://www.cbronline.com/news/verticals/cio-agenda/mapr-claims-open-source-big-data-victory-patent-award/
Jhon: Ransomware attacks on insecure Hadoop systems may be next, say security researchers http://www.itworldc
-
Episode 33 – Roaring News
17/01/2017 Duration: 50min
This episode, we have an absolutely brilliant topic that we were going to cover after the news section... But the news section had us talking so much that it ran a bit long. Preferring not to give you a two-hour episode, we're rescheduling the delivery of the intended topic to the next episode and presenting you with our first (and probably last) "News only" episode.
00:00 Recent events
Dave: A pair of “trends to watch in 2017” http://www.techrepublic.com/article/6-big-data-trends-to-watch-in-2017/ http://www.datamation.com/applications/5-big-data-predictions-for-2017.html Learning from a Year of Security Breaches https://medium.com/starting-up-security/learning-from-a-year-of-security-breaches-ed036ea05d9b#.4r22rbfjh Failing to monetise your apps, big data can help http://www.techrepublic.com/article/failing-to-monetize-your-apps-big-data-can-help/ A Perfect Illustration of the Big Data Value Chain http://www.techrepublic.com/article/a-perfect-illustration-of-how-the-big-data
-
Episode 32 – The sense and non-sense of certifications
03/01/2017 Duration: 50min
In this episode, we talk about the use and abuse of certifications, both the certifications you can achieve by passing an exam and the industry ISV certifications that should help you make purchasing decisions.
00:00 Recent events
Dave: 5 enterprise uses of blockchain today http://www.pcworld.com/article/3149504/cloud-computing/5-enterprise-related-things-you-can-do-with-blockchain-technology-today.html Top 7 big data trends for 2017 https://datafloq.com/read/the-top-7-big-data-trends-for-2017/2493 How to discover the hidden value in your customer journey https://www.linkedin.com/pulse/how-discover-hidden-value-your-customer-journey-ronald-van-loon
Jhon: Achieving a 300% speedup in ETL with Apache Spark http://blog.cloudera.com/blog/2016/12/achieving-a-300-speedup-in-etl-with-spark/ The Rhythm of Food http://rhythm-of-food.net/ http://www.thefunctionalart.com/ Information is beautiful awards http://www.informationisbeautifulawards.com/news/188-2016-the-winne
-
Episode 31 – Bold Predictions, Past and Future
20/12/2016 Duration: 01h07min
In this episode, we go over the bold predictions for 2016 we made just before the start of the year. Find out how right we were, or indeed how bad we are at predicting the future of Big Data. Undeterred, we then happily put on our Nostradamus hats and proceed to make even more new bold predictions for 2017. Have a listen and let us know if you agree or disagree with our view of the world.
00:03 Bold predictions - reviewing past predictions for 2016
Apache Atlas Apache Nifi Apache Spark SQL BigInsights
28:50 Bold predictions - future predictions for 2017
Fragmentation Data breaches Chat bots Self service Big Data Snake-Oil Alert Cyber security In-Memory & GPU Apache Atlas BigInsights
01:07:07 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 30 – Apache Software Foundation
06/12/2016 Duration: 01h02min
So many of the tools and projects we talk about and use every day are prefaced by 6 letters, A P A C H E... What does it mean to be an Apache project? What does the Apache Software Foundation (ASF) do for software? Are there other options? Let us tell you about the ASF!
00:00 Recent events
Dave: How we caught the circle line rogue train with data https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a#.mhqs1mikx Black Friday 2016: Mobile vs Desktop User Behaviour http://appinstitute.com/black-friday-2016-mobile-vs-desktop-sales/ AI Machine Attempts to Understand Comic Books ... and Fails https://www.technologyreview.com/s/602973/ai-machine-attempts-to-understand-comic-books-and-fails/ https://arxiv.org/abs/1611.05118 https://arxiv.org/pdf/1611.05118v1.pdf
Jhon: Paypal From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World Part 1 and Reactive programming manifesto http://www.reactivemanifesto.org
-
Episode 29 – 1 Year anniversary
22/11/2016 Duration: 01h04min
One year of elephants roaring has come and gone so we reminisce a little bit about what happened over the last year. And since we could not have done this podcast nearly as well without them, we asked the special guests we have had on the podcast over the previous year to call in on the Skype call and talk about what they have been up to.
00:00 One year of podcasting...
Dave and Jhon reminiscing about how the Podcast got started.
06:55 Fireside chats with guests over the year
07:56 Joe Witt, Senior Director of Engineering at Hortonworks
22:40 Michele Lamarca, Team Lead Big Data at Bright Computing
43:00 John Mertic, Director of Program Management for ODPi
01:04:23 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 28 – Talking Datameer with Erik Stalpers
08/11/2016 Duration: 59min
In this episode, Dave is stuck in a hotel basement in the middle of internet nowhere and Erik Stalpers from Datameer joins us to talk about the Datameer exploration and visualization tool.
00:00 Recent events
Dave: Machine learning vs AI http://www.wired.co.uk/article/machine-learning-ai-explained Machine Learning Data Cleansing https://gcn.com/articles/2016/10/19/activeclean-big-data.aspx https://activeclean.github.io/ Battle of the Data Science Venn Diagrams http://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html http://www.prooffreader.com/2016/09/battle-of-data-science-venn-diagrams.html (original doc 21 September 2016)
Jhon: How Vector Space Mathematics Helps Machines Spot Sarcasm https://www.technologyreview.com/s/602639/how-vector-space-mathematics-helps-machines-spot-sarcasm/ Straight talk about big data http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/straight-talk-about-big-data
25:10 Talking Datameer with
-
Episode 27 – Security 3: Encryption at rest and in motion
25/10/2016 Duration: 57min
Rounding out our series on security in Hadoop, we finish with Encryption at rest and in motion. We go over the different approaches, do's and don'ts and mention some higher-level applications in this space.
00:00 News for the week!
Dave: Executives Still Relying on Gut, Not Gigabytes in Planning for Future http://www.datadigestonline.com/2016/10/executives-still-relying-on-gut.html Rewriting SAS Programs for Financial Data Manipulation in R http://blog.revolutionanalytics.com/2016/09/rewriting-sas-in-r-for-finance.html Chris Surdak - Why so many Big Data projects fail http://surdak.com/innovation-vs-improvement/
Jhon: Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs (14-Sep-2016) http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs SQL on Hadoop benchmarks get serious (14-Oct-2016) http://www.zdnet.com/article/sql-on-hadoop-benchmarks-get-serious/ WHERE IS APACHE HIVE GOING? TO IN
-
Episode 26 – Security 2: Authorisation and audit
11/10/2016 Duration: 01h10min
In this episode, we continue our coverage of Hadoop security. Where episode 23 dealt with the subject of authentication, we now delve deeper into the why and how of authorization and audit, and cover the major players in the arena.
00:00 Recent events
Dave: Beyond Privacy and Security in a Connected World http://www.svds.com/beyond-privacy-security-connected-world/ The broken promise of open-source Big Data software – and what might fix it http://siliconangle.com/blog/2016/09/27/the-broken-promise-of-open-source-big-data-software-and-what-might-fix-it-2/ Meet Apache Spot, a new open source project for cybersecurity http://www.csoonline.com/article/3124497/big-data/meet-apache-spot-a-new-open-source-project-for-cybersecurity.html SMEs advised to capitalise on ‘big data’ http://www.farminglife.com/news/farming-news/smes-advised-to-capitalise-on-big-data-1-7606523
Jhon: What is hardcore data science—in practice? https://www.oreilly.com/ideas/what-is-hardcore-data-scie
-
Episode 25 – The pros and cons of crafting your own distribution
27/09/2016 Duration: 01h34min
When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show who explains why they made the choice to forgo the available distributions in favour of building one's own.
http://lod-cloud.net/
00:00 Recent events
Dave: Which tool should I use? http://brohrer.github.io/which_tool_should_i_use.html YaRrr! - The Pirate’s guide to R Blog: http://nathanieldphillips.com/thepiratesguidetor/ YaRrr! - Download the book: https://drive.google.com/file/d/0B4udF24Yxab0S1hnZlBBTmgzM3M/view Video tutorials to go with the above: https://www.youtube.com/playlist?list=PL9tt3I41HFS9gmeZFEuNrnu_7V_NFngfJ
Listener Question from Sampath from Baltimore: When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broad
-
Episode 24 – Hadoop Summit Melbourne 2016 Preview
13/09/2016 Duration: 01h07min
With Hadoop Summit Melbourne 2016 starting the day after we are recording this episode, we go over the published agenda and discuss the current state of the Big Data Technology ecosystem while we pick our favorite sessions. Wish we were there!
00:00 Recent events
Dave: Cloud Security Alliance releases cloud and big data security guidelines http://siliconangle.com/blog/2016/08/28/the-cloud-security-alliance-publishes-its-best-practices-for-big-data-security/ https://cloudsecurityalliance.org/download/big-data-security-and-privacy-handbook/ Common Big Data Backup and Recovery myths http://www.networkworld.com/article/3113036/big-data-business-intelligence/debunking-the-most-common-big-data-backup-and-recovery-myths.html Big Data, Google, and the end of free will http://www.ft.com/cms/s/2/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.html
Jhon: Supercomputing now going to Hadoop-style systems https://techcrunch.com/2016/05/24/crays-latest-supercomputer-runs-openstack-and-open-s
-
Episode 23 – Security in Hadoop – Authentication
30/08/2016 Duration: 01h07min
In this episode, we discuss this fortnight's interesting big data news that caught our eye and then go on to discuss the basics around authentication in Hadoop for what is the first in a series of episodes that we'll be doing over the next few months on the broad topic of security.
00:00 Recent events
Dave: The new science behind customer loyalty http://insights.principa.co.za/the-new-science-behind-customer-loyalty http://insights.principa.co.za/infographic-creating-a-data-driven-customer-loyalty-strategy 5 great charts in 5 lines of R code http://blog.revolutionanalytics.com/2016/08/five-great-charts-in-5-lines-of-r-code-each.html Using big data to create value for customers, not just target them https://hbr.org/2016/08/use-big-data-to-create-value-for-customers-not-just-target-them
Jhon: Linux turns 25 (25 August 1991) https://www.linux.com/news/linus-torvalds-reflects-25-years-linux http://web.archive.org/web/20100104211620/http://www.linux.org/people/linus_pos
-
Episode 22 – Big Data in Small Business
16/08/2016 Duration: 01h32min
The main subject in this episode features an answer to a listener question we received a couple of months ago: How can big data help small businesses? What ways can small business use big data? At the moment all the talk is about big data helping enterprise firms. And we are introducing a new section which we hope you will enjoy!
00:00 Recent events
Working with a new team in sunny Cork, getting them up to speed
Workshop with a global SI and a European telco about the upcoming phases of their big data journey
Workshop with a customer who has been using Hadoop for a very long time, since Hadoop 0.2! Finally looking to migrate into the future
Multi vendor workshop fraud analytics
Object recognition and detection in images.
11:30 Our very own "New and Noteworthy"
Dave: http://blogs.teradata.com/international/streaming-analytics-story-many-tales/ http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A453888 http://research.ibm.com/cognitive-computing/ostp/rfi-response.s
-
Episode 21 – The Open Data Platform Initiative
02/08/2016 Duration: 59min
This episode we have an interview with John Mertic about ODPi. There has been plenty of mystery and even some controversy about ODPi which we attempt to resolve for you. Big thanks to John for giving us some of his time for this interview! Sadly, this time the Skype Gods were not with us and we experienced some drops and hitches. We tried to smooth things over as much as possible, but we were not able to achieve our usual level of quality this time.
00:00 Recent events
Vacation for Dave
Study for Jhon
10:40 Interview with John Mertic @ ODPi
https://www.odpi.org/
John Mertic, Director of Program Management for ODPi and Open Mainframe Project
Find John on twitter: @jmertic
If you're not familiar with the ODPi here's a few good links to get you started and interested in the area:
Links to the ODPi Specifications: https://www.odpi.org/specifications
Watch an interview with Alan Gates who discusses what the ODPi is trying to do to simplify the big data world: https://www.youtube.co
-
Episode 20 – Dave’s Hadoop Summit San Jose 2016 Retrospective – Part 2
19/07/2016 Duration: 01h06min
In this second part, we discuss the sessions that Dave attended at the San Jose Hadoop Summit and we go in depth on some related topics. Since we ran over an hour with the main topic, and we did not want to make this a three-parter, we decided to forgo the questions from the audience just this one time...
00:00 Recent events
Vacation time!
Edx.Org Big Data Courses
04:00 Dave's Hadoop Summit San Jose 2016 Retrospective - Part 2
Session 1: End-to-End Processing of 3.7 Million Telemetry Events per Second Using Lambda Architecture, by Saurabh Mishra @ Hortonworks and Raghavendra Nandagopal @ Symantec
Talking point: Hero-culture or why nobody wants to talk about failure anymore
Session 2: Top Three - Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise, by Andrew Ahn @ Hortonworks
Talking point: Guaranteed Governance, who certifies the certificate?
Session 3: IoT, Streaming Analytics and Machine Learning: Delivering Real-Time Intelligence With Apache NiFi,