Roaring Elephant

  • Author: Various
  • Narrator: Various
  • Publisher: Podcast
  • Duration: 305:42:25

Information:

Synopsis

Bite-Sized Big Data

Episodes

  • Episode 159 – OCMAWDIAM Part 2?

    24/09/2019 Duration: 47min

    Today we are joined by Mark Phillips, product marketing manager and, more interestingly, heavily involved with Ansible at Red Hat. Now, rather than making this episode specifically about Ansible, Mark shares his extensive expertise on the subjects of orchestration, config management and automation. Mark shares his 25 years of experience on various questions covering the usual "what, why and who". A bit of history is included and of course the difference in approach for cloud versus on-premise also comes up. Specific terminology is explained and we cover the usual excuses for not using things like config management. For those that are still on the fence, the information shared by Mark should give you a firm grasp of the concepts and deployment methods and help you get started. During the interview, Mark mentioned a number of blogs and other online resources: Why failure should not be celebrated in the startup world "Migrating the runbook - from legacy to DevOps" at IPExpo London 2015 As work gets m

  • Episode 158 – Roaring News

    17/09/2019 Duration: 41min

    The main topic for this news episode is a revisiting of the Multi-Cloud subject we touched on last time. Next we take a look at an article about the state of the Docker project and we end on an article about an excellent post-mortem by Monzo about some trouble they had over the summer. The Multi-Cloud article we wanted to discuss… When we discussed the subject of Multi-cloud on the last News Episode, we did it from an article we thought wasn't very good. As luck or fate would have it, we came across a different article that actually gave us a better start to that particular discussion and we're not above doing just that! We follow this pretty good article along the 5 talking points its author finds relevant and important, adding our views. And don't fear, with a subject as broad as multi-cloud, there is plenty to talk about! Community does matter in Open Source! Go figure… When I got the title for this article from Dave, my first thought was "I didn't know Docker went a

  • Episode 157 – Orchestration, config management, automation, what does it all mean?

    10/09/2019 Duration: 43min

    Today we are joined by Mark Phillips, product marketing manager and, more interestingly, heavily involved with Ansible at Red Hat. Now, rather than making this episode specifically about Ansible, Mark shares his extensive expertise on the subjects of orchestration, config management and automation. Mark shares his 25 years of experience on various questions covering the usual "what, why and who". A bit of history is included and of course the difference in approach for cloud versus on-premise also comes up. Specific terminology is explained and we cover the usual excuses for not using things like config management. For those that are still on the fence, the information shared by Mark should give you a firm grasp of the concepts and deployment methods and help you get started. During the interview, Mark mentioned a number of blogs and other online resources: Why failure should not be celebrated in the startup world "Migrating the runbook - from legacy to DevOps" at IPExpo London 2015 As work gets m

  • Episode 156 – Roaring News

    03/09/2019 Duration: 35min

    In this Roaring News episode we start by debunking an "85% of companies use multi cloud" statement, look at the future of Data Engineering and are completely astounded at the amount of tracking that happens on the world wide web. We close off with a deeper look at the Cloud Native Computing Foundation open sourcing the Kubernetes audit and go another round on "Smart Data" versus "Big Data". Multi-Cloud Confusion With this first article, we want to do a little bit of debunking of what has recently become one of the more hyped statements in the Big Technology space. If you believe "them", almost every company out there is now operating in multiple clouds! Now, this does not necessarily need to be a false statement, it just depends on how you define "cloud". We discuss the different views and offer some advice on how a successful multi-cloud strategy should work. And it immediately illustrates how difficult a real multi-cloud deployment actually is... Data Engineering of Future Past This article touches on a fe

  • Episode 155 – NoSQL: You keep using that word…

    27/08/2019 Duration: 37min

    For a podcast on Big Data, we were amazed that we never covered the subject of NoSQL. So we're correcting this today. Not by listing and comparing all the NoSQL solutions out there, but rather by going over the differences between the two paradigms. This way we hope to offer enough insight so you can feel comfortable deciding if you should or should not deploy NoSQL in your environment. There are definitely a number of really important benefits to using a NoSQL solution in your environment and your co-hosts are big fans of the technology. However, make sure to carefully consider the positive and negative consequences. Make sure you are going for NoSQL for the right reasons. The discussion in this episode should give you a good basis for that decision, but do let us know if there is something we missed. Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 154 – Roaring News

    20/08/2019 Duration: 35min

    In this News installment, coming to you courtesy of Dave and Engadget, we talk about how the Osaka track is paving the way for data free flow across borders and take a look at the alleged problem with the UK's facial recognition system. Ending on a "high note", we discuss how Facebook's and Google's (and a lot of others') MLPerf benchmark is going to change the way we look at our machine learning setups. Or not... Data without frontiers! Have world leaders woken up and smelled the coffee that is reality or do they still believe data flow is something that stops at a border check point? The Osaka track that was initiated by Japanese Prime Minister Shinzo Abe seems to indicate enlightenment is at hand. However, real concrete information is rather hard to find so we engage in a bout of theory-crafting on this subject in the hopes of coming to a useful conclusion. Wish us luck! I'm sure I've seen that face before... This article on how some people claim the UK's facial recognition system is quite bad and other

  • Episode 153 – How Secure is the Future of Open Source?

    13/08/2019 Duration: 01h04min

    The way open source software is being consumed has changed drastically: originally found on the fringes, open source technology has now become a core part of many organizations of all sizes. We take a look at the confusion and sometimes vocal irritation that has accompanied this adoption by "Big Business" and ask whether the future of Open Source is in danger. We have been playing with the idea to give our view on this subject for a while now, but we wanted to make sure not to add to a very flammable situation. Rather, we try to share useful information and stay as close to an unbiased narrative as possible. We end the conversation on a positive note, being hopeful that the inherent openness and transparency that imbues Open Source will prevail and a new equilibrium will be found. We are not basing this discussion on any specific article, but here is a list of articles that we reference during the discussion. The CBInsights article is what we consider the most FUD-less of them all, containing a lo

  • Episode 152 – Roaring News

    06/08/2019 Duration: 42min

    Another fortnight, another roaring news episode, covering this time: de-anonymizing anonymized data is reportedly easy, Kubernetes is easier than Big Data, Big Data is hard, and hard-to-understand Kafka can be made easier using Factorio visualization. It's not because you're paranoid, they're not out to get you! Not entirely surprising to your co-hosts, this article discusses how easy it is to recombine previously anonymized data to regain the ability to identify a person, based on the data sets. Now this does involve combining multiple datasets and this is something legislators have warned against in the past. GDPR specifically has a clause that addresses this and data owners need to exercise care to avoid this from happening. That being the case, though, there are bound to be entities that are not bound by privacy legislation, or that simply choose not to follow it. So long story short (too late!), take care about what information you share where! Going on a little tangent, we discuss how bad this loss of

  • Episode 151 – Do you only need 6 principles?

    30/07/2019 Duration: 53min

    A little while ago, Dave came across an article by Francesca Lazzeri titled "The Data Science Mindset: Six Principles to Build Healthy Data-Driven Organizations" and in this episode we give our view and expand on those principles. Is it really possible to define a successful data science organization following 6 concrete principles? Are these principles a step by step, one after the other plan you can follow on the road to success, or are these principles something you need to keep in mind from the start up until the end of days? 1. Understand the Business and Decision-Making Process We're pretty much agreeing with this one and expanding on it, we talk about the benefits of doing this exercise on streamlining the organization and security. However, to achieve the C-level support which we agree is needed, some free-form experimentation needs to take place to get to a position where you actually have something that can be shown in a clear and concise way to said C-level. However, when the step to pro

  • Episode 150 – Roaring News

    23/07/2019 Duration: 33min

    In this news episode, we use a nice little article on how you can help keep open source sustainable as a structure for a broader discussion on this subject. The second subject this time goes another round on the "data engineers are not data scientists" (and the reverse) subject. Ask not what Open Source can do for you, ask what YOU can do for Open Source! Many organizations, commercial and not, are using open source software so heavily, they are becoming dependent on open source for their own survival. So when you look at how you can support open source, it is not an entirely altruistic project, but just makes good business sense. Using this article for structure and inspiration, we go over the different ways everybody, including YOU, can help keep the open source movement sustainable. Donating some hard cash, employing open source committers and just being an open source advocate are just a few possibilities. On a related note: do you want to keep this little open source focused podcast sustainable? Please

  • Episode 149 – The State of Developer Ecosystem 2019

    16/07/2019 Duration: 01h03min

    When friend of the show Ward Bekker sent us a link to the recent survey write-up on the State of Developer Ecosystem in 2019 by JetBrains, we immediately set up a recording date with him to go over all the facts and figures... DevOps appears to be quite rare The first thing we picked up on is how many organizations are still surviving without any kind of DevOps. Even though everybody is talking about DevOps and config management, it would appear, at least according to this survey, that these tools are still far from prevalent in the development environments. After discussing the different facts and figures contained in the webpages on the JetBrains website, we were left wondering how generic the target group was. Since this survey was conducted by JetBrains, it would definitely make sense that the respondent population was taken from their customer base and this could skew the results towards smaller, "indie" development environments. The sense and nonsense of Multi Cloud deployment We then take a bit of a de

  • Episode 148 – Roaring News

    09/07/2019 Duration: 32min

    With Summer starting and news drying up a little in the heat, we managed to find some interesting things happening at the Apache Software Foundation and we try to find correlations with the Cloud Native Computing Foundation. After that, we discover that Robots actually won't be taking all our jobs... Who would have thought... The more things change, the more they stay the same..? While the ranks have closed and the messaging is "everything continuing as usual, nothing to see here", things are apparently happening at the ASF with some top level people moving on. Since only the future can tell how (and even if) this will have any noticeable effect, we have a little discussion about software foundations in general. Aside from the ASF, we talk about the Linux Foundation and the CNCF, who also have their role to play. One is still glad to be of service! Over the years, there has been more than a little bit of fearmongering going on about how robots and technology in general will destroy a lot of jobs. I

  • Episode 147 – Alex Zeltov on MLOps with mlflow, kubeflow and other tools (part 2)

    02/07/2019 Duration: 44min

    In this episode, Global Black Belt and Technical Architect in the Big Data and Advanced Analytics Team at Microsoft, Alex Zeltov, is our guest and he explains the ins and outs of MLOps through various tools like mlflow and kubeflow. In this second part, we go into more depth on the practical consequences of implementing MLOps and the various tools that are available. We also go on a bit of a tangent discussing why traditional enterprises are still having a hard time looking at machine learning models as something that requires and benefits from things like model management, version control and periodic updating of models. For more from Alex on MLOps and mlflow, check out his presentation at the Washington DC DataWorks Summit a couple of weeks ago. The slides are now available on SlideShare and the video is available on YouTube: https://www.youtube.com/watch?v=Ns82mJjJgto MLOps Just like DataOps follows on to DevOps, one may say that MLOps continues after DataOps. While there is a Wikipedia page on the su

  • Episode 146 – Roaring News

    25/06/2019 Duration: 36min

    Forrester is calling a new function into being, the "Data Hunter", which sounded interesting enough to us to spend some time on. Then we cover a nice guest blog on the Cloudera site and we finish off with some rambling on the changes in the HPC world. Enjoy! Loincloths and spears to the ready: the Data Hunter is born! Dave found a small article on the Forrester site that points to a paid webinar about Data Hunting. Now we did not pony up the $300 they charge for the webinar, but we found the concept quite compelling and looked at the three "audience questions" that were included in the article. The "Small File Problem" and a little "You're Doing it Wrong"...? This guest blog on the Cloudera web site actually has some practical information that can be useful when you need to consolidate your incremental upload files to reduce the number of files your Hive queries need to traverse. The additional complexity here was that this had to happen on a live production environment without service inter

  • Episode 145 – Alex Zeltov on MLOps with mlflow, kubeflow and other tools (part 1)

    18/06/2019 Duration: 45min

    In this episode, Global Black Belt and Technical Architect in the Big Data and Advanced Analytics Team at Microsoft, Alex Zeltov, is our guest and he explains the ins and outs of MLOps through various tools like mlflow and kubeflow. In this first episode, Alex talks on a more theoretical level about MLOps and the benefits it can deliver. For more from Alex on MLOps and mlflow, check out his presentation at the Washington DC DataWorks Summit a couple of weeks ago. The slides are now available on SlideShare and the video is available on YouTube: https://www.youtube.com/watch?v=Ns82mJjJgto MLOps Just like DataOps follows on to DevOps, one may say that MLOps continues after DataOps. While there is a Wikipedia page on the subject, there is not that much "prior art" available just yet. The main advantages that MLOps can deliver, according to Alex, are a much improved move to production of trained algorithms, even allowing for CI/CD, and a more structured approach to training models where multiple data scienti

  • Episode 144 – Roaring News

    11/06/2019 Duration: 37min

    In the past week, trouble at Cloudera really stood out and in the context of similar problems at MapR and (somewhat less related to Big Data) Pivotal, we are devoting the entire episode to this. (Image taken from https://media.thinknum.com/articles/is-hadoop-hype-wearing-off-the-answer-may-lie-in-startups-data/) As this is a Roaring News Episode, we will discuss this story based on a number of articles we found. Cloudera has a "bad" day... The combination of some bad quarterly results and both CEO Tom Reilly and chief strategy officer and co-founder Mike Olson leaving the company has had a dramatic effect on the stock price. Now this could be an isolated incident, quickly forgotten, but in the light of similar issues at MapR (which is not a public company) and Pivotal, there does seem to be something more fundamental happening in these Open Source, venture capital fueled companies. Looking at job listings over the years The second article we discuss (from which we also took the image above becau

  • Episode 143 – Spark in Action with author Jean-Georges Perrin (Part 2)

    04/06/2019 Duration: 58min

    And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more... Free eBook raffle Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon. A book review on Spark in Action, second edition with author Jean-Georges Perrin In the second part w

  • Episode 142 – Roaring News – KubeCon 2019 Report

    28/05/2019 Duration: 47min

    A little over a week ago, KubeCon and CloudNativeCon happened and our independent Roaring Roving Reporter Rubik Dave came back from Barcelona with a comprehensive report. Kubernetes As the kubernetes.io webpage tells us: "Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications." As we discuss in the episode, Kubernetes forms a kind of middleware layer that performs orchestration of lightweight Docker containers. To be sure, you can use other container technologies, but Docker (and its companion project Moby) is what is most often used with Kubernetes. The biggest advantage of Kubernetes, I believe, is how it has standardized the way a microservices framework based on Docker container instances can be deployed and managed. While a myriad of other approaches have tried to solve that problem (and Dave gives a rather exhaustive list in the episode), Kubernetes has emerged as the best supported by the community. KubeCon And that

  • Episode 141 – Spark in Action with author Jean-Georges Perrin (Part 1)

    21/05/2019 Duration: 48min

    And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more... Free eBook raffle Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon. A book review on Spark in Action, second edition with author Jean-Georges Perrin In this first part o

  • Episode 140 – Roaring News

    14/05/2019 Duration: 36min

    Another week, another feed of roaring news articles, starting with apparent changes at MapR and the release of Red Hat Enterprise Linux 8. We go in depth on the open sourcing of the Databricks-developed Delta Lake and finish with some SQL generated fractals. Big thanks to our Roaring Patreons making this podcast possible! DataWorks Summit free ticket raffle. Final week for our DataWorks Summit Washington DC free ticket giveaway! Get your free ticket now! The Roaring Elephant on YouTube. The Roaring Elephant YouTube channel has launched! Will you help us reach 100 subscribers (modest goals are a good start!) so we can claim our personalized URL on YouTube? Every time a new episode is published, you will find a video uploaded to the channel as well. There won't be any real video yet though, only a still image as you can see in the thumbnails. But as soon as we reach the related goal on our Patreon, this is where our video content will appear. In case you are wondering, when we start recording ac
