Apache Spark - Computerphile

Computerphile

5 years ago

245,070 views


Comments:

დროის კარგვა - 26.05.2023 18:38

"RDD is basically an array distributed across the cluster" - genius

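To make the quoted description concrete, here is a minimal plain-Python sketch of the idea. It is not Spark: the helper names `partition`, `map_partitions`, and `collect` are hypothetical stand-ins, and each "node" is just a sublist on one machine.

```python
def partition(data, num_partitions):
    """Split a list into contiguous chunks, one per hypothetical node."""
    size, extra = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def map_partitions(chunks, fn):
    """Apply fn to every element; each chunk is processed independently."""
    return [[fn(x) for x in chunk] for chunk in chunks]


def collect(chunks):
    """Gather every chunk back into one ordinary list."""
    return [x for chunk in chunks for x in chunk]


words = ["spark", "is", "a", "cluster", "computing", "engine", "for", "big", "data"]
chunks = partition(words, 3)                     # three hypothetical nodes
lengths = collect(map_partitions(chunks, len))   # word lengths, original order
```

Because every chunk is processed on its own, the per-element work could in principle run on different machines, which is the property the quote is pointing at.
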
Michael Ebbs - 20.09.2022 19:54

Computerphile will be excited to learn that tripods exist.

Vijeenrosh Ponmani Walson - 15.09.2022 04:11

What a useless video: slow down, explain slowly, and assume the audience doesn't know much.

BigDataLogin - 22.06.2022 10:28

thanks

Charles Curt - 20.06.2022 20:33

She must really know this stuff. Very well explained. You can always tell someone that actually knows content by how simply they can describe it.

Waqas Ali - 02.09.2021 09:55

This was very helpful

Tolga Karahan - 12.04.2021 18:12

Great explanations. Of course there are many things going on behind the scenes, but good overview.

zhitong liu - 16.03.2021 15:28

content is nice, well explained.
BUT
the camera work and editing are so bad.
We are not here for a documentary. The shot of the computer from over her shoulder is completely useless and distracting. If you want to use your cuts, use something like picture-in-picture, but please let us focus on the code!!

P. Z. - 25.11.2020 18:42

What is the architectural difference between Spark and MapReduce?

Talita Angelo - 10.11.2020 05:40

Wow congrats on the content. You were able to explain it in a concise, yet logical and detailed way. nice

Stepan Rogonov - 14.04.2019 15:38

It's so clear and easy after the explanation! I will be waiting for more vids about clustering and distributed computing)

Alex - 26.01.2019 10:20

I wish she also talked a little about Spark's ability to deal with data streams

MOHAMED THI0UNE - 19.12.2018 19:26

I really love your videos. I would like to know if it is possible to watch them in French, or at least with subtitles, so that we can follow along.

RonaldSVM - 16.12.2018 14:22

Sorry for the redundancy, just verifying my understanding. Do I understand correctly that (when running this example in a cluster) collect runs the 'reduceByKey' against the results on each node, and then reduces to a final result? Say on Node 1 I have a count of the word 'something' = 5 and on Node 2 I have a count of 'something' = 3; then collect combines the counts from those two nodes into 'something' = 8, and so on...?
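
On the question above: in Spark the combining is done by `reduceByKey` itself, first within each partition and then across partitions after a shuffle. `collect` only triggers the job and brings the already-reduced pairs back to the driver. The arithmetic in the example (5 + 3 = 8) can be sketched in plain Python. This is a simulation, not Spark code: `local_counts` and `merge_counts` are hypothetical helpers, and the sample lines are made up for illustration rather than taken from the video.

```python
from collections import Counter


def local_counts(lines):
    """Count words inside one partition (the per-node partial result)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(partials):
    """Add up the per-node counts key by key, as the reduce step would."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts for shared keys
    return total


# Hypothetical contents of two nodes.
node1 = ["something old something new something borrowed", "something blue something"]
node2 = ["something here", "something there something"]

partials = [local_counts(node1), local_counts(node2)]
result = dict(merge_counts(partials))
```

Here `partials[0]["something"]` is 5, `partials[1]["something"]` is 3, and the merged `result["something"]` is 8, matching the numbers in the question.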

Reckless Roges - 14.12.2018 19:47

Is there any meta analysis on the usefulness of bigdata analysis? How often do jobs get run that either produce no meaningful data or don't produce any statistically significant data?

Metalstorm - 14.12.2018 08:05

Would have liked it to be a bit more in-depth and technical, was too high level.

Christer Nilsson - 13.12.2018 12:35

Please show some drawings or animations of data going back and forth between the nodes.

Christer Nilsson - 13.12.2018 12:34

Please give time measurements comparing single node with multi node execution. What is the overhead?

oldbootz - 13.12.2018 11:57

Thanks, nice vid.

Couch - 13.12.2018 11:03

She's mumbling in the beginning... can't really hear her (American-born English speaker)

MisterPotatoHands - 13.12.2018 10:43

What programming language is she using??

Paulius Šukys - 13.12.2018 10:33

typo in line 32 for using `splitLines` instead of `word`?

Aiman Al - Fatih - 13.12.2018 09:57

It's a bit silly, but I can't understand 100% because English isn't my first language. I hope someone can add English subtitles to every video on this channel, because I find Computerphile videos easy to understand thanks to the excellent explanations.

hanelyp1 - 13.12.2018 09:42

Looks like you could do a search engine in that.

Alex Kompos - 13.12.2018 08:30

These data ones are really good! Keep them coming!

William Wurthmann - 13.12.2018 08:03

Thank you for teaching an old man new things.

Silly Buttons - 13.12.2018 07:54

More like this!!!!!!

ZachBora - 13.12.2018 07:15

woohooo rebecca is back

Kadderin - 13.12.2018 06:56

Was so excited to see this posted :) I'm a Cassandra professional.

MJ - 13.12.2018 05:16

More of these, please. More big data.

Max Mouse - 13.12.2018 05:04

She's damn good at explaining and easy to listen to, any plans of having her host other episodes?

(sorry for "her" I don't know her name).

Hourai - 13.12.2018 04:35

The RDD API is outmoded as of Spark 2.0 and in almost every use case you should be using the Dataset API. You lose out on a lot of improvements and optimizations using RDDs instead of Datasets.

Tom Hawtin - 13.12.2018 03:40

A great example of how programming languages are a reasonably efficient mechanism to communicate sections of program and how natural language really is not.

Klas - 13.12.2018 03:06

Apache Flink next please

skiLLz - 13.12.2018 02:42

ahh.. so refreshing after taking a week break from dev work and staying away from non dev topics. Lol, I love our field. Like music to my ears

Sina Madani - 13.12.2018 02:31

For anyone interested, although the documentation is awful for Apache Flink and it doesn't support Java versions beyond 8, it at least lets you do setup on each node. Spark does not have any functionality for running one-time setup on each node, which makes it infeasible for many use cases. These distributed processing frameworks are quite opinionated and if you're not doing word count or streaming data from one input stream to another with very simple stateless transformations in between you'll find little in the documentation or functionality. They're not really designed for use cases where you have a parallel program with a fixed size data source known in advance and want to scale it up as you would by adding more threads, but more for continuous data processing.

Chris Wills - 13.12.2018 02:27

I study bioinformatics handling txt files many gigabytes in size and this could be so handy

James Lawson - 13.12.2018 02:01

The first time I learned about Apache Spark, I was looking up documentation for another framework named Spark.

Zuglet Smith - 13.12.2018 01:38

really good summary, thank you!

benjamin mellingen - 13.12.2018 00:21

totally lost me 3 min into this video.

Bill Oddy - 13.12.2018 00:10

Do a video explaining AES!

Vincent Marin - 12.12.2018 23:23

feels like this video is four years too late ... :-/

Jimmy Cheong - 12.12.2018 23:04

Thank you so much. This was an incredible explanation

Christopher Willis - 12.12.2018 22:35

Ohhh, she is using VSCode! I love VS Code :D

Michael - 12.12.2018 22:15

Really interesting video! I have done some MapReduce before, but I haven't come across Apache Spark.

Abeltensor - 12.12.2018 22:14

Good old Scala.

Adrian F. Long - 12.12.2018 21:56

Where are the extra bits?

Not A Robot (Maybe) - 12.12.2018 21:48

note to the editor: please stop cutting away from the code so quickly. we're trying to follow along in the code based on what she's saying. at that moment, we don't need to cut back to the shot of her face. we can still hear her voice in the voiceover.

Technomancer - 12.12.2018 21:47

Can you do Apache Kafka next? How do they compare?
