Comments:
Is this time-scale setup useful for a transactional database, such as a booking system like Uber? Could we structure the data so that every minute of the day is regarded as a bookable resource for every operator, and then search through that time-series data?
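A minimal in-memory sketch of the idea in the comment above, purely illustrative: the names `book` and `find_free`, the data layout, and the operators are all made up, not from any real booking system. A production version would back this with a time-series table and range queries.

```python
# Sketch: treat every minute of the day as a bookable resource per operator.
# Each operator's day is a set of booked minute indices; search scans for a
# free contiguous run. A real system would use range queries on a DB instead.

MINUTES_PER_DAY = 24 * 60

def book(schedule, operator, start, length):
    """Mark minutes [start, start+length) as booked; refuse on any conflict."""
    slots = schedule.setdefault(operator, set())
    needed = set(range(start, start + length))
    if needed & slots:
        return False  # some minute in the range is already booked
    slots |= needed
    return True

def find_free(schedule, operator, length):
    """Return the first minute where `length` consecutive free minutes start."""
    slots = schedule.get(operator, set())
    run = 0
    for minute in range(MINUTES_PER_DAY):
        run = 0 if minute in slots else run + 1
        if run == length:
            return minute - length + 1
    return None

schedule = {}
book(schedule, "op1", 0, 30)           # 00:00-00:30 booked
print(find_free(schedule, "op1", 60))  # -> 30
```

At minute-level granularity this is at most 1440 slots per operator per day, so even a naive scan like this stays cheap; the interesting scaling question is the number of operators and days, which is where a time-series store earns its keep.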
My biggest Postgres database is 5 million company records at 8 GB; reaching 50 GB seems impossible.
Honestly, I found ClickHouse to be far more performant for my case (time-series analytics). The first time I used ClickHouse it felt like magic.
No explanation of the underlying technology 👎
It's a weakness, but PostgreSQL never said it would be good at everything. It's OK to choose different solutions for different problems. If you are working with time series, it does make sense to optimize and choose a better match for this use case. But it's not PostgreSQL's fault.
Actually, technically Postgres natively supports time-scale data; PostgreSQL does not, because the feature was in the Berkeley Postgres code but was removed by the PostgreSQL team.
3.2 GB memory vs. <10 MB memory
The averages you are getting from the materialized view are different from those from trips_hyper. This is happening because you are taking monthly averages of daily averages.
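The pitfall this comment points out is easy to see with made-up numbers (the trip counts below are invented for illustration): averaging daily averages weights every day equally, regardless of how many trips each day had.

```python
# Day 1: one trip of distance 10. Day 2: nine trips of distance 1.
day1, day2 = [10], [1] * 9

# "Monthly" average computed directly over all trips:
all_trips = day1 + day2
direct = sum(all_trips) / len(all_trips)                      # 19 / 10 = 1.9

# Same period computed as the average of the two daily averages:
daily_avgs = [sum(day1) / len(day1), sum(day2) / len(day2)]   # 10.0 and 1.0
of_avgs = sum(daily_avgs) / len(daily_avgs)                   # 5.5

print(direct, of_avgs)  # -> 1.9 5.5
```

The fix in SQL is the same as here: keep sums and counts in the daily rollup and divide only at the end, rather than averaging pre-computed averages.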
I had a project where we were writing 30 GB per day. A Mongo cluster was the only option there.
Great! Just what my fellow programmer needed.
Like the idea of the video, but without showing any EXPLAINs to analyse what the database is actually doing, you're providing no information on why it's speeding up, and you're missing the fact that part of the huge gain is likely down to the automatic index that is created for the partitioning. This unfortunately makes your video look like amateur hour to any decent developer with database experience, which is a shame, because your presentation style, communication skills, and production quality are top-notch.
For a lot of these use cases, columnar is an even better solution.
Hydra's fork of Citus Columnar is my go-to for billion-row problems.
Excellent introduction. Thank you.
Wouldn't it be fairer if you at least had an index on the normal table?
This is great! One point of feedback: instead of writing each parquet file out to disk as a CSV and then COPYing from the CSV file, try converting the parquet file to CSV and streaming it directly into the STDIN of the COPY command. Forgo the reading and writing to disk. I expect this to speed up your load script considerably.
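A stdlib-only sketch of the streaming idea this comment suggests: render rows to CSV in an in-memory buffer and hand the file-like object straight to COPY, skipping the on-disk CSV entirely. The rows here are made up; in the video's script they would come from the parquet file, and the table name in the commented COPY statement is a placeholder.

```python
import csv
import io

# Build the CSV in memory instead of writing a file to disk.
# With psycopg2 the rewound buffer would then be passed to COPY, e.g.:
#   cursor.copy_expert("COPY trips FROM STDIN WITH (FORMAT csv)", buf)
rows = [(1, "2024-01-01", 3.2), (2, "2024-01-01", 1.7)]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
buf.seek(0)  # rewind so COPY reads from the start of the buffer

print(buf.read().splitlines())  # -> ['1,2024-01-01,3.2', '2,2024-01-01,1.7']
```

For parquet files larger than memory, the same shape works batch-by-batch: read a record batch, serialize it into the buffer, COPY it, truncate, repeat, so no full CSV ever exists on disk or in RAM.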
"blazing fast" hahahahahahahahahahahahahahahahahaha
A 50 GB table isn't big. 😂
But isn't this a workload DB for analytics, not a transactional DB? It would be helpful if that was your use case.
I wonder how TimescaleDB would work with the PostGIS extension.
Yeah, you are a dirty cheat, but I get it.
One of the big caveats that is NOT MENTIONED in this video is that there is ALWAYS A TRADE-OFF! You're trading off STORAGE for SPEED, which is and has always been a solution, in the near and distant past, in the database world. Techniques like these used to be called denormalized tables + columnar indexes; now they are called hypertables (I'm sure there are some extra bits that differentiate them, but nothing materially different).
And speaking of "material": "continuous aggregates", like materialized views, have existed since Oracle introduced the feature in the late 1990s. Whenever a "new feature" comes out for the modern database that is 2 years old, built by the up-and-coming start-up, it's typically just a rebranded and remarketed term for a concept that has been around for decades in the database management sphere. It's crucial to understand that while these modern implementations or "rebrandings" might bring optimizations or a more user-friendly interface, the core idea often traces back to established database engineering principles.
It is an amazing video! Congratulations! I have a question as a developer (sorry, it could be a silly question). What should the workflow on the hyper and non-hyper tables look like? Will the code be required to call the hypertable version programmatically on a per-case basis? Insert/delete/update only on the non-hyper table? Or is there a way to do it transparently?
I created a hypertable with 32 columns and inserted 30 million rows to test. With every symbol (a column of that table), the query is slow on the first call (more than 10 seconds) and really fast (less than 50 ms) on subsequent calls. In my opinion, the first call is too slow and unacceptable.
This was great. Thanks for sharing. What do you use to make coding slides/video/transitions?
Would like a video on Postgres streaming databases.
Your voice sounds remarkably familiar. I think it's AI. DAyda DAyda Dayda. Yeah, someone with an English accent would not say Dayda the American way.
ClickHouse runs circles around it if you need some more advanced and heavy queries.
Interesting, thanks. My big-data table is more than a few TB and over 90 billion (no typo) rows. But it's in that "dreaded vendor database", Oracle. It's a 2-node Oracle database cluster on 2017 h/w, processing and analysing over a billion WAN IP traffic flows per day (SQL & PL/SQL code). Would love to try out PostgreSQL though. But then there are my other pursuits (like machine learning on IP traffic for anomaly detection)... and not enough hours in a day. This is a great time for s/w engineers. So many fascinating technologies, products, languages, and concepts to explore.
The Golden Age of Software Technologies. 🙂
Can I get the keywords for setting up the desktop environment as in the video? Thanks.
50 GB big? :o
Try science data: 200 GB binary files. :p
In any readable data format, it's too big to store reasonably.
Fascinating insight into large dayda issues.
Just wanted to write "use Timescale" and you said it 😂
A band-aid. An RDBMS should really be able to handle this case well without additional work, at most some index tweaking.
The question is, why would you want to do read queries in an operational database? Usually you load the data useful for analytics/ML/reporting into an analytical DB, which is designed to perform read operations.
Yeah, at our start-up (solar analytics) TimescaleDB broke down completely and stopped auto-sizing, so we needed to migrate very quickly to HBase, like Tesla uses. It became evident that MS did not test their product at more than almost-big-data-size data.
First of all, big fan of your content. Regarding open source, what's your comment on Terraform? They were open source too, till they weren't. Could you post some content on that?
Did you have an index on the started_at column for the regular table?
How does the window get those baby blue, peach, pink, and green colors in the information bar?
Why is the syntax of these languages written like COBOL or something equally scores of years old?
About 6 seconds.
Yet MySQL is the most popular and most used.
I really like your terminal. Can you/anyone tell me how I can achieve a similar look?
Awesome demo! Thank you! This really opens up another technological layer to me. In the past I had no exposure to DB optimization other than using indexes in MySQL. But this is so much more, and so useful! It's actually applicable to a use case that has been bugging me for years: a different type of time-series data that we are currently caching via a PHP scheduler into Redis storage. Will share this with the team 🎉
I almost laughed out loud when you said 50 GB is big ~.~ Anyway, sorry, but the video is misleading and feels kinda like clickbait; I'd prefer if you put TimescaleDB in the title.
NvChad for Java, pls 🙂
I have like 200 GB of data, but it's not related to time in any sense (well, maybe dates). What would be the best solution there?
Excellent introduction to TimescaleDB. Would very much like to see more features and capabilities.