Comments:
Is this time-scale setup useful for a transactional database, such as a booking system like Uber? Could we structure the data so that every minute of the day is regarded as a bookable resource for every operator, and then search through that time-series data?
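A minimal in-memory sketch of the idea in the comment above, purely illustrative: the names `book` and `find_free`, the data layout, and the operators are all made up, not from any real booking system. A production version would back this with a time-series table and range queries.

```python
# Sketch: treat every minute of the day as a bookable resource per operator.
# Each operator's day is a set of booked minute indices; search scans for a
# free contiguous run. A real system would use range queries on a DB instead.

MINUTES_PER_DAY = 24 * 60

def book(schedule, operator, start, length):
    """Mark minutes [start, start+length) as booked; refuse on any conflict."""
    slots = schedule.setdefault(operator, set())
    needed = set(range(start, start + length))
    if needed & slots:
        return False  # some minute in the range is already booked
    slots |= needed
    return True

def find_free(schedule, operator, length):
    """Return the first minute where `length` consecutive free minutes start."""
    slots = schedule.get(operator, set())
    run = 0
    for minute in range(MINUTES_PER_DAY):
        run = 0 if minute in slots else run + 1
        if run == length:
            return minute - length + 1
    return None

schedule = {}
book(schedule, "op1", 0, 30)           # 00:00-00:30 booked
print(find_free(schedule, "op1", 60))  # -> 30
```

At minute-level granularity this is at most 1440 slots per operator per day, so even a naive scan like this stays cheap; the interesting scaling question is the number of operators and days, which is where a time-series store earns its keep.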
My biggest Postgres database is 5 million company records at 8 GB; reaching 50 GB seems impossible.
Honestly, I found ClickHouse to be far more performant for my case (time-series analytics). The first time I used ClickHouse it felt like magic.
No explanation of the underlying technology 👎
It's a weakness, but PostgreSQL never said it would be good at everything. It's OK to choose different solutions for different problems. If you are working with time series, it does make sense to optimize and choose a better match for this use case. But it's not PostgreSQL's fault.
Actually, technically Postgres natively supports time-scale data; PostgreSQL does not, because the feature was in the Berkeley Postgres code but was removed by the PostgreSQL team.
3.2 GB memory vs. <10 MB memory
The averages you are getting from the materialized view are different from those from trips_hyper. This is happening because you are taking monthly averages of daily averages.
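The pitfall this comment points out is easy to see with made-up numbers (the trip counts below are invented for illustration): averaging daily averages weights every day equally, regardless of how many trips each day had.

```python
# Day 1: one trip of distance 10. Day 2: nine trips of distance 1.
day1, day2 = [10], [1] * 9

# "Monthly" average computed directly over all trips:
all_trips = day1 + day2
direct = sum(all_trips) / len(all_trips)                      # 19 / 10 = 1.9

# Same period computed as the average of the two daily averages:
daily_avgs = [sum(day1) / len(day1), sum(day2) / len(day2)]   # 10.0 and 1.0
of_avgs = sum(daily_avgs) / len(daily_avgs)                   # 5.5

print(direct, of_avgs)  # -> 1.9 5.5
```

The fix in SQL is the same as here: keep sums and counts in the daily rollup and divide only at the end, rather than averaging pre-computed averages.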
I had a project where we were writing 30 GB per day. A Mongo cluster was the only option there.
Great! Just what my fellow programmer needed.
Like the idea of the video, but without showing any EXPLAINs to analyse what the database is actually doing, you're providing no information on why it's speeding up, and you're missing the fact that part of the huge gain is likely down to the automatic index that is created for the partitioning. This unfortunately makes your video look like amateur hour to any decent developer with database experience, which is a shame, because your presentation style, communication skills, and production quality are top-notch.
For a lot of these use cases, columnar is an even better solution.
Hydra's fork of Citus Columnar is my go-to for billion-row problems.
Excellent introduction. Thank you.
Wouldn't it be fairer if you at least had an index on the normal table?
This is great! One point of feedback: instead of writing each parquet file out to disk as a CSV and then COPYing from the CSV file, try converting the parquet file to CSV and streaming it directly into the STDIN of the COPY command. Forgo the reading and writing to disk. I expect this to speed up your load script considerably.
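A stdlib-only sketch of the streaming idea this comment suggests: render rows to CSV in an in-memory buffer and hand the file-like object straight to COPY, skipping the on-disk CSV entirely. The rows here are made up; in the video's script they would come from the parquet file, and the table name in the commented COPY statement is a placeholder.

```python
import csv
import io

# Build the CSV in memory instead of writing a file to disk.
# With psycopg2 the rewound buffer would then be passed to COPY, e.g.:
#   cursor.copy_expert("COPY trips FROM STDIN WITH (FORMAT csv)", buf)
rows = [(1, "2024-01-01", 3.2), (2, "2024-01-01", 1.7)]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
buf.seek(0)  # rewind so COPY reads from the start of the buffer

print(buf.read().splitlines())  # -> ['1,2024-01-01,3.2', '2,2024-01-01,1.7']
```

For parquet files larger than memory, the same shape works batch-by-batch: read a record batch, serialize it into the buffer, COPY it, truncate, repeat, so no full CSV ever exists on disk or in RAM.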
"blazing fast" hahahahahahahahahahahahahahahahahaha
A 50 GB table isn't big. 😂
But isn't this a workload DB for analytics, not a transactional DB? It would be helpful if that was your use case.
I wonder how TimescaleDB would work with the PostGIS extension.
Yeah, you are a dirty cheat, but I get it.
One of the big caveats that is NOT MENTIONED in this video is that there is ALWAYS A TRADE-OFF! You're trading off STORAGE for SPEED, which is and has always been a solution, in the near and distant past, in the database world. Techniques like these used to be called denormalized tables + columnar indexes; now they are called hypertables (I'm sure there are some extra bits that differentiate them, but nothing materially different).
And speaking of "material": "continuous aggregates", like materialized views, have existed since Oracle introduced the feature in the late 1990s. Whenever a "new feature" comes out for the modern database that is 2 years old, built by the up-and-coming start-up, it's typically just a rebranded and remarketed term for a concept that has been around for decades in the database management sphere. It's crucial to understand that while these modern implementations or "rebrandings" might bring optimizations or a more user-friendly interface, the core idea often traces back to established database engineering principles.
It is an amazing video! Congratulations! I have a question as a developer (sorry, it could be a silly question). What should the workflow on the hyper and non-hyper tables look like? Will the code be required to call the hypertable version programmatically on a per-case basis? Insert/delete/update only on the non-hyper table? Or is there a way to do it transparently?
I created a hypertable with 32 columns and inserted 30 million rows to test. With every symbol (a column of that table), the query is slow on the first call (more than 10 seconds) and really fast (less than 50 ms) on subsequent calls. In my opinion, the first call is too slow and unacceptable.
This was great. Thanks for sharing. What do you use to make coding slides/video/transitions?
Would like a video on Postgres streaming databases.
Your voice sounds remarkably familiar. I think it's AI. DAyda DAyda Dayda. Yeah, someone with an English accent would not say Dayda the American way.
ClickHouse runs circles around it if you need some more advanced and heavy queries.
Interesting, thanks. My big-data table is more than a few TB and over 90 billion (no typo) rows. But it's in that "dreaded vendor database", Oracle. It's a 2-node Oracle database cluster on 2017 h/w, processing and analysing over a billion WAN IP traffic flows per day (SQL & PL/SQL code). Would love to try out PostgreSQL though. But then there are my other pursuits (like machine learning on IP traffic for anomaly detection)... and not enough hours in a day. This is a great time for s/w engineers. So many fascinating technologies, products, languages, and concepts to explore.
The Golden Age of Software Technologies. 🙂
Can I get the keywords for setting up the desktop environment as in the video? Thanks.
50 GB big? :o
Try science data: 200 GB binary files. :p
In any readable data format, it's too big to store reasonably.
Fascinating insight into large dayda issues.
Just wanted to write "use Timescale" and you said it 😂
A band-aid. An RDBMS should really be able to handle this case well without additional work, at most some index tweaking.
The question is, why would you want to do read queries in an operational database? Usually you load the data useful for analytics/ML/reporting into an analytical DB, which is designed to perform read operations.
Yeah, at our start-up (solar analytics) TimescaleDB broke down completely and stopped auto-sizing, so we needed to migrate very quickly to HBase, like Tesla uses. It became evident that MS did not test their product at more than almost-big-data-size data.
First of all, big fan of your content. Regarding open source, what's your comment on Terraform? They were open source too, till they weren't. Could you post some content on that?
Did you have an index on the started_at column for the regular table?
How does the window get those baby blue, peach, pink, and green colors in the information bar?
Why is the syntax of these languages written like COBOL or something equally scores of years old?
About 6 seconds.
Yet MySQL is the most popular and most used.
I really like your terminal. Can you/anyone tell me how I can achieve a similar look?
Awesome demo! Thank you! This really opens up another technological layer to me. In the past I had no exposure to DB optimization other than using indexes in MySQL. But this is so much more, and so useful! It's actually applicable to a use case that has been bugging me for years: a different type of time-series data that we are currently caching via a PHP scheduler into Redis storage. Will share this with the team 🎉
I almost laughed out loud when you said 50 GB is big ~.~ Anyway, sorry, but the video is misleading and feels kinda like clickbait; I'd prefer if you put TimescaleDB in the title.
NvChad for Java, pls 🙂
I have like 200 GB of data, but it's not related to time in any sense (well, maybe dates). What would be the best solution there?
Excellent introduction to TimescaleDB. Would very much like to see more features and capabilities.