[Mllab] Questions about the Rhine level dataset
Good morning,

I have been working with the Rhine level dataset for the past few days and, unfortunately, my results have been relatively bad. I have tried several DNN-based prediction methods: standard models (Dense layers), convolutional models over the data of each station, locally connected models (in the same sense as the convolutional ones) and LSTMs (which have a surprisingly large margin of error, i.e. 9 or 10% after some 150 training epochs). So I am unsure whether DNNs are really a good option for the task.

Moreover, I have some issues with the size of the data. My approach so far has been to divide the data into large but manageable chunks (of, say, 50 000 points) and to iterate for a few epochs (say, 5) over each of them. I expected this to work reasonably well (50 000 points cover several months, which should be enough to predict some 12 hours ahead), but in practice the validation loss stops decreasing after the first or second epoch and stays at the same level for however long I keep the model training. In raw numbers, this is a mean squared error of around 65 when the data is not normalized, and similar values for the normalized case. In practical terms, this means that when plotted, the prediction is usually worse than just predicting the current value.

So I don't have many ideas left. I tried "MinMax" normalization instead of (X - μ) / σ, but the results didn't seem too good. I have also tried several activations without much success (only linear and ReLU work well; the others produce huge errors).

Also, regarding the approaches suggested in the sheet, I have been unable to find much information on any of them. More concretely, I would appreciate some sources on the wavelet approach. My main thought would be to decompose the level as a series of superposed wavelets and keep the most "relevant" ones (i.e. discarding what can be thought of as noise), but I am not sure whether this makes much sense theoretically or even in practice.

In summary, I would appreciate some guidance on what I should do, or an opinion on whether what I have been doing is a bad idea or just a bad implementation (of course, there is a chance that the code is not doing what is intended, so the problem may be at the implementation level rather than the theoretical one).

Thank you very much,
Olmo
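
For reference, the comparison against "just predicting the current value" can be made explicit as a persistence baseline. The following is only a minimal sketch: the function names, the synthetic series and the horizon of 12 steps are illustrative assumptions, not part of the dataset or the sheet. It also shows standardization with statistics taken from the training part only, and how an MSE measured in standardized units relates to one in raw units (it differs by a factor of σ²), which may help when comparing the normalized and non-normalized losses.

```python
import numpy as np

def persistence_mse(levels, horizon):
    """MSE of predicting levels[t + horizon] by the current value levels[t]."""
    levels = np.asarray(levels, dtype=float)
    if horizon <= 0 or horizon >= levels.size:
        raise ValueError("horizon must be between 1 and len(levels) - 1")
    errors = levels[horizon:] - levels[:-horizon]
    return float(np.mean(errors ** 2))

def standardize(train, other):
    """Apply (X - mu) / sigma using the mean and std of the training part only."""
    train = np.asarray(train, dtype=float)
    other = np.asarray(other, dtype=float)
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (other - mu) / sigma, mu, sigma

# Hypothetical stand-in for a water level series (not the real data).
rng = np.random.default_rng(0)
t = np.arange(2000)
levels = 300 + 50 * np.sin(2 * np.pi * t / 500) + rng.normal(0, 2, t.size)

train, valid = levels[:1500], levels[1500:]
train_std, valid_std, mu, sigma = standardize(train, valid)

# Baseline error in raw units and in standardized units.
baseline_raw = persistence_mse(valid, horizon=12)
baseline_std = persistence_mse(valid_std, horizon=12)

# The two differ by sigma**2, so losses must be compared in the same units.
print(baseline_raw, baseline_std * sigma ** 2)
```

A model is only useful at a given horizon if its validation MSE, expressed in the same units, is below this baseline.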
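
The wavelet idea described above (decompose the level, keep the most "relevant" coefficients, discard the rest as noise) can be sketched as follows. This is an assumption-laden illustration, not the method intended by the sheet: it uses a hand-written Haar transform in NumPy rather than a wavelet library, and the names `haar_forward`, `haar_inverse`, `haar_denoise` and the parameters `levels` and `keep` are hypothetical.

```python
import numpy as np

def haar_forward(x, levels):
    """Multi-level orthonormal Haar transform of a 1-D signal."""
    approx = np.asarray(x, dtype=float)
    if approx.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # fine-scale differences
        approx = (even + odd) / np.sqrt(2)         # coarse-scale averages
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest level."""
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx

def haar_denoise(x, levels=3, keep=0.1):
    """Keep only the fraction `keep` of largest-magnitude detail coefficients."""
    approx, details = haar_forward(x, levels)
    all_d = np.abs(np.concatenate(details))
    k = max(1, int(keep * all_d.size))
    threshold = np.sort(all_d)[-k]
    kept = [np.where(np.abs(d) >= threshold, d, 0.0) for d in details]
    return haar_inverse(approx, kept)

# Hypothetical noisy signal standing in for a water level series.
rng = np.random.default_rng(1)
t = np.arange(1024)
noisy = np.sin(2 * np.pi * t / 256) + rng.normal(0, 0.5, t.size)
smooth = haar_denoise(noisy, levels=3, keep=0.1)
```

Whether the smoothed series is a better model input than the raw one is an empirical question; the Haar wavelet is chosen here only because it is the simplest to write down.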
Olmo Chiara