Hi,

if we really want to do this with the new data this semester, please also briefly test the HoG features and the neural networks on the data. I would like to upload the sheet by Thursday at the latest, since otherwise the students will have rather little time for it.

Kind regards,
Bastian


-------- Original Message --------
From: Jannik Schürg <schuerg@ins.uni-bonn.de>
Date: 12.11.18 19:32 (GMT+01:00)
To: mllab@ins.uni-bonn.de
Subject: Re: [Mllab] Practical Teaching Course, and tomorrow's tutorial

Hello,

I will try to get the sheet adapted to the new dataset today, so that it can already be uploaded with the new data this week.

Kind regards,
Jannik

> On 10. Nov 2018, at 08:03, Olmo Chiara <olmo.chiara@gmail.com> wrote:
>
> Good afternoon,
>
> With respect to replicating the Daimler dataset, it is indeed feasible. For instance, at http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/Daimler_Mono_Ped__Detection_Be/daimler_mono_ped__detection_be.html there is a dataset of over 15000 pedestrians, plus images from which to extract the negatives. It works fine to take, for instance, the 48x96 px images and extract samples from the negative images (if this is of interest, tell me and I'll send the code as well). But I guess this is the less interesting approach.
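For reference, extracting negatives from pedestrian-free images boils down to random fixed-size crops. A minimal sketch in Python (the 48x96 px patch size is the one mentioned above; the patch count and the synthetic image are illustrative only):

```python
import numpy as np

def sample_negative_patches(image, patch_shape=(96, 48), n_patches=10, seed=0):
    """Randomly crop fixed-size patches from a pedestrian-free image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    ph, pw = patch_shape
    patches = []
    for _ in range(n_patches):
        # Top-left corner chosen uniformly so the patch stays inside the image
        y = rng.integers(0, h - ph + 1)
        x = rng.integers(0, w - pw + 1)
        patches.append(image[y:y + ph, x:x + pw])
    return np.stack(patches)

# Synthetic stand-in for one of the negative images
img = np.zeros((480, 640, 3), dtype=np.uint8)
negs = sample_negative_patches(img)
print(negs.shape)  # (10, 96, 48, 3)
```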
>
> With respect to colour, after some searching the best option I found was the TUD dataset, available at https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/people-detection-pose-estimation-and-tracking/multi-cue-onboard-pedestrian-detection/. The images are in full colour; I was able to extract over 1300 (clean) pedestrian samples for training data from TUD-MotionPairs and around the same number from TUD-Brussels. Negative data was sampled from the suggested images as well.
>
> While the performance is not stellar (around 85 % at best for the simple PCA classifier), the testing data is also relatively rough, in the sense that many images have overlaps or are very low resolution (or, for instance, a cyclist appears a couple of times). Still, it is noteworthy that the training data comes from a completely different source than the testing data (a fixed camera for training vs. a camera attached to a car for testing), which might be a more realistic setting and hence justifies the reduced accuracy.
>
> The tests were done with training data of shape (1000, 100, 50, 3) for the positives and the same for the negatives; the testing data was limited to 500 samples (both could be increased to around 1300). It runs fine (or at least it did on my computer, which might be slightly above average but is nothing special); the small sample size is compensated by the huge dimension (in the PCA decomposition, the sample size becomes the limiting factor). The size is relatively large (all data is 360 MB as four .npy files, although just 17 MB when tar.xz'd), but it will probably get worse when they get to neural networks, and still I'd say it's reasonable to assume at least 2 GB of RAM in almost every computer. The eigenpedestrians look as they would be expected to look:
> <image.png>
> for the first 10,
> <image.png>
> for the next, and
> <image.png>
> after a hundred. The simple classifier gives accuracies
> <image.png>
> (note that one should take a small value for C in the SVM). I do not have the code for the C++ binding though, so I haven't been able to test it.
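To make the "small C" remark concrete, here is a minimal, self-contained sketch of the PCA + linear SVM setup (the scikit-learn names are real; the synthetic data and the parameter values are illustrative only, stand-ins for the (n, 100, 50, 3) arrays above):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the (n, 100, 50, 3) image arrays
rng = np.random.default_rng(0)
pos = rng.normal(0.5, 0.1, size=(100, 100, 50, 3))
neg = rng.normal(-0.5, 0.1, size=(100, 100, 50, 3))
X = np.concatenate([pos, neg]).reshape(200, -1)  # flatten images to vectors
y = np.array([1] * 100 + [0] * 100)

# A small C means strong regularisation, which helps in this
# high-dimensional, small-sample regime
clf = make_pipeline(PCA(n_components=50), LinearSVC(C=1e-3))
clf.fit(X, y)
print(clf.score(X, y))
```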
>
> So, to sum up, I think it is feasible to work with RGB data. While the results are worse in terms of raw numbers, I'd guess they are also obtained on a "harder" dataset (mainly because the testing data is, again, relatively bad). Another possibility would be giving B&W data to the Master's students and RGB to the Bachelor students; in that case tell me and I'll attach the Daimler data.
>
> On the more practical side, I attach the .tar.xz with the .npy files and an .ipynb notebook (it is possible to compile the images again by downloading the dataset; otherwise just load them before the PCA starts). With respect to copyright, as stated in https://www.mpi-inf.mpg.de/fileadmin/inf/d2/wojek/wojek09cvpr.pdf (and in the README file for the downloaded data), the datasets are "publicly available", so there should be no issue beyond citing the article (the same holds for the Daimler dataset), even in terms of redistribution (which I think is a better option than asking the students to download everything and process the dataset themselves, even if they are given the code). Since it is a .tar.xz file, I guess there should be some Windows utility to unpack it; otherwise it can be zipped.
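In case it helps, a .tar.xz archive of .npy files can also be read back in Python directly, without unpacking it to disk. A self-contained round-trip sketch (the member name train_pos.npy and the array shape are hypothetical; the actual names are whatever is inside the attached archive):

```python
import io
import tarfile
import numpy as np

# Pack an array into an in-memory .tar.xz, mirroring the attached archive
arr = np.zeros((4, 100, 50, 3), dtype=np.uint8)
npy = io.BytesIO()
np.save(npy, arr)
npy.seek(0)

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:xz") as tar:
    info = tarfile.TarInfo("train_pos.npy")  # hypothetical member name
    info.size = len(npy.getbuffer())
    tar.addfile(info, npy)

# Read it back without writing anything to disk
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:xz") as tar:
    member = tar.extractfile("train_pos.npy")
    loaded = np.load(io.BytesIO(member.read()))
print(loaded.shape)  # (4, 100, 50, 3)
```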
>
> Regards,
>
> Olmo
>
>
> On Thu, 8 Nov 2018 at 00:44, Jannik Schürg (<schuerg@ins.uni-bonn.de>) wrote:
> Hi Olmo,
>
> > On 30. Oct 2018, at 06:32, Olmo Chiara <olmo.chiara@gmail.com> wrote:
> >
> > 3. Any update with respect to the pedestrian dataset?
>
> I talked with Mr. Garcke and Bastian about this. As you might know, the problem with the current pedestrian dataset is that the specific subset we used has license issues. The original Daimler dataset is freely available, but our subset isn't. For legal reasons we need a freely available dataset.
>
> I would like to ask you to create a new subset/dataset which we can use for the sheet. You can either use the original Daimler datasets and create a new subset with identical parameters (image size, number of images, etc.), or look into one of the newer pedestrian classification datasets, which might be more interesting. The survey referenced on the sheet mentions a few newer datasets; I think all of them are in colour. It is also easy to find pedestrian datasets on Google. You would have to pick a reasonable number of images of reasonable size, run the tasks of the sheet on them, and check whether everything still works in reasonable time. You can decide whether you want to adapt the tasks for coloured images or greyscale them beforehand, and how large the dataset should be. Regarding the HoG features, I think the C++ implementation can already handle coloured input, but the binding might need some modification in order to use this.
>
> Kind regards,
> Jannik
>
>
> <Pairs.ipynb><rgb_data.tar.xz>


_______________________________________________
Mllab mailing list
Mllab@ins.uni-bonn.de
https://mail.ins.uni-bonn.de/mailman/listinfo/mllab