Abstract | 77% of adults over 50 want to age in place, which makes ensuring adequate nutritional intake a major challenge. Recent advances in machine learning and computer vision show promise for automated nutritional intake tracking, but these methods require a large, high-quality dataset to perform accurately. Existing datasets consist of 2D images with discretely sampled camera views, which do not represent the range of angles and image quality found in photos taken by older individuals. By leveraging view synthesis on 3D models, an effectively unlimited number of 2D images can be generated for any given viewpoint or camera angle. In this paper, we develop a methodology for collecting high-quality 3D models of food items, with a particular focus on speed and consistency, and introduce Foodverse, a large-scale, high-quality, high-resolution multimodal dataset of 52 3D food models together with their associated weight, food name, language description, and nutritional value. We also demonstrate 2D view synthesis using these 3D food models. |
---|