CVPR Technical Program Features Presentations on the Latest AI and Computer Vision Research for Healthcare, Robotics, Virtual Reality, Autonomous Vehicles, and Beyond

From pathology to human avatars, oral papers (the top 3% of all papers) reveal advanced research results

LOS ALAMITOS, Calif., May 16, 2024 /PRNewswire/ -- Co-sponsored by the IEEE Computer Society (CS) and the Computer Vision Foundation (CVF), the 2024 Computer Vision and Pattern Recognition (CVPR) Conference is the preeminent event for research and development (R&D) in the hot topic areas of computer vision, artificial intelligence (AI), machine learning (ML), augmented, virtual and mixed reality (AR/VR/MR), deep learning, and related fields. Over the past decade, these areas have seen significant growth, and the emphasis on this sector by the science and engineering community has fueled an increasingly competitive technical program.

This year, the CVPR Program Committee received 11,532 paper submissions (a 26% increase over 2023), but only 2,719 were accepted, resulting in an acceptance rate of just 23.6%. Of those accepted papers, only 3.3% were slotted for oral presentations based on nominations from the area chairs and senior area chairs overseeing the program.

"CVPR is not only the premiere conference in computer vision, but it's also among the highest-impact publication venues in all of science," said David Crandall, Professor of Computer Science at Indiana University, Bloomington, Ind., U.S.A., and CVPR 2024 Program Co-Chair. "Having one's paper accepted to CVPR is already a major achievement, and then having it selected as an oral presentation is a very rare honor that reflects its high quality and potential impact."

Taking place 17-21 June at the Seattle Convention Center in Seattle, Wash., U.S.A., CVPR offers oral presentations spanning both fundamental and applied research in areas as diverse as healthcare, robotics, consumer electronics, and autonomous vehicles. Examples include:

    --  Pathology: Transcriptomics-guided Slide Representation Learning in
        Computational Pathology* - Training computer systems for pathology
        requires a multi-modal approach for efficiency and accuracy. New work
        from a multi-disciplinary team at Harvard University (Cambridge,
        Mass., U.S.A.), the Massachusetts Institute of Technology (MIT;
        Cambridge, Mass., U.S.A.), Emory University (Atlanta, Ga., U.S.A.),
        and others employs modality-specific encoders, and when the approach
        was applied to liver, breast, and lung samples from two different
        species, it demonstrated significantly better performance than
        current baselines.
    --  Robotics: SceneFun3D: Fine-Grained Functionality and Affordance
        Understanding in 3D Scenes - Creating realistic interactions in 3D
        scenes has been technically challenging because it is difficult to
        manipulate objects within the scene context. Research from ETH
        Zürich (Zürich, Switzerland), Google (Mountain View, Calif.,
        U.S.A.), Technical University of Munich (TUM; Munich, Germany), and
        Microsoft (Redmond, Wash., U.S.A.) has begun bridging that divide by
        creating a large-scale dataset with more than 14.8k highly accurate
        interaction annotations for 710 high-resolution real-world 3D indoor
        scenes. This work, as the paper concludes, has the potential to
        "stimulate advancements in embodied AI, robotics, and realistic
        human-scene interaction modeling."
    --  Virtual Reality: URHand: Universal Relightable Hands - Teams from the
        Codec Avatars Lab at Meta (Menlo Park, Calif., U.S.A.) and Nanyang
        Technological University (Singapore) unveil a hand model that
        generalizes to novel viewpoints, poses, identities, and
        illuminations, which enables quick personalization from a phone
        scan. The resulting images make for a more realistic experience of
        reaching, grabbing, and interacting in a virtual environment.
    --  Human Avatars: Semantic Human Mesh Reconstruction with Textures -
        Working to create realistic human models, teams at Nanjing University
        (Nanjing, China) and Texas A&M University (College Station, Texas,
        U.S.A.) designed a 3D human mesh reconstruction method capable of
        producing high-fidelity, robust semantic renderings that outperform
        state-of-the-art methods. The paper concludes, "This approach
        bridges existing monocular reconstruction work and downstream
        industrial applications, and we believe it can promote the
        development of human avatars."
    --  Text-to-Image Systems: Ranni: Taming Text-to-Image Diffusion for
        Accurate Instruction - Existing text-to-image models can misinterpret
        more difficult prompts, but new research from Alibaba Group
        (Hangzhou, Zhejiang, China) and Ant Group (Hangzhou, Zhejiang, China)
        has made strides in addressing that issue via a middleware layer (the
        first sketch after this list illustrates the general idea). This
        approach, which they have dubbed Ranni, helps the text-to-image
        generator follow instructions more faithfully. As the paper sums up,
        "Ranni shows potential as a flexible chat-based image creation
        system, where any existing diffusion model can be incorporated as
        the generator for interactive generation."
    --  Autonomous Driving: Producing and Leveraging Online Map Uncertainty in
        Trajectory Prediction - To drive autonomously, a vehicle needs an
        accurate, up-to-date picture of the roads and hazards around it.
        High-definition (HD) maps have become a standard part of a vehicle's
        technology stack, but the models that estimate those maps online are
        typically developed in isolation from the trajectory-prediction
        modules that consume them, which treat the estimated map as if it
        were exact. Now, work from a research team from the University of
        Toronto (Toronto, Ontario, Canada), Vector Institute (Toronto,
        Ontario, Canada), NVIDIA Research (Santa Clara, Calif., U.S.A.), and
        Stanford University (Stanford, Calif., U.S.A.) enhances current
        methodologies by propagating map uncertainty downstream (see the
        second sketch after this list), resulting in up to 50% faster
        training convergence and up to 15% better prediction performance.
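
For readers who want a concrete sense of the middleware idea behind Ranni, the following Python sketch illustrates the general pattern of translating a free-form prompt into a structured layout before handing it to an image generator. It is an illustrative toy rather than the authors' code: the names PanelItem, parse_prompt_to_panel, and render_from_panel are hypothetical stand-ins, and a real system would use a language model for the planning step and a diffusion model for the rendering step.

```python
"""Illustrative sketch only (not the authors' code): a middleware layer that
turns a free-form prompt into a structured layout before image generation.
PanelItem, parse_prompt_to_panel, and render_from_panel are hypothetical
names used for illustration."""

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PanelItem:
    """One object the generator must place: a description, a normalized
    bounding box (x0, y0, x1, y1), and a list of attributes."""
    description: str
    bbox: Tuple[float, float, float, float]
    attributes: List[str]


def parse_prompt_to_panel(prompt: str) -> List[PanelItem]:
    """Planning step (the middleware). A real system would ask a language
    model to decompose the prompt; here the output is hard-coded for the
    example prompt below."""
    return [
        PanelItem("a red apple", (0.10, 0.55, 0.45, 0.90), ["red"]),
        PanelItem("a glass of water", (0.55, 0.40, 0.90, 0.90), ["transparent"]),
    ]


def render_from_panel(panel: List[PanelItem]) -> str:
    """Generation step. A real system would condition a diffusion model on
    the structured panel; here we simply describe what would be drawn."""
    placed = ", ".join(f"{item.description} at {item.bbox}" for item in panel)
    return f"<image with {placed}>"


if __name__ == "__main__":
    prompt = "A red apple to the left of a glass of water on a table"
    panel = parse_prompt_to_panel(prompt)  # text -> structured layout
    print(render_from_panel(panel))        # layout -> image
```

The appeal of a structured intermediate layout is that object count, attributes, and placement become explicit, leaving the generator less room to silently drop or merge parts of a complex instruction.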
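
Likewise, the trajectory-prediction result above is, at its core, about changing the interface between the online map estimator and the downstream predictor so that uncertainty is passed along rather than discarded. The toy Python sketch below illustrates that interface change under simplified assumptions; estimate_online_map, predict_trajectory, and the inverse-variance weighting are hypothetical illustrations, not the paper's method.

```python
"""Illustrative sketch only (not the paper's code): an online mapper that
returns per-vertex uncertainty, and a toy predictor that uses it."""

import numpy as np


def estimate_online_map(num_vertices: int = 20):
    """Stand-in for an online HD-map estimator. Returns lane-line vertices
    (N, 2) plus a per-vertex standard deviation (N, 2) expressing how
    confident the mapper is about each point."""
    xs = np.linspace(0.0, 50.0, num_vertices)
    vertices = np.stack([xs, 0.1 * xs], axis=1)          # a gently curving lane line
    sigma = 0.2 + 0.05 * xs[:, None] * np.ones((1, 2))   # farther points are less certain
    return vertices, sigma


def predict_trajectory(vertices: np.ndarray, sigma: np.ndarray, horizon: int = 10):
    """Toy predictor: follows the lane line, but weights each map vertex by
    its confidence (inverse variance) when estimating the heading, instead
    of trusting every vertex equally."""
    weights = 1.0 / (sigma.mean(axis=1) ** 2)            # confident vertices count more
    weights /= weights.sum()
    direction = (np.diff(vertices, axis=0) * weights[1:, None]).sum(axis=0)
    direction /= np.linalg.norm(direction)
    start = vertices[0]
    steps = np.arange(1, horizon + 1)[:, None]
    return start + steps * 2.0 * direction               # 2 m per step along the weighted heading


if __name__ == "__main__":
    verts, sig = estimate_online_map()
    print(predict_trajectory(verts, sig).round(2))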

"As the field's leading event, CVPR introduces the latest research in all areas of computer vision," said Crandall. "In addition to the oral paper presentations, there will be thousands of posters, dozens of workshops and tutorials, several keynotes and panels, and countless opportunities for learning and networking. You really have to attend the conference to get the full scope of what's next for computer vision and AI technology."

Digital copies of all final technical papers* will be available on the conference website by the week of 10 June to allow attendees to prepare their schedules. To register for CVPR 2024 as a member of the press and/or to request more information on a specific paper, visit https://cvpr.thecvf.com/Conferences/2024/MediaPass or email media@computer.org. For more information on the conference, visit https://cvpr.thecvf.com/.

*Papers linked in this press release refer to pre-print publications. Final, citable papers will be available just prior to the conference.

About CVPR 2024
The Computer Vision and Pattern Recognition Conference (CVPR) is the preeminent computer vision event for new research in support of artificial intelligence (AI), machine learning (ML), augmented, virtual and mixed reality (AR/VR/MR), deep learning, and much more. Sponsored by the IEEE Computer Society (CS) and the Computer Vision Foundation (CVF), CVPR delivers important advances in all areas of computer vision and pattern recognition and across the various fields and industries they impact. With a first-in-class technical program, including tutorials and workshops, a leading-edge expo, and robust networking opportunities, CVPR, which is annually attended by more than 10,000 scientists and engineers, creates a one-of-a-kind opportunity for networking, recruiting, inspiration, and motivation.

CVPR 2024 takes place 17-21 June at the Seattle Convention Center in Seattle, Wash., U.S.A., and participants may also access sessions virtually. For more information about CVPR 2024, visit cvpr.thecvf.com.

About the Computer Vision Foundation
The Computer Vision Foundation (CVF) is a non-profit organization whose purpose is to foster and support research on all aspects of computer vision. Together with the IEEE Computer Society, it co-sponsors the two largest computer vision conferences, CVPR and the International Conference on Computer Vision (ICCV). Visit thecvf.com for more information.

About the IEEE Computer Society
Engaging computer engineers, scientists, academics, and industry professionals from all areas and levels of computing, the IEEE Computer Society (CS) serves as the world's largest and most established professional organization of its type. IEEE CS sets the standard for the education and engagement that fuels continued global technological advancement. Through conferences, publications, and programs that inspire dialogue, debate, and collaboration, IEEE CS empowers, shapes, and guides the future of not only its 375,000+ community members, but the greater industry, enabling new opportunities to better serve our world. Visit computer.org for more information.

SOURCE IEEE Computer Society