Meta introduces its new AI model, Segment Anything Model 2 (SAM 2)



 Meta Blog:

Today, we’re announcing the Meta Segment Anything Model 2 (SAM 2), the next generation of the Meta Segment Anything Model, now supporting object segmentation in videos and images. We’re releasing SAM 2 under an Apache 2.0 license, so anyone can use it to build their own experiences. We’re also sharing SA-V, the dataset we used to build SAM 2, under a CC BY 4.0 license, and releasing a web-based demo experience where everyone can try a version of our model in action.

Object segmentation—identifying the pixels in an image that correspond to an object of interest—is a fundamental task in the field of computer vision. The Meta Segment Anything Model (SAM) released last year introduced a foundation model for this task on images.

Our latest model, SAM 2, is the first unified model for real-time, promptable object segmentation in images and videos, enabling a step change in the video segmentation experience and seamless use across image and video applications. SAM 2 exceeds previous capabilities in image segmentation accuracy and achieves better video segmentation performance than existing work, while requiring a third of the interaction time. SAM 2 can also segment any object in any video or image (commonly described as zero-shot generalization), which means it can be applied to previously unseen visual content without custom adaptation.
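For readers who want a feel for what "promptable" means in practice, here is a minimal sketch of single-click image segmentation using the open-sourced sam2 Python package. The module paths, builder function, and checkpoint/config names (build_sam2, SAM2ImagePredictor, sam2_hiera_large.pt) are assumptions taken from the public repository, not something this announcement spells out:

import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Assumed checkpoint/config names from the public repo; a CUDA GPU is assumed.
checkpoint = "checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("photo.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)
    # One positive click at (x, y) prompts the model toward a single object.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),   # 1 = foreground, 0 = background
        multimask_output=True,        # return several candidate masks
    )

best_mask = masks[np.argmax(scores)]  # boolean array of shape (H, W)

The same predictor also accepts box and mask prompts, which is what makes the model usable out of the box across very different image domains.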

Before SAM was released, creating an accurate object segmentation model for specific image tasks required highly specialized work by technical experts with access to AI training infrastructure and large volumes of carefully annotated in-domain data. SAM revolutionized this space, enabling application to a wide variety of real-world image segmentation and out-of-the-box use cases via prompting techniques—similar to how large language models can perform a range of tasks without requiring custom data or expensive adaptations.

In the year since we launched SAM, the model has made a tremendous impact across disciplines. It has inspired new AI-enabled experiences in Meta’s family of apps, such as Backdrop and Cutouts on Instagram, and catalyzed diverse applications in science, medicine, and numerous other industries. Many of the largest data annotation platforms have integrated SAM as the default tool for object segmentation annotation in images, saving millions of hours of human annotation time. SAM has also been used in marine science to segment sonar images and analyze coral reefs, in satellite imagery analysis for disaster relief, and in the medical field to segment cellular images and aid in detecting skin cancer.

As Mark Zuckerberg noted in an open letter last week, open source AI “has more potential than any other modern technology to increase human productivity, creativity, and quality of life,” all while accelerating economic growth and advancing groundbreaking medical and scientific research. We’ve been tremendously impressed by the progress the AI community has made using SAM, and we envisage SAM 2 unlocking even more exciting possibilities.



SAM 2 can be applied out of the box to a diverse range of real-world use cases—for example, tracking objects to create video effects (left) or segmenting moving cells in videos captured from a microscope to aid in scientific research (right).

In keeping with our open science approach, we’re sharing our research on SAM 2 with the community so they can explore new capabilities and use cases. The artifacts we’re sharing today include:
  • The SAM 2 code and weights, which are being open sourced under a permissive Apache 2.0 license. We’re sharing our SAM 2 evaluation code under a BSD-3 license.
  • The SA-V dataset, which has 4.5 times more videos and 53 times more annotations than the existing largest video segmentation dataset. This release includes ~51k real-world videos with more than 600k masklets (spatio-temporal object masks). We’re sharing SA-V under a CC BY 4.0 license.
  • A web demo, which enables real-time interactive segmentation of short videos and applies video effects on the model predictions.
As a unified model, SAM 2 can power use cases seamlessly across image and video data and be extended to previously unseen visual domains. For the AI research community and others, SAM 2 could be a component as part of a larger AI system for a more general multimodal understanding of the world. In industry, it could enable faster annotation tools for visual data to train the next generation of computer vision systems, such as those used in autonomous vehicles. SAM 2’s fast inference capabilities could inspire new ways of selecting and interacting with objects in real time or live video. For content creators, SAM 2 could enable creative applications in video editing and add controllability to generative video models. SAM 2 could also be used to aid research in science and medicine—for example, tracking endangered animals in drone footage or localizing regions in a laparoscopic camera feed during a medical procedure. We believe the possibilities are broad, and we’re excited to share this technology with the AI community to see what they build and learn.
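On the video side, here is a rough sketch of how a single click can be propagated through a clip as a masklet, again using the open-sourced package. The names init_state, add_new_points_or_box, and propagate_in_video follow the public repository and should be treated as assumptions rather than a documented contract:

import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Assumed config/checkpoint names from the public repo.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml",
                                       "checkpoints/sam2_hiera_large.pt")

with torch.inference_mode():
    # In the repo's convention, the video is a directory of JPEG frames.
    state = predictor.init_state(video_path="video_frames/")

    # One positive click on frame 0 defines the object to track.
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Stream a mask for every frame; the model's memory of earlier frames
    # carries the object forward through the clip.
    masklet = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masklet[frame_idx] = (mask_logits[0] > 0).cpu().numpy()

Because masks stream out frame by frame and prompts can be refined on any frame, this is also the mechanism behind the interactive web demo mentioned above.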


Why does this all feel like gratuitous self-back-slapping? I read it several times and concluded I should get more ink for my quill pen.
 
