Meta introduces new AI Segment Anything Model 2 (SAM 2)



 Meta Blog:

Today, we’re announcing the Meta Segment Anything Model 2 (SAM 2), the next generation of the Meta Segment Anything Model, now supporting object segmentation in videos and images. We’re releasing SAM 2 under an Apache 2.0 license, so anyone can use it to build their own experiences. We’re also sharing SA-V, the dataset we used to build SAM 2 under a CC BY 4.0 license and releasing a web-based demo experience where everyone can try a version of our model in action.

Object segmentation—identifying the pixels in an image that correspond to an object of interest—is a fundamental task in the field of computer vision. The Meta Segment Anything Model (SAM) released last year introduced a foundation model for this task on images.

Our latest model, SAM 2, is the first unified model for real-time, promptable object segmentation in images and videos, enabling a step-change in the video segmentation experience and seamless use across image and video applications. SAM 2 exceeds previous capabilities in image segmentation accuracy and achieves better video segmentation performance than existing work, while requiring three times less interaction time. SAM 2 can also segment any object in any video or image (commonly described as zero-shot generalization), which means that it can be applied to previously unseen visual content without custom adaptation.

Before SAM was released, creating an accurate object segmentation model for specific image tasks required highly specialized work by technical experts with access to AI training infrastructure and large volumes of carefully annotated in-domain data. SAM revolutionized this space, enabling application to a wide variety of real-world image segmentation and out-of-the-box use cases via prompting techniques—similar to how large language models can perform a range of tasks without requiring custom data or expensive adaptations.

In the year since we launched SAM, the model has made a tremendous impact across disciplines. It has inspired new AI-enabled experiences in Meta’s family of apps, such as Backdrop and Cutouts on Instagram, and catalyzed diverse applications in science, medicine, and numerous other industries. Many of the largest data annotation platforms have integrated SAM as the default tool for object segmentation annotation in images, saving millions of hours of human annotation time. SAM has also been used in marine science to segment Sonar images and analyze coral reefs, in satellite imagery analysis for disaster relief, and in the medical field, segmenting cellular images and aiding in detecting skin cancer.

As Mark Zuckerberg noted in an open letter last week, open source AI “has more potential than any other modern technology to increase human productivity, creativity, and quality of life,” all while accelerating economic growth and advancing groundbreaking medical and scientific research. We’ve been tremendously impressed by the progress the AI community has made using SAM, and we envisage SAM 2 unlocking even more exciting possibilities.



SAM 2 can be applied out of the box to a diverse range of real-world use cases—for example, tracking objects to create video effects (left) or segmenting moving cells in videos captured from a microscope to aid in scientific research (right).

In keeping with our open science approach, we’re sharing our research on SAM 2 with the community so they can explore new capabilities and use cases. The artifacts we’re sharing today include:
  • The SAM 2 code and weights, which are being open sourced under a permissive Apache 2.0 license. We’re sharing our SAM 2 evaluation code under a BSD-3 license.
  • The SA-V dataset, which has 4.5 times more videos and 53 times more annotations than the existing largest video segmentation dataset. This release includes ~51k real-world videos with more than 600k masklets. We’re sharing SA-V under a CC BY 4.0 license.
  • A web demo, which enables real-time interactive segmentation of short videos and applies video effects on the model predictions.
As a unified model, SAM 2 can power use cases seamlessly across image and video data and be extended to previously unseen visual domains. For the AI research community and others, SAM 2 could be a component as part of a larger AI system for a more general multimodal understanding of the world. In industry, it could enable faster annotation tools for visual data to train the next generation of computer vision systems, such as those used in autonomous vehicles. SAM 2’s fast inference capabilities could inspire new ways of selecting and interacting with objects in real time or live video. For content creators, SAM 2 could enable creative applications in video editing and add controllability to generative video models. SAM 2 could also be used to aid research in science and medicine—for example, tracking endangered animals in drone footage or localizing regions in a laparoscopic camera feed during a medical procedure. We believe the possibilities are broad, and we’re excited to share this technology with the AI community to see what they build and learn.


 Read more:

 
Why does this all feel like gratutious self back slapping. I read it several times and I concluded I should get more ink for my quill pen.
 

My Computer My Computer

At a glance

Windows 11 Pro + Win11 Canary VM.I9 13th gen i9-13900H 2.60 GHZ16 GB solderedIntegrated Intel Iris XE
OS
Windows 11 Pro + Win11 Canary VM.
Computer type
Laptop
Manufacturer/Model
ASUS Zenbook 14
CPU
I9 13th gen i9-13900H 2.60 GHZ
Motherboard
Yep, Laptop has one.
Memory
16 GB soldered
Graphics Card(s)
Integrated Intel Iris XE
Sound Card
Realtek built in
Monitor(s) Displays
laptop OLED screen
Screen Resolution
2880x1800 touchscreen
Hard Drives
1 TB NVME SSD (only weakness is only one slot)
PSU
Internal + 65W thunderbolt USB4 charger
Case
Yep, got one
Cooling
Stella Artois (UK pint cans - 568 ml) - extra cost.
Keyboard
Built in UK keybd
Mouse
Bluetooth , wireless dongled, wired
Internet Speed
900 mbs (ethernet), wifi 6 typical 350-450 mb/s both up and down
Browser
Edge
Antivirus
Defender
Other Info
TPM 2.0, 2xUSB4 thunderbolt, 1xUsb3 (usb a), 1xUsb-c, hdmi out, 3.5 mm audio out/in combo, ASUS backlit trackpad (inc. switchable number pad)

Macrium Reflect Home V8
Office 365 Family (6 users each 1TB onedrive space)
Hyper-V (a vm runs almost as fast as my older laptop)
Back
Top Bottom