Search your image collection using text descriptions or find visually similar images using CLIP embeddings.
Prerequisites
- Antfly running with Termite and ONNX Runtime (CLIP requires ONNX)
- CLIP model:
antflycli termite pull openai/clip-vit-base-patch32
Step 1: Create the Table
Create a table with a CLIP embeddings index. The template combines the image URL and caption for multimodal embedding:
// Create the table with an embeddings index
fmt.Println("Creating table 'images' with CLIP embeddings index...")

// Build the embedder config (union type)
var embedderConfig oapi.EmbedderConfig
embedderConfig.Provider = oapi.EmbedderProviderTermite
embedderConfig.FromTermiteEmbedderConfig(oapi.TermiteEmbedderConfig{
    Model: "openai/clip-vit-base-patch32",
})

// Build the index config (union type)
var indexConfig oapi.IndexConfig
indexConfig.Name = "embeddings"
indexConfig.Type = oapi.IndexTypeEmbeddings
indexConfig.FromEmbeddingsIndexConfig(oapi.EmbeddingsIndexConfig{
    Dimension: 512,
    Template:  "{{media url=image_url}}{{caption}}",
    Embedder:  embedderConfig,
})

err = client.CreateTable(ctx, "images", antfly.CreateTableRequest{
    Indexes: map[string]oapi.IndexConfig{
        "embeddings": indexConfig,
    },
})

Note: ONNX Runtime is experimental. If you encounter issues such as "model not found" errors, empty results, or embeddings not being computed, try restarting Antfly. Check antfly.log for errors if problems persist.
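To build intuition for the Template setting above, here is a small self-contained sketch of how a template like "{{media url=image_url}}{{caption}}" can be expanded: plain field references are substituted from the document, while {{media url=...}} marks a URL whose content is fetched and embedded as pixels rather than text. This is an illustration only, not Antfly's actual template engine.

```go
package main

import (
	"fmt"
	"regexp"
)

// expandTemplate is a rough illustration (not Antfly's implementation) of how
// an embeddings template mixes media references and document fields.
func expandTemplate(tmpl string, doc map[string]string) string {
	re := regexp.MustCompile(`\{\{(media url=)?(\w+)\}\}`)
	return re.ReplaceAllStringFunc(tmpl, func(m string) string {
		parts := re.FindStringSubmatch(m)
		field := parts[2]
		if parts[1] != "" {
			// {{media url=field}}: the URL's content is fetched and embedded
			// as an image; here we just mark it as a placeholder.
			return "<image:" + doc[field] + ">"
		}
		return doc[field]
	})
}

func main() {
	doc := map[string]string{
		"caption":   "Utah teapot",
		"image_url": "https://example.com/teapot.png",
	}
	fmt.Println(expandTemplate("{{media url=image_url}}{{caption}}", doc))
	// Prints: <image:https://example.com/teapot.png>Utah teapot
}
```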
Step 2: Add a Sample Image
Let's add the famous Utah teapot:
// Add a sample image (Utah teapot)
fmt.Println("\nAdding Utah teapot sample image...")
_, err = client.Batch(ctx, "images", antfly.BatchRequest{
    Inserts: map[string]any{
        "utah_teapot": map[string]any{
            "caption":   "Utah teapot",
            "image_url": "https://upload.wikimedia.org/wikipedia/commons/e/e7/Utah_teapot_simple_2.png",
        },
    },
})
if err != nil {
    log.Printf("Warning: Failed to add teapot: %v", err)
} else {
    fmt.Println("Added Utah teapot")
}

Antfly fetches and embeds the image automatically when using a URL.
Step 3: Search with Text
// Search with text
fmt.Println("\nSearching for '3D model teapot'...")
results, err := client.Query(ctx, antfly.QueryRequest{
    Table:          "images",
    SemanticSearch: "3D model teapot",
    Indexes:        []string{"embeddings"},
    Limit:          5,
})
if err != nil {
    log.Fatalf("Query failed: %v", err)
}

fmt.Println("\nSearch results:")
for _, resp := range results.Responses {
    for _, hit := range resp.Hits.Hits {
        fmt.Printf("  Score: %.4f, ID: %s\n", hit.Score, hit.ID)
    }
}

The Utah teapot should appear as the top result:
Score: 0.0164, ID: utah_teapot
Score: 0.0161, ID: mmir_3bc4b3613ed9
Score: 0.0159, ID: mmir_83ca037bd2ad
...

Batch Import with Timing
For larger datasets, here's how to import images in bulk. This example uses the MMIR dataset from Google Research:
func batchImport(ctx context.Context, client *antfly.AntflyClient) {
    numImages := 100
    fmt.Printf("Importing first %d images...\n", numImages)
    startTime := time.Now()
    successCount := 0

    f, err := os.Open("mmir_dataset.tsv.gz")
    if err != nil {
        log.Printf("Failed to open dataset: %v", err)
        return
    }
    defer f.Close()

    gz, err := gzip.NewReader(f)
    if err != nil {
        log.Printf("Failed to create gzip reader: %v", err)
        return
    }
    defer gz.Close()

    reader := csv.NewReader(gz)
    reader.Comma = '\t'
    reader.FieldsPerRecord = -1
    reader.Read() // Skip header

    for successCount < numImages {
        record, err := reader.Read()
        if err != nil {
            break // io.EOF (or a read error) ends the import
        }
        if len(record) < 2 {
            continue
        }
        // Assumes the first two columns hold the caption and image URL;
        // adjust the indices to the dataset's actual column layout.
        caption, imageURL := record[0], record[1]
        sum := sha256.Sum256([]byte(imageURL))
        id := fmt.Sprintf("mmir_%x", sum[:6]) // stable ID derived from the URL
        _, err = client.Batch(ctx, "images", antfly.BatchRequest{
            Inserts: map[string]any{
                id: map[string]any{"caption": caption, "image_url": imageURL},
            },
        })
        if err != nil {
            log.Printf("Failed to insert %s: %v", id, err)
            continue
        }
        successCount++
    }

    elapsed := time.Since(startTime)
    fmt.Printf("Imported: %d / %d\n", successCount, numImages)
    fmt.Printf("Imported %d images in %.1fs (%.1f images/sec)\n",
        successCount, elapsed.Seconds(), float64(successCount)/elapsed.Seconds())
}

Example output:
Imported: 100 / 100
Imported 100 images in 45.2s (2.2 images/sec)

Running the Example
# From the repository root
go run ./examples/image-search
# Or build and run
go build -o examples/image-search/image-search ./examples/image-search
./examples/image-search/image-search

To run the batch import, first download the MMIR dataset:
curl -o mmir_dataset.tsv.gz "https://storage.googleapis.com/gresearch/wit-retrieval/mmir_dataset_train-00000-of-00005.tsv.gz"

Tips
- Use visual descriptions: CLIP responds better to concrete visual concepts ("red sports car", "snowy mountain") than to brand names or abstract terms.
- Captions affect results: the template combines image and caption. For pure visual search, use "template": "{{media url=image_url}}" instead.
Related
- Multimodal Guide - PDFs, audio, and remote content
- Termite Models - Available CLIP variants