Search your image collection using text descriptions or find visually similar images using CLIP embeddings.

## Prerequisites

  • Antfly running with Termite and ONNX Runtime (CLIP requires ONNX)
  • CLIP model: `antflycli termite pull openai/clip-vit-base-patch32`

## Step 1: Create the Table

Create a table with a CLIP embeddings index. The template combines the image URL and caption for multimodal embedding:

```go
// Step 1: Create the table with a CLIP embeddings index
fmt.Println("Creating table 'images' with CLIP embeddings index...")

// Build the embedder config (union type)
var embedderConfig oapi.EmbedderConfig
embedderConfig.Provider = oapi.EmbedderProviderTermite
embedderConfig.FromTermiteEmbedderConfig(oapi.TermiteEmbedderConfig{
	Model: "openai/clip-vit-base-patch32",
})

// Build the index config (union type)
var indexConfig oapi.IndexConfig
indexConfig.Name = "embeddings"
indexConfig.Type = oapi.IndexTypeEmbeddings
indexConfig.FromEmbeddingsIndexConfig(oapi.EmbeddingsIndexConfig{
	Dimension: 512, // CLIP ViT-B/32 produces 512-dimensional embeddings
	Template:  "{{media url=image_url}}{{caption}}",
	Embedder:  embedderConfig,
})

err = client.CreateTable(ctx, "images", antfly.CreateTableRequest{
	Indexes: map[string]oapi.IndexConfig{
		"embeddings": indexConfig,
	},
})
if err != nil {
	log.Fatalf("Failed to create table: %v", err)
}
```

**Note:** ONNX Runtime support is experimental. If you hit "model not found" errors, empty results, or embeddings that are never computed, try restarting Antfly. If problems persist, check antfly.log for errors.

## Step 2: Add a Sample Image

Let's add the famous Utah teapot:

```go
// Step 2: Add a sample image (Utah teapot)
fmt.Println("\nAdding Utah teapot sample image...")
_, err = client.Batch(ctx, "images", antfly.BatchRequest{
	Inserts: map[string]any{
		"utah_teapot": map[string]any{
			"caption":   "Utah teapot",
			"image_url": "https://upload.wikimedia.org/wikipedia/commons/e/e7/Utah_teapot_simple_2.png",
		},
	},
})
if err != nil {
	log.Printf("Warning: Failed to add teapot: %v", err)
} else {
	fmt.Println("Added Utah teapot")
}
```

When given a URL, Antfly fetches and embeds the image automatically.

## Step 3: Search with Text

```go
// Step 3: Search with text
fmt.Println("\nSearching for '3D model teapot'...")
results, err := client.Query(ctx, antfly.QueryRequest{
	Table:          "images",
	SemanticSearch: "3D model teapot",
	Indexes:        []string{"embeddings"},
	Limit:          5,
})
if err != nil {
	log.Fatalf("Query failed: %v", err)
}

fmt.Println("\nSearch results:")
for _, resp := range results.Responses {
	for _, hit := range resp.Hits.Hits {
		fmt.Printf("  Score: %.4f, ID: %s\n", hit.Score, hit.ID)
	}
}
```

The Utah teapot should appear as the top result:

```
Score: 0.0164, ID: utah_teapot
Score: 0.0161, ID: mmir_3bc4b3613ed9
Score: 0.0159, ID: mmir_83ca037bd2ad
...
```

## Batch Import with Timing

For larger datasets, here's how to import images in bulk. This example uses the MMIR dataset from Google Research:

```go
func batchImport(ctx context.Context, client *antfly.AntflyClient) {
	numImages := 100
	fmt.Printf("Importing first %d images...\n", numImages)
	startTime := time.Now()
	successCount := 0

	f, err := os.Open("mmir_dataset.tsv.gz")
	if err != nil {
		log.Printf("Failed to open dataset: %v", err)
		return
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		log.Printf("Failed to create gzip reader: %v", err)
		return
	}
	defer gz.Close()

	reader := csv.NewReader(gz)
	reader.Comma = '\t'
	reader.FieldsPerRecord = -1
	reader.Read() // Skip header

	for i := 0; i < numImages; i++ {
		record, err := reader.Read()
		if err != nil {
			break // io.EOF or a malformed row: stop importing
		}
		if len(record) < 3 {
			continue
		}
		// The column positions here are an assumption; check the TSV
		// header and adjust to wherever the caption and image URL live.
		caption, imageURL := record[1], record[2]
		_, err = client.Batch(ctx, "images", antfly.BatchRequest{
			Inserts: map[string]any{
				fmt.Sprintf("mmir_%04d", i): map[string]any{
					"caption":   caption,
					"image_url": imageURL,
				},
			},
		})
		if err != nil {
			log.Printf("Failed to insert record %d: %v", i, err)
			continue
		}
		successCount++
	}

	elapsed := time.Since(startTime)
	fmt.Printf("Imported: %d / %d\n", successCount, numImages)
	fmt.Printf("Imported %d images in %.1fs (%.1f images/sec)\n",
		successCount, elapsed.Seconds(), float64(successCount)/elapsed.Seconds())
}
```

Example output:

```
Imported: 100 / 100
Imported 100 images in 45.2s (2.2 images/sec)
```

## Running the Example

```shell
# From the repository root
go run ./examples/image-search

# Or build and run
go build -o examples/image-search/image-search ./examples/image-search
./examples/image-search/image-search
```

To run the batch import, first download the MMIR dataset:

```shell
curl -o mmir_dataset.tsv.gz "https://storage.googleapis.com/gresearch/wit-retrieval/mmir_dataset_train-00000-of-00005.tsv.gz"
```

## Tips

**Use visual descriptions:** CLIP responds better to concrete visual concepts ("red sports car", "snowy mountain") than to brand names or abstract terms.
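As a sketch, you can see the difference by running the Step 3 query with both phrasings against a running Antfly instance (the query names here are illustrative, not from the original example):

```go
// Compare a concrete visual phrase against an abstract/brand term.
// The request shape mirrors the Query call in Step 3.
for _, query := range []string{"red sports car", "Ferrari"} {
	results, err := client.Query(ctx, antfly.QueryRequest{
		Table:          "images",
		SemanticSearch: query,
		Indexes:        []string{"embeddings"},
		Limit:          3,
	})
	if err != nil {
		log.Fatalf("Query failed: %v", err)
	}
	fmt.Printf("Top hits for %q:\n", query)
	for _, resp := range results.Responses {
		for _, hit := range resp.Hits.Hits {
			fmt.Printf("  Score: %.4f, ID: %s\n", hit.Score, hit.ID)
		}
	}
}
```

In practice the concrete phrase tends to surface visually matching images with higher scores, since CLIP was trained on image-text pairs describing visible content.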

**Captions affect results:** The template combines image + caption. For pure visual search, use "template": "{{media url=image_url}}" instead.
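For image-only search, the index config from Step 1 changes in only one place — a sketch, assuming the same `oapi` types and the `embedderConfig` built earlier:

```go
// Image-only embeddings: drop {{caption}} from the template so only the
// fetched image contributes to the CLIP embedding.
var visualIndex oapi.IndexConfig
visualIndex.Name = "embeddings"
visualIndex.Type = oapi.IndexTypeEmbeddings
visualIndex.FromEmbeddingsIndexConfig(oapi.EmbeddingsIndexConfig{
	Dimension: 512,
	Template:  "{{media url=image_url}}", // no caption text
	Embedder:  embedderConfig,            // same Termite CLIP embedder as Step 1
})
```

With captions removed, text queries are matched purely against what CLIP sees in the image, which avoids results that rank highly only because their caption happens to share words with the query.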