Imagine you're an artist, and you've been given the task of creating word art. Your goal is to visually represent each word in a sentence uniquely, highlighting its essence and meaning. Now, each word is like a paint color on your palette, and your sentence is the canvas.
In this analogy, a vector is like a unique recipe that defines the proportions and combinations of each paint color for every word. These recipes ensure that each word's representation is distinct and captures its significance in the sentence.
In the digital realm, turning words into vectors involves assigning each word a set of numbers based on its context and meaning. Think of these numbers as ingredients in our recipe for word art.
For example, the word "happy" might be represented by a vector recipe like [0.8, 0.5, -0.2], while "sad" could have a different recipe like [-0.6, 0.3, -0.9]. Each recipe encapsulates the distinct essence of the word.
In real-life applications, a vector is usually a list of a few hundred such numbers. The length of that list is the vector’s dimension. The vector for “happy” shown above has 3 dimensions.
You might be wondering why you would ever want to use vectors. Let’s imagine a world where vectors have just 2 dimensions. So each vector is basically a point on a 2D graph. When you plot the vectors for a few different sentences, you get something like this:
From this, you can see at a glance which sentences relate to each other more closely! This has huge implications for search, where you can find what you need without spelling out the exact words. If you search for “ocean”, you will also get results that contain “sea” but never mention the word “ocean”. This works because “sea” and “ocean” mean nearly the same thing, so their vectors sit close together.
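To make the idea concrete, here is a minimal sketch of searching by closeness on a 2D graph. The sentences, their coordinates, the query vector, and the `search` helper are all invented for illustration; a real system would get its vectors from an embedding model rather than by hand.

```python
import math

# Hypothetical 2D vectors, made up by hand for illustration only.
documents = {
    "The sea was calm this morning": [0.9, 0.8],
    "Waves crashed against the shore": [0.7, 0.9],
    "My laptop battery died again": [-0.7, 0.2],
}

def search(query_vector, docs, top_k=2):
    """Return the top_k texts whose vectors lie closest to the query."""
    # math.dist computes the straight-line (Euclidean) distance between two points.
    ranked = sorted(docs, key=lambda text: math.dist(query_vector, docs[text]))
    return ranked[:top_k]

# Assumed vector for the query "ocean"; note that no document contains that word.
ocean_query = [0.85, 0.85]
results = search(ocean_query, documents)
print(results)
```

The two sea-related sentences come back first even though neither contains the word “ocean”, because only the positions of the vectors are compared.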
Finding how similar two vectors are is somewhat computationally intensive. We can use measures such as cosine similarity or Euclidean distance. In production, when you have a ton of vectors and each vector has a large dimension (OpenAI’s text-embedding-ada-002 embeddings have 1536 dimensions), searching can take a while.
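As a rough sketch of the two measures just mentioned, the snippet below computes cosine similarity and Euclidean distance for the “happy” and “sad” example vectors from earlier. The function names are our own, and this is plain Python for readability, not how a production vector store would implement it.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 means same direction, -1 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b: 0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

happy = [0.8, 0.5, -0.2]
sad = [-0.6, 0.3, -0.9]

print(round(cosine_similarity(happy, sad), 3))   # about -0.139
print(round(euclidean_distance(happy, sad), 3))  # about 1.578
```

Each comparison touches every number in both vectors, so comparing one query against millions of 1536-dimensional vectors one by one adds up quickly.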
Citrus stores vectors more efficiently, so search takes less time. It removes the headache that comes with storing vectors. Developers sign up, then store and search across vectors using simple API calls. It’s open-source, so you can self-host it and even contribute to it!
Sign up now.