
"Dot is, at its most simple, an app you chat with on iOS. You can send it words, voice memos, pictures, PDFs, and it’s thrilled to search the web for you, too. Communicating through written text (Dot’s voice is coming next year)" [0].

It looks quite "ambitious"[1]:

- Automated File Management: Dot creates, organizes, and retrieves both structured and unstructured information.

- Adaptive Intelligence: It learns from patterns in your behavior, plus any guidance you decide to share with it

- Internet Browsing: It has access to up-to-date information (and eventually, tools and services)

- Contextual Multimodal Understanding: It interprets text, audio, visuals, and links, informed by the context it already has on you

- Self-Programming: Dot proactively writes and stores routines, anticipating your future needs

- Personalized Display and Retrieval: It transforms information into the most compelling format for each user

- Conceptual Synthesis: It doesn’t just store information — it connects the dots between topics, ideas, and themes in your life

- Theory of Mind: Dot synthesizes a deeper understanding of your motivations and goals, while reflecting on how it can best help you to achieve them.

[0] https://www.fastcompany.com/90975882/meet-dot-an-ai-companio...

[1] https://new.computer/about



I'm a bit perturbed by the, uh, inflation here. Looks like a product manager wishlist for a team of 1000 over 7 years


Most of those bullets are "we are using an LLM with some basic LLM-interfacing techniques"


Not really. Putting this all together is ambitious but individually, they're all things that have been realized to some degree.


Each is possible, but having been adjacent to Google projects in this vein for 7 years, and having worked on something similar, I apply a substantial discount rate to what I call "magic wand" AI design work.

It would be very odd and transformative indeed to have this all coalesce and work.

But don't listen to me, I'm probably going to go all out on my pitch now too.


1 is retrieval augmented generation.

2 is just fancy talk for what high-performing LLMs do

3 has been done many times over

4 is achievable with ImageBind if we're going for an exotic solution. Otherwise GPT-4V with AudioToText and TTS will do just fine (OpenAI has something similar set up)

5 is as simple as timed prompts sent by the company unbeknownst to the user.

6 is probably the most bespoke thing here. I'm guessing this is the "information as a quiz" thing they try to demonstrate.

7 is the same as 2

8 is a high-performing LLM with a specific prompt.

I'm not trying to discount your experience, but the technology that makes any of this possible is a few years old, and the new state-of-the-art version, which is far ahead of everything else, is ~8 months old. So unless you worked on something like this very recently, I'm not sure it's much of an indication of what is achievable.

I guess we'll see.


That's exactly what I'm saying: I've been working on something like this since January. I throttled back from the universal file manager and stuff; I think they're kinda hinting at it too through the "probabilistic computing" thing. I'm staying focused on RAG + as much local compute as possible; user perception of citations is through the roof compared to practically anything else.

The problem I saw over and over again at Google wasn't, like, how do we do this. It's like the "draw the rest of the f*king owl" meme. You can pull some stuff off probabilistically or by presenting choices, but it gets real nasty real quick when you screw up (the most public version I can think of here is how immensely frustrating it is if a voice speaker mishears your query, especially twice in a row)

Failure rates compounding as a product is something else I feel hurts, e.g. getting 95% of the files right x 95% of your appointments x 95% of your sleep schedule. Honestly, each individual prompt's reliability is top-tier. Yet ~15% of the time (0.95^3 ≈ 0.86) it kinda has no idea what you're doing.
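To put numbers on that compounding, here's a minimal sketch. It assumes the three tasks fail independently (my assumption; real failures are probably correlated) and uses the illustrative 95% figures above. `joint_success` is a hypothetical helper name, not anything from a real product.

```python
from math import prod

def joint_success(rates):
    """Probability that every task succeeds, assuming independent failures."""
    return prod(rates)

# Illustrative per-task reliabilities: files, appointments, sleep schedule.
rates = [0.95, 0.95, 0.95]

p_all = joint_success(rates)
print(f"everything right: {p_all:.1%}")
print(f"at least one thing wrong: {1 - p_all:.1%}")
```

Add a fourth or fifth 95% task to the list and the "everything right" number keeps sliding, which is the whole problem with chaining a lot of individually good steps.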

But I'm sort of devil's advocating here, to a point. I think you can get there, but boy, it's a lot of work, and they're 3-4 years away if they have 100 employees. It's non-trivial on several fronts; even just the GPT billing is a nightmare. I am 99.9% sure I am the only person on earth besides OpenAI employees who knows how to bill functions correctly to the token (to be fair, getting within ~4 tokens is publicly available, just not from OpenAI: incredible reverse engineering work by a few people on a forum thread)

And then I guess there's a smidge of "if you're chasing all these things simultaneously you might not grok it" on my end. You're far better off making a slick interaction for moving files between buckets and creating buckets, than promising you can reverse-engineer from scratch, all the right buckets, with all the right files in them.


> incredible reverse engineering work by a few people on a forum thread

Which thread, please? Sounds like some fascinating stuff to learn here.


Or a VC pitch.



