Developing a sleep app using Flutter, Llama 3.3 and Piper TTS
In search of something that could help me fall asleep more easily, I came across a dozen different apps, but the one that hit the spot was MySleepButton.
Simply put, the app speaks words with a short delay in between, and all you have to do is imagine them as they come. There is science behind it, and it works for me.
However, the Android app had been unstable and crashing for a long time.
So I decided to combine open-source tools to build myself a new one and make its source available.
App development
We need to make a list of features to develop, and we always start with some proof of concept. It can be called a demo or a prototype, but either way it's not meant for production.
I prefer to use a Kanban-style board, following Agile and DevOps practices, to write down requirements, design, architecture, development, testing, deployment and operations tasks.
Whether you are an indie developer, a one-time contractor or an employee, you still end up going through a similar process.
Proof of concept
In Agile terms, here I'm both the user and the product owner, so I have an idea of what I want. But only once we start using the product will we see which functionality is really important. We will also identify all the underlying technical tasks that I, as developer/tester/integrator, will do.
At this stage the most important part is very close collaboration between the product owner, UX designers and the development team.
Even though here it's just me, I write down all tasks as I go through these stages so that I can always refer back to what's left.
Most importantly, I can prioritize and see what the next thing to work on is.
Contrary to some belief, engineers are not threads and cannot develop multiple things in parallel. It is always done concurrently, meaning there is an expensive context switch when working on multiple tasks, so avoid that.
Making a working concept is easy, as we can use Flutter with Hot Reload on our development platform. This means changes in the code are reflected in the app's UI within a few seconds.
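For illustration, a proof of concept can start from a skeleton as small as the sketch below (not the actual app code); run it with flutter run, and every save triggers a hot reload that repaints the UI in place.

import 'package:flutter/material.dart';

void main() => runApp(const SleepApp());

// Tweak build(), save, and hot reload updates the running app in seconds.
class SleepApp extends StatelessWidget {
  const SleepApp({super.key});

  @override
  Widget build(BuildContext context) {
    return const MaterialApp(
      home: Scaffold(body: Center(child: Text('sleep words go here'))),
    );
  }
}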
During this phase it became clear to me that fancy visual design is not important, as you don't actually look at the app; the idea is to use it with your eyes closed.
Requirements
After finalizing the concept it's time to write down the MVP requirements.
Only one language is needed, English; even though it's not my native language, it has been my primary one for the past 6 years.
The background should be dark, as the app is used before sleep, so little brightness is needed.
The list of words should be comfortable and suited for sleep.
The voice speaking the words should be soft and comfortable.
The delay in seconds between words (default 5) should be easy to set.
The app should stop playing after a configurable time in minutes (default 10).
The app should work on Android 14.
Ability to shuffle words, to avoid repetitiveness and the need for frequent updates (nice to have).
A background image that is night and sleep related (nice to have).
It's easy to fall into feature creep and scope creep. We get so many ideas at this stage that it is easy to forget what the target was.
With that in mind I reset my focus to the MVP scope, as I want to start using the app on my phone as soon as possible.
Design
Or, in our case, a mockup. Driven by the idea of keeping it straight to the point, I came up with the following concept after a few iterations.
This is enough for UI development to start, as we plan to use the Material style. From the UI requirements and design we will also quickly figure out what data we need and where we want to get it from.
Architecture
We have two parts here to think about.
Making a data pipeline that produces the list of words using a local LLM, plus a text-to-speech service that turns the words into audio files.
And building the application itself with those audio files and the required UI functionality, which I'll explain along the way.
In simple terms, we produce the list of audio files, and those are packed into the app binary during release. It's important that this pipeline runs when building the app, so the application has no runtime dependencies on it.
If this seems too early, or too much thinking about architecture, keep in mind that system and software architecture is always there, whether it was made intentionally or not, so it's better to write something down now rather than drift into bad patterns.
Development
In other stories I went deeper into frontend or backend development.
Here we will start with the service, or better said, data preparation.
Backend “service”
For the Flutter app to play audio, we are going to store the files with the app in the assets folder. It is not a requirement, but I prefer not to increase the binary size significantly, e.g. no more than 1–2 MB of audio assets per language. For reference, a compressed release APK for Android is around 18 MB without assets.
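On the Flutter side, anything under assets/ must be declared in pubspec.yaml, after which the app can discover the bundled files at runtime. A minimal sketch, assuming the .aac files end up directly under assets/; loadWordAssets is a hypothetical helper, and AssetManifest requires a recent Flutter version:

import 'package:flutter/services.dart';

// List all bundled word audio files, assuming they were declared
// under assets/ in pubspec.yaml. Illustrative helper, not the app's API.
Future<List<String>> loadWordAssets() async {
  final manifest = await AssetManifest.loadFromAssetBundle(rootBundle);
  return manifest.listAssets().where((p) => p.endsWith('.aac')).toList();
}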
But first, we have to get a list of 120 nice words. With the default delay of 5 seconds, this gives us up to 10 minutes of runtime before the words start repeating.
We could grab a random word list or do this manually, but I want something that gives us this quickly and can easily produce new words.
The service itself could be written in almost anything, even Dart itself, as we only use Docker and a REST API.
But for now we shall use shell tools directly, which can later be part of a CI/CD pipeline.
First we need an LLM running locally for GenAI capabilities. We will do this as in the previous article, so I'll skip the details; the difference is that this time we use the latest open-source Llama 3.3 70B model.
We use the simple prompt shown below. Note that prompt engineering is an area of its own.
We send the prompt instruction to the OpenAI-compatible REST API of our LLM to get the results back, and with jq we extract the produced words, which are stored in a file, space separated.
curl http://local.lan:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Produce a list of 120 single space separated words that are warm and easy to imagine and could be used to help bring people into sleeping. Use simple English non abstract words that represent objects or things that non native speakers can easily understand. Return content directly."}]
  }' \
  | jq -r '.choices[0].message.content' \
  > words.en
Next, we feed this word list to the text-to-speech (TTS) service to produce audio files.
As I use Home Assistant, I'm aware of its TTS solution, Piper, and how well it works even for more complex commands.
To use it here, we need to select a language and a voice model, which we do by checking out the model voice samples. I've chosen en_US-amy-medium, so the next step is to produce the audio files from the word list.
By default Piper produces a WAV audio file. That is great; however, it results in ~40 KB per word, which is not ideal. Piper can also produce raw audio, which we can pipe into ffmpeg to transcode into the compressed AAC format. This results in a ~5 KB file with comparable quality that is supported by all modern devices.
# Piper emits 16-bit mono PCM at 22.05 kHz; ffmpeg transcodes each word to AAC.
for f in $(cat words.en)
do
  echo "$f" | \
    docker exec -i piper-en /usr/share/piper/piper \
      --model en_US-amy-medium.onnx \
      --output_raw | \
    ffmpeg -f s16le -ar 22050 -i pipe: -y -c:a libfdk_aac "assets/$f.aac"
done
A discussion of transcoding options can be found in this GitHub issue.
Frontend app
For implementation details, check the source code.
Flutter makes UI development straightforward. We use standard widgets: Center, Column and Row for alignment; IgnorePointer and AnimatedOpacity for disabling input in a nice way; and Switch, Text, Slider and ElevatedButton to let the user actually perform actions.
For the background we use BoxDecoration and DecorationImage. For styling we use the default Material 3 theme, forced to dark, as we always go with a dark background.
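Forcing the dark theme is a one-liner on MaterialApp. A minimal sketch; buildApp is just an illustrative helper, not the app's actual code:

import 'package:flutter/material.dart';

// Ignore the system light/dark setting and always use a dark Material 3 theme.
MaterialApp buildApp(Widget home) {
  return MaterialApp(
    themeMode: ThemeMode.dark,
    darkTheme: ThemeData(useMaterial3: true, brightness: Brightness.dark),
    home: home,
  );
}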
After a few development iterations we end up with a working app.
And the last bit was to add periodic audio playback with the Flutter audioplayers package.
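The playback loop boils down to a periodic timer that picks the next word and plays its asset. A minimal sketch, assuming the files are bundled as assets/<word>.aac; startSession and its parameters are illustrative, not the app's actual API:

import 'dart:async';
import 'dart:math';
import 'package:audioplayers/audioplayers.dart';

final AudioPlayer _player = AudioPlayer();

Timer startSession(List<String> words,
    {int delaySeconds = 5, int stopAfterMinutes = 10, bool shuffle = true}) {
  final queue = [...words];
  if (shuffle) queue.shuffle(Random());
  var index = 0;
  return Timer.periodic(Duration(seconds: delaySeconds), (timer) {
    // Stop once the configured session length has elapsed.
    if (timer.tick * delaySeconds > stopAfterMinutes * 60) {
      timer.cancel();
      return;
    }
    final word = queue[index++ % queue.length];
    _player.play(AssetSource('$word.aac')); // AssetSource resolves under assets/
  });
}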
Testing
Flutter is great when it comes to testing options, covering unit, widget and integration testing.
What manual testing on Android showed me is that if the app is suspended (e.g. when the screen locks after the display timeout), playback can also stop.
This is known behavior: apps are not kept active in the background in order to save battery, and both Android and iOS are very aggressive about it.
Back to Development
This brings us to why estimating software development time in advance is not going to give you what you think it will.
Having audio work in the background is platform dependent, and here we are first targeting the Android platform. To get what we want we can use an Android service, which is well described in how to do background playback on Android: MediaSessionService allows the media session to run separately from the app's activity (source). For Flutter there is background_service, which abstracts this up to a point, but it is not a real solution for iOS, where it's not really supported. On Android it is also not enough: we should add notifications so that the user is aware a service is running and can stop it. And a fully background service goes further still, as there we should ask the user to disable battery optimizations for the app, allowing it to run in the background without being killed.
So while I had rewritten the logic to use a separate service, a familiar picture came to mind.
This is actually another case of feature creep. Given how the app is used, there is no need for a background service. When the app is open, it is being used; you shouldn't start a sleep app and then go scroll through social media.
If someone kills the app intentionally, the app should simply stop playing, with no service to look after.
The only behavior we want is for the app to keep running when the screen turns off and locks. This means we can just ask the user to disable battery optimization for our app, which gives us the desired behavior.
This is as simple as the code below, using the permission_handler package:
// Check whether battery optimizations are already disabled for the app;
// if not, ask the user (Android shows a system dialog).
var status = await Permission.ignoreBatteryOptimizations.status;
if (status != PermissionStatus.granted) {
  status = await Permission.ignoreBatteryOptimizations.request();
} ...
This results in a system dialog asking to let the app run in the background.
And our MVP is ready for publishing to internal testers: me.
Next steps
While we achieved what I wanted and it serves me without issues, we are far from a finished product that I and others could use worry-free, both technically and user-facing.
But we cannot do all of them at the same time, so again we need to prioritize what should be picked up next.
Documentation, QA and Automated testing, Infrastructure, CI/CD pipeline, Monitoring
If you have worked in a startup, you know how important all of these really are.
The better we do this, the less work we have once we release the app to production. Even now I had to connect manually to my server to update the backend script, which is a no-go for a production app.
Product Features
Better quality audio
Some words are not pronounced clearly; we should always have a cleanly pronounced list of words, either by using a different model or different words, or both.
New words
I don't mind repeated words, it's a feature of its own, but adding new words should be easy and straightforward.
Multi-language support
English gets the job done, but to share this with more people around me I need to support more languages.
Notifications if background usage is disabled
The permission dialog is easy to miss if one dismisses it too quickly, so it would be good to show some kind of notice when battery optimization is still enabled, as sketched below.
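A minimal sketch of such a startup check, again with permission_handler; warnIfBackgroundDisabled and the message text are made up for illustration:

import 'package:flutter/material.dart';
import 'package:permission_handler/permission_handler.dart';

// Warn the user if battery optimization could still kill the app.
Future<void> warnIfBackgroundDisabled(BuildContext context) async {
  final status = await Permission.ignoreBatteryOptimizations.status;
  if (!status.isGranted && context.mounted) {
    ScaffoldMessenger.of(context).showSnackBar(const SnackBar(
      content: Text('Battery optimization is enabled; '
          'playback may stop when the screen locks.'),
    ));
  }
}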
Keeping settings
There is not a lot to configure at the moment, but if you don't like the default timeout values or the shuffle setting, you have to change them every time you open the app. Persisting them is straightforward, as sketched below.
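A minimal sketch using the shared_preferences package; the key names and both helpers are hypothetical:

import 'package:shared_preferences/shared_preferences.dart';

// Persist the user's choices across app restarts.
Future<void> saveSettings(
    int delaySeconds, int stopAfterMinutes, bool shuffle) async {
  final prefs = await SharedPreferences.getInstance();
  await prefs.setInt('delaySeconds', delaySeconds);
  await prefs.setInt('stopAfterMinutes', stopAfterMinutes);
  await prefs.setBool('shuffle', shuffle);
}

// Read them back, falling back to the defaults from the requirements.
Future<(int, int, bool)> loadSettings() async {
  final prefs = await SharedPreferences.getInstance();
  return (
    prefs.getInt('delaySeconds') ?? 5,
    prefs.getInt('stopAfterMinutes') ?? 10,
    prefs.getBool('shuffle') ?? true,
  );
}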
Logo and icons
We need proper application branding before it can be pushed to Play Store.
Full Android Support
Currently it works on Android 14, but we need to support the most-used Android versions if we are to release this to a wider audience.
iOS support
Obviously iOS is widely used, and some people in my close circle have iPhones, which means I need to add iOS platform support.
And the list goes on. And this is just me being a real user! Whew…
Check the source code, but keep in mind no warranty is given on the app; it may eat your hamster.
Happy holidays!
Not a word in the article has been written by LLM/AI ;)