This is our first blog post. The idea is to give you some insight into what Cassiny is, what it can do for you, and what we plan to do in the future.
Coding Cassiny has been a great and rewarding experience, and one of the most interesting (and complex) parts was putting different technologies together.
But the hardest part is still to come: it's a long way to the top if you wanna rock 'n' roll.
Cassiny is an open source data science platform where you can run your Jupyter notebook instances and your jobs, and deploy your machine learning models as APIs.
You can also use Cassiny inside your school or university: wherever you want to run Jupyter notebooks, Cassiny can be a good solution.
Cassiny is completely focused on the Python data science stack. We don't plan to run Spark clusters; we want to make it super easy to run Python clusters and make Python a first-class citizen when you have to deal with big data.
Which problem is Cassiny trying to solve?
Have you ever struggled with dependencies, or tried to scale your data infrastructure easily?
Well, in my case I was dealing more with packages, virtual envs, devops and how to make things work than with the data side of the analysis.
I was probably enjoying this part, so the equation was clear to me:
problem + fun = let's go for it!
We also want to simplify interactive computing and offer a better abstraction over the computational resources that you need.
Our hybrid infrastructure
Cassiny's managed solution is based on a hybrid infrastructure: we use both cloud instances and bare-metal machines.
We use the cloud because we can easily scale up and down, we are not yet sure how much our users will consume, and we can avoid committing to dedicated machines. But one of our goals was to make data science easier and cheaper, and while a cloud provider can accomplish the first (making it easier), cloud instances are much, much more expensive than dedicated or owned servers. Our requirements were:
- It has to be cheap
- We don't need to be distributed around the world (for now)
- Easy to scale
- We want to control our hardware because it is part of our business (we sell machine time by the minute)
And a hybrid solution meets all these requirements.
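To picture how the two halves fit together, here is a minimal sketch of the idea: fill the bare-metal machines first and rent cloud instances only for the overflow. This is not Cassiny's actual scheduler. The names `Placement`, `place` and `cost_per_minute` are made up for this post, and the prices are arbitrary units chosen only to show that cloud capacity costs more per minute than dedicated capacity.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    dedicated: int  # machines served by bare-metal servers we already pay for
    cloud: int      # extra machines rented from a cloud provider on demand


def place(demand: int, dedicated_capacity: int) -> Placement:
    """Fill dedicated capacity first, send the overflow to the cloud."""
    if demand < 0 or dedicated_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    on_dedicated = min(demand, dedicated_capacity)
    return Placement(dedicated=on_dedicated, cloud=demand - on_dedicated)


def cost_per_minute(placement: Placement, dedicated_capacity: int,
                    dedicated_price: int, cloud_price: int) -> int:
    """Dedicated machines are paid for even when idle; cloud only when used."""
    return dedicated_capacity * dedicated_price + placement.cloud * cloud_price


# Hypothetical numbers: 10 bare-metal machines, 14 machines requested.
peak = place(demand=14, dedicated_capacity=10)
print(peak)
print(cost_per_minute(peak, dedicated_capacity=10,
                      dedicated_price=1, cloud_price=3))
```

With steady demand the cheap dedicated machines do all the work, and the expensive cloud instances are only paid for during peaks.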
We care about Privacy
One of the things that we took extremely seriously is privacy and the protection of your data (and our data).
We decided to keep our servers in Europe, using two data centers located in France and Germany. We know that this is just a small part of the puzzle, but we wanted to start off on the right foot.
These are some of the features we are working on/thinking about:
- Easy clustering solution based on Dask/Ray
- Better docs
- Private Docker images registry
- Improving coverage (right now ~60%)
Tell us what you think!
Feedback and advice are more than welcome (firstname.lastname@example.org) :)
With Pythonic cheers!