Sunday, November 29, 2020

Why I am starting to believe that time is an illusion

We have refined our understanding of gravity: from a force of nature that attracts two masses, to gravity being merely the curvature of space-time, to gravity being interpreted as time slowing down in space.

The other important ongoing discovery has to do with dark matter and dark energy, which together constitute around 95% of our observable universe. Commensurate with evolving insights into dark matter and dark energy, dark energy at least seems to be an omnipresent energy in the vacuum of space, spread out over such large volumes that its detection is faint, its density being very low.

I imagine that dark matter may, along similar lines, differ from ordinary matter in that it is much less dense and spread out over the vast expanse of intergalactic space. The current tug of war between our expanding universe and gravity could then be seen as a war of equilibrium between normal matter/energy and dark matter/energy, and clearly normal matter is getting rarefied on the overall scale of the universe.

Next, let us for a moment assume that the nature of reality (matter and energy) truly is made up of E8 crystals, or any other unit of reality for that matter. Ordinary matter could then be nothing but "densely packed" E8 crystals, and dark matter could be "very rarefied" E8 crystals.

Time as we know it is perceived by us as beings made up of matter. Our perception of time is primarily from the point of view of how it appears in the "ordinary matter" portion of the universe. In the vacuum of space, and in the dark matter part of the universe, is time just as relevant?

The perception of time is linked to causality, as experienced by us in the macro universe. 

Time in the macro universe might seem ubiquitous, but at subatomic and quantum scales, where wave nature, duality, uncertainty, and randomness are the norm, does the linear flow of time, as in the macro universe, even hold relevance?

Time needs a frame of reference and an observer. Without these being clear to us over the expanse of the universe and dark matter, is time even a relevant variable for the study of particle physics itself?

Now, can you blame me for tending to think that time might be an illusion of our perception of the macro universe?


-GG

Friday, August 21, 2020

Throttling Kafka Consumption - Dispelling Myths

Kafka as a messaging broker was designed for high-throughput publish and subscribe. At times this very fact can prove to be an issue when Kafka is used in more traditionally designed transactional systems. For example, if an application is consuming messages from a Kafka topic and inserting or updating rows in a relational database, the high rate of consumption of Kafka messages can put pressure on transactional persistence in the RDBMS and cause exceptions. Hence there arises a need to throttle message consumption from a Kafka topic.

Starting out, let us assume that we are only considering message consumption inside a single consumer instance (which may be part of a bigger consumer group). So we have a single consumer instance picking up Kafka messages from a single partition of a topic.

Most Kafka client frameworks implement a decent consumer poll loop that reads Kafka messages from the partition as soon as possible and then hands these messages over to the client application as fast as possible, without waiting or blocking for the client application to process each message.
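For reference, here is a minimal sketch of such a poll loop, written against the plain Java KafkaConsumer API; the broker address, group id, and topic name are illustrative assumptions, not values from any particular setup:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PollLoopSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "demo-group");              // assumed consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("demo-topic")); // assumed topic

            while (true) {
                // poll() returns whatever batch of records is currently available...
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // ...and a typical framework hands each record to the application
                    // without waiting for the previous one to finish processing.
                    System.out.println(record.value()); // stand-in for application processing
                }
            }
        }
    }

Frameworks differ in the details, but the shape is the same: fetch fast, hand over fast.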

For single-threaded technologies like Node.js, the above can be an issue, in the sense that the application ends up processing a large number of messages in parallel. Many developers resort to Kafka configuration parameters like the ones below to control the rate of consumption of messages from Kafka, with very little success, for the reasons I mention after the list (a sketch of setting these properties follows it):

fetch.min.bytes - This property allows a consumer to specify the minimum amount of data that it wants to receive from the broker when fetching records. If a broker receives a request for records from a consumer but the new records amount to fewer bytes than fetch.min.bytes, the broker will wait until more messages are available before sending the records back to the consumer. 

fetch.max.wait.ms - By setting fetch.min.bytes, you tell Kafka to wait until it has enough data to send before responding to the consumer. If you set fetch.max.wait.ms to 100 ms and fetch.min.bytes to 1 MB, Kafka will receive a fetch request from the consumer and will respond with data either when it has 1 MB of data to return or after 100 ms, whichever happens first.

max.partition.fetch.bytes - This property controls the maximum number of bytes the server will return per partition.

session.timeout.ms - The amount of time a consumer can be out of contact with the brokers while still being considered alive; this defaults to 10 seconds.

max.poll.records - This controls the maximum number of records that a single call to poll() will return.

receive.buffer.bytes and send.buffer.bytes - These are the sizes of the TCP send and receive buffers used by the sockets when writing and reading data.
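To make the list concrete, here is a sketch of how these properties are typically set on a plain Java consumer; the values shown are arbitrary examples for illustration, not recommendations:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
    props.put("group.id", "demo-group");                 // assumed consumer group
    props.put("fetch.min.bytes", "1048576");             // wait for ~1 MB before responding
    props.put("fetch.max.wait.ms", "100");               // ...or respond after 100 ms, whichever is first
    props.put("max.partition.fetch.bytes", "1048576");   // cap on bytes returned per partition
    props.put("session.timeout.ms", "10000");            // considered dead after 10 s of silence
    props.put("max.poll.records", "50");                 // cap on records per poll() call
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

Note that every one of these knobs shapes how records travel from broker to consumer; none of them makes the consumer wait for the application.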

If we manipulate the above values, messages will be fetched more frequently or less frequently. But if thousands of messages are already lying in the topic, it will not matter much, given that a fetch from topic to consumer is very quick (microseconds). The consumer will still see a steady supply of almost parallel messages.

Even playing around with offset commits will never block consumption of messages; it only influences recovery, guaranteed delivery, once-only delivery, and so on. It cannot restrict the super-fast reading of messages by the consumer. (Unlike in traditional message brokers like JMS or MQ, late acknowledgements do not block consumption of messages in Kafka.)
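To illustrate, here is a sketch of the common manual-commit pattern, with auto-commit disabled and a commit after processing (process() is a hypothetical placeholder for application logic); even this only changes what is redelivered after a crash or rebalance, not how fast poll() keeps handing records over:

    props.put("enable.auto.commit", "false"); // take over offset management

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record); // hypothetical application logic
        }
        // This commit only affects recovery and redelivery semantics;
        // the next poll() still returns records as fast as they are fetched.
        consumer.commitSync();
    }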

The bottom line is that the challenge of parallel consumption of messages in a single Kafka consumer instance needs to be handled by the process/application, and not by Kafka configuration.
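One straightforward application-level approach, sketched below under the assumption that the per-record work is synchronous (insertIntoDatabase() is a hypothetical blocking call), is to finish processing the current batch before polling again, optionally capping the batch size with max.poll.records:

    props.put("max.poll.records", "1"); // at most one record per poll(), for strict pacing

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            // The loop does not advance until this record is fully persisted,
            // so the RDBMS sees at most one in-flight insert per consumer instance.
            insertIntoDatabase(record); // hypothetical blocking DB call
        }
        consumer.commitSync(); // commit only after the work is done
    }

The throttle here is the blocking call itself: the consumer simply does not ask Kafka for more until the database has kept up. (If processing can exceed max.poll.interval.ms, the consumer's pause()/resume() methods can be used to keep polling without fetching new records, but that is a refinement of the same idea.)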

Consumer Side Technology

A much more important factor when dealing with throttling of Kafka messages is which technology/architecture you use in the consumer-side application.

Using a Java/Spring Boot based consumer will ensure that the consumer process runtime is multi-thread-based and synchronous first, or synchronous by default. With the use of thread pools in Java/Spring, and given that almost all calls in Java are synchronous by default, it is much easier to control Kafka message consumption in Java.
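For instance, with the Spring Kafka @KafkaListener model, the listener method is invoked synchronously on a container thread, one record at a time per partition, so a blocking database call naturally paces consumption. A sketch, where the topic, group, and table names are illustrative assumptions:

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class ThrottledListener {

        private final JdbcTemplate jdbcTemplate;

        public ThrottledListener(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        // The container does not deliver the next record from the partition
        // until this method returns, so the insert below acts as the throttle.
        @KafkaListener(topics = "demo-topic", groupId = "demo-group") // assumed names
        public void onMessage(String message) {
            jdbcTemplate.update("INSERT INTO events(payload) VALUES (?)", message); // hypothetical table
        }
    }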

Node.js based consumers are inherently single-threaded and hence asynchronous first, or asynchronous by default, using awaits, promises, and callbacks in such a way that reining in the near-parallel consumption of Kafka messages will be a challenge, especially for less experienced developers.